Is the current model of load/performance testing broken?
Friday, December 9, 2011 at 12:53PM
I am wondering if our current model of web load testing is broken, for several reasons, one of them possibly fundamental to the current model.
I was at my London Web Performance Meetup the other night at Betfair and Andrew Harding, a perf QA engineer there, gave a talk on “Continuous Integration - A Performance Engineer’s Tale” in which he talked about how they were tackling this challenge at Betfair.
He said something that resonated with something that I have been thinking about for a while now, to wit:
“We’ve had to look at separating load injection from performance measurement”.
Now, Betfair did this for various reasons, mainly to do with the need to keep the perf testing environment “warm” – it takes too long to warm up the environment, load the caches, overcome TCP “slow start” etc and achieve a “stable” system – compared to the time they might have between check-ins/builds etc. So it’s easier to keep a constant load flowing through the environment to keep as much of the system as possible “warmed up” until you deploy your incremental changes in the next build and do your testing.
Because they use “traditional” load testing tools like LoadRunner which are designed to “start script – inject load – measure response – stop script – generate report” this causes a problem. They aren’t “starting and stopping” all the time, hence they don’t get the nice reports out the end. [I might be over-simplifying this, I am not a LoadRunner expert, but bear with me for now!].
So they “measure performance” using other tools e.g. Real-user monitoring (RUM) or Application Performance Management (APM) tools outside of the load-injection toolset.
Now, I see this as a real problem for vendors like HP, who make LoadRunner. LoadRunner has lots of awesome features but if you are just using it to “generate load” and not using its measurement and reporting features then there are far cheaper (and open-source) alternatives. The rash of “cloud-based” load testing services springing up off the back of Amazon EC2 is also increasing the downward pressure on the costs of the “traditional” vendor solutions.
The “cloud testing” vendors raise another issue with the measurement of performance “from the point of injection”.
Measurement requires, as much as possible, a stable platform from which to measure, which in its current form the cloud most certainly is not. Study after study after study shows that performance in the cloud is variable, and even more so under load. Some people have even given up on the cloud for that reason and moved back to dedicated servers (http://code.mixpanel.com/2011/10/27/why-we-moved-off-the-cloud/. Well worth reading the comments below as a lot of people both agree and disagree).
So if you want timing resolution to within (at least) 10msec to measure variation in time-to-first-byte then doing it using a varying “measuring stick” isn’t going to work. So, another argument for separating load injection from performance measurement.
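One rough way to check whether a given injector is a steady enough “measuring stick” is to time the same fixed operation repeatedly from it and compare the spread of the results with the resolution you are after. The sketch below is only an illustration under my own assumptions: the function names are hypothetical, the sample values are made up, and using the standard deviation as the yardstick is a simplification.

```python
from statistics import pstdev


def measurement_jitter(samples_ms):
    """Spread (population standard deviation) of repeated timings, in ms.

    samples_ms should be timings of the *same* fixed operation, so any
    spread comes from the measuring platform rather than the target.
    """
    return pstdev(samples_ms)


def can_resolve(samples_ms, resolution_ms=10.0):
    """True if the platform's own jitter is below the wanted resolution."""
    return measurement_jitter(samples_ms) < resolution_ms


# Made-up example data: a steady injector and a noisy one.
stable_injector = [100.0, 101.0, 99.0, 100.0]
noisy_injector = [100.0, 140.0, 80.0, 125.0]
```

With these invented samples, can_resolve(stable_injector) is true and can_resolve(noisy_injector) is false: the second injector's own variation is larger than the 10msec differences you would be trying to see.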
But this isn’t the fundamental reason I am concerned that things might be “broken” for the current load-testing model.
Currently load-testing is [mostly] based on a classic HTTP request/response paradigm – “start a timer, make an HTTP request, stop the timer, record the duration, repeat as necessary, report the outcome”.
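That loop is simple enough to sketch in a few lines of Python. This is a minimal illustration of the paradigm, not how LoadRunner or any other tool actually implements it; time_requests, report and make_request are names I have made up, and make_request stands in for whatever issues the request and waits for the complete response.

```python
import time
from statistics import mean


def time_requests(make_request, repeats):
    """Call make_request() `repeats` times; return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()                    # start a timer
        make_request()                                 # make the request, wait for the response
        durations.append(time.perf_counter() - start)  # stop the timer, record the duration
    return durations


def report(durations):
    """Report the outcome as simple summary statistics."""
    return {
        "count": len(durations),
        "min": min(durations),
        "avg": mean(durations),
        "max": max(durations),
    }
```

Against a real target, make_request could be something like lambda: urllib.request.urlopen(url).read(), with url being whatever page you are testing. Note that the whole approach assumes every interaction has a clear start and a clear end.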
Ok, so @LDNWebPerf we’ve looked at HTML5 apps and at WebSockets and, guess what, the web apps of the future can (and do) break this simplistic “classic HTTP request/response paradigm”.
HTML5 apps might make extensive use of localStorage to eliminate roundtrips, and WebSockets change everything into ONE HTTP request followed by a lot of BI-DIRECTIONAL communication over the WebSocket connection. The same issue exists with some of the other “push” techniques that might use chunked-encoding to send data asynchronously down a “long-lived” request/response. SPDY might cause problems too because of the way it multiplexes requests and responses!
So how are you going to measure your “performance under load” when there isn’t a nice “Request/Response” to “start/stop” your timer?
Well, the answer is… I don’t know, yet.
There are two immediately “obvious” solutions – go “up the stack” or “down the stack”.
By “up the stack” I mean that the load testing tool must be much more “web application aware” – it must be inside the browser, seeing exactly how the web application is sending/receiving information, especially over WebSockets, and be “instrumented” to measure it.
It might also need to be “framework-aware”, so that it understands frameworks like Dojo/Comet/jQuery/whatever, “knows” the methods they use to send/receive information, and knows how to inject instrumentation into those frameworks to measure them.
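To make the “instrumented” idea a little more concrete, here is a hedged sketch of what application-level timing might look like once there is no single request/response to wrap a timer around. It assumes the application's messages carry some identifier that lets a reply be matched to the message that triggered it, which is my assumption and will not hold for every app. MessageLatencyTracker, the message ids and the "price_update" kind are all hypothetical.

```python
import time
from statistics import mean


class MessageLatencyTracker:
    """Times application-level messages rather than HTTP requests."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock    # injectable so the tracker can be tested
        self._pending = {}     # message_id -> (kind, time sent)
        self.latencies = {}    # kind -> list of latencies in seconds

    def sent(self, message_id, kind):
        """Call from the instrumented send path."""
        self._pending[message_id] = (kind, self._clock())

    def received(self, message_id):
        """Call from the instrumented receive path; returns the latency."""
        entry = self._pending.pop(message_id, None)
        if entry is None:
            return None        # unsolicited server push: nothing to match it to
        kind, started = entry
        elapsed = self._clock() - started
        self.latencies.setdefault(kind, []).append(elapsed)
        return elapsed

    def summary(self):
        """Min, average and max latency per message kind."""
        return {
            kind: {"min": min(v), "avg": mean(v), "max": max(v)}
            for kind, v in self.latencies.items()
        }
```

The instrumentation would call sent("42", "price_update") when the app pushes a message down the socket and received("42") when the matching reply arrives. Pure server-push messages have no matching send, which is exactly the case where a start/stop timer has nothing to hold on to.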
By “down the stack” I mean going back to network sniffing and measurement “on the wire” to see exactly how long things take, but again probably with the requirement to be more “application-aware” so it can re-assemble the network packets and translate them into HTTP, WebSocket and application-level performance timing data that’s more meaningful to the performance engineer and the application developer.
Anyway, I am not sure that I have explained this as articulately as I would have liked but my basic message is that:
- If we can’t measure performance from the “point of injection”, for whatever reason, then we will need to invest in other tools that can measure performance from different locations (e.g. RUM, APM etc)
- Hence, if all you are doing is generating load, then you’ll rapidly become a commodity item and be paid accordingly…
- New web technologies like WebSockets can change the HTTP Request/Response paradigm making the job of performance measurement more difficult so the tools will need to evolve and perhaps become more “application-aware”.
I’d love to hear what everyone else thinks so please comment away!
Reader Comments (3)
You covered some great points there. I also believe that Load Testing as we know it has to change as the apps they are testing are changing. I do however think that the era of tools like SilkPerformer, LoadRunner & Co is not yet over. Professional tools like them will find a way to become smarter with new ways of Web x.0 apps. On the other hand, you will always want to generate some HTTP base load against your servers and then use a browser-driven test (one that uses real browsers to simulate the load) to measure performance from the end-user device (desktop browser, mobile browser or mobile app). Some Cloud based load testing services already offer this feature to use an actual browser.
I also see UEM (User Experience Management), RUM, ... (or whatever you want to call it) as the essential ingredient for doing performance management. Why? Because a) you won't be able to test everything in your pre-prod environment and b) you need to monitor your end user experience anyway because you need to react to problems that happen in your production environment. If you then bundle UEM with APM you are in the best situation to not only monitor your end-users but also analyze any problems faster to reduce and avoid downtime or any severe performance related impacts to your business.
On the topic of load testing you can also read one of my recent blogs:
P.S.: Great to see that your Meetup gets so many great topics
I don't think so - see my detailed response at http://applicationperformanceengineeringhub.com/is-the-current-model-of-loadperformance-testing-broken/
Hi Steve,
Very interesting blog indeed.
I agree when you say that apps are changing and that they are not as static as before. In fact they are much more responsive and reactive than ever, taking advantage of this "BI-DIRECTIONAL" communication and asynchronous calls. Load-testing tools, on the other hand, are not all ready for that new way of doing Web.
We, at Neotys, have tried to look ahead and our load testing tool, NeoLoad, is ready for the new challenge of push and HTML5 in particular.
As you describe, the difference is that you can't just start and stop the stopwatch anymore. To properly make measurements we have modeled the application response as "messages". That means that the entire response is split into several messages, each corresponding to a specific notification from the server.
This new vision ensures that you can retrieve the correct information while assessing the performance (min, avg and max of the response time for each message, for instance). It saves you from having only the response time of the HTTP request which, like you say, does not mean anything. We focus on the performance of application "messages".
Do not hesitate to share any thoughts.
Olivier Hanoun, Neotys Performance Engineer