I am wondering whether our current model of web load testing is broken, for several reasons, one of them possibly fundamental.
I was at my London Web Performance Meetup the other night at Betfair and Andrew Harding, a perf QA engineer there, gave a talk on “Continuous Integration - A Performance Engineer’s Tale” in which he talked about how they were tackling this challenge at Betfair.
He said something that resonated with something that I have been thinking about for a while now, to wit:
“We’ve had to look at separating load injection from performance measurement”.
Now, Betfair did this for various reasons, mainly to do with the need to keep the perf testing environment “warm”. It takes too long to warm up the environment, load the caches, overcome TCP “slow start” etc. and achieve a “stable” system, compared to the time they might have between check-ins/builds. So it’s easier to keep a constant load flowing through the environment, keeping as much of the system as possible “warmed up”, until you deploy your incremental changes in the next build and do your testing.
Because they use “traditional” load testing tools like LoadRunner, which are designed to “start script – inject load – measure response – stop script – generate report”, this causes a problem. They aren’t “starting and stopping” all the time, hence they don’t get the nice reports out at the end. [I might be over-simplifying this, I am not a LoadRunner expert, but bear with me for now!]
So they “measure performance” using other tools, e.g. real-user monitoring (RUM) or application performance management (APM) tools, outside of the load-injection toolset.
Now, I see this as a real problem for vendors like HP, who make LoadRunner. LoadRunner has lots of awesome features, but if you are just using it to “generate load” and not using its measurement and reporting features then there are far cheaper (and open-source) alternatives. The rash of “cloud-based” load testing services springing up off the back of Amazon EC2 is also increasing the downward pressure on the costs of the “traditional” vendor solutions.
The “cloud testing” vendors raise another issue with the measurement of performance “from the point of injection”.
Measurement requires, as much as possible, a stable platform from which to measure, which in its current form the cloud most certainly is not. Study after study after study shows that performance in the cloud is variable, and even more so under load. Some people have even given up on the cloud for that reason and moved back to dedicated servers (see http://code.mixpanel.com/2011/10/27/why-we-moved-off-the-cloud/ and the comments below it, which are well worth reading as a lot of people both agree and disagree).
So if you want timing resolution to within (at least) 10 ms to measure variation in time-to-first-byte, then doing it using a varying “measuring stick” isn’t going to work. That is another argument for separating load injection from performance measurement.
But this isn’t the fundamental reason I am concerned that things might be “broken” for the current load-testing model.
Currently load-testing is [mostly] based on a classic HTTP request/response paradigm: “start a timer, make an HTTP request, stop the timer, record the duration, repeat as necessary, report the outcome”.
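To make that loop concrete, here is a minimal Python sketch of it. The names `timed_requests` and `report`, and the idea of passing in a `fetch` function, are my own for illustration and are not how LoadRunner or any other tool actually does it. By default it fetches the URL with the standard library, one request at a time.

```python
import statistics
import time
import urllib.request


def timed_requests(url, count, fetch=None):
    """Time `count` sequential requests and return the durations in seconds."""
    if fetch is None:
        # Default: a plain blocking GET that reads the whole response body.
        def fetch(u):
            with urllib.request.urlopen(u) as resp:
                resp.read()

    durations = []
    for _ in range(count):
        start = time.perf_counter()   # start a timer
        fetch(url)                    # make an HTTP request
        durations.append(time.perf_counter() - start)  # stop, record
    return durations


def report(durations):
    """Summarise the recorded durations (the 'report the outcome' step)."""
    return {
        "count": len(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }
```

Note that everything here depends on there being one request and one response to wrap the timer around, and that the timer runs on the same machine that generates the load.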
OK, so at @LDNWebPerf we’ve looked at HTML5 apps and at WebSockets and, guess what, the web apps of the future can (and do) break this simplistic “classic HTTP request/response paradigm”.
HTML5 apps might make extensive use of localStorage to eliminate roundtrips, and WebSockets change everything into ONE HTTP request followed by a lot of BI-DIRECTIONAL communication over the WebSocket connection. The same issue exists with some of the other “push” techniques that might use chunked encoding to send data asynchronously down a “long-lived” request/response. SPDY might cause problems too, because of the way it multiplexes requests and responses!
So how are you going to measure your “performance under load” when there isn’t a nice “Request/Response” to “start/stop” your timer?
Well, the answer is… I don’t know, yet.
There are two immediately “obvious” solutions – go “up the stack” or “down the stack”.
By “up the stack” I mean that the load testing tool must be much more “web application aware”: it must be inside the browser, seeing exactly how the web application is sending/receiving information, especially over WebSockets, and be “instrumented” to measure it.
It might also require it to be “framework-aware” so that it understands frameworks like Dojo/Comet/jQuery/whatever, “knows” the methods they use to send/receive information, and knows how to inject instrumentation into those frameworks to measure them.
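As a rough illustration of what “up the stack” instrumentation could record, here is a small Python sketch. It rests on a big assumption that is mine, not something from the talk: that every application message carries an id which the reply echoes back, so a sent message can be paired with its answer. The `MessageTimer` class and its method names are hypothetical.

```python
import time


class MessageTimer:
    """Pair outgoing and incoming messages by id and record the latency."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable so the timer can be tested
        self._pending = {}       # msg_id -> time the message was sent
        self.latencies = []      # completed round trips, in seconds

    def sent(self, msg_id):
        """Call from the hook that wraps the application's send."""
        self._pending[msg_id] = self._clock()

    def received(self, msg_id):
        """Call from the hook that wraps the application's receive."""
        start = self._pending.pop(msg_id, None)
        if start is None:
            # A server push with no matching request: nothing to time.
            return None
        latency = self._clock() - start
        self.latencies.append(latency)
        return latency
```

The awkward case is visible in `received`: a pure server push has no “start” to measure from, which is exactly where the request/response timer stops making sense.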
By “down the stack” I mean back to network sniffing and measurement “on the wire” to see exactly how long things take, but again probably with the requirement to be more “application-aware” so it can re-assemble the network packets and translate them into HTTP, WebSocket and application-level performance timing data that’s more meaningful to the performance engineer and the application developer.
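To give a feel for the “application-aware” part of sniffing, here is a Python sketch that decodes a single WebSocket frame following the framing rules in RFC 6455. It assumes the bytes have already been re-assembled from the TCP stream, that the connection is unencrypted (ws://, not wss://) and that no compression extension is in use. The function name `parse_ws_frame` is mine.

```python
import struct


def parse_ws_frame(data):
    """Parse one WebSocket frame from the start of `data`.

    Returns (fin, opcode, payload, bytes_consumed).
    """
    if len(data) < 2:
        raise ValueError("need at least 2 bytes for a frame header")
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)        # final fragment of a message?
    opcode = b0 & 0x0F           # 1 = text, 2 = binary, 8 = close, ...
    masked = bool(b1 & 0x80)     # client-to-server frames are masked
    length = b1 & 0x7F
    offset = 2
    if length == 126:            # 16-bit extended payload length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:          # 64-bit extended payload length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    if masked:
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if len(payload) < length:
        raise ValueError("incomplete frame")
    if masked:
        # Unmask by XOR-ing each byte with the 4-byte masking key.
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return fin, opcode, payload, offset + length
```

A sniffer would timestamp each frame as it is captured and then still need to understand the application’s own messages inside the payload before it could say how long anything “took”.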
Anyway, I am not sure that I have explained this as articulately as I would have liked but my basic message is that:
- If we can’t measure performance from the “point of injection”, for whatever reason, then we will need to invest in other tools that can measure performance from different locations (e.g. RUM, APM etc.)
- Hence, if all you are doing is generating load, then you’ll rapidly become a commodity item and be paid accordingly…
- New web technologies like WebSockets can change the HTTP Request/Response paradigm, making the job of performance measurement more difficult, so the tools will need to evolve and perhaps become more “application-aware”.
I’d love to hear what everyone else thinks so please comment away!
I will be presenting on this topic at UKCMG on Wednesday 10th October - http://www.ukcmg.org.uk/ifOct2012.html