Sometimes we set out to load test an application and, somewhere along the way, forget the basic premise of why we started. At least I sometimes miss the point.
My Use Case:
I have an instance with a certain configuration (x cores and y GB of RAM) and I want to test how much load an application deployed on this instance can take.
Application A: A standalone application with no external API interactions, so given a request it should simply respond. To be specific, it's an ML app that generates recommendations.
Expectation:
How much load can the application take?
or
How much load can the application take and still serve within a reasonable time?
Here I refer to load as the requests per second (req/s) that the app can handle.
Assume you have a 1 core machine and the app is CPU intensive, such as some number crunching.
So if the app takes, say, 1 sec to serve a request, the throughput is just 1 req/s.
What if you have 2 cores? The app is in Python, and because of CPython's global interpreter lock (GIL), a single Python process runs Python code on only one core at a time, so the second core doesn't help by itself. You need to spawn one more process to utilize the other core. How you run multiple processes depends on the framework you are using.
With 2 processes on 2 cores, we are now serving 2 req/s.
Now let's say your app takes 2 seconds to serve a request and we have 2 cores.
Now your request rate is 1 req/s, as 2 requests take 2 secs overall to complete.
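The arithmetic above boils down to a simple upper bound: number of workers divided by the time one request takes. Here is a minimal sketch, assuming every request is purely CPU bound and there is exactly one worker process per core. The function name max_rps is made up for illustration.

```python
def max_rps(cores: int, service_time_s: float) -> float:
    """Upper bound on throughput, assuming each request keeps one core
    busy for service_time_s seconds and there is one worker per core."""
    return cores / service_time_s


print(max_rps(1, 1.0))  # 1 core, 1 s per request  -> 1.0
print(max_rps(2, 1.0))  # 2 cores, 1 s per request -> 2.0
print(max_rps(2, 2.0))  # 2 cores, 2 s per request -> 1.0
```

A real app will land somewhere below this number, since the OS, the web framework and memory pressure all take their share.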
But we are missing a point here.
What happens if we increase the number of requests? Assume we send 8 requests at once.
The overall time for all the requests to complete is 8 seconds. Only 2 requests can be computed at any moment because we only have 2 cores, so the rest spend time waiting for CPU. In the worst case, the last two of the 8 requests take 8 secs each.
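To double check the 8 request example, here is a small simulation. It is only a sketch under the same assumptions as the example (2 workers, 2 secs of pure CPU work per request, all requests arriving at time 0, served in order). The name burst_latencies is hypothetical.

```python
import heapq


def burst_latencies(n_requests, workers, service_time_s):
    """Latency of each request when all of them arrive at t=0 and
    each worker handles one request at a time, first come first served."""
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    latencies = []
    for _ in range(n_requests):
        start = heapq.heappop(free_at)  # earliest available worker
        finish = start + service_time_s
        heapq.heappush(free_at, finish)
        latencies.append(finish)  # arrival was at t=0, so latency == finish
    return latencies


print(burst_latencies(8, 2, 2.0))
# [2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0]
```

The first two requests see the plain 2 sec service time. Everything after that is mostly waiting.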
So just increasing the load on the application during a test will not help much: the system is already saturated, so pushing more load will not improve the request rate. Worse, latency starts to increase, as requests have to wait longer to get their share of CPU time.
That brings us to saturation. Load testing is also about understanding the saturation limit of the system, beyond which it will not scale and performance deteriorates.
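Saturation is easy to see in a toy model. The sketch below is not a real load test, just a deterministic simulation assuming 2 workers, 2 secs of CPU per request, and requests arriving at a constant rate for 60 secs. The function name and its parameters are my own for illustration.

```python
import heapq


def simulate(rate_rps, workers=2, service_time_s=2.0, duration_s=60.0):
    """Send requests at a constant rate for duration_s seconds and return
    (achieved throughput in req/s, worst latency in seconds)."""
    n_requests = int(rate_rps * duration_s)
    free_at = [0.0] * workers  # time at which each worker becomes free
    heapq.heapify(free_at)
    worst_latency = 0.0
    last_finish = 0.0
    for i in range(n_requests):
        arrival = i / rate_rps
        start = max(arrival, heapq.heappop(free_at))  # wait if all workers are busy
        finish = start + service_time_s
        heapq.heappush(free_at, finish)
        worst_latency = max(worst_latency, finish - arrival)
        last_finish = max(last_finish, finish)
    return n_requests / last_finish, worst_latency


for rate in (0.5, 1.0, 2.0, 4.0):
    throughput, worst = simulate(rate)
    print(f"offered {rate} req/s: served {throughput:.2f} req/s, worst latency {worst:.1f} s")
```

In this model the served rate follows the offered rate up to about 1 req/s and then flattens out, while the worst latency keeps growing with every extra request that is pushed in.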
Req/s and latency are the 2 important metrics to keep in mind. Req/s on its own is not very useful. "Can your app serve x req/s within y time?" is a much more meaningful question.
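One way to keep both metrics together is to always report them as a pair. A rough sketch, assuming you have already collected per-request latencies from a test run. The 95th percentile and the 3 sec budget are arbitrary choices for the example, and the helper names are made up.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_s, wall_time_s, latency_budget_s):
    """Report throughput together with tail latency for one test run."""
    p95 = percentile(latencies_s, 95)
    return {
        "rps": len(latencies_s) / wall_time_s,
        "p95_s": p95,
        "within_budget": p95 <= latency_budget_s,
    }


# Latencies from the 8 request burst on 2 cores, which finished in 8 secs.
print(summarize([2.0, 2.0, 4.0, 4.0, 6.0, 6.0, 8.0, 8.0], wall_time_s=8.0, latency_budget_s=3.0))
```

For that burst the app still "does 1 req/s", but the tail latency is far over the budget, which is exactly what req/s alone would hide.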