Validation of Scalability
Introduction
This page brings together the work I have done across the many iterations it took to complete this project. I tie the theory back to the configuration and implementation to give a coherent account, from A to Z, of the relevant details of my project. Read the documentation on the other pages first to have the full context.
The load-balance testing takes place on my server, whose setup is very similar to a cloud deployment, as you can see in the graph above. A call to api.oksolution.nl is the application-layer communication that takes place there.
Theory
Why Scalability?
The user base of the internet has grown exponentially, so we must build applications that are ready for large-scale usage. Read more about my conclusions on Monoliths in the First Microservice module within the Scalability scope. Building scalable applications has many advantages; I believe the cost of computing when hosting in the cloud is one of the most commonly cited reasons. An application built for a wide audience will likely only function properly when it runs on the platform of a large cloud provider. The preceding pages cover this subject in more detail.
What is load-balance testing?
To validate and prove that my project scales correctly, a load-balance test is required. In this test I use a non-trivial microservice (see previous page) to test the efficacy of the system under load and to see whether it scales horizontally while still behaving as expected.
Load balance testing
I initially chose Locust as my framework for setting up load-balance tests, but I had difficulties setting up the locustfile.py. I then looked for alternatives and found Apache JMeter.
After starting JMeter I created a thread group and configured it to use 25 threads. JMeter explains that each thread is meant to simulate a user.
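The thread-group idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how JMeter is implemented: the function name `run_simulated_users`, the `iterations` parameter, and the stand-in action are all hypothetical, and only the count of 25 users comes from the setup above.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import threading


def run_simulated_users(action, users=25, iterations=10):
    """Run one task per simulated user; each task calls `action` `iterations` times.

    Tasks run on a pool of up to `users` worker threads. Returns a Counter
    of the values returned by `action`.
    """
    results = Counter()
    lock = threading.Lock()

    def user_loop(user_id):
        for _ in range(iterations):
            outcome = action(user_id)
            with lock:  # protect the shared counter
                results[outcome] += 1

    with ThreadPoolExecutor(max_workers=users) as pool:
        # list() waits for every task and re-raises any exception from a worker
        list(pool.map(user_loop, range(users)))
    return results


if __name__ == "__main__":
    # Stand-in action (hypothetical); a real test would send an HTTP request here.
    print(run_simulated_users(lambda user_id: "ok"))
```

With the stand-in action, 25 users times 10 iterations gives 250 recorded outcomes.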
I added a number of HTTP requests to test with, but let's focus on the relevant ones: a GET request on /game/2/comment and a POST request on /comment.
The GET request fetches the comments of the second game. The POST request creates a comment with the following body:
{
  "gameId": 2,
  "commentMessage": "test"
}
Both requests require a bearer token that is generated by the front end. For that reason I added an HTTP Header Manager to each request, in which I defined the Authorization header and, for the POST request, a Content-Type header (application/json).
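For readers who want to reproduce the two requests outside JMeter, here is a minimal sketch using only the Python standard library. It builds the requests without sending them. The `https://` scheme, the function name `build_requests`, and the token placeholder are assumptions; the paths, body, and headers are taken from the description above.

```python
import json
import urllib.request

BASE_URL = "https://api.oksolution.nl"  # assumption: the API is served over HTTPS


def build_requests(token, base_url=BASE_URL):
    """Build (without sending) the GET and POST requests used in the test."""
    auth = {"Authorization": f"Bearer {token}"}

    # GET: fetch the comments of game 2
    get_comments = urllib.request.Request(
        f"{base_url}/game/2/comment", headers=auth, method="GET"
    )

    # POST: create a comment, with a JSON body and matching Content-Type
    body = json.dumps({"gameId": 2, "commentMessage": "test"}).encode("utf-8")
    post_comment = urllib.request.Request(
        f"{base_url}/comment",
        data=body,
        headers={**auth, "Content-Type": "application/json"},
        method="POST",
    )
    return get_comments, post_comment


if __name__ == "__main__":
    # Placeholder token (hypothetical); the real one comes from the front end.
    for req in build_requests("<token-from-front-end>"):
        print(req.get_method(), req.full_url)
```

Passing either object to `urllib.request.urlopen` would actually send it, which is left out here on purpose.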
Then I headed into Rancher, where I could manually fine-tune the scalability. To do this I needed exact metrics, which I got from Grafana, which I had integrated into Rancher. Using this plugin I could track the CPU usage of individual pods. I scaled up my main service to see its usage:
I began basing my settings on those values, but then found out that the usage was much higher than normal because the Docker container was still being initialized.
Usage like this is more typical when idle:
I fine-tuned my service to this:
resources:
  limits:
    cpu: 80m
  requests:
    cpu: 30m
According to my calculations, this means the limit is 0.08 CPU and the request, the estimated minimum, is 0.03 CPU. I adjusted my load balancer so that it would scale up when usage hit the threshold of 40m CPU.
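The conversion from millicores to CPU can be checked with a short sketch. The helper name `parse_cpu` is hypothetical and only handles the two forms used here (a plain number, or a number with the `m` suffix); it is not a full parser for Kubernetes quantities.

```python
def parse_cpu(quantity):
    """Convert a CPU quantity such as "80m" or "0.5" to a number of CPUs."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        # "m" means millicores: 1000m equals 1 CPU
        return int(quantity[:-1]) / 1000
    return float(quantity)


limit = parse_cpu("80m")
request = parse_cpu("30m")
threshold = parse_cpu("40m")

print(limit, request, threshold)  # 0.08 0.03 0.04
print(f"threshold = {threshold / request:.0%} of the request, "
      f"{threshold / limit:.0%} of the limit")
# threshold = 133% of the request, 50% of the limit
```

So the 40m threshold sits a third above the requested amount and at half of the limit, which leaves headroom for a new pod to start before the existing one is throttled.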
Let's circle back for a second to the nature of the test.
Since I can insert the token straight into JMeter, the login part is skipped. GameService and CommentService are both used, but the comment service more heavily, since the user is requesting all the comments belonging to a game.
Goal: CommentMicroService scales up, GameService doesn’t have to
Bonus goal: The throughput increases as the service scales up
Result:
Start scenario:
After 40 seconds a pod is added. CPU usage on both containers has shot up.
5 minutes have elapsed: 3/5 containers have finished deploying and 2 more are being added. The throughput is 173,768/minute.
I restarted the test because my internet connection was becoming extremely laggy. After roughly 6 minutes and 50 seconds the pods are running at their full potential. The throughput has increased to 223,044/minute. After I stopped the test, the comment service returned to 1/1 pods.
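To put the bonus goal in numbers, the relative increase between the two throughput figures reported above can be computed as follows. This is a small sketch; the function name `relative_increase` is hypothetical. The result does not depend on whether the comma in the reported figures is read as a thousands or a decimal separator, because only the ratio matters.

```python
def relative_increase(before, after):
    """Return the fractional change going from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before


throughput_5_min = 173_768      # per minute, while pods were still being added
throughput_scaled = 223_044     # per minute, with all pods running

print(f"{relative_increase(throughput_5_min, throughput_scaled):.1%}")  # 28.4%
```

Note that the second figure comes from a restarted test run, so the two measurements are not from one continuous run.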
I’ve verified through the MySQL interface that the comments were created correctly.