Under the Hood

Revving up Shopmonkey v1 with horizontal scaling

Written by Andrew Kunzel | May 30, 2023 5:04:22 PM

Hey there! Today, I want to discuss a technique we've been rolling out to get things running more smoothly in Shopmonkey v1. It's called horizontal scaling, and it's like being able to add bays and techs on demand when things get busy. Horizontal scaling doesn't solve every responsiveness problem in Shopmonkey, but it fixed some of the biggest ones we saw, and we're still working on the issue affecting about 1% of requests. So, let's dive in and talk about how we tackled those annoying slowdowns you have faced.

Alright, let's start with the basics. Our system is made up of lots of tiny services that work together to power Shopmonkey. Today, we'll focus on one special service called the API. This critical piece of our system handles all the requests and keeps things moving. But when it runs short of resources, our whole application starts to crawl. Not good, right?

Now, these services can be constrained by a few things: compute power, memory, and disk space. Our API uses very little memory or disk, but it runs in a single process and can only use one CPU, so we say it's "CPU bound." That means scaling vertically (running it on a bigger computer) only gets us so far before we hit a wall. That's where horizontal scaling comes to the rescue: rather than giving the API a bigger server to run on, we just run more copies of it simultaneously!
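To see why a bigger server doesn't help a single-process service, here's a small sketch. The throughput numbers are made up for illustration (they are not Shopmonkey's real figures); the point is that one process caps out at one core, while replicas multiply capacity.

```python
# Hypothetical figure for illustration -- not a real Shopmonkey number.
CORE_CAPACITY_RPS = 200  # requests/sec one CPU core can serve


def vertical_capacity(cores_per_server: int) -> int:
    """A single-process, CPU-bound service uses at most one core,
    no matter how many cores the server has."""
    usable_cores = min(cores_per_server, 1)
    return usable_cores * CORE_CAPACITY_RPS


def horizontal_capacity(replicas: int) -> int:
    """Each replica is its own process, so each can use its own core."""
    return replicas * CORE_CAPACITY_RPS


print(vertical_capacity(8))    # 200 -- seven of the eight cores sit idle
print(horizontal_capacity(8))  # 1600 -- every core does useful work
```

Same eight cores either way; only the horizontal layout actually uses them all.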

So, let's talk about those slowdowns we faced. We found a couple of issues causing trouble, and some minor tweaks unlocked a world of improvement. First up, we looked at the API's CPU resources. Initially, our request was set to 1 CPU, with a higher limit. But guess what? Our API can only use one CPU, so those extra resources were twiddling their thumbs. Not very helpful, right? We decided to scale things back a bit.

We lowered the CPU request from 1 to 0.75, so the system starts taking action before the API reaches its maximum capacity. It also means we add more copies while everything is still running smoothly. For example, instead of waiting until we hit 75% utilization of a full CPU, we now scale when we hit 75% of 0.75 CPU (around 56% of a core). These smaller services can scale up quicker, resulting in better performance for you and your shop.
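The arithmetic behind that "around 56%" is just the autoscaling target multiplied by the new, smaller request:

```python
OLD_CPU_REQUEST = 1.0     # cores requested per copy, before the change
NEW_CPU_REQUEST = 0.75    # cores requested per copy, after the change
TARGET_UTILIZATION = 0.75  # scale when usage hits 75% of the request

# Absolute CPU usage (in cores) at which scaling kicks in:
old_trigger = OLD_CPU_REQUEST * TARGET_UTILIZATION  # 0.75 of a core
new_trigger = NEW_CPU_REQUEST * TARGET_UTILIZATION  # 0.5625 of a core

print(f"{old_trigger:.0%} -> {new_trigger:.2%}")  # 75% -> 56.25%
```

Same 75% target either way; shrinking the request is what pulls the real-world trigger point down to roughly 56% of a core.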

Here's an example. Imagine we have three copies of our program running at the same time. One is maxed out at 100% CPU usage, while the other two are just chugging along with low utilization.

The average utilization (50%) is below the old scaling threshold (70%), so nothing scales, and about a third of the time your request lands on the maxed-out copy and you get an unresponsive application. Talk about frustrating! By making these changes, we can scale more aggressively and avoid this situation. We may be using smaller pods, but we're running more of them and adding resources before real issues arise. Under that same load, our smaller pods with more aggressive scaling policies would have started adding copies ages ago, when the average utilization crossed 56%, long before you ever saw an unresponsive application.
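We run on Kubernetes-style autoscaling, which (in simplified form) sets the desired copy count to ceil(current copies × average utilization ÷ target). Here's a sketch of how the same load can leave the old configuration flat while the new one adds a pod. The per-pod usage numbers are hypothetical, and this ignores real-world details like stabilization windows.

```python
import math


def desired_replicas(usages, cpu_request, target_utilization):
    """Simplified Kubernetes HPA rule:
    desired = ceil(current_replicas * avg_utilization / target)."""
    avg_cores = sum(usages) / len(usages)
    avg_utilization = avg_cores / cpu_request  # usage relative to the request
    return math.ceil(len(usages) * avg_utilization / target_utilization)


# Hypothetical load: three pods using 1.0, 0.4, and 0.4 cores (avg 0.6).
usage = [1.0, 0.4, 0.4]

# Old setup: request 1 CPU, scale at 70% -- 60% average, so no scale-up.
print(desired_replicas(usage, cpu_request=1.0, target_utilization=0.70))   # 3

# New setup: request 0.75 CPU, scale at 75% -- now 80% average, adds a pod.
print(desired_replicas(usage, cpu_request=0.75, target_utilization=0.75))  # 4
```

The absolute load is identical in both calls; shrinking the request makes the same 0.6 cores of average usage read as 80% instead of 60%, so the autoscaler reacts while one hot pod is still the only problem.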

So, what does all this mean for you and your shop? Well, it means we're continuing to improve V1 wherever we can to boost application performance and your experience. We're constantly working behind the scenes to fine-tune our system so your work doesn't get interrupted by slowdowns or hiccups. Our goal is to keep your shop running smoothly and efficiently, just like a well-oiled machine.