Wayfair Tech Blog

Prescalr - Automated Scaling for Kubernetes Applications

Kubernetes is a container orchestration tool that helps businesses run their application services smoothly, adjusting resources as needed to handle different levels of activity. However, Kubernetes can take time to notice a sudden jump in traffic or users.

Prescalr steps in to respond quickly to these busy moments. It automatically adds capacity to your applications when traffic or user numbers surge, so your services stay fast and reliable and customers get a seamless experience.

Birth of Prescalr

In Kubernetes, applications sometimes need to adjust how many resources they use based on how many people are using them. This matters most during events like datacenter switchovers or big sales days, when a lot of people might be using an app at once.

Normally, Kubernetes uses the Horizontal Pod Autoscaler (HPA) to decide when to add resources to an application or microservice. Because the HPA reacts to metrics only after load has already increased, waiting for it during big events like WayDay or Cyber5 can cause delays. Those delays can slow the application down and degrade the customer experience, which can impact sales.

Prescalr addresses this by automatically adding resources to an application as soon as they are needed. Applications handle busy periods better, need less human intervention to keep running smoothly, and react faster to changes, even when many people use them at once.

Details

With Prescalr, you opt in to automatic scaling by adding a label to your application's Kubernetes configuration. Prescalr handles everything else.

When Prescalr receives a secure request through its API, it finds every workload in your application that carries the label. It then checks how many replicas (copies) of each workload are running, including workloads whose replica count is already managed by an HPA.
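To make the discovery step concrete, here is a minimal Python sketch that works on plain dictionaries shaped like Kubernetes Deployment and HPA objects instead of calling the Kubernetes API. The label key `prescalr.example.com/enabled`, the `WorkloadState` type, and the `discover` function are hypothetical; the post does not publish Prescalr's actual label or internals. It only shows how labeled workloads and their current replica and HPA bounds could be collected.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical opt-in label; the post does not publish Prescalr's real label key.
OPT_IN_LABEL = "prescalr.example.com/enabled"


@dataclass
class WorkloadState:
    name: str
    replicas: int
    hpa_min: Optional[int] = None  # None when no HPA targets the workload
    hpa_max: Optional[int] = None


def discover(deployments: List[dict], hpas: List[dict]) -> List[WorkloadState]:
    """Return the current state of every Deployment that carries the opt-in label."""
    # Index HPAs by the name of the workload they target (namespaces ignored for brevity).
    hpa_by_target = {h["spec"]["scaleTargetRef"]["name"]: h for h in hpas}
    found = []
    for d in deployments:
        labels = d["metadata"].get("labels", {})
        if labels.get(OPT_IN_LABEL) != "true":
            continue  # not opted in, leave it alone
        name = d["metadata"]["name"]
        hpa = hpa_by_target.get(name)
        found.append(
            WorkloadState(
                name=name,
                replicas=d["spec"]["replicas"],
                # minReplicas is optional in the HPA spec and defaults to 1.
                hpa_min=hpa["spec"].get("minReplicas", 1) if hpa else None,
                hpa_max=hpa["spec"]["maxReplicas"] if hpa else None,
            )
        )
    return found


deployments = [
    {"metadata": {"name": "storefront", "labels": {OPT_IN_LABEL: "true"}}, "spec": {"replicas": 4}},
    {"metadata": {"name": "admin-ui", "labels": {}}, "spec": {"replicas": 2}},
]
hpas = [
    {"spec": {"scaleTargetRef": {"kind": "Deployment", "name": "storefront"},
              "minReplicas": 4, "maxReplicas": 20}},
]
print(discover(deployments, hpas))
# [WorkloadState(name='storefront', replicas=4, hpa_min=4, hpa_max=20)]
```

Recording the HPA bounds alongside the raw replica count matters because, for HPA-managed workloads, the HPA rather than the Deployment decides how many replicas actually run.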

Prescalr then adjusts the number of replicas for each workload, within the minimum and maximum you have configured. Before making any changes, Prescalr records how many replicas were already running, and when it receives a second secure request, it restores that original state.
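The snapshot-and-restore behavior can be sketched as follows. This is an illustrative model, not Prescalr's code: `WorkloadState`, `prescale`, `Prescaler`, and the `target`/`floor`/`ceiling` parameters are hypothetical names. One assumption worth calling out is that, for a workload with an HPA, the sketch raises the HPA's `minReplicas` instead of editing the replica count directly, since the HPA would otherwise overwrite a manual change on its next reconciliation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class WorkloadState:
    name: str
    replicas: int
    hpa_min: Optional[int] = None  # set only when an HPA manages the workload
    hpa_max: Optional[int] = None


def prescale(current: WorkloadState, target: int, floor: int, ceiling: int) -> WorkloadState:
    """Return the desired state for one workload, clamped to the configured bounds."""
    desired = max(floor, min(target, ceiling))
    if current.hpa_min is not None:
        # With an HPA in charge, raise its minReplicas rather than editing replicas:
        # the HPA would otherwise overwrite a manual replica change.
        # Keep maxReplicas >= minReplicas so the HPA spec stays valid.
        hpa_max = max(current.hpa_max if current.hpa_max is not None else desired, desired)
        return replace(current, hpa_min=desired, hpa_max=hpa_max)
    return replace(current, replicas=desired)


class Prescaler:
    """Snapshots each workload before scaling so a later request can restore it."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, WorkloadState] = {}

    def scale_up(self, current: WorkloadState, target: int, floor: int, ceiling: int) -> WorkloadState:
        # Snapshot only once, so a repeated scale-up request cannot overwrite
        # the true original state with an already-scaled one.
        self._snapshots.setdefault(current.name, current)
        return prescale(current, target, floor, ceiling)

    def restore(self, name: str) -> WorkloadState:
        # Hand back the original state and forget the snapshot.
        return self._snapshots.pop(name)


p = Prescaler()
before = WorkloadState("storefront", replicas=4, hpa_min=4, hpa_max=20)
after = p.scale_up(before, target=15, floor=10, ceiling=25)
print(after)  # WorkloadState(name='storefront', replicas=4, hpa_min=15, hpa_max=20)
print(p.restore("storefront") == before)  # True
```

Snapshotting only on the first request is a deliberate choice in this sketch: if the scale-up call is retried, the restore still returns the pre-event state rather than an intermediate one.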

If traffic quiets down and you want to save money by running fewer copies of your application, Prescalr can do that too: it scales the workloads back down and returns everything to how it was before.

What Problems did Prescalr solve for Wayfair?

Prescalr offers two specialized APIs to optimize how applications handle varying workloads:

Switchover API

Prescalr is designed to minimize delays when traffic moves between busy and less busy datacenters. It prepares applications and network access point (ingress) components in advance at the destination datacenter, so they can handle the increased traffic immediately.
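The ordering this implies (scale up the destination first, move traffic second) can be sketched in a few lines. This is a hypothetical outline, not the Switchover API itself: the `scale_up`, `is_ready`, and `shift_traffic` hooks, the workload names, and the decision to gate the traffic shift on readiness with a timeout are all assumptions made for illustration.

```python
import time
from typing import Callable, List


def switchover(
    scale_up: Callable[[str], None],
    is_ready: Callable[[str], bool],
    shift_traffic: Callable[[], None],
    workloads: List[str],
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> None:
    """Pre-scale every workload (apps and ingress) in the destination, then shift traffic."""
    for w in workloads:
        scale_up(w)  # hypothetical hook that pre-scales one destination workload
    deadline = time.monotonic() + timeout_s
    pending = set(workloads)
    while pending:
        # Drop workloads that report ready; keep polling the rest.
        pending = {w for w in pending if not is_ready(w)}
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"not ready before switchover: {sorted(pending)}")
        time.sleep(poll_s)
    shift_traffic()  # only reached once every workload reports ready


# In-memory fakes standing in for real cluster and traffic-routing calls.
scaled, events = set(), []
switchover(
    scale_up=scaled.add,
    is_ready=lambda w: w in scaled,
    shift_traffic=lambda: events.append("traffic shifted"),
    workloads=["storefront", "ingress-nginx"],
    poll_s=0.01,
)
print(events)  # ['traffic shifted']
```

Failing loudly on a timeout, rather than shifting traffic anyway, reflects the goal described above: the destination should already have capacity when the surge arrives.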

This proactive approach prevents service disruptions and performance issues during peak periods, safeguarding revenue. It has been especially helpful during major sale events: when traffic moved from a busy datacenter to a less busy one, Prescalr ensured that all critical applications and ingress were already scaled up and ready to absorb the surge immediately, without slowing down or crashing.

(i) Highly elevated latency during a cold storefront switchover, before the Prescalr changes

Chart (ii) below shows that ingress latency stays within SLOs and remains consistent during a storefront switchover after the Prescalr changes for ingress.

(ii) Latency within SLO after the Prescalr changes for ingress.
Workflow of the Prescalr Switchover API

Batch API

This API focuses on optimizing resources for batch processing tasks, such as sending out large volumes of customer emails. Traditional scaling methods may keep resources at maximum levels even when they're not fully utilized, leading to unnecessary costs.

When a batch application needs to send out a large number of emails, Prescalr adjusts resources dynamically to handle the workload efficiently. It scales up resources during peak email sending times and scales them back down afterward to save costs.
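A simple way to picture this is a scaling window around the job: peak capacity from shortly before the send starts until it finishes, and a small idle footprint otherwise. The sketch below assumes a time-based window with a warm-up margin; the `BatchWindow` type, the `desired_replicas` function, the 10-minute `lead`, and all replica counts are hypothetical, and the post does not say whether the Batch API is driven by schedules or by explicit calls from the job.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class BatchWindow:
    start: datetime          # when the email send is scheduled to begin
    end: datetime            # when the send is expected to finish
    peak_replicas: int       # capacity needed while the job runs
    idle_replicas: int       # capacity kept outside the window
    lead: timedelta = timedelta(minutes=10)  # hypothetical warm-up margin


def desired_replicas(window: BatchWindow, now: datetime) -> int:
    """Peak capacity from shortly before the job starts until it ends; idle otherwise."""
    if window.start - window.lead <= now < window.end:
        return window.peak_replicas
    return window.idle_replicas


w = BatchWindow(
    start=datetime(2024, 1, 15, 6, 0),
    end=datetime(2024, 1, 15, 7, 30),
    peak_replicas=40,
    idle_replicas=2,
)
for hour, minute in [(5, 0), (5, 55), (7, 0), (8, 0)]:
    print(f"{hour:02d}:{minute:02d}", desired_replicas(w, datetime(2024, 1, 15, hour, minute)))
# 05:00 2
# 05:55 40
# 07:00 40
# 08:00 2
```

The cost argument follows directly: outside the window the workload runs at `idle_replicas` instead of being held at peak capacity all day.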

By using Prescalr's Batch API, Wayfair reduced resource usage significantly, saving around $2 million USD per year in idle resources.

The graph below shows how Prescalr adjusts the resources these applications need, i.e., scaling up only when the emails are about to be sent and scaling back down after the job completes.

Resources (CPU) controlled by Prescalr Batch API 
Workflow of the Prescalr Batch API

What do our users say?

"Prescalr has been an excellent tool in allowing us to save over 50% Kubernetes costs for high traffic batch applications where HPA does not scale well, it also offers flexibility in providing specific scaling thresholds to different applications in the tiered architecture to optimize savings even further" - Kalyan Bemalkhedkar, Senior Engineering Manager, Platform & Services.

“Prescalr is indispensable for ensuring our Retail Pricing Service performs reliably across different datacenters. It effortlessly handles traffic surges during failovers without needing code updates or infrastructure changes. This allows us to focus on resolving issues quickly and getting our website back online.” - Peter Minassian, Staff Engineer, Pricing Team.
