Load Balancing: Why It Exists and When You Need It. (1/n)


Definition: Load balancing is the process of distributing incoming workload across multiple servers to improve performance, reliability, and scalability.

With the textbook definition out of the way, let’s dive deeper:

Let’s say you start with a single server running your backend service. There is no problem while you have a small number of users (1-1000), but once you have thousands of them, the server slows down and is overwhelmed by the number of requests it receives.
One possible solution is vertical scaling.

Vertical Scaling
Vertical scaling is a way to increase a system’s capacity by adding more resources to a single machine instead of adding more machines. It means adding more CPU cores, RAM or expanding storage. For example, upgrading a server from 4 CPU cores and 8 GB RAM to 16 CPU cores and 64 GB RAM.

But vertical scaling has a few disadvantages:

  • Hardware limits: there’s a maximum size you can scale to

  • Single point of failure: if the machine goes down, everything goes down

  • Often more expensive at higher levels

  • May require downtime during upgrades

Of these, the single point of failure is the main one we are trying to avoid by using horizontal scaling.

Horizontal Scaling
Horizontal scaling is a way to increase system capacity by adding more machines (nodes) rather than making a single machine more powerful. For example, increasing our server count from 1 to 3 instances that all serve the same service. It gives us the advantage of fault tolerance: if one machine fails, the others keep running.
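To put a rough number on the fault tolerance point, here is a small sketch of how the chance of "at least one server is up" grows with the instance count. It assumes every server has the same availability and that failures are independent, which real deployments only approximate. The 99% figure and the function name are made up for illustration.

```python
def combined_availability(single: float, instances: int) -> float:
    """Probability that at least one of `instances` servers is up.

    Assumption: identical servers with independent failures.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    # All servers are down with probability (1 - single) ** instances.
    return 1.0 - (1.0 - single) ** instances


# Hypothetical example: each server is up 99% of the time.
for n in (1, 2, 3):
    print(n, f"{combined_availability(0.99, n):.6f}")
```

Under these assumptions, going from 1 to 3 instances moves the figure from 99% to roughly 99.9999%. Correlated failures (same rack, same bad deploy) make the real gain smaller.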

Now we need to distribute requests among them, and that’s where load balancing comes into the picture. Go back and read the definition again; it should make sense now.

Let’s take an example. Assume you have 3 servers running a login service for your users. The role of the load balancer is to distribute all incoming traffic among these 3 servers, ideally evenly, so that no single server is overwhelmed. This also removes the backend single point of failure: if one server goes down, traffic can be redirected to the healthy instances (assuming the load balancer itself is highly available).
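The three-server login example can be sketched as a tiny in-memory simulation. This is only an illustration, not a real load balancer: the server names are hypothetical, round robin is just one possible distribution strategy, and health is toggled by hand instead of by real health checks.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Look at each server at most once per call so we never loop forever.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


# Hypothetical instance names for the login service.
lb = RoundRobinBalancer(["login-1", "login-2", "login-3"])
first_round = [lb.next_server() for _ in range(6)]
lb.mark_down("login-2")  # simulate one server failing
after_failure = [lb.next_server() for _ in range(4)]
print(first_round)
print(after_failure)
```

While all three servers are healthy, each one gets every third request. Once login-2 is marked down, requests alternate between the two remaining servers.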

Breaking the misconception:
A common misconception is that a load balancer’s job is to route traffic between different microservices. In reality, its core job is to distribute traffic among multiple instances of the same microservice. Routing requests to the correct microservice (auth, payments, orders, etc.) is typically the responsibility of an API Gateway, although some layer 7 load balancers can also route by path or host.
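The split between the two roles can be sketched as two separate decisions: first "which service?", then "which instance?". The path prefixes, service names, and instance names below are all hypothetical, and round robin stands in for whatever algorithm the balancer actually uses.

```python
from itertools import cycle

# Hypothetical service names and instance names, for illustration only.
INSTANCES = {
    "auth": ["auth-1", "auth-2"],
    "payments": ["payments-1", "payments-2", "payments-3"],
}

# One round-robin rotation per service: this is the load balancer's job.
rotations = {service: cycle(names) for service, names in INSTANCES.items()}

# Path prefix to service mapping: this is the API gateway's job.
ROUTES = {"/auth": "auth", "/payments": "payments"}


def pick_service(path: str) -> str:
    """Gateway decision: which microservice owns this path."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    raise LookupError(f"no route for {path}")


def handle(path: str) -> str:
    """Return the instance that would serve the request."""
    service = pick_service(path)     # gateway: which service
    return next(rotations[service])  # balancer: which instance


print(handle("/auth/login"))
print(handle("/auth/login"))
print(handle("/payments/charge"))
```

Two consecutive requests to the same path reach different auth instances, while a payments request never lands on an auth instance.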

Algorithms to implement load balancing

There are several core load balancing algorithms, such as round robin, weighted round robin, least connections, least response time, hash-based, and consistent hashing (the most important), which we can deep dive into in the upcoming blogs.
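As a small preview, two of these strategies can be sketched in a few lines. This is a simplified illustration with hypothetical server names and connection counts; consistent hashing is left for the deep dive.

```python
import hashlib


def least_connections(active: dict) -> str:
    """Pick the server with the fewest in-flight connections."""
    return min(active, key=active.get)


def hash_based(client_id: str, servers: list) -> str:
    """Map a client to a server using a stable hash of its identifier.

    hashlib is used instead of the built-in hash(), which is
    randomized per process for strings.
    """
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]  # hypothetical server names
print(least_connections({"s1": 12, "s2": 4, "s3": 9}))
print(hash_based("user-42", servers) == hash_based("user-42", servers))
```

The hash-based version always sends the same client to the same server as long as the server list does not change. With plain modulo hashing, adding or removing a server remaps most clients, which is the problem consistent hashing is designed to reduce.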

Flip side of the coin

A few problems can arise when you introduce a load balancer:

  • Added System Complexity

  • Single Point of Failure

  • Increased Latency

  • Session Management Problems

Load Balancer as SPOF (Single Point of Failure)

What if the load balancer itself dies? Solution: redundancy.

We can keep a copy of the load balancer on standby, ready to take over if the current one breaks down: one active load balancer paired with one or more standbys.
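The active/standby idea can be sketched as picking the first live node from a priority list. This is a toy model with hypothetical node names; real setups usually depend on heartbeats between nodes and some mechanism for moving traffic to the new active node, none of which is modelled here.

```python
class BalancerNode:
    """Stand-in for one load balancer instance (hypothetical)."""

    def __init__(self, name: str):
        self.name = name
        self.alive = True


def elect_active(nodes: list) -> BalancerNode:
    """Return the first live node in priority order.

    Index 0 is the preferred active node; the rest are standbys.
    """
    for node in nodes:
        if node.alive:
            return node
    raise RuntimeError("all load balancers are down")


nodes = [
    BalancerNode("lb-primary"),
    BalancerNode("lb-standby-1"),
    BalancerNode("lb-standby-2"),
]
print(elect_active(nodes).name)
nodes[0].alive = False  # simulate the active load balancer dying
print(elect_active(nodes).name)
```

While the primary is alive it handles everything. Once it is marked dead, the first standby takes over, and the second standby covers the case where both of the others are down.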

My thoughts on why it is worth it despite its shortfalls

Load balancing trades a small increase in complexity and latency for significantly higher availability, fault tolerance, and scalability. In most production systems, downtime is far more costly than the overhead introduced.

In short:

If availability matters → load balancing is mandatory.

If availability doesn’t matter → don’t over-engineer.

Thank you.

Upcoming

In the upcoming blogs, we will deep dive into:

  • Implementing a basic load balancer simulation

  • Getting in depth with the load balancing algorithms

  • Concept of Rate limiting