Load Balancing: Why It Exists and When You Need It. (1/n)

Definition: Load balancing is the process of distributing incoming workload across multiple servers to improve performance, reliability, and scalability.

With the textbook definition out of the way, let’s dive deeper:

Let’s say you initially have only one server running your backend service. There is no problem while you have a small number of users (roughly 1-1000), but once you have many thousands of them, the server slows down and becomes overwhelmed by the number of requests it receives.
One possible solution is vertical scaling.

Vertical Scaling

Vertical scaling is a way to increase a system’s capacity by adding more resources to a single machine instead of adding more machines. It means adding more CPU cores, RAM or expanding storage. For example, upgrading a server from 4 CPU cores and 8 GB RAM to 16 CPU cores and 64 GB RAM.

But vertical scaling has a few disadvantages:

Hardware limits: there’s a maximum size you can scale to
Single point of failure: if the machine goes down, everything goes down
Often more expensive at higher levels
May require downtime during upgrades

Of these, the single point of failure is the main one we try to avoid by using horizontal scaling.

Horizontal Scaling

Horizontal scaling is a way to increase system capacity by adding more machines (nodes) rather than making a single machine more powerful. For example, increasing our server count from 1 to 3 instances that all serve the same service. It gives us the advantage of fault tolerance: if one machine fails, the others keep running.

Now we need to distribute requests among them, and that’s where load balancing comes into the picture. Go back and reread the definition; it should make more sense now.

Let’s take an example. Assume you have 3 servers running a login service for your users. The load balancer’s role is to distribute incoming traffic across these 3 servers as evenly as possible, so that no single server is overwhelmed. This also removes the backend single point of failure: if one server goes down, traffic can be redirected to the healthy instances (assuming the load balancer itself is highly available).
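As a rough illustration of the login example, here is a minimal sketch using round robin, one of the algorithms this post lists. The class name, the server names (login-1 to login-3), and the mark_down helper are hypothetical and only there for the demo. A real load balancer would detect failures through health checks instead of a manual call.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server to try next

    def mark_down(self, server):
        # Stand-in for a failed health check (assumption for this sketch).
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server in rotation, or raise if none is left."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["login-1", "login-2", "login-3"])
first_six = [balancer.pick() for _ in range(6)]
# login-1, login-2, login-3, login-1, login-2, login-3

balancer.mark_down("login-2")
after_failure = [balancer.pick() for _ in range(4)]
# login-1, login-3, login-1, login-3
```

While all three servers are healthy, each gets every third request. Once login-2 is marked down, the rotation simply skips it and the remaining two servers share the traffic.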

Breaking the misconception:

A common misconception is that a load balancer’s job is to route traffic between different microservices. In its core role, a load balancer distributes traffic among multiple instances of the same microservice. Routing requests to the correct microservice (auth, payments, orders, etc.) is typically the responsibility of an API Gateway, although many layer 7 load balancers can also route by path or host.
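To make the split concrete, here is a small sketch that keeps the two decisions in separate functions. Everything in it is hypothetical: the path prefixes, the service names, and the instance names are made up for illustration, and real gateways and load balancers are separate components and not two functions in one file.

```python
from itertools import cycle

# Hypothetical instance pools, one per microservice.
POOLS = {
    "auth": cycle(["auth-1", "auth-2"]),
    "payments": cycle(["payments-1", "payments-2", "payments-3"]),
}

# Hypothetical path prefixes mapped to service names.
ROUTES = {"/auth": "auth", "/payments": "payments"}


def route(path):
    """Gateway step: decide WHICH service owns this path."""
    for prefix, service in ROUTES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return service
    raise LookupError(f"no service registered for {path}")


def balance(service):
    """Load balancer step: decide WHICH instance of that service gets the request."""
    return next(POOLS[service])


def handle(path):
    service = route(path)
    return service, balance(service)


results = [
    handle(p)
    for p in ("/auth/login", "/auth/login", "/payments/charge", "/auth/logout")
]
# [("auth", "auth-1"), ("auth", "auth-2"),
#  ("payments", "payments-1"), ("auth", "auth-1")]
```

The path only decides the service. Which instance answers is decided afterwards and independently, here by simple rotation within that service’s pool.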

Algorithms to implement load balancing

There are several core load balancing algorithms, such as round robin, weighted round robin, least connections, least response time, hash-based, and consistent hashing (the most important), which we will dive deep into in the upcoming blogs.
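As a preview of consistent hashing, here is a minimal sketch of a hash ring with virtual nodes. The class name, the choice of SHA-256, the replica count of 100, and the server and key names are all assumptions for illustration and not a reference implementation.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # SHA-256 is an arbitrary choice here; any stable hash would do.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    """Hash ring where each server owns many points (virtual nodes)."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            if point in self._owner:  # ignore the (very unlikely) collision
                continue
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            if self._owner.get(point) == server:
                del self._owner[point]
                self._points.remove(point)

    def pick(self, key):
        """Return the server owning the first ring point clockwise from the key."""
        if not self._points:
            raise RuntimeError("ring is empty")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
keys = [f"user-{n}" for n in range(1000)]

before = {k: ring.pick(k) for k in keys}
ring.remove("server-b")
after = {k: ring.pick(k) for k in keys}

# Only keys that lived on the removed server should have moved.
moved = [k for k in keys if before[k] != after[k]]
```

The property worth noticing is that removing server-b only remaps the keys that were on server-b. Keys on server-a and server-c stay where they were, which a plain "hash modulo server count" scheme would not guarantee.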

Flip side of the coin

A few problems can arise when you introduce a load balancer:

Added System Complexity
Single Point of Failure
Increased Latency
Session Management Problems

Load Balancer as SPOF (Single Point of Failure)

What if the load balancer itself dies? Solution: redundancy.

We can keep a copy of the load balancer on standby that takes over if the active one breaks down. In other words, one active load balancer is paired with one or more standbys.
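The active/standby idea can be sketched as follows. The class names, the node names, and the alive flag are hypothetical stand-ins. In real deployments, failover is usually driven by heartbeats and a shared or virtual IP address and not by an in-process flag.

```python
class LoadBalancerNode:
    def __init__(self, name):
        self.name = name
        self.alive = True  # stand-in for a real heartbeat (assumption)


class FailoverGroup:
    """One active node plus ordered standbys; promotes a standby when the active fails."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        if not self.nodes:
            raise ValueError("at least one load balancer is required")
        self.active = self.nodes[0]

    def check(self):
        """Keep the active node if it is alive, otherwise promote the first live node."""
        if self.active.alive:
            return self.active
        for node in self.nodes:
            if node.alive:
                self.active = node
                return node
        raise RuntimeError("all load balancers are down")


primary = LoadBalancerNode("lb-primary")
standby_1 = LoadBalancerNode("lb-standby-1")
standby_2 = LoadBalancerNode("lb-standby-2")
group = FailoverGroup([primary, standby_1, standby_2])

active_before = group.check().name  # "lb-primary"

primary.alive = False               # simulate the active load balancer dying
active_after = group.check().name   # "lb-standby-1"
```

Note that in this sketch a recovered primary does not automatically take back the active role. Whether it should is a design choice, and keeping the promoted standby active avoids a second switchover.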

My thoughts on its worth despite its shortfalls

Load balancing trades a small increase in complexity and latency for significantly higher availability, fault tolerance, and scalability. In most production systems, downtime is far more costly than the overhead a load balancer introduces.

In short:

If availability matters → load balancing is mandatory.

If availability doesn’t matter → don’t over-engineer.

Thank you.

Upcoming

In the upcoming blogs we will dive deep into:

Implementing a basic load balancer simulation
Getting in depth with the load balancing algorithms
Concept of Rate limiting
