What is load balancing?
A load balancer is a core networking solution responsible for distributing incoming traffic among servers hosting the same application content. By balancing application requests across multiple servers, a load balancer prevents any single application server from becoming a point of failure, improving overall application availability, responsiveness, and server utilization. For example, when one application server becomes unavailable, the load balancer simply directs all new application requests to the other available servers in the pool. Load balancing is also the most straightforward method of scaling out an application server infrastructure: as application demand increases, new servers can be added to the resource pool, and the load balancer will immediately begin sending traffic to them.
How load balancing works
All load balancers are capable of making traffic decisions based on traditional OSI layer 2 and 3 information. More advanced load balancers, however, can make intelligent traffic management decisions based on specific layer 4–7 information contained within the request issued by the client. Such application-layer intelligence is required in many application environments, including those in which a request for application data can only be met by a specific server or set of servers. Load balancing decisions are made quickly, usually in less than one millisecond, and high-performance load balancers can make millions of decisions per second.
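As a sketch of what a layer-7 decision looks like, the function below inspects the request path and routes it to a server pool. The pool names and path rules are purely illustrative assumptions, not part of any real product:

```python
# Hypothetical layer-7 (content-based) routing decision: the load
# balancer inspects the HTTP request and picks a server pool. The
# pool names and matching rules here are invented for illustration.

def choose_pool(method: str, path: str) -> str:
    """Return the name of the server pool that should handle this request."""
    if path.startswith("/images/") or path.endswith((".png", ".jpg")):
        return "static-content-pool"   # data only these servers can serve
    if path.startswith("/api/"):
        return "api-pool"
    return "default-pool"

print(choose_pool("GET", "/images/logo.png"))  # static-content-pool
print(choose_pool("POST", "/api/orders"))      # api-pool
```

A layer-2/3 device could not make this distinction, because the path only exists inside the application-layer payload.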
Load balancers also typically incorporate network address translation (NAT) to hide the IP addresses of the back-end application servers. Application clients connect to a “virtual” IP address on the load balancer, rather than to the IP address of an individual server. The load balancer then relays the client request to the right application server. This entire operation is transparent to the client, which appears to be connecting directly to the application server.
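The translation step can be sketched as a destination rewrite: the client addresses its packet to the virtual IP, and the load balancer rewrites the destination to a real server before forwarding. The addresses below are reserved documentation ranges used only as placeholders:

```python
# Minimal sketch of destination NAT at the load balancer, assuming a
# hypothetical pool of back-end addresses. Clients only ever see the
# virtual IP; real server addresses stay hidden behind it.

VIRTUAL_IP = "203.0.113.10"            # address published to clients
BACKENDS = ["10.0.0.11", "10.0.0.12"]  # real servers, invisible to clients

def dnat(packet: dict, backend: str) -> dict:
    """Rewrite a client packet's destination from the VIP to a real server."""
    assert packet["dst"] == VIRTUAL_IP, "clients must address the VIP"
    return dict(packet, dst=backend)   # copy with destination rewritten

pkt = {"src": "198.51.100.7", "dst": VIRTUAL_IP, "payload": "GET / HTTP/1.1"}
forwarded = dnat(pkt, BACKENDS[0])     # now addressed to 10.0.0.11
```

On the return path the load balancer performs the reverse rewrite, so the response the client receives appears to come from the virtual IP.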
An administrator-selected algorithm implemented by the load balancer determines which physical or virtual server should receive each request, and the load balancer forwards the request accordingly. Once the request is received and processed, the application server sends its response back to the client via the load balancer. The load balancer manages all bi-directional traffic between the client and the server, mapping each application response to the right client connection so that every user receives the proper response.
Load balancers can also be configured to guarantee that subsequent requests from the same user, as part of the same session, are directed to the same server as the original request. Called persistence, this capability meets a requirement of the many applications that must maintain “state.”
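A minimal sketch of persistence, assuming a hypothetical session identifier (such as a cookie value) is available to key on: the first request from a session is assigned a server normally, and every later request in that session is pinned to the same server.

```python
import itertools

class StickyBalancer:
    """Round-robin selection plus persistence: once a session is assigned
    a server, later requests in that session return to the same server."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # rotation for new sessions
        self._sessions = {}                     # session_id -> assigned server

    def pick(self, session_id: str) -> str:
        if session_id not in self._sessions:
            self._sessions[session_id] = next(self._cycle)
        return self._sessions[session_id]

lb = StickyBalancer(["app1", "app2", "app3"])
first = lb.pick("session-42")
assert lb.pick("session-42") == first   # same session, same server
```

Real load balancers key persistence on attributes such as cookies, source IP, or TLS session IDs; the dictionary here stands in for that persistence table.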
Load balancers also monitor the availability, or health, of application servers to avoid the possibility of sending client requests to a server resource that is unable to respond. There are a variety of mechanisms to monitor server resources. For example, the load balancer can construct and issue application-specific requests to each server in its pool. The load balancer then validates the resulting responses to determine whether the server is able to handle incoming traffic. If the load balancer discovers a server that is unable to respond properly, it marks the server as “down” and no longer sends requests to that server.
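The health-monitoring loop described above can be sketched as follows. The probe is passed in as a function so the example stays self-contained; in practice it would issue a real application-specific request (for example, an HTTP GET whose response body is validated):

```python
# Sketch of server health monitoring: probe each server in the pool and
# mark it "up" or "down". The probe function here is a stand-in for an
# application-specific request issued and validated by the load balancer.

def check_health(probe, servers):
    """Return a dict mapping each server to 'up' or 'down'."""
    status = {}
    for server in servers:
        try:
            status[server] = "up" if probe(server) else "down"
        except Exception:
            status[server] = "down"   # unreachable servers count as down
    return status

# Simulated probe in which "app2" fails its health check.
status = check_health(lambda s: s != "app2", ["app1", "app2", "app3"])
healthy = [s for s, state in status.items() if state == "up"]
# Only servers in `healthy` remain eligible for new client requests.
```

Servers marked “down” are simply excluded from selection until a later probe succeeds and restores them to the pool.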
Load distribution with load balancing algorithms
Load balancing algorithms define the criteria that a load balancer uses to select the server to which a client request is sent. Different load balancing algorithms use different criteria. For example, when a load balancer applies a least connection algorithm, it sends new requests to the available server with the fewest active connections among servers in the pool. Another popular algorithm for distributing traffic is round robin, which sends incoming requests to the next available server in a fixed sequence, with no consideration of the current load being handled by each server.
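The two algorithms named above can be sketched in a few lines each; the server names and connection counts are illustrative:

```python
import itertools

servers = ["app1", "app2", "app3"]

# Round robin: rotate through the pool in a fixed sequence,
# ignoring how busy each server currently is.
rotation = itertools.cycle(servers)
picks = [next(rotation) for _ in range(4)]   # app1, app2, app3, app1

# Least connections: send the new request to the available server
# with the fewest active connections.
active_connections = {"app1": 12, "app2": 3, "app3": 7}
target = min(active_connections, key=active_connections.get)   # app2
```

Round robin is simple and predictable; least connections adapts to uneven request durations, since long-running requests leave a server with more open connections and therefore less new traffic.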
These algorithms use values computed from traffic characteristics such as IP address, port numbers and application data tokens. The most common are hashing algorithms, where hashes of certain connection information or header information are used to make load balancing decisions. Some load balancers also use attributes of the back-end server, such as CPU and memory utilization, to make load balancing decisions.
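A hash-based selection can be sketched as follows, here hashing the client's source IP address (one common choice among the connection attributes mentioned above). `hashlib` is used rather than Python's built-in `hash()`, which is randomized between runs:

```python
import hashlib

def hash_pick(client_ip: str, servers: list) -> str:
    """Pick a server by hashing the client IP: the same client maps to
    the same server for as long as the pool is unchanged."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

pool = ["app1", "app2", "app3"]
chosen = hash_pick("198.51.100.7", pool)
assert hash_pick("198.51.100.7", pool) == chosen   # deterministic mapping
```

Because the mapping is deterministic, a simple hash also provides a coarse form of persistence, though adding or removing a server reshuffles most clients unless a consistent-hashing scheme is used instead.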
Commonly used load balancing algorithms include:
Persistence of load distribution