Distributed Server Allocation

Allocating servers to the different components of an application in a distributed, fault-tolerant and secure manner, giving applications running in a private cloud high availability, compliance and self-healing capabilities.

Worked On

UX, UI, Backend service, Allocation Logic

Tech Stack

Java + Struts, MySQL, VanillaJS

Date

May, 2017

Challenge

Servers can be grouped by type (VM/PS), hardware configuration (RAM, disk type, storage), IP subnet (CIDR/VPC), compliance (SOC/HIPAA), OS (Linux/Windows/Unix) and so on. The challenge was to give each component the servers on which it performs efficiently, while accounting for unforeseen failures such as power outages, rack-level fire incidents and KVM host issues, and to provide self-healing so that applications see very low downtime.

Solution

An algorithm that takes the desired configurations and other variable factors into account and allocates servers in a distributed way, guaranteeing that the allocation is highly performant, fault-tolerant and standards-compliant.

About Server Allocation

To host more than a single application or database, we usually procure a separate cluster for each. To host them in the private cloud, the servers must be allocated in a rack- and chassis-aware manner, which minimizes outages caused by hardware failures within a cage.

Problem

Manual allocations were very slow and inefficient. Sometimes unreachable nodes were allotted, which delayed server provisioning and, in turn, product launches. Unavailability of IPs in the desired VPC/CIDR range caused further delays.

How I solved this

I came up with a proprietary algorithm that allocates servers dynamically across racks, chassis and physical servers based on the required hardware specs and CIDR subnets. It also ensures that the allocated servers are reachable, free of hardware and software issues, and compliant with the VPC separation.
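The actual algorithm is proprietary and not shown here. As a rough illustration of the idea only, the sketch below filters an inventory by reachability, RAM and subnet, then picks servers round-robin across racks so that no single rack holds more than its share. The class, field and method names (RackAwareAllocator, Server, allocate, sampleInventory) and the sample data are assumptions made for this example; chassis-level spreading and compliance checks are left out for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RackAwareAllocator {

    /** Hypothetical inventory entry; the fields are illustrative only. */
    static final class Server {
        final String id;
        final String rack;
        final String subnet;
        final int ramGb;
        final boolean reachable;

        Server(String id, String rack, String subnet, int ramGb, boolean reachable) {
            this.id = id;
            this.rack = rack;
            this.subnet = subnet;
            this.ramGb = ramGb;
            this.reachable = reachable;
        }
    }

    /** Picks `count` eligible servers, spreading them across racks round-robin. */
    static List<Server> allocate(List<Server> inventory, int count, int minRamGb, String subnet) {
        // Keep only reachable servers that meet the spec, grouped by rack in inventory order.
        Map<String, Deque<Server>> byRack = new LinkedHashMap<>();
        for (Server s : inventory) {
            if (s.reachable && s.ramGb >= minRamGb && s.subnet.equals(subnet)) {
                byRack.computeIfAbsent(s.rack, k -> new ArrayDeque<>()).add(s);
            }
        }
        // Take one server per rack per pass until the request is satisfied.
        List<Server> chosen = new ArrayList<>();
        while (chosen.size() < count) {
            boolean progressed = false;
            for (Deque<Server> rackQueue : byRack.values()) {
                if (chosen.size() == count) {
                    break;
                }
                Server next = rackQueue.poll();
                if (next != null) {
                    chosen.add(next);
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException(
                    "Not enough eligible servers: needed " + count + ", found " + chosen.size());
            }
        }
        return chosen;
    }

    /** Made-up inventory used only for the demonstration below. */
    static List<Server> sampleInventory() {
        return Arrays.asList(
            new Server("s1", "rackA", "10.0.1.0/24", 64, true),
            new Server("s2", "rackA", "10.0.1.0/24", 64, true),
            new Server("s3", "rackB", "10.0.1.0/24", 64, false),  // unreachable
            new Server("s4", "rackB", "10.0.1.0/24", 128, true),
            new Server("s5", "rackC", "10.0.1.0/24", 32, true),   // too little RAM
            new Server("s6", "rackC", "10.0.1.0/24", 64, true),
            new Server("s7", "rackA", "10.0.2.0/24", 64, true));  // wrong subnet
    }

    public static void main(String[] args) {
        StringBuilder ids = new StringBuilder();
        for (Server s : allocate(sampleInventory(), 3, 64, "10.0.1.0/24")) {
            ids.append(s.id).append(' ');
        }
        System.out.println(ids.toString().trim()); // prints "s1 s4 s6"
    }
}
```

With this sample data, a request for three servers lands on three different racks, and the unreachable, undersized and wrong-subnet nodes are never considered.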

I took it further by templatizing the workflows, so that applications could be mapped to templates with the desired configurations. The templates were then used to allocate web servers, application servers, database clusters and big-data services as well, covering all the components required for a product release and making server provisioning efficient and quick.
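One way to picture the template mapping is a small registry that ties an application name to a named template, where each template lists the components and the specs they need. The sketch below is only an assumption about how such a registry could look; the names (AllocationTemplates, Requirement, requirementsFor), the "three-tier" template and the "billing" application are invented for illustration and are not taken from the original system.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AllocationTemplates {

    /** One component's requirement inside a template (hypothetical fields). */
    static final class Requirement {
        final String component;
        final int count;
        final int minRamGb;
        final String subnet;

        Requirement(String component, int count, int minRamGb, String subnet) {
            this.component = component;
            this.count = count;
            this.minRamGb = minRamGb;
            this.subnet = subnet;
        }

        @Override
        public String toString() {
            return component + " x" + count + " (>= " + minRamGb + " GB, " + subnet + ")";
        }
    }

    private final Map<String, List<Requirement>> templates = new LinkedHashMap<>();
    private final Map<String, String> appToTemplate = new LinkedHashMap<>();

    void defineTemplate(String name, Requirement... requirements) {
        templates.put(name, Arrays.asList(requirements));
    }

    void mapApplication(String application, String template) {
        if (!templates.containsKey(template)) {
            throw new IllegalArgumentException("Unknown template: " + template);
        }
        appToTemplate.put(application, template);
    }

    /** Expands an application into the per-component requests an allocator would serve. */
    List<Requirement> requirementsFor(String application) {
        String template = appToTemplate.get(application);
        if (template == null) {
            throw new IllegalArgumentException("No template mapped for: " + application);
        }
        return templates.get(template);
    }

    /** Made-up registry used only for the demonstration below. */
    static AllocationTemplates sample() {
        AllocationTemplates registry = new AllocationTemplates();
        registry.defineTemplate("three-tier",
            new Requirement("web", 2, 16, "10.0.1.0/24"),
            new Requirement("app", 2, 32, "10.0.1.0/24"),
            new Requirement("db", 3, 64, "10.0.2.0/24"));
        registry.mapApplication("billing", "three-tier");
        return registry;
    }

    public static void main(String[] args) {
        for (Requirement r : sample().requirementsFor("billing")) {
            System.out.println(r);
        }
    }
}
```

Keeping the specs in the template rather than in each application means a new product only needs a mapping entry before its servers can be requested.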

Impact

Server provisioning time dropped from ~1 day to less than a minute #agility #1500X_improvement
Fault tolerance improved during outages caused by power failures at the rack/chassis level #robust
During node failover, a replacement is allocated immediately and automatically, keeping servers highly available for the application #high_availability #self_heal
Infra costs associated with each team could be easily identified and scaled down based on usage #cost_aware_business
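The self-heal behaviour mentioned above can be sketched as a replacement step that runs when a node is reported as failed. This is a minimal illustration under stated assumptions, not the production logic: the names (SelfHealSketch, Node, replaceFailed) are hypothetical, failure detection is assumed to happen elsewhere, and the only placement rule shown is to prefer a spare on a rack that the surviving cluster members do not already occupy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SelfHealSketch {

    /** Hypothetical node record: an identifier plus the rack it sits in. */
    static final class Node {
        final String id;
        final String rack;

        Node(String id, String rack) {
            this.id = id;
            this.rack = rack;
        }
    }

    /**
     * Swaps a failed node for a spare. Prefers a spare on a rack not used by the
     * surviving members; falls back to the first spare if every rack is already used.
     */
    static Node replaceFailed(List<Node> cluster, Node failed, List<Node> spares) {
        if (!cluster.remove(failed)) {
            throw new IllegalArgumentException("Node is not part of the cluster: " + failed.id);
        }
        Set<String> usedRacks = new HashSet<>();
        for (Node survivor : cluster) {
            usedRacks.add(survivor.rack);
        }
        Node pick = null;
        for (Node spare : spares) {
            if (!usedRacks.contains(spare.rack)) {
                pick = spare;      // best case: a rack the cluster does not occupy yet
                break;
            }
            if (pick == null) {
                pick = spare;      // fallback: remember the first spare seen
            }
        }
        if (pick == null) {
            throw new IllegalStateException("No spare servers available");
        }
        spares.remove(pick);
        cluster.add(pick);
        return pick;
    }

    public static void main(String[] args) {
        Node n1 = new Node("n1", "rackA");
        Node n2 = new Node("n2", "rackB");
        List<Node> cluster = new ArrayList<>(Arrays.asList(n1, n2));
        List<Node> spares = new ArrayList<>(Arrays.asList(
            new Node("s1", "rackB"), new Node("s2", "rackC")));

        Node replacement = replaceFailed(cluster, n1, spares);
        System.out.println(replacement.id + " on " + replacement.rack); // prints "s2 on rackC"
    }
}
```

In the demonstration, the spare on rackB is skipped because the surviving node already lives there, so the cluster stays spread over two racks after the swap.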
