Medium


Automated Migration and Scaling of Hadoop™ Clusters

  • Pinterest's big data infrastructure utilizes frameworks like MapReduce™, Spark™, and Flink™ on Hadoop™ YARN™, spread across multiple clusters on AWS with Auto Scaling Groups.
  • Challenges included a limited supply of IP addresses, the cost of running parallel clusters, and the risk involved in switching applications over to new clusters.
  • The migration strategy involved introducing a new Auto Scaling Group (ASG) for in-place migration with the desired configuration.
  • Hadoop Control Center (HCC) was introduced to automate cluster operations and simplify scaling processes like scale-in and scale-out.
  • HCC streamlines cluster administration and provides tools for resizing ASGs, monitoring nodes, reporting, and managing Hadoop-related tasks.
  • HCC's architecture consists of a main manager node plus worker nodes per VPC, and it automates the scale-in process.
  • HCC's decommissioning process involves queues, instance selection based on container count, and monitoring to prevent data loss during scale-in.
  • HCC integrates with AWS™ APIs to manage ASG resizing directly, reducing the dependency on Terraform for ASG size changes.
  • Future capabilities for HCC include ingesting AWS™ events, node rotation, bad node detection, and automated node remediation.
  • HCC aims to streamline cluster operations, automate scaling processes, and simplify cluster management for Pinterest's big data infrastructure.
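
The scale-in steps summarized above (a queue, instance selection by container count, and termination through the AWS API) can be sketched roughly as follows. This is a minimal illustration, not HCC's actual code: the `Node` type, the function names, the `is_drained` callback, and the choice to prefer nodes with the fewest running containers are all assumptions. The `autoscaling` argument is expected to behave like a boto3 Auto Scaling client.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    instance_id: str         # EC2 instance ID of the Hadoop worker
    running_containers: int  # YARN containers currently running on the node


def select_for_scale_in(nodes, count):
    """Pick `count` nodes, preferring those with the fewest running containers.

    Assumption: removing the least busy nodes disrupts the least work.
    Ties are broken by instance ID so the result is deterministic.
    """
    ranked = sorted(nodes, key=lambda n: (n.running_containers, n.instance_id))
    return ranked[:count]


def scale_in(nodes, count, autoscaling, is_drained):
    """Queue the selected nodes and terminate each one that has drained.

    `is_drained` is a hypothetical callback that reports whether a node has
    finished decommissioning (no containers left, data re-replicated).
    Nodes that are not drained yet are left alone and reported as pending.
    Returns (terminated_ids, pending_ids).
    """
    queue = deque(select_for_scale_in(nodes, count))
    terminated, pending = [], []
    while queue:
        node = queue.popleft()
        if not is_drained(node):
            pending.append(node.instance_id)  # retry on a later pass
            continue
        # Terminate the instance and shrink the ASG's desired capacity by one.
        autoscaling.terminate_instance_in_auto_scaling_group(
            InstanceId=node.instance_id,
            ShouldDecrementDesiredCapacity=True,
        )
        terminated.append(node.instance_id)
    return terminated, pending
```

With real AWS credentials the client would typically come from `boto3.client("autoscaling")`. Checking the drained state before terminating is what guards against data loss in this sketch.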
