AWS now supports high availability for Amazon EMR on EC2 clusters that use instance fleet configurations. Amazon EMR is a cloud big data processing platform that runs open source frameworks such as Apache Spark, Presto, and Flink. High availability (HA) provides continuous uptime and fault tolerance for Hadoop clusters by removing single points of failure through redundant standby nodes. Instance fleets add resiliency and flexibility: allocation strategies improve EC2 instance selection, target capacity management makes a fleet more resilient to fluctuations in specific capacity pools, and support for multiple subnets lets Amazon EMR choose the best purchasing options and instances across Availability Zones when the cluster launches.
To launch a high availability instance fleet cluster from the Amazon EMR console, create a new cluster, select the use high availability option, choose the desired instance types and target capacities, select allocation strategies and subnets, and review the cluster configuration before creating the cluster. To launch one with AWS CloudFormation, create a CloudFormation template, create a CloudFormation stack from it, and confirm that the new cluster appears in the list of clusters. The describe-cluster command verifies that the high availability cluster launched successfully: it should show three primary nodes in the running state and a ProvisionedOnDemandCapacity equal to 3.
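The launch and verification steps above can be sketched in Python as plain dictionaries. This is a minimal sketch, not the configuration from the announcement: the helper names, instance types, subnet IDs, and the lowest-price allocation strategy are illustrative assumptions. The field names follow the Instances structure of the EMR RunJobFlow API as exposed by boto3, and the check assumes a describe-cluster style response that includes an InstanceFleets list.

```python
def _fleet(name, fleet_type, on_demand_capacity, instance_types):
    """Build one On-Demand instance fleet entry (hypothetical helper)."""
    return {
        "Name": name,
        "InstanceFleetType": fleet_type,
        "TargetOnDemandCapacity": on_demand_capacity,
        "InstanceTypeConfigs": [{"InstanceType": t} for t in instance_types],
        # Assumption: lowest-price is used here only as an example strategy.
        "LaunchSpecifications": {
            "OnDemandSpecification": {"AllocationStrategy": "lowest-price"}
        },
    }


def build_instances_config(subnet_ids, primary_types, core_types, core_capacity=4):
    """Build the Instances section of a run_job_flow request for an HA cluster.

    Three On-Demand primary nodes give high availability; several subnets
    let Amazon EMR pick where to launch the cluster.
    """
    return {
        "Ec2SubnetIds": list(subnet_ids),
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceFleets": [
            _fleet("primary", "MASTER", 3, primary_types),
            _fleet("core", "CORE", core_capacity, core_types),
        ],
    }


def primary_fleet_is_healthy(cluster):
    """Return True if the primary fleet is RUNNING with On-Demand capacity 3.

    `cluster` is the "Cluster" object of a describe-cluster style response.
    """
    for fleet in cluster.get("InstanceFleets", []):
        if fleet.get("InstanceFleetType") == "MASTER":
            state = fleet.get("Status", {}).get("State")
            capacity = fleet.get("ProvisionedOnDemandCapacity")
            return state == "RUNNING" and capacity == 3
    return False
```

The returned dictionary could be passed as the Instances argument of boto3's EMR run_job_flow call alongside the usual name, release label, and role arguments, which are omitted here.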
Instances can fail or become unhealthy for many reasons, including disk space issues, high CPU utilization, and critical cluster daemons shutting down with errors. The MultiMasterInstanceGroupNodesRunning and MultiMasterInstanceGroupNodesRunningPercentage Amazon CloudWatch metrics make it possible to monitor the health and status of the primary nodes and keep operations running smoothly. On production clusters, enable allocation strategies, dedicate subnets to EMR clusters, and configure at least four core nodes for enhanced data availability, which minimizes the risk of HDFS data loss.
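As an illustration of the monitoring described above, the following Python sketch builds a CloudWatch query for the running primary node count and flags any period in which fewer than three primaries were up. The function names, the 15-minute window, and the 5-minute period are assumptions made for this example; the parameter names follow CloudWatch's GetMetricStatistics API, and the metric is assumed to be published in the AWS/ElasticMapReduce namespace under the JobFlowId dimension.

```python
from datetime import datetime, timedelta, timezone


def primary_nodes_query(cluster_id, minutes=15, now=None):
    """Build get_metric_statistics parameters for the primary node count."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "MultiMasterInstanceGroupNodesRunning",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,  # seconds per datapoint
        "Statistics": ["Minimum"],
    }


def primaries_degraded(datapoints, expected=3):
    """Return True if any datapoint reports fewer running primaries than expected."""
    return any(point["Minimum"] < expected for point in datapoints)
```

With boto3, the query could be sent as `boto3.client("cloudwatch").get_metric_statistics(**primary_nodes_query(cluster_id))` and the "Datapoints" list of the response passed to `primaries_degraded`. In practice a CloudWatch alarm on the same metric would likely be a better fit than polling.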
Setting up a high availability instance fleet cluster with Amazon EMR on EC2 increases instance diversity and provides better Spot capacity management. High availability lets a cluster endure failures, maintain uninterrupted operations, and add a layer of reliability to its critical components. The result is an EMR cluster built to withstand failures and keep operating continuously, with enhanced resiliency, instance diversity, and better Spot capacity management, all within a single Availability Zone.