Coordination between resources in AWS CloudFormation

Update 2023: the practice outlined in this post is outdated. This post is kept for archive purposes only.

One of the reasons I prefer CloudFormation over Terraform is access to helper scripts. Many legacy applications are not built to be stateless, and their installation depends on host information from other layers in the stack. This requires communication among instances during stack creation. The CloudFormation helper scripts (cfn-init, cfn-signal, cfn-hup and cfn-get-metadata) play a key role in bringing non-cloud-optimized solutions to life in an automated cloud environment.

Here we provide an example of a solution stack with three layers: an application layer, a database layer and a search engine layer. In each layer, configuring a newly created node requires the private IP addresses of all nodes in all layers. This is common in many enterprise applications that have no service discovery mechanism or cross-layer load balancing.

[Diagram: three layers: Application, Database, Search Engine]

A self-healing mechanism is in place, but load-triggered auto scaling is not in the picture. The key is to hold off installation until all EC2 instances are provisioned with a private IP. A good way to check this is to query the auto scaling group status through the AWS CLI and make sure a sufficient number of instances report InService. The diagram below shows a simplified scenario with auto scaling group A and auto scaling group B. Instances in each group report their status to their own group through an internal mechanism. In the meantime, instances in each group also query the status of both groups, using a coordinator script that issues AWS CLI commands.

[Diagram: AutoScalingGroup Layer A and AutoScalingGroup Layer B each keep a List of Instance Status. The EC2 instances in each group update the status list of their own group and query the status lists of both groups.]
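
The InService check described above can be sketched in Python. This is a minimal illustration, not the coordinator script from the project: it assumes the input has the JSON shape returned by `aws autoscaling describe-auto-scaling-groups`, and the group names and instance IDs in the sample are made up.

```python
from typing import Any, Dict


def in_service_counts(response: Dict[str, Any]) -> Dict[str, int]:
    """Map each auto scaling group name to its number of InService instances.

    `response` is assumed to have the shape of the
    describe-auto-scaling-groups output (AWS CLI JSON or the boto3 call).
    """
    counts: Dict[str, int] = {}
    for group in response.get("AutoScalingGroups", []):
        instances = group.get("Instances", [])
        counts[group["AutoScalingGroupName"]] = sum(
            1 for inst in instances if inst.get("LifecycleState") == "InService"
        )
    return counts


def all_groups_ready(response: Dict[str, Any], required: Dict[str, int]) -> bool:
    """True when every group in `required` has at least that many InService instances."""
    counts = in_service_counts(response)
    return all(counts.get(name, 0) >= needed for name, needed in required.items())


# Hand-written sample in the describe-auto-scaling-groups shape (hypothetical names).
sample = {
    "AutoScalingGroups": [
        {
            "AutoScalingGroupName": "layer-a",
            "Instances": [
                {"InstanceId": "i-0a", "LifecycleState": "InService"},
                {"InstanceId": "i-0b", "LifecycleState": "Pending"},
            ],
        },
        {
            "AutoScalingGroupName": "layer-b",
            "Instances": [
                {"InstanceId": "i-0c", "LifecycleState": "InService"},
            ],
        },
    ]
}

# layer-a still has one instance in Pending, so a requirement of 2 is not met yet.
print(all_groups_ready(sample, {"layer-a": 2, "layer-b": 1}))
```

On a real instance the `response` would come from the AWS CLI output or from a call such as `boto3.client("autoscaling").describe_auto_scaling_groups(AutoScalingGroupNames=[...])`, provided the instance role is allowed to make that call.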

CloudFormation has some useful helpers:

  • cfn-init: you may call cfn-init from user data. It executes a config set defined in the metadata of the same resource (typically an EC2 instance or a launch config). Think of a config set in metadata as an Ansible playbook: cfn-init gives you a chance to start the execution of a playbook upon instance start.
  • cfn-signal: cfn-signal provides a mechanism for an instance to notify an outside resource (an auto scaling group in this case) of its own success or failure status. It can be called from within user data, but after cfn-init and other initialization activities.
  • cfn-hup: cfn-hup provides a mechanism for an instance to be notified of changes to an outside resource in the stack, and to trigger activities in response. You typically configure the cfn-hup trigger in user data through cfn-init in metadata.

The role of each node is predetermined by the auto scaling group that creates it, and can be indicated in a file on the instance. The aforementioned coordinator script performs the following tasks:

  • It collects the local IP from instance metadata. Self-awareness of the role is achieved through user data.
  • It pulls the installer from S3 based on the role of the server. Avoid using wget because it requires public access to the installer. Instead, use AWS::CloudFormation::Authentication in metadata to set up protected access.
  • It waits for instance creation in each stack by querying each stack periodically.
  • Once all stacks have the required number of instances in service, it collects the private IP addresses of all nodes in each stack and stores them.
  • It launches the installer on each server and configures the application using the IP information of all stacks.
  • It flags the installation status in a file.
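
The wait, collect and flag steps in the list above might look roughly like the following Python sketch. It is not the actual coordinator script. The function names, the layer names and the polling defaults are hypothetical, and the AWS queries are passed in as callables so the control flow can be read (and tried out) without AWS access. `private_ips` assumes a response in the shape of EC2 `describe-instances` output.

```python
import time
from typing import Callable, Dict, List


def wait_for_layers(
    get_counts: Callable[[], Dict[str, int]],
    required: Dict[str, int],
    poll_seconds: float = 15.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every layer reports enough in-service instances.

    Returns True when all requirements are met, False once the accumulated
    waiting time reaches `timeout_seconds`.
    """
    waited = 0.0
    while True:
        counts = get_counts()
        if all(counts.get(layer, 0) >= n for layer, n in required.items()):
            return True
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds


def private_ips(describe_instances_response: dict) -> List[str]:
    """Extract private IPs from a response shaped like EC2 describe-instances."""
    ips: List[str] = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            ip = instance.get("PrivateIpAddress")
            if ip:
                ips.append(ip)
    return ips


def write_status(path: str, status: str) -> None:
    """Flag the installation status in a file (path chosen by the caller)."""
    with open(path, "w") as handle:
        handle.write(status + "\n")
```

Injecting `get_counts` and `sleep` keeps the loop independent of how the status is obtained. In practice `get_counts` would wrap the AWS CLI or SDK query, and the timeout here should stay below the timeout in the creation policy of the auto scaling group.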

This script is executed at the end of CloudFormation::Init but before cfn-signal. Because the script needs to wait for instance availability and coordinate the actual installation, it may take a long time to run, so it is important to configure the creation policy of the auto scaling group with a sufficient timeout. Otherwise, the time the coordinator script spends before the signal is sent may cause the auto scaling group creation to fail with a timeout.
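
To make the timeout concrete, here is a small Python sketch that builds the CreationPolicy part of an auto scaling group resource as a dictionary and prints it as JSON. The signal count of 2 and the 45 minute timeout are example values, not the ones used in the template, and all other properties of the resource are left out.

```python
import json


def asg_creation_policy(signal_count: int, timeout_minutes: int) -> dict:
    """Build a CreationPolicy that waits for `signal_count` cfn-signal calls."""
    return {
        "ResourceSignal": {
            "Count": signal_count,
            # CloudFormation expects an ISO 8601 duration, e.g. PT45M.
            "Timeout": f"PT{timeout_minutes}M",
        }
    }


# Hypothetical resource fragment; properties other than CreationPolicy are omitted.
app_layer_group = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "CreationPolicy": asg_creation_policy(signal_count=2, timeout_minutes=45),
}

print(json.dumps(app_layer_group, indent=2))
```

The timeout has to cover the whole wait of the coordinator script plus the installation itself, since cfn-signal is only sent after both are done.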

[Diagram: an Autoscaling Group holds a LaunchConfig for each EC2 instance, made up of UserData and Metadata. UserData contains: aws configure, update cfn-bootstrap, cfn-init launch-config config-set, cfn-signal autoscaling-group. Metadata contains CloudFormation::Authentication (S3AccessCreds) and CloudFormation::Init (config-set: ……, configure cfn-hup, coordinator script). The aws cli queries the List of Instance Status, cfn-signal updates the Autoscaling Group Status, and a CloudFormation Stack Update notifies cfn-hup, which is configured as a Linux service.]

The cfn-hup service is configured in /etc/cfn/hooks.d/cfn-auto-reloader.conf with triggers=post.update and an action that executes cfn-init again. This ensures that on stack update, the coordinator script can be launched again with updated stack information.
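
For reference, the hook file has a simple INI-like layout. The Python sketch below renders one possible version of it. The stack, resource, region and config set names are placeholders, and the /opt/aws/bin path assumes an Amazon Linux style installation of the helper scripts; it is not taken from the template.

```python
def render_reloader_hook(stack: str, resource: str, region: str, configsets: str) -> str:
    """Render a possible body for /etc/cfn/hooks.d/cfn-auto-reloader.conf."""
    # Re-run cfn-init against the same resource metadata after a stack update.
    action = (
        f"/opt/aws/bin/cfn-init -v --stack {stack} --resource {resource} "
        f"--configsets {configsets} --region {region}"
    )
    lines = [
        "[cfn-auto-reloader-hook]",
        "triggers=post.update",
        f"path=Resources.{resource}.Metadata.AWS::CloudFormation::Init",
        f"action={action}",
        "runas=root",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
print(render_reloader_hook("my-stack", "LaunchConfig", "us-east-1", "default"))
```

The path entry tells cfn-hup which part of the resource metadata to watch, so the action only fires when that metadata changes in a stack update.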

The implementation template is available on my GitHub, as the coordination-example project. The template creates an application layer in a public subnet, plus a database layer and a search engine layer, both in private subnets. The private subnets connect to the Internet through a NAT instance, which serves as a bastion host as well. There is still some work left (multi-AZ, load balancer, smart addressing, etc.), but this should be a sufficient jump start for moving many legacy environments to the cloud.

The resource types in CloudFormation might come off as overwhelming. Here is an incomplete diagram of their relationships:

[Diagram: relationships among EC2::Route, EC2::RouteTable, EC2::SubnetRouteTableAssociation, EC2::Subnet, EC2::VPCGatewayAttachment, EC2::VPC, EC2::InternetGateway, EC2::Instance, EC2::LaunchTemplate, IAM::InstanceProfile, IAM::Role, AutoScaling::AutoScalingGroup, AutoScaling::LaunchConfiguration, EC2::SecurityGroup, EC2::SecurityGroupIngress, EC2::SecurityGroupEgress and EC2::NetworkInterface, linked through properties such as RouteTableId, SubnetId, VpcId, LaunchTemplateName, IamInstanceProfile, InstanceId for Cidr match and GatewayId for Cidr match.]

Happy Cloud.