AWS CloudFormation

If you’re doing any production-level work in AWS, you should be using AWS CloudFormation. It’s really easy to get started. Let’s walk through the basics.

Why use CloudFormation?

Here’s a common scenario: creating an EC2 instance and assigning an Elastic IP address. Let’s say it’s for a web server. Great! That’s easy. Just spin up an EC2 instance. Choose the correct image, size, security groups, VPC, subnet, keypair, and so on. Then create and assign it an Elastic IP address. No problem!

Now deploy it in QA. Then in production. But production has different security groups. You should probably set up CloudWatch alerts in production too. All of this is getting expensive, so maybe we should turn off the development stack overnight. But at this point we don’t just have one EC2 instance – we also have RDS, some S3 buckets, DynamoDB, and so on. We’ll need all of that configured in each environment. It’s 6 months later now and we need to recreate everything in a different region – did you document how to set everything up?

CloudFormation takes care of all of that for you.

CloudFormation can provision, update, delete, and monitor changes in virtually any AWS service. You can make S3 buckets with specific policies, make IAM roles allowed to access those buckets, spin up a Redshift cluster with that role attached, and so on. You can even create EC2 instances with Elastic IP addresses attached to them (and the VPC, security groups, and subnet associated with that instance).

Here’s an example:

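AWSTemplateFormatVersion: 2010-09-09
Description: Create an EC2 instance.
Parameters:
  InstanceNameParameter:
    Type: String
    Description: Name of the instance.
Resources:
  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      KeyName: mykeypair
      InstanceType: t3.nano
      SecurityGroupIds:
        - sg-0123456789abcdef0
      SubnetId: subnet-01234567
      BlockDeviceMappings:
        - DeviceName: /dev/xvda
          Ebs:
            VolumeSize: 20
      Tags:
        - {Key: "Name", Value: !Ref InstanceNameParameter}
  ElasticIP:
    Type: AWS::EC2::EIP
    Properties:
      Domain: vpc
      InstanceId: !Ref EC2Instance
Outputs:
  # Illustrative output name; Ref on an EIP returns the allocated IP address.
  ElasticIPAddress:
    Value: !Ref ElasticIP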

That looks like a lot, and the formatting takes some getting used to. But at this point, you can go right into the AWS Console, upload that CloudFormation template (after swapping the placeholder AMI, key pair, security group, and subnet IDs for your own), and have an EC2 instance and Elastic IP address created and set up in a minute or so. Templates can also be written in JSON.
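The console is not the only way in. As a rough sketch, the same template could be deployed from Python with boto3 (the AWS SDK for Python, which needs to be installed and given credentials). The stack name, file name, and helper function names below are made up for illustration; only `InstanceNameParameter` comes from the template above.

```python
def build_create_stack_request(stack_name, template_body, instance_name):
    """Build the keyword arguments for CloudFormation's CreateStack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            # InstanceNameParameter is the parameter declared in the template.
            {"ParameterKey": "InstanceNameParameter", "ParameterValue": instance_name},
        ],
    }


def create_stack_and_wait(cfn, request):
    """Create the stack, then block until CloudFormation reports completion.

    `cfn` is expected to be a boto3 CloudFormation client.
    """
    stack_id = cfn.create_stack(**request)["StackId"]
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_id)
    return stack_id


# Example usage (assumes boto3 is installed and the template is saved locally
# under the hypothetical name ec2-with-eip.yaml):
#
#   import boto3
#   with open("ec2-with-eip.yaml") as f:
#       request = build_create_stack_request("web-server-dev", f.read(), "web-dev")
#   print(create_stack_and_wait(boto3.client("cloudformation"), request))
```

Keeping the request builder separate from the API call makes it easy to reuse one template for dev, QA, and production by changing only the stack name and parameter values.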

Drift Detection

This is a really cool feature. Let’s say your stack was created a few months ago and someone has since changed some settings by hand. When you run drift detection on the stack, CloudFormation compares the actual configuration of each supported resource with the template and reports anything that was changed outside of CloudFormation.
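Drift detection can be started from the console, but it can also be scripted. Here is a minimal sketch using boto3’s CloudFormation client; the function names are my own, and it only reads the first page of results, so treat it as a starting point rather than a finished tool.

```python
import time


def detect_drift(cfn, stack_name, poll_seconds=5):
    """Run drift detection on a stack and return its drifted resources.

    `cfn` is expected to be a boto3 CloudFormation client.
    """
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Detection runs asynchronously, so poll until it leaves the in-progress state.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    # Only the first page is read here; large stacks would need NextToken handling.
    response = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )
    return response["StackResourceDrifts"]


def summarize_drift(drifts):
    """Turn drift records into human-readable lines."""
    lines = []
    for drift in drifts:
        lines.append(
            f'{drift["LogicalResourceId"]} ({drift["ResourceType"]}): '
            f'{drift["StackResourceDriftStatus"]}'
        )
        for diff in drift.get("PropertyDifferences", []):
            lines.append(
                f'  {diff["PropertyPath"]}: expected {diff["ExpectedValue"]}, '
                f'actual {diff["ActualValue"]}'
            )
    return lines
```

Running something like this on a schedule and sending the summary to a chat channel or email would get you the alerting behavior, since drift detection itself only runs when asked.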

Updating and Deleting a Stack

You guessed it – if you update your CloudFormation template, AWS will intelligently figure out what it needs to do to update your stack.

Here’s an example – let’s say we need to increase the size of the disk on that EC2 instance. We would simply change the value in the template and use CloudFormation to update the stack. AWS would create a new instance with a larger disk and attach the Elastic IP address to the new instance automatically. The old EC2 instance would then be terminated.
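Since an update like this replaces the instance, it can be worth previewing what CloudFormation plans to do before applying it. One way is a change set. The sketch below uses boto3’s CloudFormation client; the function name and the change set name are hypothetical, and it assumes the stack was created from the template above, so `InstanceNameParameter` keeps its previous value.

```python
def preview_update(cfn, stack_name, template_body, change_set_name="resize-disk"):
    """Create a change set and list what CloudFormation would do to each resource.

    `cfn` is expected to be a boto3 CloudFormation client. Returns a list of
    (logical_id, action, replacement) tuples.
    """
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        ChangeSetType="UPDATE",
        TemplateBody=template_body,
        # Reuse the instance name the stack was originally created with.
        Parameters=[{"ParameterKey": "InstanceNameParameter", "UsePreviousValue": True}],
    )
    # The waiter raises if the change set fails, which includes the case
    # where the new template contains no changes at all.
    cfn.get_waiter("change_set_create_complete").wait(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    described = cfn.describe_change_set(
        StackName=stack_name, ChangeSetName=change_set_name
    )
    return [
        (
            change["ResourceChange"]["LogicalResourceId"],
            change["ResourceChange"]["Action"],
            change["ResourceChange"].get("Replacement", "N/A"),
        )
        for change in described["Changes"]
    ]


# If the preview looks right, apply it with:
#   cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
```

A `Replacement` value of `"True"` next to `EC2Instance` is the signal that the instance will be recreated rather than modified in place.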


The best part is that templates are easy to reuse and work with most AWS services, not just EC2. There’s a slight learning curve, but the benefits are worth it.


AWS Certified Solutions Architect

Quick post: I’ve been busy studying for the AWS Certified Solutions Architect – Associate exam for the past few weeks. Good news: I passed it a few days ago! Shoot me a note if you ever need some solutions architected.

I primarily did this because I’ve been using AWS for years now, but so has everyone else, and a certification would be a differentiator. Studying also filled in some gaps in my knowledge (for example, I learned how to give instances in a private subnet Internet access to install and update software without giving them public IP addresses, and without spending hours reading Stack Overflow posts).



Jupyter Notebooks with PySpark on AWS EMR

One of the biggest, most time-consuming parts of data science is analysis and experimentation. One of the most popular tools to do so in a graphical, interactive environment is Jupyter.

Combining Jupyter with Apache Spark (through PySpark) merges two extremely powerful tools. AWS EMR lets you set up all of these tools with just a few clicks. In this tutorial I’ll walk through creating a cluster of machines running Spark with a Jupyter notebook sitting on top of it all.



EC2 + Route53 for Dynamic DNS

Recently I ran into a problem while working with Amazon EC2 servers. Servers without dedicated Elastic IP addresses would get a different public IP address every time they were started up! This proved to be a challenge when trying to SSH into the servers.

How can I have a dynamic domain name that always points to my EC2 server?

Amazon’s Route53 came to mind. Route53, however, does not have a simple way to point a subdomain directly to an EC2 instance. You can set up load balancers between Route53 and your instance, but that’s a hassle. You can also set up an elaborate private network with port forwarding – yuck.

I wanted a simple way to set a Route53 subdomain’s A record to point to an EC2 instance’s public IP address, on startup.

Enter go-route53-dyn-dns. This is a simple Go project that solves this problem. It is a small binary that reads a JSON configuration file and updates Route53 with an EC2 instance’s public IP address.
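To give a feel for the idea, here is a rough Python sketch of the same approach. This is not the project’s code (the project is written in Go), and the configuration keys `hosted_zone_id`, `record_name`, and `ttl` are made up for illustration. It assumes boto3 for the Route53 call and the EC2 instance metadata service (IMDSv2) for looking up the public IP address.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest"


def fetch_public_ip(timeout=2):
    """Ask the EC2 instance metadata service (IMDSv2) for this instance's public IP."""
    token_request = urllib.request.Request(
        f"{METADATA_URL}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()

    ip_request = urllib.request.Request(
        f"{METADATA_URL}/meta-data/public-ipv4",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(ip_request, timeout=timeout) as response:
        return response.read().decode().strip()


def build_change_batch(record_name, ip_address, ttl=60):
    """Build a Route53 change batch that creates or overwrites an A record."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": ip_address}],
                },
            }
        ]
    }


def update_record(route53, config, ip_address):
    """Point the configured record at ip_address.

    `route53` is expected to be a boto3 Route53 client; `config` is a dict
    with the hypothetical keys hosted_zone_id, record_name, and ttl.
    """
    return route53.change_resource_record_sets(
        HostedZoneId=config["hosted_zone_id"],
        ChangeBatch=build_change_batch(
            config["record_name"], ip_address, config.get("ttl", 60)
        ),
    )


# Example usage on an EC2 instance at startup (assumes boto3 is installed and
# the instance role is allowed to change records in the hosted zone):
#
#   import boto3
#   config = {"hosted_zone_id": "<your zone id>", "record_name": "dev.example.com."}
#   update_record(boto3.client("route53"), config, fetch_public_ip())
```

Using `UPSERT` means the same call works whether or not the record already exists, which is what you want for something that runs on every boot.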

Setup instructions are included in the GitHub repository.

The project is here: go-route53-dyn-dns.