NAT Instances: Why use the first option on the Marketplace?
So, today I was trying to build an architecture on AWS through Terraform where a NAT instance was needed for a private subnet that required internet access for updating packages. Honestly, I could have used a NAT gateway for this, but I wanted to go the extra mile and brush up on my knowledge of designing VPC architectures on AWS. I remember doing that when practicing for the AWS Solutions Architect Associate exam, and I had gone a long time without applying any cloud knowledge until now, when I have a cloud engineering job. Plus, Terraform is a very cool tool, so I wanted to get a grip on best practices and such.
When I followed some of the guides on how to do that (see the official guide from Amazon or an A Cloud Guru video on creating VPCs), I felt confident it was going to work in the end. Unfortunately, my instances in the private subnet weren't able to ping anything (not even the instances in the public subnet, even with an explicit security group allowing ICMP traffic to them), and I had a good time doubting my ability to read and watch tutorials (which is a good trigger for your daily dose of imposter syndrome).
Thankfully, I was able to find this guide, and I really thank its author. After applying the indicated commands (in summary, enabling IP forwarding through sysctl and creating a rule in iptables), I was finally able to get my instances on the private subnet to reach the internet. I knew my settings had to be right, because otherwise it would mean that not even AWS was providing a working guide, which would be horrible. But then the question came: what was I using that was outside the formula?
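For reference, the two steps mentioned above (IP forwarding through sysctl, plus an iptables rule) usually look something like the sketch below. This is not copied from the guide; it is the common form of those commands, and it assumes the NAT instance's outbound interface is called eth0. The IFACE and APPLY variables and the run helper are names made up for this example. By default the script only prints the commands; setting APPLY=1 (as root) actually runs them.

```shell
#!/bin/sh
# Sketch of a typical NAT instance setup. Not the exact commands from the guide.

# Assumption: eth0 is the interface facing the internet gateway.
IFACE="${IFACE:-eth0}"
# Hypothetical switch: 0 = only print the commands, 1 = execute them (needs root).
APPLY="${APPLY:-0}"

# Run a command, or just print it when APPLY is not 1.
run() {
  if [ "$APPLY" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# Step 1: let the kernel forward packets between interfaces.
run sysctl -w net.ipv4.ip_forward=1

# Step 2: rewrite the source address of traffic leaving through IFACE,
# so replies come back to the NAT instance.
run iptables -t nat -A POSTROUTING -o "$IFACE" -j MASQUERADE
```

Keep in mind that sysctl -w only changes the running kernel, so the setting is lost on reboot unless it is also written to /etc/sysctl.conf or a file under /etc/sysctl.d/. The iptables rule is not persisted across reboots either.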
And I have the answer. When I was choosing the AMI to be used as a NAT instance, I decided to go with the most recent version. It wasn't necessarily the first option on the list of Community AMIs, but you know, the name had 2020 in it. So, this time, I recreated the instance with the first option on the list, which seems to be an image from 2018. Well, after deploying it and doing an SSH jump to the instance on the private subnet, that instance was able to ping and update without any issues. Also, if I remember the A Cloud Guru video correctly, Ryan used the first option on the list, which was an image from 2018 (maybe the same one, not sure though). He didn't choose it specifically; it was just the first on the list.
So here it goes, my friends. Maybe this post doesn't have a direct answer to the question, but this is where I would say that it is best to start with something that is known to work, so you have a checkpoint and can tweak your settings later. And in case something doesn't work, well, you have a usable checkpoint, so just roll back to it.
I plan to upload a simple guide to Terraform soon, at least covering the basics of how to create an instance, set up your user and generate a random password dynamically.
Tags: 2020, tech, aws, nat-instances, vpc, architecture