Simple Cloud Hardening
I've written about simple server-hardening techniques in the past. Those articles were inspired in part by the Linux Hardening in Hostile Networks book I was writing at the time, and the idea was to distill the many different hardening steps you might want to perform on a server into a few simple steps that everyone should do. In this article, I take the same approach, only this time with a specific focus on hardening cloud infrastructure. I'm most familiar with AWS, so my hardening steps are geared toward that platform and use AWS terminology (such as Security Groups and VPC), but because I'm not a fan of vendor lock-in, I try to keep the steps general enough that you can adapt them to other providers.
One of the big advantages of cloud infrastructure is how easy it makes compartmentalization. If you have a bunch of servers racked in the same rack, isolating them from one another can be difficult, but on cloud infrastructure, you can take advantage of the platform to isolate one customer from another or to isolate one of your infrastructure types from the others. Although this doesn't come completely for free (it adds some extra overhead when you set things up), it's worth it for the strong isolation it provides between environments.
One of the first security measures you should put in place is separating each of your environments into its own high-level account. AWS allows you to create a number of separate accounts and connect them to a central billing account. This means you can isolate your development, staging and production environments (plus any others you create) into their own individual accounts that have their own networks, their own credentials and their own roles, totally isolated from the others. With each environment separated into its own account, you limit the damage attackers can do if they compromise one infrastructure to just that account. You also make it easier to see how much each environment costs on its own.
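If you want to automate that account separation, the sketch below shows one way it might look using boto3 and AWS Organizations; the organization setup, account names and email addresses are all placeholders and assumptions, not a prescription.

```python
import boto3

# A minimal sketch, assuming boto3 and AWS Organizations are available;
# the account names and email addresses below are placeholders.
org = boto3.client("organizations")

# Create the organization that ties member accounts to the central
# billing (management) account; skip this if one already exists.
try:
    org.create_organization(FeatureSet="ALL")
except org.exceptions.AlreadyInOrganizationException:
    pass

# Create one isolated account per environment.
for env in ("dev", "staging", "production"):
    response = org.create_account(
        Email=f"aws-{env}@example.com",   # placeholder address
        AccountName=f"myorg-{env}",       # placeholder account name
    )
    status = response["CreateAccountStatus"]
    print(env, status["Id"], status["State"])
```

Account creation is asynchronous, so in practice you'd poll the returned status before doing anything with the new account.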
In a traditional infrastructure where dev and production are together, it is much easier to create accidental dependencies between those two environments and have a mistake in one affect the other. Splitting environments into separate accounts protects them from each other, and that independence helps you identify any legitimate links that environments need to have with each other. Once you have identified those links, it's much easier to set up firewall rules or other restrictions between those accounts, just like you would if you wanted your infrastructure to talk to a third party.
One advantage of cloud infrastructure is that you have much tighter control over firewall rules. AWS Security Groups let you define both ingress and egress firewall rules, both for traffic to and from the internet at large and for traffic between Security Groups. Since you can assign multiple Security Groups to a host, you have a lot of flexibility in how you define network access between hosts.
My first recommendation is to deny all ingress and egress traffic by default and add specific rules to a Security Group only as you need them. This is a fundamental best practice for network security, and it applies to Security Groups as much as it does to traditional firewalls. It's particularly important for the Default Security Group, which allows unrestricted internet egress traffic out of the box, so that should be one of the first things you disable. Although denying egress traffic to the internet by default can make things a bit trickier to start with, it's still a lot easier than trying to add that kind of restriction after the fact.
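As an illustration, here is a short boto3 sketch that strips the stock allow-all egress rule from a VPC's default Security Group; the VPC ID is a placeholder for your own.

```python
import boto3

ec2 = boto3.client("ec2")

# Look up the default Security Group of a VPC
# (vpc-0123456789abcdef0 is a placeholder for your own VPC ID).
default_sg = ec2.describe_security_groups(
    Filters=[
        {"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]},
        {"Name": "group-name", "Values": ["default"]},
    ]
)["SecurityGroups"][0]

# Remove the stock allow-all egress rule so outbound traffic is denied
# by default; add back only the specific rules you actually need.
ec2.revoke_security_group_egress(
    GroupId=default_sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "-1",
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)
```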
You can make things very complicated with Security Groups; however, my recommendation is to try to keep them simple. Give each server role (for instance web, application, database and so on) its own Security Group that applies to each server in that role. This makes it easy to know how your firewall rules are being applied and to which servers they apply. If one server in a particular role needs different network permissions from the others, it's a good sign that it probably should have its own role.
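To make the role-based model concrete, here is a minimal sketch (again with boto3) that creates one Security Group per server role and opens only the specific port each role needs from the role in front of it; the VPC ID, group names and ports are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")
vpc_id = "vpc-0123456789abcdef0"  # placeholder VPC ID

# One Security Group per server role.
roles = {}
for role in ("web", "application", "database"):
    roles[role] = ec2.create_security_group(
        GroupName=f"{role}-role",           # placeholder naming scheme
        Description=f"Hosts in the {role} role",
        VpcId=vpc_id,
    )["GroupId"]

# Only the web role accepts HTTPS from the internet.
ec2.authorize_security_group_ingress(
    GroupId=roles["web"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
    }],
)

# Application servers accept traffic only from the web role, and the
# database only from the application role (ports are examples).
ec2.authorize_security_group_ingress(
    GroupId=roles["application"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
        "UserIdGroupPairs": [{"GroupId": roles["web"]}],
    }],
)
ec2.authorize_security_group_ingress(
    GroupId=roles["database"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": roles["application"]}],
    }],
)
```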
The role-based Security Group model works pretty well, but it can be inconvenient when you want a firewall rule to apply to all your hosts. For instance, if you use centralized configuration management, you probably want every host to be allowed to talk to it. For rules like this, I take advantage of the Default Security Group and make sure that every host is a member of it. I then use it (in a very limited way) as a central place to define any firewall rules I want to apply to all hosts. One rule I define in particular is to allow egress traffic to any host in the Default Security Group—that way I don't have to write duplicate ingress rules in one group and egress rules in another whenever I want hosts in one Security Group to talk to another.
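That one rule might look like the following hedged sketch: an egress rule on the Default Security Group that references the group itself, so traffic to any other member of the group is allowed out (the group ID is a placeholder).

```python
import boto3

ec2 = boto3.client("ec2")
default_sg_id = "sg-0123456789abcdef0"  # placeholder: the default SG's ID

# Allow egress from members of the Default Security Group to other
# members of that same group, so the per-role groups only need
# ingress rules for host-to-host traffic.
ec2.authorize_security_group_egress(
    GroupId=default_sg_id,
    IpPermissions=[{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": default_sg_id}],
    }],
)
```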
On cloud infrastructure, you are able to define hosts that have an internet-routable IP and hosts that only have internal IPs. In AWS Virtual Private Cloud (VPC), you define these hosts by setting up a second set of private subnets and spawning hosts within those subnets instead of the default public subnets.
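A rough boto3 sketch of that setup follows; the VPC ID, CIDR block and AMI are placeholders, and the key points are that the subnet's route table has no route to an internet gateway and the instance gets no public IP.

```python
import boto3

ec2 = boto3.client("ec2")
vpc_id = "vpc-0123456789abcdef0"  # placeholder VPC ID

# Create a private subnet; as long as its route table has no route to
# an internet gateway, hosts in it are unreachable from the internet.
private_subnet = ec2.create_subnet(
    VpcId=vpc_id,
    CidrBlock="10.0.128.0/24",  # placeholder CIDR for the private range
)["Subnet"]

# Launch a host in the private subnet with no public IP at all.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "SubnetId": private_subnet["SubnetId"],
        "AssociatePublicIpAddress": False,
    }],
)
```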
Treat the default public subnet like a DMZ and put hosts there only if they truly need access to the internet. Put all other hosts into the private subnet. With this practice in place, even if a host in the private subnet is compromised, it can't talk directly to the internet, which makes it much more difficult for an attacker to download rootkits or other persistence tools without setting up elaborate tunnels.
These days it seems like just about every service wants unrestricted access to web ports on some other host on the internet. An advantage of the private subnet approach is that instead of working out egress firewall rules to specific external IPs, you can set up a web proxy service in your DMZ that has broader internet access and then restrict the hosts in the private subnet by hostname instead of IP. This has the added benefit of giving you a nice audit trail on the proxy host of all the external hosts your infrastructure accesses.
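From the private-subnet side, outbound web requests then go through that proxy rather than straight to the internet. The sketch below is just an illustration of that client configuration, assuming a hypothetical proxy at proxy.internal on port 3128 and using the Python requests library.

```python
import requests

# Hypothetical proxy running in the DMZ; the hostname and port are
# placeholders for whatever proxy service you deploy there.
PROXIES = {
    "http": "http://proxy.internal:3128",
    "https": "http://proxy.internal:3128",
}

# Outbound web traffic from the private subnet flows through the proxy,
# which enforces its hostname allowlist and logs the request.
response = requests.get("https://example.com/", proxies=PROXIES, timeout=10)
print(response.status_code)
```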
AWS provides a rich set of access control tools by way of IAM. These let you set up very precise rules about which AWS resources an account or role can access, using a fairly complicated syntax. Although IAM provides some pre-defined policies to get you started, it still suffers from the problem all rich access control lists have: the complexity makes it easy to make mistakes that grant people more access than they should have.
My recommendation is to use IAM only as much as is necessary to lock down basic AWS account access (such as sysadmin accounts or orchestration tools), and even then, to keep the IAM rules as simple as you can. If you need to restrict access to resources further, apply access control at another level. Although it may seem like granting somewhat broad IAM permissions to an AWS account isn't as secure as drilling down and embracing the principle of least privilege, in practice, the more complicated your rules, the more likely you are to make a mistake.
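One way to keep it simple is to lean on AWS's own pre-defined managed policies instead of hand-writing custom policy documents. The sketch below attaches two of them to roles via boto3; the role names are placeholders, and which managed policy fits which role is a judgment call for your environment.

```python
import boto3

iam = boto3.client("iam")

# Placeholder role names; attach AWS-managed policies rather than
# maintaining complex custom policy documents.
iam.attach_role_policy(
    RoleName="sysadmin",
    PolicyArn="arn:aws:iam::aws:policy/PowerUserAccess",
)
iam.attach_role_policy(
    RoleName="auditor",
    PolicyArn="arn:aws:iam::aws:policy/ReadOnlyAccess",
)
```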
Cloud environments provide a lot of complex options for security; however, it's more important to set a good baseline of simple security practices that everyone on the team can understand. This article describes a few basic, common-sense practices that should make your cloud environments safer without making them too complex.