The cloud: where security isn't easy, but it's necessary.
In my last article, I started a series on some of the challenges related to spawning secure servers on Amazon EC2. In that column, I discussed some of the overall challenges EC2 presents for security compared to a traditional infrastructure and elaborated on how I configure security groups and manage secrets. In this article, I finish up the topic with a few more practices I put in place when deploying servers to EC2. As with the previous article, although my examples are with EC2, most of these practices are ones you easily could adapt to any cloud environment or, in most cases, even your own infrastructure.
I know that everyone has their own favorite configuration management system. Without getting into a holy war, we use Puppet for configuration management, so I thought it would be worthwhile to highlight a few security practices related to Puppet itself. I'll keep all of the Puppet-specific tips in this section, so you can skip ahead to the next section if this doesn't interest you.
The first thing to cover in a Puppet deployment is how to spawn servers. I'm a big fan of using vanilla off-the-shelf Amazon AMIs instead of rolling my own. Maintaining my own system images adds a lot of overhead, particularly as things change, and I'd much rather let the Debian team manage that workload. I take the official Debian AMIs and add a userdata script that installs Puppet, and that's it.
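To give a rough idea of what that spawning step can look like, here's a minimal sketch in Python using boto3; the AMI ID, key pair and security group are placeholders rather than values from any real environment, and the userdata is just a bare-bones Puppet install.

```python
# A minimal sketch of spawning an instance from a stock Debian AMI with a
# userdata script that installs Puppet. All IDs below are placeholders.
import boto3

# Shell userdata that bootstraps Puppet; a real bootstrap would also point
# the agent at the Puppetmaster.
USERDATA = """#!/bin/sh
apt-get update
apt-get install -y puppet
"""

ec2 = boto3.resource("ec2", region_name="us-east-1")
instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",            # official Debian AMI (placeholder)
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-keypair",              # hypothetical key pair
    SecurityGroupIds=["sg-xxxxxxxx"],  # hypothetical security group
    UserData=USERDATA,
)
print("Spawned", instances[0].id)
```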
In a client/server model, when a client first attempts to check in to the Puppetmaster, it presents a certificate signing request it has generated and asks the Puppetmaster to sign it. Once the Puppetmaster signs it, it will trust that a client presenting that certificate is who it says it is. Although Puppet allows for automatic signing of all certificate requests, that's generally regarded as an insecure practice. Instead, what I do is have my spawning script launch another simple script that just checks for a new certificate request from the host I spawned for the next ten minutes. If the host checks in within that ten-minute window, its certificate gets signed; otherwise, the script exits. In this way, I create a very small window for an attacker to impersonate a server, and even then, I'd see the duplicate certificate request that I didn't sign on the Puppetmaster. This gives me a lot of the benefits of auto-signing without the risks.
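My helper is just a simple script kicked off by the spawning process, but here's a rough Python sketch of the same idea, assuming the classic puppet cert list/sign commands (newer Puppet releases use puppetserver ca instead) and the expected hostname passed as an argument.

```python
# A rough sketch of a time-limited signing helper: poll for a pending
# certificate request from one expected host and sign it if it shows up
# within the window. Command names assume the classic "puppet cert" CLI.
import subprocess
import sys
import time

def sign_when_requested(hostname, window=600, poll=15):
    """Sign the cert request from `hostname` if it arrives within `window` seconds."""
    deadline = time.time() + window
    while time.time() < deadline:
        pending = subprocess.run(["puppet", "cert", "list"],
                                 capture_output=True, text=True).stdout
        if hostname in pending:
            subprocess.run(["puppet", "cert", "sign", hostname], check=True)
            return True
        time.sleep(poll)
    return False  # host never checked in; leave the request unsigned

if __name__ == "__main__":
    sys.exit(0 if sign_when_requested(sys.argv[1]) else 1)
```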
I know a lot of Puppet advocates favor Puppet deployments without a Puppetmaster. In that model, the Puppet configurations are shipped to the hosts, and they apply them directly. Although there can be benefits to this approach, it has a few drawbacks. First, you lose the ability to deploy something like hiera-gpg with ease. More important to me, you lose the handy feature that every Puppet client has generated a valid internal certificate signed by an internal trusted CA (the Puppetmaster). I heavily re-use these certificates internally whenever I need to set up TLS on any server, and we require TLS on all external network services. When another non-root service (such as Nginx or Postgres) needs access to the certificate, I just make a copy that the particular service can read and store it elsewhere on the filesystem.
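In practice that copy is usually just another resource in the Puppet manifest, but as an illustration, here's a small Python sketch of handing the agent's cert and key to a service account; the Puppet SSL paths are the classic agent defaults (they vary by Puppet version), and the postgres user and destination directory are assumptions.

```python
# Copy the Puppet-signed cert and key somewhere a non-root service can read
# them. Paths, ownership and destination are illustrative assumptions.
import grp
import os
import pwd
import shutil
import socket

fqdn = socket.getfqdn()
src_cert = f"/var/lib/puppet/ssl/certs/{fqdn}.pem"         # classic agent cert path
src_key = f"/var/lib/puppet/ssl/private_keys/{fqdn}.pem"
dest_dir = "/etc/postgresql/ssl"                           # hypothetical destination

os.makedirs(dest_dir, exist_ok=True)
uid = pwd.getpwnam("postgres").pw_uid
gid = grp.getgrnam("postgres").gr_gid

for src, dest, mode in (
    (src_cert, os.path.join(dest_dir, "server.crt"), 0o644),
    (src_key, os.path.join(dest_dir, "server.key"), 0o600),
):
    shutil.copy2(src, dest)
    os.chown(dest, uid, gid)  # only the service account can read its private key
    os.chmod(dest, mode)
```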
We also have modified Puppet so that certificates are generated with a Subject Alt Name of role.domainname in addition to the standard hostname.domainname each certificate already would have. Because we cluster our internal services for fault tolerance, we refer to services by role, and that role actually could point to three or more servers, each with its own domain name. By adding the role name as a Subject Alt Name to the certificate, it's easy to re-use these certificates for all internal services.
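Puppet exposes a dns_alt_names setting for adding extra names to an agent's certificate request; purely to illustrate what ends up in that request, here's a sketch using the Python cryptography library that builds a CSR carrying both the host's own name and the role alias. The hostname and role below are placeholders.

```python
# Build a CSR whose Subject Alt Names include both the node's hostname and
# its role alias, so one cert can serve the clustered role. Names are placeholders.
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

hostname = "db1.example.internal"   # the node's own name (placeholder)
role_name = "db.example.internal"   # role.domainname alias (placeholder)

key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
csr = (
    x509.CertificateSigningRequestBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)]))
    .add_extension(
        x509.SubjectAlternativeName([
            x509.DNSName(hostname),
            x509.DNSName(role_name),  # lets any node in the cluster present the role name
        ]),
        critical=False,
    )
    .sign(key, hashes.SHA256())
)

with open("request.csr", "wb") as f:
    f.write(csr.public_bytes(serialization.Encoding.PEM))
```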
I'll be honest: dynamic IPs are a pain. That said, if you are using public EC2 services, you don't have much of a choice. All IPs are assigned dynamically and change whenever you halt the server. This adds an additional level of complexity to the environment, because you don't know ahead of time what IP a server will have. To combat this, I have set up an internal DNS server configured to allow dynamic DNS updates. For a host to update its DNS records, however, it needs a shared secret that gets generated on the DNS server and exported to Puppet. Once Puppet starts configuring the server, it creates the local key file that contains the secret and builds a local nsupdate script that updates any records the host needs. From that point on, every time Puppet runs, the host confirms all of its dynamic DNS records are accurate. I've also added a hook so that when I decommission a host, Puppet removes all of those records.
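The real mechanism here is an nsupdate script that Puppet writes out along with the key file, but the same update can be expressed with the dnspython library; in this sketch, the zone, key name, secret and IP addresses are all placeholders.

```python
# Send a TSIG-signed dynamic DNS update to keep a host's A record current.
# The zone, key, secret and addresses below are placeholders.
import dns.query
import dns.tsig
import dns.tsigkeyring
import dns.update

# Shared secret as generated on the DNS server (fake base64 value)
keyring = dns.tsigkeyring.from_text({
    "host1-key.": "c2VjcmV0LXNoYXJlZC1rZXktYmFzZTY0",
})

update = dns.update.Update(
    "example.internal",                # internal zone (placeholder)
    keyring=keyring,
    keyname="host1-key.",
    keyalgorithm=dns.tsig.HMAC_SHA256,
)
update.replace("host1", 300, "A", "10.0.1.23")  # refresh the host's A record

response = dns.query.tcp(update, "10.0.0.53")   # internal DNS server (placeholder)
print("update response code:", response.rcode())
```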
There are a few other general security practices I put in place. First, as I mentioned before, because each host has a certificate signed by an internal trusted CA for Puppet, we take advantage of those certs to require TLS for all network communications between hosts. Given that you are sharing a network with other EC2 hosts, you want to make sure nobody can read your traffic as it goes over this network. In addition, the use of TLS helps us avoid man-in-the-middle attacks.
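As a concrete example of re-using those Puppet-signed certs, here's a sketch of a Python service that requires mutual TLS and trusts only the Puppet CA; the SSL paths are the classic agent defaults and will differ depending on your Puppet version and layout.

```python
# A sketch of a TLS server that presents the host's Puppet-signed cert and
# requires clients to present one too. Paths are assumptions, not universal.
import socket
import ssl

CA = "/var/lib/puppet/ssl/certs/ca.pem"
CERT = "/var/lib/puppet/ssl/certs/host1.example.internal.pem"
KEY = "/var/lib/puppet/ssl/private_keys/host1.example.internal.pem"

context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
context.load_cert_chain(certfile=CERT, keyfile=KEY)
context.load_verify_locations(cafile=CA)
context.verify_mode = ssl.CERT_REQUIRED  # peers must present a Puppet-signed cert

with socket.create_server(("0.0.0.0", 8443)) as sock:
    with context.wrap_socket(sock, server_side=True) as tls_sock:
        conn, addr = tls_sock.accept()   # handshake verifies the client cert
        print("TLS connection from", addr)
        conn.close()
```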
We also take additional steps with package management. We host internal package repositories that mirror only the upstream packages we actually use. We sign any internal packages we create, and hosts validate the signatures before they install packages. Finally, we take extra steps to restrict access to our environments. Our production environment in particular is restricted to a small number of sysadmins and is accessed via a separate VPN. We require key-based SSH authentication and have an internal policy for sysadmins to password-protect their SSH keys (read my “Secret Agent Man” article in the December 2013 issue for some tips related to SSH agent and password-protected key management).
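apt verifies repository signatures on its own once the signing key is trusted, but if you want the flavor of what that check amounts to, here's a hedged sketch of verifying a repository's detached Release signature with gpg; the keyring path is hypothetical.

```python
# Manually verify a repo's detached Release signature against a dedicated
# keyring. Paths are illustrative; apt normally performs this check itself.
import subprocess

def release_signature_ok(release="Release", sig="Release.gpg",
                         keyring="/etc/apt/trusted.gpg.d/internal-repo.gpg"):
    """Return True if the detached signature verifies against our keyring."""
    result = subprocess.run(
        ["gpg", "--no-default-keyring", "--keyring", keyring,
         "--verify", sig, release],
        capture_output=True,
    )
    return result.returncode == 0

if __name__ == "__main__":
    print("signature valid:", release_signature_ok())
```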
Although this isn't a complete list of security practices I use (or that you should use), hopefully it's enough to give you an overview of the sorts of things you should consider if you want to host a secure environment in the cloud.