Manage a fleet of servers in a way that's documented, scalable and fun with Puppet.
At some point, you probably have installed or configured a piece of software on a server or desktop PC. Since you read Linux Journal, you've probably done a lot of this, as well as developed a range of glue shell scripts, Perl snippets and cron jobs.
Unless you are more disciplined than I was, every server has a unique, hand-crafted version of those config files and scripts. It might be as simple as a backup monitor script, but each still needs to be managed and installed.
Installing a new server usually involves copying over config files and glue scripts from another server until things “work”. Subtle problems may persist if a particular condition appears infrequently. Any improvement is usually made on an ad hoc basis to a specific machine, and there is no way to apply improvements to all servers or desktops easily.
Finally, in typical scenarios, all the learning and knowledge invested in these scripts and configuration files are scattered throughout the filesystem on each Linux system. This means there is no easy way to know how any piece of software has been customized.
If you have installed a server and come back to it three years later wondering what you did, or manage a group of desktops or a private cloud of virtual machines, configuration management and Puppet can help simplify your life.
Configuration management is a solution to this problem. A complete solution provides a centralized repository that defines and documents how things are done that can be applied to any system easily and reproducibly. Improvements simply can be rolled out to systems as required. The result is that a large number of servers can be managed by one administrator with ease.
Many different configuration management tools for Linux (and other platforms) exist. Puppet is one of the most popular and the one I cover in this article. Similar tools include Chef, Ansible and Salt as well as many others. Although they differ in the specifics, the general objectives are the same.
Puppet's underlying philosophy is that you tell it what you want as an end result (required state), not how you want it done (the procedure), using Puppet's programming language. For example, you might say “I want ssh key XYZ to be able to log in to user account foo.” You wouldn't say “cat this string to /home/foo/.ssh/authorized_keys.” In fact, the simple procedure I defined isn't even close to being reliable or correct, as the .ssh directory may not exist, the permissions could be wrong and many other things.
You declare your requirements using Puppet's language in files called manifests with the suffix .pp. Your manifest states the requirements for a machine (virtual or real) using Puppet's built-in modules or your own custom modules, which also are stored in manifest files. Puppet is driven from this collection of manifests much like a program is built from code. When the puppet apply command is run, Puppet will compile the program, determine the difference in the machine's state from the desired state, and then make any changes necessary to bring the machine in line with the requirements.
This approach means that if you run puppet apply on a machine that is up to date with the current manifests, nothing should happen, as there are no changes to make.
Puppet is a tool (actually a whole suite of tools) that includes the Puppet execution program, the Puppet master, the Puppet database and the Puppet system information utility. There are many different ways to use it that suit different environments.
In this article, I explain the basics of Puppet and the way we use it to manage our servers and desktops, in a simplified form. I use the term “machine” to refer to desktops, virtual machines and hypervisor hosts.
The approach I outline here works well for 1–100 machines that are fairly similar but differ in various ways. If you are managing a cloud of 1,000 virtual servers that are identical or differ in very predictable ways, this approach is not optimized for that case (and you should write an article for the next issue of Linux Journal).
This approach is based around the ideas outlined in the excellent book Puppet 3 Beginners Guide by John Arundel. The basic idea is this:
Store your Puppet manifests in git. This provides a great way to manage, track and distribute changes. We also use it as the way servers get their manifests (we don't use a Puppet master). You easily could use Subversion, Mercurial or any other SCM.
Use a separate git branch for each machine so that machines are stable.
Each machine then periodically polls the git repository and runs puppet apply if there are any changes.
There is a manifest file for each machine that defines the desired state.
For the purposes of this article, I'm using the example of configuring developers' desktops. The example desktop machine is a clean Ubuntu 12.04 with the hostname puppet-test; however, any version of Linux should work with almost no differences. I will be working using an empty git repository on a private git server. If you are going to use GitHub for this, do not put any sensitive information in there, in particular keys or passwords.
Puppet is installed on the target machine using the commands shown in Listing 1. The install simply sets up the Puppet Labs repository and installs git and Puppet. Notice that I have used specific versions of puppet-common and the puppetlabs/apt module. Unfortunately, I have found Puppet tends to break previously valid code and its own modules even with minor upgrades. For this reason, all my machines are locked to specific versions, and upgrades are done in a controlled way.
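The commands in Listing 1 are not reproduced here, but they look roughly like the following sketch for Ubuntu 12.04 ("precise"). The exact version numbers are illustrative only; pin to whichever versions you have tested:

```shell
# Add the Puppet Labs package repository, then install git and a
# pinned version of Puppet plus the puppetlabs/apt module.
wget https://apt.puppetlabs.com/puppetlabs-release-precise.deb
sudo dpkg -i puppetlabs-release-precise.deb
sudo apt-get update
sudo apt-get install -y git puppet-common=3.7.3-1puppetlabs1
sudo puppet module install puppetlabs/apt --version 1.8.0
```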
I usually edit the manifests on my desktop and then commit them to git and push to the origin repository. I have uploaded my repository to GitHub as an easy reference at https://github.com/davidbartonau/linuxjournal-puppet, which you may wish to copy, fork and so on.
In your git repository, create the file manifests/puppet-test.pp, as shown in Listing 2. This file illustrates a few points:
The name of the file matches the hostname. This is not a requirement; it just helps to organize your manifests.
It imports the apt package, which is a module that allows you to manipulate installed software.
The top-level item is “node”, which defines the desired state of one or more servers.
The node name is “puppet-test”, which matches the server name. This is how Puppet determines to apply this specific node.
The manifest declares that it wants the vim package installed and the emacs package absent. Let the flame wars commence!
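A sketch of what such a node manifest might look like (the module reference and resource parameters follow standard Puppet syntax; adjust to taste):

```puppet
# manifests/puppet-test.pp
node 'puppet-test' {
  # Pull in the puppetlabs/apt module so apt resources are available
  include apt

  # Ensure vim is installed...
  package { 'vim':
    ensure => installed,
  }

  # ...and emacs is not
  package { 'emacs':
    ensure => absent,
  }
}
```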
Now you can use this Puppet configuration on the machine itself. If you ssh in to the machine (you may need ssh -A agent forwarding so you can authenticate to git), you can run the commands from Listing 3, replacing gitserver with your own.
This code clones the git repository into /etc/puppet/linuxjournal and then runs puppet apply using the custom manifests directory. The puppet apply command looks for a node with a matching name and then attempts to make the machine's state match what has been specified in that node. In this case, that means installing vim, if it isn't already, and removing emacs.
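The commands from Listing 3 are along these lines; replace gitserver and the repository path with your own:

```shell
# Clone the manifests, then apply the node matching this hostname
git clone git@gitserver:linuxjournal-puppet /etc/puppet/linuxjournal
puppet apply --modulepath=/etc/puppet/linuxjournal/modules \
    /etc/puppet/linuxjournal/manifests/puppet-test.pp
```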
It would be nice to create the developer user, so you can set up that configuration. Listing 4 shows an updated puppet-test.pp that creates a user as per the developer variable (hard-coding a variable like this is not a good way to do it, but it keeps the example simple). Note how the variable is preceded by $. Also, variables are substituted into strings quoted with double quotes but not into strings quoted with single quotes, in the same way as bash.
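The updated node might look something like this sketch; the user name "jsmith" is a placeholder:

```puppet
node 'puppet-test' {
  $developer = 'jsmith'

  # Double quotes allow ${developer} to be interpolated, as in bash
  user { $developer:
    ensure     => present,
    home       => "/home/${developer}",
    managehome => true,
    shell      => '/bin/bash',
  }
}
```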
Let's apply the new change on the desktop by pulling the changes and re-running puppet apply as per Listing 5. You now should have a new user created.
Putting all this code inside the node isn't very reusable. Let's move the user into a developer_pc module and call that from your node. To do this, create the file modules/developer_pc/manifests/init.pp in the git repository as per Listing 6. This creates a new module called developer_pc that accepts a parameter called developer and uses it to define the user.
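The module's init.pp might be sketched as a parameterized class like this:

```puppet
# modules/developer_pc/manifests/init.pp
class developer_pc ($developer) {
  user { $developer:
    ensure     => present,
    home       => "/home/${developer}",
    managehome => true,
    shell      => '/bin/bash',
  }
}
```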
You then can use the module in your node as demonstrated in Listing 7. Note how you pass the developer parameter, which is then accessible inside the module.
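Calling the module from the node then looks roughly like this (again, "jsmith" is a placeholder):

```puppet
node 'puppet-test' {
  # Declare the class and pass the developer parameter to it
  class { 'developer_pc':
    developer => 'jsmith',
  }
}
```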
Apply the changes again, and there shouldn't be any change. All you have done is refactored the code.
Say you would like to standardize your vim config for all the developers and stop word wrapping by setting up their .vimrc file. To do this in Puppet, you create the file you want to use in /modules/developer_pc/files/vimrc as per Listing 8, and then add a file resource in /modules/developer_pc/manifests/init.pp as per Listing 9. The file resource can be placed immediately below the user resource.
The file resource defines a file /home/$developer/.vimrc, which will be set from the vimrc file you created just before. You also set the owner and group on the file, since Puppet typically is run as root.
The require clause on the file takes an array of resources and states that those resources must be processed before this file is processed (note the uppercase first letter; this is how Puppet refers to resources rather than declaring them). This dependency allows you to stop Puppet from trying to create the .vimrc file before the user has been created. When resources are adjacent, like the user and the file, they also can be “chained” using the -> operator.
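The file resource described above might be sketched like this, placed inside the developer_pc class below the user resource:

```puppet
# Distribute the shared vimrc from the module's files/ directory
file { "/home/${developer}/.vimrc":
  ensure  => file,
  source  => 'puppet:///modules/developer_pc/vimrc',
  owner   => $developer,
  group   => $developer,
  # Process the user first; note the uppercase U when referring
  # to an existing resource rather than declaring one
  require => [User[$developer]],
}
```

The equivalent chained form would be `User[$developer] -> File["/home/${developer}/.vimrc"]`.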
Apply the changes again, and you now can expect to see your custom .vimrc set up. If you run puppet apply again later and the source vimrc file hasn't changed, the .vimrc file won't change either, including its modification date. If one of the developers changes .vimrc, the next time puppet apply is run, it will be reverted to the version in Puppet.
A little later, say one of the developers asks if they can ignore case as well in vim when searching. You easily can roll this out to all the desktops. Simply change the vimrc file to include set ignorecase, commit and run puppet apply on each machine.
Often you will want to create files where the content is dynamic. Puppet has support for .erb templates, which are templates containing snippets of Ruby code, similar to JSP or PHP files. The code has access to all of the variables in Puppet, with a slightly different syntax.
As an example, our build process uses a file called $HOME/Projects/override.properties that contains the name of the build root. This is typically just the user's home directory. You can set this up in Puppet using an .erb template as shown in Listing 10. The erb template is very similar to the static file, except it needs to be in the template folder, and it uses <%= %> for expressions, <% %> for code, and variables are referred to with the @ prefix.
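The template might look something like this sketch; the property name is an assumption made for illustration:

```erb
# modules/developer_pc/templates/override.properties.erb
# @developer is the class parameter, referred to with the @ prefix
dir.build.root=/home/<%= @developer %>
```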
You use the .erb template by adding the rules shown in Listing 11. First, you have to ensure that there is a Projects directory, and then you require the override.properties file itself. The -> operator is used to ensure that you create the directory first and then the file.
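Those rules might be sketched as two chained file resources inside the developer_pc class:

```puppet
# The directory must exist before the file inside it is created,
# hence the -> ordering operator
file { "/home/${developer}/Projects":
  ensure => directory,
  owner  => $developer,
  group  => $developer,
} ->
file { "/home/${developer}/Projects/override.properties":
  ensure  => file,
  content => template('developer_pc/override.properties.erb'),
  owner   => $developer,
  group   => $developer,
}
```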
Running Puppet each time you want to make a change doesn't work well beyond a handful of machines. To solve this, you can have each machine automatically check git for changes and then run puppet apply (as an optimization, you could run puppet apply only when git has actually changed).
Next, you will define a file called puppetApply.sh that does what you want and then set up a cron job to call it every ten minutes. This is done in a new module called puppet_apply in three steps:
Create your puppetApply.sh script in modules/puppet_apply/files/puppetApply.sh as per Listing 12.
Create the puppetApply.sh file and set up the crontab entry as shown in Listing 13.
Use your puppet_apply module from your node in puppet-test.pp as per Listing 14.
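The puppet_apply module might be sketched as follows. The script path is an assumption; puppetApply.sh itself would simply `cd /etc/puppet/linuxjournal`, run `git pull` and then invoke puppet apply as before:

```puppet
# modules/puppet_apply/manifests/init.pp
class puppet_apply {
  # Distribute the script from the module's files/ directory
  file { '/usr/local/bin/puppetApply.sh':
    ensure => file,
    mode   => '0755',
    source => 'puppet:///modules/puppet_apply/puppetApply.sh',
  }

  # Run it every ten minutes from root's crontab
  cron { 'puppet-apply':
    command => '/usr/local/bin/puppetApply.sh',
    user    => 'root',
    minute  => '*/10',
    require => File['/usr/local/bin/puppetApply.sh'],
  }
}
```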
You will need to ensure that the server has read access to the git repository. You can do this using an SSH key distributed via Puppet and an IdentityFile entry in /root/.ssh/config.
If you apply changes now, you should see that there is an entry in root's crontab, and every ten minutes puppetApply.sh should run. Now you simply can commit your changes to git, and within ten minutes, they will be rolled out.
Many times you don't want to replace a config file, but rather ensure that certain options are set to certain values. For example, I may want to change the SSH port from the default of 22 to 2022 and disallow password logins. Rather than manage the entire config file with Puppet, I can use the augeas resource to set multiple configuration options.
Refer to Listing 15 for some code that can be added to the developer_pc class you created earlier. The code does three things:
Installs openssh-server (not really required, but there for completeness).
Ensures that SSH is running as a service.
Sets Port 2022 and PasswordAuthentication no in /etc/ssh/sshd_config.
If the file changes, the notify clause causes SSH to reload the configuration.
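The code from Listing 15 might be sketched like this, added to the developer_pc class:

```puppet
package { 'openssh-server':
  ensure => installed,
}

service { 'ssh':
  ensure  => running,
  enable  => true,
  require => Package['openssh-server'],
}

# Edit individual settings in sshd_config rather than replacing
# the whole file; notify makes the service reload on change
augeas { 'sshd_config':
  context => '/files/etc/ssh/sshd_config',
  changes => [
    'set Port 2022',
    'set PasswordAuthentication no',
  ],
  notify  => Service['ssh'],
}
```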
Once puppetApply.sh automatically runs, any subsequent SSH sessions will need to connect on port 2022, and you no longer will be able to use a password.
When defining rules in Puppet, it is important to keep in mind that removing a rule for a resource is not the same as a rule that removes that resource. For example, suppose you have a rule that creates an authorized SSH key for “developerA”. Later, “developerA” leaves, so you remove the rule defining the key. Unfortunately, this does not remove the entry from authorized_keys. In most cases, the state defined in Puppet resources is not considered definitive; changes outside Puppet are allowed. So once the rule for developerA's key has been removed, there is no way to know if it simply was added manually or if Puppet should remove it.
In this case, you can use the ensure => 'absent' rule to ensure packages, files, directories, users and so on are deleted. The original Listing 2 showed an example of this to remove the emacs package. There is a definite difference between ensuring that emacs is absent versus no rule declaration.
At our office, when a developer or administrator leaves, we replace their SSH key with an invalid key, which then immediately updates every entry for that developer.
Many modules are listed on Puppet Forge covering almost every imaginable problem. Some are really good, and others are less so. It's always worth searching to see if there is something good and then making a decision as to whether it's better to define your own module or reuse an existing one.
We don't keep all of our machines sitting on the master branch. We use a modified gitflow approach to manage our repository. Each server has its own branch, and most of them point at master. A few are on the bleeding edge of the develop branch. Periodically, we roll a new release from develop into master and then move each machine's branch forward from the old release to the new one. Keeping separate branches for each server gives flexibility to hold specific servers back and ensures that changes aren't rolled out to servers in an ad hoc fashion.
We use scripts to manage all our branches and fast-forward them to new releases. With roughly 100 machines, it works for us. On a larger scale, maintaining separate branches for each server is probably impractical.
Using a single repository shared with all servers isn't ideal. Storing sensitive information encrypted in Hiera is a good idea. There was an excellent Linux Journal article covering this: “Using Hiera with Puppet” by Scott Lackey in the March 2015 issue.
As your number of machines grows, using a single git repository could become a problem. The main problem for us is there is a lot of “commit noise” between reusable modules versus machine-specific configurations. Second, you may not want all your admins to be able to edit all the modules or machine manifests, or you may not want all manifests rolled out to each machine. Our solution is to use multiple repositories, one for generic modules, one for machine-/customer-specific configuration and one for global information. This keeps our core modules separated and under proper release management while also allowing us to release critical global changes easily.
The approach outlined in this article works well for us. I hope it works for you as well; however, you may want to consider some additional points.
As our servers differ in ways that are not consistent, using Facter or metadata to drive configuration isn't suitable for us. However, if you have 100 Web servers, using the hostname of nginx-prod-099 to determine the install requirements would save a lot of time.
A lot of people use the Puppet master to roll out and push changes, and this is the general approach referred to in a lot of tutorials on-line. You can combine this with PuppetDB to share information from one machine to another machine—for example, the public key of one server can be shared to another server.
This article has barely scratched the surface of what can be done using Puppet. Virtually everything about your machines can be managed using the various Puppet built-in resources or modules. After using it for a short while, you'll experience the ease of building a second server with a few commands or of rolling out a change to many servers in minutes.
Once you can make changes across servers so easily, it becomes much more rewarding to build things as well as possible. For example, monitoring your cron jobs and backups can take a lot more work than the actual task itself, but with configuration management, you can build a reusable module and then use it for everything.
For me, Puppet has transformed system administration from a chore into a rewarding activity because of the huge leverage you get. Give it a go; once you do, you'll never go back!