For every user who has wondered, “did someone backdoor this?”, there should be developers ensuring that the code they put out into the world can be verified and that tampering can be detected. Package maintainers and users must also exercise diligence to avoid running untrusted code. This article walks you through the chain of custody between a hypothetical OSS project's developers and its users, explaining what can go right or wrong at each step.
There's a great deal to be said for secure coding practices. However, if the program the user receives is not the one the developer created—complete and unchanged—those secure coding practices may not matter. In this article, I follow the paths that a hypothetical piece of software, foobard, may take from its development team to its users, describing how each path can be exploited and how it can be protected.
Alice and Bob are great at coding. They maintain a robust test suite and accept only patches that pass all tests. They regularly fuzz-test the application as a whole and use static analysis tools to alert them to potential flaws in their code. Their architecture is extremely well thought out, and their choices in dependencies are sane. Throughout these examples, I'm assuming that foobard, as written by Alice and Bob, does not present any unknown security risks. Unfortunately, there are many places where this can fall apart before foobard reaches users.
Alice and Bob are using CVS to maintain foobard. After all, it's not a huge project, and CVS is what they have always used. The server that hosts the foobard CVS repo, however, was compromised and began serving up spyware tarballs on one of its Web pages. Alice and Bob don't know exactly what access the attacker achieved or how long ago the compromise happened, so they can't trust their server backups to have an unmodified copy of the repo. CVS offers no built-in integrity checking mechanism for the code itself, and modifying CVS history is trivial. Alice and Bob can try to cobble together—from whatever is stored on their laptops or in other theoretically safe locations—enough data to spot-check the foobard repo and ensure that none of their code has been changed by the attacker. However, spot-checking provides little guarantee about the code's overall integrity and won't help reconstruct a full, known-good history.
If that sounds far-fetched, consider that even the server owner may be the attacker. You may remember that popular open-source host SourceForge was caught changing a hosted project's installer to install malware in addition to the requested software (see Resources for more information).
This could have been prevented if Alice and Bob had been using a modern source code management tool such as Git or Mercurial, both of which use hashes to identify commits and allow code signing. In Git, you GPG-sign a tag, and in Mercurial, you GPG-sign the manifest key in a changelog entry. In either case, that signature can be used to verify the integrity not just of one commit, but of that commit and all of its ancestors. This doesn't mean there is no way to corrupt the authoritative repository on the server, but when best practices are used, it becomes astronomically difficult for attackers to hide that corruption, requiring a timed compromise of multiple machines.
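As a minimal sketch of the Git side (the tag name, release message and key are hypothetical), signing and verifying a release tag looks like this:

    # Alice creates a GPG-signed tag for the release
    git tag -s v1.2.0 -m "foobard 1.2.0" -u alice@example.org

    # Anyone with Alice's public key can verify the tag, and with it the
    # integrity of the tagged commit and its entire ancestry
    git verify-tag v1.2.0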
This protection, of course, relies in part on the secrecy of the private GPG key(s) used for signing tags (or manifests). If Alice or Bob loses control of a copy of such a private key, it must be revoked and replaced as soon as possible, before an attacker has had time to brute-force the key's passphrase.
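One useful precaution, sketched here with a hypothetical key, is to generate a revocation certificate ahead of time and store it offline, so that a compromised key can be revoked without delay:

    # Generate a revocation certificate now and store it offline somewhere safe
    gpg --output alice-revoke.asc --gen-revoke alice@example.org

    # If the key is ever compromised, import the certificate and publish the
    # revocation so that others stop trusting the key
    gpg --import alice-revoke.asc
    gpg --send-keys 0xDEADBEEF    # hypothetical key ID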
Now that that's sorted, with Alice and Bob migrating to Git and tagging releases with GPG-signed tags, they've strengthened one link in the chain. I'll go so far as to assume that, having learned this lesson, Alice and Bob also learned to sign any release tarballs they offer. By adopting these two practices, Alice and Bob also have mitigated some risks from unreliable DNS (when one can verify the code itself, one need not care whether it came from the expected URL) and potential SSL issues (for the same reason: they're checking the code, not trusting its origin). Another member of the Open Source community, Carol, now can get a known-good copy of the foobard source. Of course, before she can use foobard, Carol needs to build it.
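Before following her into the build, here is roughly what the signing and verification of a release tarball might look like (file and tag names are hypothetical):

    # Alice publishes a detached, ASCII-armored signature alongside the tarball
    gpg --armor --detach-sign foobard-1.2.0.tar.gz

    # Carol, after importing and checking Alice's public key, verifies her copy
    gpg --verify foobard-1.2.0.tar.gz.asc foobard-1.2.0.tar.gz

    # If she cloned the Git repository instead, she verifies the signed tag
    git verify-tag v1.2.0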
The build scripts for foobard include checking for, and if need be retrieving, several dependencies. Although these dependencies were well chosen, foobard's build system retrieves and builds these packages blindly, without checking their integrity at all. This is arbitrary code execution with the permissions of whatever user ran foobard's build script. Users' ISPs are already injecting ads into Web sites from their position between users and the Internet, so there is no reason to believe that they (or a state actor, a DNS registrar, a router manufacturer, or a compromised server) will never cause you to grab something other than the dependencies you expected.
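The risky pattern is easy to recognize. In a fragment like the following sketch (the dependency name and URL are made up), nothing confirms that what arrived is what that dependency's developers actually released:

    # The kind of build-script fragment to avoid: fetch a dependency over
    # plain HTTP and build it with no signature or checksum verification
    curl -O http://mirror.example.org/libbar-3.4.tar.gz
    tar xzf libbar-3.4.tar.gz
    cd libbar-3.4 && ./configure && make && make install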
To solve this, Alice and Bob have two choices:
Ensure that the build script exits with an explanatory error when a dependency is not found locally, so that Carol can get dependencies in her usual, probably sane, way.
Ensure that the build script does appropriate integrity checking of any dependencies it downloads, and that any dependencies' build scripts do the same, all the way down the dependency tree.
Let's assume that Alice and Bob chose option one, as it's by far the less laborious. Now, in theory, Carol can get a known-good copy of foobard and build it without running or installing software of unknown origin on her machine. This is good, because once the machine doing the compiling is compromised, the binary cannot be trusted (nor can anything else on that system). Alice and Bob are now depending on either Carol or some tool she runs to check the signatures on the code she downloads.
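A minimal sketch of option one, assuming a hypothetical dependency called libbar that ships a pkg-config file, might look like this in foobard's build script:

    #!/bin/sh
    # Refuse to continue, with an explanatory error, rather than fetching
    # libbar behind the user's back
    if ! pkg-config --exists libbar; then
        echo "error: libbar was not found." >&2
        echo "Install libbar from your distribution or from signed upstream" >&2
        echo "sources, then re-run this script." >&2
        exit 1
    fi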
Carol, it turns out, is a package maintainer for a binary Linux distribution; which one doesn't matter for the purposes of this article. Now that she has obtained known-good copies of foobard and all relevant dependencies and has built foobard, Carol is packaging it up for a repository that will provide the prebuilt binary to thousands of users. She should, in turn, ensure that the packages she generates are signed before being passed on to package mirrors.
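On an RPM-based distribution, for example, that signing step might look roughly like this (the package name is hypothetical, and the exact tooling varies between distributions):

    # Carol signs the package she built with her packager key (rpmsign reads
    # the key name from %_gpg_name in ~/.rpmmacros)
    rpmsign --addsign foobard-1.2.0-1.x86_64.rpm

    # Anyone can check the signature before installing or mirroring the package
    rpm --checksig foobard-1.2.0-1.x86_64.rpm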
The state of things at the time of this writing (mid-September 2015) is that binary Linux distributions vary in how they check the integrity of the software they package. Major distributions such as Red Hat, Fedora and Debian do sign official packages cryptographically, and their package managers reject packages with bad signatures. Gentoo uses a Git-backed package management strategy that signs commit hashes rather than individual packages, achieving the same general effect plus protection of the package metadata and prevention of metadata replay attacks. However, as far as I can tell, the source code that those packages' ebuilds retrieve is not itself checked.
None of these Linux distributions, as far as I can find, has published a policy that would bar signing and distributing packages built from code that was not signed by its developer, or that pulls in unsigned code or binaries at build time. In short, most package managers are verifying the authenticity of packages, but package management teams don't seem to be differentiating between packages made from known-good code and packages made from code whose integrity they cannot verify.
To the best of my knowledge, current package managers still consider “valid code signing key” to be a binary property. That is, a code signing key either is considered valid by your package manager for signing any package, or is not considered valid at all. As such, people who maintain a portage overlay (or deb/rpm repository) with your favorite game in it could sign (or their compromised key could sign) binutils or sudo. So, package maintainers who think their packages' importance is not high enough to merit a diligent approach to information security may cause your system to replace crucial system utilities typically run as root or capable of mediating root access.
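Package managers don't yet offer that granularity directly, but an administrator can approximate it today. On an APT-based system, for example, a pinning file such as a hypothetical /etc/apt/preferences.d/games-overlay can limit a third-party repository to the one package it is actually trusted for (the package name and repository host here are made up):

    Explanation: allow only the game itself from the third-party repository
    Package: mygame
    Pin: origin "repo.example.org"
    Pin-Priority: 500

    Explanation: refuse everything else that repository offers, including
    Explanation: packages such as binutils or sudo
    Package: *
    Pin: origin "repo.example.org"
    Pin-Priority: -1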
Linux and other open-source software are used around the world: in medical care, the power grid, the Internet and countless other bits of infrastructure that we rely on every day. Luckily, it's possible to make the kinds of software supply chain attacks described here incredibly difficult to pull off. Doing so will take concerted effort by developers, by distribution maintainers (both packagers and maintainers of the packaging systems) and by users.
OSS Developers Should:
Use a source control system with integrated integrity checking, such as Git or Mercurial, for managing all projects.
Cryptographically sign each release in the source control system (via tag or equivalent) and each release tarball.
Carefully safeguard their private keys: both code-signing keys and the SSH keys used to commit code.
Rapidly revoke and replace keys that may be compromised. Remember: new GPG/SSH keys are free; the damage to your project's reputation if compromised code goes out with your valid signature is irreversible.
Ensure that the build system generates errors for missing dependencies, rather than blindly downloading and building them without integrity checking.
Get their GPG keys signed by other developers, and in turn, sign those developers' keys, so that users have a better idea of which GPG keys to trust (a sketch of this follows the list).
Choose dependencies with similarly good distribution practices, and file bugs with dependencies that are not following these recommendations.
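The key-signing item above might look like the following in practice (the key ID is hypothetical; the fingerprint check over a trusted channel is what gives the signature meaning):

    # Bob fetches Alice's key and checks its fingerprint with her in person
    # (or over another trusted channel) before certifying it
    gpg --recv-keys 0xDEADBEEF    # hypothetical key ID
    gpg --fingerprint 0xDEADBEEF
    gpg --sign-key 0xDEADBEEF

    # Publishing the certified key makes Bob's signature visible to others
    gpg --send-keys 0xDEADBEEF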
Linux Distributions Should:
Use caution in obtaining source code for building packages: check that the code is signed by a trusted key, and do not build against any untrusted code, such as files the build system downloads without integrity checking.
Make contact with upstream developers when public Git/Mercurial history changes to ensure that the change was expected and not a sign of tampering.
File bugs with upstream developers who do not use modern source control systems and/or don't cryptographically sign releases.
Never accept packages that are not cryptographically signed by the package maintainer.
Set a date to stop packaging code that was not signed by its development team, communicate that date upstream and stick to it.
Ensure that the package manager checks signatures on all packages it retrieves, and that it checks for revocation of package-signing keys.
Check the cryptographic signatures of any additional files that a package may download.
Ensure that the package manager warns the user if a package's integrity cannot be verified, either due to a failed signature check or to the package relying on some resource (such as a proprietary blob from a third-party site) that is not signed at all.
Design package management tools that allow a particular package-signing key to be valid only for certain packages.
Users Should:
Be suspicious of any program not signed by its developer (or package maintainer), whether that software is open source or distributed as a compiled binary. Ideally, one never would run unsigned code at all. However, in applications that are not life-critical, one may need to compromise by minimizing the amount of unsigned code in use and by not running unsigned code as root.
Exercise due diligence in obtaining source code to compile: check that the code is signed by a reasonably trusted key and does not download anything at build time without authenticating it.
File bugs with developers who do not use modern source control systems and/or do not cryptographically sign releases.
Not enable package repositories if those repositories' maintainers are not signing packages, or if the maintainers' keys can't be verified.
Some of these things are being done most of the time, and the overall picture is improving. Running software inevitably involves trust, as no one has both the time and the skill to audit every piece of code running on their systems. We can do a better job of making sure that we only trust code that came from the people we think it came from.