Interview with Katerina Barone-Adesi about developing with the Snabb Switch network toolkit, working outside the Linux kernel for cleaner code and faster throughput.
Recently, I had the pleasure of sitting down with Katerina Barone-Adesi, a developer from the Snabb Switch community to talk about Snabb Switch, writing network infrastructure code and more. In case you aren't familiar with Snabb Switch, it is described in its README as “a simple and fast packet networking toolkit”. The thing that caught my attention was Snabb Switch's unusual architecture: it's small and fast, but written largely in LuaJIt, which I didn't expect. It circumvents the kernel almost entirely, preferring to interface directly with network hardware and network code that is slower than that in Snabb Switch.
SS: Can you give me some examples of what Snabb Switch can be used for?
KB: Snabb should be interesting to anyone who wants to do high-speed networking with the flexibility of writing software and seriously needs to consider several gigabits a second or more of traffic. Secondarily, it is interesting to hobbyists and researchers. Alexander Gall, at SWITCH, wrote an L2VPN on Snabb, because the features he wanted were not available elsewhere. It is a really powerful tool for individuals and small teams who need to do unusual or new things with networks.
SS: So, how did Snabb Switch get started?
KB: Snabb was started by Luke Gorrie, who has a fairly deep background in networking, along with a professional history using Smalltalk, Forth, etc. He was influenced by some of Alan Kay's ideas about building small, comprehensible systems that individuals can understand—and also by the desire to do flexible, high-performance software-defined networking on commodity x86 hardware.
It started with a code budget—the binary should fit on a floppy, compile in less than a second (and less than a minute including dependencies), and be less than 10,000 lines of code. It has grown a bit beyond that. Essentially, it lets people do more or less line speed networking using 10 gigabit cards, on a single core, without the overhead of using the kernel. The kernel is involved peripherally (for instance, it uses hugepage support), but Snabb implements its own drivers, and the cards it is handling are not managed by the kernel.
SS: A lot of what's out there in this space today is big...when you're introducing people to these tools (either building them or using them) do they usually get where you're going right away? If not, what are the parts that are hardest for them to wrap their heads around?
KB: I would say the biggest initial hurdle, assuming some knowledge of networking, is that Snabb is a fairly young project. Some parts are well documented, others are not, and there are not many examples of how its libraries work, even scattered around GitHub. Snabb itself is pretty small and well factored, which helps a lot. More generally, networking is a vast space. When it comes to introducing people to using tools built on Snabb, there is often some incredulity about using a garbage-collected high-level language to do high-performance networking, but LuaJIt has a solid track record in this space by now. Aside from that, most people are not used to thinking of the timescales involved. Depending on the expected average packet size, if you are driving a 10 gigabit card, you have tens to hundreds of nanoseconds per packet. With these constraints, you need to think about when you will hit RAM, and things like system calls are prohibitively expensive.
SS: Why did you end up choosing LuaJIt instead of, for example, straight C?
KB: Snabb is written in LuaJIt, so the choice of language (and license—it's APLv2) came with the choice to use Snabb. That said, using LuaJIt is excellent for projects like this. It is a just-in-time trace compiler, so the actual code being run is shaped by the packets being seen, and there are no branch mispredictions within a trace. It also makes it extremely easy to use C when appropriate. It has an extremely nice FFI, good support for packed structs, and calling C code seems as easy as it would be if the whole project were in C. The code itself tends to be more concise and quicker to write—it is quite comparable to Python or Ruby in that regard. It is quite pleasant not to have to read past boilerplate details.
SS: What tools are you using to track down bottlenecks and get the optimization you need for the tight time constraints?
KB: The tools that we are using to track down bottlenecks include PMU counters and a combination of logical thinking about what is likely to be a bottleneck, followed by measuring changes to see whether they actually make an improvement. Having a small system makes the latter a surprisingly tractable approach. We also look at LuaJIt's trace dumps, which include IR (Intermediate Representation) and x86 assembly.
SS: Have you had to break parts out to C yet?
KB: I have not personally had to break parts out to C yet. People occasionally do—for instance, if they want to hide an unpredictable branch from the trace compiler. Another alternative that a colleague of mine used recently is DynASM, which allows you to generate assembly dynamically. He used it for some AVX2 instructions to access memory 256 bits at a time. One of the very surprising things about LuaJIt is that it often is as fast as C, and swapping code out for C blindly can genuinely make performance worse, as you lose the benefits of traces.
SS: How much do you have to know about the specific hardware this will be running on to keep the speed up? Is architecture plus network card enough, or are you falling into “oh, sorry, you have a brand of RAM we never tested...” level of optimization specificity?
KB: So far, micro-architecture (Haswell vs. Sky Lake) and raw CPU speed do make a noticeable difference in performance for some applications. For others, like the packetblaster app that ships with Snabb, any modern CPU Snabb can run on is so far beyond what is needed that the bottleneck is the network cards.
SS: So, to recap, Snabb Switch is an open-source userspace application written mostly in LuaJIt that's pushing network traffic on run-of-the-mill COTS x86_64 easily at 1Gbps, with a 100Gbps target in 2016...by bypassing the Linux kernel for faster performance...and you say you're getting comparable performance to C code from LuaJIt, without excessive scaffolding for performance testing.
SS: So, how big is this thing again?
KB: The source is less than 5MB; the resulting binary is something like 2MB before compression.
SS: Impressive. However, this isn't just some endpoint software, it's infrastructure—what does your team do to ensure code quality and security?
KB: For code quality, my team uses continuous integration, tests and code review. All code is reviewed by at least one person other than the author before being committed to the main development branch. We have a variety of end-to-end and unit tests. Pflua also has property-based tests, which compare its results with and without optimization to libpcap's results on randomly generated filters. We have plans to do more property-based and fuzz testing. Additionally, the small size of the system makes it easier to pay close attention to code quality. For security, all of the above, along with thinking hard and paying close attention to RFCs and their notes on security. Our fragmentation reassembly code rejects overlapping fragments as more recent RFCs recommend, for example.
SS: When you bring on new developers, how do you inculcate them into this kind of rigor given how little it's present in other programming shops, and how essential it is for developing infrastructure software?
KB: I introduced it to my team when I joined in 2014. We started using continuous integration software and property-based testing at that point, both of which significantly increased the reliability of pflua. Code reviews also help. We have had only one person join our team since; as we grow more, perhaps I will have a better answer for that. The code also had a lot of conceptual rigor to begin with. My colleague who shaped pflua most deeply, Andy Wingo, spends most of his time working on compilers, which also demands a fairly high level of rigor and coherent design.
SS: If you were to guess, about what percentage of your project's effort is spent on new code/features vs. testing/refining/optimizing, writing test code and scaffolding for fuzz testing vs. documenting and planning tasks?
KB: It depends on the phase of each project, I'd say. They tend to start off with a flurry of features and planning, and some testing is done in parallel. Later in each project, we optimize and test more. The efforts compliment each other well, as more extensive tests catch subtle mistakes attempts at optimization can introduce. Some of the features are specifically linked to performance goals, which makes them fall thoroughly into being both new code/features and testing/refining/optimizing at the same time. My colleague Andy Wingo's recent work with dynasm is an example. Earlier in the lwaftr project, almost all of the time was going to new code and features, with perhaps 20% going into testing them. Before the alpha release, the whole team focused almost entirely on performance for a few weeks. At the moment, I think it might be around 50/50.
SS: If you were to break it down to two to five principles the Snabb Switch toolkit is built on, what would they be?
KB: I would actually break it down to just one: systems should be small, and individuals should be able to hold the whole system in their head. Most of the other principles are consequences of this, and the high-speed software-defined networking/network function virtualization niche it is designed for.
SS: Why is this so important to Snabb Switch in particular?
KB: It is an important principle for software wherever it can reasonably be applied, and it is one thing that strongly drew me toward Snabb. I can only speculate that for Snabb in particular, it came from Luke's background with Smalltalk—where he took the quotes describing this principle from—and other minimal systems like Forth.
SS: What do you like best about working on Snabb Switch?
KB: Snabb is a really fun and interesting project to be involved with these days. Between the design principles and being able not only to replace old network functions but also implement entirely new ones on commodity x86 hardware and network cards that cost a few hundred bucks, I think it has a really exciting future.
SS: So, what got you interested in tweaking this type of networking code?
KB: This is my day job. I joined Igalia in 2014, when the networking team was working on pflua. Pflua is a library on top of Snabb that uses a subset of libpcap's filtering language. It is smaller, faster, and the subset that it supports is believed to be entirely compatible except for libpcap optimizer bugs.
I got into networking as a hobby back around 2000 though. I ran a small home network on Linux, various BSDs, read a lot of books, played around, etc. I have liked systems programming and dynamic languages for about as long, so a job that involves all of these elements is pretty fun.
SS: Did you make a specific effort to choose companies where you'd be doing open-source work, or was it luck?
KB: Igalia does only free software, and this was a major factor in my choosing to work for them. I did also interview at other companies that do a mixture of free and non-free software.
SS: Do you have any advice for coders interested in Snabb Switch in particular, or in moving from places higher up the stack into more infrastructure-y areas of programming work like what you're doing?
KB: For coders interested in Snabb Switch in particular, it depends on their background; ones with more relevant experience can probably jump in and implement something Luke brain-dumps about on the mailing list, or something else entirely. For those newer to these kinds of programming, I would recommend starting by writing a Snabb app. These can be as simple as a packet blaster or an app that echoes packets between interfaces, or more complex than the l2vpn or lwAFTR, and can be built up incrementally. Some of the tests can be run, and app development can be done, on any modern x86 machine running Linux. I prototyped the lwAFTR on my development laptop, which has no Ethernet cards. Snabb has a mechanism for plugging apps into each other that makes running it on real hardware a simple matter of changing a couple lines of configuration and recompiling.
SS: Thanks again, Katerina, for sharing your time and expertise.