Diving into Syzkaller’s mutation mechanism (1/2)

4 min readFeb 23, 2023

If you just want to quickly check Syskaller’s mutation mechanism, jump to Section 2. Also if you noticed anything incorrect in this article, please let me know!

Introduction

Fuzzing is hands-down the most popular technique for program testing at this moment. To put it simply, fuzzing is an indefinite process of feeding randomly mutated inputs to a program under test and observing crashes that these mutated inputs induce. These mutated inputs are still partially legitimate because they are mutated from valid inputs, while they bear some unexpected corner cases that can ultimately make the program crash.

Fuzzer’s efficiency greatly depends on the quality of mutated inputs, where they have to be valid enough to prevent the program from early-rejecting mutated inputs, while still having some subtleties that can probably cause problems. Mutating inputs in such a way is one huge challenge that has sparked a lot of research, and it only gets worse if inputs are expected to satisfy certain formats or templates to be even considered by the program in the first place.

In that sense, fuzzing OS kernels is far from trivial apart from all the technical difficulties that it imposes. Specifically, a minimal acceptable input should consist of a sequence of syscalls and their arguments. Obviously, mutating any of these is less straightforward than simply changing the value of input bytes. In this article, I’m going to provide a brief overview of how Syzkaller (the most popular OS kernel fuzzer) is dealing with this by summarizing its input mutation mechanism.

0. Selecting a mutation input

This is technically before mutation so I’m not going to dig deep into this, but it’ll be nice to briefly summarize how Syzkaller selects inputs to mutate. While looping infinitely, syz-fuzzer (the actual fuzzing component of Syzkaller) selects a random input from the input pool (called a corpus) when it has no other tasks to do in the queue (e.g., triaging or smashing inputs; will be probably covered by the upcoming articles) and it does not generate inputs from scratch (which rarely takes place compared to mutation). After selecting an input, the syz-fuzzer passes this input to the Syzkaller mutator that performs the actual mutation.

[Reference: syz-fuzzer/proc.go:loop()]

1. Selecting a mutation method

The first thing the mutator does after accepting an input is to select the mutation method. There are five mutation methods that Syzkaller incorporates — splicing/inserting/removing syscalls and squashing/mutating arguments — and the mutator may apply them multiple times until it either encounters any failed attempt to mutate or probabilistically decides to stop (1/3 of chance after each mutation), or the mutated input exceeds the number of maximum syscalls. Each mutation method has a different probability to be selected, with inserting a syscall and mutating arguments being the two most probable methods.

[Reference: prog/mutation.go:Mutate()]

2. Performing mutation

Unsurprisingly, the five mutation methods all have a different level of complexity in their actual mutation algorithm. It’s unclear whether it was intended for sophisticated methods to be more probable to be chosen than simpler ones, but the two least probable methods are very simple, while the two most probable ones are much more sophisticated (roughly 10x more code for them than simpler ones). In the following, I’m going to explain each method one by one from the simplest to the most complex.

2.1. Removing a syscall

This is the simplest one and it does exactly what its name suggests; it selects a syscall in the input and removes it. There is no weight on syscalls regarding which is more likely to be removed.

[Reference: prog/mutation.go:removeCall()]

2.2. Splicing syscalls

In fuzzing, splicing is a fairly standard mutation method where you take two inputs and combine them patch by patch. The actual splicing strategy depends on fuzzers, but in Syzkaller, it basically inserts the entire second input into a random point of the first input (where the second input is also chosen randomly from the corpus). For example, assume the mutator is about to splice input A that consists of two parts A1:A2 and another input B. What the mutator does is randomly choose a point in input A (assume that point is between A1 and A2) and insert input B into that point (so the end result would look like A1:B:A2). It removes some syscalls at the end, however, if the end result exceeds the maximum number of syscalls.

[Reference: prog/mutation.go:splice()]

2.3. Squashing a pointer argument

Squashing pointer arguments is not an ordinary mutation strategy, but it’s necessary for Syzkaller because every argument has its own kind and they are mutated by different logic depending on it. The gist of squashing a pointer is converting a complex memory object (e.g., union or struct) to a sequence of flat byte buffers, so that the subsequent mutations can mutate the content of the memory object on a byte level. In the figure above, the squashing operation will lump together the fields a, b, c, d, and the padding in between to a single byte buffer (a_to_d). It still leaves some parts as separate entities, however, if their value is to be determined by the result of another syscall. Again, the mutator randomly chooses which pointer it’ll squash.

Notice that squashing itself is not changing the input at all from the kernel’s perspective; the kernel will see the same byte representation of the input, regardless of whether they are from structs or byte arrays. Instead, the squashing logic additionally mutates a random byte buffer after squashing the memory object. You can come up with a mental image that the mutator breaks structs or unions as if they were just byte arrays.

[References: prog/mutation.go:squashAny(), prog/any.go:squashPtr(), squashPtrImpl()]

To be continued…

I originally planned to explain all methods altogether in one article, but I found it requires even more text than the ones above to explain the rest two methods. I’ll continue explaining them in the following articles.

Links to the other installments: 1, 2.