Diving into Syzkaller’s mutation mechanism (2/2)

7 min readFeb 24, 2023

Before we start…

So this is where the real fun starts because, while the previous three methods are more or less crude and monolithic, the rest two methods are not quite monolithic and consist of a handful of parts. I found those little parts are worth looking into because they have subtle meanings and implications on the mutated inputs. Let’s start with “inserting a syscall” first.

2.4. Inserting a syscall

Before jumping into the details, let’s briefly think about what this mutation should aim at. Hint: there are lots of syscalls and each of them activates different parts of the kernel. This begs a question. If a random syscall is inserted into an input (i.e., an existing sequence of syscalls), what are the odds that this new syscall actually interacts with the existing ones? After all, these syscalls are supposed to interact with each other to create interesting program states (possibly leading to crashes), but if a new syscall is irrelevant to the rest at all, the resulting mutated input will just execute the same program state plus some irrelevant execution, which is just a waste of time.

So it all boils down to inserting a relevant syscall to make the mutated input more interesting. What Syzkaller does for that matter is choose a more relevant syscall to the existing syscalls in the input. This can be further broken down into two parts; calculating the relevance between a syscall pair, and choosing a new syscall based on the calculated relevance.

2.4.1. Calculating the relevance between a syscall pair

To make it clear, the term “relevance” is just what I use in this article. The original Syzkaller author called it a “call-to-call priority”, but I thought using this term as-is can be confusing for this article because there are many “priorities” going on in Syzkaller.

Anyway, they are calculated when syz-fuzzer starts up and builds a choice table that specifies possible syscalls and their relevances. Syzkaller calculates the relevances of each syscall pair in two ways. The first way is utilizing static information, specifying the syscall specifications (i.e., argument types and their data-flow directions), and the second way is utilizing dynamic information in the corpus, indicating which syscall pairs are likely to appear at the same time in a single input.

[References: syz-fuzzer/fuzzer.go:main(), prog/prio.go:BuildChoiceTable(), CalculatePriorities()]

How it derives the relevances from this information is largely a heuristic. Let’s look at the static information part first. As I mentioned, the static information specifies the details of each syscall, or specifically, the types of each argument in a syscall. Syzkaller further classifies these types as resources according to their expected usages and calculates which resources a given syscall uses. It also weights certain resource usages more than others to treat them more importantly when calculating syscall relevances, where these weight values are heuristically-determined constants.

The static information part is not finished yet. Now that Syzkaller identified the resource usage of each syscall, the next task is utilizing this usage to estimate the relevance of each syscall pair. The algorithm itself is quite simple; given an (ordered) pair of syscalls, assign the relevance score (or priority in Syzkaller terms) between them depending on their producing/consuming resources. This is when the weight in the previous paragraph plays a role, because this weight indicates how significantly a given resource is produced or consumed by a syscall. Using these per-resource weights and another heuristic formula, Syzkaller estimates the relevance of a syscall pair. The figure describes the simplified version of static relevance calculation, where the numbers in parentheses indicate the weights of the resource usage.

[References: prog/prio.go:calcStaticPriorities(), calcResourceUsage(), noteUsage()]

Using dynamic information is much simpler. By dynamic information, what it means is the likelihood of two system calls appearing in the same input, and to check that, Syzkaller scans all inputs in the current corpus; the more they appear together, the higher their relevance is. After calculating both static and dynamic relevances, Syzkaller combines both of them by multiplying two.

[References: prog/prio.go:calcDynamicPriorities()]

2.4.2. Inserting a new syscall

Now let’s go back to when the mutator decides to insert a new syscall. Before choosing which syscall is going to be inserted, it first needs to decide where this new syscall is to be inserted. Again, the mutator randomly selects the index of the existing syscall sequence where the new syscall is to be inserted, but this time, this random index is biased to the end of the sequence; the likelihood of choosing the nth index increases proportionally to n.

What’s the meaning of this bias? Basically, putting a new syscall at the end means doing relevant computations more (given that the new syscall is indeed relevant to the existing ones). This makes sense because the original input was already working to some extent, and extending this with another relevant syscall may contribute to building up new program states. Meanwhile, although inserting a syscall in between can potentially break the rest of the syscall sequences, it cannot be suppressed altogether because it can still create new program states.

[References: prog/mutation.go: insertCall()]

After selecting the insertion point, the mutator carries on to choosing a syscall to insert, and this is where the relevance score calculated above plays a part. The basic idea here is choosing one another random syscall preceding the insertion point (let’s call this a reference syscall) and putting a new relevant syscall to this reference syscall after that. The syscall to be newly created and inserted is chosen at random, but to make it more relevant to this reference syscall, the random draw is weighted by the relevance between this reference syscall and each of the would-be syscalls.

[References: prog/rand.go:generateCall(), prog/prio.go:choose()]

One interesting detail of this mutation is that, if this newly inserted syscall ends up making the input longer than the limit, it opts for removing inserted syscalls, not the existing ones. I thought this is weird at first because there’s no point in inserting a syscall only to remove it. But it turns out that sometimes creating a syscall can end up introducing multiple other syscalls (e.g., if the new one requires other syscalls to make necessary context), where in that case removing some of the inserted syscalls makes more sense.

[Reference: prog/mutation.go:insertCall()]

2.5. Mutating syscall arguments

Mutating syscall arguments is probably the most sophisticated mutation method in terms of implementation, but when it comes to what it does, the concept is surprisingly straightforward. The rough summary of this mutation method would be to choose a random syscall in the input and mutate some of its arguments at random. The way how randomly it chooses them, however, is something special.

2.5.1. Choosing a syscall

When mutating arguments, the first task Syzkaller does is choosing a syscall in the input, so that it can mutate its arguments afterward. The choice is not fully random but again weighted per syscall according to which arguments it has; we don’t want to mutate a syscall whose arguments have little value to mutate, let alone a syscall without any argument in the first place. To be specific, the weight of a syscall is determined as the sum of the individual weight of each argument. The weight of an argument is largely determined by its type, and the amount is generally larger if the argument is expected to have more impact when mutated.

[References: prog/mutation.go:mutateArg(), chooseCall()]

Just a minor note. Syzkaller makes use of a very similar algorithm whenever it needs to choose something randomly with weights (or priorities in Syzkaller’s terms), while it’s never made into a stand-alone function or a module. Conceptually, what it does is stack up all the weights in the number space (e.g., if we have weights {2,3,5,4}, the stacked-up version would be {2,5,10,14}), draw a random number smaller than the sum of all weights (say it was 8), and select the first weight whose accumulated version until that point is the first larger than the random number (e.g., the accumulated weight gets larger the first time when it reaches 5 (10>8), so the weight 5 is chosen). You can easily recognize this pattern to identify where Syzkaller is doing a weighted random draw.

2.5.2. Mutating arguments

Choosing an argument to mutate is very similar to the above, except Syzkaller is doing a random draw with the arguments instead of the syscalls, and it can potentially do this multiple times (c.f., choosing a syscall is done only once). The actual mutation operation depends on the type of the chosen argument, which is too technical to explain them one by one. One interesting detail is that some argument mutations can introduce additional syscalls as a side-effect, which can increase the number of syscalls in the input.

[References: prog/mutation.go:chooseArg()]

Wrapping up…

Before writing this stuff, I was just analyzing the details of mutation that I needed for personal reasons and felt like summarizing them would be a good idea for both my future self and someone who may want to jump-start the Syzkaller hacking similar to me. Maybe I should deal with other parts of Syzkaller like the overall workflow soon or later, although I think I should have dealt with them first in the right order.

Links to the other installments: 1, 2.

Diving into Syzkaller’s mutation mechanism (2/2)

Before we start…

2.4. Inserting a syscall

2.5. Mutating syscall arguments

Wrapping up…

Written by Gwangmu Lee