QEMU: How to Add Instructions

QEMU is a powerful system emulator with many use cases. One common use case in my experience doing systems-related research is to run code with custom instructions. For example, I used QEMU for experimenting with the functionality of a custom CPU architecture called Capstone. I also coupled custom instructions implemented in QEMU with compiler changes to instrument program execution.

In this post, I summarise the minimal changes needed to add a custom instruction to QEMU. Here I am talking about QEMU with its TCG (Tiny Code Generator) backend, which first translates code into specialised IR code before translating the IR code into host instructions. QEMU also has a KVM backend, but that effectively turns QEMU into a VMM and only works when the host and the target architectures match. I will not go into details about how the different backends work but plan to write another post for that in the future. There are also nuances like system emulation vs user emulation mode. Those should not matter in the scope of this post.

Hypothetical Goal

For the purpose of demonstration, let’s say we want to add an instruction diffacc, which takes two general-purpose registers rs1 and rs2, computes the difference rs2 - rs1 of the two register values, and accumulates all such results in a separate piece of state. For example, assuming a0, a1, and a2 hold the values 0, 4, and 9, respectively, after diffacc a0, a1 and diffacc a0, a2 the accumulated result should be (4 - 0) + (9 - 0) = 13.
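To pin down the intended semantics before touching QEMU, here is a minimal stand-alone model in plain C. It is only a sketch: ToyCpu, toy_diffacc, and toy_diffacc_example are hypothetical names that do not exist in QEMU, and it assumes the difference is taken as rs2 - rs1, the order used by the helper later in this post. It runs diffacc a0, a1 followed by diffacc a0, a2 with a0, a1, a2 set to 0, 4, 9.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone model of diffacc; none of these names come from QEMU. */
#define NUM_REGS 32

typedef struct {
    uint64_t gpr[NUM_REGS]; /* general-purpose registers x0..x31 */
    uint64_t diff_sum;      /* the separate accumulator state */
} ToyCpu;

/* diffacc rs1, rs2: add (rs2 - rs1) to the accumulator (operand order is an assumption). */
void toy_diffacc(ToyCpu *cpu, unsigned rs1, unsigned rs2)
{
    assert(rs1 < NUM_REGS && rs2 < NUM_REGS);
    cpu->diff_sum += cpu->gpr[rs2] - cpu->gpr[rs1];
}

/* Runs the small example and returns the accumulated value. */
uint64_t toy_diffacc_example(void)
{
    enum { A0 = 10, A1 = 11, A2 = 12 }; /* RISC-V ABI names for x10..x12 */
    ToyCpu cpu = {0};

    cpu.gpr[A0] = 0;
    cpu.gpr[A1] = 4;
    cpu.gpr[A2] = 9;
    toy_diffacc(&cpu, A0, A1); /* adds 4 - 0 = 4 */
    toy_diffacc(&cpu, A0, A2); /* adds 9 - 0 = 9 */
    return cpu.diff_sum;       /* 4 + 9 = 13 */
}
```

Calling toy_diffacc_example() returns 13. Because the accumulator is unsigned, a negative difference would wrap around modulo 2^64, which is worth keeping in mind when choosing test values.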

Step 0: Pick a base architecture

QEMU includes code specific to each target architecture. Such code describes things like how to decode each instruction and how to generate IR code for it. There is also target-architecture-specific code for the different emulation modes. For system emulation mode, some code needs to define how system-wide functionality like address translation and interrupt delivery works, and for user emulation mode, how each system call should be emulated. Such code resides in target/<arch> and linux-user/<arch>. As Step 0, just pick whatever target architecture you want to modify. I picked RISC-V as an example, and its architecture-specific code is in target/riscv and linux-user/riscv.

Step 1: Define how to decode the instruction

Assuming RISC-V, target/riscv/insn32.decode defines how each 4-byte instruction is decoded. The file target/riscv/insn16.decode does the same for compressed instructions (C extension). What we need to do is add diffacc to one of those two files. Since we give diffacc a 4-byte encoding here, we add it to target/riscv/insn32.decode as follows:

diffacc          0000001 .....    ..... 001 ..... 1011011 @r

The format of those files is straightforward. The first column gives the mnemonic/name of the instruction, followed by the bit pattern that matches it, where . marks a bit the pattern does not fix (here, the operand bits, which the format's fields pick up). The last column @r names the instruction format, which defines the arguments produced when decoding this instruction:

%rs2       20:5
%rs1       15:5
%rd        7:5
&r    rd rs1 rs2
@r       .......   ..... ..... ... ..... ....... &r                %rs2 %rs1 %rd
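To make the pattern and field definitions concrete, here is a hand-written sketch of roughly what matching and field extraction amount to for diffacc. It is not the code QEMU generates: ToyArgR, toy_extract, toy_decode_diffacc, and toy_encode_diffacc are hypothetical names. The mask and match constants are derived from the fixed bits in the pattern line above (funct7 = 0000001, funct3 = 001, opcode = 1011011).

```c
#include <assert.h>
#include <stdint.h>

/* Fixed bits of the diffacc pattern: funct7 (31:25), funct3 (14:12), opcode (6:0). */
#define DIFFACC_MASK  0xfe00707fu
#define DIFFACC_MATCH 0x0200105bu

typedef struct {
    int rd, rs1, rs2;
} ToyArgR; /* hypothetical stand-in for the generated arg_diffacc */

/* Extract `len` bits starting at bit `pos`, like a "%name pos:len" field. */
uint32_t toy_extract(uint32_t insn, int pos, int len)
{
    return (insn >> pos) & ((1u << len) - 1u);
}

/* Returns 1 and fills *a if insn matches diffacc, otherwise returns 0. */
int toy_decode_diffacc(uint32_t insn, ToyArgR *a)
{
    if ((insn & DIFFACC_MASK) != DIFFACC_MATCH) {
        return 0;
    }
    a->rs2 = (int)toy_extract(insn, 20, 5);
    a->rs1 = (int)toy_extract(insn, 15, 5);
    a->rd  = (int)toy_extract(insn, 7, 5);
    return 1;
}

/* Inverse direction: build the 32-bit instruction word from register numbers. */
uint32_t toy_encode_diffacc(unsigned rd, unsigned rs1, unsigned rs2)
{
    assert(rd < 32 && rs1 < 32 && rs2 < 32);
    return DIFFACC_MATCH | (rs2 << 20) | (rs1 << 15) | (rd << 7);
}
```

For instance, toy_encode_diffacc(0, 10, 11) gives 0x02b5105b, the word for rd = x0, rs1 = a0 (x10), rs2 = a1 (x11). A word like this is handy when the guest toolchain does not know the new mnemonic and the instruction has to be emitted as raw data.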

The build process converts those .decode files to C code that performs the parsing. The generated header files also include struct type definitions that correspond to decoded instructions. For our diffacc instruction, the type is named arg_diffacc and contains three fields rd, rs1, and rs2. In addition, a function trans_diffacc is declared:

static bool trans_diffacc(DisasContext *ctx, arg_diffacc *a);

which brings us to the next step.

Step 2: Define how to translate it into IR code

QEMU calls the function trans_diffacc() whenever it has decoded a diffacc instruction and needs to translate it into TCG IR code which QEMU will typically translate in turn into host instructions. We need to implement trans_diffacc() to tell QEMU how to generate the right TCG IR.

Sounds easy! But before you rush to write something like sum += a->rs2 - a->rs1, let me remind you that we are still doing metaprogramming: trans_diffacc() generates code which, when run, emulates what diffacc does, rather than emulating diffacc itself. Also look at the rs1 and rs2 arguments. Those are not the run-time register values, but the register numbers specified in the decoded instruction.

In general, we need to learn the assembly-like TCG IR language to know how to generate code that does the right thing. Luckily, there is a lazy workaround, which is what this post introduces. Future posts might go into TCG IR. For now, we use the quite general workaround below:

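static bool trans_diffacc(DisasContext *ctx, arg_diffacc *a)
{
    /* Translation time: a->rs1 and a->rs2 are register numbers, not values. */
    TCGv_i32 rs1 = tcg_constant_i32(a->rs1);
    TCGv_i32 rs2 = tcg_constant_i32(a->rs2);
    /* Emit a call to helper_diffacc(env, rs1, rs2), to be made at run time.
     * Newer QEMU releases name the env handle tcg_env instead of cpu_env. */
    gen_helper_diffacc(cpu_env, rs1, rs2);
    return true; /* the instruction was translated successfully */
}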

The code generates a call to a helper function called diffacc, passing it two constants: the register numbers of the two operands. A helper function is simply a function provided in the host environment, typically written in C and compiled as part of QEMU itself. The TCG IR that gen_helper_<...> generates translates into a call to this host function. The argument cpu_env passes the CPU execution state to the function, which can then inspect and mutate register values through it.
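The split between translation time and run time can be illustrated with a toy model that has nothing to do with real TCG. In this sketch, "translating" just records which function to call and with which constant arguments, and "executing" performs the recorded call against the current CPU state. All names (ToyCpu, ToyOp, toy_trans_diffacc, toy_exec, and so on) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t gpr[32];
    uint64_t diff_sum;
} ToyCpu;

/* A recorded "call this helper with two constant arguments" operation. */
typedef struct {
    void (*helper)(ToyCpu *cpu, uint32_t a, uint32_t b);
    uint32_t arg0, arg1;
} ToyOp;

/* Run time: the helper receives register numbers and reads the live values itself. */
void toy_helper_diffacc(ToyCpu *cpu, uint32_t rs1, uint32_t rs2)
{
    assert(rs1 < 32 && rs2 < 32);
    cpu->diff_sum += cpu->gpr[rs2] - cpu->gpr[rs1];
}

/* Translation time: only register numbers are known, so we just record the call. */
ToyOp toy_trans_diffacc(uint32_t rs1, uint32_t rs2)
{
    ToyOp op = { toy_helper_diffacc, rs1, rs2 };
    return op;
}

/* Run time: execute a previously translated operation against the CPU state. */
void toy_exec(ToyCpu *cpu, const ToyOp *op)
{
    op->helper(cpu, op->arg0, op->arg1);
}

/* Translate once, then execute twice with different register contents. */
uint64_t toy_translate_once_run_twice(void)
{
    ToyCpu cpu = {0};
    ToyOp op = toy_trans_diffacc(10, 11); /* diffacc a0, a1 */

    cpu.gpr[10] = 1;
    cpu.gpr[11] = 5;
    toy_exec(&cpu, &op); /* adds 5 - 1 = 4 */
    cpu.gpr[10] = 2;
    cpu.gpr[11] = 9;
    toy_exec(&cpu, &op); /* adds 9 - 2 = 7 */
    return cpu.diff_sum; /* 4 + 7 = 11 */
}
```

The same translated operation yields different contributions on each run because the register values are only read when the helper executes. That is why the translator can pass register numbers as constants but can never see the values themselves.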

Of course, we are using this hack here only because we are lazy and do not know much about TCG IR. Its main downside is overhead: every execution of the instruction makes a call out of the generated code into a C function. If we expect our instruction to be executed very frequently and want better performance, we should try to avoid helper functions. Helpers are better suited to instructions that are rare or hard to implement in TCG IR (e.g., privileged instructions that need to check or change a lot of system state or have complex logic).

Step 3: Emulate instruction with helper function

Now we define the helper function so that at run time the code generated for our instruction can call it. The helper function first needs to be declared in target/riscv/helper.h using a macro:

DEF_HELPER_3(diffacc, void, env, i32, i32)

which declares a helper function that returns nothing and takes three arguments of types env (CPU execution state), i32, and i32.
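As a rough mental model of what such a declaration macro does, the sketch below maps type tags to C types and pastes together a prototype for helper_<name>. This is a deliberately simplified toy, not QEMU's actual macro machinery: TOY_DEF_HELPER_3, the TOY_CTYPE_* tags, ToyEnv, and toy_sum are all hypothetical, and the real macros do more than declare a prototype (they also produce the gen_helper_<...> function used during translation).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state; stands in for CPURISCVState. */
typedef struct {
    uint64_t gpr[32];
} ToyEnv;

/* Map each type tag to a C type. */
#define TOY_CTYPE_void void
#define TOY_CTYPE_env  ToyEnv *
#define TOY_CTYPE_i32  uint32_t

/* Toy three-argument helper declaration: emits a prototype for
 * helper_<name> built from the type tags. */
#define TOY_DEF_HELPER_3(name, ret, t1, t2, t3) \
    TOY_CTYPE_##ret helper_##name(TOY_CTYPE_##t1, TOY_CTYPE_##t2, TOY_CTYPE_##t3);

/* Expands to: void helper_diffacc(ToyEnv *, uint32_t, uint32_t); */
TOY_DEF_HELPER_3(diffacc, void, env, i32, i32)

uint64_t toy_sum;

/* The definition must agree with the generated prototype, or compilation fails. */
void helper_diffacc(ToyEnv *env, uint32_t rs1, uint32_t rs2)
{
    assert(rs1 < 32 && rs2 < 32);
    toy_sum += env->gpr[rs2] - env->gpr[rs1];
}
```

The useful takeaway is the naming and typing contract: the name given to the macro gains a helper_ prefix, and the tags env and i32 fix the C parameter types that the definition has to match.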

Now we are finally ready to define what the instruction actually does in target/riscv/op_helper.c. This ends up being very straightforward. We just provide a very direct implementation of the helper function (here we need to prefix its name with helper_):

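/* Accumulator kept in host memory; one global shared by every emulated CPU. */
uint64_t sum;

void helper_diffacc(CPURISCVState *env, uint32_t rs1, uint32_t rs2)
{
    /* Run time: rs1 and rs2 are register numbers; read the live values from env. */
    sum += env->gpr[rs2] - env->gpr[rs1];
}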

Conclusion

Hopefully this post explains enough to give you a general impression of how QEMU decodes, translates, and emulates a target instruction. With this knowledge you should be able to start exploring other bits of QEMU TCG on your own, though I do plan to cover some of those too in the future to assist myself in my learning.