SLJIT tutorial

Before you start

Download the tutorial sources

SLJIT is a lightweight, platform-independent JIT compiler that is easy to embed in your own project. Because it is 'stack-less', SLJIT places some limits on register usage.

Here are some other JIT compilers I have looked into recently, listed in case you are interested:
    Libjit/liblightning: the backend of DotGNU (GNU.net).
    Libgccjit: introduced in GCC 5. Unlike the other JIT libraries, using it feels like constructing C code; it uses the GCC backend.
    AsmJIT: branched from the famous V8 project (the JavaScript engine in Chrome); supports only x86/x86_64.
    DynASM: used in LuaJIT.

AsmJIT and DynASM work at the instruction level, so using them looks like coding in assembly language. SLJIT also looks like assembly, but it hides the details of the specific CPU, which makes it more generic and portable. Libjit works at a higher layer, and with libgccjit, as mentioned, you are really constructing C code.

First program

Usage of SLJIT:
    1. Add #include "sljitLir.h" at the top of your C/C++ program
    2. Compile together with sljit_src/sljitLir.c
All examples can be compiled like this:
    gcc -Wall -Ipath/to/sljit_src -DSLJIT_CONFIG_AUTO=1 \
      xxx.c path/to/sljit_src/sljitLir.c -o program
OK, let's take a look at the first program. In it, we create a function that returns the sum of its 3 arguments.

    #include "sljitLir.h"

    #include <stdio.h>
    #include <stdlib.h>

    typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);

    static int add3(sljit_sw a, sljit_sw b, sljit_sw c)
    {
      void *code;
      sljit_uw len;
      func3_t func;

      /* Create a SLJIT compiler */
      struct sljit_compiler *C = sljit_create_compiler(NULL, NULL);

      /* Start a context (function entry) with 3 arguments; the other parameters are discussed later */
      sljit_emit_enter(C, 0, SLJIT_ARGS3(W, W, W, W), 1, 3, 0, 0, 0);

      /* The first argument of the function is in register SLJIT_S0, the second in SLJIT_S1, etc. */
      /* R0 = first */
      sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S0, 0);

      /* R0 = R0 + second */
      sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S1, 0);

      /* R0 = R0 + third */
      sljit_emit_op2(C, SLJIT_ADD, SLJIT_R0, 0, SLJIT_R0, 0, SLJIT_S2, 0);

      /* This statement moves R0 to the return register and returns; */
      /* in fact, R0 is the return register itself */
      sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);

      /* Generate machine code */
      code = sljit_generate_code(C);
      len = sljit_get_generated_code_size(C);

      /* Execute code */
      func = (func3_t)code;
      printf("func return %ld\n", (long)func(a, b, c));

      /* dump_code(code, len); */

      /* Clean up */
      sljit_free_compiler(C);
      sljit_free_code(code, NULL);
      return 0;
    }

    int main()
    {
      return add3(4, 5, 6);
    }

The function sljit_emit_enter creates a context: it saves some registers to the stack and creates a call frame. sljit_emit_return restores the saved registers and cleans up the frame. SLJIT is designed to be embedded into other applications, so the code it generates has to follow some basic rules.

This standard is called the Application Binary Interface, or ABI for short. Here is a document for x86_64 CPUs (ABI.pdf); almost all Linux/Unix systems follow this standard. MS Windows has its own; read this for more: X86_calling_conventions

When I first read the documentation of sljit_emit_enter, the parameters 'saveds' and 'scratches' confused me. The fact is that CPU registers have different roles in the ABI spec: some are used to pass arguments, some are 'callee-saved', and some are 'temporary'. Take x86_64 for example: RAX, R10 and R11 are temporary, which means they may be changed by a call instruction, whereas RBX and R12-R15 are callee-saved and keep their values across the call. The rule is that every function must save the callee-saved registers before using them.

Fortunately, SLJIT does most of this for us: SLJIT_S[0-9] represent the 'safe' (saved) registers, while SLJIT_R[0-9] are only for temporary use.

When a function starts, SLJIT moves the function arguments to the S0, S1 and S2 registers, which means function arguments are always 'safe' in the context. SLJIT supports a maximum of 4 arguments.

sljit_emit_opX is easy to understand. In SLJIT a data value is represented by 2 parameters; it can be a register, in-memory data, or an immediate number.

First parameter       Second parameter   Meaning
SLJIT_R*, SLJIT_S*    0                  Temp/saved registers
SLJIT_IMM             Number             Immediate number
SLJIT_MEM             Address            In-memory data at an absolute address
SLJIT_MEM1(r)         Offset             In-memory data at [r + offset]
SLJIT_MEM2(r1, r2)    Shift (size)       In-memory array: r1 is the base address, r2 the index, and shift the element size
                                         (0 for bytes, 1 for shorts, 2 for 4 bytes, 3 for 8 bytes)

Branch

    #include "sljitLir.h"

    #include <stdio.h>
    #include <stdlib.h>

    typedef sljit_sw (*func3_t)(sljit_sw a, sljit_sw b, sljit_sw c);

    /*
    In this example, we generate a function like this:

    sljit_sw func(sljit_sw a, sljit_sw b, sljit_sw c)
    {
      if ((a & 1) == 0)
        return c;
      return b;
    }

    */
    static int branch(sljit_sw a, sljit_sw b, sljit_sw c)
    {
      void *code;
      sljit_uw len;
      func3_t func;

      struct sljit_jump *ret_c;
      struct sljit_jump *out;

      /* Create a SLJIT compiler */
      struct sljit_compiler *C = sljit_create_compiler(NULL, NULL);

      /* 3 args, 1 temp reg, 3 saved regs */
      sljit_emit_enter(C, 0, SLJIT_ARGS3(W, W, W, W), 1, 3, 0, 0, 0);

      /* R0 = a & 1, S0 is argument a */
      sljit_emit_op2(C, SLJIT_AND, SLJIT_R0, 0, SLJIT_S0, 0, SLJIT_IMM, 1);

      /* if R0 == 0 then jump to ret_c, where is ret_c? we assign it later */
      ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);

      /* R0 = b, S1 is argument b */
      sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);

      /* jump to out */
      out = sljit_emit_jump(C, SLJIT_JUMP);

      /* this is where 'ret_c' should jump to; we emit a label and attach it to ret_c */
      sljit_set_label(ret_c, sljit_emit_label(C));

      /* R0 = c, S2 is argument c */
      sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_S2, 0);

      /* this is where 'out' should jump to */
      sljit_set_label(out, sljit_emit_label(C));

      /* end of function */
      sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);

      /* Generate machine code */
      code = sljit_generate_code(C);
      len = sljit_get_generated_code_size(C);

      /* Execute code */
      func = (func3_t)code;
      printf("func return %ld\n", (long)func(a, b, c));

      /* dump_code(code, len); */

      /* Clean up */
      sljit_free_compiler(C);
      sljit_free_code(code, NULL);
      return 0;
    }

    int main()
    {
      return branch(4, 5, 6);
    }
The keys to implementing a branch are 'struct sljit_jump' and 'struct sljit_label'. The 'jump' contains a jump instruction that does not know where to jump until you set a label on it; the 'label' is a code address, just like a label in assembly language.

sljit_emit_cmp/sljit_emit_jump generate a conditional/unconditional jump. Take this statement for example:
    ret_c = sljit_emit_cmp(C, SLJIT_EQUAL, SLJIT_R0, 0, SLJIT_IMM, 0);
It creates a jump instruction whose condition is that R0 equals 0; the jump target is assigned later with the sljit_set_label statement.

In this example, it creates a branch like this:
      R0 = a & 1;
      if R0 == 0 then goto ret_c;
      R0 = b;
      goto out;
    ret_c:
      R0 = c;
    out:
      return R0;

This is how a high-level language compiler handles branches.
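The "emit the jump first, attach the label later" idea can be tried without SLJIT at all. The following is a minimal sketch in plain C, assuming a made-up toy instruction set and a small interpreter; every name in it (toy_emit, toy_set_label, toy_run and so on) is hypothetical and only imitates the shape of the SLJIT calls, not their implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set: hypothetical, unrelated to SLJIT's real encoding. */
enum toy_op { TOY_AND_IMM, TOY_MOV, TOY_JUMP_IF_ZERO, TOY_JUMP, TOY_RETURN };

struct toy_insn {
  enum toy_op op;
  int dst, src;   /* register numbers */
  long imm;       /* immediate operand */
  size_t target;  /* jump target, filled in by toy_set_label() */
};

struct toy_program {
  struct toy_insn code[16];
  size_t count;
};

/* Append one instruction and return its index (the "jump handle"). */
static size_t toy_emit(struct toy_program *p, enum toy_op op, int dst, int src, long imm)
{
  struct toy_insn *i;
  assert(p->count < sizeof(p->code) / sizeof(p->code[0]));
  i = &p->code[p->count];
  i->op = op;
  i->dst = dst;
  i->src = src;
  i->imm = imm;
  i->target = 0;
  return p->count++;
}

/* A label is simply the index of the next instruction to be emitted. */
static size_t toy_emit_label(const struct toy_program *p)
{
  return p->count;
}

/* Patch an already emitted jump so that it points at the label. */
static void toy_set_label(struct toy_program *p, size_t jump, size_t label)
{
  p->code[jump].target = label;
}

/* Interpret the program; reg[0] plays R0, reg[1..3] hold a, b, c. */
static long toy_run(const struct toy_program *p, long a, long b, long c)
{
  long reg[4];
  size_t pc = 0;
  reg[0] = 0; reg[1] = a; reg[2] = b; reg[3] = c;
  for (;;) {
    const struct toy_insn *i = &p->code[pc++];
    switch (i->op) {
    case TOY_AND_IMM:      reg[i->dst] = reg[i->src] & i->imm; break;
    case TOY_MOV:          reg[i->dst] = reg[i->src]; break;
    case TOY_JUMP_IF_ZERO: if (reg[i->src] == 0) pc = i->target; break;
    case TOY_JUMP:         pc = i->target; break;
    case TOY_RETURN:       return reg[i->src];
    }
  }
}

/* Same shape as the branch example: return (a & 1) == 0 ? c : b. */
static long toy_branch(long a, long b, long c)
{
  struct toy_program p;
  size_t ret_c, out;
  p.count = 0;
  toy_emit(&p, TOY_AND_IMM, 0, 1, 1);               /* R0 = a & 1 */
  ret_c = toy_emit(&p, TOY_JUMP_IF_ZERO, 0, 0, 0);  /* target not known yet */
  toy_emit(&p, TOY_MOV, 0, 2, 0);                   /* R0 = b */
  out = toy_emit(&p, TOY_JUMP, 0, 0, 0);            /* target not known yet */
  toy_set_label(&p, ret_c, toy_emit_label(&p));     /* ret_c: */
  toy_emit(&p, TOY_MOV, 0, 3, 0);                   /* R0 = c */
  toy_set_label(&p, out, toy_emit_label(&p));       /* out: */
  toy_emit(&p, TOY_RETURN, 0, 0, 0);                /* return R0 */
  return toy_run(&p, a, b, c);
}
```

Calling toy_branch(4, 5, 6) follows the ret_c path and yields c, while an odd first argument falls through and yields b. The point of the sketch is only that both jumps are emitted before their targets exist and are patched afterwards.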

Loop

The loop example is similar to the branch example.
    /* In this example, we generate a function like this:

    sljit_sw func(sljit_sw a, sljit_sw b)
    {
      sljit_sw i;
      sljit_sw ret = 0;
      for (i = 0; i < a; ++i) {
        ret += b;
      }
      return ret;
    }
    */

      /* 2 args, 2 temp regs, 2 saved regs */
      sljit_emit_enter(C, 0, SLJIT_ARGS2(W, W, W), 2, 2, 0, 0, 0);

      /* R1 = 0 (R1 is the loop counter i) */
      sljit_emit_op2(C, SLJIT_XOR, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_R1, 0);
      /* RET = 0 */
      sljit_emit_op1(C, SLJIT_MOV, SLJIT_RETURN_REG, 0, SLJIT_IMM, 0);
      /* loopstart: */
      loopstart = sljit_emit_label(C);
      /* R1 >= a (signed compare, matching the C code above) --> jump out */
      out = sljit_emit_cmp(C, SLJIT_SIG_GREATER_EQUAL, SLJIT_R1, 0, SLJIT_S0, 0);
      /* RET += b */
      sljit_emit_op2(C, SLJIT_ADD, SLJIT_RETURN_REG, 0, SLJIT_RETURN_REG, 0, SLJIT_S1, 0);
      /* R1 += 1 */
      sljit_emit_op2(C, SLJIT_ADD, SLJIT_R1, 0, SLJIT_R1, 0, SLJIT_IMM, 1);
      /* jump loopstart */
      sljit_set_label(sljit_emit_jump(C, SLJIT_JUMP), loopstart);
      /* out: */
      sljit_set_label(out, sljit_emit_label(C));

      /* return RET */
      sljit_emit_return(C, SLJIT_MOV, SLJIT_RETURN_REG, 0);
After this example, you are ready to construct any program that contains complex branches and loops.

Here is an interesting fact: 'xor reg, reg' is better than 'mov reg, 0' because its encoding is a few bytes shorter on x86 machines.
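For reference, the control flow that the loop example emits can be written out in plain C with labels and gotos. This is only a sketch of the intended behaviour, assuming long stands in for sljit_sw; loop_ref and loop_check are hypothetical names.

```c
#include <assert.h>

/* Goto form of the loop example: returns a * b for a >= 0, else 0. */
static long loop_ref(long a, long b)
{
  long i = 0;    /* plays R1, the loop counter */
  long ret = 0;  /* plays the return register */
loopstart:
  if (i >= a)    /* signed comparison, as in the C source */
    goto out;
  ret += b;
  i += 1;
  goto loopstart;
out:
  return ret;
}

static void loop_check(void)
{
  assert(loop_ref(4, 5) == 20);
  assert(loop_ref(-1, 5) == 0);  /* negative count: the body never runs */
}
```

Each label corresponds to one sljit_emit_label call, and each goto to one jump whose target is set with sljit_set_label.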

From here on I will give only the key code; the full source of each chapter can be found in the attachment.

Call external function

It's easy to call an external function in SLJIT: we use sljit_emit_icall with the SLJIT_CALL operation to do so.

SLJIT_CALL is used to call a function with N arguments. The number of arguments and the return type are defined in the third parameter of sljit_emit_icall, just as is done for SLJIT-defined functions.
The arguments for the callee are passed in SLJIT_R0, R1 and R2. Keep in mind that these are 'temp registers': the call may overwrite them, so keep track of any values you still need.

Assume that we have an external function:
    sljit_sw print_num(sljit_sw a);
JIT code to call print_num(S1):
    /* R0 = S1; */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_S1, 0);
    /* print_num(R0) */
    sljit_emit_icall(C, SLJIT_CALL, SLJIT_ARGS1(W, W), SLJIT_IMM, SLJIT_FUNC_ADDR(print_num));

This code calls an immediate value (the address of print_num), which is linked properly when the program is loaded. There is no problem when you compile and execute in one run, but if you plan to save the generated code to a file and load and execute it later, that address may not be what you expect: on platforms that support PIC, print_num may be relocated to a different address at run time. Check this out: PIC
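One common way around unstable addresses, shown here only as a sketch and not as anything SLJIT provides, is to store a symbolic name next to the saved code and look the address up again on every program start. The table, the two sample functions and the names resolve and call_by_name are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef long (*extern_fn)(long);

/* Hypothetical external functions that JIT code might want to call. */
static long double_num(long a) { return 2 * a; }
static long negate_num(long a) { return -a; }

struct symbol {
  const char *name;
  extern_fn fn;
};

/* Filled in at load time, so the addresses are valid for this run. */
static const struct symbol symbols[] = {
  { "double_num", double_num },
  { "negate_num", negate_num },
};

/* Look up a function by name; returns NULL if the name is unknown. */
static extern_fn resolve(const char *name)
{
  size_t i;
  for (i = 0; i < sizeof(symbols) / sizeof(symbols[0]); i++) {
    if (strcmp(symbols[i].name, name) == 0)
      return symbols[i].fn;
  }
  return NULL;
}

/* Resolve and call in one step; the name must exist in the table. */
static long call_by_name(const char *name, long arg)
{
  extern_fn fn = resolve(name);
  assert(fn != NULL);
  return fn(arg);
}
```

With such a table, the code generator would be run again (or the saved call sites patched) using the freshly resolved address instead of a value remembered from an earlier run.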

Structure access

SLJIT uses SLJIT_MEM1 to implement [Reg + offset] memory access.
    struct point_st {
      sljit_sw x;
      int y;
      short z;
      char d;
    };

    sljit_emit_op1(C, SLJIT_MOV_S32, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_S0),
      SLJIT_OFFSETOF(struct point_st, y));
In this case, SLJIT_S0 holds the address of the point_st structure, and the offset of member 'y' is determined at compile time. Importantly, the MOV operation comes with a 'signedness/size' postfix; here _S32 means 'signed 32-bit integer'. The postfix list:
    U8 = unsigned byte (8 bit)
    S8 = signed byte (8 bit)
    U16 = unsigned half (16 bit)
    S16 = signed half (16 bit)
    U32 = unsigned int (32 bit)
    S32 = signed int (32 bit)
    P = pointer (sljit_p) size
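What a sized load from [base + offset] does can be mimicked in plain C. The sketch below assumes that intptr_t stands in for sljit_sw and that int is 32 bits wide; load_s32, load_u32 and load_demo are hypothetical helpers, not SLJIT functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct point_st {
  intptr_t x;  /* stands in for sljit_sw (assumption) */
  int y;
  short z;
  char d;
};

/* Like an S32 load from [base + offset]: read 32 bits, sign-extend to a word. */
static intptr_t load_s32(const void *base, size_t offset)
{
  int32_t v;
  memcpy(&v, (const unsigned char *)base + offset, sizeof(v));
  return (intptr_t)v;
}

/* Like a U32 load from [base + offset]: read 32 bits, zero-extend to a word. */
static uintptr_t load_u32(const void *base, size_t offset)
{
  uint32_t v;
  memcpy(&v, (const unsigned char *)base + offset, sizeof(v));
  return (uintptr_t)v;
}

static void load_demo(void)
{
  struct point_st p = { 1, -7, 3, 'd' };
  assert(sizeof(p.y) == sizeof(int32_t));  /* assumption: int is 32 bits wide */
  assert(load_s32(&p, offsetof(struct point_st, y)) == -7);
  assert(load_u32(&p, offsetof(struct point_st, y)) == (uintptr_t)UINT32_C(4294967289));
}
```

On a 64-bit machine the two loads of the same bytes give different word values, which is why the postfix has to match the declared type of the member.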

Array accessing

SLJIT uses SLJIT_MEM2 to access arrays, like this:
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM2(SLJIT_S0, SLJIT_S2),
      SLJIT_WORD_SHIFT);
This statement generates code like this:
    WORD S0[];
    R0 = S0[S2]

The array S0 is declared to be of type WORD (using SLJIT_WORD_SHIFT), so each element is sizeof(sljit_sw) bytes long. SLJIT uses a 'shift' to represent the element size (0 for a single byte, 1 for 2 bytes, 2 for 4 bytes, 3 for 8 bytes).

The file array_access.c demonstrates an array-printing example and should be easy to understand.
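The address arithmetic behind the shift can be sketched in plain C as base + (index << shift). The helpers element_addr, word_shift and array_demo below are hypothetical, and intptr_t is assumed to stand in for sljit_sw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of element `index` when each element is (1 << shift) bytes long. */
static const void *element_addr(const void *base, size_t index, unsigned shift)
{
  return (const unsigned char *)base + (index << shift);
}

/* Shift that matches a word-sized element: log2(sizeof(intptr_t)). */
static unsigned word_shift(void)
{
  unsigned shift = 0;
  while (((size_t)1 << shift) < sizeof(intptr_t))
    shift++;
  return shift;
}

static void array_demo(void)
{
  intptr_t words[4] = { 10, 20, 30, 40 };
  int16_t halves[4] = { 1, 2, 3, 4 };
  /* word-sized elements: the counterpart of R0 = S0[S2] with a word shift */
  assert(*(const intptr_t *)element_addr(words, 2, word_shift()) == 30);
  /* 2-byte elements use shift 1 */
  assert(*(const int16_t *)element_addr(halves, 3, 1) == 4);
}
```

Using a shift instead of a byte count means the index register can hold a plain element index, and the scaling is folded into the memory operand.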

Local variables

SLJIT provides SLJIT_MEM1(SLJIT_SP) to access the space reserved by sljit_emit_enter's last parameter.
In this example we have to pass an address to print_arr, so a local variable is the only choice.
    /* reserve stack space for sljit_sw arr[3] */
    sljit_emit_enter(C, 0, SLJIT_ARGS3(W, W, W, W), 2, 3, 0, 0, 3 * sizeof(sljit_sw));
    /* opt arg R S FR FS local_size */

    /* arr[0] = S0; SLJIT_SP is the start address of the local variables */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 0, SLJIT_S0, 0);
    /* arr[1] = S1 */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 1 * sizeof(sljit_sw), SLJIT_S1, 0);
    /* arr[2] = S2 */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), 2 * sizeof(sljit_sw), SLJIT_S2, 0);

    /* R0 = arr; SLJIT_SP is in fact the address of arr, but SLJIT cannot move it directly */
    sljit_get_local_base(C, SLJIT_R0, 0, 0); /* get the address of local variables */
    sljit_emit_op1(C, SLJIT_MOV, SLJIT_R1, 0, SLJIT_IMM, 3); /* R1 = 3; */
    sljit_emit_icall(C, SLJIT_CALL, SLJIT_ARGS2(W, P, W), SLJIT_IMM, SLJIT_FUNC_ADDR(print_arr));
    sljit_emit_return(C, SLJIT_MOV, SLJIT_R0, 0);

SLJIT_SP can only be used in SLJIT_MEM1(SLJIT_SP). In this case, SP is the address of 'arr', but we cannot assign it to a register using the SLJIT_MOV operation. Instead, we use sljit_get_local_base, which loads the address of the local variable area plus an offset into the target.
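Written as ordinary C, the generated function corresponds roughly to the sketch below. The print_arr shown here is a hypothetical stand-in (it prints the elements and returns their sum so the result can be checked); the tutorial's real print_arr may behave differently, and long is assumed to stand in for sljit_sw.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for print_arr: prints the array and returns its sum. */
static long print_arr(const long *arr, long n)
{
  long i, sum = 0;
  assert(arr != NULL && n >= 0);
  for (i = 0; i < n; i++) {
    printf("%ld ", arr[i]);
    sum += arr[i];
  }
  printf("\n");
  return sum;
}

static long func(long a, long b, long c)
{
  long arr[3];               /* the 3 * sizeof(sljit_sw) bytes of local space */
  arr[0] = a;                /* [SP + 0] = S0 */
  arr[1] = b;                /* [SP + 1 * word] = S1 */
  arr[2] = c;                /* [SP + 2 * word] = S2 */
  return print_arr(arr, 3);  /* R0 = address of arr, R1 = 3 */
}
```

The array has to live in memory because its address is passed to another function, and a register has no address to pass.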

Brainfuck compiler

OK, the basic usage of SLJIT ends here. For more detail, I suggest reading sljitLir.h directly. Have fun hacking with the wonders of SLJIT!

An introduction to the Brainfuck machine can be found here: Brainfuck
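As a reminder of the semantics a Brainfuck compiler has to reproduce, here is a minimal reference interpreter in plain C. It is only a sketch and not the JIT compiler from the tutorial sources: bf_run and bf_demo are hypothetical names, the tape size of 30000 wrapping byte cells is an assumption, and ',' stores 0 once the input string is exhausted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BF_CELLS 30000

/* Run prog, reading ',' input from the string in and writing '.' output to
   out (at most out_cap - 1 bytes, NUL-terminated). Returns the output length. */
static size_t bf_run(const char *prog, const char *in, char *out, size_t out_cap)
{
  static unsigned char cells[BF_CELLS];
  size_t len = strlen(prog);
  size_t ptr = 0, n = 0, pc;
  memset(cells, 0, sizeof(cells));
  for (pc = 0; pc < len; pc++) {
    switch (prog[pc]) {
    case '>': ptr = (ptr + 1) % BF_CELLS; break;
    case '<': ptr = (ptr + BF_CELLS - 1) % BF_CELLS; break;
    case '+': cells[ptr]++; break;
    case '-': cells[ptr]--; break;
    case '.': if (n + 1 < out_cap) out[n++] = (char)cells[ptr]; break;
    case ',': cells[ptr] = *in ? (unsigned char)*in++ : 0; break;
    case '[':
      if (cells[ptr] == 0) {  /* skip forward to the matching ']' */
        int depth = 1;
        while (depth > 0 && ++pc < len) {
          if (prog[pc] == '[') depth++;
          else if (prog[pc] == ']') depth--;
        }
      }
      break;
    case ']':
      if (cells[ptr] != 0) {  /* jump back to the matching '[' */
        int depth = 1;
        while (depth > 0 && pc > 0) {
          pc--;
          if (prog[pc] == ']') depth++;
          else if (prog[pc] == '[') depth--;
        }
        if (depth != 0)
          pc = len;           /* unmatched ']': stop instead of looping forever */
      }
      break;
    default:
      break;                  /* every other character is a comment */
    }
  }
  if (out_cap > 0)
    out[n] = '\0';
  return n;
}

static void bf_demo(void)
{
  char buf[8];
  /* 8 * 8 + 1 = 65, the character code of 'A' */
  assert(bf_run("++++++++[>++++++++<-]>+.", "", buf, sizeof(buf)) == 1);
  assert(buf[0] == 'A' && buf[1] == '\0');
}
```

The two bracket cases are exactly the conditional forward jump and the backward jump covered in the Branch and Loop chapters, which is what makes Brainfuck a convenient first target for a JIT.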

Extra

1. dump_code function
SLJIT does not provide a disassembler; this is a simple function that does the job (x86 only):

    static void dump_code(void *code, sljit_uw len)
    {
      FILE *fp = fopen("/tmp/slj_dump", "wb");
      if (!fp)
        return;
      fwrite(code, len, 1, fp);
      fclose(fp);
    #if defined(SLJIT_CONFIG_X86_64)
      system("objdump -b binary -m i386:x86-64 -D /tmp/slj_dump");
    #elif defined(SLJIT_CONFIG_X86_32)
      system("objdump -b binary -m i386 -D /tmp/slj_dump");
    #endif
    }
Disassembly of the branch example:

0000000000000000 <.data>:
     0: 53             push   %rbx
     1: 41 57          push   %r15
     3: 41 56          push   %r14
     5: 48 8b df       mov    %rdi,%rbx
     8: 4c 8b fe       mov    %rsi,%r15
     b: 4c 8b f2       mov    %rdx,%r14
     e: 48 83 ec 10    sub    $0x10,%rsp
    12: 48 89 d8       mov    %rbx,%rax
    15: 48 83 e0 01    and    $0x1,%rax
    19: 48 83 f8 00    cmp    $0x0,%rax
    1d: 74 05          je     0x24
    1f: 4c 89 f8       mov    %r15,%rax
    22: eb 03          jmp    0x27
    24: 4c 89 f0       mov    %r14,%rax
    27: 48 83 c4 10    add    $0x10,%rsp
    2b: 41 5e          pop    %r14
    2d: 41 5f          pop    %r15
    2f: 5b             pop    %rbx
    30: c3             retq

The same function compiled with GCC -O2:
0000000000000000 <func>:
     0: 48 89 d0       mov    %rdx,%rax
     3: 83 e7 01       and    $0x1,%edi
     6: 48 0f 45 c6    cmovne %rsi,%rax
     a: c3             retq

Err... OK, either the optimization here is weak, or the optimization there is crazy... :-)
Originally by wenxichang#163.com, 2015.5.10