The following text is a brief overview of those key principles which are useful to know when generating code with SLJIT. Further details can be found in sljitLir.h. ---------------------------------------------------------------- What is SLJIT? ---------------------------------------------------------------- SLJIT is a platform independent assembler which - provides access to common CPU features - can be easily ported to wide-spread CPU architectures (e.g. x86, ARM, POWER, MIPS, s390x, LoongArch) The key challenge of this project is finding a common subset of CPU features which - covers traditional assembly level programming - can be translated to machine code efficiently This aim is achieved by selecting those instructions / CPU features which are either available on all platforms or simulating them has a low performance overhead. For example, some SLJIT instructions support base register pre-update when [base+offs] memory accessing mode is used. Although this feature is only available on ARM and POWER CPUs, the simulation overhead is low on other CPUs. ---------------------------------------------------------------- The generic CPU model of SLJIT ---------------------------------------------------------------- The CPU has - integer registers, which can store either an int32_t (4 byte) or intptr_t (4 or 8 byte) value - floating point registers, which can store either a single (4 byte) or double (8 byte) precision value - boolean status flags *** Integer registers: The most important rule is: when a source operand of an instruction is a register, the data type of the register must match the data type expected by an instruction. For example, the following code snippet is a valid instruction sequence: sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0); // An int32_t value is loaded into SLJIT_R0 sljit_emit_op1(compiler, SLJIT_REV32, SLJIT_R0, 0, SLJIT_R0, 0); // the int32_t value in SLJIT_R0 is byte swapped // and the type of the result is still int32_t The next code snippet is not allowed: sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0); // An intptr_t value is loaded into SLJIT_R0 sljit_emit_op1(compiler, SLJIT_REV32, SLJIT_R0, 0, SLJIT_R0, 0); // The result of the instruction is undefined. // Even crash is possible for some instructions // (e.g. on MIPS-64). However, it is always allowed to overwrite a register regardless of its previous value: sljit_emit_op1(compiler, SLJIT_MOV, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0); // An intptr_t value is loaded into SLJIT_R0 sljit_emit_op1(compiler, SLJIT_MOV32, SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R2), 0); // From now on SLJIT_R0 contains an int32_t // value. The previous value is discarded. Type conversion instructions are provided to convert an int32_t value to an intptr_t value and vice versa. In certain architectures these conversions are nops (no instructions are emitted). Memory accessing: Registers arguments of SLJIT_MEM1 / SLJIT_MEM2 addressing modes must contain intptr_t data. Signed / unsigned values: Most operations are executed in the same way regardless the value is signed or unsigned. These operations have only one instruction form (e.g. SLJIT_ADD / SLJIT_MUL). Instructions where the result depends on the sign have two forms (e.g. integer division, long multiply). *** Floating point registers Floating point registers can either contain a single or double precision value. Similar to integer registers, the data type of the value stored in a source register must match the data type expected by the instruction. Otherwise the result is undefined (even crash is possible). Rounding: Similar to standard C, floating point computation results are rounded toward zero. *** Boolean status flags: Conditional branches usually depend on the value of CPU status flags. These status flags are boolean values and can be set by certain instructions. To achive maximum efficiency and portability, the following rules were introduced: - Most instructions can freely modify these status flags except if SLJIT_KEEP_FLAGS is passed. - The SLJIT_KEEP_FLAGS option may have a performance overhead, so it should only be used when necessary. - The SLJIT_SET_E, SLJIT_SET_U, etc. options can force an instruction to correctly set the specified status flags. However, all other status flags are undefined. This rule must always be kept in mind! - Status flags cannot be controlled directly (there are no set/clear/invert operations) The last two rules allows efficient mapping of status flags. For example the arithmetic and multiply overflow flag is mapped to the same overflow flag bit on x86. This is allowed, since no instruction can set both of these flags. When either of them is set by an instruction, the other can have any value (this satisfies the "all other flags are undefined" rule). Therefore mapping two SLJIT flags to the same CPU flag is possible. Even though SLJIT supports a dozen status flags, they can be efficiently mapped to CPUs with only 4 status flags (e.g. ARM or SPARC). ---------------------------------------------------------------- Complex instructions ---------------------------------------------------------------- We noticed, that introducing complex instructions for common tasks can improve performance. For example, compare and branch instruction sequences can be optimized if certain conditions apply, but these conditions depend on the target CPU. SLJIT can do these optimizations, but it needs to understand the "purpose" of the generated code. Static instruction analysis has a large performance overhead however, so we choose another approach: we introduced complex instruction forms for certain non-atomic tasks. SLJIT can optimize these "instructions" more efficiently since the "purpose" is known to the compiler. These complex instruction forms can often be assembled from other SLJIT instructions, but we recommended to use them since the compiler can optimize them on certain CPUs. ---------------------------------------------------------------- Generating functions ---------------------------------------------------------------- SLJIT is often used for generating function bodies which are called from C. SLJIT provides two complex instructions for generating function entry and return: sljit_emit_enter and sljit_emit_return. The sljit_emit_enter also initializes the "compiling context" which specify the current register mapping, local space size, etc. configurations. The sljit_set_context can also set this context without emitting any machine instructions. This context is important since it affects the compiler, so the first instruction after a compiler is created must be either sljit_emit_enter or sljit_set_context. The context can be changed by calling sljit_emit_enter or sljit_set_context again. ---------------------------------------------------------------- All-in-one building ---------------------------------------------------------------- Instead of using a separate library, the whole SLJIT compiler infrastructure can be directly included: #define SLJIT_CONFIG_STATIC 1 #include "sljitLir.c" This approach is useful for single file compilers. Advantages: - Everything provided by SLJIT is available (no need to include anything else). - Configuring SLJIT is easy (e.g. redefining SLJIT_MALLOC / SLJIT_FREE). - The SLJIT compiler API is hidden from the world which improves securtity. - The C compiler can optimize the SLJIT code generator (e.g. removing unused functions). ---------------------------------------------------------------- Types and macros ---------------------------------------------------------------- The sljitConfig.h contains those defines, which controls the compiler. The beginning of sljitConfigInternal.h lists architecture specific types and macros provided by SLJIT. Some of these macros: SLJIT_DEBUG : enabled by default Enables assertions. Should be disabled in release mode. SLJIT_VERBOSE : enabled by default When this macro is enabled, the sljit_compiler_verbose function can be used to dump SLJIT instructions. Otherwise this function is not available. Should be disabled in release mode. SLJIT_SINGLE_THREADED : disabled by default Single threaded programs can define this flag which eliminates the pthread dependency. sljit_sw, sljit_uw, etc. : It is recommended to use these types instead of long, intptr_t, etc. Improves readability / portability of the code.