# Introduction Porting MIR to another target requires implementing a lot of machine-dependent code. On my estimation this code is about 3K C lines for target and writing it can take 1 month of work for experienced person. This document outlines what should be done for porting MIR. First I recommend to port MIR-interpreter and `c2mir` using it. You need to write file `mir-.c` for this. Then you can port MIR-generator by writing file `mir-gen-.c` file. For more details you can examine existing files `mir-x86_64.c`, `mir-aarhc64.c`, `mir-gen-x86_64.c`, and `mir-gen-aarhc64.c` ## Common machine dependent functions (file `mir-`) These function can be used to work with MIR-interpreter and MIR-generator. These functions should be placed in file `mir-.c` like `mir-x86_64.c` or `mir-aarch64.c`. You can use MIR internal functions `_MIR_publish_code`, `_MIR_get_new_code_addr`, `_MIR_publish_code_by_addr`, `_MIR_change_code`, `_MIR_update_code`, and `_MIR_update_code_arr` to work with executable code. Here is the list of the common function you should provide: * `void *_MIR_get_thunk (MIR_context_t ctx)` generates small code (thunk) and returns address to the code. The code redirects control flow to address stored in it. The address is not defined by the function but can be set by the following function * `void _MIR_redirect_thunk (MIR_context_t ctx, void *thunk, void *to)` sets up address in the thunk given as the 2nd argument to value given by the 3rd argument * `void *_MIR_get_wrapper (MIR_context_t ctx, MIR_item_t called_func, void *hook_address)` generates a function (`fun1`) code and returns its address. The code calls a function given by `hook_address` passing it `ctx` and `called_func` and getting an address of another function (`fun2`) which is called at end. The function `fun2` gets arguments which are passed to the generated function `fun1` * Function `_MIR_get_wrapper` is used for lazy code generation but can be used for other purposes in the future. For lazy code generation, `hook_address` is an address of function which generates machine code for MIR function `called_func`, redirects MIR function thunk to the generated machine code, and returns this code address ## Interpreter machine dependent functions (file `mir-.c`) These function are used to implement MIR interpreter. These functions should be also placed in file `mir-.c` and you can also use the MIR internal function mentioned above for their implementation * `void *_MIR_get_bstart_builtin (MIR_context_t ctx)` generates a function and returns address to the generated function. The generated function has no arguments and returns SP right before the function call. The generated function is used to implement `MIR_BSTART` insn in the interpreter * `void *_MIR_get_bend_builtin (MIR_context_t ctx)` generates a function and returns address to the generated function. The generated function has one argument and sets up SP from the function argument. The generated function is used to implement `MIR_BEND` insn in the interpreter * `void *va_arg_builtin (void *p, uint64_t t)` returns address of the next argument of MIR type `t`, where `p` is `va_list` * `void va_block_arg_builtin (void *res, void *p, size_t s, uint64_t ncase)` set up memory block with address `res` by the next block argument of size `s` and `ncase` (e.g. `MIR_T_BLK` has case 0, `MIR_T_BLK + 1` has case 1, and `MIR_T_BLK + MIR_BLK_NUM -1` has case `MIR_BLK_NUM - 1`), where `p` is `va_list` * `void *_MIR_get_interp_shim (MIR_context_t ctx, MIR_item_t func_item, void *handler)` generates and returns a function behaving as usual C function which calls function `void handler (MIR_context_t ctx, MIR_item_t func_item, va_list va, MIR_val_t *results)` where `va` can be used to access to arguments of the generated function call and `results` will contain what the generated function returns. MIR design assumes that call of external C function and MIR function can not be recognized and also that we should easily switch from generated code from MIR to the MIR code interpretation. It means that we need to have the same interface for MIR function interpretation as usual C function. The generated function provides such interface * `void *_MIR_get_ff_call (MIR_context_t ctx, size_t nres, MIR_type_t *res_types, size_t nargs, MIR_type_t *arg_types)` generates function `void fun1 (void *func_addr, MIR_val_t *res_arg_addresses)` and returns its address. The generated function `fun1` calls `func_addr` as usual C function which is returning `nres` results given as the first elements of `res_arg_addresses` and is taking `nargs` arguments as the subsequent elements of `res_arg_addresses`. Result and argument types are described correspondingly by vectors `res_types` and `arg_types` ## C2MIR machine dependent code After writing `mir-.c` file and making MIR interpreter work (use make `interp-test` to check this), you can make `c2m` working with the interpreter. You need to create a lot files but they are mostly copies of already existing ones. * First create directory `c2mir/` and files `c.h`, `c-code.c`, `c-ABI-code.c`, and `mirc__linux.h` in this directory. The simplest way to do this is to copy an existing directory `x86_64` or `aarch64`, rename files in the new directory, and modify them * file `c.h` defines types and macros used by C2MIR compiler. In most cases of 64-bit target, you don't need to change anything * file `c-code.c` defines platform depended constants (e.g. standard include directories) and functions (concerning data alignments) used by C2MIR compiler. You just need to rename definitions containing `` in their names * file `c-ABI-code.c` defines platform depended data and functions to generate ABI-compliant calls. You can use some functions of simple ABI in file `c2mir.c` or write code to generate ABI compliant calls. See comments for functions `target_init_arg_vars`, `target_return_by_addr_p`, `target_add_res_proto`, `target_add_call_res_op`, `target_gen_post_call_res_code`, `target_add_ret_ops`, `target_get_blk_type`, `target_add_arg_proto`, `target_add_call_arg_op`, and `target_gen_gather_arg` * file `mirc__linux.h` contains predefined macros of C2MIR compiler. You should rename some of them. To find what macros to rename, you can use `cpp -dM < /dev/null` on platform to which you port MIR * Second include files `c.h` and `c-code.c` into file `c2mir.c` and add target standard include and library directories in files `c2mir.c` and `c2mir-driver.c` For debugging the MIR interpreter, you can switch on MIR insn execution tracing by adding `-DMIR_INTERP_TRACE` for compilation `mir.c` file. C programs compiled by C2MIR compiler need some compiler specific standard files. They are stored in `c2m` program itself to make `c2m` movable. The stringified versions of the files are placed in directory `c2mir/` and have names `mirc__float.h`, `mirc__limits.h`, `mirc__stdarg.h`, `mirc__stddef.h`, and `mirc__stdint.h`. The easiest way to port the stringified standard include files is to copy them from existing target dependent directory (e.g. aarch64) modify files into the new directory. Again in most cases of 64-bit target, you don't need to change anything probably except macros related long double, char signedness, wchar, and `va_list` definitions. To run C tests for C2MIR with the MIR interpreter you can use `make c2mir-interp-test`. ## Machine-dependent code for MIR-generator (file `mir-gen-.c`) The last step for porting MIR is to make the MIR generator generating code for your target. You need to provide the following machine-dependent functions and definitions used by the MIR-generator: * `void target_init (MIR_context_t ctx)` and `void target_finish (MIR_context_t ctx)` can be used to initialize/finalize data internally for this file which can be used during all MIR generator work in given context * Register Description * it is a good practice to describe all hard regs you would like to use in MIR-code during MIR-generator as C enumerator. The enumerator should contains hard regs used for register allocations, stack access, parameter passing etc. All hard registers added in machinize should be here. The MIR generator does not use the enumerator constants but you can use them in `mir-gen-target.c` file for you convenience. The MIR generator only refers for the following target hard registers descriptions: * `MAX_HARD_REG` is maximal number of the described hard registers * `SP_HARD_REG` and `HARD_REG_FRAME_POINTER` are hard register numbers used as stack and frame pointer according target ABI * `TEMP_INT_HARD_REG1`, `TEMP_INT_HARD_REG2`, `TEMP_FLOAT_HARD_REG1`, `TEMP_FLOAT_HARD_REG2`, `TEMP_DOUBLE_HARD_REG1`, `TEMP_DOUBLE_HARD_REG2`, `TEMP_LDOUBLE_HARD_REG1`, and `TEMP_DOUBLE_HARD_REG2` are numbers of hard regs not used in machinized code and for register allocation. It is better use callee-cloberred hard regs. These regs occurs in MIR code only after RA. The corresponding `_REG1` and `_REG2` with the same prefix should be distinctive but hard regs with different prefixes can be the same * `int target_hard_reg_type_ok_p (MIR_reg_t hard_reg, MIR_type_t type)` should return true only if HARD_REG can contain value of `type` * `int target_fixed_hard_reg_p (MIR_reg_t hard_reg)` should return true you want that `hard_reg` is not used for register allocation. For stack and frame pointers, the temporary hard registers (ones with prefix `TEMP_` above) the function should always return true * `int target_call_used_hard_reg_p (MIR_reg_t hard_reg)` returns true if `hard_reg` can be cloberred by callee according target ABI * `int target_locs_num (MIR_reg_t loc, MIR_type_t type)` returns number of standard 64-bit stack slots should be used for value of `type`. Usually it is just `1` but it can be `2` for long double type * Array `const MIR_insn_code_t target_io_dup_op_insn_codes[]` contains codes of insns requiring to have output and one input operand to be the same on this target. `MIR_INSN_BOUND` is used as the end marker * `MIR_disp_t target_get_stack_slot_offset (MIR_context_t ctx, MIR_type_t type, MIR_reg_t slot)` returns offset of the slot relative the stack frame pointer. MIR registers which did not get a hard register will keep their values in 64-bit stack slots. The slot numbers start with 0. The transformation of slot number to the offset depends on your own or ABI stack frame layout. You should be not aware of stack slot alignment. It is MIR generator responsibility * `void target_machinize (MIR_context_t ctx)` transforms MIR code for the target. Usually it includes call transformations according to ABI by adding MIR insns which put call arguments into hard registers or on the stack and setting the called function result from the return register, setting up return register(s) for the current functions. The function can also * change some MIR insns into another sequence of MIR insns which are better for the target. It may also include peephole optimization * change some MIR insns into built-in function calls if a MIR insn requires a lot of target insns to implement it * change MIR insn operands on some hard register if the corresponding target insn works only with this hard register. In this case you need generate move insn to copy the original operand into the hard register * implements block data argument passing in a way to implement target call ABI for C aggregates Adding and deleting MIR insns should be done with MIR-generator functions `gen_add_insn_before`, `gen_add_insn_after`, and `gen_delete_insn`. * `void target_make_prolog_epilog (MIR_context_t ctx, bitmap_t used_hard_regs, size_t stack_slots_num)` adds MIR insns for prologue and epilogue of the generated function. This usually includes saving/restoring used callee-saved hard registers, stack space allocation, setting up set frame pointer register * `int target_insn_ok_p (MIR_context_t ctx, MIR_insn_t insn)`. MIR generator works mostly on simplified code. For any simplified (or machinized) insn the function should return TRUE. In brief simplified insns are * all insns with registers * register moves with immediate * moves which are loads or stores with indirect reg addressing MIR generator tries to combine several data dependent insns into one. Return also TRUE if you provide translation of the combined insn. The more combined insns you can translate, the better generated code will be * `uint8_t *target_translate (MIR_context_t ctx, size_t *len)` generates and returns address of machine code for the current MIR function. The function should generate machine insn(s) for any MIR insn for which `target_insn_ok_p` return TRUE. The function returns code length in bytes through parameter `len`. * `void target_rebase (MIR_context_t ctx, uint8_t *base)` is called by MIR generator after `target_translate` call. Sometimes after `target_translate` call we want to modify the output machine code for start code address `base` known only at this point. This function can be used for this. Usually it works on data collected by during `target_translate` execution It is better to start with small generator tests by using `make gen-test`. After the successful test code generation you can continue with bigger test set generated by `c2m`. Please use `make c2mir-gen-test` to run this test set. And finally you can run `make c2mir-bootstrap-test` which uses MIR-generator on a pretty big test (C-to-MIR compiler).