# C to MIR compiler

* Implementation of a small C11 (2011 ANSI C standard) to MIR compiler
  * no support for optional standard features: variable length arrays, complex numbers, atomics
  * support of the following C extensions:
    * `\e` escape sequence
    * binary numbers starting with `0b` or `0B` prefix
    * macro `__has_include`
    * empty structures and unions, and empty initializer lists
    * range cases `case ...`
    * zero size arrays
    * statement expressions
* Minimal compiler code dependency. No additional tools (like yacc/flex) are used
* Simplicity of implementation over speed to make code easy to learn and maintain
* Four passes to divide compilation into manageable sub-tasks:
  1. Preprocessor pass generating tokens
  2. Parsing pass generating AST (Abstract Syntax Tree). To stay as close as possible to the ANSI standard grammar, a hand-written [**PEG**](https://en.wikipedia.org/wiki/Parsing_expression_grammar) parser is used
  3. Context pass checking context constraints and augmenting AST
  4. Generation pass producing MIR

![C to MIR](c2mir.svg)

The C to MIR compiler can be used as a library and made a part of your code. It can also be used as a separate program, like a usual C compiler. To detect compilation by the C-to-MIR compiler, use the compiler-specific macros `__mirc__` and `__MIRC__`, which are defined as 1.
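For example, code can test one of these macros at preprocessing time. This is a minimal sketch: the function name `built_by_c2m` is hypothetical, and it assumes only what is stated above (c2m predefines `__mirc__` as 1) plus the usual behavior that other compilers leave the macro undefined.

```c
/* Hypothetical helper: returns 1 when this translation unit is compiled by
   the C-to-MIR compiler, which predefines __mirc__ (and __MIRC__) as 1,
   and 0 when compiled by a compiler that does not define the macro. */
int built_by_c2m (void) {
#if defined(__mirc__) && __mirc__ == 1
  return 1;
#else
  return 0;
#endif
}
```

The same `#if defined(__mirc__)` test can guard code paths that depend on features c2m does not provide, such as the optional C11 features listed above.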
Additional information about the C-to-MIR compiler can be found in [this blog post](https://developers.redhat.com/blog/2021/04/27/the-mir-c-interpreter-and-just-in-time-jit-compiler).

## C to MIR compiler as a usual C compiler

The project makefile builds the program `c2m`, which can compile C and MIR files given on the command line and either produce MIR code or execute it:

* The compiler `c2m` has options `-E`, `-c`, `-S`, and `-o`, like other C compilers:
  * `-E` stops the compiler after preprocessing and outputs the preprocessed file to the standard output or to the file given after option `-o`
  * `-S` stops the compiler after generation of MIR code and outputs the MIR *textual* representation of C source files and of given binary MIR files with suffix `.bmir`
  * `-c` also stops the compiler after generation of MIR code and outputs the MIR *binary* representation of C source files and of given textual MIR files with suffix `.mir`
  * Output files for options `-S` and `-c` are created in the current directory and named after the source files, using the suffix `.mir` and `.bmir` respectively
  * If you have one source file, you can also use option `-o` to set the output file
* You can give C source on the command line by using option `-s` followed by a string containing the C source
* You can read C source from the standard input by using option `-i`
* If options `-E`, `-c`, or `-S` are not given, all generated MIR code is linked and checked for the presence of function `main`. The whole generated code is output as the binary MIR file `a.bmir` or as the file given by option `-o`
* Instead of outputting the linked file, you can execute the program by using options `-ei`, `-eg`, or `-el`:
  * `-ei` means executing the code with the MIR interpreter
  * `-eg` means executing machine code generated by the MIR-generator. The MIR-generator processes all MIR code before execution
  * `-el` means lazy code generation. It is analogous to `-eg`, but the code of a function is generated on the first call of the function.
    So machine code is never generated for functions that are never called
  * Command line arguments after option `-ei`, `-eg`, or `-el` are not processed by the C to MIR compiler. Such arguments are passed to the generated and executed MIR program
  * The executed program can use functions from libraries `libc` and `libm`. They are always available
  * Option `-lxxx` makes library `libxxx` available for the program execution
  * Option `-Lxxx` adds library directory `xxx` to the search for libraries given by options `-lxxx`. The search starts with the standard library directory and continues in the directories given by preceding `-L` options, in their order on the command line
* To generate a stand-alone executable, see the description of utility `b2ctab` in directory `mir-utils`
* Options `-D` and `-U` are analogous to the ones used in other C compilers for macro manipulation on the command line
* Option `-I` for adding an include directory is analogous to other C compilers
* Option `-fpreprocessed` means skipping the preprocessor for C files
* Option `-fsyntax-only` means stopping after parsing and semantic checking of C files, without MIR code generation
* Option `-w` switches off reporting of all warnings
* Option `-pedantic` is used for stricter diagnostics about C standard conformance. It might be useful, as C2MIR implements some GCC extensions of C
* Option `-O` sets the MIR-generator optimization level. The optimization levels are described in the documentation for the MIR generator API function `MIR_gen_set_optimize_level`
* Option `-dg` is used for debugging the MIR-generator. It dumps debug information about the MIR-generator's work to `stderr`
* Besides C files, MIR textual files with suffix `.mir` and MIR binary files with suffix `.bmir` can be given on the command line.
  In this case, these MIR files are read and added to the generated MIR code
* Simple examples of compiler usage and execution of a C program:

  ```
  c2m -c part1.c && c2m -S part2.c && c2m part1.bmir part2.mir -eg # variant 1
  c2m part1.c part2.c && c2m a.bmir -eg # variant 2
  c2m part1.c part2.c -eg # variant 3
  ```

## C to MIR compiler as a library

The compiler can be used as a library and made a part of your program. It can take C code from a file or from memory. All compiler code is contained in file `c2mir.c`. Its interface is described in file `c2mir.h`:

* Function `c2mir_init (MIR_context_t ctx)` initializes the compiler to generate MIR code in context `ctx`
* Function `c2mir_finish (MIR_context_t ctx)` finishes the compiler's work in context `ctx`. It frees some common memory used by the compiler in context `ctx`
* Function `c2mir_compile (MIR_context_t ctx, struct c2mir_options *ops, int (*getc_func) (void *), void *getc_data, const char *source_name, FILE *output_file)` compiles one C code file. The function returns true (non-zero) on successful compilation. It frees all memory used to compile the file, so you can compile many files in the same context without growth of program memory. Function `getc_func` provides access to the compiled C code, which can be in a file or in memory; it receives `getc_data` as its argument on every call. The name of the source file used for diagnostics is given by parameter `source_name`. Parameter `output_file` is analogous to the one given by option `-o` of `c2m`. Parameter `ops` is a pointer to a structure defining the compiler options:
  * Member `message_file` defines where to report errors and warnings.
    If its value is `NULL`, nothing is reported
  * Members `macro_commands_num` and `macro_commands` direct the compiler like options `-D` and `-U` of `c2m`
  * Members `include_dirs_num` and `include_dirs` direct the compiler like option `-I`
  * Members `debug_p`, `verbose_p`, `ignore_warnings_p`, `no_prepro_p`, `prepro_only_p`, `syntax_only_p`, `pedantic_p`, `asm_p`, and `object_p` direct the compiler like options `-d`, `-v`, `-w`, `-fpreprocessed`, `-E`, `-fsyntax-only`, `-pedantic`, `-S`, and `-c` of `c2m`, respectively. If all of `prepro_only_p`, `syntax_only_p`, `asm_p`, and `object_p` are zero, no output files are produced; only the generated MIR module is kept in the memory of context `ctx`
  * Member `module_num` defines the index in the generated MIR module name (if there is any)
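When the C source is held in memory rather than in a file, `getc_func` and `getc_data` can wrap a string. The following is a minimal sketch, not code from `c2mir.c`: the names `str_source` and `str_source_getc` are hypothetical, and it assumes the compiler treats an `EOF` return (as from `fgetc`) as the end of the source.

```c
#include <stddef.h> /* size_t */
#include <stdio.h>  /* EOF */

/* Hypothetical reader state for C source held in a NUL-terminated string. */
struct str_source {
  const char *text; /* the C source code */
  size_t pos;       /* index of the next character to return */
};

/* Has the type int (*) (void *) expected for getc_func: each call receives
   getc_data (here a struct str_source *) and returns the next character as
   an unsigned char converted to int, or EOF once the terminating NUL is reached. */
int str_source_getc (void *data) {
  struct str_source *src = data;
  unsigned char c = (unsigned char) src->text[src->pos];

  if (c == '\0') return EOF; /* pos is not advanced, so later calls also return EOF */
  src->pos++;
  return c;
}
```

With such a reader, `str_source_getc` would be passed as `getc_func` and a pointer to an initialized `struct str_source` as `getc_data`. For source stored in a file, a wrapper that calls `fgetc` on a `FILE *` passed through `getc_data` follows the same pattern.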