Using LLVM for code generation: Part 2

So, time to write code to write code. First some #includes and global declarations.

# include <llvm/IR/Module.h>
# include <llvm/IR/IRBuilder.h>
# include <llvm/IR/Verifier.h>
# include <llvm/Support/TargetSelect.h>
# include <llvm/ExecutionEngine/ExecutionEngine.h>

struct App {
    llvm::LLVMContext  context;
    llvm::IRBuilder<>  builder;
    llvm::Module      *module;

    struct Types {
        llvm::Type *d;
        llvm::Type *i32;
    } types;

    App() : builder(context) {
        llvm::InitializeNativeTarget();

        types.d   = llvm::Type::getDoubleTy(context);
        types.i32 = llvm::Type::getInt32Ty (context);

        module = new llvm::Module("generate-function-mac", context);
        module->setDataLayout  (llvm::EngineBuilder().selectTarget()->createDataLayout());
        module->setTargetTriple(llvm::EngineBuilder().selectTarget()->getTargetTriple().getTriple());
    }
} *app;

Let’s look at each step in turn.

# include <llvm/IR/Module.h>
# include <llvm/IR/IRBuilder.h>
# include <llvm/IR/Verifier.h>
# include <llvm/Support/TargetSelect.h>
# include <llvm/ExecutionEngine/ExecutionEngine.h>

The doxygen documentation often gives only the file name of the header for a call, without the path, so you have to search the LLVM installation for the location of the include file. I recommend using the style above, with e.g. llvm/IR/ included, instead of giving the bare file name and relying on a -I switch to the compiler. There are 30+ header file names that occur two or more times. The name Module.h occurs three times.

struct App {}

For convenience, all global data is put into a struct App, which will be instantiated at the start of main().

    llvm::LLVMContext  context;
    llvm::IRBuilder<>  builder;
    llvm::Module      *module;

The context is something we “just need”. The builder is the object that we use to create the LLVM IR code. The module is where functions and global variables live. We can think of it as an in-memory object file for LLVM IR.

    struct Types {
        llvm::Type *d;
        llvm::Type *i32;
    } types;

Data types are represented as Type objects. Type objects are used when we want to define a local or global variable, or a function prototype. The types struct is just a cache for two of the most used types: 32-bit integers (LLVM integer types carry no signedness) and double precision floats. (The same Type object can be used over and over again; there is no need to create a new Type object for each use.)

    App() : builder(context) {
        llvm::InitializeNativeTarget();
    }

The App constructor starts by creating the IRBuilder object, using the context object. The LLVMContext constructor takes no arguments, so we need not mention it here. Then we initialize the library that knows about the “native” target, i.e. x86 if you are running the program on an x86 machine. LLVM defaults to generating code for the architecture that the program is running on, but can also generate for other architectures. Note that LLVM can be quite unforgiving; if we forget to initialize a library, we may get a crash, not an error message.

        types.d   = llvm::Type::getDoubleTy(context);
        types.i32 = llvm::Type::getInt32Ty (context);

Cache pointers to commonly used types.

        module = new llvm::Module("generate-function-mac", context);
        module->setDataLayout  (llvm::EngineBuilder().selectTarget()->createDataLayout());

Create the module, given the context. Then tell it about various parameters of the target architecture, such as pointer and integer sizes. This will generate a line in the *.ll file:

        target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"

        module->setTargetTriple(llvm::EngineBuilder().selectTarget()->getTargetTriple().getTriple());

The target triple is a string such as “i686-pc-linux-gnu”. This will generate a line in the *.ll file:

        target triple = "i686-pc-linux-gnu"

End of constructor.

} *app;

The app pointer will get initialized first thing in main().

All this has been generic stuff, usable in most applications. The next thing is the C++ function that generates the “mac” function. What we want to generate is the equivalent of

    double mac(double a, double b, double c)
    {
        return a+b*c;
    }

In LLVM IR source code form, the code we want looks like this:

define double @mac(double %a, double %b, double %c) {
  entry:
    %prod = fmul double %b, %c
    %rval = fadd double %a, %prod
    ret double %rval
}

Here is the complete C++ function:

static llvm::Function *generate_mac()
{
    auto  types        = { app->types.d, app->types.d, app->types.d };

    auto *signature    = llvm::FunctionType::get(app->types.d, types, false);

    auto *func         = llvm::Function::Create(signature,
                                                llvm::Function::ExternalLinkage,
                                                "mac",
                                                app->module);

    auto            it = func->arg_begin();    //--- llvm::ilist_iterator<llvm::Argument>
    llvm::Argument *a  = &*it++;               //--- An Argument is also a Value
    llvm::Argument *b  = &*it++;
    llvm::Argument *c  = &*it++;

    a->setName("a");
    b->setName("b");
    c->setName("c");

    auto *bb = llvm::BasicBlock::Create(app->context, "entry", func);
    app->builder.SetInsertPoint(bb);

    llvm::Value *prod = app->builder.CreateFMul(b, c, "prod");
    llvm::Value *rval = app->builder.CreateFAdd(a, prod, "rval");
    app->builder.CreateRet(rval);

    if (verifyFunction(*func, &llvm::errs()))
        abort();

    return func;
}

Let’s walk through this function step by step:

static llvm::Function *generate_mac() {
    auto  types        = { app->types.d, app->types.d, app->types.d };
    auto *signature    = llvm::FunctionType::get(app->types.d, types, false);

First create an initializer_list with one Type per function argument. Then create the function prototype by specifying the return type, the argument types, and whether there are any varargs arguments (think: printf(char *, …)) at the end.

A note on style. I usually prefer to write types explicitly, instead of using auto, but I have succumbed here in order to make the lines a little shorter.

    auto *func         = llvm::Function::Create(signature,
                                                llvm::Function::ExternalLinkage,
                                                "mac",
                                                app->module);

Create the function “mac” with the given signature, belonging to the given module, and visible to functions in other compilation units (non-static in C).

    auto            it = func->arg_begin();    //--- llvm::ilist_iterator<llvm::Argument>
    llvm::Argument *a  = &*it++;               //--- An Argument is also a Value
    llvm::Argument *b  = &*it++;
    llvm::Argument *c  = &*it++;

    a->setName("a");
    b->setName("b");
    c->setName("c");

We need handles to the three arguments, so that we can refer to them. We get them by iterating over the function’s list of Argument objects. Then, to make the generated code more readable, we assign names to the arguments. These names will be used later, when LLVM prints a source code representation of the code that we are going to generate.

    auto *bb = llvm::BasicBlock::Create(app->context, "entry", func);
    app->builder.SetInsertPoint(bb);

A basic block is the smallest (innermost) container of instructions. It is a sequence of instructions where execution flow can enter only at the first instruction, and where only the last instruction may be a jump of some kind, including a return instruction. In compiler theory, a basic block need not contain a jump at the end, as long as there is another basic block that follows, but in LLVM, there must be a jump. The “entry” string is a label that can be used to jump to the basic block. It is not used in this example. The third argument is the owner of the basic block, in our case the function.

    llvm::Value *prod = app->builder.CreateFMul(b, c, "prod");
    llvm::Value *rval = app->builder.CreateFAdd(a, prod, "rval");
    app->builder.CreateRet(rval);

Finally, we get to generate instructions. The IRBuilder is the class that generates the instructions, and it knows that it should put them into our (only) basic block. This code is rather obvious, except that we get pointers to something called a Value. We might have expected to get something like Instruction. I think of a value as being both the value computed, and the instruction(s) that computed it.

    if (verifyFunction(*func, &llvm::errs()))
        abort();

This call checks the code for errors. It returns true on error (sigh). Error messages are printed on llvm::errs(), effectively on stderr. Never ever skip this.

Finally, main():

int main()
{
    ::app = new App;
    generate_mac();
    app->module->dump();
    return 0;
}

Now, compile and link this program to a binary called generate.elf. The .elf extension is something I use to distinguish executables from other files. It also simplifies Makefiles.

As an exercise left to the reader, write a main program to call mac(). Generate, compile, link and run like this:

$ ./generate.elf >& mac.ll
$ clang -o test-mac.elf main.c mac.ll
$ ./test-mac.elf
Result is 16.745

Complete source listing

# include <llvm/IR/Module.h>
# include <llvm/IR/IRBuilder.h>
# include <llvm/IR/Verifier.h>
# include <llvm/Support/TargetSelect.h>
# include <llvm/ExecutionEngine/ExecutionEngine.h>

struct App {
    llvm::LLVMContext  context;
    llvm::IRBuilder<>  builder;
    llvm::Module      *module;

    struct Types {
        llvm::Type *d;
        llvm::Type *i32;
    } types;

    App() : builder(context) {
        llvm::InitializeNativeTarget();

        types.d   = llvm::Type::getDoubleTy(context);
        types.i32 = llvm::Type::getInt32Ty (context);

        module = new llvm::Module("generate-function-mac", context);
        module->setDataLayout  (llvm::EngineBuilder().selectTarget()->createDataLayout());
        module->setTargetTriple(llvm::EngineBuilder().selectTarget()->getTargetTriple().getTriple());
    }
} *app;

static llvm::Function *generate_mac()
{
    auto  types        = { app->types.d, app->types.d, app->types.d };

    auto *signature    = llvm::FunctionType::get(app->types.d, types, false);

    auto *func         = llvm::Function::Create(signature,
                                                llvm::Function::ExternalLinkage,
                                                "mac",
                                                app->module);

    auto *bb           = llvm::BasicBlock::Create(app->context, "entry", func);

    auto            it = func->arg_begin();    //--- llvm::ilist_iterator<llvm::Argument>
    llvm::Argument *a  = &*it++;               //--- An Argument is also a Value
    llvm::Argument *b  = &*it++;
    llvm::Argument *c  = &*it++;

    a->setName("a");
    b->setName("b");
    c->setName("c");

    app->builder.SetInsertPoint(bb);
    llvm::Value *prod = app->builder.CreateFMul(b, c, "prod");
    llvm::Value *rval = app->builder.CreateFAdd(a, prod, "rval");
    app->builder.CreateRet(rval);

    if (verifyFunction(*func, &llvm::errs()))
        abort();

    return func;
}

typedef double (MAC)(double, double, double);

int main()
{
    ::app = new App;
    generate_mac();
    app->module->dump();
    return 0;
}

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090
