Tuesday, May 11 2021

banach-space / llvm-tutor, Hacker News


                    

        


Example LLVM passes – based on LLVM 9

llvm-tutor is a collection of self-contained reference LLVM passes. It’s a tutorial that targets novice and aspiring LLVM developers. Key features:

  • Complete – includes CMake build scripts, LIT tests and CI set-up
  • Out of source – builds against a binary LLVM installation (no need to build LLVM from sources)
  • Modern – based on the latest version of LLVM (and updated with every release)

The source files contain comments that will guide you through the implementation and the LIT tests verify that each pass works as expected. This document explains how to get started.


The HelloWorld pass from HelloWorld.cpp is a self-contained reference example. The corresponding CMakeLists.txt implements the minimum set-up for an out-of-source pass.

For each function in a module, HelloWorld prints its name and the number of arguments that it takes. You can build it like this:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    mkdir build
    cd build
    cmake -DLT_LLVM_INSTALL_DIR=$LLVM_DIR <source/dir/llvm/tutor>/HelloWorld/
    make

Before you can test it, you need to prepare an input file:

    # Generate an LLVM test file
    $LLVM_DIR/bin/clang -S -emit-llvm <source/dir/llvm/tutor>/inputs/input_for_hello.c -o input_for_hello.ll

Finally, run HelloWorld with opt:

    # Run the pass on the LLVM file
    $LLVM_DIR/bin/opt -load-pass-plugin libHelloWorld.dylib -hello-world -disable-output input_for_hello.ll
    # The expected output
    Visiting: foo (takes 1 args)
    Visiting: bar (takes 2 args)
    Visiting: fez (takes 3 args)
    Visiting: main (takes 2 args)

The HelloWorld pass does not modify the input module. The -disable-output flag is used to prevent opt from printing the output bitcode file.
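The command above loads libHelloWorld.dylib, which is the macOS file name, while the Ubuntu examples in this document load plugins with a .so suffix. A small helper can pick the right name for the current platform. This is only a sketch: `plugin_filename` is a hypothetical helper that is not part of llvm-tutor, and it assumes the `lib<PassName>` naming seen in the commands in this document.

```python
import platform


def plugin_filename(pass_name: str, system: str = "") -> str:
    """Return the shared-library file name for a pass plugin.

    `system` defaults to the current platform as reported by
    platform.system() ("Darwin" on macOS, "Linux" on Linux).
    Hypothetical helper, not part of llvm-tutor.
    """
    system = system or platform.system()
    suffix = "dylib" if system == "Darwin" else "so"
    return f"lib{pass_name}.{suffix}"


if __name__ == "__main__":
    # Prints the plugin name for the machine this script runs on.
    print(plugin_filename("HelloWorld"))
```

The resulting name can then be passed to opt in place of the hard-coded library name.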

Platform Support And Requirements

This project has been tested on Linux 18.04 and Mac OS X 10.4. In order to build llvm-tutor you will need:

  • LLVM 9
  • C++ compiler that supports C++14
  • CMake 3.4.3 or higher

In order to run the passes, you will need:

  • clang-9 (to generate input LLVM files)
  • opt (to run the passes)

There are additional requirements for tests (these will be satisfied by installing LLVM-9):

  • lit (aka llvm-lit, LLVM tool for executing the tests)
  • FileCheck (LIT requirement, it’s used to check whether tests generate the expected output)

Installing LLVM-9 on Mac OS X

On Darwin you can install LLVM 9 with Homebrew:

This will install all the required header files, libraries and tools in /usr/local/opt/llvm/.

Installing LLVM-9 on Ubuntu

On Ubuntu Bionic, you can install modern LLVM from the official repository:

    wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
    sudo apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-9.0 main"
    sudo apt-get update
    sudo apt-get install -y llvm-9 llvm-9-dev clang-9 llvm-9-tools

This will install all the required header files, libraries and tools in /usr/lib/llvm-9/.

Building LLVM-9 From Sources

Building from sources can be slow and tricky to debug. It is not necessary, but might be your preferred way of obtaining LLVM-9. The following steps will work on Linux and Mac OS X:

    git clone https://github.com/llvm/llvm-project.git
    cd llvm-project
    git checkout release/9.x
    mkdir build
    cd build
    cmake -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 <llvm-project/root/dir>/llvm/
    cmake --build .

For more details read the official documentation.

You can build llvm-tutor (and all the provided passes) as follows:

    cd <build/dir>
    cmake -DLT_LLVM_INSTALL_DIR=<installation/dir/of/llvm/9> <source/dir/llvm/tutor>
    make

The LT_LLVM_INSTALL_DIR variable should be set to the root of either the installation or build directory of LLVM 9. It is used to locate the corresponding LLVMConfig.cmake script that is used to set the include and library paths.

In order to run the tests, you need to install llvm-lit (aka lit). It’s not bundled with LLVM 9 packages, but you can install it with pip:

    # Install lit - note that this installs lit globally
    pip install lit

Running the tests is as simple as:

Voilà! You should see all tests passing.

  • HelloWorld – prints the functions in the input module and the number of arguments for each
  • StaticCallCounter – counts direct function calls at compile time (only static calls, pure analysis pass)
  • DynamicCallCounter – counts direct function calls at run-time (analysis instrumentation pass)
  • MBASub – code transformation for integer sub instructions (transformation pass, parametrisable)
  • MBAAdd – code transformation for 8-bit integer add instructions (transformation pass, parametrisable)
  • RIV – finds reachable integer values for each basic block (analysis pass)
  • DuplicateBB – duplicates basic blocks, requires RIV analysis results (transformation pass, parametrisable)

Once you’ve built this project, you can experiment with every pass separately. It is assumed that you have clang and opt available in your PATH. All passes work with LLVM files. You can generate one like this:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    # Textual form
    $LLVM_DIR/bin/clang -emit-llvm input.c -S -o out.ll
    # Binary/bit-code form (-c is needed, otherwise clang tries to link)
    $LLVM_DIR/bin/clang -emit-llvm input.c -c -o out.bc

It doesn’t matter whether you choose the textual or binary form, but the former is more human-friendly. All passes, except for HelloWorld, are described below.

Count Compile Time Function Calls (StaticCallCounter)

StaticCallCounter will count the number of function calls in the input LLVM file that are visible during the compilation (i.e. if a function is called within a loop, that counts as one call). Only direct function calls are considered (TODO: Expand).

    export LLVM_DIR=<installation/dir/of/llvm/9>
    # Generate an LLVM file to analyze
    $LLVM_DIR/bin/clang -emit-llvm -c <source_dir>/inputs/input_for_cc.c -o input_for_cc.bc
    # Run the pass through opt
    $LLVM_DIR/bin/opt -load <build_dir>/lib/libStaticCallCounter.dylib -static-cc -analyze input_for_cc.bc

The static executable is a command line wrapper that allows you to run StaticCallCounter without the need for opt:

    <build_dir>/bin/static input_for_cc.bc
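To make "a call within a loop counts as one call" concrete, here is a rough analogy in Python rather than LLVM IR. It is not how StaticCallCounter is implemented: it uses Python's standard `ast` module to count call sites in source text, and treats a call through a plain name as the counterpart of a direct call. The `SOURCE` snippet and the function names in it are made up for illustration.

```python
import ast
from collections import Counter

# Made-up input program: foo is called from a loop and from bar.
SOURCE = """
def foo():
    pass

def bar():
    foo()

def main():
    for _ in range(10):
        foo()
    bar()
"""


def count_static_calls(source: str) -> Counter:
    """Count call sites whose callee is a plain name (a 'direct' call).

    Each call site is counted once, no matter how often it would execute.
    """
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            counts[node.func.id] += 1
    return counts


if __name__ == "__main__":
    for name, n in sorted(count_static_calls(SOURCE).items()):
        print(f"{name}: {n}")
```

Here `foo` is reported twice (one site in `bar`, one site inside the loop in `main`), even though the loop would execute its call ten times.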

Count Run-Time Function Calls (DynamicCallCounter)

DynamicCallCounter will count the number of run-time function calls. It does so by instrumenting the input LLVM file – it injects call-counting code that is executed every time a function is called.

Although the primary goal of this pass is to analyze function calls, it also modifies the input file. Therefore it is a transformation pass. You can test it with one of the provided examples, e.g.:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    # Generate an LLVM file to analyze
    $LLVM_DIR/bin/clang -emit-llvm -c <source_dir>/inputs/input_for_cc.c -o input_for_cc.bc
    # Instrument the input file first
    <build_dir>/bin/dynamic -dynamic input_for_cc.bc -o instrumented_bin
    # Now run the instrumented binary
    ./instrumented_bin
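The idea of injecting call-counting code can be sketched in Python with a decorator. This is an analogy only, not the mechanism DynamicCallCounter uses on LLVM IR: the decorator plays the role of the injected code, and the functions `foo`, `bar` and `run` are made up for illustration.

```python
import functools
from collections import Counter

# Global table updated by the injected counting code.
call_counts = Counter()


def counted(fn):
    """Wrap fn so that every invocation bumps its entry in call_counts."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        call_counts[fn.__name__] += 1
        return fn(*args, **kwargs)
    return wrapper


@counted
def foo():
    pass


@counted
def bar():
    foo()


@counted
def run():
    for _ in range(10):
        foo()
    bar()


if __name__ == "__main__":
    run()
    for name, n in sorted(call_counts.items()):
        print(f"{name}: {n}")
```

After one call to `run`, `foo` has been counted eleven times (ten from the loop plus one via `bar`), which is the run-time view of a program that contains only two call sites for `foo`.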

Mixed Boolean Arithmetic Transformations

These passes implement mixed boolean arithmetic transformations. Similar transformations are often used in code obfuscation (you may also know them from Hacker’s Delight) and are a great illustration of what LLVM passes can be used for and how.

MBASub

The MBASub pass implements this rather basic expression:

    a - b == (a + ~b) + 1

Basically, it replaces all instances of integer sub according to the above formula. The corresponding LIT tests verify that both the formula and the implementation are correct. You can run this pass as follows:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S inputs/input_for_mba_sub.c -o input_for_sub.ll
    $LLVM_DIR/bin/opt -load <build_dir>/lib/libMBASub.so -mba-sub input_for_sub.ll -o out.ll
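The identity a - b == (a + ~b) + 1 follows from two's complement arithmetic, where ~b equals -b - 1. The sketch below checks it outside of LLVM, in plain Python, by emulating fixed-width unsigned integers with a bit mask. The function names and the default width of 32 bits are choices made for this example, not something taken from the pass.

```python
def mba_sub(a: int, b: int, bits: int = 32) -> int:
    """Compute a - b as (a + ~b) + 1 on `bits`-wide unsigned integers."""
    mask = (1 << bits) - 1
    not_b = ~b & mask  # bitwise NOT truncated to the chosen width
    return (a + not_b + 1) & mask


def check_mba_sub(bits: int = 8) -> bool:
    """Exhaustively compare the identity with wrap-around subtraction."""
    mask = (1 << bits) - 1
    return all(
        mba_sub(a, b, bits) == ((a - b) & mask)
        for a in range(1 << bits)
        for b in range(1 << bits)
    )


if __name__ == "__main__":
    print(mba_sub(5, 3))     # 2
    print(check_mba_sub())   # True
```

The exhaustive check runs over all 8-bit pairs to keep it fast; the same argument holds for any width because it relies only on wrap-around arithmetic.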

MBAAdd

The MBAAdd pass implements a slightly more involved formula that is only valid for 8-bit integers:

    a + b == (((a ^ b) + 2 * (a & b)) * 39 + 23) * 151 + 111

Similarly to MBASub, it replaces all instances of integer add according to the above identity, but only for 8-bit integers. The LIT tests verify that both the formula and the implementation are correct. You can run MBAAdd like this:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -O1 -emit-llvm -S inputs/input_for_mba.c -o input_for_mba.ll
    $LLVM_DIR/bin/opt -load <build_dir>/lib/libMBAAdd.so -mba-add input_for_mba.ll -o out.ll

You can also specify the level of obfuscation on a scale of 0.0 to 1.0, with 0 corresponding to no obfuscation and 1 meaning that all add instructions are to be replaced with (((a ^ b) + 2 * (a & b)) * 39 + 23) * 151 + 111, e.g.:

    $LLVM_DIR/bin/opt -load <build_dir>/lib/libMBAAdd.so -mba-add -mba-ratio=0.3 input_for_mba.ll -o out.ll
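Why is the MBAAdd identity a + b == (((a ^ b) + 2 * (a & b)) * 39 + 23) * 151 + 111 valid only for 8-bit integers? The inner term (a ^ b) + 2 * (a & b) is just a + b, and the constants cancel modulo 256: 39 * 151 = 5889 = 23 * 256 + 1, and 23 * 151 + 111 = 3584 = 14 * 256. The sketch below checks this in plain Python, emulating 8-bit wrap-around with a mask; the function names are chosen for this example only.

```python
MASK = 0xFF  # 8-bit wrap-around


def mba_add_8bit(a: int, b: int) -> int:
    """Evaluate the MBAAdd expression with every step truncated to 8 bits."""
    inner = ((a ^ b) + 2 * (a & b)) & MASK   # equals (a + b) mod 256
    scaled = (inner * 39 + 23) & MASK
    return (scaled * 151 + 111) & MASK


def check_mba_add() -> bool:
    """Compare the expression with 8-bit addition for all 65536 pairs."""
    return all(
        mba_add_8bit(a, b) == ((a + b) & MASK)
        for a in range(256)
        for b in range(256)
    )


if __name__ == "__main__":
    print((39 * 151) % 256)        # 1: 151 undoes the multiplication by 39
    print((23 * 151 + 111) % 256)  # 0: the additive constants cancel
    print(check_mba_add())         # True
```

With a wider integer type the two congruences no longer hold, which is why the pass restricts the substitution to 8-bit add instructions.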

Reachable Integer Values (RIV)

For each basic block in a module, RIV calculates the reachable integer values (i.e. values that can be used in the particular basic block). A few LIT tests verify that the result is correct. You can run this pass as follows:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S inputs/input_for_riv.c -o input_for_riv.ll
    $LLVM_DIR/bin/opt -load <build_dir>/lib/libRIV.so -riv input_for_riv.ll

Note that this pass, unlike previous passes, will produce information only about the IR representation of the original module. It won’t be very useful if trying to understand the original C or C++ input file.
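One way to picture "values that can be used in a basic block" is through dominance: a value defined in block D is usable in block B if every path from the entry to B passes through D. The sketch below works on a toy control-flow graph in plain Python. It assumes that the reachable set consists of function arguments, global values and values defined in strictly dominating blocks; that reading, the toy graph and all names in it are assumptions made for illustration, not a description of how RIV is implemented.

```python
def dominators(cfg, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[p]
    over all predecessors p. Assumes every block is reachable from entry
    and every successor is also a key of cfg."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            new = set(nodes)
            for p in preds[n]:
                new &= dom[p]
            new |= {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reachable_values(cfg, entry, defs, args=(), global_values=()):
    """Values usable in each block under the assumption stated above."""
    dom = dominators(cfg, entry)
    base = set(args) | set(global_values)
    result = {}
    for block in cfg:
        values = set(base)
        for d in dom[block] - {block}:  # strict dominators only
            values |= set(defs.get(d, ()))
        result[block] = values
    return result


# Toy diamond-shaped CFG: entry branches to then/else, both join in exit.
CFG = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
DEFS = {"entry": ["%a"], "then": ["%b"], "else": ["%c"]}

if __name__ == "__main__":
    riv = reachable_values(CFG, "entry", DEFS, args=["%arg0"])
    for block in CFG:
        print(block, sorted(riv[block]))
```

In this toy graph `%b` and `%c` are not usable in `exit`, because neither `then` nor `else` lies on every path to it, while `%a` and the argument are usable everywhere below `entry`.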

Duplicate Basic Blocks (DuplicateBB)

This pass will duplicate all basic blocks in a module, with the exception of basic blocks for which there are no reachable integer values (identified through the RIV pass). An example of such a basic block is the entry block in a function that:

  • takes no arguments and
  • is embedded in a module that defines no global values.

This pass depends on the RIV pass, hence you need to load it too in order for DuplicateBB to work:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S inputs/input_for_duplicate_bb.c -o input_for_duplicate_bb.ll
    $LLVM_DIR/bin/opt -load <build_dir>/lib/libRIV.so -load <build_dir>/lib/libDuplicateBB.so -riv input_for_duplicate_bb.ll

Basic blocks are duplicated by inserting an if-then-else construct and cloning all the instructions (with the exception of PHI nodes) into the new blocks.

Before running a debugger, you may want to analyze the output from LLVM_DEBUG and STATISTIC macros. For example, for MBAAdd:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S -O1 inputs/input_for_mba.c -o input_for_mba.ll
    $LLVM_DIR/bin/opt -load-pass-plugin <build_dir>/lib/libMBAAdd.dylib -passes=mba-add input_for_mba.ll -debug-only=mba-add -stats -o out.ll

Note the -debug-only=mba-add and -stats flags in the command line – that’s what enables the following output:

      %12 = add i8 %1, %0 -> <badref> = add i8 111, %11
      %20 = add i8 %12, %2 -> <badref> = add i8 111, %19
      %28 = add i8 %20, %3 -> <badref> = add i8 111, %27
    ===-------------------------------------------------------------------------===
                              ... Statistics Collected ...
    ===-------------------------------------------------------------------------===

    3 mba-add - The # of substituted instructions

As you can see, you get a nice summary from MBAAdd. In many cases this will be sufficient to understand what might be going wrong.

For trickier issues just use a debugger. Below I demonstrate how to debug MBAAdd, more specifically how to set up a breakpoint on entry to MBAAdd::run. Hopefully that will be sufficient for you to start.

Mac OS X

The default debugger on OS X is LLDB. You will normally use it like this:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S -O1 inputs/input_for_mba.c -o input_for_mba.ll
    lldb -- $LLVM_DIR/bin/opt -load-pass-plugin <build_dir>/lib/libMBAAdd.dylib -passes=mba-add input_for_mba.ll -o out.ll
    (lldb) breakpoint set --name MBAAdd::run
    (lldb) process launch

or, equivalently, by using LLDB’s aliases:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S -O1 inputs/input_for_mba.c -o input_for_mba.ll
    lldb -- $LLVM_DIR/bin/opt -load-pass-plugin <build_dir>/lib/libMBAAdd.dylib -passes=mba-add input_for_mba.ll -o out.ll
    (lldb) b MBAAdd::run
    (lldb) r

At this point, LLDB should break at the entry to MBAAdd::run.

Ubuntu

On most Linux systems, GDB is the most popular debugger. A typical session will look like this:

    export LLVM_DIR=<installation/dir/of/llvm/9>
    $LLVM_DIR/bin/clang -emit-llvm -S -O1 inputs/input_for_mba.c -o input_for_mba.ll
    gdb --args $LLVM_DIR/bin/opt -load-pass-plugin <build_dir>/lib/libMBAAdd.so -passes=mba-add input_for_mba.ll -o out.ll
    (gdb) b MBAAdd.cpp:MBAAdd::run
    (gdb) r

At this point, GDB should break at the entry to MBAAdd::run.

This is first and foremost a community effort. This project wouldn’t be possible without the amazing LLVM online documentation, the plethora of great comments in the source code, and the llvm-dev mailing list. Thank you!

It goes without saying that there are plenty of great presentations on YouTube, blog posts and GitHub projects that cover similar subjects. I’ve learnt a great deal from them – thank you all for sharing! There’s one presentation / tutorial that has been particularly important in my journey as an aspiring LLVM developer and that helped to democratise out-of-source pass development:

  • “Building, Testing and Debugging a Simple out-of-tree LLVM Pass” Serge Guelton, Adrien Guinet (slides, video)

Adrien and Serge came up with some great, illustrative and self-contained examples that are well suited to learning and tutoring LLVM pass development. You’ll notice that there are similar transformation and analysis passes available in this project. The implementations available here are based on the latest release of LLVM’s API and have been refactored and documented to reflect what I (aka banach-space) found most challenging while studying them.

The MIT License (MIT)

Copyright (c) 2019 Andrzej Warzyński

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

  
