
ELF Binaries and Relocation Entries, Hacker News


Recently I have been working on getting the OpenRISC glibc port ready for upstreaming. Part of this work has been to run the glibc testsuite and get the tests to pass. The glibc testsuite has a comprehensive set of linker and runtime relocation tests.

In order to fix issues with tests I had to learn more than I did before about ELF Relocations, Thread Local Storage and the binutils linker implementation in BFD. There is a lot of documentation available, but it’s a bit hard to follow as it assumes certain knowledge; for example, have a look at the Solaris Linker and Libraries section on relocations. In this article I will try to fill in those gaps.

This will be an illustrated 3-part series covering:

  • ELF Binaries and Relocation Entries
  • Thread Local Storage
  • How Relocations and Thread Local Storage are implemented

All of the examples in this article can be found in my tls-examples project. Please check it out.

On Linux, you can download it and make it with your favorite toolchain. By default it will cross compile using an openrisc toolchain. This can be overridden with the CROSS_COMPILE variable. For example, to build for your current host:

    $ git clone git@github.com:stffrdhrn/tls-examples.git
    $ make CROSS_COMPILE=
    gcc -fpic -c -o tls-gd-dynamic.o tls-gd.c -Wall -O2 -g
    gcc -fpic -c -o nontls-dynamic.o nontls.c -Wall -O2 -g
    ...
    objdump -dr x-static.o > x-static.S
    objdump -dr xy-static.o > xy-static.S

Now we can get started.

ELF Segments and Sections

Before we can talk about relocations we need to talk a bit about what makes up ELF binaries. This is a prerequisite, as relocations and TLS are part of ELF binaries. There are a few basic ELF binary types:

  • Object Files (.o) – produced by a compiler, contain a collection of sections; also called relocatable files.
  • Program – an executable program, contains sections grouped into segments.
  • Shared Objects (.so) – a program library, contains sections grouped into segments.
  • Core Files – a core dump of program memory; these are also ELF binaries.

(Here we will discuss Object Files and Program Files.)

An ELF Object

[Diagram: ELF Object]

The compiler generates object files; these contain sections of binary data and are not executable.

The object file produced by GCC generally contains .rela.text, .text, .data and .bss sections.

  • .rela.text – a list of relocations against the .text section
  • .text – contains compiled program machine code
  • .data – static and non-static initialized variable values
  • .bss – static and non-static uninitialized variables

An ELF Program

[Diagram: ELF Program]

ELF binaries are made of sections and segments.

A segment contains a group of sections and the segment defines how the data should be loaded into memory for program execution.

Each segment is mapped to program memory by the kernel when a process is created. Program files contain most of the same sections as objects but there are some differences.

  • .text – contains executable program code; there is no .rela.text section
  • .got – the global offset table, used to access variables; created at link time and may be populated at runtime.

Looking at ELF binaries (readelf)

The readelf tool can help inspect ELF binaries.

Some examples:

Reading Sections of an Object File

Using the -S option we can read sections from an ELF file. As we can see below we have the .text, .rela.text, .bss and many other sections.

    $ readelf -S tls-le-static.o
    There are 20 section headers, starting at offset 0x604:

    Section Headers:
      [Nr] Name              Type      Flg
      [ 0]                   NULL
      [ 1] .text             PROGBITS  AX
      [ 2] .rela.text        RELA      I
      [ 3] .data             PROGBITS  WA
      [ 4] .bss              NOBITS    WA
      [ 5] .tbss             NOBITS    WAT
      [ 6] .debug_info       PROGBITS
      [ 7] .rela.debug_info  RELA      I
      [ 8] .debug_abbrev     PROGBITS
      [ 9] .debug_aranges    PROGBITS
      [10] .rela.debug_arang RELA      I
      [11] .debug_line       PROGBITS
      [12] .rela.debug_line  RELA      I
      [13] .debug_str        PROGBITS  MS
      [14] .comment          PROGBITS  MS
      [15] .debug_frame      PROGBITS
      [16] .rela.debug_frame RELA      I
      [17] .symtab           SYMTAB
      [18] .strtab           STRTAB
      [19] .shstrtab         STRTAB

Reading Sections of a Program File

Using the -S option on a program file we can also read the sections. The file type does not matter; as long as it is an ELF file we can read the sections. As we can see below there is no longer a .rela.text section, but we have others including the .got section.

    $ readelf -S tls-le-static
    There are 31 section headers, starting at offset 0x32e8fc:

    Section Headers:
      [Nr] Name              Type        Flg
      [ 0]                   NULL
      [ 1] .text             PROGBITS    AX
      [ 2] __libc_freeres_fn PROGBITS    AX
      [ 3] .rodata           PROGBITS    A
      [ 4] __libc_subfreeres PROGBITS    A
      [ 5] __libc_IO_vtables PROGBITS    A
      [ 6] __libc_atexit     PROGBITS    A
      [ 7] .eh_frame         PROGBITS    A
      [ 8] .gcc_except_table PROGBITS    A
      [ 9] .note.ABI-tag     NOTE        A
      [10] .tdata            PROGBITS    WAT
      [11] .tbss             NOBITS      WAT
      [12] .init_array       INIT_ARRAY  WA
      [13] .fini_array       FINI_ARRAY  WA
      [14] .data.rel.ro      PROGBITS    WA
      [15] .data             PROGBITS    WA
      [16] .got              PROGBITS    WA
      [17] .bss              NOBITS      WA
      [18] __libc_freeres_pt NOBITS      WA
      [19] .comment          PROGBITS    MS
      [20] .debug_aranges    PROGBITS
      [21] .debug_info       PROGBITS
      [22] .debug_abbrev     PROGBITS
      [23] .debug_line       PROGBITS
      [24] .debug_frame      PROGBITS
      [25] .debug_str        PROGBITS    MS
      [26] .debug_loc        PROGBITS
      [27] .debug_ranges     PROGBITS
      [28] .symtab           SYMTAB
      [29] .strtab           STRTAB
      [30] .shstrtab         STRTAB
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      p (processor specific)

Reading Segments from a Program File

Using the -l option on a program file we can read the segments. Notice how segments map from file offsets to memory offsets and alignment. The two different LOAD type segments are segregated by read only / execute and read / write. Each section is also mapped to a segment here. As we can see, .text is in the first LOAD segment, which is executable as expected.

    $ readelf -l tls-le-static

    Elf file type is EXEC (Executable file)
    Entry point 0x2104
    There are 5 program headers, starting at offset 52

    Program Headers:
      Type           Flg
      LOAD           R E
      LOAD           RW
      NOTE           R
      TLS            R
      GNU_RELRO      R

     Section to Segment mapping:
      Segment Sections...
       00     .text __libc_freeres_fn .rodata __libc_subfreeres __libc_IO_vtables __libc_atexit .eh_frame .gcc_except_table .note.ABI-tag
       01     .tdata .init_array .fini_array .data.rel.ro .data .got .bss __libc_freeres_ptrs
       02     .note.ABI-tag
       03     .tdata .tbss
       04     .tdata .init_array .fini_array .data.rel.ro

Reading Segments from an Object File

Using the -l option with an object file does not work, as we can see below.

    $ readelf -l tls-le-static.o

    There are no program headers in this file.
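The "no program headers" message comes straight from the ELF file header, which records the file type and how many program and section headers follow. Below is a minimal C sketch, assuming a 32-bit ELF file (as used by OpenRISC) already read into memory; `struct elf_summary`, `read_field` and `parse_elf32_header` are hypothetical names, not part of readelf or any library. The field offsets follow the 52-byte `Elf32_Ehdr` layout from the ELF specification, and the byte order is taken from `EI_DATA` since OpenRISC is big-endian.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants from the ELF specification (glibc's <elf.h> defines the same). */
#define ELFCLASS32  1
#define ELFDATA2LSB 1
#define ELFDATA2MSB 2
#define ET_REL      1   /* relocatable object file (.o) */
#define ET_EXEC     2   /* executable program */
#define ET_DYN      3   /* shared object (.so) */

/* Hypothetical summary of a few Elf32_Ehdr fields. */
struct elf_summary {
    uint16_t e_type;   /* ET_REL, ET_EXEC, ET_DYN, ... */
    uint32_t e_shoff;  /* file offset of the section header table */
    uint16_t e_phnum;  /* number of program headers (segments) */
    uint16_t e_shnum;  /* number of section headers */
};

/* Read an unsigned field of `size` bytes in the file's byte order. */
static uint32_t read_field(const unsigned char *p, size_t size, int big_endian) {
    uint32_t v = 0;
    for (size_t i = 0; i < size; i++)
        v = (v << 8) | p[big_endian ? i : size - 1 - i];
    return v;
}

/* Parse the 52-byte Elf32_Ehdr at the start of buf.
 * Returns 0 on success, -1 if buf is not a 32-bit ELF header. */
static int parse_elf32_header(const unsigned char *buf, size_t len,
                              struct elf_summary *out) {
    if (len < 52 || memcmp(buf, "\x7f" "ELF", 4) != 0 || buf[4] != ELFCLASS32)
        return -1;
    if (buf[5] != ELFDATA2LSB && buf[5] != ELFDATA2MSB)
        return -1;
    int be = (buf[5] == ELFDATA2MSB);
    out->e_type  = (uint16_t)read_field(buf + 16, 2, be);
    out->e_shoff = read_field(buf + 32, 4, be);
    out->e_phnum = (uint16_t)read_field(buf + 44, 2, be);
    out->e_shnum = (uint16_t)read_field(buf + 48, 2, be);
    return 0;
}
```

For an object file such as tls-le-static.o, this would report `e_type == ET_REL` and `e_phnum == 0`, which is why readelf finds no program headers; `e_shnum` and `e_shoff` are where readelf's "20 section headers, starting at offset 0x604" line comes from.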

Relocation entries

As mentioned, an object file by itself is not executable. The main reason is that there are no program headers, as we just saw. Another reason is that the .text section still contains relocation entries (or placeholders) for the addresses of variables located in the .data and .bss sections. These placeholders will just be 0 in the machine code. So, if we tried to run the machine code in an object file we would end up with segmentation faults (SEGV).

A relocation entry is a placeholder that is added by the compiler or linker when producing ELF binaries. The relocation entries are to be filled in with addresses pointing to data. Relocation entries can be made in code, such as the .text section, or in data sections like the .got section. For example:

Resolving Relocations

[Diagram: GCC and Linker]

The diagram above shows relocation entries as white circles. Relocation entries may be filled or resolved at link-time or dynamically during execution.

Link time relocations

  • Placeholders are filled in when ELF object files are linked by the linker to create executables or libraries.
  • For example, relocation entries in .text sections.

Dynamic relocations

  • Placeholders are filled in at runtime by the dynamic linker, e.g. via the Procedure Linkage Table (PLT).
  • For example, relocation entries added to .got and .plt sections which link to shared objects.

Note: Statically built binaries do not have any dynamic relocations and are not loaded with the dynamic linker.

In general, link-time relocations are used to fill in relocation entries in code, while dynamic relocations fill in relocation entries in data sections.

Listing Relocation Entries

A list of relocations in an ELF binary can be printed using readelf with the -r option.

Output of readelf -r tls-gd-dynamic.o:

    Relocation section '.rela.text' at offset 0x530 contains 10 entries:
     Type                 Sym. Name + Addend
     R_OR1K_TLS_GD_HI16   x + 0
     R_OR1K_TLS_GD_LO16   x + 0
     R_OR1K_GOTPC_HI16    _GLOBAL_OFFSET_TABLE_ - 4
     R_OR1K_GOTPC_LO16    _GLOBAL_OFFSET_TABLE_ + 0
     R_OR1K_PLT26         __tls_get_addr + 0
     ...

The relocation entry list explains how and where to apply each relocation. It contains:

  • Offset – the location in the binary that needs to be updated.
  • Info – the encoded value containing the relocation type and the symbol index, which readelf breaks down into:
      • Type – the type of relocation (the formula for what is to be performed is defined in the linker).
      • Sym. Value – the address value (if known) of the symbol.
      • Sym. Name – the name of the symbol (variable name) that this relocation needs to find during link time.
  • Addend – a value that needs to be added to the derived symbol address. This is used with arrays (i.e. for a relocation referencing a[14] we would have Sym. Name a and an Addend of the element size of a times 14).
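To make the Info and Addend fields concrete, here is a minimal C sketch of a 32-bit RELA entry. The struct mirrors the `Elf32_Rela` layout from the ELF specification, and the helpers use the same encoding as the standard `ELF32_R_SYM` / `ELF32_R_TYPE` macros (symbol index in the upper 24 bits, type in the low 8 bits). The names `rela32`, `rela_sym`, `rela_type`, `rela_info` and `reloc_target` are hypothetical, and the symbol index, type number and addresses in the usage below are made up for illustration.

```c
#include <stdint.h>

/* Layout of a 32-bit RELA entry (glibc's <elf.h> calls this Elf32_Rela). */
typedef struct {
    uint32_t r_offset;  /* where in the section to apply the relocation */
    uint32_t r_info;    /* symbol table index and relocation type */
    int32_t  r_addend;  /* constant added to the symbol value */
} rela32;

/* Same encoding as ELF32_R_SYM, ELF32_R_TYPE and ELF32_R_INFO. */
static inline uint32_t rela_sym(uint32_t info)  { return info >> 8; }
static inline uint32_t rela_type(uint32_t info) { return info & 0xff; }
static inline uint32_t rela_info(uint32_t sym, uint32_t type) {
    return (sym << 8) | (type & 0xff);
}

/* Symbol address plus addend: the starting point of most relocation formulas. */
static inline uint32_t reloc_target(uint32_t sym_value, const rela32 *r) {
    return sym_value + (uint32_t)r->r_addend;
}
```

For a reference to a[14] where a is an array of 4-byte int, the entry would carry the symbol index of a and an addend of 14 * 4 = 56, so if a ends up at 0x1000 the resolved target is 0x1038.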
    Example

    File: nontls.c

    In the example below we have a simple variable and a function to access its address.

        static int x;

        int *get_x_addr(void) {
            return &x;
        }

    Let’s see what happens when we compile this source.

    The steps to compile and link can be found in the tls-examples project hosting the source examples.

    Before Linking

    [Diagram: Non TLS Object]

    The diagram above shows relocations in the resulting object file as white circles.

    In the actual output below we can see that access to the variable x is referenced by a literal 0 in each instruction. These are highlighted with square brackets [] below for clarity.

    These empty parts of the .text section are relocation entries.

        Addr.  Machine Code    Assembly               Relocations
        0000000c <get_x_addr>:
           c:  19 60 [00 00]   l.movhi r11,[0]        # c R_OR1K_AHI16 .bss
          10:  44 00 48 00     l.jr r9
          14:  9d 6b [00 00]   l.addi r11,r11,[0]     # 14 R_OR1K_LO_16_IN_INSN .bss

    The function get_x_addr will return the address of variable x. We can look at the assembly instructions to understand how this is done. Some background on the OpenRISC ABI:

      • Registers are 32 bits wide.
      • Function return values are placed in register r11.
      • To return from a function we jump to the address in the link register r9.
      • OpenRISC has a branch delay slot, meaning the instruction after a branch is executed before the branch is taken.

    Now, let's break down the assembly:

      • l.movhi – move the value [0] into the high bits of register r11, clearing the lower bits.
      • l.addi – add the value in register r11 to the value [0] and store the result in r11.
      • l.jr – jump to the address in r9.

    This constructs a 32-bit value out of two 16-bit values.
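The machine words in the listing can be checked against the instruction encodings to see that the [00 00] placeholders are exactly the 16-bit immediate fields. A minimal C sketch, assuming the ORBIS32 field layout from the OpenRISC 1000 architecture manual (opcode in bits 31..26, rD in 25..21, rA in 20..16, rB in 15..11, immediate in 15..0); the helper and macro names are hypothetical, and only the three opcodes used by get_x_addr are defined.

```c
#include <stdint.h>

/* Major opcodes (bits 31..26) of the instructions seen in get_x_addr. */
#define OR1K_OPC_MOVHI 0x06
#define OR1K_OPC_JR    0x11
#define OR1K_OPC_ADDI  0x27

/* Field extractors for a 32-bit OpenRISC instruction word. */
static inline uint32_t or1k_opcode(uint32_t insn) { return insn >> 26; }          /* bits 31..26 */
static inline uint32_t or1k_rd(uint32_t insn)     { return (insn >> 21) & 0x1f; } /* bits 25..21 */
static inline uint32_t or1k_ra(uint32_t insn)     { return (insn >> 16) & 0x1f; } /* bits 20..16 */
static inline uint32_t or1k_rb(uint32_t insn)     { return (insn >> 11) & 0x1f; } /* bits 15..11 */
static inline uint16_t or1k_imm16(uint32_t insn)  { return (uint16_t)(insn & 0xffff); } /* bits 15..0 */
```

Decoding 19 60 00 00 gives opcode 0x06 (l.movhi) with rD = 11 and an all-zero immediate, 9d 6b 00 00 gives opcode 0x27 (l.addi) with rD = rA = 11, and 44 00 48 00 gives opcode 0x11 (l.jr) with rB = 9. The register fields are fixed by the compiler; only the low 16 bits are left for the linker.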

    After Linking

    [Diagram: Non TLS Object]

    The diagram above shows the relocations have been replaced with actual values.

    As we can see from the linker output, the places in the machine code that had relocation placeholders are now replaced with values. For example, 19 60 00 00 has become 19 60 00 0a.

        00002298 <get_x_addr>:
            2298:  19 60 00 0a   l.movhi r11,0xa
            229c:  44 00 48 00   l.jr r9
            22a0:  9d 6b ee 60   l.addi r11,r11,-4512

    If we calculate (0xa << 16) + -4512 we get 0009ee60. That is the same location as x within our binary. We can check this with readelf -s, which lists all symbols.

        $ readelf -s nontls-static | grep 'x'
            42: 0009ee60     4 OBJECT  LOCAL  DEFAULT   17 x
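The address calculation can be reproduced by emulating the two instructions. In this minimal C sketch, `movhi` and `addi` are hypothetical helpers modelling l.movhi (immediate placed in the upper 16 bits, lower bits cleared) and l.addi, which sign-extends its 16-bit immediate before adding; that sign extension is why the disassembler prints the immediate ee 60 as -4512.

```c
#include <stdint.h>

/* l.movhi rD,K: rD = K << 16, lower 16 bits cleared. */
static uint32_t movhi(uint16_t k) {
    return (uint32_t)k << 16;
}

/* l.addi rD,rA,I: rD = rA + sign-extended 16-bit immediate.
 * (imm ^ 0x8000) - 0x8000 sign-extends portably using unsigned arithmetic. */
static uint32_t addi(uint32_t ra, uint16_t imm) {
    uint32_t simm = ((uint32_t)imm ^ 0x8000u) - 0x8000u;
    return ra + simm;
}
```

`addi(movhi(0x000a), 0xee60)` evaluates to 0x0009ee60, the address readelf -s reports for x. Note that the high half loaded by l.movhi is 0xa, not 0x0009ee60 >> 16 = 0x9, because the negative low half pulls the sum back down by 0x10000.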

    Types of Relocations

    As we saw above, a simple program resulted in 2 different relocation entries just to compose the address of 1 variable. We saw:

      • R_OR1K_AHI16
      • R_OR1K_LO_16_IN_INSN

    The need for different relocation types comes from the different requirements of each relocation. Processing a relocation usually involves a very simple transform, and each relocation type defines a different transform. The components of the relocation definition are:

      • Input – the input of a relocation formula is always the Symbol Address, whose absolute value is unknown at compile time. But there may also be other input variables to the formula, including:
          • Program Counter – the absolute address of the machine code being updated
          • Addend – the addend available from the relocation entry discussed above
      • Formula – how the input is manipulated to derive the output value, for example shift right 16 bits.
      • Bit-Field – specifies which bits at the output address need to be updated.

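    To be more specific about the above relocations we have:

        Relocation Type          Bit-Field   Formula
        R_OR1K_AHI16             SIMM16      (S + 0x8000) >> 16
        R_OR1K_LO_16_IN_INSN     SIMM16      S & 0xffff

    where S is the symbol address. The 0x8000 added for R_OR1K_AHI16 ("adjusted high") compensates for l.addi sign-extending the low 16 bits.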
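The two formulas are small enough to write out directly. This is a minimal C sketch, assuming, as the linked output above indicates (0xa rather than 0x9 for x at 0009ee60), that R_OR1K_AHI16 adds 0x8000 before taking the high half; `reloc_ahi16` and `reloc_lo16` are hypothetical names, not BFD functions, and the addend is assumed to be already folded into S.

```c
#include <stdint.h>

/* R_OR1K_AHI16: "adjusted" high half. Adding 0x8000 before shifting
 * cancels the borrow caused when l.addi sign-extends a low half >= 0x8000. */
static inline uint16_t reloc_ahi16(uint32_t s) {
    return (uint16_t)((s + 0x8000u) >> 16);
}

/* R_OR1K_LO_16_IN_INSN: low 16 bits of the symbol address. */
static inline uint16_t reloc_lo16(uint32_t s) {
    return (uint16_t)(s & 0xffffu);
}
```

For S = 0x0009ee60 these give 0x000a and 0xee60, the two immediates in the linked get_x_addr. A plain S >> 16 would give 0x0009, and l.movhi/l.addi would then compute 0x0008ee60, which is off by 0x10000.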

    The Bit-Field described above is SIMM16, which means update the lower 16 bits of the 32-bit value at the output offset and do not disturb the upper 16 bits.

         31            16 15             0
        +----------------+----------------+
        |                |     SIMM16     |
        +----------------+----------------+
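Applying a SIMM16 bit-field is a mask-and-merge on the instruction word. A minimal C sketch, assuming the relocation result has already been computed by the formula for its type and the 32-bit instruction word has already been read from the (big-endian) section bytes; `apply_simm16` and `SIMM16_MASK` are hypothetical names, not the linker's actual code.

```c
#include <stdint.h>

/* SIMM16 bit-field: bits 15..0 of the 32-bit instruction word. */
#define SIMM16_MASK 0x0000ffffu

/* Write the low 16 bits of a relocation result into the SIMM16 field,
 * leaving bits 31..16 (opcode and register fields) untouched. */
static inline uint32_t apply_simm16(uint32_t insn, uint32_t value) {
    return (insn & ~SIMM16_MASK) | (value & SIMM16_MASK);
}
```

Patching the unlinked words 19 60 00 00 and 9d 6b 00 00 with 0xa and 0xee60 gives 19 60 00 0a and 9d 6b ee 60, matching the linked listing for get_x_addr.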

    There are many other relocation types with different bit-fields and formulas. These use different methods based on what each instruction does and where each instruction encodes its immediate value.

    For full listings refer to architecture manuals.

    Take a look and see if you can understand how to read these now.

    Summary

    In this article we have discussed what ELF binaries are and how they can be read. We have talked about how, from compilation to linking to runtime, relocation entries are used to communicate which parts of a program remain to be resolved. We then discussed how relocation types provide a formula and bit-mask for updating the places in ELF binaries that need to be filled in.

    In the next article we will discuss how Thread Local Storage works; both link-time and runtime relocation entries play a big part in how TLS works.
