Relocation Table in ELF

Have you ever wondered how Type and Sym. Name get calculated in the Relocation Table?

Let's use the classic helloworld.c program as an example again:

#include <stdio.h>

int main() {
    printf("Hello world!\n");
}

$ gcc -o helloworld helloworld.c

Let’s print out its relocation table.

$ readelf -r helloworld

Relocation section '.rela.dyn' at offset 0x410 contains 8 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200db8  000000000008 R_X86_64_RELATIVE                    630
000000200dc0  000000000008 R_X86_64_RELATIVE                    5f0
000000201008  000000000008 R_X86_64_RELATIVE                    201008
000000200fd8  000100000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_deregisterTMClone + 0
000000200fe0  000300000006 R_X86_64_GLOB_DAT 0000000000000000 __libc_start_main@GLIBC_2.2.5 + 0
000000200fe8  000400000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
000000200ff0  000500000006 R_X86_64_GLOB_DAT 0000000000000000 _ITM_registerTMCloneTa + 0
000000200ff8  000600000006 R_X86_64_GLOB_DAT 0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0

Relocation section '.rela.plt' at offset 0x4d0 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000200fd0  000200000007 R_X86_64_JUMP_SLO 0000000000000000 puts@GLIBC_2.2.5 + 0

Looking at the struct for a relocation entry (Elf64_Rela on 64-bit), neither the Type nor the Sym. Value/Sym. Name is among its members: it only has r_offset, r_info and r_addend.
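For reference, here is a minimal sketch, assuming a Linux system where glibc's <elf.h> is available, that fills in an Elf64_Rela by hand with the values from the puts row above. The helper name puts_entry is hypothetical. Only the Offset, Info and Addend columns have a home in the struct.

```c
#include <elf.h>

/* Hypothetical helper: the puts row from '.rela.plt', typed in by hand.
 * Elf64_Rela has only r_offset, r_info and r_addend; there is no
 * member holding the type or the symbol name directly. */
Elf64_Rela puts_entry(void)
{
    Elf64_Rela r = {
        .r_offset = 0x200fd0,       /* "Offset" column          */
        .r_info   = 0x000200000007, /* "Info" column            */
        .r_addend = 0,              /* the "+ 0" after the name */
    };
    return r;
}
```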

However, the elf(5) man page leaves a hint as to where we can find them:

When the text refers to a relocation entry’s relocation type or symbol table index, it means the result of applying ELF[32|64]_R_TYPE or ELF[32|64]_R_SYM, respectively, to the entry’s r_info member.

What does that mean exactly?

ELF[32|64]_R_TYPE and ELF[32|64]_R_SYM are macros that take r_info as their argument. ELF[32|64]_R_TYPE gives the relocation type (the Type column), and ELF[32|64]_R_SYM gives an index into the symbol table, where the Sym. Value and Sym. Name are found.

Below are the definitions of both macros. We will use the 64-bit versions, since this helloworld program is a 64-bit binary.

elf.h

#define ELF64_R_SYM(i)			((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffff)
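The two macros simply split the 64-bit r_info in half: the upper 32 bits hold the symbol table index and the lower 32 bits hold the relocation type. glibc's <elf.h> also provides the inverse, ELF64_R_INFO(sym, type), which packs the two halves back together. A small round-trip check, assuming <elf.h> is available (the function name info_round_trips is made up for this example):

```c
#include <elf.h>

/* Pack (sym, type) into an r_info value with ELF64_R_INFO, then split
 * it again with ELF64_R_SYM / ELF64_R_TYPE. Returns 1 if both halves
 * come back unchanged (true for any sym and type below 2^32). */
int info_round_trips(Elf64_Xword sym, Elf64_Xword type)
{
    Elf64_Xword info = ELF64_R_INFO(sym, type);
    return ELF64_R_SYM(info) == sym && ELF64_R_TYPE(info) == type;
}
```

For example, ELF64_R_INFO(2, 7) is exactly the 0x000200000007 seen in the puts row.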

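Take the relocation entry for puts as an example. (GCC compiled our printf call into a call to puts, since the string is a constant that ends with a newline.) Substituting its r_info value, 0x000200000007, for i in ((i) >> 32) shifts away the lower 32 bits and leaves 2. This is an index into the symbol table below, and puts sits right at index 2 there, so its value can be read from the same row in one go.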
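The same shift works for every row of the readelf -r output. The sketch below, assuming <elf.h> is available, runs ELF64_R_SYM over the Info values copied from the two relocation sections above (info_values and symbol_indices are hypothetical names). The three R_X86_64_RELATIVE rows come out as index 0, which is why readelf shows no Sym. Value or Sym. Name for them.

```c
#include <elf.h>
#include <stddef.h>

/* "Info" column of readelf -r helloworld, in the order printed:
 * 8 entries from .rela.dyn followed by 1 entry from .rela.plt. */
const Elf64_Xword info_values[] = {
    0x000000000008, 0x000000000008, 0x000000000008,
    0x000100000006, 0x000300000006, 0x000400000006,
    0x000500000006, 0x000600000006,
    0x000200000007,
};

#define N_INFO (sizeof info_values / sizeof info_values[0])

/* Store the symbol table index of entry i in out[i]. */
void symbol_indices(Elf64_Xword out[N_INFO])
{
    for (size_t i = 0; i < N_INFO; i++)
        out[i] = ELF64_R_SYM(info_values[i]);
}
```

The resulting indices are 0, 0, 0, 1, 3, 4, 5, 6 and 2, matching the Num column of the symbol table below.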

$ readelf -s helloworld

Symbol table '.dynsym' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTab
     2: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND puts@GLIBC_2.2.5 (2)
     3: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.2.5 (2)
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     5: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     6: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (2)
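In the binary itself, each symbol's name is stored as an offset (st_name) into the .dynstr string table; readelf looks the string up and appends version information such as @GLIBC_2.2.5. To keep the example self-contained, the sketch below hard-codes the names from the output above instead of parsing .dynstr, so the path r_info → ELF64_R_SYM → name can be followed in one step. The array dynsym_names and the function dynsym_name are hypothetical.

```c
#include <elf.h>
#include <stddef.h>

/* Names from readelf -s helloworld (.dynsym), indexed by symbol table
 * index. Version suffixes are dropped; entry 1 is kept exactly as
 * readelf printed it (possibly truncated). */
const char *const dynsym_names[] = {
    "",                          /* 0: reserved null symbol */
    "_ITM_deregisterTMCloneTab", /* 1 */
    "puts",                      /* 2 */
    "__libc_start_main",         /* 3 */
    "__gmon_start__",            /* 4 */
    "_ITM_registerTMCloneTable", /* 5 */
    "__cxa_finalize",            /* 6 */
};

/* Map a relocation's r_info to the name of the symbol it refers to,
 * or NULL if the index falls outside this table. */
const char *dynsym_name(Elf64_Xword r_info)
{
    Elf64_Xword idx = ELF64_R_SYM(r_info);
    if (idx >= sizeof dynsym_names / sizeof dynsym_names[0])
        return NULL;
    return dynsym_names[idx];
}
```

Calling dynsym_name(0x000200000007) returns "puts", the same name readelf printed in the .rela.plt row.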

Next, the Type.

Substituting 0x000200000007 for i in the second macro, ((i) & 0xffffffff), keeps only the lower 32 bits, which gives 7.

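Looking this number up in the relocation type definitions below gives R_X86_64_JUMP_SLOT. (readelf prints it as R_X86_64_JUMP_SLO because, without -W, it truncates the Type column.)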

elf.h

/* AMD x86-64 relocations.  */
#define R_X86_64_NONE 0 /* No reloc */
#define R_X86_64_64 1 /* Direct 64 bit */
#define R_X86_64_PC32 2 /* PC relative 32 bit signed */
#define R_X86_64_GOT32 3 /* 32 bit GOT entry */
#define R_X86_64_PLT32 4 /* 32 bit PLT address */
#define R_X86_64_COPY 5 /* Copy symbol at runtime */
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
#define R_X86_64_RELATIVE 8 /* Adjust by program base */
#define R_X86_64_GOTPCREL 9 /* 32 bit signed PC relative offset to GOT */
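The same lookup can be written as a switch over the R_X86_64_* constants from <elf.h>. A minimal sketch, assuming <elf.h> is available and covering only the three types that appear in this binary (reloc_type_name is a hypothetical helper):

```c
#include <elf.h>

/* Name of the x86-64 relocation type encoded in r_info. Only the
 * types present in helloworld's relocation tables are handled. */
const char *reloc_type_name(Elf64_Xword r_info)
{
    switch (ELF64_R_TYPE(r_info)) {
    case R_X86_64_GLOB_DAT:  return "R_X86_64_GLOB_DAT";
    case R_X86_64_JUMP_SLOT: return "R_X86_64_JUMP_SLOT";
    case R_X86_64_RELATIVE:  return "R_X86_64_RELATIVE";
    default:                 return "unknown";
    }
}
```

Passing the puts entry's 0x000200000007 selects the R_X86_64_JUMP_SLOT case, since its lower 32 bits are 7.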

Hope you enjoyed.


What is a symbol?

A symbol is a name assigned to an entity in your code.
A variable name is a symbol, and so is a function name; a symbol is basically a label attached to a thing.

What is a Relocation Table?

A Relocation Table is a table that tells the linker which locations in the program need to be patched with a symbol's real memory address. For the .rela.dyn and .rela.plt sections above, this patching is done at run time by the dynamic linker. So in the final machine code, when the CPU needs to access a symbol, it just goes to that memory address. That address is the only thing it cares about.
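To make "patched with the symbol's real memory address" concrete: for the three types in this binary, the x86-64 System V psABI defines the value written at the relocation's offset as S (the symbol's address) for R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT, and B + A (load base plus addend) for R_X86_64_RELATIVE. The sketch below simulates that on an in-memory buffer, assuming <elf.h> is available. apply_rela and its parameters are hypothetical, and a real dynamic linker such as ld.so also has to look symbols up, handle many more types and manage memory protection.

```c
#include <elf.h>
#include <string.h>

/* Simulated relocation: 'image' stands in for the loaded program,
 * 'base' for the address it was loaded at, and 'sym_addr' for the
 * already-resolved address of the referenced symbol.
 * Returns 0 on success, -1 if the target lies outside the image or
 * the type is not handled by this sketch. */
int apply_rela(unsigned char *image, size_t image_size, Elf64_Addr base,
               const Elf64_Rela *r, Elf64_Addr sym_addr)
{
    Elf64_Addr value;

    if (r->r_offset > image_size || image_size - r->r_offset < sizeof value)
        return -1;

    switch (ELF64_R_TYPE(r->r_info)) {
    case R_X86_64_GLOB_DAT:   /* S */
    case R_X86_64_JUMP_SLOT:  /* S */
        value = sym_addr;
        break;
    case R_X86_64_RELATIVE:   /* B + A */
        value = base + (Elf64_Addr)r->r_addend;
        break;
    default:
        return -1;
    }
    /* memcpy avoids an unaligned 8-byte store into the byte buffer. */
    memcpy(image + r->r_offset, &value, sizeof value);
    return 0;
}
```

With a load base of 0x555555554000, the first .rela.dyn row (addend 0x630) would store 0x555555554630, while the puts row would store whatever address the dynamic linker resolved for puts.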

To put it in real-world terms, a letter addressed only to "Sam's house" will never find Sam Smith's home; it needs a physical address.

The same goes for programs. You get the point.