A Guide to hexdump

hexdump is a fantastic tool when one needs to dump information from a Linux binary file.

In my case, it helps me see how the symbol table (.symtab) is laid out in binary form, so that I can understand how tools like readelf make sense of the raw bytes stored there and present them to end users in a human-readable form according to some predefined rules.

Intro

First of all, instead of jumping directly to hexdump, I would like to spend a few words on the data that hexdump is going to deal with.

Since this post is a demonstration of analysing a symbol table using hexdump, let’s first take a look at what a symbol table is, where we can find it, and how to make sense of the information in it.

Let’s get started!

Given a classic helloworld.c, let’s compile it and use the helloworld binary produced for this demo.

$ gcc -o helloworld helloworld.c

#include <stdio.h>

int main() {
    printf("Hello world!");
}

Let’s print out its symbol tables using readelf.

$ readelf -s helloworld

Symbol table '.dynsym' contains 7 entries:
Num: Value Size Type Bind Vis Ndx Name
[...]

Symbol table '.symtab' contains 63 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000238 0 SECTION LOCAL DEFAULT 1
[...]
26: 0000000000000000 0 FILE LOCAL DEFAULT ABS crtstuff.c
27: 0000000000000560 0 FUNC LOCAL DEFAULT 14 deregister_tm_clones
[...]

Figure 1.

readelf -s prints out the symbol tables of an ELF file. There are two here, .dynsym and .symtab, and .symtab is the one this post looks at.

Looking at the output above, one thing that confused me is the Name column: there isn’t a member in the symbol struct that holds a name string (see the symbol struct below).

typedef struct {
    uint32_t      st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t      st_shndx;
    Elf64_Addr    st_value;
    uint64_t      st_size;
} Elf64_Sym;

Figure 2.

Some would argue that the first element st_name is the symbol name. However, that’s not true.

man 5 elf - This member (st_name) holds an index into the object file’s symbol string table, which holds character representations of the symbol names.

Let’s put it another way. This member is an integer that acts as a byte index into the .strtab section, which stores the names of all the symbols in this binary as null-terminated strings.

So the readelf command must have gone one step further for us: it looks up the symbol name in .strtab using the st_name index, and displays it in the Name column.

To prove our speculation, we need a way to dump the symbol table to see what’s actually stored there. This is where hexdump comes in.

Hexdump

First of all, let’s see where the symbol table resides in the binary file.

$ readelf -S helloworld

There are 29 section headers, starting at offset 0x1930:

Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[...]
[26] .symtab SYMTAB 0000000000000000 00001040
00000000000005e8 0000000000000018 27 43 8
[27] .strtab STRTAB 0000000000000000 00001628
0000000000000208 0000000000000000 0 0 1

Figure 3

The symbol table is located at offset 0x1040 in the file, with a size of 0x5e8 (1512) bytes. Each entry is 0x18 (24) bytes long, as shown in the EntSize column. Let’s dump these bytes out for further inspection.

But wait! I want more than just dumping them as is to the console; I want the output to be tailored.

Here is the plan.

As just mentioned, the symbol table has 0x5e8 bytes of data, which consist of a sequence of fixed-size entries, each representing one symbol (per the symbol struct in Figure 2). So ideally every entry should be on its own line, with the members of each entry separated from each other by spaces for clarity.

After hours of experimenting, below is the command that achieves that:
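
The arithmetic behind this plan can be checked with a few lines of Python. This is only a sketch and not part of the original workflow: the three constants are copied from the .symtab row in Figure 3, and the names SYMTAB_OFFSET, SYMTAB_SIZE, ENTRY_SIZE and entry_offset are made up for illustration.

```python
import struct

# Values read from the .symtab row in Figure 3.
SYMTAB_OFFSET = 0x1040   # Offset column
SYMTAB_SIZE = 0x5E8      # Size column
ENTRY_SIZE = 0x18        # EntSize column

# Elf64_Sym from Figure 2, little-endian and unpadded:
# uint32, two unsigned chars, uint16, two 64-bit values.
assert struct.calcsize("<IBBHQQ") == ENTRY_SIZE


def entry_offset(index):
    """File offset of the symbol table entry with the given index (hypothetical helper)."""
    return SYMTAB_OFFSET + index * ENTRY_SIZE


entry_count = SYMTAB_SIZE // ENTRY_SIZE
print(entry_count)             # 63, the entry count readelf reports for .symtab
print(SYMTAB_SIZE)             # 1512, the size in decimal
print(hex(entry_offset(26)))   # 0x12b0, where entry 26 starts in the file
```

The entry count agrees with the "contains 63 entries" line in Figure 1, which is a useful sanity check before dumping anything.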

$ hexdump -s 0x1040 -n 1512 -v -e '"%_ax# " 4/1 "%02x" " " /1 "%02x " /1 "%02x " 2/1 "%02x" " " 8/1 "%02x" " " 8/1 "%02x""\n"' helloworld

1040# 00000000 00 00 0000 0000000000000000 0000000000000000
1058# 00000000 03 00 0100 3802000000000000 0000000000000000
1070# 00000000 03 00 0200 5402000000000000 0000000000000000
[...]
12b0# 01000000 04 00 f1ff 0000000000000000 0000000000000000
12c8# 0c000000 02 00 0e00 6005000000000000 0000000000000000
[...]
1610# af010000 12 00 0b00 e804000000000000 0000000000000000

Figure 4
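
To make the layout of Figure 4 concrete, here is a rough Python equivalent of what the format string does to one 24-byte record. It is a sketch under the assumption that the record is already in memory as bytes; format_symbol_row is a hypothetical helper, not something hexdump provides.

```python
def format_symbol_row(offset, record):
    """Format one 24-byte symbol record like a row in Figure 4 (hypothetical helper)."""
    # Byte widths of st_name, st_info, st_other, st_shndx, st_value, st_size.
    widths = (4, 1, 1, 2, 8, 8)
    columns = []
    pos = 0
    for width in widths:
        # Print the bytes in file order, two hex digits each, no reordering.
        columns.append(record[pos:pos + width].hex())
        pos += width
    return "%x# %s" % (offset, " ".join(columns))


# The 24 bytes shown at 12c8# in Figure 4.
record = bytes.fromhex("0c00000002000e00" "6005000000000000" "0000000000000000")
print(format_symbol_row(0x12c8, record))
# 12c8# 0c000000 02 00 0e00 6005000000000000 0000000000000000
```

Like hexdump, the sketch prints the bytes in the order they appear in the file, so multi-byte members still read as little-endian.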

Let’s look at line 12b0#. This line should represent one of the symbols in the symbol table. Let’s see which one it is.

The data 01000000 in the second column corresponds to the st_name member. As we already know, this is an index into the string table .strtab. So let’s check out the string table to see what is at index 1 there. We need to dump the string table (.strtab) the same way as we did the symbol table.

From Figure 3 we can see that the string table .strtab starts at 0x1628 and is 0x208 (520) bytes long. Let’s dump it out:

$ hexdump -C -s 0x1628 -n 520 helloworld

00001628  00 63 72 74 73 74 75 66  66 2e 63 00 64 65 72 65  |.crtstuff.c.dere|
00001638 67 69 73 74 65 72 5f 74 6d 5f 63 6c 6f 6e 65 73 |gister_tm_clones|
00001648 00 5f 5f 64 6f 5f 67 6c 6f 62 61 6c 5f 64 74 6f |.__do_global_dto|
00001658 72 73 5f 61 75 78 00 63 6f 6d 70 6c 65 74 65 64 |rs_aux.completed|

It is very clear that the string that starts at index 1 is crtstuff.c (null-terminated), which matches the Name column of entry 26 in Figure 1.
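
The lookup that readelf presumably performs can be sketched in Python. The bytes below are the 64 bytes from the dump above, retyped as a bytes literal; strtab_lookup is a hypothetical helper written for this post, not a readelf or hexdump function.

```python
def strtab_lookup(strtab, index):
    """Return the null-terminated name that starts at byte `index` (hypothetical helper)."""
    end = strtab.find(b"\x00", index)
    if end == -1:
        # No terminator inside this excerpt: take everything that is left.
        end = len(strtab)
    return strtab[index:end].decode("ascii")


# The first 64 bytes of .strtab, copied from the dump above.
strtab = (b"\x00crtstuff.c\x00deregister_tm_clones"
          b"\x00__do_global_dtors_aux\x00completed")

print(strtab_lookup(strtab, 0x01))   # crtstuff.c
print(strtab_lookup(strtab, 0x0c))   # deregister_tm_clones
```

Index 0x0c is the st_name value on line 12c8# of Figure 4, and the name found there agrees with entry 27 in Figure 1.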

Bingo!

Job well done hexdump!

Further Readings

What we’ve gone through just now may seem to be a trivial use of hexdump. However, there are still some pieces that were intentionally left out for the sake of continuity and that need to be cleared up (at least for someone new to hexdump).

The following content elaborates on everything that was obscure to me when I attempted the above task. It is a mixture of explanations of various hexdump options and interpretations of the members of a symbol table entry. It is not required reading, but it is highly recommended: it will save you the enormous effort of experimenting with each option and going through all the examples in the man page.

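  • First of all, an explanation of all the options used in the command that produced Figure 4.

    _a[dox]: display the input offset, cumulative across input files, of the next byte to be displayed. The appended d, o or x selects decimal, octal or hexadecimal respectively. The # after it is printed literally.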

    -s: Skip offset bytes from the beginning of the input. Accepts decimal/octal/hex number.

    -n: Interpret only length bytes of input (only accepts decimal number).

    -v: Without the -v option, any number of groups of output lines, which would be identical to the immediately preceding group of output lines (except for the input offsets), are replaced with a line comprised of a single asterisk.

    -e: Flag for format string.

    The whole format string following -e should be enclosed in a pair of single quotes (' '). A format string contains any number of format units, separated by whitespace. A format unit contains up to three items: an iteration count (optional), a byte count (optional), and a format (required, and it must be surrounded by double quotes (" ")).

    Sometimes we will see several -e flags in a row, which means that the same bytes are formatted by multiple format strings, one after another.

    Check out this example.

    $ echo hello | hexdump -e '8/1 "%02X ""\t"" "' -e '8/1 "%c""\n"'
    68 65 6C 6C 6F 0A hello

    The input hello (plus the newline that echo appends) is first formatted with the format string 8/1 "%02X " and then with 8/1 "%c".

    The first format string outputs 68 65 6C 6C 6F 0A and the second outputs hello, with a tab (“\t”) in between.

  • Now let’s move on to see how the iteration count, byte count and format work together to produce the final result.

    The original form of the binary data at offset 12b0# is,

    010000000400f1ff00000000000000000000000000000000

    When the above is fed to the command that produced Figure 4,

    • The first format string "%_ax# " outputs the offset followed by a literal # and a space: 12b0# .

    • 4/1 "%02x" applies the format "%02x" to 1 byte, 4 times. So it takes one byte at a time, applies the format, and repeats this 4 times. 02 dictates the minimum number of characters to be printed. If the value to be printed is shorter than this number, the result is padded with leading 0s.

    • " " appends a space to above result.

    • /1 is equivalent to 1/1, which applies the format once to a single byte.

    • "\n" ends the output line. By that point the format units have consumed 4 + 1 + 1 + 2 + 8 + 8 = 24 bytes, exactly one symbol entry. The remaining bytes are processed by starting again from the first format unit.

      The original data will be eventually formatted into,

      12b0# 01000000 04 00 f1ff 0000000000000000 0000000000000000

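      Per the symbol struct, the 2nd column is st_name, which is 4 bytes long (uint32_t). It is printed as 01000000, which is simply 1. It is stored as little-endian, so the most significant byte is the right-most byte and the least significant byte is the left-most. Written in the usual order it is 00000001, that is the value 1. This 1 is a byte index into .strtab, so the name starts at the second byte of that table (the byte at index 0 is a null).

      The 3rd column is st_info, 1 byte, which specifies the symbol’s type and binding attributes. The low 4 bits hold the type and the high 4 bits hold the binding. It has a value of 04, so the type is 4 (FILE) and the binding is 0 (LOCAL).

      The 4th column is st_other, 1 byte, which holds the symbol’s visibility. It is 00 here, which is DEFAULT, as shown in the Vis column of Figure 1.

      The 5th column is st_shndx, 2 bytes. Every symbol table entry is “defined” in relation to some section, and this member holds the relevant section header table index. So it tells which section this symbol belongs to. Here it is printed as f1ff, that is 0xfff1 in little-endian, the special index SHN_ABS, which readelf shows as ABS in the Ndx column.

      The 6th column is st_value, 8 bytes. In executables and shared objects it holds the virtual address of the symbol. If we look at entry 27 in Figure 1, the deregister_tm_clones symbol has the value 560. The same entry appears on line 12c8# of Figure 4 as 6005000000000000, which is 0x560 once the little-endian bytes are put in the usual order. The 7th and last column is st_size, the size of the symbol, which is 0 for both entries.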
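
The column-by-column reading above can be reproduced with Python's struct module. This is a minimal sketch, assuming a little-endian 64-bit ELF file as in this post; decode_symbol and the two lookup tables are hypothetical names, and the tables only cover the most common type and binding values.

```python
import struct

# A subset of the type and binding codes defined by the ELF format.
SYMBOL_TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}
SYMBOL_BINDS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
SHN_ABS = 0xFFF1  # special section index: the symbol has an absolute value


def decode_symbol(record):
    """Decode one 24-byte Elf64_Sym record (hypothetical helper)."""
    # "<" = little-endian, I = uint32, B = unsigned char, H = uint16, Q = uint64.
    st_name, st_info, st_other, st_shndx, st_value, st_size = struct.unpack(
        "<IBBHQQ", record)
    return {
        "st_name": st_name,
        "type": SYMBOL_TYPES.get(st_info & 0xF, "?"),   # low 4 bits
        "bind": SYMBOL_BINDS.get(st_info >> 4, "?"),    # high 4 bits
        "st_other": st_other,
        "st_shndx": "ABS" if st_shndx == SHN_ABS else st_shndx,
        "st_value": st_value,
        "st_size": st_size,
    }


# Rows 12b0# and 12c8# from Figure 4, with the spaces removed.
row_12b0 = bytes.fromhex("010000000400f1ff" + "00" * 16)
row_12c8 = bytes.fromhex("0c00000002000e00" + "6005000000000000" + "00" * 8)

print(decode_symbol(row_12b0))
print(decode_symbol(row_12c8))
```

The first row decodes to a LOCAL FILE symbol in ABS with name index 1, and the second to a LOCAL FUNC symbol in section 14 with value 0x560, which agrees with entries 26 and 27 in Figure 1.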

Conclusion

As you get to know more about hexdump, you will find that its power is not limited to ELF binaries: it helps with practically any task involving binary files, for instance extracting the partition table from a disk dump.

I hope you will find this article helpful. Comments are welcome via email (until the comment function on my blog is sorted out).

GL & HF!

References

http://man7.org/linux/man-pages/man5/elf.5.html

http://man7.org/linux/man-pages/man1/hexdump.1.html