Demystifying Bitcoin's peers.dat

Bitcoin uses a custom format to store peer information. Although the inbuilt JSON-RPC provides a helpful getpeerinfo method to list your active connections, it offers no method to query, dump, or otherwise access the information in peers.dat, which contains far more than just your active connections. Having access to the information in this file can be helpful for a number of reasons, such as finding out information about the network and finding more nodes than just your connections to broadcast transactions to.

My interest in this was piqued by this post on the Bitcoin StackExchange. This blog post is an attempt to answer that question (and cover some gaps in my personal crypto tools) by building a utility that can read and query peers.dat. The post will go step by step, as I am writing this while building the utility.

Research

To work with a minimal example, I deleted my existing peers.dat and ran bitcoind again. This gives me a much lighter file (11 KB), instead of a 4MB+ one that contained peer info over many months. I then open up the peers.dat file in sublime. This gives us a bunch of hex, no surprises there. When dealing with hex, hexdump is step 1, so let's see what we get:

$ hexdump -C peers.dat | head -n 17
00000000  f9 be b4 d9 01 20 91 41  99 be 39 46 d6 2c 9f b3  |..... .A..9F.,..|
00000010  e6 80 ef db 3c 1d 64 52  7c 18 c4 3e 0b eb 23 9b  |....<.dR|..>..#.|
00000020  59 46 e3 79 40 f8 6b 00  00 00 01 00 00 00 00 04  |YF.y@.k.........|
00000030  00 40 34 fc 01 00 85 55  f7 5a 09 00 00 00 00 00  |.@4....U.Z......|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 ff ff 05 09  |................|
00000050  8b 05 20 8d 00 00 00 00  00 00 00 00 00 00 ff ff  |.. .............|
00000060  2d 20 82 13 00 00 00 00  00 00 00 00 00 00 00 00  |- ..............|
00000070  34 fc 01 00 08 c2 f9 5a  09 00 00 00 00 00 00 00  |4......Z........|
00000080  00 00 00 00 00 00 00 00  00 00 ff ff 31 49 ae 91  |............1I..|
00000090  20 8d 00 00 00 00 00 00  00 00 00 00 ff ff 2d 20  | .............- |
000000a0  82 13 00 00 00 00 00 00  00 00 00 00 00 00 34 fc  |..............4.|
000000b0  01 00 24 34 f8 5a 09 00  00 00 00 00 00 00 00 00  |..$4.Z..........|
000000c0  00 00 00 00 00 00 00 00  ff ff 25 23 b7 0a 20 8d  |..........%#.. .|
000000d0  00 00 00 00 00 00 00 00  00 00 ff ff 2d 20 82 13  |............- ..|
000000e0  00 00 00 00 00 00 00 00  00 00 00 00 34 fc 01 00  |............4...|
000000f0  b7 84 fa 5a 09 00 00 00  00 00 00 00 00 00 00 00  |...Z............|
00000100  00 00 00 00 00 00 ff ff  b4 6b 56 4d 20 8d 00 00  |.........kVM ...|

This is a truncated output, as the file continues much the same way.

The hex immediately gives us some clues. The first four bytes (f9 be b4 d9) stand out as the message start string, which is something you see very often when working with bitcoin's network level implementations. Based on some past projects with Bitcoin, these bytes are ingrained into my head and stood out immediately.

The next set of bytes after the message start doesn't offer any immediate insight, so let's skip ahead for the moment. The next thing that does stand out is a repeated pattern of ff ff, followed by four bytes, followed by 20 8d. The first two don't seem to give out any information (ff ff would be an unlikely candidate for magic bytes, so is unlikely to be a marker in the file). However, the next two (20 8d) do give us some useful information. When converted to decimal, we get 8333, which is the default port number for bitcoin.

This new piece of information updates our previous pattern to ff ff + four bytes + port number (8333). IPv4 addresses use a 32-bit address space, which is four bytes. It stands to reason that the four bytes before the port number are an IP address, considering this file is meant to hold peer info.

This is easily verified by taking the first instance of this pattern's middle four bytes (05 09 8b 05) and converting them to an IP address. I wrote a quick Go script for this, as it is an operation I will likely be doing in the utility as well. The script gives us 5.9.139.5, which seems like a plausible IP address. Running the script with a few other bytes from other patterns also produces valid IPs, so I'm pretty sure at this point this is the correct interpretation.

This seems like all the information I'm going to get just by looking at the hex above, so the next step is to look at the hexdump for the bottom of the file:

$ hexdump -C -v peers.dat | tail -n 17
00002b30  00 00 4f 00 00 00 00 00  00 00 00 00 00 00 00 00  |..O.............|
00002b40  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002b50  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002b60  00 00 00 00 00 00 02 00  00 00 53 00 00 00 61 00  |..........S...a.|
00002b70  00 00 00 00 00 00 00 00  00 00 01 00 00 00 21 00  |..............!.|
00002b80  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002b90  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002ba0  00 00 00 00 00 00 01 00  00 00 69 00 00 00 00 00  |..........i.....|
00002bb0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002bc0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002bd0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002be0  00 00 00 00 00 00 01 00  00 00 2c 00 00 00 00 00  |..........,.....|
00002bf0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00002c00  00 00 00 00 00 00 86 be  3d 3b b8 dd 01 93 f1 79  |........=;.....y|
00002c10  f9 c9 d1 ff f5 d4 cb 38  4d ab 56 25 62 d0 c8 d7  |.......8M.V%b...|
00002c20  d8 82 c1 4c c1 2c                                 |...L.,|

Nothing really sticks out except the last few bytes. Counting these shows that there is 32 bytes of continous non-zero data. This is very likely a hash, and being at the very end of the file, it's almost certainly a checksum

With the initial hex analysis not providing enough information to completely decode the file, it's time to head to the Bitcoin source code. Some quick searches on GitHub reveal that peers.dat is managed by addrdb.cpp

Browsing through addrdb.cpp shows that the initial assumptions about the message start and checksum hash were correct:

bool SerializeDB(Stream& stream, const Data& data)
{
    ...
        CHashWriter hasher(SER_DISK, CLIENT_VERSION);
        stream << Params().MessageStart() << data;
        hasher << Params().MessageStart() << data;
        stream << hasher.GetHash();
    ...
}

This snippet shows that the checksum algorithm is the same as the regular Bitcoin hashing algorithm (since it relies on hash.h), which is a double-sha256. We can verify this quickly with bash and openssl:

$ head -c -32 peers.dat | openssl dgst -sha256 -binary | openssl dgst -sha256
(stdin)= 86be3d3bb8dd0193f179f9c9d1fff5d4cb384dab562562d0c8d7d882c14cc12c

We strip the last 32 bytes from the file (which contain the checksum), and pass the rest to openssl for hashing twice. Do note that this requires a GNU version of head, as OS X will complain about a negative byte number. We can see that the result matches what we see in the hexdump, so that's definitely the checksum.

Now all that's needed is to determine the data format, and the rest of the header following the message start bytes. Switching from addrdb.cpp to addrdb.h leads us to a helpful comment that lays out the serialized header format for us.

Lastly, some more GitHub search magic leads to chainparamsseeds.h. This states that:

Each line contains a 16-byte IPv6 address and a port.
IPv4 as well as onion addresses are wrapped inside an IPv6 address accordingly.

This explains the whole lot of 00 in the hexdump preceding the IP, as well as the ff ff. Since the majority of connections still default to IPv4, what we are seeing is IPv4 addresses encoded as IPv6 addresses, which follows the format ::ffff:IPv4-address, which is where the ff ff comes from.

File Structure

Based on the above research, a flow of how peers.dat is generated and structured can be put together.

The snippet from addrdb.cpp shared above is the starting point for the creation of peers.dat. This snippet gave us the base structure as

peers.dat
├── header
├── data
└── checksum

The file header consists of:

A 4-byte magic value which is the message start defined in chainparams.cpp.
1 version byte, always 0x01
1-byte defining the key length for nkey, always 0x20 (32 in decimal)
Key length number of bytes that specify the nkey, 0x20 from above
4 bytes specifying how many new addresses there are
4 bytes specifying how many tried addresses there are
4 bytes specifying how many buckets there are (XOR'd against 2**30)

In all, this gives us a 50-byte header for peers.dat, which can be encapsulated in the following Go struct:

type PeersDB struct {
	Path         string
	MessageBytes []byte // 0  : 4
	Version      uint8  // 4  : 4
	KeySize      uint8  // 5  : 5
	NKey         []byte // 37 : 32
	NNew         uint32 // 41 : 4
	NTried       uint32 // 45 : 4
	NewBuckets   uint32 // 49 : 4
}

This expands our structure to:

peers.dat
└── header
    ├── MessageBytes
    ├── Version
    ├── KeySize
    ├── NKey
    ├── NNew
    ├── NTried
    └── NewBuckets
├── data
├── checksum

The data part of the file is simply the peer info structure repeated NNew + NTried times (i.e. once per peer). The structure of this can be pieced together from three files. CAddrInfo from addrman.h encapsulates the information regarding a single peer as follows:

inline void SerializationOp(Stream& s, Operation ser_action) {
    READWRITEAS(CAddress, *this);
    READWRITE(source);
    READWRITE(nLastSuccess);
    READWRITE(nAttempts);
}

We will try to keep our Go code in sync with Bitcoin Core's code to allow for easier updates in the event of changes.

The PeersDB struct is updated to include CAddrInfo slices for New and Tried peers, becoming:

type PeersDB struct {
	Path          string
	MessageBytes  []byte // 0  : 4
	Version       uint8  // 4  : 4
	KeySize       uint8  // 5  : 5
	NKey          []byte // 37 : 32
	NNew          uint32 // 41 : 4
	NTried        uint32 // 45 : 4
	NewBuckets    uint32 // 49 : 4
	NewAddrInfo   []CAddrInfo
	TriedAddrInfo []CAddrInfo
}

The CAddrInfo struct simply mimics the data serlialized in it:

type CAddrInfo struct {
	Address     CAddress
	Source      []byte
	LastSuccess uint64
	Attempts    uint32
}

Looking into the code for CAddress in protocol.h, we can assemble a CAddress struct as follows:

type CAddress struct {
	SerializationVersion []byte
	Time                 uint32
	ServiceFlags         []byte
	PeerAddress          CService
}

Finally, CService is found in netaddress.h, and converts to a very simple struct:

type CService struct {
	IPAddress []byte
	Port      uint16 // This is serialized as BigEndian
}

These simple structs, chained together, give us an overall structure for peers.dat. In a comprehensive tree, it looks something like:

peers.dat
└── header
    ├── MessageBytes
    ├── Version
    ├── KeySize
    ├── NKey
    ├── NNew
    ├── NTried
    └── NewBuckets
└── data
    └── repeated
        └── CAddrInfo
            ├── CAddress
                ├── SerializationVersion
                ├── Time
                ├── ServiceFlags
                ├── CService
                    ├── IPAddress
                    └── Port
            ├── Source
            ├── LastSuccess
            └── Attempts
└── checksum

With the structure of the file all figured out, we can move on to writing the utility to parse it. At this point, the utility itself is quite simple - simply start from the beginning, read the header, loop until NNew + NTried entries of CAddrInfo have been read, and then dump the output in the format requested. Instead of putting the boring code in this post, I invite you to visit the bitpeers repo.

That said, there is still a chunk of the file after the last tried address before the checksum that is not decoded. As far as I can tell, it does not contain any peer information. Instead, I suspect it is some kind of per-peer/per-bucket integrity check, preventing an attacker from changing all the peers to a set they control. I may work on decoding that in the future, and will update this post if I do.

Using bitpeers

Installing bitpeers is quite simple, provided you have go installed, and your GOPATH set up. If not, go has a handy getting started guide which you can use to fix that.

Once go is set up, simply run

go get -u github.com/RaghavSood/bitpeers/cmd/bitpeers

If your go environment is set up properly, you should now have a bitpeers command available. If not, try finding your GOBIN (GOPATH/bin) and adding it to your PATH.

bitpeers allows you to easily dump peers.dat addresses as either human-readable plaintext or JSON. It accepts three flags:

Usage of bitpeers:
      --addressonly       outputs only addresses if specified
      --filepath string   the path to peers.dat
      --format string     the output format {json|text} (default "json")

Running bitpeers --filepath /mnt/doge/.dogecoin/peers.dat --addressonly will produce a JSON array of all the IPs and ports in peers.dat. You can also pass the --format text option to produce a list of all IPs and ports, one IP:port per line.

Running without the --addressonly option will produce the full JSON/text output, which contains the following:

$ bitpeers --filepath ./peers.dat --format text
SerializationVersion: 34fc0100
Time: 1526192792
ServiceFlags: 0x000000000000000d
IP: 42.5.143.180:8333
Source: 8.8.8.8
LastSuccess: 1526746622
Attempts: 0

If you happen to find any inconsistencies or issues with the parser or output, please open an issue on the GitHub repo.

Demystifying Bitcoin's peers.dat

Research

File Structure

Using bitpeers

Related Posts