Bitcoin uses a custom format to store peer information. Although the built-in JSON-RPC interface provides a helpful getpeerinfo method to list your active connections, it offers no way to query, dump, or otherwise access the information in peers.dat, which contains far more than just your active connections. Access to this file is useful for several reasons, such as learning about the network and finding more nodes than just your current connections to broadcast transactions to.
My interest in this was piqued by this post on the Bitcoin StackExchange. This blog post is an attempt to answer that question (and cover some gaps in my personal crypto tools) by building a utility that can read and query peers.dat. The post will go step by step, as I am writing this while building the utility.
To work with a minimal example, I deleted my existing peers.dat and ran bitcoind again. This gives me a much lighter file (11 KB) instead of a 4 MB+ one containing peer info collected over many months. I then opened peers.dat in Sublime Text, which shows a bunch of hex; no surprises there. When dealing with hex, hexdump is step 1, so let’s see what we get:
This is a truncated output, as the file continues much the same way.
The hex immediately gives us some clues. The first four bytes (f9 be b4 d9) are the message start string, which you see very often when working with Bitcoin’s network-level implementations. After some past projects with Bitcoin, these bytes are ingrained in my head and stood out immediately.
The next set of bytes after the message start doesn’t offer any immediate insight, so let’s skip ahead for the moment. The next thing that does stand out is a repeated pattern of ff ff, followed by four bytes, followed by 20 8d. The first two bytes don’t give away much (ff ff is an unlikely candidate for magic bytes, so it is probably not a marker in the file). However, the last two (20 8d) do tell us something useful. Read as a big-endian number and converted to decimal, 0x208d is 8333, which is the default port number for Bitcoin.
This new piece of information updates our previous pattern to
ff ff + four bytes + port number (
8333). IPv4 addresses use a 32-bit address space, which is four bytes. It stands to reason that the four bytes before the port number are an IP address, considering this file is meant to hold peer info.
This is easily verified by taking the middle four bytes of the first instance of this pattern (05 09 8b 05) and converting them to an IP address. I wrote a quick Go script for this, as it is an operation I will likely need in the utility as well. The script gives us 5.9.139.5, which is a plausible IP address. Running the script on the corresponding bytes from a few other instances of the pattern also produces valid IPs, so at this point I’m pretty sure this is the correct interpretation.
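The conversion described above can be sketched in a few lines of Go. This is a minimal sketch rather than the original script; bytesToIPv4 and bytesToPort are hypothetical helper names, and the port is assumed to be stored big-endian, as the 20 8d reading suggests.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// bytesToIPv4 renders four raw bytes as a dotted-quad IPv4 string.
func bytesToIPv4(b [4]byte) string {
	return net.IPv4(b[0], b[1], b[2], b[3]).String()
}

// bytesToPort reads a two-byte port in big-endian (network) order.
func bytesToPort(b [2]byte) uint16 {
	return binary.BigEndian.Uint16(b[:])
}

func main() {
	// Bytes taken from the first instance of the pattern in the hexdump.
	fmt.Println(bytesToIPv4([4]byte{0x05, 0x09, 0x8b, 0x05})) // 5.9.139.5
	fmt.Println(bytesToPort([2]byte{0x20, 0x8d}))             // 8333
}
```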
This seems like all the information I’m going to get just by looking at the hex above, so the next step is to look at the hexdump for the bottom of the file:
Nothing really sticks out except the last few bytes. Counting these shows that there are 32 bytes of continuous non-zero data. This is very likely a hash and, being at the very end of the file, almost certainly a checksum.
With the initial hex analysis not providing enough information to completely decode the file, it’s time to head to the Bitcoin source code. Some quick searches on GitHub reveal that peers.dat is managed by addrdb.cpp.
Browsing through addrdb.cpp shows that the initial assumptions about the message start and checksum hash were correct:
This snippet shows that the checksum uses the regular Bitcoin hashing algorithm (since it relies on hash.h), which is double SHA-256. We can verify this quickly with bash and openssl:
We strip the last 32 bytes from the file (which contain the checksum), and pass the rest to openssl for hashing twice. Do note that this requires a GNU version of
head, as OS X will complain about a negative byte number. We can see that the result matches what we see in the hexdump, so that’s definitely the checksum.
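The same check can also be sketched in Go, which avoids the dependency on GNU head. This is a minimal sketch, assuming the whole file has been read into memory; doubleSHA256 and verifyChecksum are hypothetical helper names, and the bytes used in main are synthetic rather than taken from a real peers.dat.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

const checksumLen = 32 // the checksum occupies the last 32 bytes of the file

// doubleSHA256 applies SHA-256 twice.
func doubleSHA256(data []byte) [32]byte {
	first := sha256.Sum256(data)
	return sha256.Sum256(first[:])
}

// verifyChecksum reports whether the trailing 32 bytes equal the
// double SHA-256 of everything that precedes them.
func verifyChecksum(file []byte) bool {
	if len(file) < checksumLen {
		return false
	}
	payload := file[:len(file)-checksumLen]
	want := file[len(file)-checksumLen:]
	got := doubleSHA256(payload)
	return bytes.Equal(got[:], want)
}

func main() {
	// Synthetic stand-in for peers.dat: some payload followed by its checksum.
	payload := []byte{0xf9, 0xbe, 0xb4, 0xd9, 0x01}
	sum := doubleSHA256(payload)
	file := append(append([]byte{}, payload...), sum[:]...)

	fmt.Println(verifyChecksum(file)) // true
	file[0] ^= 0xff                   // corrupt one byte
	fmt.Println(verifyChecksum(file)) // false
}
```

For a real file, the contents returned by os.ReadFile could be passed to verifyChecksum directly.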
Now all that’s needed is to determine the data format and the rest of the header following the message start bytes. Moving from addrdb.cpp to addrman.h leads us to a helpful comment that lays out the serialized header format for us.
Lastly, some more GitHub search magic leads to chainparamsseeds.h. This states that:
Each line contains a 16-byte IPv6 address and a port.
IPv4 as well as onion addresses are wrapped inside an IPv6 address accordingly.
This explains the long runs of 00 in the hexdump preceding each IP, as well as the ff ff. Since the majority of connections still use IPv4, what we are seeing is IPv4 addresses encoded as IPv4-mapped IPv6 addresses, which follow the format ::ffff:IPv4-address. That is where the ff ff comes from.
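Go’s standard net package already understands this mapping, so unwrapping a 16-byte field is short. The following is a minimal sketch; decodeAddr is a hypothetical helper name, and onion addresses (which would show up as ordinary IPv6 text here) are not handled specially.

```go
package main

import (
	"fmt"
	"net"
)

// decodeAddr interprets a 16-byte address field. IPv4-mapped addresses
// (ten zero bytes, then ff ff, then the IPv4 bytes) come back in
// dotted-quad form; anything else is printed as IPv6.
func decodeAddr(raw [16]byte) string {
	ip := net.IP(raw[:])
	if v4 := ip.To4(); v4 != nil {
		return v4.String()
	}
	return ip.String()
}

func main() {
	// ::ffff:5.9.139.5, using the bytes seen in the hexdump.
	mapped := [16]byte{10: 0xff, 11: 0xff, 12: 0x05, 13: 0x09, 14: 0x8b, 15: 0x05}
	fmt.Println(decodeAddr(mapped)) // 5.9.139.5

	// A native IPv6 address is left as IPv6.
	native := [16]byte{0x20, 0x01, 0x0d, 0xb8, 15: 0x01}
	fmt.Println(decodeAddr(native)) // 2001:db8::1
}
```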
Based on the above research, a flow of how peers.dat is generated and structured can be put together.
The snippet from
addrdb.cpp shared above is the starting point for the creation of
peers.dat. This snippet gave us the base structure as
The file header consists of:
0x20 (32 in decimal)
In all, this gives us a 50-byte header for peers.dat, which can be encapsulated in the following Go struct:
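A minimal sketch of such a header struct follows. The field order is an assumption based on the serialization comment in Bitcoin Core (magic bytes, version byte, 0x20 key length, 32-byte key, new count, tried count, bucket count), the integers are assumed to be little-endian, and the field names are illustrative and may differ from those used in bitpeers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header is an assumed layout of the 50-byte peers.dat header.
type Header struct {
	MessageStart [4]byte  // network magic, f9 be b4 d9 on mainnet
	Version      uint8    // serialization format version
	KeySize      uint8    // 0x20, the length of the key that follows
	NKey         [32]byte // 32-byte key (nKey in Bitcoin Core)
	NNew         int32    // number of "new" addresses
	NTried       int32    // number of "tried" addresses
	NBuckets     int32    // bucket count field (assumed stored XORed with 1<<30)
}

// parseHeader decodes the header from the start of data.
func parseHeader(data []byte) (Header, error) {
	var h Header
	err := binary.Read(bytes.NewReader(data), binary.LittleEndian, &h)
	return h, err
}

func main() {
	// The fixed-size fields add up to the 50 bytes mentioned above.
	fmt.Println(binary.Size(Header{})) // 50

	// Synthetic header: magic, version 1, key length, zero key, 3 new, 2 tried.
	raw := make([]byte, 50)
	copy(raw, []byte{0xf9, 0xbe, 0xb4, 0xd9, 0x01, 0x20})
	raw[38], raw[42] = 3, 2
	h, err := parseHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(h.NNew, h.NTried) // 3 2
}
```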
This expands our structure to:
The data part of the file is simply the peer info structure repeated
NNew + NTried times (i.e. once per peer). The structure of this can be pieced together from three files.
CAddrInfo from addrman.h encapsulates the information regarding a single peer as follows:
We will try to keep our Go code in sync with Bitcoin Core’s code to allow for easier updates in the event of changes.
The PeersDB struct is updated to include CAddrInfo slices for New and Tried peers, becoming:
The CAddrInfo struct simply mimics the data serialized in it:
Looking into the code for
CAddress in protocol.h, we can assemble a
CAddress struct as follows:
CService is found in netaddress.h, and converts to a very simple struct:
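A minimal sketch of how these three structs could nest is shown here. The field names, field order, and the resulting 62-byte record size are assumptions based on my reading of Bitcoin Core’s on-disk serialization, not a copy of the bitpeers code. The port is kept as raw bytes because it is assumed to be big-endian, while the other integers are assumed to be little-endian.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"net"
	"strconv"
)

// CService is a 16-byte address plus a port in network (big-endian) order.
type CService struct {
	IP   [16]byte
	Port [2]byte
}

// CAddress wraps a CService with the metadata assumed to precede it on disk.
type CAddress struct {
	SerVersion uint32 // assumed: serialization version written for disk data
	Time       uint32 // assumed: last-seen time in Unix seconds
	Services   uint64 // assumed: service flags bitfield
	Service    CService
}

// CAddrInfo is one peer record (assumed to be 62 bytes in total).
type CAddrInfo struct {
	Address     CAddress
	Source      [16]byte // address of the peer that told us about this one
	LastSuccess int64
	Attempts    int32
}

// String formats the service as host:port.
func (s CService) String() string {
	host := net.IP(s.IP[:]).String()
	port := binary.BigEndian.Uint16(s.Port[:])
	return net.JoinHostPort(host, strconv.Itoa(int(port)))
}

// parseAddrInfo decodes one record; multi-byte integers are little-endian.
func parseAddrInfo(raw []byte) (CAddrInfo, error) {
	var info CAddrInfo
	err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &info)
	return info, err
}

func main() {
	// Synthetic record: ::ffff:5.9.139.5 and port 8333 at the assumed offset 16.
	raw := make([]byte, 62)
	copy(raw[16:], []byte{10: 0xff, 11: 0xff, 12: 0x05, 13: 0x09, 14: 0x8b, 15: 0x05, 16: 0x20, 17: 0x8d})
	info, err := parseAddrInfo(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(info.Address.Service) // 5.9.139.5:8333
}
```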
These simple structs, chained together, give us an overall structure for
peers.dat. In a comprehensive tree, it looks something like:
With the structure of the file figured out, we can move on to writing the utility to parse it. At this point, the utility itself is quite simple: start from the beginning, read the header, loop until NNew + NTried entries of CAddrInfo have been read, and then dump the output in the requested format. Instead of putting the boring code in this post, I invite you to visit the bitpeers repo.
That said, there is still a chunk of the file after the last tried address before the checksum that is not decoded. As far as I can tell, it does not contain any peer information. Instead, I suspect it is some kind of per-peer/per-bucket integrity check, preventing an attacker from changing all the peers to a set they control. I may work on decoding that in the future, and will update this post if I do.
Installing bitpeers is quite simple, provided you have Go installed and your GOPATH set up. If not, Go has a handy getting started guide which you can use to fix that.
Once Go is set up, simply run
If your Go environment is set up properly, you should now have a bitpeers command available. If not, try finding your Go bin directory (GOPATH/bin) and adding it to your PATH.
bitpeers allows you to easily dump
peers.dat addresses as either human-readable plaintext or JSON. It accepts three flags:
bitpeers --filepath /mnt/doge/.dogecoin/peers.dat --addressonly will produce a JSON array of all the IPs and ports in
peers.dat. You can also pass the
--format text option to produce a list of all IPs and ports, one IP:port per line.
Running without the
--addressonly option will produce the full JSON/text output, which contains the following:
If you happen to find any inconsistencies or issues with the parser or output, please open an issue on the GitHub repo.