Bitcoin uses a custom format to store peer information. Although the built-in JSON-RPC interface provides a helpful getpeerinfo method to list your active connections, it offers no way to query, dump, or otherwise access the information in peers.dat, which contains far more than just your active connections. Access to this file is useful for several reasons, such as learning about the network and finding more nodes than just your current connections to broadcast transactions to.
My interest in this was piqued by this post on the Bitcoin StackExchange. This blog post is an attempt to answer that question (and cover some gaps in my personal crypto tools) by building a utility that can read and query peers.dat. The post will go step by step, as I am writing this while building the utility.
Research
To work with a minimal example, I deleted my existing peers.dat and ran bitcoind again. This gave me a much lighter file (11 KB) instead of a 4 MB+ one containing peer info gathered over many months. I then opened peers.dat in Sublime Text, which shows a bunch of hex; no surprises there. When dealing with hex, hexdump is step 1, so let's see what we get:
This is a truncated output, as the file continues much the same way.
The hex immediately gives us some clues. The first four bytes (f9 be b4 d9) are the message start string, which shows up constantly when working with Bitcoin's network-level code. After some past projects with Bitcoin, these bytes are ingrained in my head and stood out immediately.
The next set of bytes after the message start doesn't offer any immediate insight, so let's skip ahead for the moment. The next thing that does stand out is a repeated pattern of ff ff, followed by four bytes, followed by 20 8d. The leading ff ff doesn't seem to tell us anything (it would be an unlikely choice for magic bytes, so it is probably not a marker in the file). However, the trailing two bytes (20 8d) do give us some useful information. Read as a big-endian integer and converted to decimal, they give 8333, which is the default port number for Bitcoin.
This new piece of information updates our previous pattern to ff ff + four bytes + port number (8333). IPv4 addresses use a 32-bit address space, which is four bytes. Since this file is meant to hold peer info, it stands to reason that the four bytes before the port number are an IP address.
This is easily verified by taking the middle four bytes of the first instance of this pattern (05 09 8b 05) and converting them to an IP address. I wrote a quick Go script for this, as it is an operation I will likely need in the utility as well. The script gives us 5.9.139.5, which is a plausible IP address. Running the script on the bytes from a few other instances of the pattern also produces valid IPs, so at this point I'm fairly sure this is the correct interpretation.
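A minimal sketch of this conversion in Go might look like the following. It is not the original script; the helper names decodeIPv4 and decodePort are made up for illustration, and the port is assumed to be stored big-endian, as the reading of 20 8d as 8333 suggests.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// decodeIPv4 turns four raw bytes into a dotted-quad string.
// (Hypothetical helper, named for this example only.)
func decodeIPv4(b [4]byte) string {
	return net.IPv4(b[0], b[1], b[2], b[3]).String()
}

// decodePort reads a two-byte port, assuming big-endian byte order.
func decodePort(b [2]byte) uint16 {
	return binary.BigEndian.Uint16(b[:])
}

func main() {
	// Bytes taken from the first instance of the pattern in the hexdump.
	fmt.Println(decodeIPv4([4]byte{0x05, 0x09, 0x8b, 0x05})) // 5.9.139.5
	fmt.Println(decodePort([2]byte{0x20, 0x8d}))             // 8333
}
```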
This seems like all the information I'm going to get just by looking at the hex above, so the next step is to look at the hexdump for the bottom of the file:
Nothing really sticks out except the last few bytes. Counting these shows that there are 32 bytes of continuous non-zero data. This is very likely a hash, and being at the very end of the file, it's almost certainly a checksum.
With the initial hex analysis not providing enough information to completely decode the file, it's time to head to the Bitcoin source code. Some quick searches on GitHub reveal that peers.dat is managed by addrdb.cpp.
Browsing through addrdb.cpp shows that the initial assumptions about the message start and checksum hash were correct:
This snippet shows that the checksum algorithm is the same as the regular Bitcoin hashing algorithm (since it relies on hash.h), which is a double SHA-256. We can verify this quickly with bash and openssl:
We strip the last 32 bytes from the file (which contain the checksum) and pass the rest to openssl, hashing twice. Note that this requires the GNU version of head, as the head that ships with OS X will complain about a negative byte count. The result matches what we see in the hexdump, so that's definitely the checksum.
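The same check can be sketched in Go, which sidesteps the dependency on GNU head. This is an illustrative sketch rather than code from bitpeers: the function names doubleSHA256 and verifyChecksum are hypothetical, and the path to peers.dat is assumed to be passed as the first command-line argument.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"os"
)

// doubleSHA256 hashes data with SHA-256, then hashes the digest again.
func doubleSHA256(data []byte) [32]byte {
	first := sha256.Sum256(data)
	return sha256.Sum256(first[:])
}

// verifyChecksum reports whether the trailing 32 bytes of raw equal the
// double SHA-256 of everything that precedes them.
func verifyChecksum(raw []byte) bool {
	if len(raw) < sha256.Size {
		return false // too short to even hold a checksum
	}
	split := len(raw) - sha256.Size
	payload, want := raw[:split], raw[split:]
	got := doubleSHA256(payload)
	return bytes.Equal(got[:], want)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: verify <path to peers.dat>")
		return
	}
	raw, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println("checksum ok:", verifyChecksum(raw))
}
```

Keeping the comparison in a separate function makes it easy to run against an in-memory buffer before touching a real file.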
Now all that's needed is to determine the data format, and the rest of the header following the message start bytes. Switching from addrdb.cpp to addrdb.h leads us to a helpful comment that lays out the serialized header format for us.
Lastly, some more GitHub search magic leads to chainparamsseeds.h. This states that:
Each line contains a 16-byte IPv6 address and a port.
IPv4 as well as onion addresses are wrapped inside an IPv6 address accordingly.
This explains the long runs of 00 preceding each IP in the hexdump, as well as the ff ff. Since the majority of connections still default to IPv4, what we are seeing is IPv4 addresses encoded as IPv4-mapped IPv6 addresses. These follow the format ::ffff:IPv4-address, which is where the ff ff comes from.
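To make the wrapping concrete, here is a small Go sketch that takes a 16-byte address field and unwraps it when it is an IPv4-mapped address. The function name describeAddr is hypothetical, and onion addresses are not handled here; the sketch only relies on the standard library's net.IP type.

```go
package main

import (
	"fmt"
	"net"
)

// describeAddr returns a printable form of a 16-byte address field.
// IPv4-mapped addresses (::ffff:a.b.c.d) are unwrapped to plain IPv4;
// anything else is printed as IPv6.
func describeAddr(raw [16]byte) string {
	ip := net.IP(raw[:])
	if v4 := ip.To4(); v4 != nil {
		return v4.String()
	}
	return ip.String()
}

func main() {
	// Ten zero bytes, then ff ff, then the four IPv4 bytes from the hexdump.
	mapped := [16]byte{10: 0xff, 11: 0xff, 12: 0x05, 13: 0x09, 14: 0x8b, 15: 0x05}
	fmt.Println(describeAddr(mapped)) // 5.9.139.5
}
```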
File Structure
Based on the above research, a flow of how peers.dat is generated and structured can be put together.
The snippet from addrdb.cpp shared above is the starting point for the creation of peers.dat. This snippet gave us the base structure as:
The file header consists of:
- A 4-byte magic value, which is the message start defined in chainparams.cpp
- 1 version byte, always 0x01
- 1 byte defining the key length for nkey, always 0x20 (32 in decimal)
- Key-length bytes (0x20 from above) holding nkey itself
- 4 bytes specifying how many new addresses there are
- 4 bytes specifying how many tried addresses there are
- 4 bytes specifying how many buckets there are (XOR'd against 2**30)
In all, this gives us a 50-byte header for peers.dat, which can be encapsulated in the following Go struct:
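As a rough sketch of the header layout listed above (4 + 1 + 1 + 32 + 4 + 4 + 4 = 50 bytes), the following shows one way to decode it. The field names, parseHeader, and sampleHeader are all hypothetical and not taken from Bitcoin Core or bitpeers, and the integer fields are assumed to be little-endian.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Header mirrors the 50-byte layout described above.
// Field names are illustrative only.
type Header struct {
	Magic    [4]byte  // message start, e.g. f9 be b4 d9
	Version  uint8    // always 0x01
	KeySize  uint8    // always 0x20
	NKey     [32]byte // KeySize bytes of key material
	NNew     uint32   // number of new addresses
	NTried   uint32   // number of tried addresses
	NBuckets uint32   // bucket count, stored XOR'd with 1<<30
}

// parseHeader decodes the first 50 bytes of raw, assuming little-endian
// integers, and undoes the XOR on the bucket count.
func parseHeader(raw []byte) (Header, error) {
	var h Header
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &h); err != nil {
		return h, err
	}
	h.NBuckets ^= 1 << 30
	return h, nil
}

// sampleHeader builds a synthetic header for demonstration purposes only.
func sampleHeader(nNew, nTried, nBuckets uint32) []byte {
	buf := make([]byte, 50)
	copy(buf[0:4], []byte{0xf9, 0xbe, 0xb4, 0xd9})
	buf[4] = 0x01
	buf[5] = 0x20
	// Bytes 6..37 (nkey) are left as zeros in this sample.
	binary.LittleEndian.PutUint32(buf[38:42], nNew)
	binary.LittleEndian.PutUint32(buf[42:46], nTried)
	binary.LittleEndian.PutUint32(buf[46:50], nBuckets^(1<<30))
	return buf
}

func main() {
	h, err := parseHeader(sampleHeader(3, 2, 1024))
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("magic=%x version=%d new=%d tried=%d buckets=%d\n",
		h.Magic, h.Version, h.NNew, h.NTried, h.NBuckets)
}
```

Because every field has a fixed size, binary.Read can fill the struct in one call; a variable key length would need field-by-field reads instead.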
This expands our structure to:
The data part of the file is simply the peer info structure repeated NNew + NTried times (i.e. once per peer). The structure of this can be pieced together from three files. CAddrInfo from addrman.h encapsulates the information regarding a single peer as follows:
We will try to keep our Go code in sync with Bitcoin Core's code to allow for easier updates in the event of changes.
The PeersDB struct is updated to include CAddrInfo slices for New and Tried peers, becoming:
The CAddrInfo struct simply mimics the data serialized in it:
Looking into the code for CAddress in protocol.h, we can assemble a CAddress struct as follows:
Finally, CService is found in netaddress.h, and converts to a very simple struct:
These simple structs, chained together, give us an overall structure for peers.dat. In a comprehensive tree, it looks something like:
With the structure of the file figured out, we can move on to writing the utility to parse it. At this point, the utility itself is quite simple: start from the beginning, read the header, loop until NNew + NTried entries of CAddrInfo have been read, and then dump the output in the requested format. Instead of putting the boring code in this post, I invite you to visit the bitpeers repo.
That said, there is still a chunk of the file after the last tried address before the checksum that is not decoded. As far as I can tell, it does not contain any peer information. Instead, I suspect it is some kind of per-peer/per-bucket integrity check, preventing an attacker from changing all the peers to a set they control. I may work on decoding that in the future, and will update this post if I do.
Using bitpeers
Installing bitpeers is quite simple, provided you have Go installed and your GOPATH set up. If not, Go has a handy getting started guide which you can use to fix that.
Once Go is set up, simply run:
If your Go environment is set up properly, you should now have a bitpeers command available. If not, try finding your GOBIN (GOPATH/bin) and adding it to your PATH.
bitpeers allows you to easily dump peers.dat addresses as either human-readable plaintext or JSON. It accepts three flags:
Running bitpeers --filepath /mnt/doge/.dogecoin/peers.dat --addressonly will produce a JSON array of all the IPs and ports in peers.dat. You can also pass the --format text option to produce a list of all IPs and ports, one IP:port per line.
Running without the --addressonly option will produce the full JSON/text output, which contains the following:
If you happen to find any inconsistencies or issues with the parser or output, please open an issue on the GitHub repo.