tl;dr: What is the complete path between visiting thepiratebay and sublimating an mp3 file from thin air? In this post, we’ll implement enough of the BitTorrent protocol to download Debian. Look at the source code or skip to the last bit.

BitTorrent is a protocol for downloading and distributing files across the Internet. In contrast with the traditional client/server relationship, in which downloaders connect to a central server (for example: watching a movie on Netflix, or loading the web page you’re reading now), participants in the BitTorrent network, called peers, download pieces of files from each other. This is what makes it a peer-to-peer protocol. We’ll investigate how this works, and build our own client that can find peers and exchange data between them.
The protocol evolved organically over the past 20 years, and various people and organizations added extensions for features like encryption, private torrents, and new ways of finding peers. We’ll be implementing the original spec from 2001 to keep this a weekend-sized project.
I’ll be using a Debian ISO file as my guinea pig because it’s big, but not huge, at 686 MB. As a popular Linux distribution, there will be lots of fast and cooperative peers for us to connect to. And we’ll avoid the legal and ethical issues related to downloading pirated content.
Here’s a problem: we want to download a file with BitTorrent, but it’s a peer-to-peer protocol and we have no idea where to find peers to download it from. This is a lot like moving to a new city and trying to make friends: maybe we’ll hit up a local pub or a meetup group! Centralized locations like these are the big idea behind trackers, which are central servers that introduce peers to each other.
Of course, these central servers are liable to get raided by the feds if they facilitate peers exchanging illegal content. You may remember reading about trackers like TorrentSpy, Popcorn Time, and KickassTorrents getting seized and shut down. New methods cut out the middleman by making even peer discovery a distributed process. We won’t be implementing them here, but terms worth researching include DHT and magnet links.
A .torrent file describes the contents of a torrentable file and information for connecting to a tracker. It’s all we need in order to kickstart the process of downloading a torrent. Debian’s .torrent file looks like this:

    d8:announce…:http://bttracker.debian.org:…/announce7:comment35:"Debian CD from cdimage.debian.org"13:creation datei…e9:httpseedsl…
That mess is encoded in a format called Bencode (pronounced bee-encode), and we’ll need to decode it.
Bencode can encode roughly the same types of structures as JSON: strings, integers, lists, and dictionaries. Bencoded data is not as human-readable/writable as JSON, but it can efficiently handle binary data and it’s really simple to parse from a stream. Strings come with a length prefix, and look like 4:spam. Integers go between start and end markers, so 7 would encode to i7e. Lists and dictionaries work in a similar way: l4:spami7ee represents ['spam', 7], while d4:spami7ee means {spam: 7}.

In a prettier format, our .torrent file looks like this:

    d
      8:announce
        …:http://bttracker.debian.org:…/announce
      7:comment
        35:"Debian CD from cdimage.debian.org"
      13:creation date
        i…e
      4:info
        d
          6:length
            i…e
          4:name
            …:debian-….2.0-amd…-netinst.iso
          12:piece length
            i…e
          6:pieces
            …:(binary blob of the hashes of each piece)
        e
    e
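To make the encoding rules concrete, here is a minimal sketch of a recursive Bencode decoder. It is not the parser used in this project; the names `decode` and `indexFrom` are hypothetical, and it assumes the whole input is already in memory rather than arriving as a stream.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// indexFrom returns the index of the first b at or after pos, or -1.
func indexFrom(data []byte, pos int, b byte) int {
	i := bytes.IndexByte(data[pos:], b)
	if i < 0 {
		return -1
	}
	return pos + i
}

// decode parses one bencoded value starting at data[pos]. It returns the
// value and the position just past it. Strings decode to string, integers
// to int64, lists to []interface{}, and dictionaries to map[string]interface{}.
func decode(data []byte, pos int) (interface{}, int, error) {
	if pos >= len(data) {
		return nil, pos, errors.New("unexpected end of input")
	}
	switch c := data[pos]; {
	case c == 'i': // integer: i<digits>e
		end := indexFrom(data, pos, 'e')
		if end < 0 {
			return nil, pos, errors.New("unterminated integer")
		}
		n, err := strconv.ParseInt(string(data[pos+1:end]), 10, 64)
		if err != nil {
			return nil, pos, err
		}
		return n, end + 1, nil
	case c == 'l': // list: l<values>e
		list := []interface{}{}
		pos++
		for pos < len(data) && data[pos] != 'e' {
			v, next, err := decode(data, pos)
			if err != nil {
				return nil, pos, err
			}
			list = append(list, v)
			pos = next
		}
		if pos >= len(data) {
			return nil, pos, errors.New("unterminated list")
		}
		return list, pos + 1, nil
	case c == 'd': // dictionary: d<key><value>...e, keys are strings
		dict := map[string]interface{}{}
		pos++
		for pos < len(data) && data[pos] != 'e' {
			k, afterKey, err := decode(data, pos)
			if err != nil {
				return nil, pos, err
			}
			key, ok := k.(string)
			if !ok {
				return nil, pos, errors.New("dictionary key is not a string")
			}
			v, afterValue, err := decode(data, afterKey)
			if err != nil {
				return nil, pos, err
			}
			dict[key] = v
			pos = afterValue
		}
		if pos >= len(data) {
			return nil, pos, errors.New("unterminated dictionary")
		}
		return dict, pos + 1, nil
	case c >= '0' && c <= '9': // string: <length>:<bytes>
		colon := indexFrom(data, pos, ':')
		if colon < 0 {
			return nil, pos, errors.New("missing ':' in string")
		}
		n, err := strconv.Atoi(string(data[pos:colon]))
		if err != nil {
			return nil, pos, err
		}
		start := colon + 1
		if start+n > len(data) {
			return nil, pos, errors.New("string runs past end of input")
		}
		return string(data[start : start+n]), start + n, nil
	}
	return nil, pos, fmt.Errorf("unexpected byte %q at position %d", data[pos], pos)
}

func main() {
	v, _, err := decode([]byte("d4:spami7ee"), 0)
	fmt.Println(v, err) // map[spam:7] <nil>
}
```

Because every value announces either its length or its end marker, the decoder never needs to look ahead more than one byte to decide what it is reading.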
In this file, we can spot the URL of the tracker, the creation date (as a Unix timestamp), the name and size of the file, and a big binary blob containing the SHA-1 hashes of each piece. Pieces are equally-sized parts of the file we want to download. The exact size of a piece varies between torrents, but they are usually somewhere between 256 KB and 1 MB. This means that a large file might be made up of thousands of pieces. We’ll download these pieces from our peers, check them against the hashes from our torrent file, assemble them together, and boom, we’ve got a file!

This mechanism allows us to verify the integrity of each piece as we go. It makes BitTorrent resistant to accidental corruption or intentional torrent poisoning. Unless an attacker is capable of breaking SHA-1 with a preimage attack, we will get exactly the content we asked for.
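The integrity check itself is small. Here is a minimal sketch, assuming a piece has already been downloaded into memory and that we hold its expected hash as a 20-byte array; the name `checkIntegrity` is hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// checkIntegrity reports whether the SHA-1 digest of piece matches the
// expected hash taken from the .torrent file.
func checkIntegrity(piece []byte, expected [20]byte) bool {
	return sha1.Sum(piece) == expected
}

func main() {
	piece := []byte("pretend this is a downloaded piece")
	expected := sha1.Sum(piece) // stand-in for a hash read from the torrent

	fmt.Println(checkIntegrity(piece, expected))
	fmt.Println(checkIntegrity([]byte("tampered piece"), expected))
}
```

A piece that fails the check can simply be thrown away and requested again, possibly from a different peer.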
It would be really fun to write a bencode parser, but parsing isn’t our focus today. That said, I found Fredrik Lundh’s short parser to be especially illuminating. For this project, I used github.com/jackpal/bencode-go:

    import (
        "io"

        "github.com/jackpal/bencode-go"
    )

    // bencodeInfo mirrors the "info" dictionary of a .torrent file
    type bencodeInfo struct {
        Pieces      string `bencode:"pieces"`
        PieceLength int    `bencode:"piece length"`
        Length      int    `bencode:"length"`
        Name        string `bencode:"name"`
    }

    // bencodeTorrent mirrors the top level of a .torrent file
    type bencodeTorrent struct {
        Announce string      `bencode:"announce"`
        Info     bencodeInfo `bencode:"info"`
    }

    // Open parses a torrent file
    func Open(r io.Reader) (*bencodeTorrent, error) {
        bto := bencodeTorrent{}
        err := bencode.Unmarshal(r, &bto)
        if err != nil {
            return nil, err
        }
        return &bto, nil
    }
Because I like to keep my structures relatively flat, and I like to keep my application structs separate from my serialization structs, I exported a different, flatter struct named TorrentFile and wrote a few helper functions to convert between the two.

Notably, I split pieces (previously a string) into a slice of hashes (each [20]byte) so that I can easily access individual hashes later. I also computed the SHA-1 hash of the entire bencoded info dict (the one which contains the name, size, and piece hashes). We know this as the infohash and it uniquely identifies files when we talk to trackers and peers. More on this later.

    type TorrentFile struct {
        Announce    string
        InfoHash    [20]byte
        PieceHashes [][20]byte
        PieceLength int
        Length      int
        Name        string
    }

    func (bto *bencodeTorrent) toTorrentFile() (*TorrentFile, error) {
        // …
    }
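The body of that conversion is elided above. As one illustration of the splitting step only, here is a minimal sketch that cuts the pieces string into 20-byte hashes. The name `splitPieceHashes` and the error message are assumptions, not the project's actual code.

```go
package main

import "fmt"

const hashLen = 20 // length of a SHA-1 digest in bytes

// splitPieceHashes cuts the concatenated pieces blob into individual
// 20-byte SHA-1 hashes, rejecting blobs whose length is not a multiple of 20.
func splitPieceHashes(pieces string) ([][20]byte, error) {
	buf := []byte(pieces)
	if len(buf)%hashLen != 0 {
		return nil, fmt.Errorf("received malformed pieces of length %d", len(buf))
	}
	numHashes := len(buf) / hashLen
	hashes := make([][20]byte, numHashes)
	for i := 0; i < numHashes; i++ {
		copy(hashes[i][:], buf[i*hashLen:(i+1)*hashLen])
	}
	return hashes, nil
}

func main() {
	blob := string(make([]byte, 60)) // three all-zero placeholder hashes
	hashes, err := splitPieceHashes(blob)
	fmt.Println(len(hashes), err)
}
```

Using fixed-size arrays rather than slices means two hashes can be compared directly with `==`.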
Retrieving peers from the tracker

Now that we have information about the file and its tracker, let’s talk to the tracker to announce our presence as a peer and to retrieve a list of other peers. We just need to make a GET request to the announce URL supplied in the .torrent file, with a few query parameters:

    import (
        "net/url"
        "strconv"
    )

    func (t *TorrentFile) buildTrackerURL(peerID [20]byte, port uint16) (string, error) {
        base, err := url.Parse(t.Announce)
        if err != nil {
            return "", err
        }
        params := url.Values{
            "info_hash":  []string{string(t.InfoHash[:])},
            "peer_id":    []string{string(peerID[:])},
            "port":       []string{strconv.Itoa(int(port))}, // use the port argument
            "uploaded":   []string{"0"},
            "downloaded": []string{"0"},
            "compact":    []string{"1"},
            "left":       []string{strconv.Itoa(t.Length)},
        }
        base.RawQuery = params.Encode()
        return base.String(), nil
    }
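The function above takes a peer ID as an argument. Here is a minimal sketch of producing one, assuming 20 random bytes are acceptable to the tracker; the name `newPeerID` is hypothetical.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newPeerID returns 20 random bytes to identify this client to trackers
// and peers for the lifetime of the download.
func newPeerID() ([20]byte, error) {
	var id [20]byte
	_, err := rand.Read(id[:])
	return id, err
}

func main() {
	id, err := newPeerID()
	if err != nil {
		fmt.Println("could not generate peer ID:", err)
		return
	}
	fmt.Printf("%x\n", id)
}
```

The ID should be generated once and reused for both the tracker request and every handshake, so that the tracker and the peers see the same identity.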
The important ones:

- info_hash: Identifies the file we’re trying to download. It’s the infohash we calculated earlier from the bencoded info dict. The tracker will use this to figure out which peers to show us.
- peer_id: A 20 byte name to identify ourselves to trackers and peers. We’ll just generate 20 random bytes for this. Real BitTorrent clients have IDs like -TR…-k8hj0wgej6ch, which identify the client software and version. In this case, TR stands for the Transmission client, and the digits that follow it encode the version.

Parsing the tracker response
We get back a bencoded response:

    d
      8:interval
        i…e
      5:peers
        …:(another long binary blob)
    e

Interval tells us how often we’re supposed to connect to the tracker again to refresh our list of peers. The value is in seconds: a value of 1195, for example, means we should reconnect every 1195 seconds (just under 20 minutes).

Peers is another long binary blob containing the IP addresses of each peer. It’s made out of groups of six bytes. The first four bytes in each group represent the peer’s IP address, with each byte representing a number in the IP. The last two bytes represent the port, as a big-endian uint16. Big-endian, or network order, means that we can interpret a group of bytes as an integer by just squishing them together left to right. For example, the bytes 0x1A, 0xE1 make 0x1AE1, or 6881 in decimal. Interpreting the same bytes in little-endian order would make 0xE11A = 57626.
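The byte-order arithmetic can be checked with Go's encoding/binary package. This is a small sketch; the helper name `portFromBytes` is hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// portFromBytes interprets two bytes as a big-endian (network order) port.
func portFromBytes(b [2]byte) uint16 {
	return binary.BigEndian.Uint16(b[:])
}

func main() {
	b := [2]byte{0x1A, 0xE1}
	fmt.Println(portFromBytes(b))                 // 6881
	fmt.Println(binary.LittleEndian.Uint16(b[:])) // 57626
}
```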
    import (
        "encoding/binary"
        "fmt"
        "net"
    )

    // Peer encodes connection information for a peer
    type Peer struct {
        IP   net.IP
        Port uint16
    }

    // Unmarshal parses peer IP addresses and ports from a buffer
    func Unmarshal(peersBin []byte) ([]Peer, error) {
        const peerSize = 6 // 4 for IP, 2 for port
        numPeers := len(peersBin) / peerSize
        if len(peersBin)%peerSize != 0 {
            err := fmt.Errorf("Received malformed peers")
            return nil, err
        }
        peers := make([]Peer, numPeers)
        for i := 0; i < numPeers; i++ {
            offset := i * peerSize
            peers[i].IP = net.IP(peersBin[offset : offset+4])
            peers[i].Port = binary.BigEndian.Uint16(peersBin[offset+4 : offset+6])
        }
        return peers, nil
    }

Now that we have a list of peers, it’s time to connect with them and start downloading pieces! We can break down the process into a few steps. For each peer, we want to:

1. Start a TCP connection with the peer. This is like starting a phone call.
2. Complete a two-way BitTorrent handshake. “Hello?” “Hello.”
3. Exchange messages to download pieces. “I’d like piece #N, please.”

Start a TCP connection

    conn, err := net.DialTimeout("tcp", peer.String(), 3*time.Second)
    if err != nil {
        return nil, err
    }

I set a timeout so that I don’t waste too much time on peers that aren’t going to let me connect. For the most part, it’s a pretty standard TCP connection.

Complete the handshake
We’ve just set up a connection with a peer, but we want to do a handshake to validate our assumptions that the peer

- can communicate using the BitTorrent protocol
- is able to understand and respond to our messages
- has the file that we want, or at least knows what we’re talking about
My father told me that the secret to a good handshake is a firm grip and eye contact. The secret to a good BitTorrent handshake is that it’s made up of five parts:

1. The length of the protocol identifier, which is always 19 (0x13 in hex)
2. The protocol identifier, called the pstr, which is always BitTorrent protocol
3. Eight reserved bytes, all set to 0. We’d flip some of them to 1 to indicate that we support certain extensions. But we don’t, so we’ll keep them at 0.
4. The infohash that we calculated earlier to identify which file we want
5. The Peer ID that we made up to identify ourselves

Put together, a handshake string might look like this:

    \x13BitTorrent protocol\x00\x00\x00\x00\x00\x00\x00\x00<20-byte infohash>-TR…-k8hj0wgej6ch
After we send a handshake to our peer, we should receive a handshake back in the same format. The infohash we get back should match the one we sent so that we know that we’re talking about the same file. If everything goes as planned, we’re good to go. If not, we can sever the connection because there’s something wrong.
“Hello?” “这是谁？你想要什么？” (“Who is this? What do you want?”) “Okay, wow, wrong number.”
In our code, let’s make a struct to represent a handshake, and write a few methods for serializing and reading them:

    import "io"

    // A Handshake is a special message that a peer uses to identify itself
    type Handshake struct {
        Pstr     string
        InfoHash [20]byte
        PeerID   [20]byte
    }

    // Serialize serializes the handshake to a buffer
    func (h *Handshake) Serialize() []byte {
        pstrlen := len(h.Pstr)
        bufLen := 49 + pstrlen // 1 length byte + 8 reserved + 20 infohash + 20 peer ID
        buf := make([]byte, bufLen)
        buf[0] = byte(pstrlen)
        copy(buf[1:], h.Pstr)
        // Leave 8 reserved bytes
        copy(buf[1+pstrlen+8:], h.InfoHash[:])
        copy(buf[1+pstrlen+8+20:], h.PeerID[:])
        return buf
    }

    // Read parses a handshake from a stream
    func Read(r io.Reader) (*Handshake, error) {
        // Do Serialize(), but backwards
        // …
    }
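The body of Read is elided above. Here is a minimal sketch of what "Serialize, but backwards" could look like. It uses its own lowercase `handshake` type so it stands alone, and the names `readHandshake`, `parseHandshake`, and `exampleWire`, along with the made-up peer ID, are all hypothetical.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

type handshake struct {
	Pstr     string
	InfoHash [20]byte
	PeerID   [20]byte
}

// readHandshake reads one handshake from r: a length byte, the protocol
// identifier, 8 reserved bytes, a 20-byte infohash, and a 20-byte peer ID.
func readHandshake(r io.Reader) (*handshake, error) {
	lengthBuf := make([]byte, 1)
	if _, err := io.ReadFull(r, lengthBuf); err != nil {
		return nil, err
	}
	pstrlen := int(lengthBuf[0])
	if pstrlen == 0 {
		return nil, errors.New("pstrlen cannot be 0")
	}
	// pstr + 8 reserved + 20 infohash + 20 peer ID
	rest := make([]byte, pstrlen+48)
	if _, err := io.ReadFull(r, rest); err != nil {
		return nil, err
	}
	h := handshake{Pstr: string(rest[:pstrlen])}
	copy(h.InfoHash[:], rest[pstrlen+8:pstrlen+28])
	copy(h.PeerID[:], rest[pstrlen+28:])
	return &h, nil
}

// parseHandshake is a convenience wrapper for in-memory buffers.
func parseHandshake(b []byte) (*handshake, error) {
	return readHandshake(bytes.NewReader(b))
}

// exampleWire builds a handshake with a placeholder infohash and peer ID.
func exampleWire() []byte {
	wire := append([]byte{19}, "BitTorrent protocol"...)
	wire = append(wire, make([]byte, 8)...)
	wire = append(wire, bytes.Repeat([]byte{0xAB}, 20)...)
	return append(wire, "-XX0000-abcdefghijkl"...)
}

func main() {
	h, err := parseHandshake(exampleWire())
	if err != nil {
		fmt.Println("handshake failed:", err)
		return
	}
	fmt.Println(h.Pstr, string(h.PeerID[:]))
}
```

After reading the peer's handshake, the caller would compare its InfoHash with the one we sent and drop the connection on a mismatch.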
Send and receive messages

Once we’ve completed the initial handshake, we can send and receive messages. Well, not quite: if the other peer isn’t ready to accept messages, we can’t send any until they tell us they’re ready. In this state, we’re considered choked by the other peer. They’ll send us an unchoke message to let us know that we can begin asking them for data. By default, we assume that we’re choked until proven otherwise. Once we’ve been unchoked, we can then begin sending requests for pieces, and they can send us messages back containing pieces.
Interpreting messages

A message has a length, an ID and a payload. On the wire, it looks like:

[Figure: message layout, a 4-byte length prefix, then a 1-byte ID, then the payload]

A message starts with a length indicator which tells us how many bytes long the message will be. It’s a 32-bit integer, meaning it’s made out of four bytes smooshed together in big-endian order. The next byte, the ID, tells us which type of message we’re receiving; for example, a 2 byte means “interested.” Finally, the optional payload fills out the remaining length of the message.
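To see the layout on actual bytes, here is a minimal sketch that splits an in-memory buffer into ID and payload. The name `parseFrame` is hypothetical, and the sample payload bytes are arbitrary.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseFrame splits one message of the form
// <4-byte big-endian length><1-byte ID><payload> into its ID and payload.
func parseFrame(buf []byte) (byte, []byte, error) {
	if len(buf) < 4 {
		return 0, nil, errors.New("short length prefix")
	}
	length := binary.BigEndian.Uint32(buf[0:4])
	if length == 0 {
		return 0, nil, errors.New("zero-length message carries no ID")
	}
	if uint32(len(buf)-4) < length {
		return 0, nil, errors.New("truncated message")
	}
	// The length counts the ID byte plus the payload.
	return buf[4], buf[5 : 4+length], nil
}

func main() {
	// Length 1, ID 2 ("interested"), no payload.
	id, payload, err := parseFrame([]byte{0, 0, 0, 1, 2})
	fmt.Println(id, len(payload), err)

	// Length 3, ID 7, two arbitrary payload bytes.
	id, payload, err = parseFrame([]byte{0, 0, 0, 3, 7, 0xAA, 0xBB})
	fmt.Println(id, payload, err)
}
```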
    import "encoding/binary"

    type messageID uint8

    const (
        MsgChoke         messageID = 0
        MsgUnchoke       messageID = 1
        MsgInterested    messageID = 2
        MsgNotInterested messageID = 3
        MsgHave          messageID = 4
        MsgBitfield      messageID = 5
        MsgRequest       messageID = 6
        MsgPiece         messageID = 7
        MsgCancel        messageID = 8
    )

    // Message stores ID and payload of a message
    type Message struct {
        ID      messageID
        Payload []byte
    }

    // Serialize serializes a message into a buffer of the form
    // <length prefix><message ID><payload>
    // Interprets `nil` as a keep-alive message
    func (m *Message) Serialize() []byte {
        if m == nil {
            return make([]byte, 4)
        }
        length := uint32(len(m.Payload) + 1) // +1 for id
        buf := make([]byte, 4+length)
        binary.BigEndian.PutUint32(buf[0:4], length)
        buf[4] = byte(m.ID)
        copy(buf[5:], m.Payload)
        return buf
    }
To read a message from a stream, we just follow the format of a message. We read four bytes and interpret them as a (uint) ************************************************************************************************************************************************************************ to get thelength
of the message. Then, we read that number of bytes to get the
ID ((the first byte) and the
payload (the remaining bytes).[
(r) ************************************* (io) ************************************ (**************************************************************** (Reader) ************************************
)[1:] **************************** [‘spam’, 7] (Message), [1:] ************************ (error) ************************************
)({************************************ lengthBuf [ 1:] **************************** (************************************:=make (************************************ ([] [4] ********************** (byte) (****************************, ************************************** (4)) _,err [1:] ************************:=io***********************************. ********************************************** (ReadFull) (( (******************************** (r) *************************************
,, (lengthBuf) ************************************) if (********************************** (err)!=(************************************ (nil) ************************************{ return (********************************** (nil), [1:] ************************ (err) ************************************** } length (************************************:=(binary) *****************************************************BigEndian***********************************. ********************************************** (Uint) ************************************************************************************************************************************************************************
************ ()lengthBuf
1:] **************************if(length) ************************************==(0) ************************************** ({*************************************) return (********************************** (nil), [1:] ********************** (nil) ************************************** } (************************************** messageBuf [1:] **************************** (************************************:=make (************************************ ([] [4] ********************** (byte) (****************************, ************************************** (length)) _,err [1:] ************************==********************************) // keep-alive message [
io***********************************. ********************************** (ReadFull)( (********************************* (r) , [
1:] **************************** messageBuf [1]) if (********************************** (err)!=(************************************ (nil) ************************************{ return (********************************** (nil), [1:] ************************ (err) ************************************** } (************************************** m (************************************:=(Message) ************************************* {{**********************************) ID (************************************: messageID [1:] ************************ (**************messageBuf************************************ [offset4:offset6], [1:] ********************************** Payload [1:] **************************** (************************************: messageBuf [1:] ********************** [1:], ************************************ } (************************************** return&(m) ************************************ [1:] , nil} (********************************************
view in context
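To see the wire format in action, here is a minimal round-trip sketch. It repeats condensed versions of Message, Serialize, and Read so that it runs on its own, and it swaps the network connection for an in-memory reader. The roundTrip helper and the sample "have" payload (a 4-byte piece index) are assumptions made for illustration, not part of the client itself.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

type messageID uint8

// MsgHave is the ID of a "have" message.
const MsgHave messageID = 4

// Message stores the ID and payload of a message.
type Message struct {
	ID      messageID
	Payload []byte
}

// Serialize encodes a message as <length prefix><message ID><payload>.
// A nil message becomes a keep-alive: four zero bytes.
func (m *Message) Serialize() []byte {
	if m == nil {
		return make([]byte, 4)
	}
	length := uint32(len(m.Payload) + 1) // +1 for the ID byte
	buf := make([]byte, 4+length)
	binary.BigEndian.PutUint32(buf[0:4], length)
	buf[4] = byte(m.ID)
	copy(buf[5:], m.Payload)
	return buf
}

// Read parses one message from r and returns nil for a keep-alive.
func Read(r io.Reader) (*Message, error) {
	lengthBuf := make([]byte, 4)
	if _, err := io.ReadFull(r, lengthBuf); err != nil {
		return nil, err
	}
	length := binary.BigEndian.Uint32(lengthBuf)
	if length == 0 {
		return nil, nil
	}
	messageBuf := make([]byte, length)
	if _, err := io.ReadFull(r, messageBuf); err != nil {
		return nil, err
	}
	return &Message{ID: messageID(messageBuf[0]), Payload: messageBuf[1:]}, nil
}

// roundTrip is a hypothetical helper: it serializes m into an in-memory
// buffer standing in for a peer connection, then reads it back.
func roundTrip(m *Message) (*Message, error) {
	return Read(bytes.NewReader(m.Serialize()))
}

func main() {
	have := &Message{ID: MsgHave, Payload: []byte{0, 0, 0, 7}}
	fmt.Printf("% x\n", have.Serialize()) // 00 00 00 05 04 00 00 00 07

	got, err := roundTrip(have)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.ID, got.Payload) // 4 [0 0 0 7]

	var keepAlive *Message
	got, err = roundTrip(keepAlive)
	fmt.Println(got == nil, err) // true <nil>
}
```

Note that the length prefix counts the ID byte plus the payload, but not the four prefix bytes themselves, which is why a 4-byte payload produces a prefix of 5.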
Bitfields
One of the most interesting types of message is the bitfield, which is a data structure that peers use to efficiently encode which pieces they are able to send us. A bitfield looks like a byte array, and to check which pieces a peer has, we just need to look at the positions of the bits set to 1. You can think of it like the digital equivalent of a coffee shop loyalty card. We start with a blank card of all 0s, and flip bits to 1 to mark their positions as “stamped.”
By working with bits instead of bytes, this data structure is super compact. We can stuff information about eight pieces into the space of a single byte, the size of a bool. The tradeoff is that accessing values becomes a little more tricky. The smallest unit of memory that computers can address is the byte, so to get to our bits, we have to do some bitwise manipulation:
    // A Bitfield represents the pieces that a peer has
    type Bitfield []byte

    // HasPiece tells if a bitfield has a particular index set
    func (bf Bitfield) HasPiece(index int) bool {
        byteIndex := index / 8
        offset := index % 8
        return bf[byteIndex]>>(7-offset)&1 != 0
    }

    // SetPiece sets a bit in the bitfield
    func (bf Bitfield) SetPiece(index int) {
        byteIndex := index / 8
        offset := index % 8
        bf[byteIndex] |= 1 << (7 - offset)
    }
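As a quick worked example, here is a small self-contained sketch that runs these two methods on a concrete two-byte bitfield. The sample bytes, the pieces helper, and the bounds checks are my own additions for illustration; bits are counted from the most significant bit of each byte, so piece 0 is the leftmost bit of the first byte.

```go
package main

import "fmt"

// Bitfield represents the pieces that a peer has.
type Bitfield []byte

// HasPiece reports whether the bit for index is set.
// The bounds check is an assumption added for this sketch.
func (bf Bitfield) HasPiece(index int) bool {
	byteIndex := index / 8
	offset := index % 8
	if index < 0 || byteIndex >= len(bf) {
		return false
	}
	return bf[byteIndex]>>uint(7-offset)&1 != 0
}

// SetPiece sets the bit for index, ignoring out-of-range indexes.
func (bf Bitfield) SetPiece(index int) {
	byteIndex := index / 8
	offset := index % 8
	if index < 0 || byteIndex >= len(bf) {
		return
	}
	bf[byteIndex] |= 1 << uint(7-offset)
}

// pieces is a hypothetical helper listing every index whose bit is set.
func pieces(bf Bitfield) []int {
	var out []int
	for i := 0; i < len(bf)*8; i++ {
		if bf.HasPiece(i) {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	bf := Bitfield{0b01010100, 0b01010100}
	fmt.Println(pieces(bf)) // [1 3 5 9 11 13]

	bf.SetPiece(4)
	fmt.Printf("%08b\n", bf[0]) // 01011100
}
```

Because a Bitfield is a slice, SetPiece can use a value receiver and still modify the underlying bytes that the caller sees.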