Tracing the Original Seed
The research project made some astounding findings. Perhaps the most surprising statistic is that only around 100 users (which could be individuals or groups) account for a massive 66 per cent of the original seeds, and these seeds then went on to account for 75 per cent of the downloads in the researchers' data-set. So, while plenty of other BitTorrent users might seed the file after they've downloaded it, the original torrent only comes from a very select group of users. But how do we know that these users are, in fact, the source of the original seed?
'We use the RSS feed,' explains Kryczka, 'so we make the connection a few seconds after it's published, and then we use a tracker. Trackers tell us how many seeders and leechers there are, but not who is a seeder and who is a leecher. If we get a response from the tracker that says the torrent has one seeder and zero leechers, we can immediately get the IP address of the seeder. We can assume that the seeder that's found just a few seconds after the torrent's birth is the initial one.'
Using the Pirate Bay RSS feed, the researchers were able to identify the original seeders within seconds of a torrent going live
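The decision rule Kryczka describes can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the TrackerSnapshot structure, the initial_seeder function and the 30-second cut-off are all assumptions made for the example (the article says only 'a few seconds'), and the IP addresses come from ranges reserved for documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackerSnapshot:
    """One tracker response for a torrent (hypothetical structure)."""
    seconds_since_publish: float  # time since the torrent appeared in the RSS feed
    seeders: int                  # seeder count reported by the tracker
    leechers: int                 # leecher count reported by the tracker
    peer_ips: List[str]           # peer addresses returned; roles are not labelled


def initial_seeder(snapshot: TrackerSnapshot,
                   max_age_seconds: float = 30.0) -> Optional[str]:
    """Return the presumed initial seeder's IP, or None if the case is ambiguous.

    max_age_seconds is an assumed cut-off, not a figure from the research.
    """
    if snapshot.seconds_since_publish > max_age_seconds:
        return None  # too late: the first seeder may no longer be the only one
    if (snapshot.seeders == 1 and snapshot.leechers == 0
            and len(snapshot.peer_ips) == 1):
        return snapshot.peer_ips[0]  # the only peer must be the only seeder
    return None  # other peers present: the counts alone cannot pick out the seeder


fresh = TrackerSnapshot(5.0, 1, 0, ["192.0.2.10"])
print(initial_seeder(fresh))  # 192.0.2.10
```

Returning None rather than guessing keeps the ambiguous cases separate, so they can be handled by a second check.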
That logic may seem fair enough in those circumstances, but it's not always that simple. What happens, for example, if the tracker doesn't just list the one seeder?
'If we got a response where there was one seeder and a small number of peers,' says Kryczka, 'then we tried to contact each of them to get their bitfield - the percentage of the content they have - enabling us to find the seeder.'
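In the BitTorrent peer protocol, the bitfield message is a bitmap with one bit per piece of the content, with the first piece stored in the highest bit of the first byte. The share of content a peer holds can be worked out from it, and a seeder is simply a peer with every bit set. The Python sketch below shows that check under stated assumptions: the function names and sample data are invented for illustration, and it presumes the bitfields have already been collected from each peer.

```python
from typing import Dict, Optional


def pieces_held(bitfield: bytes, num_pieces: int) -> int:
    """Count how many of the first num_pieces bits are set.

    Piece 0 is the most significant bit of the first byte; any spare
    bits at the end of the last byte are ignored.
    """
    if len(bitfield) < (num_pieces + 7) // 8:
        raise ValueError("bitfield is too short for this torrent")
    held = 0
    for index in range(num_pieces):
        if bitfield[index // 8] & (0x80 >> (index % 8)):
            held += 1
    return held


def find_seeder(bitfields: Dict[str, bytes], num_pieces: int) -> Optional[str]:
    """Return the one peer holding every piece, or None if there is not exactly one."""
    complete = [ip for ip, bits in bitfields.items()
                if pieces_held(bits, num_pieces) == num_pieces]
    return complete[0] if len(complete) == 1 else None


# Hypothetical swarm for a 10-piece torrent (documentation-range addresses).
peers = {
    "192.0.2.10": b"\xff\xc0",  # all 10 pieces: the seeder
    "192.0.2.11": b"\xf0\x00",  # 4 of 10 pieces: a leecher
    "192.0.2.12": b"\x00\x00",  # nothing yet: a leecher
}
print(find_seeder(peers, 10))  # 192.0.2.10
```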
In addition to tracking the IP addresses of the original seeders, the researchers also tracked usernames, which complicated matters further. Usernames are required in order to publish content links on Pirate Bay, and Kryczka says the research involved trying to match the usernames to the IP addresses.
'In some cases, it's a clear correspondence where one username corresponds with one IP address,' says Kryczka. 'In other cases, one username corresponds with a few IP addresses from hosting providers, in order to disperse content more effectively from several places. There are also cases where one username corresponds with a few IP addresses from the same ISP, which is known as the NAT [network address translation] effect. Finally, there are also cases where lots of usernames correspond with one IP address, and this is mainly an example of fake users. Pirate Bay removes fake accounts, so "fake" publishers create new accounts using a random string and start publishing again from the same IP address.'
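The four patterns Kryczka lists can be expressed as a simple grouping exercise. The Python sketch below is an illustration rather than the researchers' method: the Sighting record, the label names and the min_shared_users threshold are assumptions, and it presumes each IP address has already been tagged with its network owner and whether that owner is a hosting provider.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Sighting(NamedTuple):
    """One observed (username, IP) pairing; a hypothetical record layout."""
    username: str
    ip: str
    network: str      # owner of the IP range, e.g. an ISP or hosting company
    is_hosting: bool  # True if that owner is a hosting provider


def classify_publishers(sightings: Iterable[Sighting],
                        min_shared_users: int = 2) -> Dict[str, str]:
    """Label each username with one of the correspondence patterns."""
    ips_by_user = defaultdict(set)
    users_by_ip = defaultdict(set)
    network_by_ip = {}
    for s in sightings:
        ips_by_user[s.username].add(s.ip)
        users_by_ip[s.ip].add(s.username)
        network_by_ip[s.ip] = (s.network, s.is_hosting)

    labels = {}
    for user, ips in ips_by_user.items():
        networks = {network_by_ip[ip] for ip in ips}
        if any(len(users_by_ip[ip]) >= min_shared_users for ip in ips):
            labels[user] = "shared-ip"     # many usernames, one IP: possible fake accounts
        elif len(ips) == 1:
            labels[user] = "one-to-one"    # one username, one IP
        elif all(is_hosting for _, is_hosting in networks):
            labels[user] = "hosting"       # several IPs at hosting providers
        elif len(networks) == 1:
            labels[user] = "same-isp"      # several IPs at one ISP
        else:
            labels[user] = "unclassified"  # a mix not covered by the four patterns
    return labels


sample = [
    Sighting("alice", "192.0.2.1", "ISP-A", False),
    Sighting("bob", "198.51.100.1", "HostCo", True),
    Sighting("bob", "203.0.113.1", "HostNet", True),
    Sighting("carol", "192.0.2.50", "ISP-B", False),
    Sighting("carol", "192.0.2.51", "ISP-B", False),
    Sighting("x7f3q", "203.0.113.99", "HostCo", True),
    Sighting("k2m9z", "203.0.113.99", "HostCo", True),
]
print(classify_publishers(sample))
```

The shared-IP test runs first so that a throwaway account is flagged even when it has only ever been seen at a single address.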