Downloading big files from Tor

Some time ago I had to download some large files from the website of one of the ransomware groups in order to analyze the published (stolen) data. The leak was huge: hundreds of gigabytes, split into many large files, all published on a website hosted in the Tor network.


The Tor network is ideal for anonymous communication and information exchange, but downloading files, torrents and other heavy material is inefficient due to the speed of the network itself. It's not designed to exchange huge files, and that is what the Tor Project developers have been saying since the beginning.

With the development of the Tor Project, speed and capabilities have increased; for example, watching YouTube videos or downloading smaller files is now possible and relatively fast compared to the early days many years ago.

Also, not every file you download using Tor Browser can resume the download after a lost connection, a new circuit, or an identity change.

If you visit http://ransomwr3tsydeii4q43vazm7wofla5ujdajquitomtd47cxjtfgwyyd.onion/ you can see a list of active ransomware group sites. Before the ransomware starts encrypting files, they are all exfiltrated to the ransomware operator's server; then all data in the attacked company is encrypted and the negotiation for the decryption key starts. Each group publishes the leaked data if the ransom is not paid.

Sometimes companies don’t know what exactly was leaked, and sometimes they don’t pay the ransom and download their own unencrypted data after the leak :) But every time, you need to analyze what was leaked and limit the further dangers of the company data being published.

So I had the task to download the leak, analyze it, assess the risk and suggest what other attacks and threats may arise from the disclosed data. Overall it’s a boring analytical task, but I was able to automate the process of downloading large files from the Tor network.

Ahh, before I describe the technical details, one more challenge: managers…

I like the surprise on the faces of managers who ask how long it will take to download 300 gigabytes of leaked data from Tor, and I say: with luck, about 20 weeks. Then the conversation starts: “no way”, “it has to be ASAP”, “the best answer is to have it for yesterday”, “tomorrow at the latest”, “in the worst case by the end of the week, during business hours”, etc. So you need to start explaining to non-technical people how the Tor network works :| Sometimes there are even ideas that if our computer is too slow or our network is weak, they can pay for the best connection and the fastest computer just to have the data by tomorrow (maybe you should have paid the ransom, you would have had the data back on your drives very fast, or invested in security before this sad situation occurred). Nobody understands that 300 GB of data at an average speed of 0.2 Mbit/s will take exactly 200,000 minutes, that is 138.9 days, 19.84 weeks or 4.56 months. How you present the stats depends on how stressed out the manager is and on his sense of humor. Never mind.
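If you want to reproduce that arithmetic for your own manager conversations, here is a quick back-of-the-envelope sketch. The 300 GB and 0.2 Mbit/s figures are the ones from this story, and 1 GB is taken as 8000 Mbit (decimal units):

```shell
# Rough download ETA: gigabytes * 8000 Mbit per GB / speed in Mbit/s
awk -v gb=300 -v mbit=0.2 'BEGIN {
    seconds = gb * 8 * 1000 / mbit      # 1 GB = 8000 Mbit (decimal)
    printf "%.0f minutes = %.1f days = %.2f weeks\n",
           seconds / 60, seconds / 86400, seconds / 604800
}'
# 200000 minutes = 138.9 days = 19.84 weeks
```

Swap in your own size and measured speed to get a number you can defend in the meeting.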


Ok, to sum up the steps from the technical part, because it’s Friday and I made it a little chaotic.

The solution checks every hour whether curl is still downloading files over Tor from the provided list, and everything runs in a background session. If the download is dead, it gets started again; files on the list are rotated and downloads resume on errors. Thanks to that you can forget about it and wait until all files are saved on the local disk. You can remove the error and status logging from the scripts; I added it because I was also testing other solutions for notifications. In general it’s not perfect, but it works. From time to time you can log in to the virtual machine and check that everything works: look at the network traffic, list the active sessions with sudo screen -ls and, if you want to see how curl is doing, bring the session back with sudo screen -r <ID>. To leave the session, press Ctrl-A and then d to detach the screen. That’s all, maybe it will be useful for someone.

I assume that you have knowledge of Linux and its basic commands.

Technical part

curl, screen and Tor are all you need. Install a virtual machine with Debian (no GUI needed, just a minimal version of Debian, or whatever distribution you like, but in my example it’s Debian; configure it with your user, sudo and so on).

Install curl, screen and tor:

sudo apt install curl tor screen

Create a folder where the files will be downloaded, and put there a text file with all the links you want to download, one URL per line.
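For example, something like this; the directory mirrors the path used later in this article (relative to your home), and the .onion URLs are placeholders, not real leak links:

```shell
# Create the download folder and the URL list (one link per line)
mkdir -p "$HOME/test/download"
cat > "$HOME/test/download/urls.txt" <<'EOF'
http://example2i7hvpbog.onion/leak/part-001.7z
http://example2i7hvpbog.onion/leak/part-002.7z
EOF
# Sanity check: count the URLs on the list
grep -c . "$HOME/test/download/urls.txt"
```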

The Tor configuration is minimal (sudo nano /etc/tor/torrc):

RunAsDaemon 1

Restart it:

sudo service tor restart

In your home directory, create a file and put there:

cd /home/user/test/download/
xargs -n 1 curl -x socks5h://127.0.0.1:9050 -L -O -k --retry 9999999 --retry-max-time 0 -C - < urls.txt

Change /home/user/test/download/ to the location of your download folder and change urls.txt to the file you created that contains the URLs. When executed, this file will start downloading the files listed in the text file, one by one, retrying on failure. Note that Tor’s SOCKS proxy listens on 127.0.0.1:9050 by default.

In detail: xargs builds and executes command lines from standard input; in our case it takes each line from the text file and passes it to curl. The -x parameter sets the proxy, and we are using SOCKS5 for Tor (the socks5h variant also resolves host names through the proxy, which is required for .onion addresses). -L tells curl to follow redirects. -O saves each downloaded file on disk under the same name as the file on the server. -k allows insecure connections; by default, every SSL connection curl makes is verified to be secure, and we don’t need that for a Tor connection. --retry sets the number of retries before giving up, --retry-max-time limits the total time allowed for retries (0 means no limit), and finally -C - tells curl to automatically find out where and how to resume the transfer; it uses the given output/input files to figure that out.
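You can see how xargs feeds curl without touching the network by prefixing the command with echo: each line of the list becomes one complete curl invocation. The URLs below are placeholders, and 127.0.0.1:9050 is Tor’s default SocksPort:

```shell
# Dry run: print the curl command xargs would execute for each URL
printf '%s\n' \
    http://example2i7hvpbog.onion/part-001.7z \
    http://example2i7hvpbog.onion/part-002.7z > urls.txt
xargs -n 1 echo curl -x socks5h://127.0.0.1:9050 -L -O -k -C - < urls.txt
```

This prints one "curl … <URL>" line per entry, confirming that the files are handled one by one rather than in a single batch.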

Then create another file and put there:

#!/bin/sh
# Check if xargs is running
if pgrep -x "xargs" > /dev/null
then
    echo "Running - no action - $(date)" >> /var/log/download.log
else
    # Not running
    echo "Not found - starting again - $(date)" >> /var/log/download.log
    # Kill all screen sessions
    killall screen >> /var/log/download.log 2>&1
    # Run new session with curl
    screen -dm sh /home/user/ >> /var/log/download.log
fi

This script checks whether a process named xargs is running. If it is, it appends the log entry Running - no action - <date of check> to the log file /var/log/download.log. If it is not running, an entry about the failure is added to the same log file, the dead screen session in the background is killed (also noted in the log file), and a new session is created in which our curl download script is executed.
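The pgrep check is easy to try in isolation. In this sketch, sleep stands in for xargs and the log line goes to stdout instead of /var/log/download.log:

```shell
# Start a long-running process, then run the same check the monitor script does
sleep 30 &
pid=$!
if pgrep -x "sleep" > /dev/null
then
    echo "Running - no action - $(date)"
else
    echo "Not found - starting again - $(date)"
fi
kill "$pid"
```

With the sleep alive, the "Running" branch fires; once no matching process exists, the script would take the restart branch instead. Note that -x matches the exact process name, so a partial name like "xarg" would not match.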

The last thing is to add the script to the crontab:

sudo crontab -e

then add:

0 * * * * /home/user/

to execute the script every hour.

Enjoy downloading big files from Tor.