I was asked this problem in an interview at Rubrik, and I would like to get everyone's thoughts on it.
Question : You have to transfer large files (e.g. each file is 40 GB in size).
All the files are on disk, and you have to transfer them to another system with a given IP address.
The network bandwidth is 10 Gbps (the network interface card can support up to 10 Gbps in total, counting both data sent and data received).
What would be the best possible way to transfer the files?
My Response :
Whenever we have to transfer large files, the best way is to send each big file in parts.
We can have a program that splits the original file into parts, and we send the parts.
At the receiver's end, another program reassembles the parts into a single file.
If the transfer of any part fails, we only have to resend that part, not the entire file.
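The part-and-retry idea above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: `plan_chunks`, `send_file_in_parts`, `make_receiver`, and the `send_part(offset, data, digest)` callback are hypothetical names introduced here, the per-part SHA-256 checksum is my own assumption about how a failed part would be detected, and the "network" is just a function call.

```python
import hashlib
import os


def plan_chunks(file_size, chunk_size):
    """Return (offset, length) pairs that cover the whole file."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def send_file_in_parts(src_path, send_part, chunk_size, max_attempts=3):
    """Send each part in order, retrying only the part that failed.

    send_part is a hypothetical callback (offset, data, digest) that raises
    OSError on failure. Returns the total number of send attempts made.
    """
    attempts = 0
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src:
        for offset, length in plan_chunks(size, chunk_size):
            src.seek(offset)
            data = src.read(length)
            digest = hashlib.sha256(data).hexdigest()
            for _ in range(max_attempts):
                attempts += 1
                try:
                    send_part(offset, data, digest)
                    break  # this part arrived, move on to the next one
                except OSError:
                    continue  # resend only this part
            else:
                raise OSError(f"part at offset {offset} failed {max_attempts} times")
    return attempts


def make_receiver(dst_path, total_size):
    """Receiver side: preallocate the file, then write each part at its offset."""
    with open(dst_path, "wb") as f:
        f.truncate(total_size)

    def receive_part(offset, data, digest):
        # Reject a corrupted part so the sender retries it.
        if hashlib.sha256(data).hexdigest() != digest:
            raise OSError("checksum mismatch")
        with open(dst_path, "r+b") as f:
            f.seek(offset)
            f.write(data)

    return receive_part
```

Because every part carries its own offset, the receiver can write parts in any order, so reassembly does not need a separate pass over the data.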
We can also spawn multiple threads to transfer several file parts in parallel.
Alternatively, we can use a ThreadPool with a message queue: the program that generates the parts from the original file adds each part-transfer job to the queue, and the worker threads pick the jobs up.
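A rough sketch of the thread pool variant, using Python's `concurrent.futures.ThreadPoolExecutor` (which keeps its own internal work queue). `transfer_parallel` and the `send_part(offset, length)` callback are hypothetical names; I am assuming each worker reads its own byte range from disk and sends it, so memory use is bounded by the number of workers rather than by the file size.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def transfer_parallel(jobs, send_part, workers=4):
    """Run send_part(offset, length) for every job on a pool of threads.

    jobs is an iterable of (offset, length) pairs. Returns the sorted list
    of jobs whose transfer raised, so only those need to be queued again.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submitting a job places it on the executor's internal queue.
        futures = {pool.submit(send_part, off, length): (off, length)
                   for off, length in jobs}
        for fut in as_completed(futures):
            if fut.exception() is not None:
                failed.append(futures[fut])
    return sorted(failed)
```

The returned list can be fed straight back into `transfer_parallel` to retry just the failed parts.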
Initial thought:
At first I thought that using multiple threads would speed up the file transfer.
But after discussing it with the interviewer, I realized that we can never send and receive more than 10 Gbps in total.
So merely creating more threads will not lead to faster file transfers, because the total throughput is capped by the link.
For example, in a download manager, if we start downloading one video from YouTube, the speed will be, say, X Mbps.
If I then start 3 more video downloads, the speed of each drops to about X/4 Mbps, since 4 downloads are now sharing the same bandwidth.
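The same point as back-of-the-envelope arithmetic. This is a sketch under several assumptions that are mine, not the interviewer's: the link speed is 10 gigabits per second, 1 GB is 10^9 bytes, there is no protocol overhead, the disk is never the bottleneck, and parallel streams share the link evenly.

```python
def min_transfer_seconds(file_bytes, link_bits_per_second):
    """Lower bound on transfer time when the link is the only bottleneck."""
    return file_bytes * 8 / link_bits_per_second


def per_stream_rate(link_bits_per_second, streams):
    """Rate each stream gets if the link is shared evenly."""
    return link_bits_per_second / streams


GB = 10**9
LINK = 10 * 10**9  # assumed: 10 gigabits per second

# One 40 GB file with the whole link to itself: 32.0 seconds.
one_file = min_transfer_seconds(40 * GB, LINK)

# Four files sent one after another: 4 * 32.0 = 128.0 seconds.
sequential_total = 4 * one_file

# Four files sent in parallel: each stream gets a quarter of the link,
# so every file takes 128.0 seconds and the total is unchanged.
parallel_total = min_transfer_seconds(40 * GB, per_stream_rate(LINK, 4))
```

Under these assumptions, parallelism changes when each file finishes but not when the last one does.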
Edit :
I did mention file compression as a way to reduce the amount of data sent, and the interviewer agreed.
I would like to hear from others, or to get pointers to a resource that explains this problem and its solution in detail.
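A small sketch of per-part compression with Python's standard `zlib` module. `encode_part` and `decode_part` are hypothetical helper names, and the rule "send the compressed bytes only when they are actually smaller" is my own assumption, since already-compressed data such as video usually does not shrink further.

```python
import zlib


def encode_part(data, level=6):
    """Return (is_compressed, payload), compressing only when it saves bytes."""
    packed = zlib.compress(data, level)
    if len(packed) < len(data):
        return True, packed
    return False, data  # incompressible data goes out as-is


def decode_part(is_compressed, payload):
    """Receiver side: undo encode_part."""
    return zlib.decompress(payload) if is_compressed else payload
```

Compression trades CPU time for bandwidth, so it only helps when the link, rather than the CPU or the disk, is the bottleneck.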