*Don't forget to upvote if you found it helpful :)*

The interviewer shared the question below over Google Docs and wanted me to read it, then explain and type my solution there.

```
Botnet Crawler

Background
You’re a hacker who is currently controlling 10k hosts using your own Trojan software. You have full control over each host.
Unfortunately, as a poor hacker, you have a free Raspberry Pi 1 Model A as the only centralized master for controlling these hosts. Its tech spec is listed below:
CPU: 700 MHz;                                    Memory: 128 MB;
Network: 5 MB/s for both upload and download;    Disk: Not available;
Each device in the botnet will send a heartbeat every few seconds to your Raspberry Pi if it’s still online. Inside your controller, you can easily know all available hosts in the form of a list of IP addresses and ports. You can execute any command on each host through SSH.

Problem Statement
You want to crawl wikipedia.com with roughly 1 billion pages in total.
The raw HTML size per page is 100 KB on average.
You only need to crawl and store raw HTML pages locally (local hard drive). Please be aware that the available disk per host is roughly 40 GB since most of the space should be left for users.
Machines can be turned on and turned off at any time. Failure rate: roughly 1 host will become unavailable per second. Most of the hosts (95%) will be back within 10 minutes.

Precautions
The 10k hosts in the botnet are the only hardware resources you have. No access to anything which is NOT free (e.g. AWS, Azure, Google Cloud).
Introducing open source tooling which would significantly simplify the design (e.g. HDFS / Spark / Kafka) is strongly discouraged since this is against the purpose of this interview.

Clarifications
You’re guaranteed to have at least 10k hosts at any given time.
No need to worry about page refreshing/updating, you just need to crawl all pages once.
No need to worry about pages outside the domain of wikipedia.com.
The only seed page you have to start with is wikipedia.com.
Except for the max hard drive space available, which we’ve predefined, you want to use as few hardware resources (CPU, memory, network) as possible since most of the resources should be left for users.
The tech spec for each host will be similar to the hosts we use on a daily basis. For your reference, the best-selling PC of 2020 on Amazon: AMD Ryzen 3 3200U Dual Core (3.5GHz); 4GB DDR4 Memory; 128GB PCIe NVMe SSD.

P0 Constraints
Balance the workload between all hosts. All hosts should have relatively similar resource usage in terms of CPU, memory, network and hard drive IOPS.
Strictly avoid duplicated crawling.
Be able to handle the high hardware failure rate described in the problem statement.

P1 Constraints
Tight engineering schedule. After finishing the design, the prototype (MVP) should be implemented within 2 months with 1 engineer.
Minimize the total time to finish the entire crawling job.

Goal
Design a software system which can crawl and store all pages under wikipedia.com.
The software system should fit all the constraints.
```

My full onsite interview:
https://leetcode.com/discuss/interview-question/1162056/facebook-onsite-e4-usa/916742

My phone interviews:
https://leetcode.com/discuss/interview-question/1162082/facebook-or-phone-interview-or-E4-or-USA
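
For anyone working through the question, a rough back-of-envelope check of the numbers it gives may help. This is only a sketch, not part of what the interviewer shared: it assumes decimal units (1 KB = 1000 bytes) and a perfectly even split of pages across hosts, and all variable names are made up for illustration.

```python
# Back-of-envelope sizing from the numbers stated in the question.
# Assumptions (not from the question): decimal units, perfectly even split.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12

PAGES = 1_000_000_000        # roughly 1 billion pages
PAGE_SIZE = 100 * KB         # average raw HTML size per page
HOSTS = 10_000               # guaranteed minimum number of hosts
DISK_PER_HOST = 40 * GB      # usable disk per host
MASTER_BANDWIDTH = 5 * MB    # bytes per second on the Raspberry Pi master
FAILURES_PER_SECOND = 1      # hosts becoming unavailable per second

total_bytes = PAGES * PAGE_SIZE
pages_per_host = PAGES // HOSTS
bytes_per_host = total_bytes // HOSTS
disk_fraction = bytes_per_host / DISK_PER_HOST

# Time needed if every page had to pass through the master's network link.
master_relay_days = total_bytes / MASTER_BANDWIDTH / 86_400

# Hosts dropping out during one 10-minute recovery window.
failures_per_10_min = FAILURES_PER_SECOND * 600

print(f"total corpus:          {total_bytes / TB:.0f} TB")
print(f"pages per host:        {pages_per_host:,}")
print(f"data per host:         {bytes_per_host / GB:.0f} GB "
      f"({disk_fraction:.0%} of the disk budget)")
print(f"relay through master:  {master_relay_days:.0f} days")
print(f"failures per 10 min:   {failures_per_10_min} hosts "
      f"({failures_per_10_min / HOSTS:.0%} of the fleet)")
```

Under these assumptions the whole corpus is about 100 TB, or about 10 GB per host, which fits within the 40 GB budget, while pushing the HTML itself through the master's 5 MB/s link would take months.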
