There's a very large website that we want to crawl. This site is organized like Wikipedia, with every page linking to multiple other pages. Please design a system that:
The interviewer stressed on the fact that you have 100 machines available, you need to design this system using those.
Upon further clarification, I was told that we just want to visit the pages (and do nothing else?). Visiting a URL exactly once was very important. Assume the website has tens of millions of pages. We don't have to visit external links.
The third bullet point didn't make sense to me, so I asked to clarify. He said it's not important (??).