System Design - Data Engineer - FAANG
Anonymous User

Design a system to identify companies that are stealing TechCrunch's articles and republishing them on their own websites. Cover the logical architecture, technology choices, and the performance and efficiency of the system.

Input: a file with the home page URLs of the suspected companies.

Output: Your program should crawl the given websites, find pages that resemble pages hosted on TechCrunch and generate an overall score that indicates the degree of similarity.
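
One possible way to make the "degree of similarity" concrete is sketched below. The question does not prescribe a method, so this is only an assumption: each page's text is broken into k-word shingles, pages are compared with Jaccard similarity, and a site's overall score is the fraction of its crawled pages whose best match against any TechCrunch page reaches a threshold. The names `shingles`, `jaccard`, and `site_score`, and the defaults `k=5` and `threshold=0.5`, are hypothetical. Crawling and HTML-to-text extraction are left out; the functions take already-extracted text.

```python
import re


def shingles(text, k=5):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Short text: treat the whole thing as one shingle (or none if empty).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two shingle sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def site_score(suspect_pages, reference_pages, k=5, threshold=0.5):
    """Score one suspect site against the reference (TechCrunch) pages.

    Returns (overall, best), where best[i] is the highest Jaccard similarity
    of suspect page i against any reference page, and overall is the fraction
    of suspect pages whose best match is at least threshold.
    """
    ref_sets = [shingles(text, k) for text in reference_pages]
    best = []
    for page in suspect_pages:
        page_set = shingles(page, k)
        best.append(max((jaccard(page_set, r) for r in ref_sets), default=0.0))
    flagged = sum(1 for b in best if b >= threshold)
    overall = flagged / len(best) if best else 0.0
    return overall, best


if __name__ == "__main__":
    reference = ["Startup raises a large funding round to build new developer tools"]
    suspect = [
        "Startup raises a large funding round to build new developer tools",
        "Our company picnic was held last weekend at the city park downtown",
    ]
    overall, per_page = site_score(suspect, reference, k=3)
    print(overall, per_page)
```

This version compares every suspect page with every reference page, which is quadratic and only reasonable for small inputs. At crawl scale, an approximate scheme such as MinHash signatures with locality-sensitive hashing is a common substitute for the exact Jaccard step, and the scoring function could stay the same.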
