I had a system design interview at Atlassian; the target level was P50.
The initial task was something like a Tagging System (in my words): multiple services want to store tags, e.g. Confluence for pages, Jira for tickets, etc., with CRUD plus things like listing all pages per tag, dashboards, etc. Similar to this
The interviewer started by asking about endpoint design and during the interview highlighted that they were not interested in the whole system design. The main topics were API endpoints, pagination, and, in the remaining time (not much), databases.
The question that took the most time was about the PUT endpoint that updates the tags for a page or ticket (contentId):
PUT /content/{contentId}/tags
{tags:[]}
The interviewer asked me to consider a huge number of tags, hundreds of thousands and more. I replied that this does not make much sense and that it is better to have a restriction such as a maximum number of tags per content, but the interviewer insisted on considering a huge number of tags. What do you think are the proper solutions for this case?
My thoughts (happy to hear feedback):
a. Using the simple PUT method:
b. Another approach that was discussed leans towards the candidate's expectations and aligns more naturally with handling large data – the idea of splitting data into manageable batches or chunks. The primary concept here involves breaking down substantial data on the client side and then transmitting it to the server. Afterward, the data can be reassembled on a storage system like S3 and processed piece by piece asynchronously using messaging queues, taking advantage of bulk update capabilities if needed.
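To make option (b) concrete, here is a minimal sketch of the chunking idea, not what was written in the interview. The names `split_into_chunks` and `ChunkedTagUpload` are hypothetical, and a plain in-memory dict stands in for S3 and the queue-driven processing; a real service would persist chunks and apply them asynchronously.

```python
from typing import Dict, Iterator, List, Set


def split_into_chunks(tags: List[str], chunk_size: int) -> Iterator[List[str]]:
    """Client side: yield consecutive slices of at most chunk_size tags."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(tags), chunk_size):
        yield tags[start:start + chunk_size]


class ChunkedTagUpload:
    """Server side: collects the chunks of one upload (in-memory stand-in for S3)."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.chunks: Dict[int, List[str]] = {}

    def put_chunk(self, index: int, chunk: List[str]) -> None:
        # Chunks are keyed by index, so a retried chunk overwrites itself
        # instead of being stored twice.
        if not 0 <= index < self.total_chunks:
            raise IndexError("chunk index out of range")
        self.chunks[index] = chunk

    def is_complete(self) -> bool:
        return len(self.chunks) == self.total_chunks

    def assemble(self) -> Set[str]:
        """Reassemble the full tag set once every chunk has arrived."""
        if not self.is_complete():
            raise RuntimeError("upload is incomplete")
        result: Set[str] = set()
        for index in range(self.total_chunks):
            result.update(self.chunks[index])
        return result
```

Keying chunks by index is the design choice that matters here: the client can retry any chunk safely, and the final replace of the tag set only happens after the upload is complete.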
Remaining part of the interview:
It was a bit confusing because the requirement above about hundreds of thousands of tags should have carried over into the other parts. Still, we had some good discussions: I gave a comprehensive explanation of pagination, covering both offset and cursor options, with trade-offs and illustrative examples. This discussion covered querying pages by tag and considerations for the dashboard.
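The offset versus cursor trade-off can be sketched as below. This is an illustration under assumptions, not the exact example from the interview: `offset_page` and `cursor_page` are hypothetical helpers, and a sorted list of content ids stands in for the "pages per tag" query.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def offset_page(ids: List[int], offset: int, limit: int) -> List[int]:
    """Offset pagination: skip `offset` rows and return the next `limit`."""
    return ids[offset:offset + limit]


def cursor_page(
    ids: List[int], after: Optional[int], limit: int
) -> Tuple[List[int], Optional[int]]:
    """Cursor pagination over ids sorted ascending: return rows with id > after."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = 0 if after is None else bisect_right(ids, after)
    page = ids[start:start + limit]
    # The next cursor is the last id returned, or None when nothing is left.
    next_cursor = page[-1] if start + limit < len(ids) else None
    return page, next_cursor
```

If a row is inserted before the current position between two requests, the offset version shifts and repeats a row, while the cursor version continues from the last id it returned. The cost is that a cursor cannot jump to an arbitrary page number.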
I shared my insights on the database structure, involving the segmentation of tags, statistical pages per tag, and a fast tag-per-pages mechanism. These recommendations were accompanied by strategies like caching, sharding, consistent hashing, and replication, all of which I justified in response to the interviewer's inquiries.
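For the sharding part, a consistent hashing ring could look roughly like this. It is a sketch under assumptions: the `HashRing` class, the shard names, and the choice of SHA-256 with 100 virtual nodes per shard are all illustrative and were not specified in the interview.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        # Each physical node is placed on the ring `vnodes` times to even out load.
        self._ring: List[Tuple[int, str]] = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property worth pointing out in an interview: when a shard is added, the only keys that change owner are the ones that move to the new shard, so most tag data stays where it is.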
Additionally, we explored non-functional requirements, touching on critical aspects such as performance, availability, scalability, and durability. These considerations were central to our conversation. Furthermore, we addressed topics related to logging and monitoring during our discussion.
The interviewer set small traps, such as insisting that POST and PUT can be used interchangeably, which I did not agree with, arguing that PUT should be idempotent, etc.
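The idempotence argument can be shown with a small sketch. The functions `put_tags` and `post_tag` are hypothetical handlers working on a plain dict, assuming PUT replaces the whole tag set for a contentId and POST creates one new entry per call.

```python
from typing import Dict, List


def put_tags(store: Dict[str, List[str]], content_id: str, tags: List[str]) -> None:
    """PUT /content/{contentId}/tags: replace the full tag set.

    Repeating the same request leaves the store in the same state.
    """
    store[content_id] = sorted(set(tags))


def post_tag(store: Dict[str, List[str]], content_id: str, tag: str) -> int:
    """POST-style create: append a new entry and return its position as an id.

    Repeating the same request creates another entry each time.
    """
    entries = store.setdefault(content_id, [])
    entries.append(tag)
    return len(entries) - 1
```

A client that retries the PUT after a timeout ends up with the same tags either way, while a retried POST leaves a duplicate unless the server adds its own deduplication.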