Please comment on anything you disagree with, anything you like about my design, and what you would change about it. As you can probably tell from my design, I am completely new to system design questions.
- Separate tables for users: one to see who they follow and one for their images
- We need an efficient way to get photos for the timeline
- We prioritize fast reads over fast writes to the database
- Lots and lots of data
My idea:
- When it comes to CAP, I will choose AP (availability and partition tolerance). My reasoning is that it is more important for a photo/video to be saved to a database than for every user to immediately see every recently added photo.
- Use eventual consistency to keep the databases synchronized.
- Use a NoSQL database where the key is the key of the user account; each entry holds the keys of all of that user's followers and the keys of all of their photos
- Use a SQL database to store photos, indexed on the ID
- Maintaining the index will slow down photo uploads but allows for fast retrieval
- Shard the SQL databases based on a generated key for the user’s account that can never change (different from their username).
- We will organize the accounts on the sharded databases so that the first X characters of the account ID tell us which database the user’s photos/videos are in.
- When a user uploads a photo:
- We instantly add it to a cache, because we want the newest photos to appear in people’s photo feeds and we want those feeds to load quickly
- Instead of having the server wait for the database write to finish, we will use asynchronous processing (i.e. a message queue) to free up the server to perform other tasks.
- We need to make sure photos are never lost, which means we need a very reliable database setup. We do this by using master-slave replication.
- Use a load balancer that will distribute traffic to servers
- When accessing a sharded database:
- The load balancer will determine which sharded database to ask for the desired photo/video
- Use a CDN that will utilize caching. The cache will use some sort of weighted heuristic to determine when to remove items. This heuristic should be based on how many followers the person has near the CDN and how long the photo/video has been in the cache.
- If a queried photo is not in the cache, the CDN will go to a database to find the item, then add it back to the cache.
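
To make the sharding bullets more concrete, here is a rough sketch of how I picture the lookup from account ID to database. The shard count, the prefix length standing in for "X", and the choice of a hex digest as the permanent account ID are all my own assumptions for illustration; `make_account_id` and `shard_for` are made-up names, not part of any library.

```python
import hashlib

NUM_SHARDS = 16   # assumed number of sharded databases
PREFIX_LEN = 4    # assumed value of "X": how many leading ID characters we inspect


def make_account_id(seed: str) -> str:
    """Hypothetical permanent account ID: a hex digest, so prefixes spread evenly."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()


def shard_for(account_id: str) -> int:
    """Map the first PREFIX_LEN hex characters of the ID to a shard number."""
    return int(account_id[:PREFIX_LEN], 16) % NUM_SHARDS


account_id = make_account_id("signup-record-for-some-user")
print(shard_for(account_id))  # the same ID always routes to the same shard
```

Because the ID never changes, a user's photos never have to move between databases when they rename their account. One thing I am unsure about is what happens when the number of shards changes, since a plain modulo would reshuffle most users.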
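
Here is a minimal sketch of the upload path I described: put the photo in the feed cache right away, hand the database write to a queue, and let a background worker do the write. The dictionaries stand in for a real cache and a real database, and `queue.Queue` stands in for a real message queue; unlike a real one, it is not durable, so this only shows the control flow. All function names are hypothetical.

```python
import queue
import threading

feed_cache = {}    # stand-in for the cache: user ID -> photo IDs, newest first
photo_store = {}   # stand-in for the photo database: photo ID -> bytes
write_queue = queue.Queue()  # stand-in for a message queue (in-memory, not durable)


def upload_photo(user_id, photo_id, data):
    """Handle an upload without waiting for the database write."""
    feed_cache.setdefault(user_id, []).insert(0, photo_id)  # visible in feeds at once
    write_queue.put((photo_id, data))                        # persisted later by the worker


def writer_loop():
    """Background worker that drains the queue into the photo store."""
    while True:
        item = write_queue.get()
        if item is None:              # sentinel value: stop the worker
            write_queue.task_done()
            break
        photo_id, data = item
        photo_store[photo_id] = data
        write_queue.task_done()


def start_writer():
    """Start the background writer thread and return it."""
    worker = threading.Thread(target=writer_loop, daemon=True)
    worker.start()
    return worker
```

A caller would run `start_writer()` once, call `upload_photo(...)` per request, and use `write_queue.join()` when it needs to know that all pending writes have landed.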
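
For the CDN cache, this is roughly what I mean by a weighted heuristic plus going to the database on a miss. The scoring formula (nearby followers count in favor of keeping an item, age counts against it) and the two weights are guesses on my part, and `WeightedCache` and `load_from_db` are hypothetical names. The sketch assumes a capacity of at least 1.

```python
import time


class WeightedCache:
    """Read-through cache that evicts the entry with the lowest score."""

    def __init__(self, capacity, load_from_db,
                 follower_weight=1.0, age_weight=0.01, clock=time.monotonic):
        self.capacity = capacity              # maximum number of cached items (>= 1)
        self.load_from_db = load_from_db      # called with a photo ID on a cache miss
        self.follower_weight = follower_weight
        self.age_weight = age_weight
        self.clock = clock                    # injectable so the age can be controlled
        self.entries = {}                     # photo ID -> (data, nearby_followers, cached_at)

    def _score(self, nearby_followers, cached_at):
        # Higher score means more worth keeping.
        age = self.clock() - cached_at
        return self.follower_weight * nearby_followers - self.age_weight * age

    def get(self, photo_id, nearby_followers):
        if photo_id in self.entries:
            return self.entries[photo_id][0]          # cache hit
        data = self.load_from_db(photo_id)            # miss: go to the database
        if len(self.entries) >= self.capacity:
            # Evict the cached item with the lowest score to make room.
            victim = min(self.entries,
                         key=lambda k: self._score(*self.entries[k][1:]))
            del self.entries[victim]
        self.entries[photo_id] = (data, nearby_followers, self.clock())
        return data
```

With these weights, a photo from someone with many nearby followers survives much longer than a photo from someone with few, which is the behavior I was going for; tuning the weights is something I have not thought through.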