Hi guys,
I have attempted a high-level system design (HLD) of Dropbox. Let me know your thoughts, suggestions, or any feedback for improvement.
Functional Requirements :
Non-functional Requirements :
Core Entities :
File
User
APIs :
POST : /api/v1/upload
GET : /api/v1/files/{fileId} --> for fetching the info of a file
GET : /api/v1/download?fileId=''
POST : /api/v1/files/share
{
"fileId" : "",
"userId" : ""
}
GET : /api/v1/users/{userId}/files?type='shared'
High level Design :

Flow :
--> The user comes to our platform and tries to upload a file by calling /upload
--> Uploading the file to our app server first, and then to S3, would cause spikes in
app server memory
--> Instead, the app server generates a pre-signed S3 URL and sends it to the client; the client uploads directly to S3 using the pre-signed URL
--> The pre-signed URL is only valid for a short period of time
--> Once the upload is complete, the client calls the app server to store the metadata
in the files DB
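
To make the pre-signed URL idea concrete, here is a minimal sketch of how a short-lived signed upload URL can be issued and checked. This is a toy HMAC model, not S3's actual signing scheme (in practice the AWS SDK would produce the URL); SECRET_KEY, STORAGE_HOST, and both function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only.
SECRET_KEY = b"demo-secret"                    # known to the signer and the storage side
STORAGE_HOST = "https://storage.example.com"   # stand-in for the object store


def _sign(method, path, expires_at):
    """HMAC over the method, object path, and expiry timestamp."""
    message = f"{method}\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def create_presigned_url(file_key, ttl_seconds=300, now=None):
    """App server side: issue an upload URL valid for ttl_seconds."""
    now = time.time() if now is None else now
    expires_at = int(now) + ttl_seconds
    path = f"/{file_key}"
    query = urlencode({
        "expires": expires_at,
        "signature": _sign("PUT", path, expires_at),
    })
    return f"{STORAGE_HOST}{path}?{query}"


def verify_presigned_url(url, method="PUT", now=None):
    """Storage side: accept only an unexpired URL with a matching signature."""
    now = time.time() if now is None else now
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires_at:
        return False  # the URL has expired
    expected = _sign(method, parsed.path, expires_at)
    return hmac.compare_digest(signature, expected)
```

Because the signature covers the method, the object path, and the expiry, the client cannot reuse the URL for another file, another operation, or after the TTL. The file bytes never pass through the app server.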
File download :


Deep Dives :

File Download :
-- Instead of fetching the S3 link for a file from the DB, we could fetch it from a cache like Redis
-- Additionally, a CDN can be used as well
-- What would be the cache expiration policy and TTL (need to consider the scale of the system) ?
-- LRU eviction policy
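
The caching points above can be sketched as a small in-process cache that combines a TTL with LRU eviction. This is only an illustration of the policy, not a replacement for Redis; the class name, capacity, and TTL values are assumptions and would need to be tuned to the real scale of the system.

```python
import time
from collections import OrderedDict


class LruTtlCache:
    """Maps fileId -> S3 link, with per-entry TTL and LRU eviction."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self.clock = clock               # injectable so expiry can be tested
        self._entries = OrderedDict()    # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None                  # cache miss: caller falls back to the DB
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._entries[key]       # expired entry counts as a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (value, self.clock() + self.ttl_seconds)
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
```

On a miss the caller would read the link from the files DB and call put, which is the usual cache-aside pattern. A shorter TTL keeps stale links out of the cache at the cost of more DB reads.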
File Sharing :
-- While fetching the list of files shared with a user, we JOIN two tables
(Files DB and Shared Files), which might be slow
-- Denormalize the data in Shared Files by keeping the fields that need to be shown
upfront on the UI to the user (like fileName)
-- If a user is interested in viewing the full info of a file, the user will call an API to fetch the
complete info of the file from the files DB
-- The files info table can be indexed on fileId to support faster access by file ID
-- Additionally, to support high read throughput, we could have separate DBs for reads and writes
for the files DB
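
The denormalization idea above can be sketched with an in-memory SQLite database. The table and column names here are hypothetical, chosen only to illustrate listing shared files without a JOIN and looking up full details by fileId.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    file_id    TEXT PRIMARY KEY,   -- primary key gives indexed lookup by file_id
    owner_id   TEXT NOT NULL,
    file_name  TEXT NOT NULL,
    size_bytes INTEGER NOT NULL,
    s3_key     TEXT NOT NULL
);
CREATE TABLE shared_files (
    file_id             TEXT NOT NULL,
    shared_with_user_id TEXT NOT NULL,
    file_name           TEXT NOT NULL,  -- denormalized copy for the list view
    PRIMARY KEY (shared_with_user_id, file_id)
);
""")


def share_file(file_id, user_id):
    """Copy the display fields into shared_files at share time."""
    row = conn.execute(
        "SELECT file_name FROM files WHERE file_id = ?", (file_id,)
    ).fetchone()
    if row is None:
        raise KeyError(file_id)
    conn.execute(
        "INSERT OR IGNORE INTO shared_files VALUES (?, ?, ?)",
        (file_id, user_id, row[0]),
    )
    conn.commit()


def list_shared_with(user_id):
    """List view: served from shared_files alone, no JOIN needed."""
    return conn.execute(
        "SELECT file_id, file_name FROM shared_files "
        "WHERE shared_with_user_id = ? ORDER BY file_id",
        (user_id,),
    ).fetchall()


def get_file_info(file_id):
    """Detail view: full record fetched from files by its primary key."""
    return conn.execute(
        "SELECT file_id, owner_id, file_name, size_bytes, s3_key "
        "FROM files WHERE file_id = ?",
        (file_id,),
    ).fetchone()
```

The trade-off is that any change to a copied field, such as renaming a file, has to update shared_files as well as files, otherwise the list view shows stale names.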