Chat System Design Complete Guide
Anonymous User

Good to know:

  1. Chats we see on our phones are stored on the client (the device) itself.
  2. WhatsApp uses end-to-end encryption.

Are we building something like Facebook Messenger, which stores all messages permanently, or something like WhatsApp, which stores messages only until they are delivered? In the WhatsApp model, once we receive confirmation that a message was successfully received, it can be deleted from the system. We need to clarify this during requirement gathering, because the database for our message service depends on it. Deletes are not handled very efficiently in Cassandra (a delete writes a tombstone that is only purged later, during compaction), so if we are building something similar to WhatsApp we might decide to go with another database.
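To make the WhatsApp-style option concrete, here is a minimal Python sketch, assuming an in-memory dictionary stands in for the real message database. `PendingMessageStore` and its methods are hypothetical names. Every delivery acknowledgement becomes a delete, which is exactly the workload the chosen database has to handle well.

```python
from collections import defaultdict


class PendingMessageStore:
    """Hypothetical WhatsApp-style store: a message is kept only until the
    recipient confirms delivery, then it is deleted from the server."""

    def __init__(self):
        # recipient_id -> {message_id: payload}
        self._pending = defaultdict(dict)

    def save(self, recipient_id, message_id, payload):
        self._pending[recipient_id][message_id] = payload

    def undelivered(self, recipient_id):
        # Return a copy so callers cannot mutate the internal state.
        return dict(self._pending.get(recipient_id, {}))

    def ack(self, recipient_id, message_id):
        # Delivery confirmed: the server no longer needs its copy.
        # (A Messenger-style store would keep the message instead.)
        self._pending[recipient_id].pop(message_id, None)


store = PendingMessageStore()
store.save("bob", 1, "hi")
store.save("bob", 2, "are you there?")
store.ack("bob", 1)
print(store.undelivered("bob"))  # {2: 'are you there?'}
```

A Messenger-style store would simply skip the delete in `ack` and keep every message permanently.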

Number of chat servers needed:
Most commonly, the maximum number of connections is limited by the number of file descriptors available to the process. There are other limits too, such as the memory needed to handle each connection internally (how much memory does your process use per connection?).

When RAM and CPU are not the bottleneck, a single server can handle a very large number of concurrent connections.
Let's plan for 500 million connections at any time. Assuming a modern server can handle 50K concurrent connections, we would need 500M / 50K = 10K such servers.
If each connection takes up 1 MB of RAM, then 50,000 connections would take up about 50 GB of RAM. So you are bound by how much RAM a modern server has.
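The sizing arithmetic above can be captured in a small helper. The function name `chat_capacity` is hypothetical, and the inputs are just the assumptions from the text (500M connections, 50K connections per server, 1 MB per connection).

```python
import math


def chat_capacity(concurrent_connections, connections_per_server, ram_per_connection_mb):
    """Back-of-the-envelope sizing using the assumptions from the text."""
    servers = math.ceil(concurrent_connections / connections_per_server)
    # Decimal units (1 GB = 1000 MB), matching the rough estimate in the text.
    ram_per_server_gb = connections_per_server * ram_per_connection_mb / 1000
    return servers, ram_per_server_gb


servers, ram_gb = chat_capacity(500_000_000, 50_000, 1)
print(servers, ram_gb)  # 10000 50.0
```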

API Needed:

If groups are supported, joining and leaving a group could be two additional APIs.
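A minimal, hypothetical sketch of those two group APIs, assuming an in-memory dictionary of group memberships. `GroupService`, `join_group`, and `leave_group` are illustrative names, not part of any real service.

```python
class GroupService:
    """Hypothetical in-memory backing for two group APIs:
    join_group(group_id, user_id) and leave_group(group_id, user_id)."""

    def __init__(self):
        self._members = {}  # group_id -> set of user_ids

    def join_group(self, group_id, user_id):
        # Idempotent: joining twice still leaves a single membership.
        self._members.setdefault(group_id, set()).add(user_id)

    def leave_group(self, group_id, user_id):
        # Returns False if the user was not a member of the group.
        members = self._members.get(group_id)
        if members is None or user_id not in members:
            return False
        members.remove(user_id)
        return True

    def members(self, group_id):
        return sorted(self._members.get(group_id, set()))


groups = GroupService()
groups.join_group("g1", "alice")
groups.join_group("g1", "bob")
print(groups.leave_group("g1", "alice"))  # True
print(groups.members("g1"))               # ['bob']
```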

Database Schemas:

Showing messages in sorted order: [the messageId carries the responsibility of ensuring the order of messages]
sort messages on the basis of their created timestamp (but two messages can have the same timestamp)
OR
store messages by a messageId that is sortable by time, meaning new rows have higher IDs than old ones (IDs must also be unique). Ways to generate such IDs:
1. auto-increment, as in SQL; NoSQL databases usually do not provide such a feature
2. a 64-bit sequence generator like Snowflake
3. a local sequence generator (local IDs work because maintaining message order within one one-on-one channel or one group channel is sufficient)
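As a rough illustration of options 2 and 3, here is a Python sketch. The bit layout (41-bit millisecond timestamp, 10-bit machine id, 12-bit per-millisecond sequence) follows the commonly described Snowflake layout. The custom epoch value and the class names are assumptions for this example.

```python
import threading
import time
from collections import defaultdict

# Assumption: any fixed instant in the past can serve as the custom epoch.
CUSTOM_EPOCH_MS = 1_600_000_000_000


def _now_ms():
    return time.time_ns() // 1_000_000


class SnowflakeIdGenerator:
    """Snowflake-style 64-bit ID: 41-bit timestamp | 10-bit machine id | 12-bit sequence."""

    def __init__(self, machine_id):
        if not 0 <= machine_id < 1 << 10:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = _now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to reuse IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # wraps after 4096 IDs
                if self.sequence == 0:
                    # This millisecond is exhausted: spin until the next one.
                    while now <= self.last_ms:
                        now = _now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - CUSTOM_EPOCH_MS) << 22) | (self.machine_id << 12) | self.sequence


class LocalSequenceGenerator:
    """Per-channel counter: IDs only need to be ordered within one conversation."""

    def __init__(self):
        self._counters = defaultdict(int)
        self._lock = threading.Lock()

    def next_id(self, channel_id):
        with self._lock:
            self._counters[channel_id] += 1
            return self._counters[channel_id]


snowflake = SnowflakeIdGenerator(machine_id=7)
ids = [snowflake.next_id() for _ in range(5)]
print(ids == sorted(ids))  # True: later IDs are larger

local = LocalSequenceGenerator()
print(local.next_id("chat-1"), local.next_id("chat-1"), local.next_id("chat-2"))  # 1 2 1
```

The local generator is much simpler, but its IDs are only comparable within one channel, which is all a chat view needs.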

If one-to-one chat is the main priority (WhatsApp):
sharding on userId may help (all of a user's messages with other users can be fetched from that user's shard)
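A minimal sketch of routing by userId, assuming a fixed number of shards and simple hash-modulo placement (`NUM_SHARDS`, `shard_for`, and `shards_touched` are hypothetical). It also shows that a single one-to-one message can involve two different shards, one per participant.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(user_id: str) -> int:
    """Map a userId to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a digest is used to get the same answer on every server."""
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def shards_touched(sender_id: str, recipient_id: str) -> set:
    # With userId sharding, each participant's copy lives on that user's shard,
    # so one message may need writes on two different shards.
    return {shard_for(sender_id), shard_for(recipient_id)}


print(shard_for("alice") == shard_for("alice"))  # True: routing is deterministic
print(len(shards_touched("alice", "bob")))       # 1 or 2, depending on the hash
```

Modulo placement moves most keys when the shard count changes; consistent hashing is a common alternative when shards are added over time.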

#### Saving the message in both shards, or rolling back
A chat is between two parties.
Suppose user A sends a message to user B: A's copy of the message is saved in shard 1 (A's shard) and B's copy is saved in shard 2 (B's shard).

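Here the same data is stored in two places, so we need to keep the copies consistent.
An application-layer solution that makes the two writes all-or-nothing (best effort, not a true atomic transaction: a crash between the two writes can still leave the shards inconsistent):
try:
    write to A's shard
    write to B's shard
except:
    roll back whichever of the two writes already succeeded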
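The try/except above can be made concrete as follows. This is a minimal Python sketch with toy in-memory shards (`Shard` is a hypothetical class with a `fail_writes` flag to simulate an outage). It implements a compensating rollback, not a real distributed transaction.

```python
class Shard:
    """Toy shard holding message_id -> message; fail_writes simulates an outage."""

    def __init__(self, name, fail_writes=False):
        self.name = name
        self.rows = {}
        self.fail_writes = fail_writes

    def write(self, message_id, message):
        if self.fail_writes:
            raise OSError(f"write to {self.name} failed")
        self.rows[message_id] = message

    def delete(self, message_id):
        self.rows.pop(message_id, None)


def save_in_both_shards(shard_a, shard_b, message_id, message):
    """Write to A's shard, then B's; on failure undo whatever succeeded."""
    written = []
    try:
        for shard in (shard_a, shard_b):
            shard.write(message_id, message)
            written.append(shard)
    except OSError:
        # Compensating rollback: only undo writes that actually happened.
        for shard in written:
            shard.delete(message_id)
        raise  # let the caller retry or report the failure


a, b = Shard("shard-1"), Shard("shard-2", fail_writes=True)
try:
    save_in_both_shards(a, b, 42, "hi B")
except OSError as err:
    print(err)            # write to shard-2 failed
print(a.rows, b.rows)     # {} {}
```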

So now, we can use conversationId/userId as the partition key with a range (sort) key on the timestamp, so all messages of a conversation/user are stored sorted by time. (For one-to-one chat, the conversation key can be the userId.)
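A toy in-memory model of that layout, assuming conversation_id as the partition key and (timestamp, message_id) as the range key. In real stores this corresponds roughly to a Cassandra clustering column or a DynamoDB sort key. `ConversationTable` and its methods are hypothetical. Including `message_id` in the range key keeps the order deterministic when two messages share a timestamp.

```python
import bisect
from collections import defaultdict


class ConversationTable:
    """Toy model of a table with
    partition key = conversation_id and range (sort) key = (timestamp, message_id)."""

    def __init__(self):
        self._partitions = defaultdict(list)  # conversation_id -> rows sorted by range key

    def put(self, conversation_id, timestamp, message_id, text):
        # insort keeps the partition ordered no matter what order rows arrive in.
        bisect.insort(self._partitions[conversation_id], (timestamp, message_id, text))

    def query(self, conversation_id, start_ts, end_ts):
        """Messages with start_ts <= timestamp < end_ts, oldest first."""
        rows = self._partitions.get(conversation_id, [])
        lo = bisect.bisect_left(rows, (start_ts,))
        hi = bisect.bisect_left(rows, (end_ts,))
        return [text for _, _, text in rows[lo:hi]]


table = ConversationTable()
table.put("alice:bob", 105, 3, "third")
table.put("alice:bob", 100, 1, "first")
table.put("alice:bob", 102, 2, "second")
table.put("alice:carol", 101, 9, "other chat")
print(table.query("alice:bob", 100, 200))  # ['first', 'second', 'third']
print(table.query("alice:bob", 101, 105))  # ['second']
```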

If group chat is the main priority (Slack, Telegram):
sharding on conversationId may help (we can fetch all the messages and all the users of one channel from a single shard)
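A hypothetical sketch of conversationId sharding, assuming hash-modulo placement and in-memory shards (`ChannelShard`, `join`, `post`, and `load_channel` are illustrative names). Because a channel's members and messages are co-located, reading a channel touches exactly one shard.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 8  # hypothetical shard count


def shard_for(conversation_id: str) -> int:
    # Stable digest so every server routes a conversation to the same shard.
    digest = hashlib.md5(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class ChannelShard:
    def __init__(self):
        self.members = defaultdict(set)    # conversation_id -> user_ids
        self.messages = defaultdict(list)  # conversation_id -> (user_id, text), in arrival order


shards = [ChannelShard() for _ in range(NUM_SHARDS)]


def join(conversation_id, user_id):
    shards[shard_for(conversation_id)].members[conversation_id].add(user_id)


def post(conversation_id, user_id, text):
    # A group message is written once, to the channel's shard.
    shards[shard_for(conversation_id)].messages[conversation_id].append((user_id, text))


def load_channel(conversation_id):
    # Everything about the channel is co-located, so a single shard answers the read.
    shard = shards[shard_for(conversation_id)]
    return list(shard.messages[conversation_id]), sorted(shard.members[conversation_id])


join("general", "alice")
join("general", "bob")
post("general", "alice", "hello team")
print(load_channel("general"))  # ([('alice', 'hello team')], ['alice', 'bob'])
```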

Flow:

**Stateless vs Stateful APIs:**

· Chat servers facilitate message sending/receiving.
· Presence servers manage online/offline status.
· API servers handle everything else, including user login, signup, profile changes, etc.
· Notification servers send push notifications.
· Finally, the key-value store is used to store chat history. When an offline user comes online, she will see all her previous chat history.
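To show how these components might cooperate on a single message, here is a hypothetical in-memory sketch. The order of steps (persist to the key-value store, then check presence, then deliver over WebSocket or fall back to a push notification) is an assumption, not something the component list above specifies. `ChatFlow` and its fields are illustrative names.

```python
class ChatFlow:
    """Hypothetical in-memory stand-ins for the components listed above."""

    def __init__(self):
        self.kv_store = {}            # key-value store: message_id -> (sender, recipient, text)
        self.online = set()           # presence servers: users with a live WebSocket
        self.websocket_out = []       # chat servers: (recipient, message_id) pushed over WebSocket
        self.push_notifications = []  # notification servers: (recipient, message_id)

    def handle_message(self, message_id, sender, recipient, text):
        # 1. Persist first, so the message shows up in chat history later.
        self.kv_store[message_id] = (sender, recipient, text)
        # 2. Ask presence how to deliver it.
        if recipient in self.online:
            self.websocket_out.append((recipient, message_id))
        else:
            self.push_notifications.append((recipient, message_id))


flow = ChatFlow()
flow.online.add("bob")
flow.handle_message(1, "alice", "bob", "hi")    # bob is online
flow.handle_message(2, "bob", "carol", "hey")   # carol is offline
print(flow.websocket_out)       # [('bob', 1)]
print(flow.push_notifications)  # [('carol', 2)]
```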

Service discovery:
There might be hundreds of thousands of persistent connections, or even more, to a chat server. If a chat server goes offline, service discovery (e.g., Apache ZooKeeper) provides a new chat server for clients to establish new connections with.

  1. User A tries to log in to the app.
  2. The load balancer sends the login request to API servers.
  3. After the backend authenticates the user, service discovery finds the best chat server for User A. In this example, server 2 is chosen and the server info is returned to User A.
  4. User A connects to chat server 2 through WebSocket.
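Service discovery has to decide what "best chat server" means. A minimal sketch, assuming "best" means the healthy server with the fewest open connections. `ChatServer`, `pick_chat_server`, and the addresses are hypothetical. The same rule also covers failover: once a server is marked unhealthy, reconnecting clients are sent to another one.

```python
from dataclasses import dataclass


@dataclass
class ChatServer:
    name: str
    address: str      # hypothetical host:port returned to the client
    healthy: bool
    connections: int  # current number of open WebSocket connections


def pick_chat_server(servers):
    """Hypothetical 'best server' rule: the healthy server with the fewest connections."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        raise RuntimeError("no healthy chat server available")
    return min(candidates, key=lambda s: s.connections)


servers = [
    ChatServer("server 1", "10.0.0.1:443", True, 48_000),
    ChatServer("server 2", "10.0.0.2:443", True, 12_000),
    ChatServer("server 3", "10.0.0.3:443", False, 0),  # offline: never chosen
]
print(pick_chat_server(servers).name)  # server 2

# If server 2 goes offline, reconnecting clients are sent elsewhere.
servers[1].healthy = False
print(pick_chat_server(servers).name)  # server 1
```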

Presence servers are responsible for managing online status and communicating with clients through WebSocket.
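One common way for presence servers to decide online/offline status is a heartbeat with a timeout. The sketch below assumes that approach (it is not specified in the text), with a hypothetical 30-second timeout and an injectable clock so the example is deterministic. `PresenceTracker` is an illustrative name.

```python
import time


class PresenceTracker:
    """Heartbeat-based online/offline status.

    Assumption (not from the text): the client sends a heartbeat over its
    WebSocket periodically, and a user counts as offline once no heartbeat
    has arrived for `timeout_s` seconds."""

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock       # injectable so the behaviour is easy to test
        self._last_seen = {}     # user_id -> time of last heartbeat

    def heartbeat(self, user_id):
        self._last_seen[user_id] = self.clock()

    def is_online(self, user_id):
        last = self._last_seen.get(user_id)
        return last is not None and self.clock() - last <= self.timeout_s


now = [0.0]  # fake clock for the example
presence = PresenceTracker(timeout_s=30, clock=lambda: now[0])
presence.heartbeat("alice")
now[0] = 20.0
print(presence.is_online("alice"))  # True: last heartbeat 20 s ago
now[0] = 45.0
print(presence.is_online("alice"))  # False: 45 s without a heartbeat
```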