(1) Use Cases
(2) Scenarios That Will Not Be Covered
(3) Who Will Use
(4) How Many Will Use
(5) Usage Patterns (1) Throughput
- Queries per second (QPS) for read and write queries.
(2) Latency
- Expected latency for read and write queries.
(3) Read/Write Ratio
(4) Traffic Estimates
- Write (QPS, volume of data).
- Read (QPS, volume of data).
(5) Storage Estimates
(6) Memory Estimates
- If we are using a cache, what kind of data do we want to store in the cache?
- How much RAM and how many machines are needed?
- Amount of data to store on disk/SSD. (1) Latency and Throughput Requirements
(2) Consistency vs. Availability
- Weak/strong/eventual consistency.
- Failover/replication for availability. (1) APIs for Read/Write Scenarios for Crucial Components
(2) Database Schema
(3) Basic Algorithm
(4) High-Level Design for Read/Write Scenario (1) Scaling the Algorithm
(2) Scaling Individual Components
- Availability, Consistency, and Scale story for each component
- Consistency and availability patterns.
(3) Components to Consider
a) DNS
b) CDN (Push vs. Pull)
c) Load Balancers (Active-Passive, Active-Active, Layer 4, Layer 7)
d) Reverse Proxy
e) Application Layer Scaling (Microservices, Service Discovery)
f) Database options
- RDBMS: ACID Properties, Primary-Secondary, Primary-Primary, Federation, Sharding, Denormalization, SQL Tuning - Postgres
- Use-cases: Structured data with relationships
- NoSQL: Key-Value, Wide-Column, Document - MongoDB, DynamoDB
- Use-cases: Unstructured or semi-structured data
- Graph: Neo4j, Amazon Neptune
- Use-cases: Social networks, knowledge graphs, recommendation systems, and bioinformatics
- NewSQL: Key-Value with ACID Properties - CockroachDB, Google Spanner, VoltDB
- Use-cases: Transaction processing, real-time analytics and IoT device data
- Time Series: Time-stamped data points - InfluxDB, TimescaleDB, Prometheus
- Use-cases: IoT sensor data, financial market data, system metrics, and logs
- Vector: High-dimensional vector data - Pinecone, Weaviate, KDB.AI
- Use-cases: Machine learning, similarity search, and recommendation systems
- Fast lookups:
- RAM (Bounded size) => Redis, Memcached.
- AP (Unbounded size) => Cassandra, RIAK, Voldemort, DynamoDB (default mode)
- CP (Unbounded size) => HBase, MongoDB, Couchbase, DynamoDB (consistent read setting).
g) Caches
- Client caching, CDN caching, Webserver caching, Database caching, Application caching, Cache at query level, Cache at object level.
- Eviction policies:
- Cache aside
- Write through
- Write behind
- Refresh ahead.
h) Asynchronism
- Message queues
- Task queues
- Back pressure.
i) Communication
- TCP
- UDP
- REST
- RPC
- WebSockets (1) Throughput of Each Layer
(2) Latency Caused Between Each Layer
(3) Overall Latency Justification (1) Identify key metrics relevant to your system's design
- Availability
- Latency
- Throttling
- Request Patterns/Volume
- Measure customer experience/feature specific metrics
(2) Define metrics for infrastructure and resources
- Tools like Graphana with Prometheus, AppDynamics, etc., can be used. (1) Measure app index and latency of microservices
(2) Tools like New Relic, AppDynamics can be used
- Use Grafana with Prometheus or AppDynamics for monitoring
(3) Canaries - to simulate customer experience and pro-actively detect service degradation (1) Implement tools to gather and visualize metrics
(2) Collect and analyze logs with ELK (Elastic, Logstash, Kibana) or Splunk.
(1) Firewall, encryptions at rest and in transit
(2) TLS
(3) Authentication, Authorization (AuthN/Z)
(4) Limited Egress/Ingress
(5) Principle of least privilegeExtension of: https://leetcode.com/discuss/career/229177/my-system-design-template