System design for a weather widget

Anonymous User


Problem Statement

Design and implement a weather widget that fetches the latest weather data from the National Weather Service (NWS) at regular intervals and displays it, so that users always see up-to-date weather information.

Functional Requirements

Forecast range - The widget shows an hourly forecast for the next 24 hours and a daily forecast for the next 7 days.
Temperature details - The widget displays daily high and low temperatures.
Weather descriptors - The widget displays the weather condition as a value from a fixed enum (sunny, cloudy, rainy, snowy, windy).
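The fixed set of weather descriptors maps naturally onto an enum. Below is a minimal Python sketch. `Condition` and `parse_condition` are hypothetical names, and the sketch assumes that stored condition strings match the enum values when case is ignored:

```python
from enum import Enum


class Condition(Enum):
    """Fixed set of weather conditions the widget can display (hypothetical names)."""
    SUNNY = "sunny"
    CLOUDY = "cloudy"
    RAINY = "rainy"
    SNOWY = "snowy"
    WINDY = "windy"


def parse_condition(raw: str) -> Condition:
    """Map a stored condition string such as "Cloudy" onto the enum.

    Enum lookup by value raises ValueError for anything outside the fixed set,
    so unexpected values are caught here instead of reaching the UI.
    """
    return Condition(raw.strip().lower())


print(parse_condition("Cloudy"))  # Condition.CLOUDY
```

Because unknown strings are rejected at this boundary, a new upstream condition shows up as an error rather than as an unstyled label in the widget.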

Non-Functional Requirements

Availability - The system should be highly available, targeting 99.99% uptime.
Performance - The system should load and display data quickly, ideally in under one second.
Scalability - The system should be scalable to support a growing number of users.

Resource estimation

Assumptions

Our system fetches the data from NWS every hour.
There are 1 billion subscribers to the widget.
100 million daily active users, each checking the widget 5 times a day.
One server can serve 1000 requests per second.
The system stores data for 10K major locations, and each data entry consumes 50KB.

Traffic estimation

100M * 5 times a day = 500M requests per day = 500M / (24 * 3600) requests per second ≈ 5800 requests per second (average)
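Here is the same arithmetic as a quick Python check. The constants mirror the assumptions listed above:

```python
DAU = 100_000_000          # daily active users (assumption above)
CHECKS_PER_DAY = 5         # widget checks per user per day (assumption above)
SECONDS_PER_DAY = 24 * 3600

requests_per_day = DAU * CHECKS_PER_DAY
rps = requests_per_day / SECONDS_PER_DAY   # average, not peak, request rate
print(f"{requests_per_day:,} requests/day, ~{rps:,.0f} requests/s")
# 500,000,000 requests/day, ~5,787 requests/s
```

The exact average is about 5,787 requests per second. Peak traffic will be higher than this.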

Storage estimation

Assume that we retain the data for 30 days for historical analysis and store one 50KB entry per location for each hourly fetch.

10K * 50KB * 24 hours a day * 30 days a month = 360 GB ≈ 0.36 TB

Let's assume 3 replicas for redundancy. So the total is ~1.1 TB.
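Below is a Python sketch of the storage estimate. It assumes one 50KB entry per location per hourly fetch and uses decimal units (1 TB = 10^9 KB):

```python
LOCATIONS = 10_000
ENTRY_KB = 50              # size of one location's entry (assumption above)
HOURS_PER_DAY = 24         # one entry per location per hourly NWS fetch (assumption)
RETENTION_DAYS = 30
REPLICAS = 3

raw_kb = LOCATIONS * ENTRY_KB * HOURS_PER_DAY * RETENTION_DAYS
raw_tb = raw_kb / 1e9      # decimal units: 1 TB = 10^9 KB
total_tb = raw_tb * REPLICAS
print(f"raw ~{raw_tb:.2f} TB, with {REPLICAS} replicas ~{total_tb:.2f} TB")
# raw ~0.36 TB, with 3 replicas ~1.08 TB
```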

Cache estimation

Let’s assume we cache the 1,000 most popular locations.
1K * 50KB = 50MB

Let’s assume 20% overhead for cache management: 50MB * 1.2 = 60MB

Let’s assume extra space for replication across different cache servers (3 replicas, for example): 3 * 60MB = ~180MB.
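Here is the cache estimate in Python. It uses decimal units (1 MB = 1,000 KB) along with the overhead and replica assumptions above:

```python
HOT_LOCATIONS = 1_000      # most popular locations kept in cache (assumption above)
ENTRY_KB = 50
OVERHEAD = 0.20            # cache-management overhead (assumption above)
REPLICAS = 3

base_mb = HOT_LOCATIONS * ENTRY_KB / 1_000     # decimal units: 1 MB = 1,000 KB
with_overhead_mb = base_mb * (1 + OVERHEAD)
total_mb = with_overhead_mb * REPLICAS
print(f"{base_mb:.0f} MB -> {with_overhead_mb:.0f} MB -> {total_mb:.0f} MB")
# 50 MB -> 60 MB -> 180 MB
```

The working set fits comfortably in one cache node's memory. Replicating it therefore buys availability rather than capacity.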

Server estimation

~5800 RPS / 1000 RPS per server ≈ 6 servers

However, in real-world scenarios we need additional capacity for failovers, traffic spikes above the ~5800 RPS average, etc., so let’s double the server count to 12 servers.
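The server count as a short Python calculation. `math.ceil` rounds 5.8 up to whole servers, and the 2x headroom factor is the doubling assumed above:

```python
import math

AVG_RPS = 5_800             # from the traffic estimate
RPS_PER_SERVER = 1_000      # capacity assumption above
HEADROOM = 2                # double for failovers and traffic spikes

baseline = math.ceil(AVG_RPS / RPS_PER_SERVER)
provisioned = baseline * HEADROOM
print(baseline, provisioned)  # 6 12
```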

High Level Design

Design Details and discussion points

Push vs Pull

We have two options:

Pull model - Client periodically calls the server to get the latest weather data.
Push model - The server pushes data to clients whenever it receives new data from NWS.

In our case, the pull model serves better because:

We know how often the NWS data is updated, and the widget does not need real-time updates as long as it shows the latest hourly update. So the client can simply poll the server periodically.
The pull model is simple to implement: stateless RESTful APIs can serve the updates. The push model, on the other hand, requires WebSockets or server-sent events (SSE), which are more complicated to maintain.
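Below is a minimal sketch of the pull model on the client. It assumes a hypothetical `fetch` callable that performs the GET against a stateless REST endpoint, and a one-hour poll interval matching the hourly NWS fetch. The clock is injected so the behavior can be tested without waiting:

```python
import time
from typing import Callable, Optional


class PollingClient:
    """Pull-model client that re-fetches only once the poll interval has elapsed."""

    def __init__(self, fetch: Callable[[], dict], interval_s: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # hypothetical GET /weather/{locationID}
        self._interval_s = interval_s  # one hour, matching the hourly NWS fetch
        self._clock = clock            # injectable so tests can fake time
        self._last_fetch: Optional[float] = None
        self.data: Optional[dict] = None

    def refresh_if_due(self) -> bool:
        """Fetch fresh data if due; return True when a request was made."""
        now = self._clock()
        if self._last_fetch is not None and now - self._last_fetch < self._interval_s:
            return False               # still within the interval: keep showing stored data
        self.data = self._fetch()
        self._last_fetch = now
        return True


# Usage: the widget calls refresh_if_due() on a timer or whenever it is shown.
client = PollingClient(fetch=lambda: {"temperature": 22, "condition": "Cloudy"})
client.refresh_if_due()
print(client.data)
```

Each poll is an independent, stateless request, so any API server behind the load balancer can answer it.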

Database choice and data models

We will use a NoSQL database because:

Weather data is semi-structured, and NoSQL databases offer flexible schema evolution. They also scale horizontally to handle high read and write throughput.
For example, we can use MongoDB.
The database will be sharded on locationID, since lookups are per location.
We will add replicas for redundancy, higher read throughput, and increased availability.
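The following hash-mod-N routing sketch in Python shows why sharding on locationID keeps each lookup on a single shard. This is not how MongoDB routes internally: it assigns ranges of shard-key values (or of their hashes) to chunks and balances those chunks across shards. `shard_for` is a hypothetical helper:

```python
import hashlib


def shard_for(location_id: int, num_shards: int) -> int:
    """Map a locationID to a shard index with a stable hash, modulo the shard count."""
    # SHA-256 is used only to spread keys evenly, not for security. Unlike
    # Python's built-in hash() for strings, it gives the same result in every process.
    digest = hashlib.sha256(str(location_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every read or write for one location goes to exactly one shard.
print(shard_for(123, 4))
```

A plain modulo scheme remaps most keys whenever the shard count changes. That is one reason production systems use chunk-based or consistent-hashing placement instead.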

Example MongoDB entry (locationID is the primary key and the shard key):

{
   "locationID": 123,
   "location": "New York",
   "latitude": 40.7128,
   "longitude": -74.0060,
   "current_weather": {
      "temperature": 22,
      "condition": "Cloudy",
      "humidity": 60,
      ...
   },
   "hourly_forecast": [
      {
         "hour": ...,
         "temperature": ...,
         "condition": ...,
         "humidity": ...
      },
      ...
   ],
   "daily_forecast": [
      {
         "date": ...,
         "highTemperature": ...,
         "lowTemperature": ...,
         "condition": ...
      },
      ...
   ]
}
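The entry above can also be expressed as typed Python dataclasses. This makes the functional requirements (24 hourly entries, 7 daily entries, low not above high) checkable before a document is written. Field names follow the example entry. The class names and the `validate` method are hypothetical, and `to_document` returns a plain dict of the kind pymongo's `insert_one` accepts:

```python
from dataclasses import dataclass, field, asdict


@dataclass
class HourlyForecast:
    hour: str              # ISO-8601 timestamp (assumed format)
    temperature: float
    condition: str
    humidity: int


@dataclass
class DailyForecast:
    date: str              # e.g. "YYYY-MM-DD" (assumed format)
    highTemperature: float
    lowTemperature: float
    condition: str


@dataclass
class LocationWeather:
    locationID: int        # primary key and shard key
    location: str
    latitude: float
    longitude: float
    current_weather: dict
    hourly_forecast: list[HourlyForecast] = field(default_factory=list)
    daily_forecast: list[DailyForecast] = field(default_factory=list)

    def validate(self) -> None:
        """Check the forecast ranges from the functional requirements."""
        if len(self.hourly_forecast) != 24:
            raise ValueError("expected 24 hourly entries")
        if len(self.daily_forecast) != 7:
            raise ValueError("expected 7 daily entries")
        for day in self.daily_forecast:
            if day.lowTemperature > day.highTemperature:
                raise ValueError(f"low above high on {day.date}")

    def to_document(self) -> dict:
        """Recursively convert to a plain dict, nested forecasts included."""
        return asdict(self)
```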

Caching

Caching improves performance, reduces load on the backend, and gives users a better experience. The caching strategies and mechanisms we can use are:

Client-side caching (on the phones)

Local storage or in-app database: Store the latest weather update in the phone’s local storage or an in-app database like SQLite or Realm. This ensures that when a user opens the widget, they immediately see the last fetched data.
Cache headers: When the backend sends data to the client, it includes cache headers (such as ‘Cache-Control’ with a max-age) that tell the client how long to treat the data as fresh. This way, the widget knows when to request new data.
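Below is a minimal sketch of how the widget could decide whether its stored copy is still fresh, given the `Cache-Control` value it received. It handles only the `max-age` and `no-cache` directives, not the full header grammar. The function names are hypothetical:

```python
import re
from typing import Optional

# Matches a max-age directive at the start of the header or after a comma.
# "s-maxage" is a different directive and is intentionally not matched.
_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)", re.IGNORECASE)


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Extract max-age from a Cache-Control header value, or None if absent."""
    m = _MAX_AGE.search(cache_control)
    return int(m.group(1)) if m else None


def is_fresh(fetched_at: float, cache_control: str, now: float) -> bool:
    """True if data fetched at `fetched_at` (epoch seconds) is still fresh at `now`."""
    max_age = max_age_seconds(cache_control)
    if max_age is None or "no-cache" in cache_control.lower():
        return False  # no usable lifetime: treat as stale and re-request
    return now - fetched_at < max_age
```

On the server side, one option (an assumption, not part of the design above) is to send `Cache-Control: public, max-age=N`, where N is the number of seconds until the next hourly NWS fetch. Clients would then re-request right after fresh data lands.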

CDN (Content Delivery Network) caching

Placing a CDN in front of your API servers can cache the responses at edge locations closer to your users. This reduces the latency for fetching the data, especially if you have a global user base.

Application level caching (backend)

In-memory data stores (e.g., Redis, Memcached) - These can be used to cache the latest weather data fetched from the NWS. When a client requests data, your backend first checks these caches before falling back to the database.
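The flow of checking the cache and then falling back to the database is the cache-aside pattern. The Python sketch below uses an in-process dict in place of Redis or Memcached. `load_from_db` is a hypothetical loader, and the one-hour TTL is an assumption tied to the hourly NWS fetch:

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """In-process stand-in for Redis/Memcached using the cache-aside pattern."""

    def __init__(self, load_from_db: Callable[[int], dict], ttl_s: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db      # hypothetical: reads one location's document
        self._ttl_s = ttl_s            # assumed one-hour TTL
        self._clock = clock
        self._entries: Dict[int, Tuple[float, dict]] = {}  # id -> (expires_at, value)

    def get(self, location_id: int) -> dict:
        now = self._clock()
        hit = self._entries.get(location_id)
        if hit is not None and hit[0] > now:
            return hit[1]                      # cache hit, still within TTL
        value = self._load(location_id)        # miss or expired: fall back to the DB
        self._entries[location_id] = (now + self._ttl_s, value)
        return value

    def invalidate(self, location_id: int) -> None:
        """Drop an entry, e.g. right after the hourly NWS ingest writes new data."""
        self._entries.pop(location_id, None)
```

With Redis, the same flow is a `GET` on the location key. On a miss, it is followed by `SET key value EX 3600`, which stores the loaded document with an expiry.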

Evaluation

Our data model includes all the information in the functional requirements.
Availability:
- Load balancer does periodic health checks on the servers and routes the requests to healthy servers.
- Database replication improves availability: if one replica goes down, the others can serve requests.
Scalability:
- Servers, cache, and database can be horizontally scaled.
Performance:
- In-app storage, the CDN, and the in-memory cache keep latency low.
- Because the database is sharded on locationID, each cache miss is a single-shard key lookup, which keeps database latency low.
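The health-check point under availability can be sketched as a round-robin picker that skips servers whose last health check failed. The server names and the `record_health` hook are hypothetical, and production load balancers provide this behavior built in:

```python
from itertools import count
from typing import Dict, List


class HealthAwareRoundRobin:
    """Round-robin over servers, skipping any that failed their last health check."""

    def __init__(self, servers: List[str]) -> None:
        self._servers = list(servers)
        self._healthy: Dict[str, bool] = {s: True for s in self._servers}
        self._counter = count()

    def record_health(self, server: str, ok: bool) -> None:
        """Called by the periodic health checker with each probe's result."""
        self._healthy[server] = ok

    def pick(self) -> str:
        """Return the next healthy server in rotation."""
        healthy = [s for s in self._servers if self._healthy[s]]
        if not healthy:
            raise RuntimeError("no healthy servers available")
        return healthy[next(self._counter) % len(healthy)]


lb = HealthAwareRoundRobin(["app-1", "app-2", "app-3"])
lb.record_health("app-2", False)   # app-2 failed its last probe
print([lb.pick() for _ in range(4)])
```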
