Design an analytics system.
- The input to the system is fed from another service and contains Personally Identifiable Information (PII) such as email, name, etc.
- The input comes in the form of an API request, e.g.
{ "email": "abc@gmail.com",
"phone": 9888,
"name": "John"
}
The service should return the following metrics for the last 1 day, 1 week, 1 month, and 2 years:
A). Number of requests with the given email id.
B). Number of requests with unique names for a given email id. Some requests may contain the same email id but many different names and phone numbers. That pattern is likely to come from a fraudster. These metrics help us in fraud detection.
C). Percentage of requests with a given name out of the total number of entries for the given email id.
e.g., for email id abc@gmail.com, there may be a total of 100 records, of which 50 come with the name John, 20 with the name Robert, and 30 with the name Daniel. So John accounts for 50%, Robert for 20%, and Daniel for 30%.
Note: You can assume that data older than 2 years is automatically deleted from our datastore.
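To make the three metrics concrete, here is a minimal in-memory sketch in Python. It is only an illustration, not a proposed design: the function name `metrics_for_email`, the `WINDOWS` table, and the `(timestamp, email, name)` record shape are all assumptions, a month is approximated as 30 days and 2 years as 730 days, and metric B is interpreted as the count of distinct names seen for the email id.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical window labels mapped to look-back durations.
# Assumption: 1 month = 30 days, 2 years = 730 days.
WINDOWS = {
    "1d": timedelta(days=1),
    "1w": timedelta(weeks=1),
    "1m": timedelta(days=30),
    "2y": timedelta(days=730),
}


def metrics_for_email(records, email, window, now):
    """Compute metrics A, B and C for one email id over one window.

    records: iterable of (timestamp, email, name) tuples (assumed shape).
    """
    cutoff = now - WINDOWS[window]
    # Count requests per name for this email inside the window.
    names = Counter(
        name
        for ts, rec_email, name in records
        if rec_email == email and ts >= cutoff
    )
    total = sum(names.values())
    return {
        "total_requests": total,        # metric A
        "unique_names": len(names),     # metric B (distinct names, assumed reading)
        "name_percentages": {           # metric C (empty when total is 0)
            name: 100.0 * count / total for name, count in names.items()
        },
    }


# Reproduces the John / Robert / Daniel example from the question.
now = datetime(2024, 1, 1)
records = (
    [(now - timedelta(hours=1), "abc@gmail.com", "John")] * 50
    + [(now - timedelta(hours=2), "abc@gmail.com", "Robert")] * 20
    + [(now - timedelta(hours=3), "abc@gmail.com", "Daniel")] * 30
)
result = metrics_for_email(records, "abc@gmail.com", "1d", now)
print(result)
```

A real system would not scan raw records per query; this only pins down what each metric means so the storage and service questions can be discussed against it.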
**Questions:**
**1. What are the services? Do you need just one service, an analytics service?**
- Time Series DB?
- Can Oracle be used? NoSQL?
- Can Kafka fit in somewhere?
- Performing aggregation in the DB vs. in code, e.g.
Number of requests with unique names for a given email id. This query filters by email id and then aggregates by name (count(*)). What are the pros and cons of doing the aggregation in the query vs. offloading it to the application code?
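The two options in the last question can be compared side by side with a small sketch. It uses Python's built-in `sqlite3` purely as a stand-in datastore; the `requests` table, its columns, and the helper names `counts_in_db` and `counts_in_app` are assumptions made for illustration, not part of the problem statement.

```python
import sqlite3
from collections import Counter

# In-memory SQLite database as a stand-in for the real datastore (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (email TEXT, name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [("abc@gmail.com", "John", "9888")] * 3
    + [("abc@gmail.com", "Robert", "9777")] * 2
    + [("xyz@gmail.com", "Daniel", "9666")],
)


def counts_in_db(conn, email):
    """Option 1: the database filters by email and aggregates by name."""
    rows = conn.execute(
        "SELECT name, COUNT(*) FROM requests WHERE email = ? GROUP BY name",
        (email,),
    ).fetchall()
    return dict(rows)


def counts_in_app(conn, email):
    """Option 2: the database only filters; the application aggregates."""
    rows = conn.execute("SELECT name FROM requests WHERE email = ?", (email,))
    return dict(Counter(name for (name,) in rows))


db_counts = counts_in_db(conn, "abc@gmail.com")
app_counts = counts_in_app(conn, "abc@gmail.com")
print(db_counts, app_counts, len(db_counts))
```

Both functions return the same per-name counts, and the number of distinct names is the size of that mapping. The difference is where the work happens: the first returns one row per distinct name, while the second transfers one row per matching request and aggregates it in application memory.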