I've been interviewing for Senior Java developer positions recently, and I was able to answer most of the questions except for one design problem that came up in my recent interviews. Can you please share your views on how to approach this kind of system design problem?
1. Service Health Monitoring and Alerting Service
- How would you design an APM-like tool that performs health checks on a service/application?
- The tool should check the application's internal health, such as CPU usage, memory usage, DB connections, and threads, and send an appropriate alert.
- The tool should check the application's external health, such as rate limits, errors, or high latencies, and send an appropriate alert.
I totally blanked when I saw this question, and for the first 15 minutes I sat silent, thinking about where to start. Since we use New Relic at my workplace, I started with the approach below:
- For the Service Health Monitoring and Alerting Service we need two components:
- An embedded Java agent, which is supplied as a JVM argument, and a centralized SaaS-like dashboard to which each Java agent reports its metrics
- The embedded Java agent collects the application's internal metrics, such as CPU usage, memory usage, DB connections, and threads
- The embedded Java agent will also use Spring AOP-like pointcuts to intercept all HTTP and DB calls and report the response times to the centralized dashboard
- The embedded Java agent will use gRPC to report the metrics, to keep the latency of communicating with the dashboard low
- The embedded Java agent uses a semaphore with a preconfigured value such as CONCURRENT_REQUEST_LIMIT and allows only that many concurrent requests from a specific IP to achieve rate limiting
- In the dashboard, the user defines alerts, e.g. for when App A's response times go beyond a certain limit; a thread then reads these configs, validates the reported metrics against them, and sends an alert
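To make the semaphore idea from the list above concrete, here is a minimal sketch, not a definitive implementation. The class name `PerIpConcurrencyLimiter` and its methods are hypothetical and exist only for illustration; the constructor argument plays the role of CONCURRENT_REQUEST_LIMIT. Note that a semaphore caps the number of in-flight requests per IP, not requests per unit of time, so this is concurrency limiting rather than a true rate limit.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Semaphore;

/** Hypothetical per-IP concurrency limiter; all names here are illustrative. */
class PerIpConcurrencyLimiter {
    private final int concurrentRequestLimit;
    // One semaphore per client IP, created lazily on first use.
    private final ConcurrentMap<String, Semaphore> permitsByIp = new ConcurrentHashMap<>();

    PerIpConcurrencyLimiter(int concurrentRequestLimit) {
        this.concurrentRequestLimit = concurrentRequestLimit;
    }

    /** Returns true if the request may proceed; the caller must call release() when it finishes. */
    boolean tryAcquire(String clientIp) {
        return permitsByIp
                .computeIfAbsent(clientIp, ip -> new Semaphore(concurrentRequestLimit))
                .tryAcquire(); // non-blocking: rejects instead of queueing
    }

    /** Returns the permit taken by a successful tryAcquire() for this IP. */
    void release(String clientIp) {
        Semaphore permits = permitsByIp.get(clientIp);
        if (permits != null) {
            permits.release();
        }
    }
}
```

A request filter would call `tryAcquire(ip)` before handling a request, reject the request when it returns false, and call `release(ip)` in a `finally` block. The map in this sketch never evicts entries, so a real agent would need some form of expiry for idle IPs.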
The interviewer kept asking for more technical details, such as how to capture those metrics and how to send the alerts. Can someone please share their thoughts on this problem?
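For the "how do you capture those metrics" part, one possible starting point is the JVM's standard platform MXBeans in `java.lang.management`. The sketch below is only an assumption about how an agent could work, not how any real APM product does it. `HealthSnapshot`, `AlertRule`, and the metric names are hypothetical; connection-pool metrics are left out because they would come from whatever the pool itself exposes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that reads a few JVM metrics through the standard platform MXBeans. */
class HealthSnapshot {
    static Map<String, Double> capture() {
        Map<String, Double> metrics = new LinkedHashMap<>();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("heap.used.bytes", (double) heap.getUsed());
        metrics.put("threads.live", (double) ManagementFactory.getThreadMXBean().getThreadCount());
        // Negative when the platform cannot report a load average.
        metrics.put("system.load.avg",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return metrics;
    }
}

/** Hypothetical user-defined rule: alert when a metric exceeds a threshold. */
class AlertRule {
    final String metric;
    final double threshold;

    AlertRule(String metric, double threshold) {
        this.metric = metric;
        this.threshold = threshold;
    }

    boolean isBreached(Map<String, Double> snapshot) {
        Double value = snapshot.get(metric);
        return value != null && value > threshold;
    }

    /** Returns one message per breached rule; actual delivery (email, pager) is out of scope. */
    static List<String> evaluate(List<AlertRule> rules, Map<String, Double> snapshot) {
        List<String> alerts = new ArrayList<>();
        for (AlertRule rule : rules) {
            if (rule.isBreached(snapshot)) {
                alerts.add(rule.metric + " = " + snapshot.get(rule.metric)
                        + " exceeds " + rule.threshold);
            }
        }
        return alerts;
    }
}
```

In this sketch the agent would call `HealthSnapshot.capture()` on a fixed schedule (for example from a `ScheduledExecutorService`) and ship the map to the dashboard, while the dashboard side would run `AlertRule.evaluate` against each incoming snapshot and hand the resulting messages to a notifier.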