A friend had an onsite interview with this question.
Here's what I was able to get:
A third-party vendor has a database with a huge list of doctors, their aggregate ratings from patients (1 star being the worst, 5 stars being the best), and possibly their practice (there are no duplicate entries for doctors). This database is rarely updated, but it does get updated.
Your task is to design a proxy service so that a client using your service can get the ratings for a list of doctors from the database. You don't want to hit the database on every request because reads are expensive.
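A minimal sketch of the caching part of such a proxy (in Python, since the post has no code of its own). It assumes a hypothetical `fetch_many` callable standing in for the vendor database, doing one batched read that returns `{doctor_id: rating}`, and it uses an in-process dict with a TTL where a real deployment would more likely use memcache or Redis. Only doctors that are missing or expired are fetched, in a single batch:

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class RatingsCache:
    """Read-through cache in front of an expensive ratings store.

    `fetch_many` is a hypothetical stand-in for the vendor database:
    it takes a list of doctor IDs and returns {doctor_id: rating}.
    """

    def __init__(self, fetch_many: Callable[[List[str]], Dict[str, float]],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_many = fetch_many
        self._ttl = ttl_seconds
        self._clock = clock
        # doctor_id -> (rating, expires_at)
        self._entries: Dict[str, Tuple[float, float]] = {}

    def get_ratings(self, doctor_ids: Iterable[str]) -> Dict[str, float]:
        now = self._clock()
        result: Dict[str, float] = {}
        missing: List[str] = []
        # dict.fromkeys de-duplicates the IDs while keeping their order.
        for doc_id in dict.fromkeys(doctor_ids):
            entry = self._entries.get(doc_id)
            if entry is not None and entry[1] > now:
                result[doc_id] = entry[0]  # fresh cache hit
            else:
                missing.append(doc_id)     # miss or expired
        if missing:
            # One batched read for all misses instead of one read per doctor.
            fetched = self._fetch_many(missing)
            for doc_id, rating in fetched.items():
                self._entries[doc_id] = (rating, now + self._ttl)
                result[doc_id] = rating
        return result
```

With a TTL, a stale rating survives at most `ttl_seconds` after the vendor updates it; since updates are rare, a long TTL keeps vendor reads low. IDs the vendor doesn't know are not cached in this sketch, so they are re-queried on every request.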
What technologies would you use in designing your service, and what are the benefits and costs of each?
Their design consisted of keeping a copy of the vendor database in a local database, refreshing that copy periodically, and serving GET requests through an API service.
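The onsite design (a local copy refreshed periodically, served through GETs) could look roughly like this, with `sqlite3` as a stand-in for whatever local database is chosen. The schema, column names, and function names are illustrative assumptions, and the HTTP layer that would call `get_ratings` is left out:

```python
import sqlite3
from typing import Dict, Iterable, List, Optional, Tuple

# (doctor_id, name, practice, rating); practice may be missing.
Row = Tuple[str, str, Optional[str], float]


def create_mirror() -> sqlite3.Connection:
    """Create the local copy of the vendor table (in memory for the sketch)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doctors ("
        " doctor_id TEXT PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " practice TEXT,"
        " rating REAL NOT NULL CHECK (rating BETWEEN 1 AND 5))"
    )
    return conn


def refresh_mirror(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Replace the local copy with a full vendor snapshot in one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("DELETE FROM doctors")
        conn.executemany("INSERT INTO doctors VALUES (?, ?, ?, ?)", rows)


def get_ratings(conn: sqlite3.Connection, doctor_ids: List[str]) -> Dict[str, float]:
    """What a GET /ratings handler would call: look up ratings locally."""
    if not doctor_ids:
        return {}
    placeholders = ",".join("?" for _ in doctor_ids)
    cur = conn.execute(
        f"SELECT doctor_id, rating FROM doctors WHERE doctor_id IN ({placeholders})",
        doctor_ids,
    )
    return dict(cur.fetchall())
```

Doing the delete and reinsert inside one transaction means readers never see a half-loaded table. For very long ID lists, SQLite caps the number of bound parameters per statement, so the lookup would need to be chunked.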
I don't think that's ideal. I would ask the interviewer whether the data includes location, and if so, use geohashing similar to Uber's approach, because the client probably doesn't need data about every doctor in the country, just the doctors in their area. The client provides its geolocation to the API, and a memcache layer checks whether we already have the doctors for that area. If so, it returns the list; if not, it reads from the database using geohash keys. Do we also need an extra database to store the accessed data as a backup? That would lend itself to sharding the database by location.
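A rough sketch of the geohash idea, assuming each doctor record has (or can be given) a latitude and longitude, which the question doesn't confirm. `geohash_encode` is the standard geohash bit-interleaving encoding; `AreaCache` is a cache-aside layer keyed by geohash cell, with a dict standing in for memcache and a hypothetical `query_cell` callable standing in for a database query on a geohash prefix column:

```python
from typing import Callable, Dict, List

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Encode a point as a geohash by alternately halving lon and lat ranges."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: List[str] = []
    bits, bit_count = 0, 0
    use_lon = True  # geohash interleaves bits starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


class AreaCache:
    """Cache-aside keyed by geohash cell; `query_cell` is hypothetical."""

    def __init__(self, query_cell: Callable[[str], List[str]], precision: int = 5):
        self._query_cell = query_cell
        self._precision = precision
        self._cache: Dict[str, List[str]] = {}  # stand-in for memcache

    def doctors_near(self, lat: float, lon: float) -> List[str]:
        cell = geohash_encode(lat, lon, self._precision)
        if cell not in self._cache:
            # Miss: read this cell from the database, then remember it.
            self._cache[cell] = self._query_cell(cell)
        return self._cache[cell]
```

Precision 5 gives cells of roughly 5 km by 5 km at the equator. A user near a cell edge would also need the eight neighbouring cells, which this sketch skips. The same cell prefix could double as a shard key if the data were sharded by location.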
I'm kind of stuck on how to tell whether the database has been updated, other than polling it, maybe by querying a last-updated column in some table, which feels hacky and inefficient.
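Polling can be reasonable if each poll is cheap. A sketch assuming the vendor table has a `last_updated` timestamp column (an assumption; the question doesn't say), with two hypothetical callables standing in for the queries. It checks `MAX(last_updated)` first and only pulls rows newer than the last seen watermark when that value has moved:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, float, float]  # (doctor_id, rating, last_updated in epoch seconds)


@dataclass
class IncrementalSync:
    # Hypothetical: SELECT MAX(last_updated) FROM doctors
    fetch_max_updated: Callable[[], float]
    # Hypothetical: SELECT ... FROM doctors WHERE last_updated > ?
    fetch_changed_since: Callable[[float], List[Row]]
    local: Dict[str, float] = field(default_factory=dict)
    watermark: float = float("-inf")

    def poll_once(self) -> int:
        """Return how many rows were applied; costs one small query if nothing changed."""
        latest = self.fetch_max_updated()
        if latest <= self.watermark:
            return 0
        rows = self.fetch_changed_since(self.watermark)
        for doctor_id, rating, updated in rows:
            self.local[doctor_id] = rating
            self.watermark = max(self.watermark, updated)
        return len(rows)
```

Caveats: the strict `>` can miss a row committed later with the same timestamp as the watermark, and deleted doctors never show up at all. An index on `last_updated` keeps the `MAX` query cheap.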
I'm not sure what technologies to use beyond the very broad ones: memcache, maybe a SQL database, and some sort of message queue for polling the vendor database.
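For the message-queue idea, one option (assuming the vendor or some change-capture process could publish the IDs of changed doctors, which the question doesn't state) is to consume those events and evict the affected cache entries. Here Python's stdlib `queue.Queue` stands in for a real broker, and `drain_invalidations` is a hypothetical helper:

```python
import queue
from typing import Dict


def drain_invalidations(events: "queue.Queue[str]", cache: Dict[str, float]) -> int:
    """Evict every doctor ID currently waiting on the queue; return how many were cached."""
    evicted = 0
    while True:
        try:
            doctor_id = events.get_nowait()
        except queue.Empty:
            return evicted
        # Evict rather than update: the next read repopulates from the source.
        if cache.pop(doctor_id, None) is not None:
            evicted += 1
        events.task_done()
```

Evicting instead of writing the new rating keeps the consumer simple, at the cost of one extra source read for each changed doctor that is requested again.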
Any help would be awesome.