Hi,
I have a question from a Google interview that I still don't know how to solve :)
We have a big log file, several gigabytes in size. Each line is a request/response log entry with columns REQUEST_ID, TIMESTAMP, and a START/END flag. We need to parse the file and print the IDs of requests that exceeded a given time threshold. Some requests have a START entry but never completed, so they have no END entry.
For example:
1 1 START
2 2 START
1 4 END
3 8 START
3 15 END
The given timeout threshold is 5.
Request 1 started at 1
Request 2 started at 2
Request 1 ended at 4 -> 4 - 1 = 3 < 5, which is under the threshold, so it's OK and we do nothing.
Request 3 started at 8 -> at this point we should already know that request 2 started at 2, and 8 - 2 = 6 > 5, so we should print here that request 2 timed out.
Request 3 ended at 15 -> 15 - 8 = 7 > 5, so we should print that request 3 timed out.
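To make the expected behavior concrete, here is the walk-through above as a brute-force Python sketch. The function name `timed_out_requests`, the whitespace-separated column layout, and the assumption that timestamps are integers appearing in non-decreasing order are mine, not part of the original question. It re-scans every open request on each line, which is exactly the part I suspect can be done better. Requests still open at the end of the file are not reported unless a later line has already shown them to be over the threshold.

```python
from typing import Dict, Iterable, Iterator


def timed_out_requests(lines: Iterable[str], threshold: int) -> Iterator[str]:
    """Yield the IDs of requests whose duration exceeded `threshold`.

    Assumes lines look like "<REQUEST_ID> <TIMESTAMP> <START|END>" and that
    timestamps never decrease from one line to the next.
    """
    open_requests: Dict[str, int] = {}  # request id -> start timestamp

    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        request_id, timestamp, flag = parts[0], int(parts[1]), parts[2]

        # Brute force: check every open request against the current time.
        expired = [
            rid for rid, start in open_requests.items()
            if timestamp - start > threshold
        ]
        for rid in expired:
            del open_requests[rid]
            yield rid

        if flag == "START":
            open_requests[request_id] = timestamp
        elif flag == "END":
            # If the request was already reported above, this is a no-op.
            open_requests.pop(request_id, None)


sample = ["1 1 START", "2 2 START", "1 4 END", "3 8 START", "3 15 END"]
print(list(timed_out_requests(sample, threshold=5)))  # ['2', '3']
```

Because the function takes any iterable of lines, an open file object can be passed in directly, so the whole file never has to be loaded into memory. Only the open requests are kept.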
Does anyone have an idea how to solve this efficiently?