Perform analytics on Streaming Data - Data Engineering Design
Anonymous User

In a scenario like FB photo upload, four steps are identified:
1 - Click on browse
2 - Select pics
3 - Check for any edits
4 - Click on upload

For each customer who performs this action, events are generated with the following info:
user_id | step_num | event_timestamp

The order in which these events are streamed is not strict, i.e. events from different users are interleaved, so we might see user 2's data streamed in before all of user 1's events are captured.

u1, 1, 2020-05-30 00:00:01
u1, 2, 2020-05-30 00:00:02
u2, 1, 2020-05-30 00:00:02
u3, 1, 2020-05-30 00:00:02
u1, 3, 2020-05-30 00:00:03
....
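
To make the event format concrete, here is a minimal sketch of parsing one sample line into a typed record. The names `UploadEvent` and `parse_event` are hypothetical, and the comma-separated layout and timestamp format are assumed from the sample rows above.

```python
from datetime import datetime
from typing import NamedTuple


class UploadEvent(NamedTuple):
    """One streamed event (hypothetical record type for illustration)."""
    user_id: str
    step_num: int
    event_timestamp: datetime


def parse_event(line: str) -> UploadEvent:
    """Parse a line such as 'u1, 2, 2020-05-30 00:00:02'.

    Assumes exactly three comma-separated fields and a
    'YYYY-MM-DD HH:MM:SS' timestamp, as in the sample data.
    """
    user_id, step_num, ts = (field.strip() for field in line.split(","))
    return UploadEvent(
        user_id,
        int(step_num),
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
    )
```
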

How would you implement a streaming solution to calculate the average time taken for each step? We can assume the time taken by a user for step n to be (time_stamp_at_step_n - time_stamp_at_step_(n-1)).
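
One possible approach, sketched in plain Python rather than a specific streaming framework: keep the timestamps seen so far per user, and keep a running sum and count per step. The class name `StepDurationAverager` and its methods are hypothetical. The sketch assumes the time for step n is the gap between a user's step n-1 and step n events (one reading of the formula in the question, so step 1 has no duration), that each user has a single upload session, and that duplicate deliveries can be ignored.

```python
from collections import defaultdict
from datetime import datetime


class StepDurationAverager:
    """Running average of seconds per step; tolerates out-of-order events.

    Hypothetical helper for illustration, not a framework API.
    """

    def __init__(self):
        # user_id -> {step_num: timestamp of that step's event}
        self.seen = defaultdict(dict)
        # step_num -> [total_seconds, sample_count]
        self.totals = defaultdict(lambda: [0.0, 0])

    def _record(self, step, start, end):
        bucket = self.totals[step]
        bucket[0] += (end - start).total_seconds()
        bucket[1] += 1

    def on_event(self, user_id: str, step_num: int, ts: datetime) -> None:
        steps = self.seen[user_id]
        if step_num in steps:
            return  # assumed duplicate delivery; ignore it
        steps[step_num] = ts
        # The new event may complete the pair (step_num - 1, step_num).
        if step_num - 1 in steps:
            self._record(step_num, steps[step_num - 1], ts)
        # If it arrived late, it may also complete (step_num, step_num + 1).
        if step_num + 1 in steps:
            self._record(step_num + 1, ts, steps[step_num + 1])

    def averages(self):
        """Return {step_num: average seconds} for steps with samples."""
        return {
            step: total / count
            for step, (total, count) in sorted(self.totals.items())
            if count
        }
```

Feeding the five sample events through `on_event` and calling `averages()` gives one sample each for steps 2 and 3, both from u1. Because only sums and counts are kept, the averages can be read at any time without replaying the stream. In a real deployment the per-user map would grow without bound, so it would likely need to be keyed state with some expiry in whatever stream processor is used; that part is left out here.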
