Position: SDE2\n\nYou are given a logfile with multiple requests in the following form:-\n`Timestamp Customer_ID Page_ID`\n\nAssume that the logfile is sorted on timestamp. Find a way of generating the most common 3 page sequence. A 3 page sequence is a sequence of pages such that a user visits these 3 pages one after each other.\n\neg.\n\n```\nT1 C1 P1\nT2 C2 P2\nT3 C1 P2\nT4 C2 P1\nT5 C1 P1\n```\n\nHere, the most frequent 3 page sequence is P1->P2->P1. C2 here is a corner-case because C2 only visits 2 pages, asked about this and was told to discard such sequences.\n\nI gave 2 solutions:\n* Straightforward hashmap: Hashmap<sequence, integer> in java, where sequence is a key generated by page visits. Eg. key(1->2->3) = 123 where 1,2,3 are page ids.\n* O(n) time: O(n) for separating out page visits according to customers, O(n) for going through each possible sequence and incrementing count in hashmap(O(n - k) actually), and finally O(n) in finding the maximum. Space would be O(n) in all scenarious. Here n is the total number of logfile entries, and k is the sequence length. \n\n=========================================================\n* Compressed Trie of page sequences: much more scalable, much lesser memory in a real-world scenario, and slightly more time to get the maximum. The Leaves store the count of each sequence. Can have variable page sequences(k), uses much lesser memory in real-world scenario because almost 90% of users visit home first, and then probably search/detail/cart/checkout. To get the most popular/max-occurring page sequence: can keep a running maximum count. Implemented HashSet every trie-node for faster child-node retrieval(it is not characters but pageids, so we cannot just keep a 26-sized array map). \n* Time Complexity analysis: Time taken to create trie: For \'n\' sequences with all unique nodes,O(n) to parse for all users, O(n), for every page id, we have to create a node, but for real-world scenarios, it will be much faster(because sequences will often be repeated), finally O(n) for calculating maximum because everytime we get a sequence, we compare against the maximum count encountered till that sequence. Here n is the number of entries in the logfile.\n* Memory Complexity: O(n) to create trie(again, much better in real-world scenario). \n* All of this is assuming that page sequence size is a constant.\n\nI implemented the 2nd one. Was rejected and the reason given after the interview was that I had issues in understanding Data Structures. The interviewers agreed with my preference for trie during the interview process, but the final result was given to me by the HR.\n\n\nEDIT: I have coded the second solution here : https://leetcode.com/playground/DdwGRi3p. This solution also sorts the given log entries on the basis of timestamp, but in the original question, it is assumed that the entries are already sorted by time.\n\nThere is a leetcode question similar to this where sequences might not necessarily be consecutive. https://leetcode.com/problems/analyze-user-website-visit-pattern/  \neg for P1 -> P2-> P3->P4, seqences for this question would be P1, P2, P3 and P2, P3, P4, but for the leetcode question, P1, P3, P4 is also a valid sequence.\n

Position: SDE2\n\nYou are given a logfile with multiple requests in the following form:-\nTimestamp Customer_ID Page_ID\n\nAssume that the logfile is sorted on