This is in continuation to my previous post - first round
I was only asked one question in this round.
Given a continous stream of strings, with each string containing a bunch of words seperated by space, print the first and the last unique occuring words for each word.
Strings consist of only lowercase letters seperated by spaces.
Eg. Stream : s1 - my name is xyz, s2 - my name is fff
output - [my, my],[my,name],[my,is],[my,xyz],[name,xyz],[is,xyz],[xyz, xyz], [xyz,fff]
I came up with two different approaches,
Approach - 1:
Using a List, a Map<String,Integer> and an integer first(which starts with 0 and which we will keep incrementing if the current entry in the list is not unique) the map keeps track of the duplicates and the list keeps track of the unique elements. Whenever we encounter a word which is not present in the map, I add an entry into the map - map.put(word,1), on the other hand if the word is already present in the map I will replace the map entry with -1.
There are a couple of scenarios to think about and I wrote the code for this logic.
After this, the interviewer asked me to optimize the approach, as we have to deal with billions of strings of data. So I explained another approach with a LinkedHashSet and a HashSet. The LHS will have the unique entries and the HS contains the duplicates. Whenever I see a word, I check if it's already present in the HS to see if it's already a duplicate. If not, I'll check if it's there in the LHS to check if the key has to be marked as a duplicate. If it's present in the LHS I'll remove it from the LHS and put into the HS.
Overall, I think I kinda got a little clumsy when I was explaining my approach as I wasn't sure if the interviewer understood my logic.