Status: 7+ YOE, BS Electrical Engineering
Position: Software Engineer in a Pharma Company
Location: Banglore, India
Date: September 13, 2020
Hi Guys,
I recently took Google Onsite. I thought it went very well however I was outrightly rejected. I need some help in figuring out what went wrong.
I have modified the questions to respect NDA however the essence is still the same.
Onsite Behaviour Round (45 minutes):
the usual stuff. It was a very good interactive round. I have answered these kinds of questions before in SalesForce and Netflix telephonic (I passed those rounds but in both the cases positions were closed). The only question were I think i messed up is , Tell me about a time where under pressure(time constraint) you took some decision. In this case the interviewer was not happy with the answer since he did not think it was under pressure related to time.
Onsite System Design (1 hour):
The question was design Auto Suggest
I started with Trie. The interviewer was happy with the approach. To have a context I said we could have a web crawler which will have all the pages plus we will have a reverse indexing of words -> pages so that whenever user types in some word it will get the pages. For auto correct I said we could use Fuzzy Search using levenshtein distance approach. elastic Search does that.
The interviewer said that is correct however lets keep it to Auto Suggest. I said first approach is to track the history of the user and suggest accordingly. I gave him personal example, i said I am preparing for interviews so whenever I type in back, google shows me backtracing first instead of backpack. He said that is correct. Also I said assume Leonardo Di Caprio is up for oscars and Oscars are approaching so any user typing in Le the first thing that I would show him is Leonardo Di Caprio and not Leonardo Da Vinci. He liked that example.
He asked me to write a basic Trie class, I wrote it. He stopped me in between saying that is fine. The we talked about Caching, we could have hashmap with some key words and a list of words depending on the rank. The rank would be decide dymanically based on world events(Oscars) or user location or user search history (unless he is searching in incognito mode). I said we could use Redis cluster for this since Redis provides sorted lists.
Then we went to size of the trie. I started with basic char level size in UTF-8 it is 1-3 bytes so assuming the longest length of the word is 20 chars, the size comes to about 60 bytes . He asked me what would I store in Trie I said all the words that user has searched he said do you really need all the words, I said what about a dictionary , he said yes that makes sense. Then he said how would you distribute it. I said we could group the chars and distribute it among different servers, however I said contninuing it would give rise to hotspots since a-h block will be used more then as compared to q-z. He said that is correct how would you resolve it , I said consistent hashing isone way, he said what is consistent hashing, I answered (I gave the essence of consisten hashing i:e anytime we add or remove a node there is no need to reconfigure keys for n-1 nodes plus in the hotspot areas we can keep adding nodes , the more nodes we add better since the work is ditributed) he was happy with it.
We later went to Kafka and Cassandra. He asked me how to control Kafka Servers , I said through Zookeeper. He asked me about CAP theorem and Cassandra gossip protocol. He asked me whether I have worked on these systems before, I have worked on these systems before. He seemed pleased with answers. Overall we had good conversation which progressed right from Application Layer to Database.
Onsite Coding Question 1 (45 minutes):
given a binary grid with 0 and 1 where 0 represents Water and 1 represents Land. Find the max contiguous water body.
For eg in the grid given below, the ans is 6. We have two contiguous water bodies one is of the size two and the other of size 6
1 1 0 0
0 0 1 1
0 0 0 0
My solution: do a dfs from the water bodies. Keep a max_count variable for the maximum count of the water body.
Keep a count of the water body found during dfs. Once the dfs is done do a max on max_count and count.
The interviewer was ok with the approach. I went ahead and coded it in 15 minutes. While coding I verified with him that we are going to go in four directions, up, down, right and left.
I did a dry run , found a bug in the code and corrected it.
The time complexity that I gave for this code is O(mn). The space complexity is O(mn) where m is no of rows and n is no of cols for the DFS recursion + the visited
The follow up question was find the coniguous Land and if the water bodies are landlocked, meaning if they are surrounded by land on all sides include those bodies in the count as well
for example grid below should give 12 -> contiguous land is 10 + 2 water bodies since they are land locked.
1 1 1 1
1 0 0 1
1 1 1 1
if they are not land locked do not include the water bodies.
After thinking for about five mins, I figured out if the water bodies touch boundary they are not landlocked.
I asked the interviewer about this, he mentioned excellent question and said I was right.
So going forward with the idea, maintain a variable is_land_locked = True
while doing dfs , if any water body is touching boundary, is_land_locked should be false and hence should not be included in the final calculation.
so now the logic is first do a dfs on water bodies, check if they are land_locked. also while doing dfs on water bodies , capture on adjacent land cell.
Once the water dfs is done, do a land dfs starting with the captured land cell.
I quickly reformatted my code to include this logic. I had 3-4 minutes left to do this change. The interviewer seemed happy with the approach.
Onsite Coding Question 2 (30 minutes):
Give a set of nodes with some weights , make sure each node receives packets based on its weight.
After some thinking, I came up with this simple example,
lets say there are three nodes, and the weight of the nodes is 10 20 and 70
i will generate a random number , in python I could do it using random.randrange(0,100)
if the number is between 1 - 10 send it to node 1,
if the number is between 10 - 30 send it to node 2
anything else, send it to node 3
he was happy with this idea and said it was the correct approach. However there was one glitch in the weights that he gave me
the weights are 1 2 and 6 so we were dealing with float points that too only between 0 and 1. I was not able to recall the exact api to generate random floating point numbers through random api in python
The interviewer at this stage said he would look it up for me and said random.random () is the ans. I coded it and he was kind of happy with it. he was more interested in the actual implementation.
This interviewer was 10 minutes late so we had Less than 30 minutes to discuss this problem.
Onsite Coding Question 3 (45 minutes):
given a tree with root node as parent(as in father or mother) and child nodes as children , given two people find if they are related or not.
The interviewer was vague. I had to ask him if this is directed or undirected graph. I figured out it has to be undirected graph. If a parent has children then the child is also connected to the parent
This is how I came up with input
[
[john, alice, alex, joe]
]
assume the above case, john is the father of alice, alex and joe
I would create following adjacency list
john -> [alice, alex,joe]
alice -> [john, alex,joe]
alex -> [john, alice, joe]
joe -> [john, alice, alex]
the approach would be use a simple BFS and traverse through the graph. the time complexity os O(E+V).
I coded it in 15 minutes. The interviewer was happy with the approach.
The follow up question was how would you optimize BFS. I said Bi-directional BFS. He said that is correct. This was just discussion no coding.
He then asked me how would you maintain the relationship and return it in output if two person are related along with True and False.
I started talking out loud and figured out Enum is one way to do it.
In the Node class we can have Children, Siblings, Parents Enum and then depending from which we direction we approach i:e whether person1-> person2 or person2->person1
the relationship would change. if we are approaching from Parent to Child, the relationship would be children if we are approaching from child to Parent it would be
Father. He said that is correct.
He then asked me how we would distribute it amongst several machines. I said we could distribute in a PostOrder way, get all the relationships for each node and then
push those relations upwards.
I think he was ok with it.