Microsoft | Onsite | Design the T9 predictive text algorithm and system

I was asked this question in the microsoft virtual interview discussion for Office Engg team.

In older days, the cellphones would not have alphabets. In that time the keypad digits from 1-9 would be used to generate alphabets. Later the T9 predictive text algorithm came. In that kind of algorithm, digit 2 would map to characters 'a' to 'c'. Similarly, digit 3 would map to character 'd' to 'f' and so on.

image

Say when you type "6263" then it displays the words "mane", "name", and "oboe".
How will you design such a algorithm?

He mentioned that how can we start predicting the letters from the first digit itself? Also what will be the complexity of the algorithm? How can we make it better.

I proposed the following:

  1. Use a trie that represents a lexicon. For each digit typed generate the possible alphabets. For each node in the tree, keep the possible options. Just return them. Complexity of generating all possible alphabets would be either O(3^n) or O(4^n), because for each digit selected by the user, we can have either 3 or 4 possible digits. Searching in the trie would be O(|w|) where |w| is the length of the prefix.

He asked if we can do any better?

  1. I said that we can also give an inverted hash table, where we will keep the 'key' as the 'prefix' and the values would be suggested words.
    Expanding more on this solution, the map would have keys as the following:
    a to z - 26 keys
    aa to zz - 26^2 keys
    aaa to zzz - 26^3 keys.

I mentioned that some of these prefixes would be invalid, so number of keys would be small. He was concerned about the space consumption of this algorithm.

  1. I also proposed doing a binary search in the sorted dictionary.

He was not satisfied with any of the solutions.. Any suggestions?

Comments (12)