Approach #1: Piecewise [Accepted]

Intuition

Let W = A[0].length. It is clear that we can determine in time, whether two words from A are similar.

One attempt is a standard kind of brute force: for each pair of words, let's draw an edge between these words if they are similar. We can do this in time. After, finding the connected components can be done in time naively (each node may have up to edges), (or with a union-find structure.) The total complexity is .

Another attempt is to enumerate all neighbors of a word. A word has up to neighbors, and if that neighbor is itself a given word, we know that word and neighbor are connected by an edge. In this way, we can build the graph in time, and again take or time to analyze the number of connected components.

One insight is that between these two approaches, we can choose which approach works better. If we have very few words, we want to use the first approach; if we have very short words, we want to use the second approach. We'll piecewise add these two approaches (with complexity and ), to create an approach with complexity.

Algorithm

We will build some underlying graph with N nodes, where nodes i and j are connected if and only if A[i] is similar to A[j], then look at the number of connected components.

There are a few challenges involved in this problem, but each challenge is relatively straightforward.

  • Use a helper function similar(word1, word2) that is true if and only if two given words are similar.

  • Enumerate all neighbors of a word, and discover when it is equal to a given word.

  • Use either a union-find structure or a depth-first search, to calculate the number of connected components of the underlying graph. We've showcased a union-find structure in this solution, with notes of a depth-first search in the comments.

For more details, see the implementations below.

Complexity Analysis

  • Time Complexity: , where is the length of A, and is the length of each given word.

  • Space Complexity: . If , the space complexity is . Otherwise, the space complexity is : for each of neighbors we store a word of length . (Plus, we store node indices ("buckets") which is dominated by the term.) Because in this case, we could also write the space complexity as .


Analysis written by: @awice.