Approach #1: Maintain Answer of Suffix [Accepted]

Intuition

We can think of substrings as two for-loops, for the left and right boundary of the substring. To get a handle on this problem, let's try to answer the question: what is the answer over all substrings that start at index i? Let's call this . If we can compute this faster than linear (brute force), we have an approach.

Now let be the unique letters function, eg. .

The key idea is we can write as a sum of disjoint functions over each character. Let be if occurs exactly once in , otherwise , and so on with every letter. Then , where is the alphabet.

Algorithm

This means we only need to answer the following question (26 times, one for each character): how many substrings have exactly one ? If we knew that S[10] = S[14] = S[20] = "A" (and only those indexes have an "A"), then when i = 8, the answer is 4 (j = 10, 11, 12, 13); when i = 12 the answer is 6 (j = 14, 15, 16, 17, 18, 19), and so on.

In total, , where index[c] are the indices i (in order) where S[i] == c (and padded with S.length if out of bounds). In the above example, index["A"] = [10, 14, 20].

Now, we want the final answer of . There is a two pointer approach: how does differ from ? If for example S[0] == "B", then most of the sum remains unchanged (specifically, ), and only the part changes, from to .

We can manage this in general by keeping track of peek[c], which tells us the correct index i = peek[c] such that our current contribution by character c of is index[c][peek[c] + 1] - index[c][peek[c]].

Complexity Analysis

  • Time Complexity: , where is the length of S.

  • Space Complexity: .


Approach #2: Split by Character [Accepted]

Intuition

As in Approach #1, we have , where is the alphabet, and we only need to answer the following question (26 times, one for each character): how many substrings have exactly one ?

Algorithm

Consider how many substrings have a specific . For example, let's say S only has three "A"'s, at 'S[10] = S[14] = S[20] = "A"; and we want to know the number of substrings that contain S[14]. The answer is that there are 4 choices for the left boundary of the substring (11, 12, 13, 14), and 6 choices for the right boundary (14, 15, 16, 17, 18, 19). So in total, there are 24 substrings that have S[14] as their unique "A".

Continuing our example, if we wanted to count the number of substrings that have S[10], this would be 10 * 4 - note that when there is no more "A" characters to the left of S[10], we have to count up to the left edge of the string.

We can add up all these possibilities to get our final answer.

Complexity Analysis

  • Time Complexity: , where is the length of S.

  • Space Complexity: . We could reduce this to if we do not store all the indices, but compute the answer on the fly.


Analysis written by: @awice. Approach #2 inspired by @lee215.