Longest Palindromic Substring Part II

November 20, 2011 in string

Given a string S, find the longest palindromic substring in S.

Note:
This is Part II of the article: Longest Palindromic Substring. Here, we describe an algorithm (Manacher’s algorithm) which finds the longest palindromic substring in linear time. Please read Part I for more background information.

In my previous post we discussed a total of four different methods, among them there’s a pretty simple algorithm with O(N2) run time and constant space complexity. Here, we discuss an algorithm that runs in O(N) time and O(N) space, also known as Manacher’s algorithm.

Hint:
Think how you would improve over the simpler O(N2) approach. Consider the worst case scenarios. The worst case scenarios are the inputs with multiple palindromes overlapping each other. For example, the inputs: “aaaaaaaaa” and “cabcbabcbabcba”. In fact, we could take advantage of the palindrome’s symmetric property and avoid some of the unnecessary computations.

An O(N) Solution (Manacher’s Algorithm):
First, we transform the input string, S, to another string T by inserting a special character ‘#’ in between letters. The reason for doing so will be immediately clear to you soon.

For example: S = “abaaba”, T = “#a#b#a#a#b#a#”.

To find the longest palindromic substring, we need to expand around each Ti such that Ti-d … Ti+d forms a palindrome. You should immediately see that d is the length of the palindrome itself centered at Ti.

We store intermediate result in an array P, where P[ i ] equals to the length of the palindrome centers at Ti. The longest palindromic substring would then be the maximum element in P.

Using the above example, we populate P as below (from left to right):

T = # a # b # a # a # b # a #
P = 0 1 0 3 0 1 6 1 0 3 0 1 0

Looking at P, we immediately see that the longest palindrome is “abaaba”, as indicated by P6 = 6.

Did you notice by inserting special characters (#) in between letters, both palindromes of odd and even lengths are handled graciously? (Please note: This is to demonstrate the idea more easily and is not necessarily needed to code the algorithm.)

Now, imagine that you draw an imaginary vertical line at the center of the palindrome “abaaba”. Did you notice the numbers in P are symmetric around this center? That’s not only it, try another palindrome “aba”, the numbers also reflect similar symmetric property. Is this a coincidence? The answer is yes and no. This is only true subjected to a condition, but anyway, we have great progress, since we can eliminate recomputing part of P[ i ]‘s.

Let us move on to a slightly more sophisticated example with more some overlapping palindromes, where S = “babcbabcbaccba”.


Above image shows T transformed from S = “babcbabcbaccba”. Assumed that you reached a state where table P is partially completed. The solid vertical line indicates the center (C) of the palindrome “abcbabcba”. The two dotted vertical line indicate its left (L) and right (R) edges respectively. You are at index i and its mirrored index around C is i’. How would you calculate P[ i ] efficiently?

Assume that we have arrived at index i = 13, and we need to calculate P[ 13 ] (indicated by the question mark ?). We first look at its mirrored index i’ around the palindrome’s center C, which is index i’ = 9.


The two green solid lines above indicate the covered region by the two palindromes centered at i and i’. We look at the mirrored index of i around C, which is index i’. P[ i' ] = P[ 9 ] = 1. It is clear that P[ i ] must also be 1, due to the symmetric property of a palindrome around its center.

As you can see above, it is very obvious that P[ i ] = P[ i' ] = 1, which must be true due to the symmetric property around a palindrome’s center. In fact, all three elements after C follow the symmetric property (that is, P[ 12 ] = P[ 10 ] = 0, P[ 13 ] = P[ 9 ] = 1, P[ 14 ] = P[ 8 ] = 0).


Now we are at index i = 15, and its mirrored index around C is i’ = 7. Is P[ 15 ] = P[ 7 ] = 7?

Now we are at index i = 15. What’s the value of P[ i ]? If we follow the symmetric property, the value of P[ i ] should be the same as P[ i' ] = 7. But this is wrong. If we expand around the center at T15, it forms the palindrome “a#b#c#b#a”, which is actually shorter than what is indicated by its symmetric counterpart. Why?


Colored lines are overlaid around the center at index i and i’. Solid green lines show the region that must match for both sides due to symmetric property around C. Solid red lines show the region that might not match for both sides. Dotted green lines show the region that crosses over the center.

It is clear that the two substrings in the region indicated by the two solid green lines must match exactly. Areas across the center (indicated by dotted green lines) must also be symmetric. Notice carefully that P[ i ' ] is 7 and it expands all the way across the left edge (L) of the palindrome (indicated by the solid red lines), which does not fall under the symmetric property of the palindrome anymore. All we know is P[ i ] ≥ 5, and to find the real value of P[ i ] we have to do character matching by expanding past the right edge (R). In this case, since P[ 21 ] ≠ P[ 1 ], we conclude that P[ i ] = 5.

Let’s summarize the key part of this algorithm as below:

if P[ i' ] ≤ R – i,
then P[ i ] ← P[ i' ]
else P[ i ] ≥ P[ i' ]. (Which we have to expand past the right edge (R) to find P[ i ].

See how elegant it is? If you are able to grasp the above summary fully, you already obtained the essence of this algorithm, which is also the hardest part.

The final part is to determine when should we move the position of C together with R to the right, which is easy:

If the palindrome centered at i does expand past R, we update C to i, (the center of this new palindrome), and extend R to the new palindrome’s right edge.

In each step, there are two possibilities. If P[ i ] ≤ R – i, we set P[ i ] to P[ i' ] which takes exactly one step. Otherwise we attempt to change the palindrome’s center to i by expanding it starting at the right edge, R. Extending R (the inner while loop) takes at most a total of N steps, and positioning and testing each centers take a total of N steps too. Therefore, this algorithm guarantees to finish in at most 2*N steps, giving a linear time solution.

Note:
This algorithm is definitely non-trivial and you won’t be expected to come up with such algorithm during an interview setting. However, I do hope that you enjoy reading this article and hopefully it helps you in understanding this interesting algorithm. You deserve a pat if you have gone this far! :)

Further Thoughts:

  • In fact, there exists a sixth solution to this problem — Using suffix trees. However, it is not as efficient as this one (run time O(N log N) and more overhead for building suffix trees) and is more complicated to implement. If you are interested, read Wikipedia’s article about Longest Palindromic Substring.
  • What if you are required to find the longest palindromic subsequence? (Do you know the difference between substring and subsequence?)

Useful Links:
» Manacher’s Algorithm O(N) 时间求字符串的最长回文子串 (Best explanation if you can read Chinese)
» A simple linear time algorithm for finding longest palindrome sub-string
» Finding Palindromes
» Finding the Longest Palindromic Substring in Linear Time
» Wikipedia: Longest Palindromic Substring

VN:F [1.9.22_1171]
Rating: 4.9/5 (7 votes cast)
Longest Palindromic Substring Part II, 4.9 out of 5 based on 7 ratings

57 responses to Longest Palindromic Substring Part II

  1. i think my solution is simpler than this one, the key point of my solution is:
    The center point of palindromic substring is always follow this pattern, either is “…..XyX…..” or “….XX….”.

    so you can scan once and then find those center point of palindromic substring and then expand it on each center points to find the one with maxium length.

    i ve already posted my java solution in the comments of Longest Palindromic Substring Part I

    VA:F [1.9.22_1171]
    0
    • Yes your solution is simpler but runs in O(N^2) worst case. It is already discussed in my previous post.

      VN:F [1.9.22_1171]
      0
      • i agreed, it is O(N^2) worst case, thanks.

        VA:F [1.9.22_1171]
        0
        • No problem.
          Basically this algorithm is an improvement over your method. It is using the symmetric property of a palindrome to eliminate some of the recomputations of palindrome’s length, and amazingly improve it to a linear time solution.

          VN:F [1.9.22_1171]
          0
    • Excellent post, I learnt a lot

      Thanks!

      VA:F [1.9.22_1171]
      0
  2. Looks like this is O(N^2) algorithm as there is a while loop in for loop. Could you please clarify?

    VA:F [1.9.22_1171]
    0
    • Even with the extra while loop inside, it is guaranteed in the worst case the algorithm completes in 2*n steps.

      Think of how the i and right edge (R) relates. In the loop each time, you look if this index is a candidate to re-position the palindrome’s center. If it is, you increment the existing R one at a time. See? R could only be incremented at most N steps. Once you incremented a total of N steps, it couldn’t be incremented any more. It’s not like you will increment R all the time in the while loop. This is called amortized O(1).

      VN:F [1.9.22_1171]
      0
      • How is it amortised O(1) ? I am confused from the definition from amortise.

        Amortised to me means , for a worst case of runs of some operations its amortised cost over all of them

        VA:F [1.9.22_1171]
        0
  3. Got it. Thanks for the clarification. Great solution

    VA:F [1.9.22_1171]
    0
  4. Great write-up. Thanks for the article

    VN:F [1.9.22_1171]
    0
    • Thanks!
      My goal of writing this article is to provide an intuitive way to understand the algorithm. I hope you really appreciate the beauty of this algorithm.

      VN:F [1.9.22_1171]
      0
  5. while explaining how to fill P[i] you mentioned

    if P[ i' ] ≤ R – i,
    then P[ i ] ← P[ i' ]
    else P[ i ] ≥ P[ i' ]. (Which we have to expand past the right edge (R) to find P[ i ].

    is the else statement right?? shouldnt be “else P[ i' ] ≥ P[ i ]“

    VA:F [1.9.22_1171]
    +1
  6. It should rather be:
    else P[ i ] ≥ (R-i) (Which we have to expand past the right edge (R) to find P[ i ])
    .. Also note how coherent the reasoning in the bracket sounds now.

    Explanation :
    When P[i'] > R-i then all we know, by symmetry about C, is :
    P[i'] > R-i .. by obviousness
    and
    P[i] ≥ R-i .. by the meaning of R
    From this we clearly cannot conclude upon max(P[i'], P[i])

    VA:F [1.9.22_1171]
    0
  7. Using suffix tree can do this in O(n). And building suffix tree can be also done in O(n): http://blog.csdn.net/g9yuayon/article/details/2574781

    But this algorithm is pretty cool too!

    VA:F [1.9.22_1171]
    0
  8. Clear explanation. It could hardly be any better :)

    Thumbs up for elucidating this magic O(n) solution in such intuitive manner. You got talent to clearly expressing an algorithm, which I even missed in books like Cormen’s Algorithm!

    VA:F [1.9.22_1171]
    0
    • Thanks! Good to know I’ve done my job — to introduce tricky but interesting algorithms in an intuitive manner.

      VN:F [1.9.22_1171]
      0
  9. Really beautiful algorithm.

    VN:F [1.9.22_1171]
    0
  10. Wonderful writeup with great illustrations! I think there is one minor bug in your code: if s itself is a palindrome, then the following line accesses the array out of bounds.

    VA:F [1.9.22_1171]
    0
    • ahh… you are right! Thanks for your sharp observation.

      I thought that I feel something is not right when I decided to add ‘$’ both to the begin and the end of the input string. (It should be adding two different sentinels ‘^’ and ‘$’ to the begin and the end of the string. This should avoid bounds checking and the out of bounds problem)

      VN:F [1.9.22_1171]
      0
  11. Thanks for such a lucid explanation. I had already visited all the references that you had suggested at the end. I was not able to understand the essence of it until I read yours. :)

    Well, I have one question.
    Is it possible to run the algorithm without using the ‘#’,'^’,'$’ symbols?

    VA:F [1.9.22_1171]
    0
    • As an after-thought, I have one doubt.
      Why are we using the line
      P[i] = (R > i) ? min(R-i, P[i_mirror]) : 0;

      Can you please clarify this?

      VN:F [1.9.22_1171]
      0
  12. there are another two solutions
    1) suffix array version preprocess requires N*logN query is O(N) http://www.mashengjun.info/?p=901
    2) another O(N) solution http://www.mashengjun.info/?p=464 using up&down pointer

    VN:F [1.9.22_1171]
    0
  13. The first i_mirror is -1 which is less than 0 so there may be run time error?

    VN:F [1.9.22_1171]
    0
  14. It’s really O(N)?

    VA:F [1.9.22_1171]
    0
  15. A Very nice explanation!!!!:)
    One thing I could not understand… “In this case, since P[ 21 ] ≠ P[ 1 ], we conclude that P[ i ] = 5.”.. In this statement, from what I have understood, I think it should be P[21] ≠ P[8]. Please correct me if I’m wrong…..

    VA:F [1.9.22_1171]
    0
  16. The conclusion of the algorithm above states that
    if P[ i' ] ≤ R – i,
    then P[ i ] ← P[ i' ]
    else P[ i ] ≥ P[ i' ]. (Which we have to expand past the right edge (R) to find P[ i ].

    The first check should be P[ i' ] < R – i, as when they are equal the proper value of p[i] can not be fully determined with P[ i' ] only but needs to expand.
    for example: string #b#b#a#b#a#b#a# with i = 9, c = 7, R = 12

    VA:F [1.9.22_1171]
    0
  17. J said on May 5, 2012

    It is clear that the two substrings in the region indicated by the two solid green lines must match exactly. Areas across the center (indicated by dotted green lines) must also be symmetric.

    i is in green area (index 15). So I should have the same value as i[7]. but it doesn’t.

    So what is going on?

    VA:F [1.9.22_1171]
    0
    • J said on May 5, 2012

      The first two lines are quotes from the written explanation.

      VA:F [1.9.22_1171]
      0
  18. Is there a problem in the following code? i_mirror will be -1 and P[i_mirror] will be out of boundary when C=0 and i =1, or do I miss something? Thanks.

    int C = 0, R = 0;
    for (int i = 1; i i) ? min(R-i, P[i_mirror]) : 0;

    }

    VA:F [1.9.22_1171]
    0
  19. Here is the piece of code where I am confused:
    int C = 0, R = 0;
    for (int i = 1; i i) ? min(R-i, P[i_mirror]) : 0;

    VA:F [1.9.22_1171]
    0
  20. Should P[i] R – i ??

    VA:F [1.9.22_1171]
    0
  21. Have to disagree with the others. You have actually managed to complicate a simple algorithm!

    The key idea is quite simple actually.

    VA:F [1.9.22_1171]
    0
  22. Recurrence:

    L[i] = max{L[i-1] + 1, #Arr[i-1 - L[i-1]…i] is univalue.
    L[i-1]+2 #Arr[i] == Arr[i-1-L[i-1]]}
    L[i]=1 otherwise.

    Am i missing something here? this recurrrence solves it and it is a lot simpler.

    VA:F [1.9.22_1171]
    0
    • My bad :) my recurrence messes up the overlapping palindromes case, awesome solution and nice explanation. you deserve a pat on your back :) thank you

      VA:F [1.9.22_1171]
      0
  23. The following is the implementation of the Manacher’s algorithm without pre-processing the input string. It is a bit clumsy – sorry for that. I tested it using the online judge here and it seems to be working fine.

    Language : java

    VA:F [1.9.22_1171]
    0
    • Sorry,there seems to be some problem – some part of the code seems to be omitted when I post my code using the

      tag. So, I am posting it without the code tag.

      public String longestPalindrome(String s) {

      if(s==null)
      return “”;

      int len=s.length();
      int[] p=new int[2*len-1];
      p[0]=1;
      int R=0,C=0;
      int curLen,l,r,start;
      for(int i=1;ii) ? Math.min(curLen,p[iMirror]) : ( i%2==0 ? 1 : 0) );
      if(i%2==0){
      l=(i/2-p[i]/2-1);
      r=(i/2+p[i]/2+1);
      }
      else{
      l=(i/2-p[i]/2);
      r=(i/2+p[i]/2+1);
      }

      while(l>=0 && rR){
      C=i;
      R=r-1;
      }

      }

      int maxIndex=getMaxIndex(p);
      if(maxIndex%2==0)
      start=(maxIndex/2-p[maxIndex]/2);
      else
      start=(maxIndex/2-p[maxIndex]/2 +1);

      return s.substring(start,start+p[maxIndex]);

      }

      int getMaxIndex(int p[]){
      int len=p.length;
      int maxIndex=0;
      for(int i=1;ip[maxIndex])
      maxIndex=i;
      }
      return maxIndex;
      }

      VA:F [1.9.22_1171]
      0
  24. nice explaination thanks for article..
    :P

    VA:F [1.9.22_1171]
    0
  25. I am stuck at summarized part of this algo.
    if P[ i' ] ≤ R – i,
    then P[ i ] ← P[ i' ]
    else P[ i ] ≥ P[ i' ]. (Which we have to expand past the right edge (R) to find P[ i ].)
    how could we say which one be large for else part.?
    even in example for i=15,p[i]=5,i’=7 p[i']=7;
    i m confused here. plz help me.
    Thanks..

    VA:F [1.9.22_1171]
    0
  26. try simple solution.
    bool is palindrome(string s)
    { int len=s.length(), a=0, b=len-1;
    while(a<b)
    {
    if(s[a]!=s[b])
    return false;
    }
    return true;
    }

    VN:F [1.9.22_1171]
    -1
  27. try simple solution.
    bool is palindrome(string s)
    { int len=s.length(), a=0, b=len-1;
    while(a<b)
    {
    if(s[++a]!=s[--b])
    return false;
    }
    return true;
    }

    VN:F [1.9.22_1171]
    0
  28. Dude your solution is O(n^2) take a case of a string aaaaaa, palindrome length is 6 and solution for this string confirms O(n^2)

    VA:F [1.9.22_1171]
    0
    • I don’t think so.

      Take a look at the explanation, it is guaranteed that the operation needed will not exceed O(N). Maybe you should try to simulate the algorithm with that input

      VN:F [1.9.22_1171]
      0
  29. Thanks so much, exelent tutorial!

    VN:F [1.9.22_1171]
    0
  30. really nice one..helpedme to learn a lot.. :)

    VA:F [1.9.22_1171]
    0
  31. GOOD WORK :)

    VN:F [1.9.22_1171]
    0
  32. Wonderful post. Thanks for posting these kind of problems and their solutions which will help in interviews. I have a question in this part-II solution.

    I think the complexity will be still O(n * n), since you are traversing the string twice actually. One for modifying the original input string to insert characters ^,#,$ and then again you will do another traversal from the beginning to end of the string to search for actual palindromes in the string.

    VN:F [1.9.22_1171]
    0

Leave a reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.