## Implement strstr() to Find a Substring in a String

October 14, 2010 in string

Write C code to implement the strstr (Search for a substring) function. Do not use any system library such as strlen.

The strstr function locates a specified substring within a source string. Strstr returns a pointer to the first occurrence of the substring in the source. If the substring is not found, strstr returns a null pointer.

As expected, this question is very popular among technical interviews. This question tests your ability in manipulating strings using pointers, as well as your coding ability.

Of course, you can demonstrate to your interviewer that this problem can be solved using known efficient algorithms such as Rabin-Karp algorithm, KMP algorithm, and the Boyer-Moore algorithm. Since these algorithms are usually studied in advanced algorithms class, for an interview it is sufficient to solve it using the most direct method — The brute force method.

For non-C programmers who are not familiar with C-style strings, your first reaction is to obtain the length of the string. But remember, use of system library is not allowed. If you are not familiar with pointer manipulation, it’s time to brush up your pointer skills.

The brute force method is straightforward to implement. We scan the substring (the target) with the string from its first position and start matching all subsequent letters one by one. If one of the letter doesn’t match, we start over again with the next position in the string. Assume that N = length of string, M = length of substring, then the complexity is O(N*M).

Think you’ve got it all in your head? Try writing the code down on a piece of paper. Remember to test for special cases.

Did you handle the special case correctly? That is, when the target substring is an empty string. What should you return in this case? The correct implementation of strstr should return the full string in this case.

The above solution is usually good enough for an interview. But it turns out we are able to improve the code a little further. Note that the above code iterates at most N times in the outer loop.

The improvement is based on one observation: We only need to iterate at most N-M+1 times in the outer loop. Why? Because after looping more than N-M+1 times, the string has less than M characters to be matched with the substring. In this case, we know no more substring will be found in the string, and therefore saving us from additional comparisons (which might be substantial especially when M is large compared to N.)

How do we loop for only N-M+1 times? We are able to calculate the size of the string and substring by iterating a total of N+M times. In fact, finding just the length of the substring, M is sufficient. We use an additional pointer, p1Adv and advance it M-1 times ahead of the string pointer. Therefore, when p1Adv points to ‘\0′, we know that it has iterated N-M+1 times.

VN:F [1.9.22_1171]
Implement strstr() to Find a Substring in a String, 4.4 out of 5 based on 36 ratings

### 32 responses to Implement strstr() to Find a Substring in a String

if (!*p1)
return NULL;

after

if (!*p2)
return p1Begin;

VA:F [1.9.22_1171]
+1
2. in your first answer, will there be an infinite loop in while (*p1)?

VA:F [1.9.22_1171]
0
3. No, because this line:
p1 = p1Begin + 1;

is advancing p1 by one step forward in each iteration.

Eventually p1 *p1 will be equal to '\0', which will exit the loop.

VA:F [1.9.22_1171]
+1
4. Hi, 1337:

Good optimization on the algorithm.

After seeing your first solution, I was also thinking that there is room for improvement, based on the same thought about the difference of the two strings. But it is easier thought of than done, you know, esp. when it involves considerable pointer’s manipulation.

By the way, I have a confusion here. Why shall the function return the source string’s beginning location when the target string is of 0-length?

VA:F [1.9.22_1171]
0
5. O(n) solution where n is the size of str.

char* StrStr(const char *str, const char *target) {
if (*target == NULL)
return (char*)str;
char *p1 = (char*) str;
char *p2 = (char*) target;
char *p1Begin = p1;
while (*p1) {
if (*p1 == *p2) {
if (p2 == (char*)target) {
p1Begin = p1;
}
p2++;
} else {
p2 = (char*) target;
}
if (!*p2)
return p1Begin;
p1++;
}
return NULL;
}

VA:F [1.9.22_1171]
-1
6. nvm, this doesn’t work for some special cases.

VA:F [1.9.22_1171]
+1
7. You dont need to check whether p1 reaches the end of the string. Your solution assumes that the string is longer than the substring, and you use p1Adv to specify the bound. So checking *p1 is redundant.

VA:F [1.9.22_1171]
0
8. I think the improved code has a bug. Like floor #7 says, if size of “str” is less than size of “target”, the code will not work. It only works under the assumption that “str” has a larger size than “target”.

VA:F [1.9.22_1171]
0
9. I have a solution of O(n), very simple code, tried works:

const char *mystrstr(const char* str, const char* substr)
{
if(!substr || !str) return NULL;

const char *p_str = str, *s_str = substr, *matchHead = NULL;

while(*p_str != ” && *s_str!= ”)
{
if(*p_str == *s_str)
{
if(s_str == substr) matchHead = p_str;
p_str++;
s_str++;
}
else
{
p_str++;
s_str = substr;
}
}

if(*p_str == ” && *s_str !=”)
return NULL;
else
}

VA:F [1.9.22_1171]
-2
• You are right. My code is similar to yours.

char * my_strstr(char *src , char *target){
if(src == NULL && target == NULL)
return NULL;
else{
char *temp_target = target;
int found = 0;
int firstMatchedChar = 1;
char *substrptr = NULL;

while(*src !=”){
if(*src == *temp_target){
if(firstMatchedChar){
substrptr = src;
printf(“1st Match is at : %c\n”,*substrptr);
firstMatchedChar = 0;
}
temp_target++;
if(*temp_target == ”){
found = 1;
printf(“Breaking . String found..\n”);
break;
}
}
else{
printf(“No match at %c . Resetting pointers\n”, *src);
temp_target = target;
firstMatchedChar = 1;
substrptr = NULL;
}
src++;
}
if(found){
printf(“Match found at %s\n”,substrptr);
return substrptr;
}
else{
printf(“No match found\n”);
return NULL;
}
}
}

VA:F [1.9.22_1171]
0
10. char * strstr (const char * str, const char* substr){
const char * p1= str;
const char * p2= substr;
const char * pos = str;
if (!*p1 && !*p2) return true;
while (p1&&p2){
if (*p1==*p2) p1++; p2++;
else {
pos++;p1=pos;p2=substr;
}
}
if (p1 || !*p1&&!*p2) return true;
return false;
}

VA:F [1.9.22_1171]
0
11. This is the most optimized code

char* strstr(char* haystack, char* needle) {
for (;; ++haystack) {
char* h = haystack;
for (char* n = needle;; ++n, ++h) {
if (!*n) return haystack;
if (*h != *n) break;
}
if (!*h) return NULL;
}
}

VA:F [1.9.22_1171]
0
12. I might be misunderstood something, but what happened in the second version when

is longer than

?

VA:F [1.9.22_1171]
0
13. A small improvement. When you are comparing each character of the two strings via:
while (*p1 && *p2 && *p1 == *p2) {
p1++;
p2++;
}
you can record the next character that equals *target. Then if this comparison doesn’t give a match, you can start directly from the next location instead of doing
p1 = p1Begin + 1

VN:F [1.9.22_1171]
0

VA:F [1.9.22_1171]
0
15. I like your code. and I’ve seen the other post…but those are complicated and some of them are wrong for this purpose…

VA:F [1.9.22_1171]
-1
16. There might be 2 small issues for the improved version.

1. str and target are the same string.
2. target is longer than str.

the fixes will be as below:
while (*p2++)
{
}

VA:F [1.9.22_1171]
+2

VA:F [1.9.22_1171]
0
18. can nebody plz help..i have a log sheet of say 2000 lines.i have to compare a substring with each line.So,first i have to take each line as a string.Then compare the substring with each string.Suppose i got 50 such strings which contain the substring.Now i need only those strings which have the substring at location 5.

VA:F [1.9.22_1171]
0
19. char *string = “heror”;
char *pattern = “herehero”;

This case it will not work.

VA:F [1.9.22_1171]
0
20. i have write the below code in C language please tell me is this code correct or not or what modification required please send me your opinion on gmohit5877@gmail.com

#include
#include
void main()
{
int i=0,j;
char *st,*sub;
clrscr();
printf(“Enter the main String \n”);
scanf(“%s”,st);
printf(“\n enter the substring : \n”);
scanf(“%s”,sub);

while(st[i]!=”)
{
j=0;
while(sub[j]!=”)
{
if(st[i+j]==sub[j])
j++;
else
break;
}
if(sub[j]==”)
break;
i++;
}
if(sub[j]==”)
printf(“the substring is at position %d “,i);
getch();
}

thank you

VA:F [1.9.22_1171]
0
21. How complexity is o(m*n)?

VA:F [1.9.22_1171]
-1
22. VA:F [1.9.22_1171]
0
23. simplest way to write a strstr();
char *mystrstr(char *main,char *sub)
{
char *p,*q;/*to store the base address*/
p =main;
q = sub;
while (*main){
while (*sub){
if (*main == *sub){
main++; sub++;
}
else
break;
}
if (*sub == ”)
return p;

main++;
p = main;/*reassigning the current value*/
sub = q;/*setting the string again to first position*/
}
}

VA:F [1.9.22_1171]
0
• I suspect that Amith KUmar forgot a return NULL near the bottom.

I suspect that Amith Kumar forgot that if the needle is the empty string
then the main string or haystack should be returned.

Here is a slightly more correct impementation to Amith’s ‘simplest way’.

Please forgive me for swapping the use of the parameter
variables and local variables from the way that Amith originally

With much code movement and optimization, you can
end up with a very similar and smaller algorithm I have posted below.

VA:F [1.9.22_1171]
0
24. VA:F [1.9.22_1171]
0
• I suppose I could do it without the ‘else’

VA:F [1.9.22_1171]
0
• Recursion for Recursion’s sake

VA:F [1.9.22_1171]
0
• Darn. Got it wrong. This time for sure.
char * strstr(const char * s1, const char *s2)
{
if (!*s2) return (char *)s1;
if (!*s1) return NULL;
if ((*s1 == *s2) &&
(strstr(s1+1, s2+1) == *s1 + 1)) return (char *)s1;
return strstr(s1+1, s2);
}

VA:F [1.9.22_1171]
0