(Not logged on) | Register | Log On

You can subscribe to this discussion group using an RSS feed reader. techInterview Discussion

Answers to technical interview questions. A part of TechInterview.org

A part of techInterview.org: answers to technical interview questions.

Your host: Michael Pryor

Really confused with substring matching problems!!

I was just preparing myself for an internship and was just going through various problems listed here. WRT to the various substring matching problems, I am really confused. Are there any other substring matching algorithm rather than the suffix tree method? In this post http://discuss.techinterview.org/default.asp?interview.11.212980.20 Schultz has mentioned about some dynamic programming method but the link doesnt seem to be working!!

Hope someone can put some light into this with the some resources.

Thnx in advance
Harsh Agarwal Send private email
Wednesday, July 21, 2010
 
 
Schultz's algorithm was wrong anyway. I believe the proper dynamic program is described in that thread.
d Send private email
Wednesday, July 21, 2010
 
 
I haven't read the Schultz thread fully, but it seems that the DP algorithm described there is O(A*B*C).  Assuming wlog A<=B<=C, you can get O(A(B+C))=O(A*C) as follows (but, of course, the suffix tree approach is linear time).

First, let's start with the standard 2-string DP algo.  (I believe that is actually what your question was asking.)  Let a and b be the strings, and A and B be their lengths, respectively.  You simply fill out a 2-D table L[1..A, 1..B] whose semantics are:
 L[i,j] is the length of the longest common suffix of a a[1..i] and b[1..j].
You compute the table based on the following formula:
  if a[i] != b[j] then L[i,j] = 0;
  else L[i,j] = 1 + L[i-1, j-1].
(And define L[i,0] = L[0,j] = 0 for all i and j.)
The answer is then given by the pair (i,j) for which L[i, j] is maximum (and the actual substring is a[i-L[i,j]+1 .. i] or, equivalently,  b[j-L[i,j]+1 .. j]).

Now to deal with three strings a, b, c, do the above for a and b.  Then use L to compute the following array M[1..A]:
M[i] is the length of the longest suffix of a[1..i] that is also a substring of b.  The formula for M[i] is simply
  M[i]= max_j {L[i,j]}.

Finally, compute T[1..A, 1..C]:
T[i,j] is length of the longest common suffix of a[1..i] and c[1..j] that is also a substring of b.  The formula to use is
  if a[i] != c[j] then T[i,j] = 0;
  else T[i,j] = min{ M[i], 1+T[i-1, j-1] }.
A.F.
Thursday, July 22, 2010
 
 
Here is a working solution for Longest Common substring problem:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void print_lcs(char b[][20], char x[], int i, int j)
{
    if (i==0 || j==0) return;

    if (b[i][j] == 'c') {
        print_lcs(b, x, i-1, j-1);
        printf(" %c", x[i-1]);
    } else if (b[i][j] = 'u') {
        print_lcs(b,x,i-1, j);
    } else {
        print_lcs(b,x,i, j-1);
    }
}

void lcs_length(char x[], char y[])
{
    int m, n, i, j, c[20][20];
    char b[20][20];
    int z = -1;
    int start = 0;
    char *result;
    int x1, x2;

    m = strlen(x);
    n = strlen(y);

    for(i=0; i <=m; i++) c[i][0] = 0;
    for(i=0; i <=n; i++) c[0][i] = 0;

    for(i=1; i <=m; i++) {
        for(j=1; j <= n; j++) {
            if(x[i-1] == y[j-1]) {
                c[i][j] = c[i-1][j-1] + 1;
                //b[i][j] = 'c';
                if (c[i][j] > z) {
                    z = c[i][j];
                    x1 = i;
                    x2 = j;
                    printf("Z before is %d\n", z);
                    start = i-z;
                    printf("X[i-1] is %c; Y[j-1] is %c\n", x[i-1], y[j-1]);
                    printf("Z is %d; I is %d; J is %d;x is %s; start is %d; x[start] is %s\n", z, i, j, x, start, x+start);
                    printf("Start char: %c\n", x[start]);
                }
            } else {
                c[i][j] = 0;
            }
            /*
            else if (c[i-1][j] >= c[i][j-1]) {
                c[i][j] = 0;
                b[i][j] = 'u';
            } else {
                c[i][j] = 0;
                b[i][j] = 'l';
            }
            */
        }
    }

    printf("Before\n");
    if (z != -1) {
        result = (char *) calloc(z, sizeof(char));
        printf("Start is %d; X is %s\n", start, x);
        printf("Temporary %s\n", x+x1-z);
        strncpy(result, x + start, z);
        printf("Result is %s\n", result);
        free(result);
    }

    printf("\n");
}

int main() {
    int i, j;
    char x[20], y[20];

    scanf("%s %s", x, y);

    lcs_length(x, y);
    lcs_length(y, x);
    return 0;
}
Roger Hill Send private email
Tuesday, August 03, 2010
 
 
Powered by FogBugz