techInterview Discussion 

A part of techInterview.org: answers to technical interview questions.
Your host: Michael Pryor 
Call a string /zeroone balanced/ if it has the same number of zeros and ones.
Examples of zeroone balanced strings: "", "01", "10", "0011", "0101", "0110", "1001", "1010", "1100", ... Examples of strings that are not zeroone balanced: "0", "1", "00", "11", "000", "001", "010", "011", "100", "101", "110", "111", ... A string y is a substring of a string w if there exist strings x and z such that w == x + y + z (where + denotes string concatenation). The problem is to find the length of the longest zeroone balanced substring of the input. Examples: "01" => 2 "001" => 2 ("01") "0011" => 4 "10011" => 4 ("1001" or "0011") "11111" => 0 ("")
Have I misinterpreted the problem?
As per OP: find the length of longest consecutive substring of 0 and 1 such that number of 0s and 1s are equal. Didn't OP wanted to ask following? example: "10010111" return 2 as length of longest consecutive substring is 2 where 0's and 1's are equal. @d Given your example and explanation, the problem seems different. I guess in my above mentioned example, as d says, we need to find length of "01" which is also 2(coincidence), if yes, then ya I have misinterpreted the problem. Am I correct d? will be back soon.
Thank you very much d!
From the problem definition you posted, I could think of a O(n) algorithm. Sweep through the input maintaining the zeroone balance information. Balance is defined as follows: balance = 0 (initialization) If we find a '1' in the input increment by 1, else decrement by 1(we found a zero). balance = 0; max_length = 0; for(int i=0; i<input(length); i++) { if (input[i] == 1) balance = balance + 1; else balance = balance  1; if(balance ==0) { max_length++; } } max_length has the answer.
If d says it is hard, it must be really hard.
Think again people.
Viladge Idi@ Monday, December 28, 2009
@Imran
Let's go through your algorithm for "10000111", as per d's explanation answer should be 6, right? if (input[i] == 1) balance = balance + 1; else balance = balance  1; if(balance ==0) { max_length++; } a[0]=1 balance = 1 max_length = 0 a[1]=0 balance = 0 max_length = 1 a[2]=0 balance = 1 max_length = 1 a[3]=0 balance = 2 max_length = 1 a[4]=0 balance = 3 max_length = 1 a[5]=1 balance = 2 max_length = 1 a[6]=1 balance = 1 max_length = 1 a[7]=1 balance = 0 max_length = 2 your algorithm returns 2 or 4, if you multiply the result by 2 considering balancing. Shouldn't the answer be 3 or 6. Please correct me, if I have assumed anything wrong here. Thanks
sweep the string twice forward and backward, for example:
111010011110 this string has 8 1s and 4 0s, it means the max substring <= 4 it also means there are 4 more 1 than 0 if you can find a consecutive substring in which 1 is four more than 0 from left or from right or both side, then you'll get the max substring e.g. 111010011110 > 111 010011 110 "111" + "110" has 5 1s and 1 0s, then "010011" is the max substring now the question became easer, and it can be solved in O(n) time, but i'm afraid O(1) extra space is not enough for the worst case :)
Rajendra,
You are right! I should have multiplied the length by 2 at the end of the algorithm. Thanks for pointing it out. d! Can you explain why do you think it is impossible/hard??
Hello, everyone. I think I know the answer to this algorithm.
This problem is similar to another problem. Find a maximum subarray in a given array whose sum is 0. When given a 01 series, we can view 0 as 1. My algorithm works as follows, Suppose the array is A: 0 0 1 1 0 1 0 1 Create another array B, each element of B contains two parts, idx and sum. idx is the index in the array i and sum is the sum of A[1..n]. Then do a sorting on B.sum. The last part is scan array B find two element there sum is the same and difference on idx is largest. The whole time complexity is o(nlgn) and space is o(n)
You can actually do it in O(n) time using O(n) bits (which would normally count as O(n/log n) space).
A.F. Saturday, January 02, 2010
I don't have time for a fully detailed description, so I'll give a somewhat sketchy one.
Let us call a substring with equal numbers of 0s and 1s "balanced". Let's assume array indexing starts at 1. Define the following function: f(i) = [number of 1s in a[1..i] ]  [number of 0s in a[1..i] ]. For convenience, also define f(0)=0. Then a substring a[i..j] is balanced iff f(i1)=f(j). A substring starting at position i is of one of three types: Type 1: f(i1) = f(n). Type 2: f(i1) < f(n). Type 3: f(i1) > f(n). The basic idea is to find the maximum length balanced substring of each type and take the maximum among them. Finding the maximumlength substring of Type 1 is straightforward in O(n) time and O(1) space (i.e., O(log n) bits). Types 2 and 3 are symmetrical, so I'll only describe the algorithm for Type 2. The key notion is "suffix minimum". A number 1<=i<=n is said to be a "suffix minimum" if f(i)<f(k) for all i1<=k<=n. (Yes, k ranges from i1 to n, not from i; "i1" is not a typo.) Note that if i is a suffix minimum then i<n and f(i)<f(n). Also note that if i1 and i2 are suffix minima such that i1<i2, then f(i1)<f(i2). Consider i>=1 such that f(i1)<f(n). Let m be maximum such that m>i, m is a suffix minimum, and f(m)<=f(i1). Assuming m exists, the longest balanced substring starting at position i is the one that ends at the unique position j>=m such that f(j)=f(i1). (If you think about it for a moment you will see that by the definition of local minimum and the fact that f(i1)<f(n) there is indeed a unique such j.) Moreover, f() strictly increases in the interval m..j. What if m does not exist? Then (because f(i1)<f(n)) there is no nonempty balanced substring starting at i. To the algorithm. The first order of business is a preprocessing step to compute two things. 1. The value f(n). This takes linear time and O(1) space (i.e., O(log n) bits). 2. An array suf_min[1..n] of bits, such that suf_min[i]=1 iff i is a suffix minimum. This can be done in linear time using O(1) additional space in a single pass over a[] from a[n] down to a[1]. Next we wish to find the minimum i such that there exists a balanced substring of Type 2 starting at i. We also need to find the corresponding m and j for this i, and compute f(i), f(j), and f(m). (Of course, If no such i exists we are done.) This is not difficult to do in O(n) time and O(1) extra space. In general, we will only be incrementing i, j, and m, so updating the corresponding f() values can be done in O(1) per increment. I will therefore assume that the f() values are known whenever needed. We now iterate. Suppose we have found i, m, and j such that f(i1)<f(n), m>i, m is the last suffix minimum such that f(m)<=f(i1), j>=m, and f(j)=f(i1). We proceed to the next i such that f(i1)<f(n). It is easy to see that if the new f(i1) is not greater than the previous one, then it cannot be the starting point of a balanced substring longer than the one found previously. So in this case we just go on to the next candidate. Otherwise, the new m is either the old m, or appears after it. If it is the same as the old m, we just need to search forward with j. Otherwise we find the correct new m with a forward search, and then the new j with a forward search from the new m. I've glossed over some details (in particular what to do when the new m does not exist), but it should be clear that the whole thing can be done in linear time using O(1) extra space. So in total we use an nbit array + a constant number of variables (i.e., additional O(log n) bits) and solve the problem in linear time.
A.F. Sunday, January 03, 2010
Little mistake. In the definition of suffix minimum, k ranges from i1 to n, excluding i, of course.
A.F. Sunday, January 03, 2010
Hi A.F,
the following: "An array suf_min[1..n] of bits, such that suf_min[i]=1 iff i is a suffix minimum. This can be done in linear time using O(1) additional space in a single pass over a[] from a[n] down to a[1]. " Can you please explain how? Sorry if this is obvious. Thanks.
Confused Sunday, January 03, 2010
// C++ code
// Array indexing still starts at 1 (so the input is in A[1]..A[n] // and the output is in suf_min[1]..suf_min[n]). for(int i=n1, prevF=0, minF=0; i >= 0; i){ int currF = prevF  (A[i+1] ? 1 : 1); if(currF > prevF && prevF < minF){ suf_min[i+1] = 1; minF = currF; } else suf_min[i+1] = 0; prevF = currF; } Note that in computing the values of f() backwards it doesn't matter if we start with the true value f(n) or any arbitrarily chosen value (such as 0, which I used in the code above). The suffix minima occur in the same places.
A.F. Monday, January 04, 2010
I was able to get the memory consumption down to O(sqrt(n/log n)) (i.e., sqrt(n log n) bits) but it certainly doesn't look like the "right" answer.
The idea is this. The reason my previous algorithm needed n bits was that it had to compute the suffix minima from right to left, but use them in the lefttoright direction. The point is, however, that it is not necessary to remember all of them explicitly. Instead, we can compute them on demand, if we remember enough precomputed information to make this possible. We partition A[] into blocks of length sqrt(n log n). For each block we compute and store the minimum value that f() attains in the suffix of the array beginning right after the end of the block. This can be done in one forward pass (to compute f(n)), followed by a single backward pass. Since there are n/sqrt(n log n) = sqrt(n / log n) blocks, that is also the space we need to keep these values. Following this preprocessing step, we proceed with the algorithm as before, except that every so often we need to compute a batch of suffix minima to be used next. Specifically, each time the search for m enters a new block, we perform the following minipreprocessing step, which finds and stores the suffix minima occurring in that block. We first compute the value of f() at the block end by a forward scan of the block. Then, using this value and the minimum value of f() in the suffix of A[] beginning right after the end of the block (which the algorithm's preprocessing stage has computed and stored), we find all the suffix minima occurring in the block by a backward scan of the block and store them in a sqrt(n log n) long array of bits. We use this bit array to locate the suffix minima needed while the search proceeds within the block. We reuse this same bit array over and over for all of the blocks, so the total memory consumption is O(sqrt(n log n)), as promised. The time complexity is of course O(n).
A.F. Tuesday, January 05, 2010
That trick sort of looks familiar: http://rjlipton.wordpress.com/2009/11/22/newstreamingalgorithmsforoldproblems/
10000111
xno. of 1's = 0 (Initialized) yno. of 0's = 0 (Initialized) max = 0 (var maintains max balance) scan each digit 1 ( x =1 , y= 0, max =0) 0 => x=1, y=1 0 0 0 => x=1, y=4 Since the next digit is 1 again (starting digit) do the following max = Max( max, Min(x,y)) = Max ( 0, Min(1,4)) = 1 Reset value of x. x=0 1 => x=1 1 1 => x=3 Since this is End of seq/ or start of 0 now max = Max(max, Min(x,y)) = Max(1,Min(4,3)) = 3 Now the ans is 3*2 = 6. Done in linear time with o(1) space. For subsequent changing from 0 to 1, reset x and from 1 to 0, reset y. So max holds the answer.
I think of a way in O(n) with O(n) space
1. For each index i, save the sum(1)sum(0) to another array B. Sum subarray 0 : i. 2. Go through B, get each distinct value's min and max index 3. The biggest max  min is the result. Sample: Input: 1 0 0 0 0 1 1 1 0 0 1 0 1 B: 1 0 1 2 3 2 1 0 1 2 1 2 1 1: Min = Max = 0 0: Min = 1; Max = 7 1: Min = 2; Max = 12 2: Min = 3; Max = 11 3: Min = Max = 4 The result is 10 : 0 0 1 1 1 0 0 1 0 1 
Powered by FogBugz