## The Painter’s Partition Problem Part II

April 6, 2011 in binary search

**Note:**

This is Part II of the article: The Painter’s Partition Problem. Please read Part I for more background information.

You have to paint N boards of length {A

_{0}, A_{1}, A_{2}… A_{N-1}}. There areKpainters available and you are also given how much time a painter takes to paint 1 unit of board. You have to get this job done as soon as possible under the constraints that any painter will only paint continuous sections of board, say board {2, 3, 4} or only board {1} or nothing but not board {2, 4, 5}.

In my previous post we talked about how to use the power of Dynamic Programming (DP) to solve a fairly difficult problem. Although the DP approach is pretty neat, it uses an extra O(*kN*) space. In this post, we introduce an algorithm without extra space and runs in the order of O(*N* log *c*) time, where *c* is the sum of all elements in A.

**Hint:**

Try to think how to apply Binary Search indirectly to solve this problem. First, it does not require A to be sorted in any way. Second, if you sort A, the constraint that the painter must paint continuous sections of board does not necessarily hold anymore. Think carefully: What if you have a maximum board length in mind such that no painter can exceed this value?

**Solution:**

Assume that you are assigning continuous section of board to each painter such that its total length must not exceed a predefined maximum, cost_{max}. Then, you are able to find the number of painters that is required, *x*. Following are some key observations:

- The lowest possible value for cost
_{max}must be the maximum element in A (name this as**lo**). - The highest possible value for cost
_{max}must be the entire sum of A, (name this as**hi**). - As cost
_{max}increases,*x*decreases. The opposite also holds true.

Now, the question translates directly into:

_{max}while satisfying the condition

*x*=

*k*? The search space will be the range of [

**lo**,

**hi**].

Let us study using the example below to understand how this works:

Assume that A = { 100, 200, 300, 400, 500, 600, 700, 800, 900 }, and k = 3.

Since *k* is positive, we know that the highest possible cost_{max }must be the sum of A, 4500. (ie, assigning all boards to one painter).

The lowest possible cost_{max} must be the largest element in A, 900. This requires a total of six painters — The first painter paints {100, 200, 300}, second painter paints {400, 500} while the rest of them paints one board each).

Below is a simple conceptual illustration of how the search space looks like, with its corresponding *x* value (the required number of painters) pointing to cost_{max}.

900 ... 4500 ↑ ↑ x=6 x=1

Note that *x* decreases while cost_{max} increases.

We want to find the minimum of cost_{max} under the constraint of *x* = *k*.

900 ... 2700 ... 4500 ↑ ↑ ↑ x=6 x=2 x=1

We choose the middle element, 2700, and find its corresponding *x*, which is 2.

Since *x* = *k* (which equals to 3), we can find a better minimum (ie, the real solution) in the lower half.

Therefore, we discard the upper half and continue searching in the lower half.

900 ... 1800 ... 2700 ↑ ↑ ↑ x=6 x=3 x=2

The middle element now is 1800 and its corresponding *x* is 3, which is still = *k*.

We discard the upper half again and continue searching in the lower half.

900 ... 1350 ... 1800 ↑ ↑ ↑ x=6 x=5 x=3

This time, the middle element is 1350 and its corresponding *x* is 5, which is > *k*.

We have to discard the lower half **including the middle element itself**, since *x* > *k* in that region (ie, no valid solution).

Here, read the phrase in bold **“****including the middle element itself****“** again, as this is extremely important to maintain the invariant. Why?

Yes, that means the lower half **including **the value 1350 would be discarded.

We continue searching in the upper half (ie, in the range of [1351, 1800]).

…

After multiple successions of halving the search space, the final answer is 1700, and its corresponding *x* is 3. This is also the minimum of cost_{max} while maintaining the requirement *x* = *k*.

The complexity of this algorithm is *O*(*N* log (** ∑ **A_{i} )), which is quite efficient. Furthermore, it does not require any extra space, unlike the DP solution which requires *O*(*kN*) space. This solution is also easier to code. Just be very careful while writing code for any variation of binary search, and think through all the corner cases while you code.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 | int getMax(int A[], int n) { int max = INT_MIN; for (int i = 0; i < n; i++) { if (A[i] > max) max = A[i]; } return max; } int getSum(int A[], int n) { int total = 0; for (int i = 0; i < n; i++) total += A[i]; return total; } int getRequiredPainters(int A[], int n, int maxLengthPerPainter) { int total = 0, numPainters = 1; for (int i = 0; i < n; i++) { total += A[i]; if (total > maxLengthPerPainter) { total = A[i]; numPainters++; } } return numPainters; } int partition(int A[], int n, int k) { int lo = getMax(A, n); int hi = getSum(A, n); while (lo < hi) { int mid = lo + (hi-lo)/2; int requiredPainters = getRequiredPainters(A, n, mid); if (requiredPainters <= k) hi = mid; else lo = mid+1; } return lo; } |

**Additional Reference:**

TopCoder’s Tutorial on Binary Search

**Further Thoughts:**

The above binary search solution has a run time complexity of O(N log ( ∑ Ai )). What impact would the term ? Ai (ie, the search space) have over the run time itself? What if the search space is extremely large?

As you may have known, binary search is extremely fast. For example, binary search in a range of one million elements requires no more than 20 iterations. Here, we explore an alternative solution which reduces the search space even further. (Thanks to an id grass for the idea)

The idea is to first build a cumulative sum array B, where B[j] = ∑{i=0..j} Ai. We know that we must allocate one of the amount found in B[j] to the first painter. Since the elements in B is in increasing order, we can apply binary search in B to answer the following question:

Finally, we can set lo = B[j] and hi = B[j+1], and apply the same binary search from method 1 from above.

Using the above example:

A = { 100, 200, 300, 400, 500, 600, 700, 800, 900 }, and k = 3.

We build the cumulative array B:

B = { 100, 300, 600, 1000, 1500, 2100, 2800, 3600, 4500 }

After we apply binary search, we are able to find that B[4] = 1500, B[5] = 2100, because for costmax = 1500, x = 4; while costmax = 2100, x =3.

Time required: O(N log N)

Then, we set lo = 1500 and hi = 2100, then apply binary search to find in that range.

Time required: O(N log (B[j+1]-B[j]) ) = O(N log A[j+1]).

Therefore, the worst case run time complexity = O(N log N + N log max(A) )

The Painter's Partition Problem Part II,
speeddy said on April 7, 2011

Hi, I still can not quite get this part:

We have to discard the lower half including the middle element itself, since x > k in that region (ie, no valid solution).

Here, read the phrase in bold “including the middle element itself“ again, as this is extremely important to maintain the invariant. Why?

Yes, that means the lower half including the value 1350 would be discarded.

We continue searching in the upper half (ie, in the range of [1351, 1800]).

Why we must use the 1351. Why it is so important to add the 1 here?

+11337c0d3r said on April 7, 2011

This is because costmax = 1350 is not a valid solution (due to x > k) and thus must be discarded.

If you don’t discard it, the invariant of binary search is no longer maintained.

You can try this on an array with just two elements. If you chose not to discard the element in the border of lower half (when x > k), then you will be stuck in an infinite loop (ie, the search space stays at two elements).

0speeddy said on April 7, 2011

Thanks very much. Let me try.

0speeddy said on April 8, 2011

Hi, so why lo = mid +1 ;

But hi doesn’t need hi = mid-1?

900 … 1800 … 2700

↑ ↑ ↑

x=6 x=3 x=2

The middle element now is 1800 and its corresponding x is 3, which is still ≤ k.

We discard the upper half again and continue searching in the lower half.

0maxq said on July 23, 2011

I think he tries to maintain the invariant such that

the value will only exist in the specified range, if

it exists.

It can be made that in each iteration, lo or hi is

set to mid instead of mid+1, but it will need a

different loop invariant and a different loop

ending criteria.

0lipingwu said on August 3, 2011

Invariant of binary search here, I think, meas two parts: one part is that the target value would be in the searching range always, and another part is that the searching range will decrease in each loop so that the termination can be reached. Then mid+1 aims to satisfy the second part while meet the first part. For the meaning of invariant, you can refer to following post: http://reprog.wordpress.com/2010/04/25/writing-correct-code-part-1-invariants-binary-search-part-4a/

0klds said on May 12, 2011

What is an id grass?

0maxq said on July 22, 2011

An ID from mitbbs, I think.

0Haitao said on May 18, 2011

How about this part of the problem “you are also given how much time a painter takes to paint 1 unit of board”, which implies that each painter may paint the same section of boards with different time? In other words, painter j (0<=j<k) can paint at Tj hours / unit. I think the solution here and the DP solution in part I only solve the special case of the original problem where all Tj's are equal.

When Tj's are different, the binary search may still apply, but now

. lo = max(Ai) * min(Tj)

. hi = sum(Ai) * max(Tj)

However finding x is much more difficult. I think we can safely sort Tj's and use the smallest ones, and reduce finding x to a different sub problem. I haven't thought through and get the complexity though.

Adapting the DP solutioin as it is to accont for differnt Tj's may be more difficult though.

What do you think? Did I miss something?

+2lipingwu said on August 3, 2011

The binary search would be fine, I think. Just we need to consider the order of painters when get the required painters. I still don’t quite understand how it will work by using the smallest ones. For the DP, I think the sub problems would be more complex, and We can change the sub problem into the optimal solution for a set of k different painters instead of just k painters. Time complexity may be big, though.

0bluesun said on May 24, 2011

Good post! I enjoyed reading it.

You may need to be careful with grass’s modified low and high limits for binary search. If you are going to go that route, then there is a small problem with this function: getRequiredPainters(int A[], int n, int maxLengthPerPainter). If maxLengthPerPainter is less than max(a[]), it still returns a positive number, while in fact, it shall return an error instead. In the case of the algorithm your code is listed above, this function shall work because the low limit is set to the max(a[]). So you wouldn’t fall into the error scenario. However, if you use grass’s method, sum(a[0]….a[i]) can be less than the max(a[]). That way, you would end up with a pair of wrong low and high limits.

For example, if a[] = { 10, 60, 200, 210, 220 }, and k=4.

when maxLengthPerPainter = a[0] = 10, calling getRequiredPainters will return 5

when maxLengthPerPainter = a[0] + a[1] = 10 + 60 = 70, calling getRequiredPainters will return 4.

So here lo=10 and hi = 70, and the final answer will be 70. While the correct answer shall be 220, which is the max(a[]).

There are two ways to resolve this issue. One is to modify getRequiredPainters to let it return -1 when maxLengthPerPainter is less than max(a[]) and do some error check inside the calling function; The other is to set lo to max(a[]), and get hi with two restraints: getRequiredPainters=lo. Then you can continue with the binary search to get the correct answer.

It seems to me that such reducing of the limits would not give too much improvement to the binary search method. It however, can avoid the scenario that the sum of a[] may overflow.

+1lipingwu said on August 3, 2011

I had the same concern, and following is my code for it:

int getK(int arr[], int n, int max){

int k = 0;

int sum = 0;

for(int i = 0; i max){

k ++;

if(sum == arr[i]) return INT_MAX;//There is only one element in the partition, which is greater than the max cost.

sum = arr[i];

}

}

return k;

}

Returning INT_MAX can imply the invalid partitions, since INT_MAX always is used as an infinite integer in c++, and the additional detection is not needed.

I agree that this way doesn’t have too much improvement in running time, and avoiding the overflow of the sum of a[] is a good point.:)

0Yunsong Wu said on May 29, 2014

I agree with @bluesun that grass’ approach can be wrong. Another example is

The cumulative array B

The number of painters in total is 5.

Firstly, we check “3″ in B, and the number of painters needed is 3, which is smaller than 5, than we decrease our index – to decrease our maxLengthPerPainter and to increase the number of painters needed. Secondly, we check 2, and the number of painters needed is 3. Again, we decrease the index. Finally, we check 1, and the number of painters needed is 5. We cannot decrease the index any more. Then, the return value will be 1, which is definitely wrong, as the correct value is 100.

The error is caused because maxLengthPerPainter is smaller than the maximum value in A.

And, assuming grass’ approach is correct, the worst case run time complexity is worse than or the same as the previous approach. Why? Because the complexity of grass’ approach is O(NlogN + Nlogmax(A)) = O(Nlog(N*max(A))), and N*max(A) >= log(sum(A)), thus O(Nlog(N*max(A))) is worse than or the same as O(Nlog(sum(A))) – the time complexity of previous approach.

0Yunsong Wu said on May 29, 2014

A typo in the last paragraph: N*max(A) >= sum(A) (no “log” here)

0sunder said on June 8, 2011

binary search in B and then binary serach in B[j]—B[j+1] cannot reduce the complexity, since it is binary search

log(size of array B) + log(B[j+1] – B[j]) = log((size of array B) x (B[j+1]- B[j])) = log(B[n] – B[1])

+1lipingwu said on August 3, 2011

I agree about this. Can any body else have a double check?

0surender said on July 27, 2011

HI,

very nice explaination,

for this solution, what if i take (sum of N)/K as average(Avg) of any painter. linearly adding each N[i] until my absolyte difference of Avg and {N[0..i]} is decreasing. As soon as it moves from decreasing to increasing phase, i’ll stop it and assign those boards to one of the kth painter, and it continues till kth painter.

will this give me feasible solution??

surender

0lipingwu said on August 3, 2011

Surrender,

I think we can’t get feasible solution always. For example, assume the array is {1, 2, 1, 2, 2} and k = 3. Then the Avg would be 2. So your solution will partition the array into {1}, {2}, {1, 2, 2}, which gets the result of 5. However, the best solution is {1,2}, {1, 2} and {2}, which gets the result of 3.

Liping

0lipingwu said on August 3, 2011

Hi,

Although it is obvious that the optimal solution can be got by this way. But should we still provide proof for it? For example, is it possible that the interviewer will ask us to prove it? Given a max cost, we feed as many continuous blocks as possible into a partition. and then we minimize the max cost. How can we prove this can get the optimal solution?

0Abhishek said on November 22, 2011

If we use array B[] as suggested , worst case time run time for getRequiredPainters will be O(NlogN) and the while loop which calls getRequiredPainters will be executed at the most O(log maxA) then worst case run time for whole algo should be O(log maxA) * O(NlogN). Please suggest.

0Topcoder99 said on December 26, 2011

Hi

Your solution provides an answer to the average value. I believe the exact question is to divide the array into partitions and the output should be

Partition 1: a, b, c

Partition 2: d, e

………………..

Partition k: x,y

Something like above

Using 1700 in the above example how does the parttions can be arrived

0Priyanka said on March 24, 2012

Let A = max_i {A_i}, and B = \sum_i {A_i}.

Note that B is at most nA.

So we can simply do bisect the interval [A, B], which gives O(n logn) time algorithm.

(The number of iterations needed is log(B/A) <= logn.)

This is a standard trick in this kind of algorithm to drop the dependencies on the input size.

0michael said on October 21, 2012

!!!!!!!!!!!!! THIS IS A K STEPS SHORTEST PATH PROBLEM !!!!!!!!!!!!!!!!

Correct me if I am wrong.

Here is the DAG:

B0, A0, B1, A1, …. , A(N-1), BN

V =

E = {(Bi, Bj) where 0<=i<j<=N}, and length(Bi, Bj) = sqr((Ai+…+A(j-1)))

The problem is to find a K step shortest path from B0 to BN.

0michael said on October 21, 2012

The nodes set is xml-ed.

V={B0, B1, B2, … BN}

0subu said on June 30, 2013

this seems to assume that all painters paint in the same speed. The solution breaks if apainter is > twice as fast as the painter adjacent to them.

I think DP would work for that case….

Having said that, this is an elegant solution and has numerous applications in resource or job allocations for smp machines.

0sshlyk said on July 5, 2013

I think author should clarify his assumptions (like all painter have the same speed)

Nevertheless, I like the approach and would like to apply it in this way.

First, you can sort painter by productivity. Most productive painter is P1, less productive is Pk

Next assume that you have time constrain. It is possible to check if all boards can be painted by group of painters within this time period.

This can be done by computing ideal maximum board length each painter can handle within given period. Next, for each painter, starting from the most productive one, find longest continuous subarray with maximum cumulative board length the painter can handle (can be done in O(n) for each painter).

If you have unassigned boards left, that means it can not be done.

Once you have this step, you can do binary search on time-to-complete. Basically assume maximum time painters can spend working is one year. Then, check if they complete the job in 365 days, if yes, check 180 if no, check 270 and so on.

The complexity would be O(k*n*log(365)) where k is number of painters and n is number of boards.

I am just throwing an idea and not sure if it works.

0William Li said on August 4, 2014

It is not that simple. Take the array of boards {100, 200, 300, 400, 500, 600, 700, 800, 900, 1000} and the painters who paint at rates of {1, 2, 6} seconds per unit of board. Consider the problem: Is it possible to find a distribution of work such that it is possible to finish in under 3700 units of time. By your algorithm, the first painter paints {100, 200, 300, 400, 500, 600, 700, 800}, the second painter paints {900} and the third painter paints {}. 1000 units of board are yet unpainted.

However, using a different distribution of work, such as painter 1 painting {700, 800, 900, 1000}, painter 2 painting {400, 500, 600}, painter 3 painting {100, 200, 300} we can paint the entire fence.

0stillsea said on April 4, 2014

The complexity of this algorithm should be O(Nlog(c/max(A))), because the binary search has a initial lower bound of max(A) and upper bound of c. Another way to think about it is, if we replace array A with {1, 2, 3, 4, 5, 6, 7, 8, 9}, the computation cost is exactly the same with the original case. So the complexity cannot be O(Nlog(c)).

0