The Painter’s Partition Problem Part II

April 6, 2011 in binary search

Note:
This is Part II of the article: The Painter’s Partition Problem. Please read Part I for more background information.

You have to paint N boards of length {A0, A1, A2 … AN-1}. There are K painters available and you are also given how much time a painter takes to paint 1 unit of board. You have to get this job done as soon as possible under the constraints that any painter will only paint continuous sections of board, say board {2, 3, 4} or only board {1} or nothing but not board {2, 4, 5}.

In my previous post we talked about how to use the power of Dynamic Programming (DP) to solve a fairly difficult problem. Although the DP approach is pretty neat, it uses an extra O(kN) space. In this post, we introduce an algorithm without extra space and runs in the order of O(N log c) time, where c is the sum of all elements in A.

Hint:
Try to think how to apply Binary Search indirectly to solve this problem. First, it does not require A to be sorted in any way. Second, if you sort A, the constraint that the painter must paint continuous sections of board does not necessarily hold anymore. Think carefully: What if you have a maximum board length in mind such that no painter can exceed this value?

Solution:
Assume that you are assigning continuous section of board to each painter such that its total length must not exceed a predefined maximum, costmax. Then, you are able to find the number of painters that is required, x. Following are some key observations:

  1. The lowest possible value for costmax must be the maximum element in A (name this as lo).
  2. The highest possible value for costmax must be the entire sum of A, (name this as hi).
  3. As costmax increases, x decreases. The opposite also holds true.

Now, the question translates directly into:

How do we use binary search to find the minimum of costmax while satisfying the condition x = k? The search space will be the range of [lo, hi].

Let us study using the example below to understand how this works:

Assume that A = { 100, 200, 300, 400, 500, 600, 700, 800, 900 }, 
and k = 3.

Since k is positive, we know that the highest possible costmax must be the sum of A, 4500. (ie, assigning all boards to one painter).

The lowest possible costmax must be the largest element in A, 900. This requires a total of six painters — The first painter paints {100, 200, 300}, second painter paints {400, 500} while the rest of them paints one board each).

Below is a simple conceptual illustration of how the search space looks like, with its corresponding x value (the required number of painters) pointing to costmax.

900        ...       4500
 ↑                     ↑
x=6                   x=1

Note that x decreases while costmax increases.
We want to find the minimum of costmax under the constraint of x = k.

900  ...  2700  ...  4500
 ↑          ↑          ↑
x=6        x=2        x=1

We choose the middle element, 2700, and find its corresponding x, which is 2.
Since x = k (which equals to 3), we can find a better minimum (ie, the real solution) in the lower half.
Therefore, we discard the upper half and continue searching in the lower half.

900  ...  1800  ...  2700
 ↑          ↑          ↑
x=6        x=3        x=2

The middle element now is 1800 and its corresponding x is 3, which is still = k.
We discard the upper half again and continue searching in the lower half.

900  ...  1350  ...  1800
 ↑          ↑          ↑
x=6        x=5        x=3

This time, the middle element is 1350 and its corresponding x is 5, which is > k.
We have to discard the lower half including the middle element itself, since x > k in that region (ie, no valid solution).
Here, read the phrase in bold including the middle element itself again, as this is extremely important to maintain the invariant. Why?
Yes, that means the lower half including the value 1350 would be discarded.
We continue searching in the upper half (ie, in the range of [1351, 1800]).

After multiple successions of halving the search space, the final answer is 1700, and its corresponding x is 3. This is also the minimum of costmax while maintaining the requirement x = k.

The complexity of this algorithm is O(N log (Ai )), which is quite efficient. Furthermore, it does not require any extra space, unlike the DP solution which requires O(kN) space. This solution is also easier to code. Just be very careful while writing code for any variation of binary search, and think through all the corner cases while you code.

Additional Reference:
TopCoder’s Tutorial on Binary Search

Further Thoughts:
The above binary search solution has a run time complexity of O(N log ( ∑ Ai )). What impact would the term ? Ai (ie, the search space) have over the run time itself? What if the search space is extremely large?

As you may have known, binary search is extremely fast. For example, binary search in a range of one million elements requires no more than 20 iterations. Here, we explore an alternative solution which reduces the search space even further. (Thanks to an id grass for the idea)

The idea is to first build a cumulative sum array B, where B[j] = ∑{i=0..j} Ai. We know that we must allocate one of the amount found in B[j] to the first painter. Since the elements in B is in increasing order, we can apply binary search in B to answer the following question:

Find the index j such that B[j] fails to satisfy the condition while B[j+1] satisfy.

Finally, we can set lo = B[j] and hi = B[j+1], and apply the same binary search from method 1 from above.

Using the above example:

A = { 100, 200, 300, 400, 500, 600, 700, 800, 900 }, and k = 3.

We build the cumulative array B:

B = { 100, 300, 600, 1000, 1500, 2100, 2800, 3600, 4500 }

After we apply binary search, we are able to find that B[4] = 1500, B[5] = 2100, because for costmax = 1500, x = 4; while costmax = 2100, x =3.

Time required: O(N log N)

Then, we set lo = 1500 and hi = 2100, then apply binary search to find in that range.

Time required: O(N log (B[j+1]-B[j]) ) = O(N log A[j+1]).

Therefore, the worst case run time complexity = O(N log N + N log max(A) )

VN:F [1.9.22_1171]
Rating: 4.7/5 (20 votes cast)
The Painter's Partition Problem Part II, 4.7 out of 5 based on 20 ratings

29 responses to The Painter’s Partition Problem Part II

  1. Hi, I still can not quite get this part:

    We have to discard the lower half including the middle element itself, since x > k in that region (ie, no valid solution).
    Here, read the phrase in bold “including the middle element itself“ again, as this is extremely important to maintain the invariant. Why?
    Yes, that means the lower half including the value 1350 would be discarded.
    We continue searching in the upper half (ie, in the range of [1351, 1800]).

    Why we must use the 1351. Why it is so important to add the 1 here?

    VA:F [1.9.22_1171]
    +1
    • This is because costmax = 1350 is not a valid solution (due to x > k) and thus must be discarded.
      If you don’t discard it, the invariant of binary search is no longer maintained.
      You can try this on an array with just two elements. If you chose not to discard the element in the border of lower half (when x > k), then you will be stuck in an infinite loop (ie, the search space stays at two elements).

      VN:F [1.9.22_1171]
      0
      • Thanks very much. Let me try.

        VA:F [1.9.22_1171]
        0
      • Hi, so why lo = mid +1 ;

        But hi doesn’t need hi = mid-1?

        900 … 1800 … 2700
        ↑ ↑ ↑
        x=6 x=3 x=2

        The middle element now is 1800 and its corresponding x is 3, which is still ≤ k.
        We discard the upper half again and continue searching in the lower half.

        VA:F [1.9.22_1171]
        0
        • I think he tries to maintain the invariant such that
          the value will only exist in the specified range, if
          it exists.

          It can be made that in each iteration, lo or hi is
          set to mid instead of mid+1, but it will need a
          different loop invariant and a different loop
          ending criteria.

          VA:F [1.9.22_1171]
          0
    • Invariant of binary search here, I think, meas two parts: one part is that the target value would be in the searching range always, and another part is that the searching range will decrease in each loop so that the termination can be reached. Then mid+1 aims to satisfy the second part while meet the first part. For the meaning of invariant, you can refer to following post: http://reprog.wordpress.com/2010/04/25/writing-correct-code-part-1-invariants-binary-search-part-4a/

      VA:F [1.9.22_1171]
      0
  2. What is an id grass?

    VA:F [1.9.22_1171]
    0
  3. How about this part of the problem “you are also given how much time a painter takes to paint 1 unit of board”, which implies that each painter may paint the same section of boards with different time? In other words, painter j (0<=j<k) can paint at Tj hours / unit. I think the solution here and the DP solution in part I only solve the special case of the original problem where all Tj's are equal.

    When Tj's are different, the binary search may still apply, but now
    . lo = max(Ai) * min(Tj)
    . hi = sum(Ai) * max(Tj)
    However finding x is much more difficult. I think we can safely sort Tj's and use the smallest ones, and reduce finding x to a different sub problem. I haven't thought through and get the complexity though.

    Adapting the DP solutioin as it is to accont for differnt Tj's may be more difficult though.

    What do you think? Did I miss something?

    VA:F [1.9.22_1171]
    +2
    • The binary search would be fine, I think. Just we need to consider the order of painters when get the required painters. I still don’t quite understand how it will work by using the smallest ones. For the DP, I think the sub problems would be more complex, and We can change the sub problem into the optimal solution for a set of k different painters instead of just k painters. Time complexity may be big, though.

      VA:F [1.9.22_1171]
      0
  4. Good post! I enjoyed reading it.

    You may need to be careful with grass’s modified low and high limits for binary search. If you are going to go that route, then there is a small problem with this function: getRequiredPainters(int A[], int n, int maxLengthPerPainter). If maxLengthPerPainter is less than max(a[]), it still returns a positive number, while in fact, it shall return an error instead. In the case of the algorithm your code is listed above, this function shall work because the low limit is set to the max(a[]). So you wouldn’t fall into the error scenario. However, if you use grass’s method, sum(a[0]….a[i]) can be less than the max(a[]). That way, you would end up with a pair of wrong low and high limits.

    For example, if a[] = { 10, 60, 200, 210, 220 }, and k=4.
    when maxLengthPerPainter = a[0] = 10, calling getRequiredPainters will return 5
    when maxLengthPerPainter = a[0] + a[1] = 10 + 60 = 70, calling getRequiredPainters will return 4.
    So here lo=10 and hi = 70, and the final answer will be 70. While the correct answer shall be 220, which is the max(a[]).

    There are two ways to resolve this issue. One is to modify getRequiredPainters to let it return -1 when maxLengthPerPainter is less than max(a[]) and do some error check inside the calling function; The other is to set lo to max(a[]), and get hi with two restraints: getRequiredPainters=lo. Then you can continue with the binary search to get the correct answer.

    It seems to me that such reducing of the limits would not give too much improvement to the binary search method. It however, can avoid the scenario that the sum of a[] may overflow.

    VA:F [1.9.22_1171]
    +1
    • I had the same concern, and following is my code for it:

      int getK(int arr[], int n, int max){
      int k = 0;
      int sum = 0;
      for(int i = 0; i max){
      k ++;
      if(sum == arr[i]) return INT_MAX;//There is only one element in the partition, which is greater than the max cost.
      sum = arr[i];
      }
      }

      return k;
      }

      Returning INT_MAX can imply the invalid partitions, since INT_MAX always is used as an infinite integer in c++, and the additional detection is not needed.

      I agree that this way doesn’t have too much improvement in running time, and avoiding the overflow of the sum of a[] is a good point.:)

      VA:F [1.9.22_1171]
      0
    • I agree with @bluesun that grass’ approach can be wrong. Another example is

      The cumulative array B

      The number of painters in total is 5.

      Firstly, we check “3″ in B, and the number of painters needed is 3, which is smaller than 5, than we decrease our index – to decrease our maxLengthPerPainter and to increase the number of painters needed. Secondly, we check 2, and the number of painters needed is 3. Again, we decrease the index. Finally, we check 1, and the number of painters needed is 5. We cannot decrease the index any more. Then, the return value will be 1, which is definitely wrong, as the correct value is 100.

      The error is caused because maxLengthPerPainter is smaller than the maximum value in A.

      And, assuming grass’ approach is correct, the worst case run time complexity is worse than or the same as the previous approach. Why? Because the complexity of grass’ approach is O(NlogN + Nlogmax(A)) = O(Nlog(N*max(A))), and N*max(A) >= log(sum(A)), thus O(Nlog(N*max(A))) is worse than or the same as O(Nlog(sum(A))) – the time complexity of previous approach.

      VN:F [1.9.22_1171]
      0
  5. binary search in B and then binary serach in B[j]—B[j+1] cannot reduce the complexity, since it is binary search

    log(size of array B) + log(B[j+1] – B[j]) = log((size of array B) x (B[j+1]- B[j])) = log(B[n] – B[1])

    VA:F [1.9.22_1171]
    +1
  6. HI,
    very nice explaination,
    for this solution, what if i take (sum of N)/K as average(Avg) of any painter. linearly adding each N[i] until my absolyte difference of Avg and {N[0..i]} is decreasing. As soon as it moves from decreasing to increasing phase, i’ll stop it and assign those boards to one of the kth painter, and it continues till kth painter.
    will this give me feasible solution??

    surender

    VA:F [1.9.22_1171]
    0
    • Surrender,

      I think we can’t get feasible solution always. For example, assume the array is {1, 2, 1, 2, 2} and k = 3. Then the Avg would be 2. So your solution will partition the array into {1}, {2}, {1, 2, 2}, which gets the result of 5. However, the best solution is {1,2}, {1, 2} and {2}, which gets the result of 3.

      Liping

      VA:F [1.9.22_1171]
      0
  7. Hi,

    Although it is obvious that the optimal solution can be got by this way. But should we still provide proof for it? For example, is it possible that the interviewer will ask us to prove it? Given a max cost, we feed as many continuous blocks as possible into a partition. and then we minimize the max cost. How can we prove this can get the optimal solution?

    VA:F [1.9.22_1171]
    0
  8. If we use array B[] as suggested , worst case time run time for getRequiredPainters will be O(NlogN) and the while loop which calls getRequiredPainters will be executed at the most O(log maxA) then worst case run time for whole algo should be O(log maxA) * O(NlogN). Please suggest.

    VA:F [1.9.22_1171]
    0
  9. Hi

    Your solution provides an answer to the average value. I believe the exact question is to divide the array into partitions and the output should be

    Partition 1: a, b, c
    Partition 2: d, e

    ………………..
    Partition k: x,y

    Something like above

    Using 1700 in the above example how does the parttions can be arrived

    VA:F [1.9.22_1171]
    0
  10. Let A = max_i {A_i}, and B = \sum_i {A_i}.
    Note that B is at most nA.
    So we can simply do bisect the interval [A, B], which gives O(n logn) time algorithm.
    (The number of iterations needed is log(B/A) <= logn.)

    This is a standard trick in this kind of algorithm to drop the dependencies on the input size.

    VN:F [1.9.22_1171]
    0
  11. !!!!!!!!!!!!! THIS IS A K STEPS SHORTEST PATH PROBLEM !!!!!!!!!!!!!!!!
    Correct me if I am wrong.

    Here is the DAG:
    B0, A0, B1, A1, …. , A(N-1), BN
    V =
    E = {(Bi, Bj) where 0<=i<j<=N}, and length(Bi, Bj) = sqr((Ai+…+A(j-1)))

    The problem is to find a K step shortest path from B0 to BN.

    VN:F [1.9.22_1171]
    0
  12. The nodes set is xml-ed.
    V={B0, B1, B2, … BN}

    VN:F [1.9.22_1171]
    0
  13. this seems to assume that all painters paint in the same speed. The solution breaks if apainter is > twice as fast as the painter adjacent to them.

    I think DP would work for that case….

    Having said that, this is an elegant solution and has numerous applications in resource or job allocations for smp machines.

    VA:F [1.9.22_1171]
    0
    • I think author should clarify his assumptions (like all painter have the same speed)

      Nevertheless, I like the approach and would like to apply it in this way.

      First, you can sort painter by productivity. Most productive painter is P1, less productive is Pk

      Next assume that you have time constrain. It is possible to check if all boards can be painted by group of painters within this time period.
      This can be done by computing ideal maximum board length each painter can handle within given period. Next, for each painter, starting from the most productive one, find longest continuous subarray with maximum cumulative board length the painter can handle (can be done in O(n) for each painter).
      If you have unassigned boards left, that means it can not be done.

      Once you have this step, you can do binary search on time-to-complete. Basically assume maximum time painters can spend working is one year. Then, check if they complete the job in 365 days, if yes, check 180 if no, check 270 and so on.

      The complexity would be O(k*n*log(365)) where k is number of painters and n is number of boards.

      I am just throwing an idea and not sure if it works.

      VN:F [1.9.22_1171]
      0
  14. The complexity of this algorithm should be O(Nlog(c/max(A))), because the binary search has a initial lower bound of max(A) and upper bound of c. Another way to think about it is, if we replace array A with {1, 2, 3, 4, 5, 6, 7, 8, 9}, the computation cost is exactly the same with the original case. So the complexity cannot be O(Nlog(c)).

    VA:F [1.9.22_1171]
    0
  15. This paragraph is really a good one it helps new net users, who are wishing in favor of blogging.

    VA:F [1.9.22_1171]
    0
  16. This is a really good tip especially to those new to the blogosphere.
    Simple but very accurate info… Thanks for sharing this one.
    A must read post!

    VA:F [1.9.22_1171]
    0

Leave a reply

Your email address will not be published. Required fields are marked *

You may use the <code> tag to embed your code.