I was asked this question in an interview at a start-up:
We have an array with millions of elements and a quad-core machine. We need to sum these elements as efficiently as possible.
My approach:
Since we are given a quad-core machine, let us use four threads, one per core. Divide the input array into four contiguous chunks of equal size. The last chunk may have fewer elements if the array length is not divisible by four.
Start the 4 threads and give each one its chunk. There is no need to copy anything: pass each thread a reference to the same array along with the start index and end index of its chunk.
Also declare a result array of size 4, in which each thread saves the sum of its own chunk in its own slot, so no two threads write to the same element.
After all four threads have finished (i.e. after joining them), sum the 4 partial results and return the total.
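To make the steps above concrete, here is a minimal sketch in Python. The names `parallel_sum`, `worker` and `partial` are my own, chosen just for illustration. One caveat: in standard CPython builds the global interpreter lock keeps pure-Python threads from running on several cores at once, so this shows the structure of the approach rather than an actual speedup. The same layout in Java or C++ would run the four workers in parallel.

```python
import threading


def parallel_sum(data, num_threads=4):
    """Sum `data` by splitting it into `num_threads` contiguous chunks."""
    n = len(data)
    # Ceiling division, so the last chunk is the one that may be shorter.
    chunk = (n + num_threads - 1) // num_threads
    # One slot per thread; each thread writes only to its own slot.
    partial = [0] * num_threads

    def worker(slot, start, end):
        total = 0
        for i in range(start, end):
            total += data[i]
        partial[slot] = total

    threads = []
    for t in range(num_threads):
        # Clamp the bounds so short arrays yield empty chunks, not errors.
        start = min(t * chunk, n)
        end = min(start + chunk, n)
        th = threading.Thread(target=worker, args=(t, start, end))
        threads.append(th)
        th.start()

    # Wait for every worker before reading the partial sums.
    for th in threads:
        th.join()

    return sum(partial)
```

For example, `parallel_sum(list(range(1, 11)))` splits the ten elements into chunks of 3, 3, 3 and 1 and returns 55.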
Discussion points: