CSV parsing at scale

The question: how do you parse a CSV file?
Now assume that the order of your rows has changed and new columns have been added. How do you parse this, and how do you do it in a scalable manner? I added the "scalable" part myself, because I can only assume that's where the question was leading. I completely misunderstood this question (rude interviewer), and I'm still not sure what the best answer is.
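A minimal sketch of the usual answer to the reordering part, assuming the change is really to the columns (reordered, with extra ones added) and that the file has a header row: read each row as a dict keyed by header name with Python's `csv.DictReader`, so lookups never depend on column position. `REQUIRED`, `parse_by_header`, and the column names are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical columns this code depends on; any other columns are ignored.
REQUIRED = ("id", "name")


def parse_by_header(lines):
    """Yield one dict per row with only the REQUIRED columns, looked up by name."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []  # accessing fieldnames reads the header row
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        yield {col: row[col] for col in REQUIRED}


old = io.StringIO("id,name\n1,alice\n2,bob\n")
new = io.StringIO("email,name,id\nc@x.io,carol,3\n")  # reordered + new column
print(list(parse_by_header(old)))  # [{'id': '1', 'name': 'alice'}, {'id': '2', 'name': 'bob'}]
print(list(parse_by_header(new)))  # [{'id': '3', 'name': 'carol'}]
```

Both files produce the same shape of output, and the new `email` column is simply ignored. A missing required column fails loudly instead of silently reading the wrong field.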

If I were to handle this literally, I'd obviously just use a library. I assumed the interviewer wasn't asking me to literally walk through the header row and build some 2D array, but maybe I'm wrong; if so, that's one way it can be done. We probably do need a 2D array, since order matters. If I'm wrong here, please let me know.
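If the literal version is what was wanted, a 2D array can still cope with reordered columns as long as you build a header-name-to-index map from the first row and always index through it. A sketch under that assumption, with `load_table` as a hypothetical helper. It uses `csv.reader` rather than splitting lines on commas, since a naive `split(",")` breaks on quoted fields that contain commas. Note that it holds the whole file in memory.

```python
import csv
import io


def load_table(lines):
    """Read all rows into a 2D list and map each header name to its column index."""
    reader = csv.reader(lines)
    header = next(reader)
    col = {name: i for i, name in enumerate(header)}
    rows = list(reader)  # the "2D array": one list of strings per data row
    return col, rows


col, rows = load_table(io.StringIO("email,name,id\nc@x.io,carol,3\nd@x.io,dave,4\n"))
# Index by name through the map, not by fixed position, so column order doesn't matter.
print([r[col["name"]] for r in rows])  # ['carol', 'dave']
```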

For scale, would there be a need for a SQL DB here? That seems like overcomplicating it, but I'm not sure how to handle scale when the data consumes a significant chunk of memory. It'll eventually have to go to disk, and if that's the case, maybe there's some other type of NoSQL DB that would work.
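One common way to handle the memory concern without a database is to stream the file: process each row as it is read and keep only the state you actually need. A sketch, assuming the goal is some aggregate (here a hypothetical count per value of one column, via `count_by_column`); memory grows with the number of distinct values, not with the file size. A database such as SQLite starts to pay off when you need random access, joins, or repeated queries over the full data rather than a single pass.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Stream rows one at a time and count occurrences of each value in `column`.

    Only the current row and the running counts are held in memory,
    so the input can be much larger than RAM.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[row[column]] += 1
    return counts


# With a real file (hypothetical name), open it with newline="" as the csv docs advise:
#     with open("big.csv", newline="") as f:
#         counts = count_by_column(f, "country")
data = io.StringIO("country,id\nUS,1\nDE,2\nUS,3\n")
print(count_by_column(data, "country"))  # Counter({'US': 2, 'DE': 1})
```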
