Amazon | Phone | Seattle | Column Store vs Row Store

Let's say you receive data from various customers in the following format:

DataEntry : {
id : Integer
time : DateTime
value : Double
}

You wish to run queries like average, sum, max, etc. over a fixed time period (the interviewer said a couple of hours). One DataEntry arrives every nanosecond. Which would you choose, a row store or a column store, and why?
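
To put a rough number on that ingest rate, here is a back-of-the-envelope sketch in Python. The byte widths (4-byte id, 8-byte time, 8-byte value) are my own assumptions, not something the interviewer specified, and "a couple of hours" is taken as 2. It ignores compression, indexes, and storage overhead.

```python
NANOS_PER_SECOND = 1_000_000_000
WINDOW_HOURS = 2  # assumption: "a couple of hours" read as 2

# Assumed fixed-width encodings (not given in the question)
ID_BYTES = 4     # Integer
TIME_BYTES = 8   # DateTime stored as epoch nanoseconds
VALUE_BYTES = 8  # Double


def entries_in_window(hours: int) -> int:
    """Number of DataEntry records at one entry per nanosecond."""
    return hours * 3600 * NANOS_PER_SECOND


def row_scan_bytes(hours: int) -> int:
    """Bytes read if every full row in the window is scanned."""
    return entries_in_window(hours) * (ID_BYTES + TIME_BYTES + VALUE_BYTES)


def value_column_scan_bytes(hours: int) -> int:
    """Bytes read if only the value column in the window is scanned."""
    return entries_in_window(hours) * VALUE_BYTES


if __name__ == "__main__":
    print(entries_in_window(WINDOW_HOURS))        # 7200000000000 entries
    print(row_scan_bytes(WINDOW_HOURS))           # 144000000000000 bytes
    print(value_column_scan_bytes(WINDOW_HOURS))  # 57600000000000 bytes
```

Under these assumptions, an aggregate over a 2-hour window reads about 144 TB as full rows versus about 57.6 TB from the value column alone. Locating the window by time costs extra unless the data is already ordered or partitioned by time.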

I tried to justify favoring a column store, but the interviewer's response was always that one can throw the data into MySQL (a row store) and make it work. I gave the following answers, but the interviewer wasn't convinced:

  1. A column store lays data out by column, so a query reads only the relevant column (value in this case). There is no need to read entire rows and then pull out one field.
  2. Time series data is "append-only" in nature (you don't go around changing past values). Column stores can handle petabytes of data with ease and are more available and partition-tolerant than row stores.
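
The first point can be illustrated with a minimal in-memory sketch in Python. This is a toy model, not how MySQL or any real column store is implemented: the sample entries, the integer timestamps, and the function names are all made up for illustration, and the column version assumes the time column is sorted, which fits append-only time series data.

```python
from array import array
from bisect import bisect_left, bisect_right

# Hypothetical sample data: (id, time, value), with time as an integer tick
entries = [(1, 100, 10.0), (2, 101, 20.0), (3, 102, 30.0), (4, 103, 40.0)]

# Row layout: each record is stored together
row_store = list(entries)

# Column layout: one contiguous array per field
col_ids = array("q", (e[0] for e in entries))
col_times = array("q", (e[1] for e in entries))
col_values = array("d", (e[2] for e in entries))


def avg_row_store(rows, t_start, t_end):
    """Average value in [t_start, t_end]; touches every field of every row."""
    total, count = 0.0, 0
    for _id, t, value in rows:
        if t_start <= t <= t_end:
            total += value
            count += 1
    return total / count if count else None


def avg_column_store(times, values, t_start, t_end):
    """Average value in [t_start, t_end]; reads only the time and value columns."""
    lo = bisect_left(times, t_start)   # first index with time >= t_start
    hi = bisect_right(times, t_end)    # first index with time > t_end
    window = values[lo:hi]             # contiguous slice of the value column
    return sum(window) / len(window) if len(window) else None


if __name__ == "__main__":
    print(avg_row_store(row_store, 101, 102))
    print(avg_column_store(col_times, col_values, 101, 102))
```

Both functions return the same average; the difference is that the column version never touches the id column and aggregates over one contiguous slice of values.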

Although the interviewer didn't explicitly say this, I felt like his question was: what advantage does a column store give over a row store if scaling is not a problem? I might be wrong, though.

Any help is appreciated. Thanks.
