HubSpot interview guidance

Anonymous User

I recently took the HubSpot Online Assessment; the question is reproduced below. I have also included my code, which did not return the correct answer. I spent hours going over it but could not find the problem. Can anyone please help me figure out what is wrong with the code?

The question...

Many companies use HubSpot for its calling capabilities. Sales reps use the HubSpot product throughout the day to make phone calls to prospects.

We've found that certain customers using HubSpot have a large number of sales reps concurrently making calls with HubSpot, and this puts heavy load on our systems. In response to this, we'd like to bill our customers based on their peak calling load. In other words, we'd like to bill customers based on their maximum number of concurrent calls.

You're provided with an HTTP GET endpoint that returns phone call records represented as JSON: https://candidate.hubteam.com/candidateTest/v3/problem/dataset?userKey=69bf2d64913a583b62a92f53b50d

Each call looks something like this:

{
  "customerId": 123,
  "callId": "2c269d25-deb9-42cf-927c-543112f7a76b",
  "startTimestamp": 1707314726000,
  "endTimestamp": 1707317769000
}

customerId is a unique identifier for one customer. One customer may have many sales reps concurrently making calls.
callId is a unique identifier for a single phone call. No two phone calls will have the same callId.
startTimestamp is when the call started. This value is given as a UNIX timestamp in milliseconds. In other words, it's the number of milliseconds that passed between the UNIX epoch (1970-01-01 00:00:00 UTC) and the start of the call.
endTimestamp is when the call ended. This value is also given as a UNIX timestamp in milliseconds. endTimestamp will always be greater than startTimestamp for a given call record.
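As a small illustration of the timestamp fields above, here is a minimal sketch of turning a millisecond UNIX timestamp into a UTC calendar date. The helper name `utc_date` is my own and not part of the problem statement; it uses a timezone-aware conversion so the result should not depend on the machine's local time zone.

```python
from datetime import datetime, timezone


def utc_date(timestamp_ms: int) -> str:
    """Return the UTC date (YYYY-MM-DD) for a millisecond UNIX timestamp.

    utc_date is a hypothetical helper name, not something the problem defines.
    """
    seconds = timestamp_ms // 1000  # whole seconds since the epoch
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d")


# startTimestamp from the sample call record
print(utc_date(1707314726000))  # 2024-02-07
```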
For the billing team to charge our customers correctly, they need to know the maximum number of concurrent calls for each customer for each day. The billing team has asked you to POST this information to the following endpoint: https://candidate.hubteam.com/candidateTest/v3/problem/result?userKey=69bf2d64913a583b62a92f53b50d. The POST body must be in the following format:

{
  "results": [
    {
      "customerId": 123,
      "date": "2024-02-07",
      "maxConcurrentCalls": 1,
      "timestamp": 1707314726000,
      "callIds": [
        "2c269d25-deb9-42cf-927c-543112f7a76b"
      ]
    }
  ]
}

date is a UTC date in the format YYYY-MM-DD. So the example date refers to February 7th 2024.
maxConcurrentCalls is the maximum number of simultaneous calls that occurred at any time during the corresponding date for this customer.
timestamp is a UNIX timestamp (in milliseconds) at which maxConcurrentCalls was reached for this customer and date. There could be multiple time periods during this date where maxConcurrentCalls is reached. A timestamp during any of these time periods can be chosen.
callIds is an array of calls that were happening for this customer at timestamp. The length of this array should equal maxConcurrentCalls. The order of callIds does not matter.
This example response only has one entry in the results array, but we expect the actual answer to have multiple results.
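To make the three output fields above concrete, here is a rough sketch of one way to compute them for a list of calls that all belong to the same customer and date. It is a plain sweep over start and end events, assuming each call is given as a `(callId, start_ms, end_ms)` tuple with an exclusive end; the function name `peak_concurrency` and the tuple layout are assumptions of mine, not part of the problem.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: (callId, start_ms inclusive, end_ms exclusive)
Call = Tuple[str, int, int]


def peak_concurrency(calls: List[Call]) -> Tuple[int, Optional[int], List[str]]:
    """Return (maxConcurrentCalls, timestamp, callIds) for one customer and date."""
    events = []
    for call_id, start, end in calls:
        events.append((start, 1, call_id))  # 1 marks a start
        events.append((end, 0, call_id))    # 0 marks an end
    # At the same millisecond, ends (0) sort before starts (1),
    # because the end timestamp is exclusive.
    events.sort(key=lambda e: (e[0], e[1]))

    active = set()
    best_count, best_ts, best_ids = 0, None, []
    for ts, kind, call_id in events:
        if kind == 1:
            active.add(call_id)
            if len(active) > best_count:
                # Record the first moment the new maximum is reached.
                best_count, best_ts, best_ids = len(active), ts, sorted(active)
        else:
            active.discard(call_id)
    return best_count, best_ts, best_ids


calls = [("a", 100, 123), ("b", 123, 200), ("c", 150, 180)]
print(peak_concurrency(calls))  # (2, 150, ['b', 'c'])
```

In this example, calls "a" and "b" touch at 123 but are never counted together, so the peak of 2 is only reached when "c" starts at 150.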

Note
The startTimestamp of a call is inclusive, and the endTimestamp of a call is exclusive. This means that:
If call A has an endTimestamp of 123, and call B has a startTimestamp of 123, they never overlapped.
For a given results entry, the timestamp should always be less than the endTimestamp of each call in callIds.
A single call may span multiple UTC dates, and calls can be arbitrarily long.
The order of results posted does not matter.
For a given customerId and date, if no phone calls occurred during that date, there should be no results entry with that customerId and date combination.
No two entries in the results array should have identical values for both customerId and date. In other words, for a given customerId and date combination, there should be at most one entry in the array.
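The rules about exclusive end timestamps and calls spanning several UTC dates can be sketched as follows. This is only an illustration under my own assumptions: the helper name `split_by_utc_day` is hypothetical, and it relies on UNIX time having exactly 86,400,000 ms per day so that midnight UTC is always a multiple of that value.

```python
from datetime import datetime, timezone
from typing import Iterator, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # milliseconds in one UTC day


def split_by_utc_day(start_ms: int, end_ms: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (date, segment_start_ms, segment_end_ms) for each UTC date a call touches.

    Segment ends are exclusive, like endTimestamp in the problem statement.
    split_by_utc_day is a hypothetical helper name.
    """
    day_start = start_ms - start_ms % DAY_MS  # midnight UTC at or before the start
    while day_start < end_ms:
        seg_start = max(start_ms, day_start)
        seg_end = min(end_ms, day_start + DAY_MS)
        date = datetime.fromtimestamp(day_start // 1000, tz=timezone.utc).strftime("%Y-%m-%d")
        yield date, seg_start, seg_end
        day_start += DAY_MS


# 1707264000000 is 2024-02-07 00:00:00 UTC.
# A call ending exactly at midnight does not count toward the next date.
print(list(split_by_utc_day(1707263999000, 1707264000000)))
# A call ending one second after midnight touches both dates.
print(list(split_by_utc_day(1707263999000, 1707264001000)))
```

Because the loop condition is `day_start < end_ms`, a call whose endTimestamp falls exactly on midnight produces no segment for the following date, which matches the rule that dates with no calls get no results entry.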

My solution to the above question was as follows:

import requests
from datetime import datetime, timedelta
from collections import defaultdict


def get_dates_between(start_timestamp, end_timestamp):
    start_date = datetime.utcfromtimestamp(start_timestamp / 1000).date()
    end_date = datetime.utcfromtimestamp(end_timestamp / 1000).date()

    current_date = start_date
    dates = []

    while current_date <= end_date:
        dates.append(current_date.strftime('%Y-%m-%d'))
        current_date += timedelta(days=1)

    return dates

class PhoneRecords:
    def __init__(self, url):
        self.url = url
        self.records = self.__get_records()

    def __get_records(self):
        response = requests.get(self.url)
        return response.json()

    def __process_results(self, result):
        answer = {'results': []}

        for customer_id in result:
            for date in result[customer_id]:
                if result[customer_id][date]['maxConcurrentCalls'] > 0:  # Only include dates with calls
                    current_data = {
                        'customerId': customer_id,
                        'date': date,
                        'maxConcurrentCalls': result[customer_id][date]['maxConcurrentCalls'],
                        'callIds': result[customer_id][date]['callIds'],
                        'timestamp': result[customer_id][date]['timestamp']
                    }
                    answer['results'].append(current_data)

        return answer

    def get_peak_phone_calls(self):
        customer_concurrent_data_by_date = defaultdict(lambda: defaultdict(list))

        # Organize the records
        for record in self.records['callRecords']:
            customer_id = record['customerId']
            start_timestamp = record['startTimestamp']
            end_timestamp = record['endTimestamp'] - 1

            dates_active = get_dates_between(start_timestamp, end_timestamp)

            start_date = datetime.utcfromtimestamp(start_timestamp / 1000).strftime('%Y-%m-%d')
            end_date = datetime.utcfromtimestamp(end_timestamp / 1000).strftime('%Y-%m-%d')

            if start_date == end_date:
                # Call starts and ends on the same date
                customer_concurrent_data_by_date[customer_id][start_date].append(
                    (start_timestamp, 'start', record['callId']))
                customer_concurrent_data_by_date[customer_id][start_date].append(
                    (end_timestamp, 'end', record['callId']))

            else:
                customer_concurrent_data_by_date[customer_id][start_date].append(
                    (start_timestamp, 'start', record['callId'])
                )
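                for date in dates_active[1:]:
                    # Midnight UTC of this date, in ms. Subtracting the epoch keeps the
                    # result independent of the machine's local time zone, unlike
                    # calling .timestamp() on a naive datetime.
                    midnight = datetime.strptime(date, '%Y-%m-%d') - datetime(1970, 1, 1)
                    customer_concurrent_data_by_date[customer_id][date].append(
                        (int(midnight.total_seconds() * 1000), 'start', record['callId'])
                    )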
                customer_concurrent_data_by_date[customer_id][end_date].append(
                    (end_timestamp, 'end', record['callId'])
                )

        # Process each customer call data
        result = {}

        for customer_id, date_events in customer_concurrent_data_by_date.items():
            result[customer_id] = {}

            for date, events in date_events.items():
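                # Sort events by timestamp. end_timestamp is already the last
                # millisecond a call is active (endTimestamp - 1), so a start at the
                # same timestamp must be processed before that end.
                events.sort(key=lambda x: (x[0], x[1] == 'end'))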

                concurrent_calls = 0
                max_concurrent_calls = 0
                max_timestamp = None
                active_calls = set()
                call_ids_at_max_concurrency = []

                # Parse through the events to count concurrent calls for a date
                for timestamp, event_type, call_id in events:
                    if event_type == 'start':
                        concurrent_calls += 1
                        active_calls.add(call_id)
                        if concurrent_calls >= max_concurrent_calls:
                            max_concurrent_calls = concurrent_calls
                            max_timestamp = timestamp
                            call_ids_at_max_concurrency = list(active_calls)
                    else:
                        if call_id in active_calls:
                            concurrent_calls -= 1
                            active_calls.remove(call_id)

                result[customer_id][date] = {
                    'maxConcurrentCalls': max_concurrent_calls,
                    'callIds': call_ids_at_max_concurrency,
                    'timestamp': max_timestamp
                }

        return self.__process_results(result)

    def send_records(self, url):
        payload = self.get_peak_phone_calls()
        print(payload)  # For debugging, to see what will be sent
        response = requests.post(url, json=payload)
        print(f"Response Status Code: {response.status_code}")


if __name__ == '__main__':
    phone_records = PhoneRecords(
        'https://candidate.hubteam.com/candidateTest/v3/problem/dataset?userKey=69bf2d64913a583b62a92f53b50d'
    )

    phone_records.send_records(
        'https://candidate.hubteam.com/candidateTest/v3/problem/test-result?userKey=69bf2d64913a583b62a92f53b50d'
    )
