MathCo Cloud Engineer II Interview Experience
Anonymous User
Aug 20, 2025

I reached out to the recruiter via email (found through a LinkedIn post).
The recruiter called and scheduled my first round.

Focus Areas: Python, Pandas, SQL, Spark, Azure.


Questions Asked & My Answers

1. Introduction

Brief introduction about myself, my experience, and my projects.
Then some Azure questions: what ADF (Azure Data Factory) is, what Logic Apps and Azure Functions are, how they differ, and Azure Blob Storage.

2. Python & Pandas Task

Question:
Write a Python script to:

  • Read a CSV file with sales data
  • Group by region and calculate total sales
  • Identify the top 3 regions by sales
  • Output results to a new CSV

Answer:

import pandas as pd

# Read input CSV
df = pd.read_csv("sales_data.csv")

# Group by region and calculate total sales
sales_df = df.groupby("region", as_index=False)["sales"].sum()

# Sort by sales in descending order
sales_df = sales_df.sort_values("sales", ascending=False)

# Get top 3 regions
top_regions = sales_df.head(3)

# Save to CSV
top_regions.to_csv("top_regions.csv", index=False)
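To check the Pandas logic without a file on disk, here is a minimal sketch that wraps the same steps in a function and feeds it a small in-memory CSV. The helper name `top_regions_by_sales` and the sample rows are hypothetical; `DataFrame.nlargest` is used as an equivalent shortcut for sorting in descending order and taking the head.

```python
import io

import pandas as pd


def top_regions_by_sales(csv_source, n=3):
    """Hypothetical helper: return the n regions with the highest total sales."""
    df = pd.read_csv(csv_source)
    # Same aggregation as the answer above: one row per region with summed sales
    totals = df.groupby("region", as_index=False)["sales"].sum()
    # nlargest(n, "sales") is equivalent to sort_values(descending) + head(n)
    return totals.nlargest(n, "sales").reset_index(drop=True)


# Made-up sample data, read from memory instead of sales_data.csv
sample = io.StringIO(
    "region,sales\n"
    "North,100\nSouth,250\nEast,80\nWest,300\nNorth,120\nEast,40\n"
)
result = top_regions_by_sales(sample)
print(result)
```

In a real run the same function could take the file path directly and the result could be written with `to_csv("top_regions.csv", index=False)` as in the answer.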

3. SQL Question

Tables:

  • orders(order_id, customer_id, order_date, amount)
  • customers(customer_id, name, region)

Task:
Find the top 3 customers by total order amount in the last 6 months.
Include customer name and region.

Answer:

WITH recent_orders AS ( 
    SELECT *
    FROM orders
    WHERE order_date >= DATEADD(MONTH, -6, GETDATE())
),
customer_orders AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM recent_orders
    GROUP BY customer_id
)
SELECT TOP 3 c.name, c.region, co.total_amount
FROM customers c
JOIN customer_orders co 
    ON c.customer_id = co.customer_id
ORDER BY co.total_amount DESC;
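The query above uses SQL Server (T-SQL) syntax. As a hedged sketch, the same logic can be checked locally with Python's built-in `sqlite3`, where `DATEADD(MONTH, -6, GETDATE())` becomes `date('now', '-6 months')` and `TOP 3` becomes `LIMIT 3`. The customer and order rows are made up for illustration, including one large old order that must be excluded by the 6-month filter.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers(customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders(order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                    order_date TEXT, amount REAL);
""")

# ISO-formatted date strings compare correctly as text in SQLite
recent = (date.today() - timedelta(days=30)).isoformat()
old = (date.today() - timedelta(days=365)).isoformat()

# Hypothetical sample data
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Asha", "North"), (2, "Ben", "South"), (3, "Chen", "East"), (4, "Dia", "West"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, recent, 500), (2, 2, recent, 300), (3, 3, recent, 200),
    (4, 4, old, 10000),  # older than 6 months, so it must not count
    (5, 4, recent, 100), (6, 1, recent, 50),
])

query = """
WITH recent_orders AS (
    SELECT * FROM orders
    WHERE order_date >= date('now', '-6 months')
),
customer_orders AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM recent_orders
    GROUP BY customer_id
)
SELECT c.name, c.region, co.total_amount
FROM customers c
JOIN customer_orders co ON c.customer_id = co.customer_id
ORDER BY co.total_amount DESC
LIMIT 3;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Without the date filter, Dia's old order would put her first; with it, only the last six months are ranked.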

4. PySpark Task

Question:
Given a large dataset of user activity logs:

  • Load the data using PySpark.
  • Filter logs for a specific date range.
  • Group by user and count actions.
  • Save the result as a Parquet file.

Solution (PySpark):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# Initialize Spark
spark = SparkSession.builder.appName("UserActivityLogs").getOrCreate()

# Load data (CSV format assumed, could also be JSON/Parquet)
# Assumed columns: "user" (user id) and "time" (event timestamp)
df = spark.read.csv("user_activity.csv", header=True, inferSchema=True)

# Define date range (inclusive)
start_date = "2025-01-01"
end_date = "2025-01-31"

# Filter logs within the date range; to_date() drops the time-of-day part,
# so events later in the day on end_date are not excluded
filtered_df = df.filter(to_date(col("time")).between(start_date, end_date))

# Group by user and count number of actions
result = filtered_df.groupBy("user").count()

# Save result to Parquet file
result.write.mode("overwrite").parquet("output/user_activity_summary.parquet")

# Stop Spark session
spark.stop()

✅ Verdict: I was selected for the next round 🎉

Round 2 Interview Experience

The interviewer didn’t seem very interested in interviewing.

  • Flow of the Interview:
    • Short introduction
    • Asked about real-time experience with PySpark (I didn’t have much to share)
    • Asked a couple of technical questions

Questions Asked

  1. Best Practices for Orchestration
    • Use workflow orchestrators like Airflow, Azure Data Factory, or Prefect
    • Maintain modular DAGs and clear dependencies
    • Implement error handling & retries
    • Use parameterization for reusability
    • Ensure logging & monitoring for observability
    • Version control your pipelines (Git)
    • Automate testing & CI/CD
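As a minimal, orchestrator-agnostic sketch of the "error handling & retries" point, the decorator below retries a failing task with exponential backoff. `retry` and `flaky_extract` are hypothetical names; tools such as Airflow expose the same idea as task-level retry settings rather than hand-written code.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.1, exceptions=(Exception,)):
    """Hypothetical decorator: retry a task with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    if attempt == max_attempts:
                        raise  # give up and surface the last error
                    delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
                    print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=3, base_delay=0.01)
def flaky_extract():
    """Hypothetical task that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("source not reachable")
    return "extracted"


print(flaky_extract())  # succeeds on the third attempt
```

Logging each failed attempt, as the `print` does here, also covers part of the "logging & monitoring" point.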

  2. Python Question

    • Write a function that returns the sum of any number of inputs using *args.
    def add_numbers(*args):
        return sum(args)
    
    # Example
    print(add_numbers(1, 2, 3, 4))  # Output: 10
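The answer works because `*args` packs every positional argument into a tuple, and `sum` accepts any iterable of numbers. A short sketch of the mechanics; `show_args` is a hypothetical helper added only to make the packed tuple visible.

```python
def add_numbers(*args):
    """Sum any number of positional arguments."""
    # args is a tuple holding every positional argument passed in
    return sum(args)


def show_args(*args):
    """Hypothetical helper: return the packed tuple itself."""
    return args


print(show_args(1, 2, 3))     # (1, 2, 3)
print(add_numbers())          # 0, since the sum of an empty tuple is 0
print(add_numbers(1.5, 2.5))  # 4.0
values = [1, 2, 3, 4]
print(add_numbers(*values))   # 10, the list is unpacked into separate arguments
```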

Verdict: Rejected

The interview ended in just 10 minutes.
Overall, it wasn’t my day.
