I reached out to the recruiter via email (found through a LinkedIn post).
The recruiter called and scheduled my first round.
Focus Areas: Python, Pandas, SQL, Spark, Azure.
Question:
Write a Python script to:
Answer:
import pandas as pd
# Read input CSV
df = pd.read_csv("sales_data.csv")
# Group by region and calculate total sales
sales_df = df.groupby("region", as_index=False)["sales"].sum()
# Sort by sales in descending order
sales_df = sales_df.sort_values("sales", ascending=False)
# Get top 3 regions
top_regions = sales_df.head(3)
# Save to CSV
top_regions.to_csv("top_regions.csv", index=False)
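To sanity-check the answer above without a real file, here is a minimal sketch that runs the same group, total, and top-3 steps on a small in-memory sample. The sample rows and the helper name `top_regions_by_sales` are made up for illustration; only the `region` and `sales` column names come from the answer. It uses `nlargest` as an alternative to `sort_values(...).head(3)`.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for sales_data.csv
SAMPLE_CSV = """region,sales
North,100
South,250
East,75
West,300
North,50
South,25
"""


def top_regions_by_sales(df, n=3):
    """Return the n regions with the highest total sales."""
    # Total sales per region
    totals = df.groupby("region", as_index=False)["sales"].sum()
    # nlargest orders rows by "sales" descending and keeps the first n
    return totals.nlargest(n, "sales").reset_index(drop=True)


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
top_regions = top_regions_by_sales(df)
print(top_regions)
```

On this sample the regions come back in the order West, South, North, and East is dropped.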
Tables:
orders(order_id, customer_id, order_date, amount)
customers(customer_id, name, region)
Task:
Find the top 3 customers by total order amount in the last 6 months.
Include customer name and region.
Answer:
WITH recent_orders AS (
SELECT *
FROM orders
WHERE order_date >= DATEADD(MONTH, -6, GETDATE())
),
customer_orders AS (
SELECT customer_id, SUM(amount) AS total_amount
FROM recent_orders
GROUP BY customer_id
)
SELECT TOP 3 c.name, c.region, co.total_amount
FROM customers c
JOIN customer_orders co
ON c.customer_id = co.customer_id
ORDER BY co.total_amount DESC;
Question:
Given a large dataset of user activity logs:
Solution (PySpark):
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
# Initialize Spark
spark = SparkSession.builder.appName("UserActivityLogs").getOrCreate()
# Load data (CSV format assumed, could also be JSON/Parquet)
df = spark.read.csv("user_activity.csv", header=True, inferSchema=True)
# Define date range
start_date = "2025-01-01"
end_date = "2025-01-31"
# Filter logs within the date range (cast to date so that all of end_date is included,
# even if "time" was inferred as a timestamp)
filtered_df = df.filter(col("time").cast("date").between(start_date, end_date))
# Group by user and count number of actions
result = filtered_df.groupBy("user").count()
# Save result to Parquet file
result.write.mode("overwrite").parquet("output/user_activity_summary.parquet")
# Stop Spark session
spark.stop()
✅ Verdict: I was selected for the next round 🎉
The interviewer didn’t seem very interested in interviewing.
Python Question
*args.
def add_numbers(*args):
    return sum(args)
# Example
print(add_numbers(1, 2, 3, 4))  # Output: 10
Verdict: Rejected
The interview ended in just 10 minutes.
Overall, it wasn’t my day.