# MathCo Cloud Engineer II Interview Experience

I reached out to the recruiter via email (found through a LinkedIn post).  
The recruiter called and scheduled my first round.

**Focus Areas:** Python, Pandas, SQL, Spark, Azure.

---

## Questions Asked & My Answers

### 1. Introduction
Brief introduction about myself, experience, and projects.
The interviewer also asked some Azure questions: what ADF (Azure Data Factory), Logic Apps, and Azure Functions are, how they differ, and about Blob storage.

---

### 2. Python & Pandas Task

**Question:**  
Write a Python script to:
- Read a CSV file with sales data  
- Group by region and calculate total sales  
- Identify the top 3 regions by sales  
- Output results to a new CSV  

**Answer:**
```python
import pandas as pd

# Read input CSV
df = pd.read_csv("sales_data.csv")

# Group by region and calculate total sales
sales_df = df.groupby("region", as_index=False)["sales"].sum()

# Sort by sales in descending order
sales_df = sales_df.sort_values("sales", ascending=False)

# Get top 3 regions
top_regions = sales_df.head(3)

# Save to CSV
top_regions.to_csv("top_regions.csv", index=False)
```
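
For a quick check of the grouping logic without a file on disk, the same steps can be sketched against an in-memory DataFrame. The sample rows below are made up for illustration, and `nlargest(3)` is swapped in for the sort-then-`head(3)` pair used above; both should give the same top three when there are no ties.

```python
import pandas as pd

# Hypothetical sample rows standing in for sales_data.csv
df = pd.DataFrame({
    "region": ["North", "South", "East", "West", "North", "East"],
    "sales": [100, 250, 75, 400, 200, 50],
})

# Total sales per region, then keep the three largest totals
top_regions = (
    df.groupby("region")["sales"]
    .sum()
    .nlargest(3)
    .reset_index()
)

print(top_regions)
```

With this data the totals are West 400, North 300, South 250, and East 125, so East is the region that drops out.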

### 3. SQL Question

**Tables:**  
- `orders(order_id, customer_id, order_date, amount)`  
- `customers(customer_id, name, region)`  

**Task:**  
Find the top 3 customers by total order amount in the last 6 months.  
Include customer name and region.  

**Answer:**  
```sql
WITH recent_orders AS ( 
    SELECT *
    FROM orders
    WHERE order_date >= DATEADD(MONTH, -6, GETDATE())
),
customer_orders AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM recent_orders
    GROUP BY customer_id
)
SELECT TOP 3 c.name, c.region, co.total_amount
FROM customers c
JOIN customer_orders co 
    ON c.customer_id = co.customer_id
ORDER BY co.total_amount DESC;
```
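
The query above uses T-SQL syntax (`TOP`, `DATEADD`, `GETDATE`). To sanity-check the logic locally, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes the SQLite equivalents `LIMIT 3` and `date('now', '-6 months')`, and the customer names and amounts are invented sample data.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, amount REAL);
""")

# Dates relative to today so the 6-month window behaves the same on any run
recent = (date.today() - timedelta(days=30)).isoformat()
old = (date.today() - timedelta(days=400)).isoformat()

conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [
    (1, "Asha", "North"), (2, "Ben", "South"), (3, "Chen", "East"), (4, "Dia", "West"),
])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, recent, 500.0),
    (2, 1, recent, 250.0),
    (3, 2, recent, 900.0),
    (4, 3, recent, 100.0),
    (5, 4, recent, 300.0),
    (6, 3, old, 5000.0),  # older than 6 months, so it must not count
])

query = """
WITH customer_orders AS (
    SELECT customer_id, SUM(amount) AS total_amount
    FROM orders
    WHERE order_date >= date('now', '-6 months')
    GROUP BY customer_id
)
SELECT c.name, c.region, co.total_amount
FROM customers c
JOIN customer_orders co ON c.customer_id = co.customer_id
ORDER BY co.total_amount DESC
LIMIT 3;
"""

top_customers = conn.execute(query).fetchall()
print(top_customers)
```

The large old order for Chen is there on purpose: if the date filter were missing, Chen would jump to first place.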

### 4. PySpark Task  
**Question:**  
Given a large dataset of user activity logs:  
- Load the data using PySpark.  
- Filter logs for a specific date range.  
- Group by user and count actions.  
- Save the result as a Parquet file.  

**Answer:**

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

# Initialize Spark
spark = SparkSession.builder.appName("UserActivityLogs").getOrCreate()

# Load data (CSV format assumed, could also be JSON/Parquet)
df = spark.read.csv("user_activity.csv", header=True, inferSchema=True)

# Define date range (inclusive on both ends)
start_date = "2025-01-01"
end_date = "2025-01-31"

# Filter on the calendar date so timestamps later in the day on end_date are kept
# (column names "time" and "user" are assumed; adjust to the actual schema)
filtered_df = df.filter(to_date(col("time")).between(start_date, end_date))

# Group by user and count number of actions
result = filtered_df.groupBy("user").count()

# Save result to Parquet file
result.write.mode("overwrite").parquet("output/user_activity_summary.parquet")

# Stop Spark session
spark.stop()
```

✅ Verdict: I was selected for the next round 🎉


## Round 2 Interview Experience  

The interviewer didn’t seem very interested in interviewing.  

- **Flow of the Interview:**  
  - Short introduction  
  - Asked about real-time experience with **PySpark** (I didn’t have much to share)  
  - Asked a couple of questions  

---

### Questions Asked  

1. **Best Practices for Orchestration**  
   - Use workflow orchestrators like **Airflow**, **Azure Data Factory**, or **Prefect**  
   - Maintain **modular DAGs** and clear dependencies  
   - Implement **error handling & retries**  
   - Use **parameterization** for reusability  
   - Ensure **logging & monitoring** for observability  
   - Version control your pipelines (Git)  
   - Automate testing & CI/CD  
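
A few of the points above (retries, parameterization, logging) can be illustrated in plain Python. This is only a sketch of the idea, not the API of Airflow, Azure Data Factory, or Prefect, which provide these features through their own configuration. The helper `run_with_retries` and the task `load_partition` are hypothetical names made up for this example.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


def run_with_retries(task, *, retries=3, delay_seconds=0.0, **params):
    """Call task(**params), retrying on any exception up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            result = task(**params)
            logger.info("%s succeeded on attempt %d", task.__name__, attempt)
            return result
        except Exception as exc:
            logger.warning("%s failed on attempt %d: %s", task.__name__, attempt, exc)
            if attempt == retries:
                raise  # out of attempts, surface the error to the caller
            time.sleep(delay_seconds)


# Hypothetical task that fails twice before succeeding
attempts = {"count": 0}


def load_partition(run_date):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return f"loaded {run_date}"


# The run date is passed in as a parameter instead of being hard-coded
outcome = run_with_retries(load_partition, retries=3, run_date="2025-01-01")
print(outcome)
```

Passing `run_date` as a parameter keeps the task reusable across runs, and the log lines record each failed attempt for later debugging.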

---

2. **Python Question**  
   - Write a function that gives the sum of any number of inputs using `*args`.  

   ```python
   def add_numbers(*args):
       return sum(args)

   # Example
   print(add_numbers(1, 2, 3, 4))  # Output: 10
   ```

Verdict: Rejected

The interview ended in just 10 minutes.
Overall, it wasn’t my day.

