Data science fundamentals round (1 of 4 technical rounds)
- Data Quality & Outliers
Question: In a given dataset, some feature values are extremely large. How do you handle them? Do you remove, retain, or transform them?
Follow-up: What are other critical data quality issues you have faced in production systems?
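
One way to frame the remove / retain / transform choice is to put the three options side by side on the same column. The sketch below is illustrative only: the 1.5 x IQR fence, the helper names, and the sample values are assumptions, not part of the interview question, and the right choice depends on whether the extreme values are errors or genuine signal.

```python
import math
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences at k * IQR beyond the quartiles (k=1.5 is a common convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Option 1: drop points outside the fences (loses rows; risky if the extremes are real)."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def winsorize(values, k=1.5):
    """Option 2: keep every row but clip extreme values to the fences."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


def log_transform(values):
    """Option 3: compress the scale with log(1 + x); assumes every value is greater than -1."""
    return [math.log1p(v) for v in values]


# Hypothetical feature column with one extreme value.
amounts = [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 5000.0]

print(remove_outliers(amounts))
print(winsorize(amounts))
print(log_transform(amounts))
```

Removal is usually defensible only when the extreme values are known to be errors; clipping and log transforms keep the row, which matters when the extreme cases are the ones the model needs to learn from (for example, high-value transactions in fraud data).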
- Feature Engineering
Scenario: You are working for a subscription service experiencing high customer attrition (churn).
Question: What are the top 5 features you would engineer to predict user churn?
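
As a rough illustration of what "engineering" churn features can look like, the sketch below derives five candidate features from a per-user activity record. The feature choices (tenure, recency, frequency, usage trend, failed payments), the function name, and the 30-day windows are all assumptions made for the example; they are one plausible answer, not the expected one.

```python
from datetime import date, timedelta


def churn_features(signup, session_dates, failed_payments, as_of):
    """Build five candidate churn features for one user as of a reference date."""
    recent_start = as_of - timedelta(days=30)
    prior_start = as_of - timedelta(days=60)

    # Sessions in the most recent 30 days and in the 30 days before that.
    recent = sum(1 for d in session_dates if recent_start < d <= as_of)
    prior = sum(1 for d in session_dates if prior_start < d <= recent_start)

    # Fall back to the signup date when the user has no sessions yet.
    last_seen = max((d for d in session_dates if d <= as_of), default=signup)

    return {
        "tenure_days": (as_of - signup).days,
        "days_since_last_session": (as_of - last_seen).days,
        "sessions_last_30d": recent,
        "session_trend": recent - prior,  # negative means usage is declining
        "failed_payments": failed_payments,
    }


# Hypothetical user record.
features = churn_features(
    signup=date(2024, 1, 1),
    session_dates=[date(2024, 2, 5), date(2024, 2, 20), date(2024, 3, 10)],
    failed_payments=1,
    as_of=date(2024, 3, 31),
)
print(features)
```

Computing every feature relative to an explicit `as_of` date is a deliberate choice: it keeps information from after the prediction point out of the training data.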
- Metrics & Loss Functions
Question: How do you handle tasks where False Negatives are especially costly (e.g., fraud or disease detection)? What specific performance metric do you optimize for?
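
When missed positives are the expensive error, the usual candidates are recall and a recall-weighted F-beta score (beta > 1), often combined with a lower decision threshold. A minimal sketch, assuming binary labels and model scores in [0, 1]; the labels, scores, and thresholds are made up for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Return (true positives, false positives, false negatives) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def recall(y_true, y_pred):
    """Fraction of actual positives that were caught: TP / (TP + FN)."""
    tp, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_beta(y_true, y_pred, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and rec == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * rec / (b2 * precision + rec)


def predict(scores, threshold):
    """Turn scores into 0/1 predictions at a given decision threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical labels and model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

for threshold in (0.5, 0.25):
    y_pred = predict(scores, threshold)
    print(threshold, recall(y_true, y_pred), f_beta(y_true, y_pred, beta=2.0))
```

Lowering the threshold trades extra false positives for fewer false negatives, so recall is normally reported alongside precision or a false-positive budget rather than optimized alone.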
- System Design (Time-Series)
Scenario: You are receiving a streaming time-series data feed and need to detect anomalies. The constraints are strict: the system is highly latency-sensitive, and data arrives at 10,000 samples per second.
Question: What is the optimal architectural design for this? How do you balance the trade-off between algorithmic accuracy and system latency?
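
At 10,000 samples per second, a single-threaded detector has about 100 microseconds per sample, which favors constant-time, constant-memory updates on the hot path. The sketch below is one such baseline: an exponentially weighted mean and variance with a z-score test. The class name, `alpha`, `threshold`, and `warmup` values are assumptions for illustration, not a recommended configuration.

```python
import math


class EwmaAnomalyDetector:
    """O(1)-per-sample detector based on an exponentially weighted mean and variance."""

    def __init__(self, alpha=0.05, threshold=4.0, warmup=30):
        self.alpha = alpha          # higher alpha adapts faster but is noisier
        self.threshold = threshold  # flag when |x - mean| exceeds threshold * std
        self.warmup = warmup        # samples to observe before flagging anything
        self.mean = 0.0
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Consume one sample and return True if it looks anomalous."""
        self.n += 1
        if self.n == 1:
            self.mean = x
            return False

        diff = x - self.mean
        std = math.sqrt(self.var)
        is_anomaly = (
            self.n > self.warmup
            and std > 0.0
            and abs(diff) > self.threshold * std
        )

        # Design choice: anomalous samples do not update the statistics, so a
        # spike cannot inflate the variance. The cost is slower adaptation to
        # genuine level shifts.
        if not is_anomaly:
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return is_anomaly


# Hypothetical feed: a smooth signal followed by one spike.
detector = EwmaAnomalyDetector()
normal_flags = [detector.update(10.0 + 0.5 * math.sin(i)) for i in range(200)]
spike_flag = detector.update(50.0)
print(any(normal_flags), spike_flag)
```

One way to handle the accuracy versus latency trade-off is to keep a cheap detector like this inline and send flagged or sampled windows to a heavier model asynchronously, so that the expensive model never sits on the per-sample path.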