It was a 1 hr 30 min discussion with a senior data engineer (DE). We discussed the topics below:
- All the tools and tech stacks I have ever worked on
- Difference between a DAG and a lineage graph
- Architecture of Spark
- Optimisations in Spark
- Optimisations in BQ and SQL
- Indexes in SQL
- SQL: find all employees whose department and salary are the same
- Python lambda functions, and why they are faster
- Python decorators
- Python: count the occurrences of each word in a string (hash map)
- CI/CD
- Working methodology
- Spark jobs, number of tasks, actions
- Narrow and wide transformations
- Shuffling
- SCD types
- CDC
- Fact and dimension tables, star and snowflake schemas
- Query plan in SQL
- Spark query physical plan
- Joins, physical joins
- What the data source and data sink were in the Spark application I worked on
- Optimisations in Airflow
- Airflow variable configuration
- Airflow task dependencies
- Airflow architecture
- ACID properties
- DWH and Delta Lake
- BQ denormalisation
- Normalisation vs denormalisation, which is better
- Coalesce vs repartition
- Lazy evaluation
- Ways to load data into BQ
- How you would load a 10 GB CSV file into BQ
- How you would delete duplicates from tables in BQ
- BQ data types
- Security features in GCP
- Ways to compare data across files for data validation
- How good you are at dealing with clients
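
For the word-count question in the list above, a minimal sketch of the hash-map approach could look like the following. The function name `count_words`, the lowercasing, and the punctuation handling are assumptions for illustration; the interview did not fix any of these details.

```python
def count_words(text):
    """Return a dict mapping each lowercased word to how often it occurs."""
    counts = {}  # the dict plays the role of the hash map
    for word in text.lower().split():
        # Assumption: strip common punctuation so "hat." and "hat" match.
        word = word.strip(".,!?;:\"'()")
        if word:
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words("the cat and the hat. The end!"))
# {'the': 3, 'cat': 1, 'and': 1, 'hat': 1, 'end': 1}
```

The standard library's `collections.Counter` does similar counting in one call, but writing the dict by hand shows the hash-map idea the question is probably after.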