**1. Online Test (around 1 hr 30 mins)**
1. 10 multiple-choice questions
2. One SQL question (find the total number of unique students and the total number of students who scored full marks, by age)
3. One Python question (check whether the given list of length n contains odd pairs or not; n is always an even number)

**2. Technical Round (1 hr 30 mins)**
1. About the technical skills
2. How to submit a Spark job and the configurations used
3. Spark optimization
4. Catalyst optimizer
5. Handling small files in Spark
6. EMR vs Glue
7. Athena
8. Partition pruning
9. Predicate pushdown
10. Write a SQL query to find, for each person, the total sales amount, including the name of the city with the highest sales
11. Write the same code in PySpark
12. Python code to find the longest consecutive sequence in the given list

**3. Technical Round (1 hr 30 mins)**
1. About the skill set used in the recent project
2. Difference between RDBMS, data warehouse, and NoSQL
3. EMR questions
4. Difference between lambda functions and regular functions in Python
5. Python memory management
6. Iterators, generators, and decorators
7. Python basics such as list comprehensions
8. Redshift performance and how to optimize long-running queries
9. Repartition vs coalesce in PySpark
10. Application / job / stage / task
11. How to identify long-running tasks and how to approach them
12. Data skewness
13. What are the ways we can store data
14. Parquet vs Avro and when to use what
15. Star schema vs snowflake schema
16. Snowflake data warehouse advantages
17. SQL concepts
18. Lambda configurations in AWS
19. S3 storage-related questions
20. Lambda vs EMR vs Glue
21. Glue ETL-related questions
22. Two SQL questions, such as the sales of multiple employees in each month, and how to solve them with and without window functions
23. Write the same code in PySpark, but read the data from a CSV or text file in an S3 location
24. Python question: read the data from a text file and sort it by name; if the names are the same, sort by age. "((john - 19) (john - 23) (boss - 30) (krishna - 40))"

My solution given in the interview:

    input_file = "((john - 19) (john - 23) (boss - 30) (krishna - 40))"
    input_dic = {}
    # Split into "name - age" records and strip the leftover parentheses.
    for i in input_file.split(') ('):
        i = i.replace('(', '').replace(')', '')
        x, y = i.split(' - ')
        # Group ages by name; keep them as int so they sort numerically.
        input_dic[x] = input_dic.get(x, []) + [int(y)]
    ans = []
    for name in sorted(input_dic):           # names in alphabetical order
        for age in sorted(input_dic[name]):  # same name: order by age
            ans.append('(' + name + ' - ' + str(age) + ')')
    print('(' + ' '.join(ans) + ')')

1. DE roles and responsibilities
2. Data lake vs data warehouse
3. How CI/CD was implemented in my project
4. My daily activities, with follow-up questions to drill down
5. How AWS service changes are deployed in your project
6. How you resolve conflicts between junior team members to deliver the product on time
7. SQL question to find the cumulative sum using two methods
8. Implement the above SQL question in PySpark
9. Performance tuning related questions on Spark
10. AWS CloudFormation
11. AWS Aurora questions
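
For the cumulative sum question (item 7 in the list above), here is a minimal sketch of two possible methods: a window function and a self-join. It is not necessarily the answer the interviewer expected. The `sales` table and its `sale_day` and `amount` columns are hypothetical, and Python's built-in `sqlite3` module is used only to keep the example runnable (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical table and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_day INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales (sale_day, amount) VALUES (?, ?)",
    [(1, 100), (2, 50), (3, 75), (4, 25)],
)

# Method 1: window function. SUM ... OVER (ORDER BY ...) accumulates
# every row up to and including the current one.
WINDOW_SQL = """
    SELECT sale_day, amount,
           SUM(amount) OVER (ORDER BY sale_day) AS running_total
    FROM sales
    ORDER BY sale_day
"""

# Method 2: self-join without window functions. Each row is joined to
# all rows on the same or an earlier day, and those amounts are summed.
SELF_JOIN_SQL = """
    SELECT a.sale_day, a.amount, SUM(b.amount) AS running_total
    FROM sales AS a
    JOIN sales AS b ON b.sale_day <= a.sale_day
    GROUP BY a.sale_day, a.amount
    ORDER BY a.sale_day
"""

window_rows = conn.execute(WINDOW_SQL).fetchall()
self_join_rows = conn.execute(SELF_JOIN_SQL).fetchall()

print(window_rows)
print(self_join_rows)
```

Both queries print `[(1, 100, 100), (2, 50, 150), (3, 75, 225), (4, 25, 250)]`. The self-join compares every row with every earlier row, so on large tables the window function is usually the better choice.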