EPAM Lead Data Engineer Interview Experience
Anonymous User

**1. Online Test (around 1 hr 30 mins)**

1. 10 multiple-choice questions
2. One SQL question (find the total number of unique students and the total number of students who scored full marks, by age)
3. One Python question (check whether a given list of length n contains odd pairs; n is always an even number)
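
For the SQL question, here is a minimal sketch of one possible answer, run through Python's built-in `sqlite3` module so it can be tried locally. The table name `scores`, its columns (`student_id`, `age`, `marks`) and the assumption that full marks means 100 are all hypothetical; the actual schema in the test was different and is not reproduced here.

```python
import sqlite3

# Hypothetical schema and data, used only to illustrate the query shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (student_id INTEGER, age INTEGER, marks INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [
        (1, 20, 100),
        (1, 20, 80),   # same student, second exam
        (2, 20, 100),
        (5, 20, 70),
        (3, 21, 90),
        (4, 21, 100),
        (4, 21, 100),  # duplicate full-marks row for the same student
    ],
)

# COUNT(DISTINCT ...) avoids double counting students with several rows.
# The CASE expression yields NULL for non-full-marks rows, which COUNT ignores.
query = """
    SELECT age,
           COUNT(DISTINCT student_id) AS unique_students,
           COUNT(DISTINCT CASE WHEN marks = 100 THEN student_id END) AS full_marks_students
    FROM scores
    GROUP BY age
    ORDER BY age
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample rows above, age 20 has three unique students of whom two scored full marks, and age 21 has two unique students of whom one scored full marks.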

**2. Technical Round (1 hr 30 mins)**

1. About my technical skills
2. How to submit a Spark job and the configurations used
3. Spark optimization
4. Catalyst optimizer
5. Handling small files in Spark
6. EMR vs Glue
7. Athena
8. Partition pruning
9. Predicate pushdown
10. Write a SQL query to return each person with their total sales amount, including the name of their highest-sales city
11. Write the same logic in PySpark
12. Python code to find the longest consecutive sequence in a given list
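
For the last question, a minimal sketch follows. It assumes "consecutive sequence" means a run of consecutive integer values regardless of their position in the list (the common reading of this problem); the function name `longest_consecutive` is my own. It uses a set so that each run is walked only from its starting value, which keeps the work roughly linear in the list size.

```python
def longest_consecutive(nums):
    """Return the longest run of consecutive integers in nums as a sorted list."""
    seen = set(nums)
    best_start, best_len = None, 0
    for n in seen:
        # Only start counting from the first value of a run.
        if n - 1 in seen:
            continue
        length = 1
        while n + length in seen:
            length += 1
        if length > best_len:
            best_start, best_len = n, length
    if best_start is None:
        return []
    return list(range(best_start, best_start + best_len))


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # [1, 2, 3, 4]
```

If two runs have the same length, this sketch returns whichever one it meets first, so the tie-breaking rule would need to be agreed with the interviewer.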

**3. Technical Round (1 hr 30 mins)**

1. About the skill set used in my recent project
2. Difference between RDBMS, data warehouse and NoSQL
3. EMR questions
4. Difference between lambdas and regular functions in Python
5. Python memory management
6. Iterators, generators and decorators
7. Different Python basics, such as list comprehensions
8. Redshift performance and how to optimize long-running queries
9. Repartition vs coalesce in PySpark
10. Application / job / stage / task
11. How to identify long-running tasks and how to approach them
12. Data skewness
13. What are the ways we can store data
14. Parquet vs Avro and when to use which
15. Star schema vs snowflake schema
16. Snowflake data warehouse advantages
17. SQL concepts
18. Lambda configurations in AWS
19. S3 storage related questions
20. Lambda vs EMR vs Glue
21. Glue ETL related questions
22. 2 SQL questions, such as sales of multiple employees in each month, and how to solve them with and without window functions
23. Write the same code in PySpark, but read the data from a CSV or text file in an S3 location
24. Python question: read the data from a text file and sort it by name; if the names are the same, sort by age.
"((john - 19) (john - 23) (boss - 30) (krishna - 40))"

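My solution given in the interview (indentation restored; ages are now compared as numbers rather than strings):

    input_file = "((john - 19) (john - 23) (boss - 30) (krishna - 40))"

    # Map each name to the list of ages seen for it
    input_dic = {}

    for i in input_file.split(') ('):
        # Strip the outer parentheses left on the first and last records
        i = i.replace('(', '').replace(')', '')

        x, y = i.split(' - ')

        # Store the age as int so that 9 sorts before 19
        input_dic[x] = input_dic.get(x, []) + [int(y)]

    # Names in alphabetical order
    z = sorted(input_dic)

    parts = []

    for i in z:
        # Ages in ascending order within the same name
        for j in sorted(input_dic[i]):
            parts.append('(' + i + ' - ' + str(j) + ')')

    ans = '(' + ' '.join(parts) + ')'

    print(ans)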
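
A shorter alternative sketch for the same question, not what I wrote in the interview: it assumes every record has the exact form `(name - age)` with a single-word name and a non-negative integer age, pulls the records out with a regular expression, and sorts on a `(name, age)` tuple. The function name `sort_records` is my own.

```python
import re


def sort_records(text):
    """Sort "(name - age)" records by name, then by numeric age."""
    # Each match is a (name, age) tuple of strings.
    records = re.findall(r"\((\w+) - (\d+)\)", text)
    # Tuple key: names compare alphabetically, ages compare as integers.
    ordered = sorted(records, key=lambda r: (r[0], int(r[1])))
    return "(" + " ".join(f"({name} - {age})" for name, age in ordered) + ")"


print(sort_records("((john - 19) (john - 23) (boss - 30) (krishna - 40))"))
# ((boss - 30) (john - 19) (john - 23) (krishna - 40))
```

Converting the age with `int` matters: as strings, "9" would sort after "19".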

**4. Manager Round (30 to 45 mins)**

1. DE roles and responsibilities
2. Data lake vs data warehouse
3. How CI/CD was implemented in my project
4. My daily activities, with follow-up questions to drill down
5. How AWS service changes are deployed in my project
6. How I resolve conflicts between junior team members to deliver the product on time
7. SQL question to find the cumulative sum using 2 methods
8. Implement the same in PySpark for the above SQL question
9. Performance tuning related questions on Spark
10. AWS CloudFormation
11. AWS Aurora questions
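
For the cumulative sum question, a minimal sketch of two common approaches is below: a window function and a self-join. It runs through Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The table `sales` and its columns `sale_day` and `amount` are hypothetical, and `sale_day` is assumed to be unique per row; the table used in the interview was different.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_day INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 10), (2, 20), (3, 5), (4, 15)],
)

# Method 1: window function, summing every row up to the current one.
window_sql = """
    SELECT sale_day, amount,
           SUM(amount) OVER (ORDER BY sale_day) AS running_total
    FROM sales
    ORDER BY sale_day
"""

# Method 2: self-join, pairing each row with all rows on or before its day.
self_join_sql = """
    SELECT a.sale_day, a.amount, SUM(b.amount) AS running_total
    FROM sales a
    JOIN sales b ON b.sale_day <= a.sale_day
    GROUP BY a.sale_day, a.amount
    ORDER BY a.sale_day
"""

window_rows = conn.execute(window_sql).fetchall()
self_join_rows = conn.execute(self_join_sql).fetchall()
print(window_rows)
print(self_join_rows)
```

Both queries return the same running totals on this data. The self-join compares every row with every earlier row, so it tends to be much slower than the window function on large tables.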