# **EPAM Lead Data Engineer Interview Experience**

**1. Online Test (around 1 hr 30 mins)**

    1. 10 multiple-choice questions
    2. One SQL question (find the total number of unique students and the total number of students who scored full marks, by age)
    3. One Python question (check whether the given list of length n contains odd pairs or not; n is always an even number)

**2. Technical Round (1 hr 30 mins)**

    1. About my technical skills
    2. How to submit a Spark job and the configurations used
    3. Spark optimization
    4. Catalyst optimizer
    5. Handling small files in Spark
    6. EMR vs Glue
    7. Athena
    8. Partition pruning
    9. Predicate pushdown
    10. Write a SQL query to find, for each person, the total sales amount and the name of the city with the highest sales
    11. Write the same code in PySpark
    12. Python code to find the longest consecutive sequence in a given list
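For question 12, one possible approach is sketched below. It assumes "consecutive sequence" means a run of consecutive integer values regardless of their position in the list, which the interviewer may or may not have intended. The function name `longest_consecutive` is my own, and this is not the answer I gave in the interview.

```python
def longest_consecutive(nums):
    """Return the longest run of consecutive integers in nums as a sorted list.

    Assumption: positions in the input do not matter, only the values.
    Ties are broken in favour of the run with the smaller starting value.
    """
    values = set(nums)  # set gives O(1) membership checks and drops duplicates
    best_start, best_len = None, 0

    for v in values:
        # Only start counting from the beginning of a run
        if v - 1 in values:
            continue
        length = 1
        while v + length in values:
            length += 1
        if length > best_len or (length == best_len and v < best_start):
            best_start, best_len = v, length

    if best_start is None:
        return []
    return list(range(best_start, best_start + best_len))


print(longest_consecutive([100, 4, 200, 1, 3, 2]))  # [1, 2, 3, 4]
```

Because each run is only walked from its first element, the work stays roughly linear in the size of the list, which is usually the point interviewers look for over a sort-based answer.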


**3. Technical Round (1 hr 30 mins)**

    1. About the skill set used in my recent project
    2. Difference between RDBMS, data warehouse, and NoSQL
    3. EMR questions
    4. Difference between lambdas and functions in Python
    5. Python memory management
    6. Iterators, generators, and decorators
    7. Python basics such as list comprehensions
    8. Redshift performance and how to optimize long-running queries
    9. Repartition vs coalesce in PySpark
    10. Application / job / stage / task
    11. How to identify long-running tasks and how to approach them
    12. Data skewness
    13. What are the ways we can store data
    14. Parquet vs Avro and when to use what
    15. Star schema vs snowflake schema
    16. Snowflake data warehouse advantages
    17. SQL concepts
    18. Lambda configurations in AWS
    19. S3 storage related questions
    20. Lambda vs EMR vs Glue
    21. Glue ETL related questions
    22. 2 SQL questions, such as sales of multiple employees in each month, and how to solve them with and without window functions
    23. Write the same code in PySpark, but read the data from a CSV or text file in an S3 location
    24. Python question: read the data from a text file and sort it by name; if the names are the same, sort by age.
```
"((john - 19) (john - 23) (boss - 30) (krishna - 40))"

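# My solution given in the interview (indentation restored):

input_file = "((john - 19) (john - 23) (boss - 30) (krishna - 40))"

input_dic = {}

for i in input_file.split(') ('):
    # Strip the outer parentheses left on the first and last records
    i = i.replace('(', '').replace(')', '')

    x, y = i.split(' - ')

    # Group ages by name; store ages as int so they sort numerically
    input_dic[x] = input_dic.get(x, []) + [int(y)]

# Names in alphabetical order
z = sorted(input_dic)

ans = "("

for i in z:
    # Within the same name, sort by age
    for j in sorted(input_dic[i]):
        ans += '(' + i + ' - ' + str(j) + ') '

# Drop the trailing space and close the outer parenthesis
ans = ans.strip() + ')'

print(ans)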
```
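Question 6 in this round (iterators, generators, and decorators) is easier to explain with a small example. The sketch below is only an illustration; the names `Countdown`, `countdown`, `count_calls`, and `square` are made up and were not part of the interview.

```python
import functools


class Countdown:
    """Iterator: implements __iter__ and __next__ explicitly."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """Generator: yields the same values; state lives in the paused function."""
    while start > 0:
        yield start
        start -= 1


def count_calls(func):
    """Decorator: wraps func and counts how many times it has been called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def square(x):
    return x * x


print(list(Countdown(3)))       # [3, 2, 1]
print(list(countdown(3)))       # [3, 2, 1]
print(square(4), square.calls)  # 16 1
```

The iterator and the generator produce the same values; the generator is just the shorter way to write it, since `yield` handles the bookkeeping that `__next__` does by hand.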

**4. Manager Round (30 to 45 mins)**

    1. DE roles and responsibilities
    2. Data lake vs data warehouse
    3. How CI/CD was implemented in my project
    4. My daily activities, with follow-up questions to drill down
    5. How AWS service changes are deployed in your project
    6. How do you resolve conflicts between junior team members to deliver the product on time?
    7. SQL question to find the cumulative sum using 2 methods
    8. Implement the same in PySpark for the above SQL question
    9. Performance tuning related questions on Spark
    10. AWS CloudFormation
    11. AWS Aurora questions
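For question 7, the two methods are not spelled out above; a common pair is a window function and a self-join. The sketch below runs both through Python's built-in `sqlite3` module so it can be tried without a database server. The `sales` table, its columns, and the sample rows are made up for illustration, and the window-function query assumes the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# In-memory database with a hypothetical sales table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 100), (2, 50), (3, 70)],
)

# Method 1: window function (running SUM ordered by month)
window_sql = """
    SELECT sale_month, amount,
           SUM(amount) OVER (ORDER BY sale_month) AS running_total
    FROM sales
    ORDER BY sale_month
"""

# Method 2: self-join, summing every row up to and including the current month
self_join_sql = """
    SELECT a.sale_month, a.amount, SUM(b.amount) AS running_total
    FROM sales a
    JOIN sales b ON b.sale_month <= a.sale_month
    GROUP BY a.sale_month, a.amount
    ORDER BY a.sale_month
"""

window_rows = conn.execute(window_sql).fetchall()
join_rows = conn.execute(self_join_sql).fetchall()

print(window_rows)  # [(1, 100, 100), (2, 50, 150), (3, 70, 220)]
print(join_rows)    # [(1, 100, 100), (2, 50, 150), (3, 70, 220)]
```

The self-join compares every row with every earlier row, so it gets slow on large tables; that trade-off is a natural follow-up to mention when asked for a second method.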
