Data Scientist interview preparation


Q1. What is the difference between AI, Data Science, ML, and DL?

Ans: Artificial intelligence (AI) is the ability of a computer, or a robot controlled by a computer, to do tasks that are usually done by humans because they require human intelligence and discernment.
AI is commonly divided into two categories:

General AI - planning, decision making, identifying objects, recognizing sounds, social and business transactions
Applied AI - driverless (autonomous) cars, or machines that trade stocks intelligently

Machine Learning: As the name suggests, it gives the computer the ability that makes it more similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places than one would expect. It is classified into three types:

supervised
unsupervised
reinforcement

Data Science: Data science draws on many tools, techniques, and algorithms culled from these fields, plus
others, to handle big data.
The goal of data science, somewhat similar to machine learning, is to make accurate predictions and to
automate and perform transactions in real time, such as purchasing internet traffic or automatically
generating content.

Deep Learning: It is a technique for implementing ML that uses neural networks with many layers.
Classical ML typically maps hand-chosen input features to the desired output, whereas DL learns the
features themselves from raw input.
In ML, we can easily classify a flower based on its measured features. Suppose instead you want a machine
to look at an image and determine what it represents to the human eye, whether a face, flower, landscape,
truck, building, etc. This is the kind of task where DL is used, because the relevant features are hard to
specify by hand.

Q2. Difference between Supervised, Unsupervised and Reinforcement learning?

Ans: Supervised learning
In a supervised learning model, the algorithm learns from a labeled dataset in order to generate reasonable
predictions for the response to new data (forecasting the outcome of new data). The two main tasks are
• Regression
• Classification

Unsupervised learning
An unsupervised model, in contrast, is given unlabelled data that the algorithm tries to make sense of by
extracting features, co-occurrences, and underlying patterns on its own. We use unsupervised learning for
• Clustering
• Anomaly detection
• Association
• Autoencoders

Reinforcement Learning
In reinforcement learning there are no labeled answers. A learning agent interacts with an environment,
tries different possible actions, and uses the reward feedback it receives to arrive at the best possible
solution.
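
To make the contrast between the first two settings concrete, here is a minimal sketch in plain Python. The data, the label names, and the helper functions (nearest_centroid_fit, nearest_centroid_predict, two_means) are hypothetical and chosen only for illustration. The supervised part uses the labels to learn one mean per class; the unsupervised part sees only the numbers and groups them with a simple 1-D k-means (k = 2). Reinforcement learning is not covered by this sketch.

```python
def nearest_centroid_fit(xs, labels):
    """Supervised: use the labels to compute one mean per class."""
    sums, counts = {}, {}
    for x, label in zip(xs, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def nearest_centroid_predict(centroids, x):
    """Predict the label whose class mean is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def two_means(xs, n_iters=10):
    """Unsupervised: 1-D k-means with k=2; no labels are used."""
    c0, c1 = min(xs), max(xs)  # simple deterministic initialization
    for _ in range(n_iters):
        # Assign each point to the nearer center, then recompute the centers
        group0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        group1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(group0) / len(group0)
        c1 = sum(group1) / len(group1)
    return c0, c1


# Hypothetical toy data: two well-separated groups of values
xs = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = nearest_centroid_fit(xs, labels)     # needs labels
print(nearest_centroid_predict(centroids, 2.5))  # classify a new point
print(two_means(xs))                             # cluster centers found without labels
```

On this toy data both approaches find the same two groups, but only the supervised model can name them, because only it was given labels.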

Q3. What is Linear Regression?

Ans: Linear Regression establishes a relationship between a dependent variable (Y) and one or more
independent variables (X) by finding the best-fitting straight line.
The equation for the linear model with one independent variable is Y = mX + c, where m is the slope and c is the intercept.
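
As an illustration of "best fit", the sketch below estimates m and c for a single independent variable using the standard least-squares formulas (slope = covariance of X and Y divided by variance of X). The function name fit_line and the data points are hypothetical and used only for this example.

```python
def fit_line(xs, ys):
    """Return (m, c) minimizing the sum of squared errors of y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c


# Hypothetical data lying exactly on Y = 2X + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(m, c)
```

Because the sample points lie exactly on a line, the fit recovers slope 2 and intercept 1. With noisy data the same formulas return the line with the smallest total squared error.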

Q4. OLS Stats Model (Ordinary Least Square)

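Ans: OLS is a statistical model that helps us identify the more significant features that can have an
influence on the output. In Python, an OLS model is fit with statsmodels as:
import statsmodels.formula.api as smf
lm = smf.ols(formula='Sales ~ am + constant', data=data).fit()
lm.conf_int()   # confidence intervals for the coefficients
lm.summary()    # full regression report, including p-values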

Q6. What is L1 Regularization (L1 = Lasso)?

Ans: The main objective of building a model (on training data) is to make sure it fits the data properly and
reduces the loss. Sometimes a trained model fits the training data well but performs poorly on data it has
not seen (test data). This is called overfitting. Regularization is a technique for reducing
overfitting.
Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds the “absolute value of
magnitude” of the coefficients as a penalty term to the loss function.
Lasso shrinks the coefficients of less important features to zero, thus removing some features altogether. So
it works well for feature selection when we have a huge number of features.
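
The shrink-to-zero behaviour can be seen in a small sketch. The code below minimizes (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|) by coordinate descent with soft-thresholding, one common way to solve the Lasso problem. It is an illustrative sketch, not a production solver: it assumes no intercept, and the function names, the data, and the choice alpha = 0.5 are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; values within [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * sum((y - Xw)^2) + alpha * sum(|w|), no intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, alpha) / z
    return w


# Hypothetical data: y depends strongly on feature 0 and only weakly on feature 1
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.9, -3.1, 0.0, 2.9, 6.1]

w_ols = lasso_coordinate_descent(X, y, alpha=0.0)    # no penalty: ordinary least squares
w_lasso = lasso_coordinate_descent(X, y, alpha=0.5)  # L1 penalty applied
print(w_ols)
print(w_lasso)
```

With alpha = 0 the fit is approximately [3.0, 0.1], keeping both features. With alpha = 0.5 the weak feature's coefficient becomes exactly 0 and the strong one is shrunk to about 2.75, which is the feature-selection effect described above.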
