Data Scientist Interview Questions India 2025

Question 1

Explain the bias-variance tradeoff.

Accepted Answer

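High bias = underfitting (the model is too simple and misses real patterns); high variance = overfitting (the model memorises noise in the training data). The goal is the level of model complexity that minimises error on unseen data. Mention how the levers shift the balance: regularisation and simpler models lower variance at the cost of some bias, while more training data lowers variance without adding bias.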
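One way to see the tradeoff is to fit polynomials of increasing degree to the same noisy data and compare training and test error. The sketch below assumes NumPy and uses a synthetic sine-plus-noise dataset. The degrees (1, 3, 9) are arbitrary choices for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a sine curve plus Gaussian noise.
rng = np.random.default_rng(0)


def make_data(n):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))


x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

results = {}  # degree -> (train MSE, test MSE)
for degree in (1, 3, 9):
    coefs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    results[degree] = (
        mse(y_train, np.polyval(coefs, x_train)),
        mse(y_test, np.polyval(coefs, x_test)),
    )
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, test MSE {results[degree][1]:.3f}")
```

Degree 1 should show high error on both sets, which is the bias side. Each higher-degree fit contains the lower-degree ones, so training error cannot rise as the degree grows. A widening gap between training and test error is the variance side of the tradeoff.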

Question 2

How do you handle an imbalanced dataset (e.g. fraud detection)?

Accepted Answer

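Don't rely on accuracy: when fraud is under 1% of transactions, a model that never flags fraud still scores over 99%. Use precision, recall, F1, ROC-AUC, and PR-AUC (more informative than ROC-AUC when positives are rare). Techniques: resampling (SMOTE, undersampling) applied to the training data only, class weights, framing the problem as anomaly detection, and choosing a decision threshold that reflects the business cost of false negatives versus false positives.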
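Here is a short plain-Python sketch of two of these points, using made-up numbers. The first part shows how accuracy hides a useless model. The second part defines a hypothetical helper, pick_threshold, that picks the score cutoff minimising total misclassification cost. The cost values are assumptions you would replace with real business figures.

```python
# Part 1: the accuracy trap (hypothetical data: 95 legitimate, 5 fraud).
labels = [0] * 95 + [1] * 5
never_flags = [0] * 100  # a "model" that always predicts legitimate

accuracy = sum(p == y for p, y in zip(never_flags, labels)) / len(labels)
recall = sum(p == 1 and y == 1 for p, y in zip(never_flags, labels)) / sum(labels)
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.95, recall=0.00


# Part 2: choose a threshold by business cost (hypothetical helper).
def pick_threshold(scores, labels, cost_fn, cost_fp):
    """Return (threshold, total_cost) minimising cost_fn * FN + cost_fp * FP.

    A case is flagged as fraud when its score >= threshold.
    """
    best = None
    for t in sorted(set(scores)) + [float("inf")]:  # inf = flag nothing
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        cost = cost_fn * fn + cost_fp * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


scores = [0.05, 0.10, 0.20, 0.30, 0.45, 0.55, 0.40, 0.70]
fraud = [0, 0, 0, 0, 0, 0, 1, 1]
print(pick_threshold(scores, fraud, cost_fn=10, cost_fp=1))  # (0.4, 2)
print(pick_threshold(scores, fraud, cost_fn=1, cost_fp=1))   # (0.7, 1)
```

When a missed fraud costs 10 times a false alarm, the cheapest cutoff drops to 0.40 so that every fraud is caught. When both errors cost the same, the cutoff moves up to 0.70.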

Question 3

Difference between supervised, unsupervised and reinforcement learning?

Accepted Answer

Supervised uses labelled data (regression, classification). Unsupervised finds structure in unlabelled data (clustering, PCA). Reinforcement learns via reward signals from an environment. Give one real example of each.

Question 4

What is regularisation? Compare L1 and L2.

Accepted Answer

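Regularisation adds a penalty on large weights to the loss function to reduce overfitting. L1 (Lasso) penalises the sum of absolute weights and can drive some weights exactly to zero, effectively performing feature selection. L2 (Ridge) penalises the sum of squared weights and shrinks all weights smoothly towards zero without eliminating them. ElasticNet blends both penalties.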
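A quick way to see the difference is to fit both penalties to the same data and count the coefficients that are exactly zero. In this data only two of the ten features matter. The sketch assumes scikit-learn and NumPy. The data and the alpha values are illustrative assumptions, not tuned settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 10 features, but only the first two affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.array([3.0, -2.0] + [0.0] * 8)
y = X @ true_w + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: sum of |w|
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: sum of w**2

lasso_zeros = int(np.sum(lasso.coef_ == 0))
ridge_zeros = int(np.sum(ridge.coef_ == 0))
print("Lasso:", np.round(lasso.coef_, 2))
print("Ridge:", np.round(ridge.coef_, 2))
print("exact zeros -> Lasso:", lasso_zeros, "Ridge:", ridge_zeros)
```

Ridge keeps every coefficient small but non-zero. Lasso zeroes out the irrelevant ones. scikit-learn's ElasticNet class, controlled by its l1_ratio parameter, sits between the two.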

Question 5

How do you evaluate a regression model vs a classification model?

Accepted Answer

Regression: RMSE, MAE, R². Classification: accuracy, precision, recall, F1, ROC-AUC, confusion matrix. Always tie the metric to the business decision the model supports.
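To make the formulas concrete, here is a plain-Python sketch of the metrics listed above. The function names and toy inputs are hypothetical. In practice, scikit-learn's sklearn.metrics module provides tested versions (mean_absolute_error, r2_score, precision_score, recall_score, f1_score, confusion_matrix).

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, RMSE and R² for paired lists of true and predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # assumes y_true is not constant
    return {"MAE": mae, "RMSE": rmse, "R2": 1 - ss_res / ss_tot}


def classification_metrics(y_true, y_pred):
    """Confusion matrix, accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "confusion": [[tn, fp], [fn, tp]],  # rows = actual 0/1, columns = predicted 0/1
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "F1": f1,
    }


reg = regression_metrics([3.0, 5.0, 2.0, 7.0], [2.5, 5.0, 3.0, 8.0])
clf = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 1, 1, 0])
print(reg)
print(clf)
```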

Question 6

Explain p-value and statistical significance in simple terms.

Accepted Answer

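The p-value is the probability of seeing a result at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true. p < 0.05 is the common threshold. Warn against p-hacking, and distinguish statistical significance from practical significance: with a large enough sample, even a tiny, commercially irrelevant effect can be significant.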
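A permutation test computes this definition directly. Under "no real difference", the group labels are interchangeable, so the test shuffles them many times. It then counts how often the shuffled difference in means is at least as large as the observed one. Below is a minimal plain-Python sketch; the function name and sample data are illustrative.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel at random, as the null hypothesis allows
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed split itself and keep p above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


print(permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5]))  # small p-value
print(permutation_p_value([1, 2, 3], [1, 2, 3]))                    # 1.0: no difference at all
```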

Question 7

Walk me through how you'd build a customer-churn model.

Accepted Answer

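Define churn precisely (for example, no activity within a fixed window), gather features (usage, tenure, support tickets), and guard against leakage by using only information available before the prediction date. Split into train/validation/test (by time where possible), build a simple baseline first, then try logistic regression and gradient boosting. Evaluate on recall alongside precision, since every retention offer has a cost, and translate the predictions into a retention action.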

Question 8

What is overfitting and how do you detect and prevent it?

Accepted Answer

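The model performs well on training data but poorly on unseen data. Detect it through a large gap between training and validation scores, diverging learning curves, or cross-validation. Prevent it with regularisation, early stopping, dropout (for neural networks), more data, or a simpler model.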
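Detecting the gap can be as simple as scoring the same model on training and validation data. The sketch below uses scikit-learn and synthetic data with about 10% label noise (an assumption). It compares an unconstrained decision tree with one limited to depth 3.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y randomises ~10% of labels so there is noise to memorise.
X, y = make_classification(n_samples=1000, n_features=10, flip_y=0.1, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

gaps = {}
for max_depth in (None, 3):  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    val_acc = tree.score(X_val, y_val)
    gaps[max_depth] = train_acc - val_acc  # large positive gap = overfitting signal
    print(f"max_depth={max_depth}: train {train_acc:.3f}, validation {val_acc:.3f}, gap {gaps[max_depth]:.3f}")
```

The unconstrained tree should fit the training set almost perfectly, so its gap is large. Capping the depth is one form of the "simpler model" remedy.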

Question 9

Explain the difference between bagging and boosting.

Accepted Answer

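Bagging trains models independently on bootstrapped samples (so they can run in parallel) and averages their predictions or takes a majority vote. Random Forest is the classic example; it mainly reduces variance. Boosting trains models sequentially, each new model focusing on the errors the ensemble has made so far (XGBoost, LightGBM); it mainly reduces bias.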
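Both ideas fit in a few lines with scikit-learn's DecisionTreeRegressor as the base learner. This is a from-scratch sketch on synthetic data. It does not reproduce how Random Forest (which also samples features at each split) or XGBoost/LightGBM are actually implemented. The function names and settings are illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic 1-D regression data (assumption): a noisy sine curve.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(300, 1))
y_train = np.sin(2 * np.pi * X_train[:, 0]) + rng.normal(scale=0.3, size=300)
X_test = rng.uniform(0, 1, size=(500, 1))
y_test = np.sin(2 * np.pi * X_test[:, 0]) + rng.normal(scale=0.3, size=500)


def bagging_predict(X, y, X_new, n_models=50, seed=0):
    """Bagging: deep trees fit independently on bootstrap samples, then averaged."""
    boot_rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        idx = boot_rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])  # low bias, high variance
        preds.append(tree.predict(X_new))
    return np.mean(preds, axis=0)  # averaging cancels much of the variance


def boosting_predict(X, y, X_new, n_stages=100, learning_rate=0.1):
    """Boosting with squared loss: each stump is fit to the current residuals."""
    pred_train = np.full(len(y), y.mean())
    pred_new = np.full(len(X_new), y.mean())
    for _ in range(n_stages):
        stump = DecisionTreeRegressor(max_depth=1).fit(X, y - pred_train)  # high bias
        pred_train += learning_rate * stump.predict(X)
        pred_new += learning_rate * stump.predict(X_new)  # sequential corrections cut bias
    return pred_new


def mse(pred):
    return float(np.mean((y_test - pred) ** 2))


single_tree_mse = mse(DecisionTreeRegressor().fit(X_train, y_train).predict(X_test))
bagging_mse = mse(bagging_predict(X_train, y_train, X_test))
boosting_mse = mse(boosting_predict(X_train, y_train, X_test))
mean_only_mse = mse(np.full(len(y_test), y_train.mean()))
print(f"single deep tree {single_tree_mse:.3f} | bagging {bagging_mse:.3f}")
print(f"predict the mean {mean_only_mse:.3f} | boosting {boosting_mse:.3f}")
```

Bagging improves on a single high-variance tree. Boosting improves on a very high-bias starting point (the mean).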

Question 10

How would you design an A/B test and decide the winner?

Accepted Answer

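State the hypothesis and a single primary metric. Compute the required sample size from the baseline rate, the minimum detectable effect, the significance level and the power, then randomise users into groups. Run until the pre-computed sample size is reached; stopping as soon as the result first looks significant is peeking and inflates false positives. Then check the p-value and the confidence interval for the effect before rolling out.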
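The sample-size step and the final check can be sketched with the standard normal approximation for two proportions, using only Python's standard library. Per arm, n = (z_{1-α/2} + z_{1-β})² · (p1(1 − p1) + p2(1 − p2)) / (p2 − p1)². The function names, the 10% baseline, the 2-point absolute lift and the conversion counts are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline, mde, alpha=0.05, power=0.8):
    """Users needed per arm to detect an absolute lift `mde` with a two-sided test."""
    p1, p2 = p_baseline, p_baseline + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


def two_proportion_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test for a difference in conversion rates, plus a CI for the lift."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # unpooled, for the CI
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_diff
    return p_value, (p_b - p_a - margin, p_b - p_a + margin)


n = sample_size_per_arm(0.10, 0.02)  # fix this before the test starts
p_value, ci = two_proportion_test(400, 4000, 480, 4000)  # evaluate once n is reached
print(f"users per arm: {n}")
print(f"p-value {p_value:.4f}, 95% CI for lift ({ci[0]:.4f}, {ci[1]:.4f})")
```

Reporting the interval alongside the p-value shows whether even the low end of the plausible lift is worth shipping.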

Question 11

Tell me about a model you shipped to production. What was the impact?

Accepted Answer

Use the STAR format (Situation, Task, Action, Result). Cover the business problem, your approach, how the model was deployed and monitored, and the quantified outcome (revenue, cost saving, accuracy lift). Mention what you'd improve.

Question 12

Describe a time stakeholders rejected your analysis. What did you do?

Accepted Answer

Show you listened, found the real objection (often trust or framing), re-presented with business language and clear visuals, and built buy-in. Avoid sounding defensive.

Question 13

How do you keep your data-science skills current?

Accepted Answer

Mention Kaggle, papers/newsletters, reproducing techniques on real data, and learning the business domain — not just chasing new algorithms.

Question 14

Your model's accuracy dropped suddenly in production. How do you debug?

Accepted Answer

Check for data drift, broken pipelines or features, delayed labels, and seasonality. Compare input feature distributions between training and live data, validate the feature store, and roll back to the previous model if needed while you investigate.
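One common way to compare training and live input distributions is the Population Stability Index (PSI), computed per feature. The sketch below assumes NumPy, bins on the training deciles, and uses synthetic data. The often-quoted rule of thumb (below 0.1 stable, above 0.25 a significant shift) is an industry convention, not a statistical test. scipy.stats.ks_2samp is an alternative for continuous features.

```python
import numpy as np


def population_stability_index(train_values, live_values, n_bins=10, eps=1e-6):
    """PSI between a feature's training and live distributions, using training-quantile bins."""
    # Inner bin edges at the training quantiles; the first and last bins are open-ended.
    inner_edges = np.quantile(train_values, np.linspace(0, 1, n_bins + 1)[1:-1])
    train_bins = np.searchsorted(inner_edges, train_values, side="right")
    live_bins = np.searchsorted(inner_edges, live_values, side="right")
    train_frac = np.bincount(train_bins, minlength=n_bins) / len(train_values)
    live_frac = np.bincount(live_bins, minlength=n_bins) / len(live_values)
    # Clip to avoid log(0) when a bin is empty.
    train_frac = np.clip(train_frac, eps, None)
    live_frac = np.clip(live_frac, eps, None)
    return float(np.sum((live_frac - train_frac) * np.log(live_frac / train_frac)))


rng = np.random.default_rng(0)
train_feature = rng.normal(0.0, 1.0, 10_000)
live_same = rng.normal(0.0, 1.0, 10_000)     # no drift
live_shifted = rng.normal(1.0, 1.0, 10_000)  # mean shifted by one standard deviation

psi_same = population_stability_index(train_feature, live_same)
psi_shifted = population_stability_index(train_feature, live_shifted)
print(f"PSI without drift: {psi_same:.3f}, with drift: {psi_shifted:.3f}")
```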

Question 15

A stakeholder wants a model deployed in 3 days but data is messy. What do you do?

Accepted Answer

Set expectations, ship a simple, explainable baseline that delivers value, document data-quality risks, and plan an iteration. Communicate the tradeoff between speed and reliability clearly.

Question 16

You discover a feature is leaking the target. What now?

Accepted Answer

Stop, remove the leaking feature, re-evaluate honestly, and explain why the earlier 'great' metric was misleading. Integrity over impressive numbers.

Question 17

What are your salary expectations as a data scientist?

Accepted Answer

Deflect first: 'I'd like to understand the scope and team before numbers.' Anchor on market data — entry ₹8–14 LPA, mid ₹18–30 LPA, senior ₹35 LPA+ in India, varying by city and company tier.

Question 18

We can match your current CTC but not more. How do you respond?

Accepted Answer

Quantify the value you add and cite market benchmarks from Glassdoor/AmbitionBox. If base is fixed, negotiate joining bonus, ESOPs, or an early review at 6 months.

Question 19

What does the data infrastructure and ML stack look like here?

Accepted Answer

Shows you care about how models actually reach production. The answer also reveals the team's maturity: a feature store and MLOps pipeline, or notebooks on laptops.

Question 20

How is success measured for this role in the first year?

Accepted Answer

Surfaces whether the role is research, analytics, or production ML, and aligns expectations early.

Question 21

How do data scientists and engineers collaborate here?

Accepted Answer

Tells you whether you'll be blocked on deployment and how cross-functional the team really is.
