mrsaeeddev/ai-interview-questions

AI INTERVIEW QUESTIONS

Real-World AI Interview Questions for You



General Requirements for AI-Related Roles:

DATA ENGINEERS:

Requirements:

  • Programming Experience in a language like Python, Go, etc.
  • Solid Knowledge of Operating Systems
  • Understanding of ETL pipelines
  • Heavy, In-Depth Database Knowledge – SQL and NoSQL
  • Data Warehousing and Big Data Tools – Hadoop, MapReduce, Hive, Pig, Apache Spark, Kafka
  • Basic Machine Learning Familiarity

DATA SCIENTISTS:

Requirements:

  • Programming Experience in a language like Python, R, etc.
  • Statistics (familiarity with statistical tests, distributions, maximum likelihood estimators, etc.)
  • Basic Machine Learning Skills
  • Multivariable Calculus & Linear Algebra
  • In-Depth knowledge of SQL
  • Data Wrangling and EDA Skills
  • Data Visualization and Communication

MACHINE LEARNING ENGINEERS:

Requirements:

  • Programming Experience in a language like Python, R, Java, etc.
  • Advanced Probability and Statistics Knowledge
  • Data Modeling & Evaluation
  • Advanced Machine Learning
  • Software Engineering and System Design

DATA SCIENCE QUESTIONS

Q. How do you find outliers in data?
A. i. If you know the range of valid values in advance, set threshold values and keep only the data that lies within them.
ii. If you don't know the valid range in advance, apply clustering and drop the points that do not belong to any cluster. The same idea works with other models such as Linear Regression or SVM: fit the model and treat the points it fits poorly as candidate outliers.
iii. Scatter plots and box plots help you visualize outliers, so use them for the visual inspection part.
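
The threshold idea in (i) and the box plot in (iii) can be combined when no threshold is known up front: a box plot conventionally draws its whiskers at 1.5 times the interquartile range (IQR), and the same rule can serve as a filter. The sketch below is only one possible approach, not a method prescribed by this repo; the function name `find_outliers`, the factor `k`, and the sample readings are all assumptions made for illustration.

```python
from statistics import quantiles


def find_outliers(values, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample data with one obviously extreme reading.
readings = [10, 12, 11, 13, 12, 11, 95, 10, 12, 13]
outliers = find_outliers(readings)
cleaned = [v for v in readings if v not in outliers]
print(outliers)  # prints [95]
```

A larger `k` makes the filter more tolerant; whether a flagged point is a real error or a rare but valid observation still needs a domain check.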

Q. If the dataset you are using is large and you run into runtime issues, how would you handle it?
A. Different approaches:

  • Historical Data:
    • Large Datasets: Load the data in batches (see this)
    • Small Datasets: You are good to go with Pandas and NumPy as usual
  • Realtime Data: You need to look into big data solutions like Kafka, Hadoop, etc.
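
As a rough illustration of loading data in batches, the sketch below streams a CSV file a few rows at a time and keeps only a running total in memory. It uses the standard library so that it runs anywhere; the helper names `iter_batches` and `column_mean`, the column name `value`, and the in-memory sample file are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_batches(rows, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def column_mean(csv_file, column, batch_size=1000):
    """Compute the mean of one numeric column without loading the whole file."""
    reader = csv.DictReader(csv_file)
    total, count = 0.0, 0
    for batch in iter_batches(reader, batch_size):
        total += sum(float(row[column]) for row in batch)
        count += len(batch)
    return total / count if count else float("nan")


# A tiny in-memory stand-in for a file that would not fit in RAM.
sample = io.StringIO("value\n1\n2\n3\n4\n5\n")
print(column_mean(sample, "value", batch_size=2))  # prints 3.0
```

With Pandas, the same pattern is available through the `chunksize` parameter of `pandas.read_csv`, which returns an iterator of DataFrames instead of one large DataFrame.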

Q. Why are CatBoost and XGBoost better than gradient boosting in sklearn?
A. See here
See here

Q. How does Gradient Boosting work?
A. See here
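
For a quick intuition alongside the linked answer, here is a minimal sketch of gradient boosting for regression with squared error: start from the mean, then repeatedly fit a small model to the current residuals and add a scaled-down copy of it to the ensemble. It is a teaching toy on a single feature, not how sklearn, XGBoost, or CatBoost implement it; the names `fit_stump` and `fit_gradient_boosting` and the sample data are assumptions.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump on a single feature.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Return a predict(x) function built from n_rounds boosted stumps."""
    base = sum(ys) / len(ys)  # initial prediction: the mean target
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


# Made-up step-shaped data: low targets for small x, high targets for large x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
predict = fit_gradient_boosting(xs, ys)
print(round(predict(2), 2), round(predict(7), 2))
```

Each round removes only a fraction (`learning_rate`) of the remaining error, which is why many small steps are used instead of one large one.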

Q. How do hyperparameters in an algorithm work?
A. See here

Q. How does Linear Regression work?
A. See here
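
As a small companion to the linked answer, the sketch below fits a straight line to one feature with ordinary least squares, using the closed-form slope (covariance of x and y divided by the variance of x). The function name `fit_line` and the sample points are assumptions for illustration; real projects would normally use a library implementation.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimising the sum of squared errors.

    Assumes xs contains at least two distinct values.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # prints 2.0 1.0
```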