Lloyds Specific background concepts + associated courses
These seem to fall almost entirely within the Machine Learning Scientist track, using scikit-learn. For the statistics requirement ("relatively in-depth knowledge of statistics"), need to look up which statistics topics are typically used in data science / machine learning.
Statistics?
'Statistics Knowledge'
- (Chapter 9 in DS Essentials book from humble bundle, concepts, defs)
- Intro to Stats / Intro to Stats (Python)
- Exploratory Data Analysis, Hypothesis Testing, Sampling in Python
- Data Science Toolkit (1)(2) -> Statistical Thinking in Python (1)(2) -> Case Studies in Statistical Thinking
- Practicing Stats Interview Q's Python
- Intro to Regression Python
- Time Series Analysis in Python
- Bayesian Data Analysis in Python
Linear Algebra
Matrices, Vectors, Matrix-Vector Eqns, Eigenvalues / Eigenvectors, Principal Component Analysis
Cheat Sheet, Tutorial (Computer bookmarks), Linear Algebra in R (no python eq.?)
Modelling (Often ML, see ML Scientist, ML Fundamentals track)
Logistic Regression - (Intro to Linear Modelling Python), Supervised Learning Scikit, Intro to Regression Python, Intermediate Regression, Linear Classifiers in Python
SVM - Linear Classifiers in Python, Practice ML Interview Q's course
Random Forests - ML with tree-based models, ML for finance, Predictive analytics networked data R
XGBoost - Extreme gradient boosting with XGBoost, ML for finance
Time Series Modelling - manipulating time series data, analysing time series data, ML for time series data (other general courses)
Codility
Python Functions - Data science Toolkit 1-2, Writing Efficient Python Code, Python Functions Course, Interview Q's Python, Udemy Course (reddit)
Other
Messy Data - Dealing with missing data in Python, Feature Engineering ML NLP, etc on ML scientist track
Data Engineering
Software solutions that are tested (?)
Using Git for source control - Intro to Git, Version Control with Git
Manipulating data with SQL - Intro SQL, Intermediate SQL, Data Manip. w SQL (SQL fundamentals track)
Building ML Data Pipelines - Designing ML workflows in Python, Pyspark
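To make the "Manipulating data with SQL" item a bit more concrete, here is a minimal sketch of a join plus aggregation, run through Python's built-in sqlite3 module against an in-memory database. The tables and columns (customers, orders, amount) are made up for illustration and are not taken from any of the courses listed.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.5);
""")

# Join the two (hypothetical) tables, then aggregate orders per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
""").fetchall()
conn.close()

for name, n_orders, total in rows:
    print(name, n_orders, total)
```

The same SELECT / JOIN / GROUP BY pattern should carry over to PostgreSQL with little change, though dialect details can differ.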
Other Job mentioned concepts
Supervised Learning (ML), Unsupervised Learning (ML)
Other Jobs Reqs
Seen mentioned: SQL, Python, Tableau, PostgreSQL, Power BI
Some Maths / Stats concepts are common
Maths / Stats for Data Science (General)
Mentioned in some articles; the best book seems to be Essential Math for Data Science by Hadrien Jean
Calculus
- Derivatives
- Integrals and area under curves
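A rough numerical sketch of the two calculus items above, using only the standard library: a central-difference estimate of a derivative and a trapezoid-rule estimate of the area under a curve. The function names (derivative, trapezoid_area) and the step sizes are choices made for this example, not anything taken from the book.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def trapezoid_area(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using n trapezoids."""
    width = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * width)
    return total * width


# d/dx sin(x) at x = 0 is cos(0) = 1.
slope = derivative(math.sin, 0.0)

# The area under x^2 between 0 and 3 is 3^3 / 3 = 9.
area = trapezoid_area(lambda x: x ** 2, 0.0, 3.0)

print(slope, area)
```

Both results are approximations; shrinking h or raising n trades rounding error against truncation error.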
Stats and Probability
- Descriptive Statistics
- Probability Distributions
- Joint, Marginal, and Conditional Probability
- Expectation and Variance of random variables
- Bayesian Statistics
(not in book)
- Time Series
- Regression (linear / logistic)
- More on Variance Stdev etc, meaning, analysis
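The joint / marginal / conditional probability item is easier to remember with numbers attached. Below is a small sketch with an invented joint distribution over weather and train punctuality; the probabilities are made up for illustration. It also checks Bayes' rule, P(rain | late) = P(late | rain) P(rain) / P(late).

```python
# Hypothetical joint distribution P(weather, arrival); the values sum to 1.
joint = {
    ("rain", "late"): 0.15,
    ("rain", "on_time"): 0.15,
    ("dry", "late"): 0.07,
    ("dry", "on_time"): 0.63,
}


def marginal(index, value):
    """Sum the joint probabilities whose key has `value` at position `index`."""
    return sum(p for key, p in joint.items() if key[index] == value)


p_rain = marginal(0, "rain")   # marginal P(rain)
p_late = marginal(1, "late")   # marginal P(late)

# Conditional probability: P(late | rain) = P(rain, late) / P(rain).
p_late_given_rain = joint[("rain", "late")] / p_rain

# Bayes' rule turns the conditional around.
p_rain_given_late = p_late_given_rain * p_rain / p_late

print(p_rain, p_late, p_late_given_rain, p_rain_given_late)
```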
Linear Algebra
- Vectors
- Matrices
- Space Transformation, Linear Dependency etc.
- Linear Equations
- Eigenvalues / Eigenvectors
- Singular Value Decomposition
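A minimal sketch of the eigenvalue / eigenvector item for a 2x2 matrix, written in plain Python so the arithmetic stays visible. The matrix is an arbitrary example. In practice a library routine would be used instead; this only illustrates the defining equation A v = lambda v.

```python
import math

# Arbitrary symmetric example matrix.
A = [[2.0, 1.0],
     [1.0, 2.0]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


# For a 2x2 matrix the eigenvalues solve lambda^2 - trace*lambda + det = 0.
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]

# When the off-diagonal entry A[0][1] is non-zero, (A[0][1], lambda - A[0][0])
# is an eigenvector for lambda.
eigenvectors = [[A[0][1], lam - A[0][0]] for lam in eigenvalues]

for lam, v in zip(eigenvalues, eigenvectors):
    # A v and lambda * v should agree component by component.
    print(lam, v, matvec(A, v), [lam * x for x in v])
```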
Other Sources for Stats / Maths?
- Datacamp courses
- Lecture Notes (linalg, stats?)
- Humble bundle books / already downloaded books
- AS/A2 CGP Revision book
- Datacamp tutorials?
Datacamp Python Skill Assessments
The Python Programming skill assessment measures the following skills:
- Basic Python Syntax and Semantics: Understanding of fundamental Python syntax and semantics.
- Data Structures: Proficiency in using lists, dictionaries, sets, and tuples. Iterators, zips etc.
- Control Flow: Ability to use loops, conditionals, and comprehensions effectively.
- Functions: Competence in defining and using functions, including lambda functions, docstrings, *args, return types.
- Modules and Packages: Knowledge of importing and using Python modules and packages.
- Error Handling: Skills in handling exceptions and debugging code, try, except.
- File I/O: Understanding of reading from and writing to files.
- OOP concepts
- PEP8 / software engineering?
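A short revision sketch touching several items from the list above: a function with a docstring and *args, exception handling with try / except, a lambda used as a sort key, a comprehension, and zip. The function names (mean_of, safe_mean) are invented for this example.

```python
def mean_of(*args):
    """Return the arithmetic mean of the numbers passed as positional arguments.

    Raises ValueError if no numbers are given.
    """
    if not args:
        raise ValueError("mean_of() needs at least one number")
    return sum(args) / len(args)


def safe_mean(values):
    """Return the mean of `values`, or None if it cannot be computed."""
    try:
        return mean_of(*values)  # unpack the iterable into *args
    except (ValueError, TypeError):
        # ValueError: empty input; TypeError: non-numeric items.
        return None


# Lambda as a sort key: order words by length.
by_length = sorted(["pear", "fig", "banana"], key=lambda word: len(word))

# Dict comprehension and zip.
squares = {n: n * n for n in range(1, 4)}
pairs = list(zip("abc", [1, 2, 3]))

print(mean_of(1, 2, 3), safe_mean([]), by_length, squares, pairs)
```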
The Data Manipulation skill assessment measures the following skills:
- Data Cleaning: Ability to handle missing data, duplicates, and incorrect data types.
- Data Transformation: Skills in reshaping, aggregating, and summarizing data.
- Data Integration: Proficiency in merging and joining datasets.
- Data Filtering: Competence in selecting and filtering data based on conditions.
- Data Sorting: Understanding of sorting data by various criteria.
- Data Aggregation: Skills in grouping data and performing aggregate calculations.
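The skills above map fairly directly onto a handful of pandas calls. The sketch below assumes pandas is installed and uses two small invented tables (orders, customers); it removes duplicates, fixes a data type, drops missing values, filters, merges, groups and sorts.

```python
import pandas as pd

# Invented example data: one duplicate row, one missing amount, amounts as text.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "customer_id": [10, 11, 11, 10, 12, 11],
    "amount": ["20.5", "10", "10", None, "7.25", "2"],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["North", "South", "North"],
})

# Cleaning: remove duplicates, convert text to numbers, drop missing amounts.
clean = (
    orders
    .drop_duplicates()
    .assign(amount=lambda df: pd.to_numeric(df["amount"]))
    .dropna(subset=["amount"])
)

# Filtering: keep orders above an arbitrary threshold.
large = clean[clean["amount"] > 5]

# Integration: attach each customer's region with a left join.
merged = large.merge(customers, on="customer_id", how="left")

# Aggregation and sorting: total amount per region, largest first.
summary = (
    merged.groupby("region", as_index=False)["amount"]
    .sum()
    .sort_values("amount", ascending=False)
)
print(summary)
```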
The Statistics Fundamentals assessment measures the following skills:
- Descriptive Statistics: Understanding of measures of central tendency (mean, median, mode) and measures of dispersion (range, variance, standard deviation).
- Probability: Knowledge of basic probability concepts, including probability distributions and rules of probability.
- Inferential Statistics: Skills in hypothesis testing, confidence intervals, and p-values.
- Correlation and Regression: Understanding of correlation coefficients and simple linear regression.
- Data Visualization: Ability to interpret and create basic statistical plots such as histograms, box plots, and scatter plots.
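For the descriptive statistics item, the standard library's statistics module covers the basic measures. A small sketch with made-up scores follows; note the difference between population variance (divide by n) and sample variance (divide by n - 1), which is easy to mix up.

```python
import statistics

# Made-up data for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency.
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of dispersion.
value_range = max(scores) - min(scores)
pop_var = statistics.pvariance(scores)    # divides by n
pop_sd = statistics.pstdev(scores)
sample_var = statistics.variance(scores)  # divides by n - 1

print(mean, median, mode, value_range, pop_var, pop_sd, sample_var)
```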
The Data Visualization skill assessment typically covers the following topics:
- Basic Plotting: Understanding and creating basic plots such as line plots, bar charts, and histograms.
- Advanced Plotting: Creating more complex visualizations like scatter plots, box plots, and heatmaps.
- Customization: Customizing plots with titles, labels, legends, and colors.
- Libraries: Proficiency in using popular data visualization libraries such as Matplotlib, Seaborn, and Plotly.
- Interpretation: Interpreting and drawing insights from visual data representations.
The Machine Learning Fundamentals in Python skill assessment typically covers the following topics:
- Supervised Learning: Understanding and applying algorithms like linear regression, logistic regression, and decision trees.
- Unsupervised Learning: Techniques such as clustering and dimensionality reduction.
- Model Evaluation: Methods for evaluating model performance, including metrics like accuracy, precision, recall, and F1 score.
- Feature Engineering: Techniques for preparing and transforming data for machine learning models.
- Libraries: Proficiency in using machine learning libraries such as scikit-learn.
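The Model Evaluation metrics are simple enough to compute by hand, which helps when revising their definitions. A minimal sketch with invented binary labels and predictions follows; a library such as scikit-learn provides equivalents, but nothing beyond plain Python is assumed here.

```python
# Invented ground-truth labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)   # share of all predictions that are right
precision = tp / (tp + fp)          # share of predicted positives that are right
recall = tp / (tp + fn)             # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, recall, f1)
```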
The Importing and Cleaning Data with Python skill assessment typically covers the following topics:
- Data Importing: Techniques for importing data from various sources such as CSV files, Excel files, SQL databases, and web APIs.
- Data Cleaning: Methods for handling missing data, correcting data types, and dealing with duplicates.
- Data Transformation: Techniques for transforming data, including normalization, scaling, and encoding categorical variables.
- Data Wrangling: Using libraries like pandas to manipulate and reshape data.
- Libraries: Proficiency in using Python libraries such as pandas, NumPy, and others for data manipulation and cleaning.
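A standard-library sketch of the import / clean / transform steps above, to show what the pandas and NumPy helpers are doing underneath. The CSV content is invented, and filling a missing value with the column mean is just one possible choice.

```python
import csv
import io

# Invented CSV text standing in for a file on disk.
raw = """city,temp_c
Leeds,10
York,
Leeds,20
Hull,15
"""

# Importing: parse the CSV into a list of dicts (all values arrive as strings).
rows = list(csv.DictReader(io.StringIO(raw)))

# Cleaning: convert to float, treating empty strings as missing.
temps = [float(r["temp_c"]) if r["temp_c"] else None for r in rows]

# Fill missing values with the mean of the known ones.
known = [t for t in temps if t is not None]
fill_value = sum(known) / len(known)
temps = [fill_value if t is None else t for t in temps]

# Transformation: min-max scaling to the range [0, 1].
lo, hi = min(temps), max(temps)
scaled = [(t - lo) / (hi - lo) for t in temps]

# Transformation: one-hot encode the categorical column.
cities = sorted({r["city"] for r in rows})
one_hot = [[int(r["city"] == c) for c in cities] for r in rows]

print(temps, scaled, cities, one_hot)
```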
SQL Skill Assessments
Other Skill Assessments
The Exploratory Analysis Theory skill assessment covers the following topics:
- Data Understanding: Assessing your ability to understand and interpret data.
- Data Preparation: Evaluating your skills in preparing data for analysis.
- Exploratory Data Analysis (EDA): Testing your knowledge of techniques and methods used in EDA.
- Data Visualization: Measuring your ability to visualize data effectively.
The Data Storytelling skill assessment covers the following topics:
- Data Visualization: Your ability to create and interpret visual representations of data.
- Narrative Techniques: How well you can craft a compelling story using data.
- Audience Understanding: Assessing your skills in tailoring the story to different audiences.
- Communication Skills: Evaluating your effectiveness in communicating insights derived from data.
The Statistical Experimentation skill assessment covers the following topics:
- Experimental Design: Your understanding of how to design experiments to test hypotheses.
- Statistical Testing: Evaluating your knowledge of statistical tests and their applications.
- Data Collection: Assessing your skills in collecting data for experiments.
- Result Interpretation: Measuring your ability to interpret the results of statistical tests.
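One way to tie the experimentation topics together is an exact permutation test, sketched below with invented measurements for a control and a treatment group. It asks: if group labels were assigned at random, how often would the difference in means be at least as large as the one observed? The data, the variable names and the one-sided test are all choices made for this example.

```python
from itertools import combinations

# Invented measurements from a small two-group experiment.
control = [9, 10, 11, 13]
treatment = [12, 14, 15, 16]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


observed = mean(treatment) - mean(control)
pooled = control + treatment

# Enumerate every way of relabelling the pooled data into two groups of 4.
at_least_as_extreme = 0
n_splits = 0
for idx in combinations(range(len(pooled)), len(treatment)):
    chosen = set(idx)
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    n_splits += 1
    if mean(group_a) - mean(group_b) >= observed:
        at_least_as_extreme += 1

# One-sided p-value: share of relabellings at least as extreme as observed.
p_value = at_least_as_extreme / n_splits
print(observed, n_splits, p_value)
```

A small p-value here only says the observed difference would be unusual under random labelling; it says nothing about whether the effect is large enough to matter.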