Course Outline
Introduction to Machine Learning
- Types of machine learning – supervised vs unsupervised.
- From statistical learning to machine learning.
- The data mining workflow: business understanding, data preparation, modeling, deployment.
- Choosing the right algorithm for the task.
- Overfitting and the bias-variance tradeoff.
Overview of Python and ML Libraries
- Why use programming languages for ML.
- Choosing between R and Python.
- Python crash course and Jupyter Notebooks.
- Python libraries: pandas, NumPy, scikit-learn, matplotlib, seaborn.
Testing and Evaluating ML Algorithms
- Generalization, overfitting, and model validation.
- Evaluation strategies: holdout, cross-validation, bootstrapping.
- Metrics for regression: ME, MSE, RMSE, MAPE.
- Metrics for classification: accuracy, the confusion matrix, and handling unbalanced classes.
- Model performance visualization: profit curve, ROC curve, lift curve.
- Model selection and grid search for tuning.
Data Preparation
- Data import and storage in Python.
- Exploratory analysis and summary statistics.
- Handling missing values and outliers.
- Standardization, normalization, and transformation.
- Qualitative data recoding and data wrangling with pandas.
Classification Algorithms
- Binary vs multiclass classification.
- Logistic regression and discriminant functions.
- Naïve Bayes, k-nearest neighbors.
- Decision trees and tree ensembles: CART, random forests, bagging, boosting, XGBoost.
- Support Vector Machines and kernels.
- Ensemble learning techniques.
Regression and Numerical Prediction
- Least squares and variable selection.
- Regularization methods: L1, L2.
- Polynomial regression and nonlinear models.
- Regression trees and splines.
Unsupervised Learning
- Clustering techniques: k-means, k-medoids, hierarchical clustering, SOMs.
- Dimensionality reduction: PCA, factor analysis, SVD.
- Multidimensional scaling.
Text Mining
- Text preprocessing and tokenization.
- Bag-of-words, stemming, and lemmatization.
- Sentiment analysis and word frequency.
- Visualizing text data with word clouds.
Recommendation Systems
- User-based and item-based collaborative filtering.
- Designing and evaluating recommendation engines.
Association Pattern Mining
- Frequent itemsets and Apriori algorithm.
- Market basket analysis and lift ratio.
Outlier Detection
- Extreme value analysis.
- Distance-based and density-based methods.
- Outlier detection in high-dimensional data.
Machine Learning Case Study
- Understanding the business problem.
- Data preprocessing and feature engineering.
- Model selection and parameter tuning.
- Evaluation and presentation of findings.
- Deployment.
Summary and Next Steps
Requirements
- Basic understanding of statistics and linear algebra.
- Familiarity with data analysis or business intelligence concepts.
- Some exposure to programming (preferably Python or R) is recommended.
- Interest in learning applied machine learning for data-driven projects.
Audience
- Data analysts and scientists.
- Statisticians and research professionals.
- Developers and IT professionals exploring machine learning tools.
- Anyone involved in data science or predictive analytics projects.
Testimonials (3)
Even though I had to miss a day due to customer meetings, I feel I have a much clearer understanding of the processes and techniques used in Machine Learning and when I would use one approach over another. Our challenge now is to practice what we have learned and start to apply it to our problem domain.
Richard Blewett - Rock Solid Knowledge Ltd
Course - Machine Learning – Data science
I like that the training was focused on examples and coding. I thought it was impossible to pack so much content into three days of training, but I was wrong. The training covered many topics and everything was done in a very detailed manner (especially the tuning of model parameters: I didn't expect that there would be time for this, and I was greatly surprised).
Bartosz Rosiek - GE Medical Systems Polska Sp. Zoo
Course - Machine Learning – Data science
It shows many methods with pre-prepared scripts; the materials are very nicely prepared and easy to trace back.