Course Outline

Introduction

This section gives a broad overview of when to apply machine learning, what to consider before doing so, and what it implies, including its advantages and disadvantages. Topics include data types (structured, unstructured, static, streamed), data validity and volume, data-driven versus user-driven analytics, statistical models compared with machine learning models, the challenges of unsupervised learning, the bias-variance trade-off, iteration and evaluation methods, cross-validation strategies, and the distinctions between supervised, unsupervised, and reinforcement learning.

MAJOR TOPICS

1. Understanding Naive Bayes

  • Fundamental concepts of Bayesian methods
  • Probability basics
  • Joint probability
  • Conditional probability and Bayes' theorem
  • The Naive Bayes algorithm
  • Naive Bayes classification
  • The Laplace estimator
  • Applying numeric features with Naive Bayes
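
As an illustration of the Laplace estimator listed above, here is a minimal sketch in plain Python. The function name `laplace_likelihood`, the toy word list, and the assumed vocabulary size of 3 are hypothetical and are not taken from the course material.

```python
from collections import Counter

def laplace_likelihood(feature_values, value, n_categories, alpha=1):
    """Estimate P(value | class) with add-alpha (Laplace) smoothing.

    feature_values: observed values of one categorical feature within one class.
    n_categories:   number of distinct values the feature can take (assumed known).
    """
    counts = Counter(feature_values)
    return (counts[value] + alpha) / (len(feature_values) + alpha * n_categories)

# Hypothetical toy data: words seen in messages of one class.
spam_words = ["offer", "offer", "free", "offer"]

# "meeting" never occurs in this class, yet its estimate stays above zero.
p_unseen = laplace_likelihood(spam_words, "meeting", n_categories=3)
p_offer = laplace_likelihood(spam_words, "offer", n_categories=3)
print(p_unseen, p_offer)
```

Without smoothing, a single unseen value would force the whole Naive Bayes product for that class to zero, which is the problem the estimator is meant to avoid.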

2. Understanding Decision Trees

  • Divide and conquer approach
  • The C5.0 decision tree algorithm
  • Selecting the optimal split
  • Pruning decision trees
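
One common way to compare candidate splits is entropy-based information gain. The sketch below assumes that criterion for illustration only; the exact measure covered in the course, and the helper names `entropy` and `information_gain`, are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, partitions):
    """Entropy reduction achieved by splitting `parent` into `partitions`."""
    n = len(parent)
    weighted = sum(len(p) / n * entropy(p) for p in partitions)
    return entropy(parent) - weighted

# Hypothetical toy labels: a split that separates the classes perfectly.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, [["yes", "yes"], ["no", "no"]])
print(gain)
```

A split that leaves each partition as mixed as the parent yields a gain of zero, so the candidate with the highest gain is preferred.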

3. Understanding Neural Networks

  • Evolution from biological to artificial neurons
  • Activation functions
  • Network topology
  • Determining the number of layers
  • Direction of information flow
  • Node count per layer
  • Training neural networks via backpropagation
  • Deep learning

4. Understanding Support Vector Machines

  • Classification using hyperplanes
  • Identifying the maximum margin
  • Handling linearly separable data
  • Handling non-linearly separable data
  • Utilizing kernels for non-linear spaces

5. Understanding Clustering

  • Clustering as a machine learning task
  • The k-means algorithm for clustering
  • Using distance metrics to assign and update clusters
  • Selecting the appropriate number of clusters
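
The assign-and-update loop behind k-means can be sketched as follows. This is a minimal illustration under stated assumptions: Euclidean distance, caller-supplied initial centroids, and hypothetical toy points. It is not a production implementation.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    return [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it if the cluster is empty."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members))
            )
        else:
            new_centroids.append(old)
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels, centroids)
```

The number of clusters is fixed here by the length of the initial centroid list; choosing that number well is a separate question.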

6. Measuring Classification Performance

  • Working with classification prediction data
  • In-depth analysis of confusion matrices
  • Utilizing confusion matrices for performance measurement
  • Beyond accuracy – alternative performance measures
  • The kappa statistic
  • Sensitivity and specificity
  • Precision and recall
  • The F-measure
  • Visualizing performance trade-offs
  • ROC curves
  • Estimating future performance
  • The holdout method
  • Cross-validation
  • Bootstrap sampling
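
Several of the measures listed above can be derived from the four cells of a binary confusion matrix. The following sketch uses hypothetical counts and a hypothetical helper name, `binary_metrics`, and assumes no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute common performance measures from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # also called recall or true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Agreement expected by chance, from the row and column totals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Hypothetical counts for illustration only.
m = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Kappa adjusts accuracy for the agreement expected by chance, which is why it can be much lower than accuracy on imbalanced data.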

7. Tuning Standard Models for Enhanced Performance

  • Using caret for automated parameter tuning
  • Developing a simple tuned model
  • Customizing the tuning process
  • Improving model performance through meta-learning
  • Understanding ensembles
  • Bagging
  • Boosting
  • Random forests
  • Training random forests
  • Evaluating random forest performance

MINOR TOPICS

8. Understanding Classification Using Nearest Neighbors

  • The kNN algorithm
  • Calculating distance
  • Selecting an appropriate k value
  • Preparing data for kNN usage
  • Why is the kNN algorithm considered lazy?
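
A minimal kNN sketch, assuming Euclidean distance and a simple majority vote. The training points and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors.

    train: list of (features, label) pairs, where features is a tuple of numbers.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy training set with two classes.
train = [
    ((1.0, 1.0), "a"),
    ((1.5, 2.0), "a"),
    ((1.0, 0.5), "a"),
    ((5.0, 5.0), "b"),
    ((6.0, 5.5), "b"),
]
prediction = knn_predict(train, (1.2, 1.1), k=3)
print(prediction)
```

No model is fitted in advance: all the work happens at prediction time, which is the sense in which kNN is called lazy. Because the vote depends on raw distances, features are usually rescaled before use.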

9. Understanding Classification Rules

  • Separate and conquer approach
  • The One Rule algorithm
  • The RIPPER algorithm
  • Deriving rules from decision trees

10. Understanding Regression

  • Simple linear regression
  • Ordinary least squares estimation
  • Correlations
  • Multiple linear regression

11. Understanding Regression Trees and Model Trees

  • Incorporating regression into trees

12. Understanding Association Rules

  • The Apriori algorithm for association rule learning
  • Measuring rule interest – support and confidence
  • Constructing a set of rules using the Apriori principle
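
Support and confidence can be computed directly from a list of transactions. The sketch below assumes hypothetical basket data; the helper names `support` and `confidence` are illustrative.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Support of lhs and rhs together, divided by the support of lhs alone."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
print(s, c)
```

The Apriori principle relies on the fact that an itemset can never have higher support than any of its subsets, so infrequent itemsets can be discarded before their supersets are examined.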

Extras

  • Spark, PySpark, MLlib, and multi-armed bandits

Requirements

Knowledge of Python

Duration

21 Hours
