Machine Learning is a well-understood process. We typically start with some existing data and pass it through an algorithm. The algorithm ‘learns’ from that specific data and produces a ‘data model’, which encapsulates information derived from the raw data. We then test the model to see how good it is and try to improve it incrementally. Finally, we evaluate the finished model and deploy it.
Prerequisites
- An understanding of data
- A good logical mind
- We do not expect people to have a background in mathematics
Module 1: Introduction
This module introduces the background to Machine Learning.
- Definition of Machine Learning (ML)
- Origins of ML
- Rule deduction (Expert Systems) vs induction (ML)
- Why do we want machines to learn?
- Supervised vs. unsupervised learning
- Case studies
- Regression as a classic example of ML
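As a rough illustration of regression as a classic example of ML, the sketch below fits a straight line to a handful of points by ordinary least squares. The course labs use R; Python is used here purely for illustration, and the function name `fit_line` and the data are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)      # the "learning" step
prediction = slope * 5.0 + intercept     # use the model on an unseen x
print(slope, intercept, prediction)
```

The fitted slope and intercept are the 'data model': they summarise the training points and can be applied to inputs the algorithm has never seen.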
Module 2: Data collection and preparation
Collecting the correct data for the training and testing phases is crucial. The data is often ‘dirty’ and needs to be cleaned. But more than that, the way in which the data is pre-processed is often the difference between poor and highly effective ML.
- Data selection
- Data sampling
- Data volume reduction
- Removing ambiguities
- Missing values
- Data and dimensional reduction
- Data understanding
- Generalisation of hierarchies
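One common way to handle missing values is to replace them with the mean of the values that were observed. The Python sketch below shows that idea only; it is one option among several, and the helper `impute_mean` and the `ages` column are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [30.0, None, 50.0, 40.0, None]
cleaned = impute_mean(ages)
print(cleaned)
```

Mean imputation keeps every row but shrinks the variance of the column, so it is worth comparing against simply dropping incomplete rows.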
Module 3: Introduction to ML in R
R is a well-established, open source language with many built-in ML algorithms. This module introduces the language and provides some practical ML work.
- Introduction to R
- Lab: ML with R
Module 4: Creating or choosing an algorithm
Building a new algorithm for the data modelling (or, as is often done, choosing an existing one) is another vital part of the process. A number of algorithms are used very frequently. This module will look at some examples and explain what they are trying to achieve and how they work.
- Examples of creating algorithms
- The use of data mining algorithms
- Classes and examples of data mining/Machine Learning algorithms
- Decision trees
- Sequence analysis
- Back propagation
- Deep Learning
Module 5: Training and test data
This module explains how and why we train and test.
- Selecting the training and testing data
- Ratio of training to test data
- How to make an unbiased selection
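A simple way to make an unbiased selection is to shuffle the rows at random before splitting them. The Python sketch below assumes a 75/25 split and a fixed seed for repeatability; both values, and the function name `train_test_split`, are illustrative choices rather than recommendations from the course.

```python
import random


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)                      # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)      # seeded, so the split is repeatable
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical dataset of 20 row identifiers.
rows = list(range(20))
train, test = train_test_split(rows, test_ratio=0.25, seed=42)
print(len(train), len(test))
```

Shuffling first matters when the data is stored in a meaningful order (by date or by class, for example), because taking the last rows as the test set would then give a biased sample.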
Module 6: Testing and confusion matrices
Testing is a vital (and complex) part of the process.
- Type 1, 2 and 3 errors
- False positives vs False negatives
- Classification models
- Confusion matrices
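For a two-class model, a confusion matrix is a count of the four possible outcomes of each prediction. The Python sketch below tallies them for a small set of made-up labels; the function name `confusion_counts` is hypothetical.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # correctly flagged
        elif p == positive:
            fp += 1          # false positive (Type I error)
        elif a == positive:
            fn += 1          # false negative (Type II error)
        else:
            tn += 1          # correctly rejected
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical true labels and model predictions.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts, accuracy)
```

Accuracy alone hides which kind of error the model makes, which is why the four counts are usually reported together.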
Module 7: ROC curves
ROC space and ROC curves are also vital tools for estimating the efficiency of algorithms.
- Measuring efficiency
- ROC space and ROC curves
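A ROC curve can be traced by sweeping a decision threshold over the model's scores and recording the false positive rate and true positive rate at each step. The Python sketch below does this for made-up scores and then estimates the area under the curve with the trapezoid rule; `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]    # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= threshold and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve, using the trapezoid rule between adjacent points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical true labels (1 = positive) and model scores.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

points = roc_points(actual, scores)
area = auc(points)
print(points, area)
```

A curve that hugs the top-left corner of ROC space (area close to 1) indicates a model that separates the classes well, while the diagonal (area 0.5) corresponds to random guessing.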
Module 8: Efficiency, Overfitting, Bias and Variance
- More about efficiency
- Bias and Variance
Module 9: Combining data models
Any one ML system that we build will have a certain level of efficiency. But we can build a number of different data models and combine them in various ways so that the efficiency of the whole is greater than the sum of the parts.
- Gradient boosting
- Case study of combining models
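The simplest way to see why combining models can help is majority voting, shown in the Python sketch below. Note that this is not gradient boosting, which builds its models sequentially; it is a deliberately simpler technique, and the three models' predictions are made up so that each one errs on different samples.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]


def accuracy(actual, predicted):
    """Fraction of samples where the prediction matches the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true labels and three models that each make two different mistakes.
actual  = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 0]
model_b = [0, 0, 1, 1, 1, 1]
model_c = [1, 1, 1, 0, 0, 1]

combined = majority_vote([model_a, model_b, model_c])
individual = [accuracy(actual, m) for m in (model_a, model_b, model_c)]
print(individual, accuracy(actual, combined))
```

The vote only helps here because the models' errors do not overlap; models that make the same mistakes gain nothing from being combined.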