Machine Learning is a well-understood process. We typically start with some existing data and pass it through an algorithm. The algorithm ‘learns’ from that specific data and produces a ‘data model’ that encapsulates information derived from the raw data. We then test the model (to see how good it is) and try to improve it incrementally. Finally, we evaluate the finished model and deploy it.

Prerequisites

• An understanding of data
• A good logical mind
• No background in mathematics is expected

Module 1: Introduction

This module introduces the background to Machine Learning.

• Definition of Machine Learning (ML)
• Origins of ML
• Rule deduction (Expert Systems) vs induction (ML)
• Why do we want machines to learn?
• Supervised vs. unsupervised learning
• Case studies
• Regression as a classic example of ML
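
To make the regression bullet concrete, here is a minimal sketch of fitting a straight line by ordinary least squares. It is written in plain Python for illustration (the course labs use R), and the function name `fit_line` and the data are made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x with y, around the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)       # the 'learning' step
prediction = slope * 5.0 + intercept      # applying the model to unseen input
```

The pair `(slope, intercept)` is the 'data model' here: two numbers induced from the data rather than written down as a rule by hand.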

Module 2: Data collection and preparation

Collecting the correct data for the training and testing phases is crucial. The data is often ‘dirty’ and needs to be cleaned. But more than that, the way in which the data is pre-processed is often the difference between poor and highly effective ML.

• Data selection
• Data sampling
• Data volume reduction
• Removing ambiguities
• Normalisation
• Discretisation
• Cleansing
• Missing values
• Outliers
• Data and dimensional reduction
• Data understanding
• Generalisation of hierarchies
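
Two of the preparation steps listed above, missing values and normalisation, can be sketched in a few lines. This is an illustration in plain Python (the course labs use R). Mean imputation and min-max scaling are just one common choice for each step, and the helper names and data are invented for the example.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalise(values):
    """Rescale values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up column with one missing value
ages = [20.0, None, 40.0, 60.0]

filled = impute_mean(ages)           # the gap is filled with the observed mean
scaled = min_max_normalise(filled)   # smallest value maps to 0, largest to 1
```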

Module 3: Introduction to ML in R

R is a well-established, open-source language with many built-in ML algorithms. This module introduces the language and provides some practical ML work.

• Introduction to R
• Lab: ML with R

Module 4: Creating or choosing an algorithm

Building a new algorithm for the data modelling (or, as is often done, choosing an existing one) is another vital part of the process. A number of algorithms are used very frequently. This module will look at some examples and explain what they are trying to achieve and how they work.

• Examples of creating algorithms
• The use of data mining algorithms
• Classes and examples of data mining/Machine Learning algorithms
• Decision trees
• Clustering
• Segmentation
• Association
• Classification
• Sequence analysis
• Neural nets
• History
• Layers
• Weights
• Back propagation
• Deep Learning
• KNN
• SVM
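
As a taste of how one of the listed algorithms works, the following is a minimal sketch of KNN (k-nearest neighbours) classification in plain Python (the course labs use R). It assumes Euclidean distance and a simple majority vote among the neighbours. The function name `knn_predict` and the data points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; `query` is a feature tuple.
    """
    # Sort the training points by Euclidean distance to the query
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: two well-separated groups
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]

near_a = knn_predict(train, (1.1, 1.0))
near_b = knn_predict(train, (5.0, 5.1))
```

KNN has no separate training step: the stored data is the model, and all the work happens at prediction time.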

Module 5: Training and test data

This module explains how and why we train and test.

• Selecting the training and testing data
• Ratio of training to test data
• How to make an unbiased selection
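
One common way to make an unbiased selection is to shuffle the rows at random before splitting them. The sketch below is in plain Python (the course labs use R). The 70/30 ratio and the fixed seed are assumptions chosen for illustration, not recommendations from the course.

```python
import random


def train_test_split(rows, test_ratio=0.3, seed=42):
    """Shuffle a copy of the rows, then split it into (train, test) lists."""
    rng = random.Random(seed)        # fixed seed makes the split repeatable
    shuffled = list(rows)            # copy so the caller's data is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))               # stand-in for ten data records
train, test = train_test_split(rows)
```

Every record lands in exactly one of the two sets, so the model is never tested on data it has already seen.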

Module 6: Testing and confusion matrices

Testing is a vital (and complex) part of the process.

• Type 1, 2 and 3 errors
• False positives vs False negatives
• PCC
• Classification models
• Confusion matrices
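
A confusion matrix for a two-class model is four counts. The sketch below, in plain Python (the course labs use R), tallies them from made-up labels. It also computes PCC, taken here to mean the percentage correctly classified. That reading is an assumption, since the outline does not expand the acronym.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, really positive
        elif p == positive:
            fp += 1      # predicted positive, really negative (false positive)
        elif a == positive:
            fn += 1      # predicted negative, really positive (false negative)
        else:
            tn += 1      # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Made-up test labels and model predictions
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
pcc = 100.0 * (counts["TP"] + counts["TN"]) / len(actual)
```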

Module 7: ROC curves

ROC space and ROC curves are also vital tools for estimating the efficiency of algorithms.

• Measuring efficiency
• ROC space and ROC curves
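
A ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The following sketch, in plain Python (the course labs use R), builds the points of the curve and the area under it. It assumes binary labels (1 = positive), distinct scores, and at least one example of each class. The data and function names are made up.

```python
def roc_points(actual, scores):
    """Return the (FPR, TPR) points of the ROC curve, from (0, 0) to (1, 1)."""
    positives = sum(actual)
    negatives = len(actual) - positives
    # Visit examples from highest score to lowest, i.e. lower the threshold
    ranked = sorted(zip(scores, actual), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Made-up true labels and model scores (higher score = more likely positive)
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(actual, scores)
area = auc(points)
```

An area of 1.0 would mean every positive is scored above every negative, while 0.5 is what random scoring gives on average.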

Module 8: Efficiency, Overfitting, Bias and Variance

• More about efficiency
• Overfitting
• Bias and Variance

Module 9: Combining data models

Any one ML system that we build will have a certain level of efficiency. But we can build a number of different data models and combine them in various ways so that the efficiency of the whole is greater than the sum of the parts.

• Ensemble
• Boosting
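
The simplest way to combine models is a majority vote, sketched below in plain Python (the course labs use R). The three 'models' are just made-up prediction lists, each wrong on a different item. Boosting works differently, training models in sequence so that later ones focus on earlier mistakes, and is not shown here.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine equal-length prediction lists by per-item majority vote."""
    combined = []
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Made-up true labels and three imperfect models
actual  = [1, 0, 1, 1, 0]
model_a = [1, 0, 0, 1, 0]   # wrong on the third item
model_b = [1, 1, 1, 1, 0]   # wrong on the second item
model_c = [0, 0, 1, 1, 0]   # wrong on the first item

ensemble = majority_vote([model_a, model_b, model_c])
```

Because the three models make their mistakes on different items, each error is outvoted and the ensemble is more accurate than any single member. The gain shrinks when the models tend to be wrong on the same items.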