An introduction to Python, Data Science and Big Data, plus an in-depth introduction to the major Big Data technologies for practitioners who work with them.

This 5-day course is ideal for people who currently work with data as software engineers, or in business intelligence, and who want to level up to large-scale data analysis skills and contemporary Data Science patterns.

You will learn how to work with and model large data sets and understand the statistical and mathematical models behind them. You'll also work with SQL and NoSQL databases and learn how to take an effective hypothesis-driven approach to working with data and identifying genuinely measurable statistical outcomes.

Target Audience

Data analysis practitioners and fledgling data scientists who wish to leverage Big Data technologies such as NoSQL databases, Hadoop and Spark.

Prerequisites

  • Basic knowledge of database architecture and SQL
  • Basic knowledge of programming: variables, control flow, scope and functions
  • Prior experience of working with data
  • Experience with Python or another scripting language such as Perl will be an advantage

Learning Outcomes

At the end of this course attendees will know:

  • Fundamentals of Data Science
  • Fundamentals of Machine Learning
  • Fundamentals of Python programming
  • Python's data and numerical packages
  • How to visualise data using Python
  • Different data models
  • What a NoSQL database is and how it differs from a (traditional) relational database
  • What Hadoop is
  • What Spark is

At the end of this course attendees will be able to:

  • Write Python programs to manipulate data
  • Use Python to visualise data
  • Query the graph database Neo4j
  • Query the column-store database Cassandra
  • Query the document-based database MongoDB
  • Use Hadoop and Spark
  • Use Python Machine Learning libraries to perform predictive analysis

Course Outline

  • Introduction to Data Science
  • Data Mining and Machine Learning
  • Data Models
  • NoSQL
  • Introduction to Python
  • Python and Data
  • Python Databases and SQL
  • Data Science and Numerical Python
  • MongoDB
  • Neo4j and Graph Analytics
  • Functional Programming
  • Hadoop and Ecosystem
  • Spark MapReduce
  • Spark SQL
  • Python Machine Learning