About this course

Course type Premium
Course code QAACD
Duration 3 Days

This three-day Cassandra course is a hybrid course for developers and administration staff. The class is 60% lecture and 40% labs.

This is a fast-paced, vendor-agnostic technical overview of the Cassandra database. In each sub-topic, the instructor will provide links and resource recommendations, such as YouTube videos, books and blog posts, for students who want to explore that area further. Delegates will be given a PDF slide deck, which can be used as reference material after the course. PDFs will also be given out for the 5 labs in the course.

Target Audience:

  • This course is targeted at both technical and non-technical people who want to understand the emerging world of Big Data, with a specific focus on Cassandra.

  • Software Engineers, Data Scientists, Network Engineers or Technologists, ideally with experience in relational/SQL databases and Java programming or a similar modern programming language.

Prerequisites

  • No prior knowledge of databases or programming is assumed, although having some basic experience with relational/SQL databases and Java will help.

Delegates will learn how to

At the end of this course you will be able to:

  • Identify the correct use cases for Cassandra
  • Appreciate the core concepts of the operations side of the Cassandra database
  • Dive into the critical architecture paths of Cassandra: Bloom filters, Block Indexes, SSTables
  • Access a 3-node Cassandra cluster in Rackspace to perform hands-on labs
  • Understand the fundamentals of how to write Java or Python code to interact with Cassandra
  • Gain links to the best books, blog posts and videos to learn more about Cassandra on your own

Outline

Session 1: Intro to Cassandra

  • How to pick a NoSQL database
  • Brief use case discussion of: Key/Value, Key/Document, Column Family, Graph, Real-time
  • Structured vs. Unstructured data
  • Cassandra Origins: Amazon Dynamo, Google BigTable and Facebook
  • So, what's Cassandra good for? Use Cases.
  • Hardware recommendations (Spinning disks vs SSD, CPU/RAM/Network requirements, etc)
  • Cassandra versions
  • Cassandra distributions
  • Book, YouTube & Blog recommendations for learning more about Cassandra
  • Lab 1: Install Cassandra 2.0 on a single node in the cloud

Session 2: Cassandra Architecture Fundamentals and Intro to CQL

  • Peer to peer design
  • Logical Data Model: Keyspace, Column Family/Table, Rows, Columns
  • Traditional Ring design vs. VNodes
  • Partitioners: Murmur3, Random (md5) and ByteOrdered
  • Gossip communications
  • Coordinator node
  • Seed nodes
  • Write/Read consistency levels: Any, One, Two, Three, Quorum
  • Snitches: Dynamic snitching, Simple Snitch, Rack Inferring Snitch, Property File Snitch, Gossiping Property File Snitch
  • Routing Client requests
  • How a table is flushed from Memtable onto disk into SSTable files
  • Compaction fundamentals to reduce SSTable data files
  • Nodetool commands: gossipinfo, cfstats, describering
  • YAML file fundamentals
  • Operations management web GUI
  • Stress testing Cassandra
  • CQL command fundamentals
  • Lab 2: Run Cassandra commands and explore operations management concepts (Create a new Keyspace and table, write data to the table, flush the table to SSTable on disk, learn how to run compaction, run nodetool commands, explore the web GUI, benchmark the one node by inserting and reading 100,000 rows)
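
As a small illustration of the Quorum consistency level listed in this session, the sketch below computes how many replicas must respond for a given replication factor, and checks the usual "reads plus writes exceed the replication factor" rule of thumb for reading your own writes. The function names are hypothetical and chosen for this example only; they are not part of Cassandra or any driver.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond at consistency level Quorum.

    Quorum is a majority of replicas: floor(RF / 2) + 1.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    """True when the read set and write set must overlap in at least one replica."""
    return write_replicas + read_replicas > replication_factor


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = quorum(rf)
        # Quorum writes combined with Quorum reads always overlap.
        print(f"RF={rf}: quorum={q}, overlap={read_sees_latest_write(rf, q, q)}")
```

With a replication factor of 3, Quorum needs 2 replicas, so one replica can be unavailable while Quorum reads and writes still succeed. Writing and reading at One with the same replication factor gives no such overlap guarantee.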

Session 3: Scaling Cassandra, Advanced CQL and Advanced YAML file

  • Best practices for scaling a Cassandra cluster
  • Managing a Cassandra cluster across data centers (new write/read consistency levels: Local Quorum, Each Quorum, All, Serial)
  • Deeper dive into the YAML file settings
  • Advanced CQL concepts
  • Lab 3: Grow the cluster size to 3 nodes (Install Cassandra on 2 additional nodes in Rackspace and edit the YAML files to configure the 3-node cluster)

Session 4: Database Internals

  • Deep dive into the Write path
  • In-memory structures for each SSTable: partition index, partition summary, bloom filter
  • Fsync settings for the commit log
  • How inserts, updates and deletes are treated by Cassandra
  • Hinted Handoffs
  • Deletes and Tombstone fundamentals
  • Advanced Compaction concepts
  • Deep dive into the Read path: Row cache, partition key cache, partition summary, bloom filters, etc
  • Off-heap components in Cassandra
  • Compression concepts
  • Lightweight Transactions
  • Snapshots
  • Lab 4: Advanced Cassandra commands (query the system table, take a snapshot, decommission a node, rejoin the same node back into the cluster)
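
To give a feel for the bloom filters mentioned in this session, here is a minimal, generic sketch of the data structure. It is not Cassandra's implementation: the class name, the sizes and the use of SHA-256 as the hash are all assumptions made for illustration. The point it shows is that a bloom filter can answer "definitely not present" or "possibly present", which is how a read can skip files that cannot contain the requested key.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter (hypothetical helper, not Cassandra code)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means the key was never added; True means it may have been.
        return all(self.bits[pos] for pos in self._positions(key))


if __name__ == "__main__":
    bf = BloomFilter()
    bf.add("user:42")
    print(bf.might_contain("user:42"))
```

A key that was added is always reported as possibly present, so there are no false negatives. False positives are possible and become more likely as the filter fills up, which is the trade-off between memory used and unnecessary disk reads.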

Session 5: Java or Python API

  • Different ways to programmatically query Cassandra: Thrift, Hector, Astyanax, Java, Python, C#, ODBC, plus others
  • Writing your first client application
  • Connecting to the Cassandra cluster programmatically
  • Using a session to execute CQL commands
  • Asynchronous I/O to Cassandra cluster
  • Node discovery
  • Automatic failover
  • Modifying cluster configuration programmatically
  • Lab 5: Java or Python API lab (learn how to programmatically insert and read data from a Cassandra cluster using the Java or Python API)

Session 6: Advanced Concepts

  • JVM performance tuning fundamentals
  • JConsole vs jmxterm
  • Tools to monitor/test Cassandra clusters: disk i/o, memory analysis, visualisation
  • Logging in Cassandra (log4j)
  • Security: SSL encryption for client-to-node and node-to-node
  • Security: Authentication and Authorisation fundamentals
  • Security: Firewall ports
  • Using Hadoop with Cassandra
  • Using Solr with Cassandra

Delivery method

Classroom / Attend from Anywhere

Receive classroom training at one of our nationwide training centres, or attend remotely via web access from anywhere.

Find dates and prices

Online booking is currently not available for this course. To find out more, please call us on 0345 074 7998 or email us at info@qa.com to discuss how we can help.

Trusted, awarded and accredited

Fully accredited to ensure we provide the highest possible standards in learning

All third party trademark rights acknowledged.