Cassandra Database Tutorial for Beginners: Learn in 3 Days

Cassandra is a distributed database management system designed to handle high volumes of structured data across commodity servers. In this tutorial, you will learn the core concepts of Cassandra, such as data modeling, clusters, monitoring tools, and the query language.

Prior knowledge is not required, but familiarity with another database management system such as HBase or MongoDB will help.


Here is what we cover in the course:


1. What is Apache Cassandra?

2. Download & Install Cassandra on Windows: A Step By Step Guide

3. Cassandra Architecture & Replication Factor Strategy Tutorial

4. Learn Cassandra Data Modeling with Simple Example

5. Create, Alter & Drop Keyspace in Cassandra: Complete Tutorial

6. Cassandra Table: Create, Alter, Drop & Truncate

7. Cassandra Query Language (CQL): Insert, Update, Delete, Read Data

8. Create & Drop INDEX in Cassandra: Complete Tutorial

9. Cassandra Data Types & Expiration Tutorial

10. Cassandra Collections Tutorial - SET, LIST & MAP

11. Cassandra Cluster Setup on Multiple Nodes (Machines)

12. Datastax DevCenter & OpsCenter Installation Complete Guide

13. Cassandra SECURITY - Create User & Authentication With JMX

What is Apache Cassandra?

Apache Cassandra is a highly scalable, distributed, high-performance NoSQL database. It is designed to handle very large amounts of data.

In the accompanying diagram, circles are Cassandra nodes and the lines between the circles show the distributed architecture, while the client sends data to a node.

Cassandra handles large amounts of data through its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and avoids a single point of failure.
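
To make the idea of a replication factor concrete, here is a minimal Python sketch of how copies of a row might be spread over a ring of nodes. It is a simplified illustration, not Cassandra's actual implementation: it assumes one token per node, uses MD5 as the hash function, and the node names and the `replicas_for` helper are hypothetical.

```python
from __future__ import annotations

import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key: str, nodes: list[str], replication_factor: int) -> list[str]:
    """Return the nodes that would hold a copy of `key`.

    Walk clockwise from the key's position on the ring and take the
    next `replication_factor` nodes.
    """
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    # Place every node on the ring, ordered by its token.
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    # First node at or after the key's token, wrapping around at the end.
    start = bisect_left(tokens, token(key)) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(replication_factor)]


nodes = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)  # three distinct nodes that each store a copy of the row
```

With a replication factor of 3, the row stays readable even if one or two of its owning nodes go down, which is the sense in which replication removes the single point of failure.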

Cassandra History

  • Cassandra was first developed at Facebook for inbox search.
  • Facebook open sourced it in July 2008.
  • Apache incubator accepted Cassandra in March 2009.
  • Cassandra has been a top-level Apache project since February 2010.
  • At the time of writing, the latest version of Apache Cassandra was 3.2.1.

First, let's understand what a NoSQL database is.

NoSQL Cassandra Database

NoSQL databases are also called "Not Only SQL" or "non-relational" databases. They store and retrieve data using models other than the tabular relations used in relational databases.

NoSQL databases include MongoDB, HBase, and Cassandra.

NoSQL databases have the following properties:

  • Design Simplicity
  • Horizontal Scaling
  • High Availability

Data structures used in Cassandra are more specialized than those used in relational databases. For the operations they are designed for, they can be faster than relational database structures.

NoSQL databases are increasingly used in Big Data and real-time web applications. The name "Not Only SQL" reflects that they may support SQL-like query languages.

NoSQL Cassandra Database vs. Relational Databases

Here are the differences between relational databases and NoSQL databases in a tabular format.

Relational Database                        | NoSQL Database
Handles data coming in at low velocity     | Handles data coming in at high velocity
Data arrives from one or a few locations   | Data arrives from many locations
Manages structured data                    | Manages structured, unstructured, and semi-structured data
Supports complex transactions (with joins) | Supports simple transactions
Single point of failure with failover      | No single point of failure
Handles data in moderate volume            | Handles data in very high volume
Centralized deployments                    | Decentralized deployments
Transactions written in one location       | Transactions written in many locations
Gives read scalability                     | Gives both read and write scalability
Deployed in vertical fashion               | Deployed in horizontal fashion

Apache Cassandra Features

Cassandra provides the following features:

  • Massively Scalable Architecture: Cassandra has a masterless design in which all nodes are at the same level, which provides operational simplicity and makes it easy to scale out.
  • Masterless Architecture: Data can be written and read on any node.
  • Linear Scale Performance: As more nodes are added, Cassandra's throughput increases roughly in proportion.
  • No Single Point of Failure: Cassandra replicates data on different nodes, which ensures there is no single point of failure.
  • Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
  • Flexible and Dynamic Data Model: Supports a range of data types with fast writes and reads.
  • Data Protection: Data is protected by the commit log design and by built-in safeguards such as backup and restore mechanisms.
  • Tunable Data Consistency: Consistency can be tuned per operation, from eventual to strong, across the distributed architecture.
  • Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
  • Data Compression: Cassandra can compress data by up to 80%, depending on the data, with little performance overhead.
  • Cassandra Query Language: Cassandra provides a query language that is similar to SQL. This makes it easier for relational database developers to move to Cassandra.
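
The tunable consistency feature above can be illustrated with a little arithmetic. The Python sketch below assumes the commonly cited rule that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. ONE, QUORUM, and ALL are Cassandra consistency levels, but the function names here are hypothetical and the code does not talk to a real cluster.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority (the QUORUM level)."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)  # 2 replicas out of 3

print(is_strongly_consistent(q, q, rf))    # QUORUM reads + QUORUM writes -> True
print(is_strongly_consistent(1, 1, rf))    # ONE reads + ONE writes -> False
print(is_strongly_consistent(1, rf, rf))   # ONE reads + ALL writes -> True
```

Lower levels such as ONE favor latency and availability, while QUORUM and ALL trade some of that for stronger guarantees, so the right choice depends on the application.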

Cassandra Use Cases/Applications

Cassandra is a non-relational database that can be used for different types of applications. Here are some use cases where Cassandra should be preferred.

  • Messaging

    Cassandra is a good fit for companies that provide mobile phone and messaging services. These companies handle huge amounts of data, which Cassandra is designed to manage.

  • Internet of Things Applications

    Cassandra is a good fit for applications where data arrives at very high speed from many different devices or sensors.

  • Product Catalogs and Retail Apps

    Cassandra is used by many retailers for durable shopping cart protection and fast product catalog input and output.

  • Social Media Analytics and Recommendation Engines

    Cassandra is a good fit for online companies and social media providers that analyze user data and make recommendations to their customers.