Cassandra Tutorial for Beginners: Learn in 3 Days
What is Apache Cassandra?
Cassandra is a distributed database management system designed for handling a high volume of structured data across commodity servers.
Cassandra handles huge amounts of data through its distributed architecture. Data is placed on different machines with a replication factor greater than one, which provides high availability and no single point of failure.
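The replica placement described above can be sketched with a toy token ring. This is only an illustration, not Cassandra's implementation: the `Ring` class, the node names, and the MD5-based `token` function are assumptions made for the example (Cassandra's default partitioner uses a different hash). Each node gets one position on the ring, and a key's copies go to the first node at or after the key's position plus the next nodes clockwise.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative stand-in hash)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: one token per node, replicas on successive nodes."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Sorting by token gives the nodes in clockwise ring order.
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Return the nodes that hold copies of the given partition key."""
        # First node whose token is >= the key's token, wrapping past the end.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


# Hypothetical four-node cluster keeping three copies of every row.
ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct node names; which ones depends on the hash
```

Because every key lives on three of the four nodes in this sketch, any single node can fail and each key still has two reachable copies.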
In the image below, the circles are Cassandra nodes and the lines between them show the distributed architecture, while the client sends data to a node.
Cassandra Syllabus
Introduction
👉 Lesson 1 | Install Cassandra — How to Download & Install Cassandra on Windows |
👉 Lesson 2 | Cassandra Architecture & Replication Factor Strategy — A Comprehensive Guide |
👉 Lesson 3 | Cassandra Data Model — Learn with Simple Example |
Advanced Stuff
👉 Lesson 1 | Cassandra Keyspace — Create, Alter & Drop Keyspace in Cassandra with Example |
👉 Lesson 2 | Cassandra Table — Create, Alter, Drop & Truncate (with Example) |
👉 Lesson 3 | Cassandra Query Language (CQL) — Insert Into, Update, Delete (Example) |
👉 Lesson 4 | Create & Drop INDEX in Cassandra — Learn with Example |
👉 Lesson 5 | Cassandra CQL Data Types & Data Expiration using TTL — Learn with Example |
👉 Lesson 6 | Cassandra Collection — Set, List, Map with Example |
👉 Lesson 7 | Cassandra Cluster Setup — Cluster Setup on Multiple Nodes |
👉 Lesson 8 | DataStax DevCenter & OpsCenter Installation — A Step-by-Step Guide |
👉 Lesson 9 | Cassandra Security — Create User & Authentication with JMX |
Must Know!
👉 Lesson 1 | Cassandra Interview Questions — Top 23 Cassandra Interview Q & A |
👉 Lesson 2 | Cassandra Tutorial PDF — Download Cassandra Tutorial PDF for Beginners |
Cassandra History
- Cassandra was first developed at Facebook for inbox search.
- Facebook open sourced it in July 2008.
- The Apache Incubator accepted Cassandra in March 2009.
- Cassandra has been a top-level Apache project since February 2010.
- At the time of writing, the latest version of Apache Cassandra was 3.2.1.
First, let’s understand what a NoSQL database is.
NoSQL Cassandra Database
NoSQL databases are also called “Not Only SQL” or “non-relational” databases. They store and retrieve data using models other than the tabular relations used in relational databases.
NoSQL databases include MongoDB, HBase, and Cassandra.
NoSQL databases have the following properties.
- Design Simplicity
- Horizontal Scaling
- High Availability
The data structures used in Cassandra are more specialized than those used in relational databases, which can make some operations faster than their relational equivalents.
NoSQL databases are increasingly used in Big Data and real-time web applications. The name “Not Only SQL” reflects that they may support SQL-like query languages.
NoSQL Cassandra Database vs Relational Databases
Here are the differences between relational databases and NoSQL databases in a tabular format.
Relational Database | NoSQL Database |
---|---|
Handles data arriving at low velocity | Handles data arriving at high velocity |
Data arrives from one or a few locations | Data arrives from many locations |
Manages structured data | Manages structured, unstructured, and semi-structured data |
Supports complex transactions (with joins) | Supports simple transactions |
Single point of failure with failover | No single point of failure |
Handles data in moderate volume | Handles data in very high volume |
Centralized deployments | Decentralized deployments |
Transactions written in one location | Transactions written in many locations |
Gives read scalability | Gives both read and write scalability |
Deployed in vertical fashion | Deployed in horizontal fashion |
Apache Cassandra Features
Cassandra provides the following features.
- Massively Scalable Architecture: Cassandra has a masterless design where all nodes are at the same level, which provides operational simplicity and easy scale-out.
- Masterless Architecture: Data can be written to and read from any node.
- Linear Scale Performance: As more nodes are added, the performance of Cassandra increases.
- No Single Point of Failure: Cassandra replicates data on different nodes, which ensures there is no single point of failure.
- Fault Detection and Recovery: Failed nodes can easily be restored and recovered.
- Flexible and Dynamic Data Model: Supports a range of data types with fast writes and reads.
- Data Protection: Data is protected by a commit log design and built-in safeguards such as backup and restore mechanisms.
- Tunable Data Consistency: Consistency can be tuned per operation, with support for strong data consistency across the distributed architecture.
- Multi Data Center Replication: Cassandra can replicate data across multiple data centers.
- Data Compression: Cassandra can compress data by up to 80% without any performance overhead.
- Cassandra Query Language: Cassandra provides a query language (CQL) that is similar to SQL. This makes it easy for relational database developers to move to Cassandra.
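The tunable consistency feature listed above comes down to simple arithmetic, sketched below. With a replication factor RF, a write acknowledged by W replicas and a read answered by R replicas are guaranteed to overlap on at least one replica when R + W > RF. The function names here are hypothetical helpers written for this example, and only a few of Cassandra's consistency level names are covered.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """Replicas that must respond for a given consistency level name."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    required = levels[level]
    if required > replication_factor:
        raise ValueError(f"{level} needs more replicas than RF={replication_factor}")
    return required


def reads_see_latest_write(write_level: str, read_level: str,
                           replication_factor: int) -> bool:
    """True when the read and write replica sets must overlap (R + W > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


# With three replicas, QUORUM means two of them.
print(quorum(3))                                       # 2
print(reads_see_latest_write("QUORUM", "QUORUM", 3))   # True
print(reads_see_latest_write("ONE", "ONE", 3))         # False
```

Lower levels such as ONE favor latency and availability, while QUORUM on both reads and writes trades some of that for reads that always include the latest acknowledged write.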
Cassandra Use Cases/Application
Cassandra is a non-relational database that can be used for different types of applications. Here are some use cases where Cassandra is a good fit.
- Messaging: Cassandra is a great database for companies that provide mobile phone and messaging services. These companies handle huge amounts of data, so Cassandra suits them well.
- Internet of Things Applications: Cassandra is a great database for applications where data arrives at very high speed from different devices or sensors.
- Product Catalogs and Retail Apps: Cassandra is used by many retailers for durable shopping cart protection and fast product catalog input and output.
- Social Media Analytics and Recommendation Engines: Cassandra is a great database for many online companies and social media providers that analyze data and make recommendations to their customers.