Cassandra
Cassandra is designed to handle big data. Its main feature is storing data on multiple nodes with no single point of failure.
Cassandra is architected this way because hardware can fail at any time and any node can go down. If a node fails, the data stored on another node can be used instead. Hence, Cassandra is designed with a distributed architecture.
Cassandra stores data on different nodes in a peer-to-peer distributed architecture.
All the nodes exchange information with each other using the Gossip protocol, which is the protocol Cassandra nodes use to communicate with each other.
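The idea of nodes exchanging information can be illustrated with a minimal sketch. It assumes each node keeps a view that maps node names to a heartbeat version and that two nodes merge their views by keeping the highest version for each entry. The names `merge_views` and `gossip_exchange` and the data layout are hypothetical and are not Cassandra's actual implementation.

```python
def merge_views(local, remote):
    """Merge two views (node name -> heartbeat version), keeping the highest version."""
    merged = dict(local)
    for node, version in remote.items():
        if version > merged.get(node, -1):
            merged[node] = version
    return merged


def gossip_exchange(views, a, b):
    """Nodes a and b swap views; both end up holding the merged result."""
    merged = merge_views(views[a], views[b])
    views[a] = dict(merged)
    views[b] = dict(merged)


# Each node starts out knowing only about itself (toy data).
views = {"n1": {"n1": 5}, "n2": {"n2": 3}, "n3": {"n3": 7}}
gossip_exchange(views, "n1", "n2")
gossip_exchange(views, "n2", "n3")
gossip_exchange(views, "n3", "n1")
```

After these three exchanges every node in the toy cluster holds the same view of all three nodes, even though no node contacted every other node directly at the start.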
Cassandra has the following components:
Node: The place where data is stored. It is the basic component of Cassandra.
Data center: A collection of nodes.
Cluster: A collection of data centers.
Commit log: Every write operation is written to the commit log, which is used for crash recovery.
Mem-table: After data is written to the commit log, it is written to the Mem-table, where it is held temporarily.
SSTable: When the Mem-table reaches a certain threshold, data is flushed to an SSTable disk file.
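The path from commit log to Mem-table to SSTable can be sketched as a toy model. This is only an illustration under simplifying assumptions: the class `ToyNode` and its `memtable_threshold` parameter are hypothetical, the threshold counts entries rather than bytes, and the "disk files" are plain in-memory dictionaries.

```python
class ToyNode:
    """Toy model of one node's write path: commit log -> Mem-table -> SSTable."""

    def __init__(self, memtable_threshold=3):
        self.memtable_threshold = memtable_threshold  # hypothetical flush limit (entries)
        self.commit_log = []  # append-only record of every write, for crash recovery
        self.memtable = {}    # temporary in-memory storage
        self.sstables = []    # stand-ins for SSTable disk files, one per flush

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write in the commit log
        self.memtable[key] = value            # 2. then write it to the Mem-table
        if len(self.memtable) >= self.memtable_threshold:
            self.flush()                      # 3. flush once the threshold is reached

    def flush(self):
        # Store the entries sorted by key, then start a fresh Mem-table.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}


node = ToyNode(memtable_threshold=3)
for i in range(4):
    node.write(f"k{i}", i)
```

With a threshold of three, the first three writes end up in one flushed table and the fourth stays in the Mem-table, while the commit log holds all four.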
Because hardware problems can occur or a link can go down at any time while data is being processed, a backup is needed for when a problem occurs. Data is therefore replicated so that there is no single point of failure.
Cassandra places replicas of data on different nodes based on two factors: the replication strategy, which determines where replicas are placed, and the replication factor, which determines how many replicas are kept.
A replication factor of one means that there is only a single copy of the data, while a replication factor of three means that there are three copies of the data on three different nodes.
To ensure there is no single point of failure, the replication factor must be greater than one; three is a common choice.
There are two kinds of replication strategies in Cassandra.
SimpleStrategy
SimpleStrategy is used when you have just one data center. It places the first replica on the node selected by the partitioner. The remaining replicas are then placed on the next nodes in the clockwise direction around the node ring.
Here is the pictorial representation of the SimpleStrategy.
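The clockwise placement described for SimpleStrategy can be sketched in a few lines. This assumes the ring is given as a list of node names in clockwise order and that the partitioner's choice is already known as an index; the function name `simple_strategy_replicas` is hypothetical.

```python
def simple_strategy_replicas(ring, first_index, replication_factor):
    """Return the nodes that hold replicas, walking the ring clockwise.

    ring: node names in clockwise ring order.
    first_index: position of the node selected by the partitioner (assumed given).
    """
    count = min(replication_factor, len(ring))  # cannot exceed the number of nodes
    return [ring[(first_index + i) % len(ring)] for i in range(count)]


ring = ["node1", "node2", "node3", "node4"]
replicas = simple_strategy_replicas(ring, first_index=2, replication_factor=3)
```

Starting at `node3`, the walk wraps around the end of the list, so the three replicas land on `node3`, `node4`, and `node1`.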
NetworkTopologyStrategy
NetworkTopologyStrategy is used when you have more than one data center.
In NetworkTopologyStrategy, the number of replicas is set for each data center separately. Within a data center, it places replicas in the clockwise direction around the ring until it reaches the first node in another rack.
This strategy tries to place replicas on different racks in the same data center, because a failure can sometimes affect a whole rack. Replicas on nodes in other racks can then still provide the data.
Here is the pictorial representation of the NetworkTopologyStrategy.
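A rough sketch of rack-aware placement follows. It assumes the ring is a list of `(node, data_center, rack)` tuples in clockwise order and that the replica count is given per data center. The function `nts_replicas` and its fallback rule (reuse a rack only when there are fewer racks than replicas) are a simplification for illustration, not Cassandra's exact algorithm.

```python
def nts_replicas(ring, first_index, rf_per_dc):
    """Pick replicas per data center, preferring nodes on different racks.

    ring: list of (node, data_center, rack) tuples in clockwise order.
    rf_per_dc: dict of data center name -> number of replicas wanted there.
    """
    n = len(ring)
    order = [ring[(first_index + i) % n] for i in range(n)]  # clockwise walk
    placement = {}
    for dc, rf in rf_per_dc.items():
        chosen, used_racks, skipped = [], set(), []
        for node, node_dc, rack in order:
            if node_dc != dc:
                continue
            if rack in used_racks:
                skipped.append(node)  # same rack as an earlier replica
            else:
                chosen.append(node)
                used_racks.add(rack)
        # Fall back to skipped nodes only if the racks ran out first.
        placement[dc] = (chosen + skipped)[:rf]
    return placement


ring = [
    ("a1", "DC1", "r1"), ("b1", "DC2", "r1"), ("a2", "DC1", "r1"),
    ("b2", "DC2", "r2"), ("a3", "DC1", "r2"), ("b3", "DC2", "r2"),
]
placement = nts_replicas(ring, first_index=0, rf_per_dc={"DC1": 2, "DC2": 2})
```

In DC1 the walk skips `a2` because it shares rack `r1` with `a1`, and places the second replica on `a3` in rack `r2`.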
The coordinator sends a write request to the replicas. If all the replicas are up, they all receive the write request regardless of the consistency level.
The consistency level determines how many replicas must respond with a success acknowledgment before the write is considered successful.
A replica responds with a success acknowledgment once the data has been written successfully to its commit log and Mem-table.
For example, in a single data center with a replication factor of three, three replicas receive the write request. If the consistency level is one, the coordinator needs a success acknowledgment from only one replica and does not wait for the other two.
If the remaining two replicas miss or lose the data because a node goes down or some other problem occurs, Cassandra makes the row consistent again using its built-in repair mechanism.
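The relationship between the consistency level and the acknowledgments a coordinator needs can be sketched as follows. The helper names `required_acks` and `coordinate_write` are hypothetical, replicas are modeled as a simple up/down flag, and the `QUORUM` and `ALL` levels are included only for comparison with the level of one used in the example above.

```python
def required_acks(consistency_level, replication_factor):
    """Map a consistency level name to the number of acknowledgments needed."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,  # a majority of the replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def coordinate_write(replica_up, consistency_level):
    """replica_up: dict of replica name -> True if the replica is reachable.

    The write is sent to every replica; only reachable ones can acknowledge.
    Returns (success, acks), where acks lists the replicas that acknowledged.
    """
    needed = required_acks(consistency_level, len(replica_up))
    acks = [name for name, up in replica_up.items() if up]
    return len(acks) >= needed, acks


replica_up = {"r1": True, "r2": False, "r3": False}
ok_one, acked = coordinate_write(replica_up, "ONE")
ok_quorum, _ = coordinate_write(replica_up, "QUORUM")
```

With only one of three replicas reachable, the write succeeds at a consistency level of one but fails at quorum, which needs two acknowledgments.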
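To summarize the write process in Cassandra: a write is first recorded in the commit log, then written to the Mem-table, and finally flushed to an SSTable when the Mem-table reaches its threshold.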
There are three types of read requests that a coordinator sends to replicas: the direct request, the digest request, and the read repair request.
The coordinator sends a direct request to one of the replicas. It then sends digest requests to the number of replicas specified by the consistency level and checks whether the returned data is up to date.
After that, the coordinator sends digest requests to all the remaining replicas. If any node returns an out-of-date value, a background read repair request updates that data. This process is called the read repair mechanism.
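The read flow can be sketched as a toy function. Several things here are assumptions made for illustration: each replica is a dictionary of `key -> (timestamp, value)`, the newest timestamp wins when replicas disagree, digests are SHA-256 hashes of the record, and the two rounds of digest requests are collapsed into one. The names `digest` and `read_with_repair` are hypothetical.

```python
import hashlib


def digest(record):
    """Hash of a (timestamp, value) record; stands in for a digest response."""
    return hashlib.sha256(repr(record).encode()).hexdigest()


def read_with_repair(replicas, key):
    """replicas: dict of replica name -> {key: (timestamp, value)}.

    Returns (value, repaired), where repaired lists the replicas that were updated.
    """
    names = list(replicas)
    direct = replicas[names[0]].get(key)  # direct request: the full record
    # Digest requests: only a hash comes back from the other replicas.
    digests = {n: digest(replicas[n].get(key)) for n in names[1:]}
    if all(d == digest(direct) for d in digests.values()):
        return (direct[1] if direct else None), []  # every replica agrees
    # Mismatch: compare the full records and keep the newest timestamp.
    records = [replicas[n][key] for n in names if key in replicas[n]]
    newest = max(records, key=lambda record: record[0])
    repaired = []
    for n in names:
        if replicas[n].get(key) != newest:
            replicas[n][key] = newest  # read repair request
            repaired.append(n)
    return newest[1], repaired


replicas = {
    "r1": {"user:1": (2, "new")},
    "r2": {"user:1": (2, "new")},
    "r3": {"user:1": (1, "old")},
}
value, repaired = read_with_repair(replicas, "user:1")
```

Here `r3` holds an out-of-date record, so its digest does not match and the sketch overwrites it with the newer record held by `r1` and `r2`.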
Summary
This tutorial explained Cassandra's internal architecture and how Cassandra replicates, writes, and reads data at different stages. It also explained how the consistency level is applied throughout the process.