Saturday, December 1, 2018

Cassandra - Data modeling



Cassandra is a NoSQL database. Some of its core features are -

  1. High write performance
  2. High scalability
  3. Fault tolerance
  4. Performance that scales linearly as nodes are added
  5. Easy data distribution

The Cassandra data model contains -
  1. Keyspaces - A keyspace is the top-level container for tables in Cassandra. Replication (strategy and replication factor) is specified at the keyspace level.
  2. Tables - Every table must have a primary key, which can be a composite of several columns.
  3. Columns - A column holds a named, typed value within a row.


Cassandra provides CREATE TABLE, ALTER TABLE, and DROP TABLE statements for data definition, and INSERT, UPDATE, SELECT, and DELETE statements for data manipulation. WHERE clauses in Cassandra are restricted: a query is efficient only when it restricts the partition key. Filtering on columns that are not part of the primary key and are not indexed requires ALLOW FILTERING and can result in a cluster-wide scan, which is not desirable.
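
To make the WHERE clause restrictions a little more concrete, here is a rough Python sketch of the rule of thumb. It is a simplification and not how Cassandra itself validates queries: it assumes every partition key column must be restricted, that clustering columns may only be restricted in order without gaps, and it ignores secondary indexes and range operators. The function name needs_filtering and the posts_by_user table layout are made up for illustration.

```python
def needs_filtering(partition_key, clustering_columns, where_columns):
    """Return True if, under the simplified rules above, a query restricting
    `where_columns` would need ALLOW FILTERING (that is, a scan)."""
    restricted = set(where_columns)

    # Every partition key column must be restricted, otherwise Cassandra
    # cannot tell which partition (and which nodes) to read from.
    if not set(partition_key) <= restricted:
        return True

    remaining = restricted - set(partition_key)

    # Clustering columns may be restricted only as a prefix of their
    # declared order; stop at the first one the query does not restrict.
    for column in clustering_columns:
        if column in remaining:
            remaining.remove(column)
        else:
            break

    # Whatever is left is a non-key column, or a clustering column that
    # comes after a gap. Both would require filtering.
    return bool(remaining)


# Hypothetical table: PRIMARY KEY ((user_id), posted_at, post_id)
partition_key = ("user_id",)
clustering_columns = ("posted_at", "post_id")

print(needs_filtering(partition_key, clustering_columns, ["user_id"]))
print(needs_filtering(partition_key, clustering_columns, ["user_id", "posted_at"]))
print(needs_filtering(partition_key, clustering_columns, ["user_id", "post_id"]))
print(needs_filtering(partition_key, clustering_columns, ["title"]))
```

The first two queries stay inside one partition and follow the clustering order. The third skips posted_at and the fourth never names the partition key, so both would fall back to filtering.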

While modeling data for your application, some things to keep in mind are -
  1. Start with the queries your application will run against the database, and design the tables around them.
  2. Denormalize when you can. Cassandra has no joins, so it is common to duplicate data into one table per query.
  3. Data should be spread evenly across the cluster. This can be achieved by picking the right partition key. The partition key is the first component of the primary key (it can itself be made up of several columns). Data is partitioned by the partition key and, within a partition, sorted by the remaining clustering columns.
  4. Minimize the number of partitions that need to be accessed in your read queries. Ideally, each read touches a single partition.
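
The effect of the partition key on data distribution can be seen with a small simulation. This is only a toy sketch: it uses an MD5 hash and a modulo as a stand-in for Cassandra's partitioner and token ring, and the user_id and country columns, the four-node cluster, and the helper names are all assumptions made for the example.

```python
import hashlib


def node_for(partition_key, node_count):
    """Map a partition key value to a node index.

    Toy stand-in for the token ring: hash the key, then take it modulo
    the number of nodes.
    """
    digest = hashlib.md5(str(partition_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(rows, key_column, node_count):
    """Count how many rows land on each node when partitioned by key_column."""
    counts = [0] * node_count
    for row in rows:
        counts[node_for(row[key_column], node_count)] += 1
    return counts


# Hypothetical data: 10,000 users spread over only 3 countries.
countries = ["IN", "US", "DE"]
rows = [
    {"user_id": f"user-{i}", "country": countries[i % 3]}
    for i in range(10_000)
]

by_user = rows_per_node(rows, "user_id", node_count=4)
by_country = rows_per_node(rows, "country", node_count=4)

print("partitioned by user_id:", by_user)
print("partitioned by country:", by_country)
```

With user_id as the partition key there are many small partitions, so the rows spread out roughly evenly. With country there are only three partitions for four nodes, so at least one node holds no data at all and the others carry very large partitions.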