Cassandra Interview Questions and Answers
Last updated on 24th Oct 2020
Cassandra is an open-source database designed to handle large amounts of data while providing high availability with no single point of failure. It became a top-level Apache project in 2010. Cassandra is written in Java, so it runs on a wide range of operating systems and platforms. It can serve as a real-time data store for online applications as well as a read-intensive database for business intelligence systems.
So you have found your dream job working with Cassandra but are wondering how to crack a Cassandra interview in 2020 and what the likely questions are. Every Cassandra interview is different, and so is the scope of every job. With this in mind, we have compiled the most common Cassandra interview questions and answers to help you succeed in your interview.
1. Why was Apache Cassandra developed?
Cassandra is a distributed database management system. It was initially developed at Facebook to power the Facebook inbox search feature. Thanks to its technical strengths, it became very popular and went on to become a top-level Apache project.
2. What is Apache Cassandra?
Cassandra is an open-source, distributed and decentralized database. It is used for managing large amounts of structured data spread across many servers.
3. Describe the benefits of using Cassandra.
Cassandra is easy to work with and offers high performance, fault tolerance and predictable scaling in a distributed database. It is also popular because it is an open-source, distributed NoSQL database management system.
4. What are the applications of Cassandra?
Cassandra has become a primary choice for many companies when it comes to application development and data management. Even new start-ups prefer it because it is easy for operators to work with.
Cassandra is a good fit for applications where data is collected at high speed from many different kinds of sources, for example Internet of Things applications. It can also be used in product and retail apps, messaging, social media analytics, and recommendation engines.
5. Explain Apache Cassandra vs Traditional Databases
Traditional databases provide many features of their own, but here are some benefits that a database like Cassandra offers:
|Traditional databases|Cassandra database|
|Data is mostly written in one location.|Data is written in many locations.|
|Handles moderate data volumes.|Handles high data volumes.|
|Handles only moderate incoming data velocity.|Handles high incoming data velocity.|
|Supports complex transactions.|Supports simple transactions.|
|Scales mainly for reads.|Scales for both reads and writes.|
6. Name the features of Cassandra.
Cassandra has become famous for its outstanding technical features. Here are some features you must know:
- Elastic scalability
- Always-on architecture
- Fast, linear-scale performance
- Flexible data storage
- Easy data distribution
- Transaction support (atomic, isolated and durable writes)
7. What are the main components of Cassandra?
The components of Cassandra include:
- 1. Node
- 2. Data centre
- 3. Commit log
- 4. Cluster
- 5. Memtable
- 6. SSTable
- 7. Bloom filter
8. What are the functions of Cassandra?
Cassandra supports two main categories of functions:
Scalar functions: These take one or more values from a single row and produce one output value for that row.
Aggregate functions: These produce a single combined result from the values of multiple selected rows.
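The difference between the two categories can be illustrated outside the database. The following is a minimal Python sketch, not Cassandra code: the `rows` data and the helper names `scalar_upper` and `aggregate_avg` are hypothetical and only mimic what a scalar and an aggregate function do.

```python
# Hypothetical rows standing in for the result of a query.
rows = [
    {"name": "ada", "score": 90},
    {"name": "bob", "score": 70},
    {"name": "cy", "score": 80},
]


def scalar_upper(value):
    """Scalar function: one input value in, one output value out."""
    return value.upper()


def aggregate_avg(values):
    """Aggregate function: many input values in, one combined result out."""
    values = list(values)
    return sum(values) / len(values)


# A scalar function yields one result per row.
upper_names = [scalar_upper(row["name"]) for row in rows]

# An aggregate function yields one result for the whole set of rows.
average_score = aggregate_avg(row["score"] for row in rows)
```

The scalar call produces as many outputs as there are rows, while the aggregate call collapses all rows into a single value.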
9. What are the key terms in Cassandra?
They go as follows:
- Data centre
- Commit log
10. What is a node?
A node is the basic unit of Cassandra: a single system, typically a computer or server, that is part of a cluster. It is where the data is stored.
11. What is the data centre?
A data centre is a collection of related Cassandra nodes. One or more data centres together make up a cluster, which is the full collection of nodes.
12. What is a memtable?
A memtable is an in-memory location where data is written and held temporarily. Data is written to the memtable after it has been recorded in the commit log.
Data in a memtable is organized by key, and data is retrieved using that key. Each column family (table) has its own memtable. When the memtable is full, its contents are flushed to disk as an SSTable and the memory is freed.
13. What is SSTable?
SSTable stands for "Sorted Strings Table". An SSTable is a data file in Cassandra whose main function is to store the data flushed from a memtable. Unlike a memtable, an SSTable is immutable: once written, nothing in it is deleted or added.
14. What is the difference between memtable and SSTable?
A memtable does not store data permanently. It accumulates writes in memory and holds them only until they are flushed.
An SSTable, in contrast, is the on-disk file that receives the data flushed from a memtable. The data stored in an SSTable is persistent and cannot be changed.
15. How is data distribution done?
Cassandra is a highly available database that stores data by dividing it evenly among its nodes. By default it uses the Murmur3 partitioner, which hashes each row's partition key to a token; each node owns a range of tokens, so the hash decides which node stores the row.
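To make the idea of hash-based distribution concrete, here is a minimal Python sketch of a token ring. It is an illustration under stated assumptions, not Cassandra's implementation: MD5 from the standard library stands in for Murmur3, each node owns exactly one evenly spaced token, and the names `token_for` and `TokenRing` are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to a 64-bit token.

    Assumption: MD5 is used as a stand-in because Murmur3 is not in the
    Python standard library.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class TokenRing:
    def __init__(self, nodes):
        # Simplification: one evenly spaced token per node.
        step = 2**64 // len(nodes)
        self.nodes = list(nodes)
        self.tokens = [i * step for i in range(len(nodes))]

    def owner(self, partition_key):
        """Return the first node whose token is >= the key's token, wrapping around."""
        token = token_for(partition_key)
        index = bisect.bisect_left(self.tokens, token) % len(self.tokens)
        return self.nodes[index]


ring = TokenRing(["node-a", "node-b", "node-c"])

# Count how many of 3000 sample keys land on each node.
counts = {}
for i in range(3000):
    node = ring.owner(f"user-{i}")
    counts[node] = counts.get(node, 0) + 1
```

Because the hash output is spread uniformly over the token space, each node ends up with roughly a third of the sample keys, and the same key always maps to the same node.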
16. How does Cassandra store data?
The data storage path in Cassandra begins with the commit log, where every write is recorded on disk for durability. The data is then written to the memtable, where it is held temporarily in memory. Once the memtable fills up, the data is flushed and written to an SSTable.
17. What are the general operations of Cassandra CQL?
There are two types of operations carried out by Cassandra:
- 1. Read operation
- 2. Write operation
18. What is a direct request?
A direct request is part of the read operation. The coordinator node contacts one replica node and asks it for the full requested data.
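19. Define digest request.
When the coordinator node contacts replicas during a read, it chooses the nodes that are currently replying fastest. Apart from the replica that receives the direct request, these nodes respond with a digest (a hash) of the requested data rather than the data itself, and the coordinator compares the digests to check that the replicas agree.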
20. Explain read repair request.
When the coordinator node sends read requests, it checks the replicas for outdated data. Replicas holding stale data are sent a background read repair request and are updated with the most recent version. Read repair is a way to keep the data up to date, and it makes sure that the requested row is consistent on all replicas.
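The interplay of direct request, digest request and read repair can be sketched in a few lines of Python. This is a simplified model under stated assumptions: every replica is contacted, the newest write timestamp wins, and the `replicas` data plus the helpers `digest` and `read_with_repair` are hypothetical names, not part of any Cassandra API.

```python
import hashlib


def digest(value):
    """Hash of a value, standing in for the digest a replica returns."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical replica state: replica name -> (value, write timestamp).
replicas = {
    "r1": ("alice@new.example", 200),
    "r2": ("alice@new.example", 200),
    "r3": ("alice@old.example", 100),  # stale copy
}


def read_with_repair(replicas, direct="r1"):
    # Direct request: one replica returns the full data.
    full_value, _ = replicas[direct]
    # Digest requests: the other replicas return only a hash.
    digests = {
        name: digest(value)
        for name, (value, _) in replicas.items()
        if name != direct
    }
    if all(d == digest(full_value) for d in digests.values()):
        return full_value, []
    # Mismatch: the newest timestamp wins and stale replicas are repaired.
    newest = max(replicas.values(), key=lambda pair: pair[1])
    repaired = [name for name, pair in replicas.items() if pair != newest]
    for name in repaired:
        replicas[name] = newest
    return newest[0], repaired


value, repaired = read_with_repair(replicas)
```

After the call, the stale replica `r3` holds the same value and timestamp as the others, which is the effect read repair aims for.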
21. What is a write operation?
A write goes through the following steps:
Step 1: As soon as a node receives a write request, it appends the data to the commit log so that the write survives a crash.
Step 2: The data is then inserted into the memtable, an in-memory structure.
Step 3: When the memtable reaches its limit, its data is flushed to an SSTable on disk.
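The write steps above can be sketched as a small in-memory model. This is only an illustration: the `Node` class, its `memtable_limit` parameter and the use of Python lists and tuples in place of on-disk files are all assumptions made for the example.

```python
class Node:
    """Toy model of the commit log -> memtable -> SSTable write path."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only; stands in for the on-disk log
        self.memtable = {}     # in-memory map of key -> value
        self.sstables = []     # immutable, sorted snapshots of flushed memtables
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: record the write
        self.memtable[key] = value            # step 2: update the memtable
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # step 3: flush when full

    def flush(self):
        # An SSTable is sorted by key and never modified after creation.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs to be replayed after a crash.
        self.commit_log.clear()

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for stored_key, value in sstable:
                if stored_key == key:
                    return value
        return None


node = Node(memtable_limit=3)
for key, value in [("k2", "b"), ("k1", "a"), ("k3", "c"), ("k1", "a2")]:
    node.write(key, value)
```

After the third write the memtable is flushed into one sorted SSTable; the fourth write lands in a fresh memtable, and a read of `k1` returns the newer value from memory rather than the older one on "disk".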
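22. What is the CAP theorem in Cassandra?
The CAP theorem, also known as Brewer's theorem, states that a distributed computer system cannot guarantee all three of the following properties at the same time:
- Consistency
- Availability
- Partition tolerance
Cassandra is generally classed as a system that favours availability and partition tolerance, with consistency that can be tuned per operation.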
23. What do you mean by ACID?
ACID stands for:
Atomicity: A transaction either commits completely or fails completely; it is never applied in part.
Consistency: The exact definition varies from one application to another, but the general meaning is that the data must remain in a valid, consistent state.
Isolation: Concurrent transactions are isolated from each other, so one does not see the intermediate state of another.
Durability: Once the database has accepted the data, it guarantees the data is kept. If the database fails afterwards, the data will not be lost.
24. What is BASE?
Not every application needs the strong consistency of ACID, and this is where BASE comes in. BASE stands for Basically Available, Soft state, Eventually consistent. NoSQL databases generally follow this model.
25. Explain what tunable consistency is.
Consistency refers to how up to date and synchronized a row of Cassandra data is across all of its replicas. Cassandra offers tunable consistency: for any given operation (read or write), the application can decide how many replicas must respond, and therefore how consistent the data must be.
26. What is the relation between tunable consistency and Cassandra?
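Tunable consistency lets Cassandra provide the appropriate level of consistency for each read and write, so an application can trade consistency against availability and latency. This flexibility is one of the features that sets Cassandra apart among NoSQL databases.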
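A common rule of thumb is that reads see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The sketch below works through that arithmetic in Python; the function names are hypothetical, and only a handful of consistency levels (ONE, TWO, THREE, QUORUM, ALL) are modelled.

```python
def quorum(replication_factor):
    """A quorum is a majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Number of replicas that must respond for a given consistency level."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(read_level, write_level, replication_factor):
    """True when every read overlaps the latest write on at least one replica."""
    reads = replicas_required(read_level, replication_factor)
    writes = replicas_required(write_level, replication_factor)
    return reads + writes > replication_factor
```

With a replication factor of 3, `is_strongly_consistent("QUORUM", "QUORUM", 3)` holds because 2 + 2 > 3, whereas reading and writing at ONE (1 + 1) only gives eventual consistency.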
27. What are the best monitoring tools for Cassandra?
Although Cassandra comes with built-in fault tolerance, it still needs to be monitored for effective results. Here are some tools commonly used to monitor Cassandra databases:
- 1. SolarWinds Server & Application Monitor
- 2. Instana
- 3. Instaclustr
- 4. AppDynamics
- 5. Dynatrace
- 6. ManageEngine Applications Manager
28. What is the NoSQL database?
NoSQL databases are used primarily because they handle large volumes of data smoothly. Simplicity of design, simple horizontal scaling across clusters and fine control over availability are a few of the reasons for choosing a NoSQL database such as Cassandra.
29. What are the objectives of NoSQL?
The primary objectives of NoSQL DB are:
- Simplicity of design
- Finer control over availability
- Horizontal scaling
30. Describe a bloom filter?
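A bloom filter is a probabilistic data structure used by Cassandra on the read path. A read first checks the memtable and the row cache; before going to disk, Cassandra consults the bloom filter of each SSTable to find out whether that SSTable might contain the requested data. Its role in the read path is to avoid checking every SSTable to find one particular piece of data.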
31. What is CQL?
Initially, Cassandra required applications to use a low-level API (Thrift) for basic tasks like insert, get and delete. Over time, a dedicated query language was introduced for these operations and named the Cassandra Query Language (CQL).
CQL provides a rich set of built-in data types and also lets applications define their own custom data types. Although CQL resembles SQL, Cassandra is still classified as a NoSQL database.
32. Name the key roles of CQL?
Different types of users need different roles depending on their requirements, which helps keep the database secure. CQL provides the following role operations:
- 1. Create a role
- 2. Alter role
- 3. Drop role
- 4. Grant role
- 5. Revoke role
- 6. List role
33. What is a cluster in Cassandra?
A cluster is a collection of nodes that together represent a single system. It is the outermost container in Cassandra, and its nodes are arranged in a ring.
34. What are CRUD operations?
These operations are used to make changes in the Cassandra database.
CRUD stands for
- Create operation
- Read operation
- Update operation and
- Delete/drop operation.
35. Describe Keyspace.
A keyspace is a namespace within the cluster that controls how the data it contains is replicated. A cluster typically has one keyspace per application.
36. Name the types of Keyspace in Cassandra?
Cassandra keyspace contains 3 types of operations which go as follows:
- 1. Create keyspace
- 2. Alter keyspace
- 3. Drop keyspace
37. Define column family in Cassandra?
A column family in Cassandra is an ordered collection of rows, used to represent the stored data in a structured manner. Column families are contained in a keyspace, and a keyspace normally holds at least one column family. In CQL, a column family is called a table.
38. What are the characteristics of a column family?
A column family has many characteristics (attributes); a few of them are:
- Keys cached
- Rows cached
- Preload row cache
39. Explain the super column in Cassandra.
A super column in Cassandra is a special kind of column: instead of a single value, it holds a map of sub-columns, so it acts as a roadmap to all the sub-columns grouped under it.
Super columns belong to the older Thrift data model and are intended to group related columns so that they can be looked up together.
40. Define replication factor
Data in Cassandra is replicated: it is copied from one node to others to ensure fault tolerance. The replication factor is the total number of copies (replicas) of each piece of data stored across the cluster.
41. Define replication strategy.
These strategies define the technique of how the replicas are placed in a cluster. There are mainly two types of Replication Strategy:
- 1. SimpleStrategy
- 2. NetworkTopologyStrategy
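As a rough illustration of how the simple strategy places replicas, the sketch below walks clockwise around a ring of nodes starting from the node that owns the data, without looking at racks or data centres. The function name `simple_strategy_replicas` and the four-node ring are hypothetical, and the ring is reduced to an ordered Python list.

```python
def simple_strategy_replicas(ring, primary_index, replication_factor):
    """Pick the owning node, then the next nodes clockwise on the ring.

    Simplification: the ring is an ordered list of node names, and the
    replica count is capped at the number of nodes.
    """
    count = min(replication_factor, len(ring))
    return [ring[(primary_index + i) % len(ring)] for i in range(count)]


ring = ["node-a", "node-b", "node-c", "node-d"]

# The key's token belongs to node-c (index 2); replication factor is 3.
replicas = simple_strategy_replicas(ring, primary_index=2, replication_factor=3)
```

Here the three replicas are `node-c`, `node-d` and, after wrapping around the ring, `node-a`. A topology-aware strategy would additionally try to spread the copies over different racks and data centres.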
42. Name some features of Apache Cassandra.
Cassandra has the following features:
- 1. High Scalability
- 2. High Fault Tolerance
- 3. Flexible Data storage
- 4. Easy data distribution
- 5. Tunable Consistency
- 6. Efficient Writes
- 7. Cassandra Query Language
43. Name different types of NoSQL database.
There are four types of NoSQL Database:
- Key Value Store type database (Redis and Voldemort)
- Document Store type database (MongoDB and CouchDB)
- Column Store type database (Cassandra)
- Graph Database (Neo4j and Giraph)
44. Define NoSQL Database.
A NoSQL database is a non-relational database, often described as a "Not only SQL" database. It provides a mechanism to store and retrieve many different types of data, including images, sounds etc.
45. Give key features of any NoSQL database.
The features of NoSQL Database are:
- 1. Schema Agnostic
- 2. AutoSharding and Elasticity
- 3. Highly Distributable
- 4. Easily Scalable
- 5. Integrated Caching
46. Define a column family.
A keyspace contains many column families. A column family basically represents a table, and column families are typically defined per application.
47. Define Node.
A node represents a system that is a part of a cluster. It is the main area in which data is stored.
48. Define data centre.
A data centre is a group of related nodes within a cluster; it holds the data stored on those nodes.
49. What is a Keyspace?
A keyspace is the outermost storage unit for data in Cassandra. It contains many column families.
50. Give the data storage units in Cassandra.
The storage units are:
- Column Family
51. What is data replication in Cassandra?
Data replication is an electronic copying of data from a database in one computer or server to a database in another so that all users can share the same level of information. Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. The replication strategy decides the nodes where replicas are placed.
52. Tell something about the query language used in Cassandra Database.
Cassandra Query Language (CQL) is the query language of the Cassandra database. It is the interface through which a user accesses the database, and all operations on the data are expressed through it.
53. Give some advantages to Cassandra.
These are the advantages of Cassandra:
- 1. Since data can be replicated to several nodes, Cassandra is fault tolerant.
- 2. Cassandra can handle a large set of data.
- 3. Cassandra provides high scalability.
54. Define Cassandra.
Cassandra is a free and open source distributed database management system. It is used to handle a large amount of data with a high fault tolerance and high scalability.
55. Who developed Cassandra and in which language?
Avinash Lakshman and Prashant Malik developed Cassandra using Java. It later became an Apache project for further development.
56. What is the main objective of creating Cassandra?
The main objective of Cassandra is to handle large amounts of data. A further objective is to ensure fault tolerance while keeping data access fast.
57. Define data replication.
Data replication is an operation in which data from one node is copied to different nodes in the cluster. This operation ensures redundancy and fault tolerance in the database. The replication factor decides the number of copies and the replication strategy decides the nodes in which the data is copied.
58. Define commit log.
It is a mechanism that is used to recover data in case the database crashes. Every operation that is carried out is saved in the commit log. Using this the data can be recovered.
59. Define composite key.
Composite keys include row key and column name. They are used to define column families with a concatenation of data of different types.
60. Define consistency.
Consistency refers to how synchronized and up to date a row of Cassandra data is across all of its replicas.
61. Name the types of tunable consistency.
Cassandra supports two types of consistency:
- 1. Eventual Consistency
- 2. Strong Consistency
62. Describe Memtable.
Memtables are in-memory, cache-like structures that hold recently written data in key and column format.
63. Define SSTable.
SSTable stands for Sorted Strings Table. It is a data file on disk to which memtables are regularly flushed.
64. Name the management tools in Cassandra.
These are the management tools used in Cassandra.
- 1. DataStax OpsCenter
- 2. SPM
65. In which language is Cassandra written?
Cassandra is written in Java. It was originally designed at Facebook, supports flexible schemas, and is highly scalable for big data.
66. Who was the original author of Cassandra?
The original authors of Cassandra are Avinash Lakshman and Prashant Malik. It was initially developed at Facebook to power the Facebook inbox search feature.
67. Which query language is used in Cassandra database?
Cassandra introduced its own Cassandra Query Language (CQL). CQL is a simple interface for accessing Cassandra, as an alternative to the traditional Structured Query Language (SQL).
68. What are the benefits/advantages of Cassandra?
- Cassandra delivers real-time performance simplifying the work of Developers, Administrators, Data Analysts and Software Engineers.
- It provides extensible scalability and can be easily scaled up and scaled down as per the requirements.
- Data can be replicated to several nodes for fault-tolerance.
- Being a distributed management system, there is no single point of failure.
- Every node in a cluster contains different data and is able to serve any request.
69. Where does Cassandra store its data?
Cassandra stores its data on disk as SSTable files in each node's data directory (configured by data_file_directories in cassandra.yaml).
70. What was the design goal of Cassandra?
The main design goal of Cassandra was to handle big data workloads across multiple nodes without a single point of failure.
71. Explain what Cassandra is.
Cassandra is an open source data storage system developed at Facebook for inbox search and designed for storing and managing large amounts of data across commodity servers. It can serve as both:
- 1. A real-time data store for online applications
- 2. A read-intensive database for business intelligence systems
72. What is Cassandra used for, and why use it?
Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons for using Cassandra are:
- 1. It is fault tolerant and consistent
- 2. Scalability from gigabytes to petabytes
- 3. It is a column-oriented database
- 4. No single point of failure
- 5. No need for separate caching layer
- 6. Flexible schema design
- 7. It has flexible data storage, easy data distribution, and fast writes
- 8. It offers atomic, isolated and durable writes, though not full ACID (Atomicity, Consistency, Isolation, and Durability) transactions
- 9. Multi-data center and cloud capable
- 10. Data compression
73. Explain what a composite type is in Cassandra.
In Cassandra, a composite type allows you to define a key or a column name as a concatenation of values of different types. A composite type can be used in two places:
- Row Key
- Column Name
74. How does Cassandra store data?
- All data is stored as bytes. When you specify a validator, Cassandra ensures those bytes are encoded as required.
- A comparator then orders the columns based on the ordering specific to that encoding.
- Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by an end-of-component byte.
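The composite layout described above (a two-byte length, the component bytes, then a terminating marker) can be sketched with Python's `struct` module. This is a minimal illustration, assuming a big-endian length field and a one-byte terminator of `0x00`; the helper names `encode_composite` and `decode_composite` are hypothetical.

```python
import struct


def encode_composite(components):
    """Encode byte-string components as <length><bytes><terminator> groups."""
    out = bytearray()
    for component in components:
        out += struct.pack(">H", len(component))  # two-byte length (assumed big-endian)
        out += component                          # the byte-encoded component
        out += b"\x00"                            # terminator (assumed one zero byte)
    return bytes(out)


def decode_composite(data):
    """Split an encoded composite back into its components."""
    components = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        components.append(data[offset:offset + length])
        offset += length + 1  # skip the component and its terminator
    return components


encoded = encode_composite([b"user42", b"2020-10-24"])
decoded = decode_composite(encoded)
```

Each component costs three bytes of overhead in this layout, so the two components above (6 and 10 bytes) encode to 22 bytes in total.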
75. Mention what are the main components of Cassandra Data Model?
The main components of the Cassandra data model are:
- 1. Cluster
- 2. Keyspace
- 3. Column
- 4. Column Family
76. Explain what is a column family in Cassandra?
A column family in Cassandra refers to a collection of rows.
77. Explain what a cluster is in Cassandra?
A cluster is a container for keyspaces. A Cassandra database is spread over several machines that operate together; the cluster is the outermost container, which arranges the nodes in a ring and assigns data to them. Data on each node is replicated to other nodes, which take over if a node fails.
78. What is keyspace in Cassandra?
In Cassandra, a keyspace is a namespace that determines data replication on nodes. A cluster typically has one keyspace per application.
79. Explain what is a keyspace in Cassandra?
As described in the previous answer, a keyspace is a namespace that determines how data is replicated on nodes, and a cluster typically has one keyspace per application.
80. What is the syntax to create a keyspace in Cassandra?
Syntax for creating keyspace in Cassandra is
- CREATE KEYSPACE <identifier> WITH <properties>
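To show what the `<properties>` part usually looks like, the Python sketch below assembles a complete CREATE KEYSPACE statement as a string. The helper `create_keyspace_cql` and the keyspace name `demo_ks` are hypothetical; the statement uses the `replication` map with the SimpleStrategy class and the `durable_writes` option. The helper does no escaping, so it assumes a trusted keyspace name.

```python
def create_keyspace_cql(name, replication_factor, durable_writes=True):
    """Build a CREATE KEYSPACE statement (illustrative helper, no escaping)."""
    replication = (
        "{'class': 'SimpleStrategy', 'replication_factor': %d}" % replication_factor
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {replication} "
        f"AND durable_writes = {str(durable_writes).lower()};"
    )


statement = create_keyspace_cql("demo_ks", 3)
print(statement)
```

The resulting string can be pasted into a CQL shell or passed to a driver; in production, a topology-aware replication class would normally replace SimpleStrategy.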
81. Mention what are the values stored in the Cassandra Column?
A Cassandra column basically holds three values:
- Column Name
- Value
- Time Stamp
82. Mention when you can use ALTER KEYSPACE.
ALTER KEYSPACE can be used to change keyspace properties such as the number of replicas and the durable_writes setting.
83. Explain what cqlsh is.
cqlsh is the command-line shell for the Cassandra Query Language (CQL); it lets users communicate with the database. Using cqlsh, you can do the following things:
- 1. Define a schema
- 2. Insert data
- 3. Execute a query
84. Mention what the shell commands "Capture" and "Consistency" do.
cqlsh offers various shell commands. CAPTURE captures the output of a command and appends it to a file, while CONSISTENCY displays the current consistency level or sets a new one.
85. What is mandatory while creating a table in Cassandra?
A primary key is mandatory when creating a table. It is made up of one or more columns of the table.
86. Mention what needs to be taken care while adding a Column?
While adding a column you need to take care that the
- 1. Column name is not conflicting with the existing column names
- 2. Table is not defined with compact storage option
87. Mention what Cassandra CQL collections are.
Cassandra CQL collections let you store multiple values in a single column. In Cassandra, you can use CQL collections in the following ways:
- List: Used when the order of the data needs to be maintained and a value may be stored multiple times (holds an ordered list of possibly repeating elements)
- Set: Used for a group of elements that are stored once and returned in sorted order (holds unique elements)
- Map: A data type used to store key-value pairs of elements
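The behaviour of the three collection kinds is close to that of Python's built-in `list`, `set` and `dict`, which makes for a quick mental model. The values below are hypothetical sample data; the one difference worth noting is that a Python set is unordered, so the sketch sorts it explicitly to mimic a set being returned in sorted order.

```python
# List: keeps insertion order and allows the same value more than once.
emails = ["a@example.com", "b@example.com", "a@example.com"]

# Set: duplicates collapse into a single element.
tags = {"nosql", "database", "nosql"}
sorted_tags = sorted(tags)  # explicit sort mimics sorted retrieval

# Map: key-value pairs looked up by key.
preferences = {"theme": "dark", "language": "en"}
theme = preferences["theme"]
```

The list still has three entries, the set has only two, and the map returns `"dark"` for the key `"theme"`.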
88. Explain how Cassandra writes data.
Cassandra writes data in three stages:
- 1. Commit log write
- 2. Memtable write
- 3. SSTable write
Cassandra first writes data to the commit log, then to an in-memory table structure called the memtable, and finally to an SSTable on disk.
89. Explain what an SSTable consists of.
An SSTable consists mainly of two files:
- 1. Index file (bloom filter and key-offset pairs)
- 2. Data file (actual column data)
90. Explain what Bloom Filter is used for in Cassandra?
A bloom filter is a space efficient data structure that is used to test whether an element is a member of a set. In other words, it is used to determine whether an SSTable has data for a particular row. In Cassandra it is used to save IO when performing a KEY LOOKUP.
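A bloom filter is small enough to sketch in full. The Python version below is a teaching model, not Cassandra's implementation: the class name, the default sizes and the use of salted SHA-256 hashes are assumptions, and one byte is spent per bit for readability.

```python
import hashlib


class BloomFilter:
    """Set-membership test that may give false positives but never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(key))


# One filter per SSTable: record the row keys that SSTable holds.
bloom = BloomFilter()
for row_key in ["row-1", "row-2", "row-3"]:
    bloom.add(row_key)
```

When `might_contain` returns `False`, the corresponding SSTable can be skipped without any disk access; a `True` answer still has to be confirmed by actually reading the SSTable.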
91. Explain how Cassandra writes changed data into the commit log.
- Cassandra appends changed data to the commit log
- The commit log acts as a crash recovery log for data
- A write operation is never considered successful until the changed data has been appended to the commit log
- Data will not be lost once the commit log has been flushed out to file
92. Explain how Cassandra deletes data.
SSTables are immutable, so a row cannot be removed from an SSTable in place. When a row needs to be deleted, Cassandra writes a special marker called a tombstone for it. When the data is read, anything covered by a tombstone is treated as deleted; the data is physically removed later, during compaction.
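The tombstone mechanism can be modelled with plain dictionaries. The sketch below is a simplification under stated assumptions: each store maps a key to a `(value, timestamp)` pair, the newest timestamp wins, and the compaction step drops tombstones immediately, whereas a real cluster keeps them for a grace period first. All names (`TOMBSTONE`, `delete`, `read`, `compact`) are hypothetical.

```python
TOMBSTONE = object()  # special marker meaning "this key was deleted"


def delete(memtable, key, timestamp):
    """A delete is just another write: it records a tombstone."""
    memtable[key] = (TOMBSTONE, timestamp)


def read(key, sstables, memtable):
    """Return the newest value for key, treating a tombstone as 'not found'."""
    candidates = [store[key] for store in (*sstables, memtable) if key in store]
    if not candidates:
        return None
    value, _ = max(candidates, key=lambda pair: pair[1])
    return None if value is TOMBSTONE else value


def compact(stores):
    """Merge stores, keep the newest entry per key, and drop tombstoned keys."""
    merged = {}
    for store in stores:
        for key, (value, timestamp) in store.items():
            if key not in merged or timestamp > merged[key][1]:
                merged[key] = (value, timestamp)
    return {k: pair for k, pair in merged.items() if pair[0] is not TOMBSTONE}


old_sstable = {"user:1": ("alice", 100), "user:2": ("bob", 100)}
memtable = {}

delete(memtable, "user:1", timestamp=200)
value_after_delete = read("user:1", [old_sstable], memtable)
still_on_disk = "user:1" in old_sstable
compacted = compact([old_sstable, memtable])
```

Right after the delete, reads no longer return `user:1` even though the old SSTable still contains it; only the compacted result has physically dropped the row.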