NoSQL vs SQL: 4 Reasons Why NoSQL Is Better for Data Science Applications

Vishalsheth
Mar 29, 2022

How to read, remember, and re-use this article: whenever you have to choose between a NoSQL and a SQL database (read), consider the 4 points below (remember) and evaluate what your business needs against them (re-use).

An RDBMS is not the best solution for every situation; in particular, it struggles to keep up with the rapid growth of unstructured data.

Big Data NoSQL databases were pioneered by top internet companies like Amazon, Google, LinkedIn, and Facebook to overcome the drawbacks of RDBMS.

NoSQL is a flexible, cloud-friendly approach to processing unstructured data with ease.

There is a need for a database technology that can store, process, and analyze this data around the clock.

Relational Databases

Databases such as MySQL, Oracle Express Edition, and MS-SQL, which use SQL, are all Relational Database Management Systems: they store data in relations (generally referred to as tables).

NoSQL Databases

NoSQL is a database technology driven by Cloud Computing, the Web, Big Data, and Big Users.

NoSQL now leads the way at popular internet companies such as LinkedIn, Google, Amazon, and Facebook, which use it to overcome the drawbacks of the 40-year-old RDBMS.

It does not require any kind of fixed table schemas, unlike SQL.

NoSQL generally scales horizontally and avoids major join operations on the data. NoSQL databases are sometimes referred to as structured storage, a broader category that includes relational databases as a subset.

NoSQL covers a wide range of databases, each with a different data storage model. The most popular types are Graph, Key-Value, Columnar, and Document.

4 Major points to look out for…

1. Nature of Data and Its Storage- Tables vs. Collections
2. Speed: Normalization vs. Storage Cost
3. Horizontal Scalability vs. Vertical Scalability
4. NoSQL vs SQL / CAP vs. ACID

1. Nature of Data and Its Storage- Tables vs. Collections

The foremost criterion for choosing a database is the nature of the data that your enterprise is planning to control and leverage. If the enterprise plans to work with data similar to an accounting Excel spreadsheet, i.e. basic tabular structured data, then the relational model will likely fulfill its business requirements. Current trends, however, demand storing and processing unstructured and unpredictable information.

By contrast, molecular modeling, geospatial, or engineering parts data is so complex that the data model created for it becomes highly complicated, with several levels of nesting. Several attempts were made to model this kind of data with a ‘2D (row-column) database’, but it did not fit.

Image Credit: couchbase.com

To overcome this drawback, NoSQL databases were considered as an alternative. NoSQL databases ease the representation of multi-level hierarchies and nesting, typically using the JSON (JavaScript Object Notation) format.
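As a minimal sketch of what such nesting looks like, the snippet below builds a hypothetical engineering-part record as a nested document and round-trips it through JSON. The field names and values are illustrative assumptions, not taken from any real dataset or database.

```python
import json

# Hypothetical engineering-part record; every field name here is illustrative.
part = {
    "part_id": "P-1001",
    "name": "gear assembly",
    "components": [
        {
            "name": "gear",
            "material": "steel",
            "dimensions": {"teeth": 24, "diameter_mm": 60},
        },
        {
            "name": "shaft",
            "material": "aluminium",
            "dimensions": {"length_mm": 120, "diameter_mm": 12},
        },
    ],
}

# Serialize to JSON text (what a document store would typically persist)
# and parse it back into Python objects.
serialized = json.dumps(part, indent=2)
restored = json.loads(serialized)

# Nested values are reached by walking the hierarchy, with no joins involved.
shaft_length = restored["components"][1]["dimensions"]["length_mm"]
print(shaft_length)
```

In a row-column model, the same record would usually be spread over separate tables for parts, components, and dimensions.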

In a world of dynamic schemas, where changes pour in every hour, it is not possible to adhere to the “Get it Right First” strategy that worked with the static schemas of the past.

Web-centric businesses like Amazon, eBay, etc., needed a NoSQL database that could keep up with a changing data model and give them greater flexibility in operations.
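The difference between a static and a dynamic schema can be sketched roughly as follows. The example uses Python's built-in sqlite3 module for the relational side and a plain list of dictionaries as a stand-in for a document collection; the `products` table and its fields are hypothetical.

```python
import sqlite3

# Relational side: the schema is fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO products VALUES (1, 'lamp')")

# A row with a new attribute is rejected until the schema is migrated.
try:
    conn.execute("INSERT INTO products (id, name, color) VALUES (2, 'chair', 'red')")
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # the table has no column named color

conn.execute("ALTER TABLE products ADD COLUMN color TEXT")
conn.execute("INSERT INTO products (id, name, color) VALUES (2, 'chair', 'red')")
table_rows = conn.execute(
    "SELECT id, name, color FROM products ORDER BY id"
).fetchall()

# Document side (a list of dicts standing in for a collection):
# each document carries its own fields, so no migration step is needed.
collection = [{"id": 1, "name": "lamp"}]
collection.append({"id": 2, "name": "chair", "color": "red"})
colors = [doc.get("color") for doc in collection]
```

The flexibility has a cost: with no enforced schema, the application has to cope with documents that lack a field, as the `doc.get("color")` call does here.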

2. Speed — Normalization vs. Storage Cost

RDBMS requires a higher degree of normalization, i.e. data needs to be broken down into several small logical tables to avoid redundancy and duplication. Normalization helps manage data efficiently, but queries that span several related tables require joins, and this complexity hampers the performance of data processing in relational databases using SQL.

On the other hand, in NoSQL databases such as Couchbase, Cassandra, and MongoDB, data is stored in flat collections. Data may be duplicated across records, and a single piece of data is rarely split up; instead it is stored whole, as one entity. This costs extra storage, but it makes reading or writing a single entity easier and faster.

NoSQL databases are also often chosen for real-time workloads, where low-latency reads and writes at high volume matter more than complex queries across many tables.
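The trade-off described above can be illustrated with a small sketch. The normalized side uses Python's built-in sqlite3 module; the denormalized side uses a plain dictionary as a stand-in for a document store. The `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

# Normalized layout: customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item TEXT
    );
    INSERT INTO customers VALUES (1, 'Asha');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'mouse');
    """
)

# Reassembling one customer's data requires a join across both tables.
rows = conn.execute(
    """
    SELECT c.name, o.item
    FROM customers c JOIN orders o ON o.customer_id = c.id
    WHERE c.id = ?
    ORDER BY o.id
    """,
    (1,),
).fetchall()
normalized_view = {"name": rows[0][0], "orders": [item for _, item in rows]}

# Denormalized layout: the whole entity is stored together under one key,
# so a read is a single lookup.
documents = {1: {"name": "Asha", "orders": ["keyboard", "mouse"]}}
document_view = documents[1]
```

Both views contain the same information; the document version trades possible duplication (for example, repeating product details inside every order) for a single-lookup read.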

3. Horizontal Scalability vs. Vertical Scalability

The most beneficial aspect of NoSQL databases like HBase for Hadoop, MongoDB (originally developed by 10gen), and Couchbase is how easily they scale to handle huge volumes of data.

For instance, if you operate an eCommerce website similar to Amazon and you happen to be an overnight success — you will have tons of customers visiting your website.

Under such circumstances, if you are using a relational (SQL) database, you will have to meticulously replicate and repartition the database to meet the increasing customer demand.

“Most people who choose NoSQL as their primary data storage are trying to solve two main problems: scalability and simplifying the development process,” said Danil Zburivsky, solutions architect at Pythian.

Image Credit: couchbase.com

How NoSQL and SQL databases scale to meet business requirements determines where the application's performance bottleneck appears.

Generally, as demand increases, relational databases tend to scale up vertically, which means adding extra horsepower to a single system to enable faster operations on the same dataset. By contrast, NoSQL databases like HBase, Couchbase, and MongoDB scale horizontally by adding extra nodes (commodity database servers) to the resource pool, so that the load can be distributed easily.
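A rough sketch of how horizontal scaling spreads load is shown below. It assigns keys to nodes by hashing, which is a simplification chosen for illustration: the node names and the `node_for` and `distribute` helpers are hypothetical, and production systems commonly use consistent hashing or range partitioning so that adding a node moves less data.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick a node for a key by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"customer-{i}" for i in range(1000)]

# Start with three commodity nodes, then add a fourth to absorb more load.
three_nodes = distribute(keys, ["node-a", "node-b", "node-c"])
four_nodes = distribute(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, placement in (("3 nodes", three_nodes), ("4 nodes", four_nodes)):
    print(name, {node: len(stored) for node, stored in placement.items()})
```

Each added node takes a share of the keys, so per-node load drops without upgrading any single machine.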

4. NoSQL vs SQL / CAP vs. ACID

Relational databases using SQL have been legends in the database landscape for maintaining integrity through the ACID properties of transactions (Atomicity, Consistency, Isolation, and Durability), and most storage vendors rely on these properties.

The main aim of these properties is to support isolated, indivisible transactions whose changes are permanent and leave the data in a consistent state.
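Atomicity and consistency can be demonstrated with a small sketch using Python's built-in sqlite3 module. The `accounts` table, the account names, and the `transfer` helper are hypothetical; the point is only that a failed transaction leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint encodes a consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


first = transfer(conn, "alice", "bob", 30)    # succeeds
after_first = balances(conn)
second = transfer(conn, "alice", "bob", 500)  # would overdraw alice
after_second = balances(conn)
```

In the second call the credit to bob is applied first, then the debit violates the constraint, and the rollback undoes the credit as well, so the balances are unchanged.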

NoSQL databases work on the concept of the CAP theorem (Consistency, Availability, Partition Tolerance): at any given time you can choose only 2 of the 3 priorities, as it is highly difficult to attain all three in a distributed system whose nodes keep changing.

NoSQL databases can be described as BASE, the opposite of ACID, meaning:

BA = Basically Available: availability is guaranteed, so the system responds to requests even when some nodes have failed.

S = Soft State: the state of the system can change at any time, even without a query being executed, because updates keep propagating between nodes.

E = Eventually Consistent: NoSQL database systems become consistent in the long run.
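Eventual consistency can be sketched with a toy simulation. The `Replica` class and the `sync` function below are hypothetical, and the version-number rule (the newest version wins) is an assumption made for illustration; real databases use a variety of replication and conflict-resolution schemes.

```python
class Replica:
    """A toy replica that keeps the highest-versioned value for each key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(replicas):
    """One synchronization round: every replica receives the newest versions."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)
sync([a, b])

# A new write reaches replica a only; replica b has not seen it yet.
a.write("cart", ["book", "pen"], version=2)
stale_read = b.read("cart")

# After the next synchronization round both replicas agree again.
sync([a, b])
fresh_read = b.read("cart")
```

Between the write and the second `sync`, a reader of replica b sees the older cart. That window is the soft state, and the later agreement is the eventual consistency.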

Image Credit: smist08.wordpress.com/

Why should you choose a NoSQL Database like HBase, Couchbase or Cassandra over RDBMS?

1) Applications and databases need to work with Big Data

2) Big Data needs a flexible data model with a better database architecture

3) To process Big Data, these databases need continuous application availability with modern transaction support
