The first part of this article put the “Not only Structured Query Language” (NoSQL) movement in context. We saw that so-called “relational” databases dominated the database management system (DBMS) landscape for years. Then new web (and M2M) needs gradually emerged, creating demand for alternative approaches.
While the NoSQL players are united in their drive to rethink data management and get off the beaten path of SQL, each new product has blazed its own trail. In fact, the major trend in NoSQL DBMS is specialization: the goal is to offer a solution tailored to a specific set of data needs.
The NoSQL extended family
A few trends emerge when we look at all the different solutions that fall under the NoSQL label. Below is a non-exhaustive list. Any given NoSQL database may fit into one, several, or none of the following categories:
- "in-memory" operation (ex: Redis): this kind of DBMS is designed to keep its data in RAM, which offers very high performance in exchange for some volatility (data may be lost if the system crashes)
- a “document-oriented”/“no schema” data structure (ex: MongoDB): data is stored as documents, which are in turn organized into collections. Documents in the same collection do not have to follow a fixed schema, so you can store a heterogeneous group of items together. This makes schema management very flexible and also speeds up testing and modeling. That said, “no schema” does not mean “no planning”: you should still design your data model carefully, for instance by denormalizing data to optimize database performance
- "big table"/"column-oriented" structure (ex: Cassandra): data is stored in multidimensional tables where the number of columns can vary and evolve over time. This approach lends itself to high-performance distributed storage
- "graph-oriented" approach (ex: Neo4j): these DBMS specialize in managing highly interconnected data and are especially useful, for example, for representing connections between user accounts in a social network
- sharding and/or replication support: the DBMS makes it possible to distribute a database over several machines (“nodes”) to increase performance and/or availability. This distribution can be managed manually or automated to varying degrees, and it offers different levels of resilience depending on how data is distributed and how nodes are synchronized (replication factor, synchronous or asynchronous replication, multi-master operation, data conflict resolution, etc.)
- MapReduce (or similar model) support: the DBMS can run a computation in parallel on each of the nodes where the data is distributed, then combine the partial results. This makes advanced statistical analysis possible over potentially enormous data volumes, so this kind of solution is especially useful for Big Data
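To make the schema-less, sharding, and MapReduce ideas above a little more concrete, here is a toy, single-process Python sketch. It is not the API of MongoDB, Cassandra, or any other product: the "nodes" are plain in-memory lists, threads stand in for machines, and the function names (`node_for`, `shard`, `map_reduce`) and the `_id`/`type` fields are hypothetical choices made for illustration. Replication, failures, and network transfer are deliberately left out.

```python
"""Toy sketch: hash-based sharding of schema-less documents plus a
MapReduce-style count. Hypothetical names; not any real DBMS API."""
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 3  # hypothetical cluster size


def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Pick a node index by hashing the document key (simple hash sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def shard(documents, num_nodes: int = NUM_NODES):
    """Distribute documents (plain dicts) across nodes using their "_id"."""
    nodes = [[] for _ in range(num_nodes)]
    for doc in documents:
        nodes[node_for(doc["_id"], num_nodes)].append(doc)
    return nodes


def map_phase(node_docs):
    """Runs 'on' one node: emit (type, 1) for each document that has a type."""
    return [(doc["type"], 1) for doc in node_docs if "type" in doc]


def reduce_phase(pairs):
    """Sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def map_reduce(nodes):
    # Map step: one parallel task per node, each working on local data only.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_node = list(pool.map(map_phase, nodes))
    # Gather the intermediate (key, value) pairs, then reduce them.
    all_pairs = [pair for pairs in per_node for pair in pairs]
    return reduce_phase(all_pairs)


if __name__ == "__main__":
    # Documents in the same "collection" need not share a schema.
    docs = [
        {"_id": "s1", "type": "thermometer", "celsius": 21.5},
        {"_id": "s2", "type": "gps", "lat": 48.85, "lon": 2.35},
        {"_id": "s3", "type": "thermometer", "celsius": 19.0, "room": "lab"},
        {"_id": "s4", "note": "unregistered device"},  # no "type" field at all
    ]
    nodes = shard(docs)
    print(sorted(map_reduce(nodes).items()))  # [('gps', 1), ('thermometer', 2)]
```

Hashing the key spreads documents fairly evenly, but with a plain modulo, adding a node remaps most keys. This is one reason real systems often rely on consistent hashing or range-based partitioning instead.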
It’s clear: NoSQL databases offer a broad variety of possible approaches to data management. In some cases they even provide substantial gains in performance or development time.
However, NoSQL database usage has its price:
- training: each tool has its own principles, terminology, programming interface, administration methods, and maintenance procedures. Does your team have enough time to explore all of these techniques and find the shoe that fits? Will you be able to find trained staff to install, use, and maintain these technologies?
- security: most of these technologies have so far given it only superficial treatment, preferring to leave it to the front-end side. Relational solutions have a reputation for being more mature in this area, but they can surprise you too, and not in a good way.
- the (re)introduction of strong coupling between the choice of DBMS and the design of client applications (once you pick a NoSQL DBMS, it is hard to switch to another one).
Choosing a NoSQL DBMS is no easy decision. It should only be made after serious reflection on the structure and use of your data.
That’s the great success of this movement (a “Big Data” precursor): putting our focus back on data.
The approach also works very well in fields like M2M, where being able to store, process, and correlate large volumes of raw and diverse data is the main source of added value.
photo: © JohnKwan - Fotolia.com
This post was originally published in French here.
A software architect at It&L@bs, part of Orange Business Services, I have spent the past two years working on cross-technology projects (J2EE, .Net, web, embedded) in M2M and industrial computing. Passionate about computing, I like to keep a close eye on the latest technology trends in the web and information systems engineering.