Main Big Data Technologies: NoSQL

When considering the technologies needed to tackle Big Data, it is only natural to look at the database management system first. Many of the most widely used databases are already optimized to store and handle large data volumes, and for years systems based on the relational model have been used successfully in both industry and research. However, the threshold implied by the term “Big Data” entails, above all, a paradigm shift in the information management model.

While databases based on the relational model guarantee properties that at first sight might seem important or even necessary (the well-known ACID properties: atomicity, consistency, isolation, and durability), certain data volumes cannot be handled today without relaxing some of them. It is precisely out of this relaxation, and out of the need to provide other properties, that a new data management paradigm has arisen: NoSQL.

The properties to be provided by these new systems, in particular in an Internet environment, are:

  • High availability
  • Fault tolerance
  • Large storage capacity
  • High input/output capacity

To provide all this, these systems usually do not guarantee ACID and adopt a distributed architecture. In addition, the data access interface is less expressive than SQL (hence the name NoSQL), given that the data schema is much simpler.

Even though the most modern systems allow these guarantees to be configured, most of them must be given up when optimum performance is required. For example, strong consistency is usually not guaranteed, because in a distributed architecture every change must be propagated to the various machines. Fulfilling the ACID principles would impose a performance burden that would make the system poorly suited to the most common Big Data scenarios.
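The trade-off between fast writes and strong consistency can be illustrated with a small sketch. It assumes a toy in-memory store with one primary copy and lazily updated replicas; the names `ReplicatedStore`, `write`, `read`, and `sync` are invented for this illustration and do not correspond to any real database API.

```python
class ReplicatedStore:
    """Toy key-value store: one primary copy, replicas updated lazily.

    Hypothetical model for illustration only, not a real database client.
    """

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # writes not yet propagated to the replicas

    def write(self, key, value):
        # Acknowledge as soon as the primary has the value:
        # fast, but the replicas are now temporarily out of date.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, replica=0):
        # Reads are served by a replica, which may lag behind the primary.
        return self.replicas[replica].get(key)

    def sync(self):
        # Propagate every pending write to every replica.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("user:1", "Alice")
stale = store.read("user:1")  # the replica has not seen the write yet
store.sync()
fresh = store.read("user:1")  # the replicas have converged
print(stale, fresh)           # prints: None Alice
```

A strongly consistent variant would have to call `sync()` inside `write()` before acknowledging, so every write would pay the cost of reaching all machines. That is the performance burden described above.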

Today there are various distributed database systems, each emphasizing different aspects. In fact, a result from distributed systems theory, the CAP theorem, groups them clearly. According to this theorem, a distributed system cannot fully provide the following three attributes at the same time: strong consistency, high availability, and partition tolerance. Since a distributed database can satisfactorily provide two of these attributes, systems can be grouped by the pair they offer. The following grouping shows some examples:

[Figure: CAP theorem grouping of database systems]

As can be seen, the group of systems that meet the availability and consistency requirements includes those derived from the relational model. On the right-hand side are those inspired by Amazon Dynamo, and in the lower part the descendants of Google Bigtable. This classification can be very useful when facing Big Data problems. In our next posts we will discuss some of these systems in depth and compare their features in practice.
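To make the CAP trade-off concrete, here is a minimal sketch of how a two-node cluster might behave when the network between its nodes is cut. It is a deliberately simplified model: the `Cluster` and `Node` classes and the `"CP"`/`"AP"` mode labels are hypothetical names chosen for this example, not the behaviour of any particular product.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two-node toy cluster that favours consistency (CP) or availability (AP)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Node("a")
        self.b = Node("b")
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node, key, value):
        peer = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than diverge.
                raise RuntimeError("unavailable: cannot reach peer")
            # Availability first: accept locally, the copies now differ.
            node.data[key] = value
            return
        # Healthy network: both copies are updated together.
        node.data[key] = value
        peer.data[key] = value


cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "k", 1)
    cp_rejected = False
except RuntimeError:
    cp_rejected = True

ap = Cluster("AP")
ap.partitioned = True
ap.write(ap.a, "k", 1)
diverged = ap.a.data != ap.b.data

print(cp_rejected, diverged)  # prints: True True
```

Under a partition, the CP-style cluster gives up availability by rejecting the write, while the AP-style cluster stays available at the price of temporarily inconsistent copies.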
