NoSQL Tutorial: Types of NoSQL Databases and Examples

⚡ Smart Summary

NoSQL is a non-relational database management system that does not require a fixed schema, avoids joins, and scales easily. This resource explains what NoSQL is, why it exists, its history, features, the four database types, the CAP theorem, eventual consistency, and its advantages and disadvantages.

  • 📦 Definition: A non-relational, schema-free store built for huge, distributed datasets.
  • 📈 Why: Scaling out across many hosts handles big data faster than scaling up.
  • 🗂️ Four Types: Key-value, column-oriented, graph-based, and document-oriented.
  • CAP Theorem: A distributed store can guarantee at most two of consistency, availability, and partition tolerance.
  • 🔁 BASE: Basically Available, Soft state, Eventual consistency across replicas.

NoSQL Tutorial

What is NoSQL?

A NoSQL database is a non-relational data management system that does not require a fixed schema. It avoids joins and is easy to scale. NoSQL databases are mainly used for distributed data stores with very large storage needs, such as big data and real-time web applications. For example, companies like Twitter, Facebook, and Google collect terabytes of user data every day.

NoSQL stands for “Not Only SQL” or “Not SQL”. Though a better term would be “NoREL”, NoSQL caught on. Carlo Strozzi introduced the NoSQL concept in 1998.

A traditional RDBMS uses SQL syntax to store and retrieve data for further insights. A NoSQL database system, by contrast, encompasses a wide range of database technologies that can store structured, semi-structured, unstructured, and polymorphic data. The following diagram gives an overview of NoSQL:

[Figure: NoSQL database]

Why NoSQL?

The concept of NoSQL databases became popular with internet giants such as Google, Facebook, and Amazon, which deal with huge volumes of data. System response time becomes slow when you use an RDBMS for massive volumes of data.

To resolve this problem, we could “scale up” our systems by upgrading the existing hardware. This process is expensive.

The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as “scaling out”.
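As a rough illustration of scaling out, the sketch below spreads keys across several hosts by hashing each key. The host names and the `host_for` and `distribute` helpers are hypothetical and chosen only for this example; they do not correspond to any particular database's API.

```python
import hashlib

# Hypothetical host names; a real deployment would use its own node list.
HOSTS = ["db-host-0", "db-host-1", "db-host-2"]

def host_for(key: str, hosts: list[str]) -> str:
    """Pick a host by hashing the key, so load spreads across all hosts."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return hosts[int.from_bytes(digest[:8], "big") % len(hosts)]

def distribute(keys: list[str], hosts: list[str]) -> dict[str, list[str]]:
    """Group keys by the host that would store them."""
    placement: dict[str, list[str]] = {h: [] for h in hosts}
    for key in keys:
        placement[host_for(key, hosts)].append(key)
    return placement

placement = distribute([f"user:{i}" for i in range(1000)], HOSTS)
```

Plain modulo hashing like this remaps most keys whenever a host is added or removed, so distributed stores commonly use consistent hashing or a similar scheme instead.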

NoSQL databases are non-relational and designed with web applications in mind, so they scale out better than relational databases.

Brief History of NoSQL Databases

  • 1998 – Carlo Strozzi uses the term NoSQL for his lightweight, open-source relational database.
  • 2000 – Graph database Neo4j is launched.
  • 2004 – Google BigTable is launched.
  • 2005 – CouchDB is launched.
  • 2007 – The research paper on Amazon Dynamo is released.
  • 2008 – Facebook open sources the Cassandra project.
  • 2009 – The term NoSQL was reintroduced.

Features of NoSQL

Non-relational

  • NoSQL databases never follow the relational model.
  • Never provide tables with flat fixed-column records.
  • Work with self-contained aggregates or BLOBs.
  • Do not require object-relational mapping and data normalization.
  • No complex features like query languages, query planners, referential integrity, joins, or ACID.

Schema-free

  • NoSQL databases are either schema-free or have relaxed schemas.
  • Do not require any sort of definition of the schema of the data.
  • Offer heterogeneous structures of data in the same domain.

[Figure: NoSQL is schema-free]
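To make the schema-free idea concrete, here is a minimal sketch in plain Python in which records with different shapes live side by side in one logical collection. The `users` list and the `fields_in_use` helper are illustrative assumptions, not part of any NoSQL product.

```python
import json

# One logical "users" collection holding records with different shapes.
users = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben", "email": "ben@example.com"},
    {"id": 3, "name": "Chen", "tags": ["admin"], "address": {"city": "Tallinn"}},
]

def fields_in_use(records):
    """Return every top-level field name that appears in at least one record."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)

# Every record serializes to JSON without any prior schema definition.
serialized = [json.dumps(record) for record in users]
```

No record had to declare its fields beforehand, and adding a new field to one record does not affect the others.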

Simple API

  • Offers easy-to-use interfaces for storing and querying data.
  • APIs allow low-level data manipulation and selection methods.
  • Mostly use text-based protocols, typically HTTP REST with JSON.
  • Mostly no standards-based NoSQL query language.
  • Web-enabled databases running as internet-facing services.

Distributed

  • Multiple NoSQL databases can be executed in a distributed fashion.
  • Offers auto-scaling and fail-over capabilities.
  • Often the ACID concept can be sacrificed for scalability and throughput.
  • Mostly no synchronous replication between distributed nodes; instead, asynchronous multi-master replication, peer-to-peer, or HDFS replication.
  • Provide only eventual consistency.
  • Shared-nothing architecture. This enables less coordination and higher distribution.

[Figure: NoSQL is shared-nothing]

Types of NoSQL Databases

NoSQL databases are mainly categorized into four types: key-value pair, column-oriented, graph-based, and document-oriented. Every category has its unique attributes and limitations. None of these types is best at solving every problem, so users should select a database based on their product needs.

Types of NoSQL databases:

  • Key-value pair based
  • Column-oriented
  • Graph-based
  • Document-oriented

Key-Value Pair Based

Data is stored in key/value pairs, a design that handles large amounts of data and heavy load. Key-value stores keep data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like “Website” associated with a value like “Guru99”.

It is one of the most basic NoSQL database examples. This kind of NoSQL database is used for collections, dictionaries, associative arrays, and similar structures. Key-value stores help developers store schema-less data. They work best for shopping cart contents.
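The key-value model described above can be sketched as a thin wrapper around a Python dictionary. The `KeyValueStore` class and the `cart:42` key are hypothetical names used only for illustration; real key-value databases add persistence, networking, and replication on top of this idea.

```python
import json

class KeyValueStore:
    """A toy in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("Website", "Guru99")
# The store treats the value as opaque; here it happens to be a JSON string.
store.put("cart:42", json.dumps({"items": [{"sku": "A1", "qty": 2}]}))
cart = json.loads(store.get("cart:42"))
```

Because the store never inspects the value, the only way to retrieve the cart is by its exact key, which is the main trade-off of this model.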

Redis, Dynamo, and Riak are examples of key-value store databases. Riak is based on the design described in Amazon's Dynamo paper.

Column-based

Column-oriented databases work on columns and are based on the BigTable paper by Google. Every column is treated separately, and the values of a single column are stored contiguously.

They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN, etc., as the data is readily available in a column. Column-based NoSQL databases are widely used to manage data warehouses, business intelligence, CRM, and library card catalogs.
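The following sketch shows, under simplified assumptions, why aggregates suit a column layout: once the rows are pivoted into one list per column, SUM, COUNT, AVG, and MIN read a single list. The sample orders and the `to_columns` helper are made up for this example.

```python
# The same three records in a row-oriented and a column-oriented layout.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 200},
]

def to_columns(records):
    """Pivot row records into one contiguous list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# Aggregates read a single column and never touch the others.
total = sum(columns["amount"])
count = len(columns["amount"])
average = total / count
smallest = min(columns["amount"])
```

In the row layout, the same aggregate would have to visit every record and skip over the fields it does not need.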

HBase, Cassandra, and Hypertable are examples of column-based NoSQL databases.

Document-Oriented

A document-oriented NoSQL database stores and retrieves data as key-value pairs, but the value part is stored as a document in JSON or XML format. The database understands the value, so it can be queried.
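Because the database understands the value, it can filter on fields inside it. The sketch below imitates that with plain Python dictionaries; the sample documents and the `get_path` and `find` helpers are hypothetical and are not the query API of any document database.

```python
# A toy document collection keyed by document ID; the values are JSON-like dicts.
documents = {
    "post:1": {"type": "post", "title": "Intro to NoSQL",
               "author": {"name": "Ana"}, "tags": ["nosql"]},
    "post:2": {"type": "post", "title": "CAP in practice",
               "author": {"name": "Ben"}},
    "user:1": {"type": "user", "name": "Ana"},
}

def get_path(document, path):
    """Follow a dotted path such as 'author.name'; return None if it is missing."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, path, expected):
    """Return the IDs of documents whose field at `path` equals `expected`."""
    return sorted(doc_id for doc_id, doc in collection.items()
                  if get_path(doc, path) == expected)
```

For instance, `find(documents, "author.name", "Ana")` selects documents by a nested field, something a pure key-value store cannot do because it treats the value as opaque.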

[Figure: Relational vs. Document]

On the left of this diagram are rows and columns; on the right is a document database, whose structure resembles JSON. With a relational database, you have to know in advance which columns you have. With a document database, each record is stored like a JSON object whose structure you do not need to define up front, which makes it flexible.

The document type is mostly used for content management systems, blogging platforms, real-time analytics, and e-commerce applications. It should not be used for complex transactions that require multiple operations or for queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular document-oriented DBMSs.

Graph-Based

A graph database stores entities as well as the relations among those entities. Each entity is stored as a node and each relationship as an edge between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is multi-relational in nature. Traversing relationships is fast because they are already captured in the database and do not need to be calculated. Graph-based databases are mostly used for social networks, logistics, and spatial data.
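As a small sketch of why traversal is cheap when relationships are stored directly, the example below indexes edges by source node and walks them breadth-first. The node names, the `FOLLOWS` relationship, and both helper functions are illustrative assumptions rather than any graph database's API.

```python
from collections import deque

# Edges are stored with the data: (source node, relationship, target node).
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "FOLLOWS", "erin"),
]

def build_adjacency(edge_list):
    """Index outgoing edges by source node so traversal is a direct lookup."""
    adjacency = {}
    for source, _relation, target in edge_list:
        adjacency.setdefault(source, []).append(target)
    return adjacency

def reachable_within(adjacency, start, max_hops):
    """Breadth-first traversal returning nodes reachable in at most max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen - {start}

adjacency = build_adjacency(edges)
```

Each hop is a dictionary lookup on the current node, whereas a relational schema would typically need one join per hop to answer the same question.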

Neo4j, InfiniteGraph, OrientDB, and FlockDB are some popular graph-based databases.

Query Mechanism Tools for NoSQL

The most common data retrieval mechanism is the REST-based retrieval of a value based on its key/ID with a GET resource.

Document store databases offer more complex queries, as they understand the value in a key-value pair. For example, CouchDB allows defining views with MapReduce.
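The MapReduce idea behind such views can be sketched in a few lines. This is plain Python written for illustration, not CouchDB's own API (CouchDB views are typically written as JavaScript functions); the sample documents and the `map_order` and `run_view` names are assumptions.

```python
from collections import defaultdict

documents = [
    {"type": "order", "customer": "ana", "total": 30},
    {"type": "order", "customer": "ben", "total": 15},
    {"type": "order", "customer": "ana", "total": 20},
    {"type": "customer", "name": "ana"},
]

def map_order(doc):
    """Map step: emit (key, value) pairs for the documents of interest."""
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]

def run_view(docs, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group to one result."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}

totals = run_view(documents, map_order, sum)
```

The map step decides which documents contribute and under which key, and the reduce step collapses each key's values, here into a per-customer order total.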

What is the CAP Theorem?

CAP theorem is also called Brewer’s theorem. It states that it is impossible for a distributed data store to offer more than two out of three guarantees:

  1. Consistency
  2. Availability
  3. Partition tolerance

Consistency: The data should remain consistent even after an operation is executed. This means that once data is written, all future read requests should contain that data. For example, after an order status is updated, all clients should see the same data.

Availability: The database should always be available and responsive. It should not have any downtime.

Partition Tolerance: The system should continue to function even if communication among the servers is not stable. For example, the servers can be partitioned into multiple groups that may not communicate with each other. If part of the database is unavailable, the other parts remain unaffected.
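The trade-off can be illustrated with a deliberately simplified two-replica model: during a partition, a store must either refuse writes to stay consistent or accept them and let the replicas diverge. The `Replica` and `TwoNodeStore` classes and the "CP"/"AP" mode labels are assumptions made for this sketch, not a real database's behavior.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class TwoNodeStore:
    """Two replicas that choose consistency ("CP") or availability ("AP")."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, key, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Refuse the write: stay consistent, give up availability.
                raise RuntimeError("unavailable during partition")
            # AP: accept locally; the replicas may now disagree.
            replica.data[key] = value
            return
        replica.data[key] = value
        other.data[key] = value

cp = TwoNodeStore("CP")
ap = TwoNodeStore("AP")
for store in (cp, ap):
    store.write(store.a, "status", "pending")
    store.partitioned = True  # simulate a network partition
```

After the partition, a write to `cp` raises an error while both replicas keep the same data, whereas a write to `ap` succeeds on one replica and leaves the other with the old value.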

Eventual Consistency

The term “eventual consistency” refers to keeping copies of data on multiple machines to achieve high availability and scalability. Changes made to a data item on one machine therefore have to be propagated to the other replicas.

Data replication may not be instantaneous: some copies are updated immediately, while others are updated in due course. The copies may be temporarily inconsistent with each other, but over time they become consistent. Hence the name eventual consistency.
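A minimal sketch of this behavior, assuming one primary copy and replicas that apply queued updates later, is shown below. The `EventuallyConsistentStore` class and its `sync` method are hypothetical and stand in for whatever asynchronous replication a real system uses.

```python
class EventuallyConsistentStore:
    """One primary plus replicas that receive updates asynchronously."""

    def __init__(self, replica_count):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        # One pending-update queue per replica.
        self.pending = [[] for _ in range(replica_count)]

    def write(self, key, value):
        # The write is acknowledged once the primary has it.
        self.primary[key] = value
        for queue in self.pending:
            queue.append((key, value))

    def sync(self, index):
        """Deliver all queued updates to one replica, in write order."""
        for key, value in self.pending[index]:
            self.replicas[index][key] = value
        self.pending[index].clear()

    def is_consistent(self):
        return all(replica == self.primary for replica in self.replicas)

store = EventuallyConsistentStore(replica_count=2)
store.write("cart:42", ["A1"])
store.sync(0)                                   # replica 0 catches up first
stale_read = store.replicas[1].get("cart:42")   # replica 1 has not yet
consistent_before = store.is_consistent()
store.sync(1)
consistent_after = store.is_consistent()
```

Between the two `sync` calls, a read from replica 1 misses the new value; once every queue has drained, all copies agree again.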

BASE: Basically Available, Soft state, Eventual consistency

  • Basically available means the DB is available all the time as per the CAP theorem.
  • Soft state means even without an input, the system state may change.
  • Eventual consistency means that the system will become consistent over time.

Advantages of NoSQL

  • Can be used as a primary or analytic data source.
  • Big data capability.
  • No single point of failure.
  • Easy replication.
  • No need for a separate caching layer.
  • Provides fast performance and horizontal scalability.
  • Can handle structured, semi-structured, and unstructured data with equal effect.
  • Object-oriented programming which is easy to use and flexible.
  • NoSQL databases do not need a dedicated high-performance server.
  • Support key developer languages and platforms.
  • Simpler to implement than using RDBMS.
  • Can serve as the primary data source for online applications.
  • Handles big data which manages data velocity, variety, volume, and complexity.
  • Excels at distributed database and multi-data center operations.
  • Eliminates the need for a specific caching layer to store data.
  • Offers a flexible schema design which can easily be altered without downtime or service disruption.

Disadvantages of NoSQL

  • No standardization rules.
  • Limited query capabilities.
  • RDBMS databases and tools are comparatively mature.
  • Does not offer traditional database capabilities, such as consistency when multiple transactions are performed simultaneously.
  • As the volume of data increases, it becomes difficult to keep keys unique.
  • Does not work as well with relational data.
  • The learning curve is steep for new developers.
  • Open source options are not so popular for enterprises.

FAQ

NoSQL databases handle the large, varied, and fast-changing data that AI and big data pipelines produce. Their flexible schema and horizontal scaling suit storing training data, logs, and real-time features across distributed clusters.

Yes. Several NoSQL databases, such as MongoDB and Elasticsearch, now support vector fields and similarity search. This lets AI applications store embeddings next to documents for semantic search and recommendation features.

SQL databases are relational with a fixed schema and use tables, rows, and joins. NoSQL databases are non-relational, use flexible schemas, and scale horizontally, storing data as documents, key-values, columns, or graphs.

Avoid NoSQL when you need strong ACID transactions, complex joins, or strict data integrity, such as in banking. Mature relational databases handle multi-record consistency and standardized queries better in those cases.
