MongoDB is a document-oriented NoSQL database used for high-volume data storage. In this tutorial you will learn how MongoDB can be accessed and some of its important features, such as indexing, regular expressions, and sharding.
MongoDB is a database that came to light around the mid-2000s. It falls under the category of NoSQL databases.
Here are the key features of MongoDB:
- MongoDB is a document database. Each database contains collections, which in turn contain documents. Documents in the same collection can differ in the number of fields they have, and in their size and content.
- The document structure is more in line with how developers construct their classes and objects in their respective programming languages. Developers will often say that their classes are not rows and columns but have a clear structure with key-value pairs.
- As with NoSQL databases in general, the rows (or documents, as they are called in MongoDB) don't need to have a schema defined beforehand. Instead, the fields can be created on the fly.
- The data model available within MongoDB allows you to represent hierarchical relationships, store arrays, and model other more complex structures more easily.
- Scalability – MongoDB environments are very scalable. Companies across the world have deployed clusters, some of them running 100+ nodes with millions of documents in the database.
The example below shows how a document can be modeled in MongoDB.
- The _id field is added by MongoDB to uniquely identify the document in the collection.
- Note that the order data (OrderID, Product, and Quantity), which in an RDBMS would normally be stored in a separate table, is stored in MongoDB as an embedded document in the collection itself. This is one of the key differences in how data is modeled in MongoDB.
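As a rough illustration of the two points above, a document with embedded order data could be written as a plain Python dictionary. Only _id, OrderID, Product, and Quantity come from the text; the OrderData and CustomerName keys and all values are assumptions made for this sketch.

```python
import json

# Hypothetical customer document. Key names other than _id, OrderID,
# Product and Quantity, and every value, are invented for illustration.
customer = {
    "_id": "563479cc8a8a4246bd27d784",  # placeholder for the generated identifier
    "CustomerName": "Example Customer",
    "OrderData": {  # embedded document instead of a separate order table
        "OrderID": 1,
        "Product": "Laptop",
        "Quantity": 2,
    },
}

# Documents are usually displayed in a JSON-like form.
print(json.dumps(customer, indent=2))

# An embedded value is reached by walking the keys.
print(customer["OrderData"]["Product"])
```

Because the order data travels with the customer document, reading the customer also returns the order without a second lookup.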
What Is Meant By NoSQL
NoSQL databases are not relational databases. They provide more flexibility because records are not restricted to the same column names and types defined across the entire table. The example below gives a better idea of what NoSQL is.
The following two tables are a simple example of a Customer table and an Order table, where the Customer table is linked to the Order table via a relationship.
- Customer Table
- Order Table
In NoSQL, the tables might look like the ones shown below.
- Customer Table
- Order Table
- The first thing you will notice is that there are no columns with predefined column names; instead, each field is a key-value pair.
- In the Customer table, the first 3 keys are the same for all 3 rows, but the fourth key differs: it is City in one of the first 2 rows, Status in the other, and absent from the third row.
- Likewise, in the Order table, the 2nd and 3rd rows have no value defined for the 4th column (shipment date).
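The observations above can be sketched with a list of Python dictionaries standing in for the NoSQL Customer table. The key names beyond CustomerID, City, and Status, and all values, are assumptions chosen for illustration only.

```python
# Hypothetical rows: three shared keys, plus an optional fourth key
# that differs per row. All values are invented for this sketch.
customers = [
    {"CustomerID": 11, "CustomerName": "Customer A", "OrderID": 111, "City": "City X"},
    {"CustomerID": 22, "CustomerName": "Customer B", "OrderID": 222, "Status": "Privileged"},
    {"CustomerID": 33, "CustomerName": "Customer C", "OrderID": 333},
]

# Keys that every row has in common.
shared_keys = set.intersection(*(set(row) for row in customers))

for row in customers:
    extra_keys = sorted(set(row) - shared_keys)
    print(row["CustomerID"], "extra keys:", extra_keys)
```

No row had to declare its keys in advance, and the third row simply omits the fourth key instead of storing an empty value.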
This is what makes NoSQL so flexible. In a fast-changing technology world, business owners demand faster turnaround times for software solutions.
By using flexible databases such as NoSQL databases, we can achieve a faster turnaround, because there are fewer constraints on how data can be defined.
Just imagine the time spent adding or editing columns in existing tables in a relational database compared with the effort required to add the same fields in a NoSQL database.
Common Terms in MongoDB
Below are a few of the common terms used in MongoDB:
- _id – This field is required in every MongoDB document. It holds a value that is unique within the collection and acts as the document's primary key. If you create a new document without an _id field, MongoDB automatically creates the field. For example, for the Customer table above, MongoDB would add a unique identifier, displayed as 24 hexadecimal characters, to each document in the collection.
- Collection – This is a grouping of MongoDB documents. A collection is the equivalent of a table in an RDBMS such as Oracle or MS SQL. A collection exists within a single database. As seen in the introduction, collections don't enforce any sort of structure by default.
- Cursor – This is a pointer to the result set of a query. Clients can iterate through a cursor to retrieve results.
- Database – This is a container for collections, just as a database in an RDBMS is a container for tables. Each database gets its own set of files on the file system. A MongoDB server can store multiple databases.
- Document – A record in a MongoDB collection is called a document. A document in turn consists of field names and values.
- Field – A name-value pair in a document. A document has zero or more fields. Fields are analogous to columns in relational databases.
The following diagram shows an example of fields with key-value pairs. In the example below, CustomerID and 11 form one of the key-value pairs defined in the document.
A quick note on the key difference between the _id field and a normal field: the _id field uniquely identifies a document within its collection and is added automatically by MongoDB when a document is inserted without one.
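The behaviour of the _id field can be mimicked in a few lines of Python. This is only a sketch: insert_document is a hypothetical helper, the collection is a plain list, and the random 24-character value is a stand-in, since real MongoDB identifiers are generated differently.

```python
import secrets


def insert_document(collection, document):
    """Hypothetical helper: store a copy of the document, adding _id if absent."""
    stored = dict(document)
    if "_id" not in stored:
        # 12 random bytes shown as 24 hex characters. This is a stand-in,
        # not how MongoDB actually builds its identifiers.
        stored["_id"] = secrets.token_hex(12)
    # _id must be unique within the collection.
    if any(existing["_id"] == stored["_id"] for existing in collection):
        raise ValueError("duplicate _id")
    collection.append(stored)
    return stored["_id"]


customers = []
new_id = insert_document(customers, {"CustomerID": 11, "CustomerName": "Customer A"})
insert_document(customers, {"_id": "my-own-key", "CustomerID": 22})

print(new_id, len(new_id))
print([doc["_id"] for doc in customers])
```

A caller may supply its own _id, as in the second insert; the generated value is only used when none is given.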
Why Use MongoDB
Below are a few of the reasons why one should start using MongoDB:
- Document-oriented – Since MongoDB is a NoSQL database, it stores data in documents instead of a relational format. This makes MongoDB very flexible and adaptable to real-world business situations and requirements.
- Ad hoc queries - MongoDB supports searching by field, range queries, and regular expression searches. Queries can be made to return specific fields within documents.
- Indexing - Indexes can be created to improve the performance of searches within MongoDB. Any field in a MongoDB document can be indexed.
- Replication - MongoDB can provide high availability with replica sets. A replica set consists of two or more MongoDB instances. Any member may act as the primary or as a secondary at a given time. The primary is the main server that interacts with clients and, by default, handles all read and write operations. The secondaries maintain a copy of the primary's data using built-in replication. When the primary fails, the replica set automatically elects one of the secondaries, which then becomes the new primary.
- Load balancing - MongoDB uses sharding to scale horizontally by splitting data across multiple MongoDB instances. MongoDB can run over multiple servers, balancing the load and/or duplicating data to keep the system up and running in case of hardware failure.
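To make the ad hoc query types in the list above more concrete, the sketch below runs each kind of search over an in-memory list of dictionaries. It does not talk to a MongoDB server; the comments show how the same conditions are commonly written as MongoDB filter documents, and the sample data is invented.

```python
import re

# Hypothetical order documents used only for this illustration.
orders = [
    {"OrderID": 111, "Product": "Laptop", "Quantity": 2},
    {"OrderID": 222, "Product": "Lamp", "Quantity": 10},
    {"OrderID": 333, "Product": "Desk", "Quantity": 5},
]

# Search by field. As a MongoDB filter: {"Product": "Desk"}
by_field = [o for o in orders if o["Product"] == "Desk"]

# Range query. As a MongoDB filter: {"Quantity": {"$gte": 5}}
by_range = [o for o in orders if o["Quantity"] >= 5]

# Regular expression search. As a MongoDB filter: {"Product": {"$regex": "^La"}}
by_regex = [o for o in orders if re.search(r"^La", o["Product"])]

# Returning only specific fields from the matching documents.
products_only = [{"Product": o["Product"]} for o in by_range]

print([o["OrderID"] for o in by_field])
print([o["OrderID"] for o in by_range])
print([o["OrderID"] for o in by_regex])
print(products_only)
```

None of these searches needed to be planned when the data was stored, which is what makes them ad hoc.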
Data Modeling in MongoDB
As we have seen in the introduction, data in MongoDB has a flexible schema. Unlike SQL databases, where a table's schema must be declared before inserting data, MongoDB collections do not enforce document structure by default. This flexibility is what makes MongoDB so powerful.
When modeling data in MongoDB, keep the following things in mind:
- What are the needs of the application – Look at the business needs of the application and determine what data, and what types of data, it requires. Decide the structure of the documents accordingly.
- What are the data retrieval patterns – If you foresee heavy query usage, consider using indexes in your data model to improve query efficiency.
- Are frequent inserts, updates, and removals happening in the database – If so, reconsider the use of indexes, or incorporate sharding if required, in your data modeling design to improve the efficiency of your overall MongoDB environment.
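The trade-off behind the last two points can be sketched with a toy in-memory collection. IndexedCollection is a hypothetical class written for this tutorial, and its dictionary-based index is a simplification rather than the structure MongoDB uses internally, but it shows why an index speeds up reads while adding work to every write.

```python
from collections import defaultdict


class IndexedCollection:
    """Hypothetical in-memory collection with a single-field index."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.documents = []
        self.index = defaultdict(list)  # field value -> positions in self.documents

    def insert(self, document):
        self.documents.append(document)
        # Every write also has to update the index: the extra cost of indexing.
        if self.indexed_field in document:
            position = len(self.documents) - 1
            self.index[document[self.indexed_field]].append(position)

    def find_with_index(self, value):
        # Jumps straight to the matching positions.
        return [self.documents[i] for i in self.index.get(value, [])]

    def find_with_scan(self, field, value):
        # Examines every document in the collection.
        return [d for d in self.documents if d.get(field) == value]


orders = IndexedCollection("Product")
orders.insert({"OrderID": 111, "Product": "Laptop"})
orders.insert({"OrderID": 222, "Product": "Lamp"})
orders.insert({"OrderID": 333, "Product": "Laptop"})

print(orders.find_with_index("Laptop"))
print(orders.find_with_scan("Product", "Laptop"))
```

Both lookups return the same documents; the indexed one avoids the full scan, at the price of the bookkeeping done in insert.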
Difference between MongoDB & RDBMS
Below are some of the key term differences between MongoDB and RDBMS:
In an RDBMS, a table contains the columns and rows used to store data, whereas in MongoDB the same structure is known as a collection. A collection contains documents, which in turn contain fields, which are key-value pairs.
In an RDBMS, a row represents a single, implicitly structured data item in a table. In MongoDB, the corresponding unit of data is a document.
In an RDBMS, a column denotes a set of data values. In MongoDB, these are known as fields.
In an RDBMS, data is often spread across several tables, and a join across tables is needed to show a complete view of the data. In MongoDB, related data is normally stored in a single collection and nested by using embedded documents, so joins are usually unnecessary. Ordinary MongoDB queries have no direct equivalent of a SQL join, although the aggregation pipeline provides a $lookup stage for combining collections.
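The contrast between joining tables and reading an embedded document can be sketched in plain Python. The data and the join_customer_orders helper are hypothetical and only illustrate the idea; no database is involved.

```python
# Relational style: two "tables" linked by CustomerID (invented sample data).
customer_rows = [{"CustomerID": 11, "CustomerName": "Customer A"}]
order_rows = [
    {"OrderID": 111, "CustomerID": 11, "Product": "Laptop"},
    {"OrderID": 112, "CustomerID": 11, "Product": "Lamp"},
]


def join_customer_orders(customers, orders):
    """Hypothetical helper: attach each customer's orders, as a join would."""
    orders_by_customer = {}
    for order in orders:
        orders_by_customer.setdefault(order["CustomerID"], []).append(
            {"OrderID": order["OrderID"], "Product": order["Product"]}
        )
    return [
        {**customer, "Orders": orders_by_customer.get(customer["CustomerID"], [])}
        for customer in customers
    ]


# Document style: the orders are already embedded in the customer document.
customer_doc = {
    "CustomerID": 11,
    "CustomerName": "Customer A",
    "Orders": [
        {"OrderID": 111, "Product": "Laptop"},
        {"OrderID": 112, "Product": "Lamp"},
    ],
}

joined = join_customer_orders(customer_rows, order_rows)
print(joined[0] == customer_doc)
```

The join has to rebuild at read time the shape that the embedded document already has when it is stored.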
Apart from the differences in terms, a few other differences are shown below:
- Relational databases are known for enforcing data integrity. This is not an explicit requirement in MongoDB.
- An RDBMS requires that data be normalized first so that orphan records and duplicates are prevented. Normalizing data requires more tables, which results in more table joins and therefore more keys and indexes. As databases grow, performance can become an issue. Again, normalization is not an explicit requirement in MongoDB, which is flexible and does not need the data to be normalized first.