Amazon DocumentDB

In January 2019, AWS released Amazon DocumentDB with the intent of meeting customer demand for managed document databases. I will try to introduce the benefits of a document database, before looking at the architecture and the interfaces for the database.

Why a Document Database?

The data world today is pretty complex, and we have a variety of data types and file formats. Gone are the days when companies would retrofit their use cases into a single tool, as now you have the option to choose the tool that fits your specific use cases. The common categories of data and use cases are depicted in the following table.

Common data categories and use cases

| Data Category | Key Features | Use Cases |
| --- | --- | --- |
| Relational | Referential integrity; ACID transactions; schema-on-write | Databases, data warehouses; ERP; finance; CRMs; lift and shift from other systems |
| Key/value | High throughput; low-latency (typically milliseconds) reads and writes; unlimited scale | Real-time bidding; shopping cart; social; product catalog; customer preferences; recommendations; adverts |
| Document | Raw JSON documents; quick access; native reads | Content management; personalization; mobile; catalog; retail and marketing |
| In-memory | Microsecond latency | Leaderboards; real-time analytics; caching |
| Graph | Finding complex relationships; navigate/traverse between data elements | Fraud detection; social network analysis; recommendation engine |
| Time series | Collect/store/process data sequenced by time | IoT applications; event tracking |
| Ledger | Complete, immutable, and verifiable history of all changes to data | Systems of record; supply chain; healthcare; registrations; financial |

A document database is useful when you store raw JSON data and need to navigate within the JSON document. Because JSON documents are semi-structured, they do not fit naturally into relational databases, which traditionally lack native JSON handling capabilities. A JSON document model maps naturally to application data; each document can have a different structure and is independent of other documents. You can index on any key within the document and run ad hoc aggregation and analysis across your entire dataset. In a relational database, by contrast, each row has the same number of attributes, whereas non-relational data tends to have varying attributes for each item.

Let us consider a use case where a relational database doesn’t scale as effectively as a document database. You are running a leading social media company, and you have decided to add a new game called Ring-o-Ring to the platform. You would like to track the stats of the users who play the game. You have two options: a relational database or a document database. In a relational database, you would create a users table:

| userid | username | firstname | lastname |
| --- | --- | --- | --- |
| 101 | fati1122 | Fatima | Abbasi |

This may work for one game, but the moment you add more games and apps on your platform, this can become really complex. You can “normalize” your way out of complexity at the cost of performance by doing joins at runtime, but the complexity moves from one side to another, making it more expensive to maintain the application.

In a document database, you can simply add the stats to the document. No table creation is needed, and no expensive joins are required at runtime, making it simple to develop and cheaper to maintain.
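As a minimal sketch of this idea, the snippet below models the user as a plain Python dictionary, which is the shape a JSON document would take, and attaches game statistics to it. The `stats` field, its sub-fields, the values, and the second game name are hypothetical and chosen only for illustration; only the user attributes come from the example above.

```python
import json

# The user row from the relational example, expressed as a document.
user = {
    "userid": 101,
    "username": "fati1122",
    "firstname": "Fatima",
    "lastname": "Abbasi",
}


def add_game_stats(document, game, **stats):
    """Attach per-game stats to a user document without any schema change."""
    document.setdefault("stats", {})[game] = dict(stats)
    return document


# Hypothetical stats for Ring-o-Ring; field names are illustrative only.
add_game_stats(user, "ring_o_ring", games_played=12, high_score=4200)
# A later game can carry a different set of fields in the same document.
add_game_stats(user, "hypothetical_game", level=3)

print(json.dumps(user, indent=2))
```

Each game keeps its own set of fields inside the same document, so adding a game does not require a new table or a join at read time.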

Amazon DocumentDB Overview

Amazon DocumentDB is a fast, scalable, highly available, and fully managed MongoDB-compatible service. It has the following key features that make it an ideal fit for storing, processing, and analyzing documents:

  • It can handle millions of requests at single-digit millisecond latency.
  • It is deeply integrated with the AWS platform.
  • You can use the same code, drivers, and tools that you use with MongoDB.
  • It is secure.
  • It is fully managed.

Amazon DocumentDB supports flexible storage: the storage volume grows as your database storage needs grow, in increments of 10 GB up to a maximum of 64 TB. You don’t need to provision excess storage for your cluster to handle growth, as this is done behind the scenes.

For read-heavy applications, Amazon DocumentDB supports up to 15 read replica instances. Because the replicas share the same underlying storage as the primary, no writes need to be performed at the read-replica nodes, which reduces overall cost. Replica lag is reduced to single-digit milliseconds, which gives you better response times for read-heavy workloads. Amazon DocumentDB provides a cluster endpoint for reads and writes and a reader endpoint for reads only, which allows applications to connect without having to track replicas as they are added or removed. You can add read replicas in minutes regardless of the storage volume size. You can also scale the compute and memory resources of your cluster up and down, and these scaling operations complete within a few minutes.
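To illustrate how an application might use the two endpoints, here is a small sketch that builds MongoDB-style connection URIs: one for the cluster endpoint (reads and writes) and one for the reader endpoint (reads only). The hostnames, user name, and password are hypothetical placeholders, and the options shown (`tls`, `retryWrites`, `readPreference`) are standard MongoDB URI options chosen here as assumptions; check the connection string your own cluster provides before relying on them.

```python
from urllib.parse import quote_plus, urlencode

# Hypothetical endpoints; a real cluster supplies its own hostnames.
CLUSTER_ENDPOINT = "example-cluster.cluster-abc123.us-east-1.docdb.amazonaws.com"
READER_ENDPOINT = "example-cluster.cluster-ro-abc123.us-east-1.docdb.amazonaws.com"


def build_uri(host, user, password, read_preference=None, port=27017):
    """Build a MongoDB-style connection URI for a DocumentDB endpoint."""
    # Assumed options: TLS enabled and retryable writes switched off.
    options = {"tls": "true", "retryWrites": "false"}
    if read_preference:
        options["readPreference"] = read_preference
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb://{credentials}@{host}:{port}/?{urlencode(options)}"


# Writes (and reads that must see the latest data) go to the cluster endpoint.
writer_uri = build_uri(CLUSTER_ENDPOINT, "appuser", "p@ss")
# Read-heavy traffic can be pointed at the reader endpoint instead.
reader_uri = build_uri(READER_ENDPOINT, "appuser", "p@ss", "secondaryPreferred")

print(writer_uri)
print(reader_uri)
```

Because the application only knows the two endpoint names, replicas can be added or removed without changing either URI.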

Amazon DocumentDB continuously monitors the health of your cluster, and upon detecting an instance failure, it automatically restarts the instance and associated processes. Amazon DocumentDB doesn’t require a crash recovery replay of database redo logs, which greatly reduces restart times. Amazon DocumentDB also isolates the database cache from the database process, enabling the cache to survive an instance restart. Upon failure, one of the read replicas you have provisioned (up to 15) can act as the failover target; if you haven’t provisioned a read replica, a new Amazon DocumentDB instance is created automatically.

Amazon DocumentDB also supports point-in-time recovery (PITR), which lets you restore your cluster to any second during the retention period, up to the last 5 minutes. The retention period can be configured up to 35 days, and backups are stored in Amazon S3. Backups are taken automatically and incrementally, without impacting cluster performance.
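The restorable window described here can be worked through with a little date arithmetic. The sketch below is a simplified model, not the service's own logic: it assumes the window runs from "now minus the retention period" to "now minus 5 minutes" and ignores details such as when the cluster was created. The function names and the 7-day retention value are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def restorable_window(now, retention_days, latest_lag=timedelta(minutes=5)):
    """Return (earliest, latest) restorable timestamps under the simple model."""
    earliest = now - timedelta(days=retention_days)
    latest = now - latest_lag
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """Check whether a target timestamp falls inside the restorable window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


# Fixed "now" and a hypothetical 7-day retention period, for illustration.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
earliest, latest = restorable_window(now, retention_days=7)
print("earliest:", earliest.isoformat())
print("latest:  ", latest.isoformat())
```

Under this model, a timestamp from yesterday is restorable, while one from two minutes ago or from eight days ago is not.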

Amazon DocumentDB Architecture

The figure below displays the DocumentDB architecture. An Amazon DocumentDB cluster contains 0 (zero) to 16 instances, and a cluster storage volume manages the storage for those instances. All writes go through the primary instance, whereas reads can be served by the primary or by read replicas. Amazon DocumentDB instances can run only in a VPC, which gives you full control over your virtual networking environment. The architecture separates storage from compute. At the storage layer, Amazon DocumentDB replicates six copies of your data across three AWS Availability Zones, which provides additional fault tolerance and redundancy.

Amazon DocumentDB architecture

Amazon DocumentDB Interfaces

There are multiple ways in which you can interact with Amazon DocumentDB:

  • AWS Management Console
  • AWS CLI
  • The Mongo shell: You can use the Mongo shell to connect to your cluster to create, read, update, and delete documents in your database.
  • MongoDB Drivers: For developing applications against an Amazon DocumentDB cluster, you can use the MongoDB drivers with Amazon DocumentDB.
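As a hedged sketch of what driver-based access might look like, the function below upserts game stats through any object that exposes a pymongo-style `update_one(filter, update, upsert=...)` method. The field names, the `record_game` helper, and the `RecordingCollection` stand-in are hypothetical; the stand-in exists only so the example runs without a cluster.

```python
def record_game(collection, userid, game, score):
    """Upsert per-game stats for one user via a pymongo-style collection."""
    prefix = f"stats.{game}"
    return collection.update_one(
        {"userid": userid},
        {
            # Count the game and remember the most recent score.
            "$inc": {f"{prefix}.games_played": 1},
            "$set": {f"{prefix}.last_score": score},
        },
        upsert=True,
    )


class RecordingCollection:
    """Stand-in that records calls so the sketch runs without a cluster."""

    def __init__(self):
        self.calls = []

    def update_one(self, filter, update, upsert=False):
        self.calls.append((filter, update, upsert))


users = RecordingCollection()
record_game(users, 101, "ring_o_ring", 4200)
print(users.calls[0])
```

Against a real cluster, the stand-in would be replaced by a collection obtained from the MongoDB Python driver, for example `pymongo.MongoClient(uri)["gamedb"]["users"]`, where the database and collection names are again assumptions.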