
Nonrelational Databases

Nonrelational databases are commonly used for internet-scale applications that do not require any complex queries.

NoSQL Database

NoSQL databases are nonrelational databases optimized for scalable performance and schema-less data models. NoSQL databases are also widely recognized for their ease of development, low latency, and resilience. NoSQL database systems use a variety of models for data management, such as in-memory key-value stores, graph data models, and document stores. These types of databases are optimized for applications that require large data volume, low latency, and flexible data models, which are achieved by relaxing some of the data consistency restrictions of traditional relational databases.

When to Use a NoSQL Database

NoSQL databases are a great fit for many big data, mobile, and web applications that require greater scale and higher responsiveness than traditional relational databases. Because of simpler data structures and horizontal scaling, NoSQL databases typically respond faster and are easier to scale than relational databases.

Comparison of SQL and NoSQL Databases

Many developers are familiar with SQL databases but might be new to working with NoSQL databases. Relational database management systems (RDBMS) and nonrelational (NoSQL) databases have different strengths and weaknesses. In an RDBMS, data can be queried flexibly, but queries are relatively expensive and do not scale well in high-traffic situations. In a NoSQL database, you can query data efficiently in a limited number of ways. The following table compares the characteristics of SQL and NoSQL databases.

SQL vs. NoSQL Database Characteristics

Type | SQL | NoSQL
Data Storage | Rows and columns | Key-value, document, wide-column, graph
Schemas | Fixed | Dynamic
Querying | Using SQL | Focused on collections of documents
Scalability | Vertical | Horizontal
Transactions | Supported | Support varies
Consistency | Strong | Eventual and strong

The storage format for SQL versus NoSQL databases also differs. As shown in the figure, SQL databases are often stored in a row-and-column format, whereas NoSQL databases, such as Amazon DynamoDB, use a key-value format that can be expressed as JSON.

SQL versus NoSQL format comparison


NoSQL Database Types

There are four types of NoSQL databases: columnar, document, graph, and in-memory key-value. Generally, these databases differ in how the data is stored, accessed, and structured, and they are optimized for different use cases and applications.

Columnar databases Columnar databases are optimized for reading and writing columns of data as opposed to rows of data. Column-oriented storage for database tables is an important factor in analytic query performance because it drastically reduces the overall disk I/O requirements and reduces the amount of data that you must load from disk.

Document databases Document databases are designed to store semi-structured data as documents, typically in JSON or XML format. Unlike traditional relational databases, the schema for each NoSQL document can vary, giving you more flexibility in organizing and storing application data and reducing storage required for optional values.

Graph databases Graph databases store vertices and directed links called edges. Graph databases can be built on both SQL and NoSQL databases. Vertices and edges can each have properties associated with them.

In-memory key-value stores In-memory key-value stores are NoSQL databases optimized for read-heavy application workloads (such as social networking, gaming, media sharing, and Q&A portals) or compute-intensive workloads (such as a recommendation engine). In-memory caching improves application performance by storing critical pieces of data in memory for low-latency access.

Amazon DynamoDB

Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database, and it supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for the following:

  • Mobile
  • Gaming
  • Adtech
  • Internet of Things (IoT)
  • Applications that do not require complex queries

With DynamoDB, you can create database tables that can store and retrieve any amount of data and serve any level of request traffic. You can scale up or scale down your table throughput capacity without downtime or performance degradation. DynamoDB automatically spreads the data and traffic for your tables over a sufficient number of servers to handle your throughput and storage requirements while maintaining consistent and fast performance. All of your data is stored on solid-state drives (SSDs) and automatically replicated across multiple Availability Zones in an AWS Region, providing built-in high availability and data durability. You can use global tables to keep DynamoDB tables in sync across AWS Regions.

Core Components of Amazon DynamoDB

In DynamoDB, tables, items, and attributes are the core components with which you work. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table. Secondary indexes can be used to provide more querying flexibility. You can use DynamoDB Streams to capture data modification events in DynamoDB tables.

The figure shows the DynamoDB data model, including a table, items, attributes, a required partition key, an optional sort key, and an example of data being stored in partitions.

Amazon DynamoDB tables and partitions


Tables

Similar to other database systems, DynamoDB stores data in tables. A table is a collection of items. For example, a table called People could be used to store personal contact information about friends, family, or anyone else of interest.

Items

An item in DynamoDB is similar in many ways to rows, records, or tuples in other database systems. Each DynamoDB table contains zero or more items. An item is a collection of attributes that is uniquely identifiable for each record in that table. For a People table, each item represents a person. There is no limit to the number of items that you can store in a table.

Attributes

Each item is composed of one or more attributes. Attributes in DynamoDB are similar in many ways to fields or columns in other database systems. An attribute is a fundamental data element, something that does not need to be broken down any further. You can think of an attribute as similar to columns in a relational database. For example, an item in a People table contains attributes called PersonID, Last Name, First Name, and so on.

The figure shows a table named People with items and attributes. Each block represents an item, and within those blocks you have attributes that define the overall item:

  • Each item in the table has a unique identifier, or primary key, that distinguishes the item from all of the others in the table. In this example, the primary key consists of one attribute (PersonID).

  • Other than the primary key, the People table is schemaless, which means that you do not have to define the attributes or their data types beforehand. Each item can have its own distinct attributes. This is where the contrast between NoSQL and SQL begins to show: in SQL, you would have to define a schema for the table, and every person would need to have the same data points or attributes. As you can see in the figure, with NoSQL and DynamoDB, each person can have different attributes.

  • Most of the attributes are scalar, so they can have only one value. Strings and numbers are common examples of scalars.

  • Some of the items have a nested attribute (Address). DynamoDB supports nested attributes up to 32 levels deep.

Amazon DynamoDB table with items and attributes

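To make the schemaless model concrete, the following is a minimal sketch using the Python SDK (boto3); it assumes a People table already exists and uses made-up names and values. The two items share only the PersonID key attribute, and the second item includes a nested Address attribute.

python
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
people = dynamodb.Table('People')

# Two items with the same key attribute (PersonID) but otherwise different
# attribute sets; the second item also has a nested Address attribute.
people.put_item(Item={
    'PersonID': '101',
    'FirstName': 'Mary',
    'LastName': 'Major'
})
people.put_item(Item={
    'PersonID': '102',
    'FirstName': 'John',
    'LastName': 'Stiles',
    'FavoriteColor': 'Blue',
    'Address': {
        'Street': '123 Any Street',
        'City': 'Anytown'
    }
})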

Primary Key

When you create a table, at a minimum, you are required to specify the table name and the primary key of the table. The primary key uniquely identifies each item in the table. No two items can have the same primary key within a table. DynamoDB supports two kinds of primary keys: a partition key only, or a partition key combined with a sort key.

Partition key (hash key) A simple primary key, composed of one attribute, is known as the partition key. DynamoDB uses the partition key’s value as an input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item is stored.

In a table that has only a partition key, no two items can have the same partition key value. For example, in the People table, with a simple primary key of PersonID, you cannot have two items with PersonID of 000-07-1075.

The partition key of an item is also known as its hash attribute. The term hash attribute derives from the use of an internal hash function in DynamoDB that evenly distributes data items across partitions based on their partition key values.

Each primary key attribute must be a scalar (meaning that it can hold only a single value). The only data types allowed for primary key attributes are string, number, or binary. There are no such restrictions for other, nonkey attributes.

Partition key and sort key (range attribute) A composite primary key is composed of two attributes: the partition key and the sort key. The sort key of an item is also known as its range attribute. The term range attribute derives from the way that DynamoDB stores items with the same partition key physically close together, in sorted order, by the sort key value.

The partition key works the same way as it does in a simple primary key, but because the table also has a sort key, items with the same partition key value are stored together and sorted by their sort key values.

In a table that has a partition key and a sort key, it’s possible for two items to have the same partition key value, but those two items must have different sort key values. You cannot have two items in the table that have identical partition key and sort key values.

For example, if you have a Music table with a composite primary key (Artist and SongTitle), you can access any item in the Music table directly if you provide the Artist and SongTitle values for that item.

A composite primary key gives you additional flexibility when querying data. For example, if you provide only the value for Artist, DynamoDB retrieves all of the songs by that artist. To retrieve only a subset of songs by a particular artist, you can provide a value for Artist with a range of values for SongTitle.
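
The following is a minimal sketch of both query patterns using the Python SDK (boto3); the artist name and the song-title prefix are only illustrative values.

python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
music = dynamodb.Table('Music')

# All songs by one artist (partition key only).
by_artist = music.query(
    KeyConditionExpression=Key('Artist').eq('Sam Samuel')
)

# A subset of that artist's songs, narrowed by a range of sort key values.
subset = music.query(
    KeyConditionExpression=Key('Artist').eq('Sam Samuel') &
                           Key('SongTitle').begins_with('Happy')
)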

As a developer, the partition key attribute that you choose has important implications for your application. If there is little differentiation among partition key values, all of your data is stored together in the same physical location.

The figure shows an example of these two types of keys. In the SensorLocation table, the primary key is the SensorId attribute. This means that every item (or row) in this table has a unique SensorId, so each sensor has exactly one location (latitude and longitude) value.

Amazon DynamoDB primary keys


Conversely, the SensorReadings table has a partition key and a sort key. The SensorId attribute is the partition key and the Time attribute is the sort key, which combined make it a composite key. For each SensorId, there may be multiple items corresponding to sensor readings at different times. The combination of SensorId and Time uniquely identifies items in the table. This design enables you to query the table for all readings related to a particular sensor.
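
The following is a rough sketch of creating a table such as SensorReadings with a composite primary key by using the Python SDK (boto3); the string attribute types and the throughput values are assumptions made for illustration.

python
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

# SensorId is the partition (HASH) key and Time is the sort (RANGE) key.
client.create_table(
    TableName='SensorReadings',
    AttributeDefinitions=[
        {'AttributeName': 'SensorId', 'AttributeType': 'S'},
        {'AttributeName': 'Time', 'AttributeType': 'S'}
    ],
    KeySchema=[
        {'AttributeName': 'SensorId', 'KeyType': 'HASH'},
        {'AttributeName': 'Time', 'KeyType': 'RANGE'}
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)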

Secondary Indexes

If you want to perform queries on attributes that are not part of the table’s primary key, you can create a secondary index. By using a secondary index, you can query the data in the table by using an alternate key, in addition to querying against the primary key. DynamoDB does not require that you use indexes, but doing so may give you more flexibility when querying your data depending on your application and table design.

After you create a secondary index on a table, you can then read data from the index in much the same way as you do from the table. DynamoDB automatically creates indexes based on the primary key of a table and automatically updates all indexes whenever a table changes.

A secondary index contains the following:

  • Primary key attributes
  • Alternate key attributes
  • (Optional) A subset of other attributes from the base table (projected attributes)

DynamoDB provides fast access to the items in a table by specifying primary key values. However, many applications might benefit from having one or more secondary (or alternate) keys available. This allows efficient access to data with attributes other than the primary key.

DynamoDB supports two types of secondary indexes: local secondary indexes and global secondary indexes. You can define up to five global secondary indexes and five local secondary indexes per table.

Local Secondary Index

A local secondary index is an index that has the same partition key as the base table, but a different sort key. It is "local" in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value. You can create a local secondary index only while creating the table; you cannot add, remove, or modify it later.

Local secondary index


Global Secondary Index

A global secondary index is an index with a partition key and a sort key that can be different from those on the base table. It is considered “global” because queries on the index can span all of the data in the base table across all partitions. You can create one during table creation, and you can add, remove, or modify it later.

Global secondary index


You can create a global secondary index, not a local secondary index, after table creation.

For example, by using a Music table, you can query data items by Artist (partition key) or by Artist and SongTitle (partition key and sort key). Suppose that you also wanted to query the data by Genre and AlbumTitle. To do this, you could create a global secondary index on Genre and AlbumTitle and then query the index in much the same way as you'd query the Music table.

The figure shows the example Music table with a new index called GenreAlbumTitle. In the index, Genre is the partition key, and AlbumTitle is the sort key.

Amazon DynamoDB table and secondary index


Note the following about the GenreAlbumTitle index:

  • Every index belongs to a table, which is called the base table for the index. In the preceding example, Music is the base table for the GenreAlbumTitle index.
  • DynamoDB maintains indexes automatically. When you add, update, or delete an item in the base table, DynamoDB adds, updates, or deletes the corresponding item in any indexes that belong to that table.
  • When you create an index, you specify which attributes will be copied, or projected, from the base table to the index. At a minimum, DynamoDB projects the key attributes from the base table into the index. This is the case with GenreAlbumTitle, wherein only the key attributes from the Music table are projected into the index.

You can query the GenreAlbumTitle index to find all albums of a particular genre (for example, all Hard Rock albums). You can also query the index to find all albums within a particular genre that have certain album titles (for example, all Heavy Metal albums with titles that start with the letter M).
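
The following is a minimal sketch of both index queries using the Python SDK (boto3); it assumes the GenreAlbumTitle index already exists on the Music table.

python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
music = dynamodb.Table('Music')

# All Hard Rock albums.
hard_rock = music.query(
    IndexName='GenreAlbumTitle',
    KeyConditionExpression=Key('Genre').eq('Hard Rock')
)

# Heavy Metal albums whose titles start with the letter M.
heavy_metal_m = music.query(
    IndexName='GenreAlbumTitle',
    KeyConditionExpression=Key('Genre').eq('Heavy Metal') &
                           Key('AlbumTitle').begins_with('M')
)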

Comparison of Local Secondary Indexes and Global Secondary Indexes

To determine which type of index to use, consider your application’s requirements. The table shows the main differences between a global secondary index and a local secondary index.

Comparison of Local and Global Secondary Indexes

Characteristic | Global Secondary Index | Local Secondary Index
Query Scope | Entire table, across all partitions. | Single partition, as specified by the partition key value in the query.
Key Attributes | Partition key, or partition and sort key; can be any scalar attribute in the table. | Partition and sort key; the partition key of the index must be the same attribute as the base table.
Projected Attributes | Only projected attributes can be queried. | Can query attributes that are not projected; attributes are retrieved from the base table.
Read Consistency | Eventual consistency only. | Eventual consistency or strong consistency.
Provisioned Throughput | Separate throughput settings from the base table; consumes separate capacity units. | Same throughput settings as the base table; consumes base table capacity units.
Lifecycle Considerations | Can be created or deleted at any time. | Must be created when the table is created; can be deleted only when the table is deleted.

Amazon DynamoDB Streams

Amazon DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables. The data about these events appears in the stream in near real time and in the order that the events occurred. Each event is represented by a stream record. If you enable a stream on a table, DynamoDB Streams writes a stream record whenever one of the following events occurs:

  • A new item is added to the table: The stream captures an image of the entire item, including all of its attributes.
  • An item is updated: The stream captures the "before" and "after" images of any attributes that were modified in the item.
  • An item is deleted from the table: The stream captures an image of the entire item before it was deleted.

Each stream record also contains the name of the table, the event timestamp, and other metadata. Stream records have a lifetime of 24 hours; after that, they are automatically removed from the stream.


The figure shows how you can use DynamoDB Streams together with AWS Lambda to create a trigger—code that executes automatically whenever an event of interest appears in a stream. For example, consider a Customers table that contains customer information for a company. Suppose that you want to send a “welcome” email to each new customer. You could enable a stream on that table and then associate the stream with a Lambda function. The Lambda function would execute whenever a new stream record appears, but only process new items added to the Customers table. For any item that has an EmailAddress attribute, the Lambda function could invoke Amazon Simple Email Service (Amazon SES) to send an email to that address.

Example of Amazon DynamoDB Streams and AWS Lambda


In the example shown in the figure, the last customer, Craig Roe, will not receive an email because he does not have an EmailAddress attribute.
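
The following is a rough sketch of such a Lambda function in Python. It assumes the stream on the Customers table is configured to include new item images, and the sender address is a placeholder.

python
import boto3

ses = boto3.client('ses', region_name='us-east-1')

def lambda_handler(event, context):
    for record in event['Records']:
        # Only act on newly inserted items.
        if record['eventName'] != 'INSERT':
            continue
        new_image = record['dynamodb'].get('NewImage', {})
        email = new_image.get('EmailAddress', {}).get('S')
        # Customers without an EmailAddress attribute are skipped.
        if email:
            ses.send_email(
                Source='welcome@example.com',
                Destination={'ToAddresses': [email]},
                Message={
                    'Subject': {'Data': 'Welcome!'},
                    'Body': {'Text': {'Data': 'Thanks for signing up.'}}
                }
            )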

In addition to triggers, DynamoDB Streams enables other powerful solutions that developers can create, such as the following:

  • Data replication within and across AWS Regions
  • Materialized views of data in DynamoDB tables
  • Data analysis by using Amazon Kinesis materialized views

Read Consistency

DynamoDB replicates data among multiple Availability Zones in a Region. When your application writes data to a DynamoDB table and receives an HTTP 200 response (OK), all copies of the data are updated. The data is eventually consistent across all storage locations, usually within 1 second or less. DynamoDB supports both eventually consistent and strongly consistent reads.

Eventually Consistent Reads

When you read data from a DynamoDB table immediately after a write operation, the response might not reflect the results of a recently completed write operation. The response might include some stale data. If you repeat your read request after a short time, the response should return the latest data. DynamoDB uses eventually consistent reads, unless you specify otherwise.

Strongly Consistent Reads

When querying data, you can specify whether DynamoDB should return strongly consistent reads. When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting updates from all prior write operations that were successful. A strongly consistent read might not be available if there is a network delay or outage.
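
The following is a minimal sketch of requesting both read types with the Python SDK (boto3), reusing the People table from earlier; the key value is illustrative.

python
import boto3

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
people = dynamodb.Table('People')

# Default behavior: an eventually consistent read.
eventual = people.get_item(Key={'PersonID': '101'})

# Opt in to a strongly consistent read for the same item.
strong = people.get_item(Key={'PersonID': '101'}, ConsistentRead=True)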

Comparison of Consistent Reads

As a developer, it is important to understand the needs of your application. In some applications, eventually consistent reads might be fine, such as a high-score dashboard. In other applications or parts of an application, however, such as a financial or medical system, an eventually consistent read could be an issue. You will want to evaluate your data usage patterns to ensure that you are choosing the right type of reads for each part of your application. There is an additional cost for strongly consistent reads, and they will have more latency in returning data than an eventually consistent read. So, that cost and timing should also play into your decision.

Read and Write Throughput

When you create a table or index in DynamoDB, you must specify your capacity requirements for read and write activity. By defining your throughput capacity in advance, DynamoDB can reserve the necessary resources to meet the read and write activity your application requires, while ensuring consistent, low-latency performance. Specify your required throughput value by setting the ProvisionedThroughput parameter when you create or update a table.

You specify throughput capacity in terms of read capacity units and write capacity units:

  • One read capacity unit (RCU) represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size. If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units. The total number of read capacity units required depends on the item size and whether you want an eventually consistent or strongly consistent read.

  • One write capacity unit (WCU) represents one write per second for an item up to 1 KB in size. If you need to write an item that is larger than 1 KB, DynamoDB must consume additional write capacity units. The total number of write capacity units required depends on the item size.

For example, suppose that you create a table with five read capacity units and five write capacity units. With these settings, your application could do the following:

  • Perform strongly consistent reads of up to 20 KB per second (4 KB × 5 read capacity units).
  • Perform eventually consistent reads of up to 40 KB per second (twice as much read throughput).
  • Write up to 5 KB per second (1 KB × 5 write capacity units).

If your application reads or writes larger items (up to the DynamoDB maximum item size of 400 KB), it consumes more capacity units.
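
As a rough illustration of this arithmetic (with the rounding rules simplified), the following Python helpers estimate the capacity units consumed for a given item size.

python
import math

def read_capacity_units(item_size_kb, strongly_consistent=True):
    # One RCU covers a strongly consistent read of up to 4 KB per second,
    # or two eventually consistent reads of the same size.
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_size_kb):
    # One WCU covers one write of up to 1 KB per second.
    return math.ceil(item_size_kb)

print(read_capacity_units(20))         # 5 RCUs for a 20 KB strongly consistent read
print(read_capacity_units(20, False))  # 3 RCUs for a 20 KB eventually consistent read
print(write_capacity_units(5))         # 5 WCUs for a 5 KB write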

If your read or write requests exceed the throughput settings for a table, DynamoDB can throttle that request. DynamoDB can also throttle read requests that exceed the throughput settings for an index. Throttling prevents your application from consuming too many capacity units. When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException. The AWS SDKs have built-in support for retrying throttled requests, so you do not need to write this logic yourself.

DynamoDB provides the following mechanisms for managing throughput as it changes:

Amazon DynamoDB Auto Scaling DynamoDB automatic scaling actively manages throughput capacity for tables and global secondary indexes. With automatic scaling, you define a range (upper and lower limits) for read and write capacity units. You also define a target utilization percentage within that range. DynamoDB auto scaling seeks to maintain your target utilization, even as your application workload increases or decreases.

Provisioned throughput If you aren’t using DynamoDB auto scaling, you have to define your throughput requirements manually. As discussed, with this setting you may run into a ProvisionedThroughputExceededException if you are throttled. But you can change your throughput with a few clicks.

Reserved capacity You can purchase reserved capacity in advance, whereby you pay a one-time upfront fee and commit to a minimum usage level over a period of time. You may realize significant cost savings compared to on-demand provisioned throughput settings.

On-demand It can be difficult to plan capacity, especially if you aren’t collecting metrics or perhaps are developing a new application and you aren’t sure what type of performance you require. With On-Demand mode, your DynamoDB table will automatically scale up or down to any previously reached traffic level. If a workload’s traffic level reaches a new peak, DynamoDB rapidly adapts to accommodate the workload. As a developer, focus on making improvements to your application and offload scaling activities to AWS.
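
The following is a minimal sketch of creating a table in on-demand capacity mode with the Python SDK (boto3); the table and attribute names are hypothetical.

python
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_table(
    TableName='Orders',
    AttributeDefinitions=[{'AttributeName': 'OrderId', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'OrderId', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST'  # On-demand mode: no provisioned throughput to manage
)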

Partitions and Data Distribution

When you are using a table in DynamoDB, the data is placed on multiple partitions (depending on the amount of data and the amount of throughput allocated to it; recall that throughput is determined by RCUs and WCUs). When you allocate RCUs and WCUs to a table, those RCUs and WCUs are split evenly among all partitions for your table. For example, suppose that you have allocated 1,000 RCUs and 1,000 WCUs to a table, and this table has 10 partitions allocated to it. Then each partition would have 100 RCUs and 100 WCUs for it to use. If one of your partitions consumes all the RCUs and WCUs for the table, you may receive a ProvisionedThroughputExceededException error because one of your partitions is hot. To deal with hot partitions, DynamoDB has two features: burst capacity and adaptive capacity.

Burst Capacity

The previous example discussed how you had 10 partitions, each with 100 RCUs and 100 WCUs allocated to it. Suppose that one of your partitions becomes hot and now needs to consume more than 100 RCUs. Under normal circumstances, you would receive the ProvisionedThroughputExceededException error. However, with burst capacity, whenever your partition is not using all of its capacity, DynamoDB reserves a portion of that unused capacity for later bursts of throughput to handle any spike your partition may experience. At the time of this writing, DynamoDB reserves up to 300 seconds (5 minutes) of unused read and write capacity, which means that your partition can handle a peak load for 5 minutes over its normal expected load. Burst capacity is enabled automatically and runs in the background.

Adaptive Capacity

Adaptive capacity addresses the fact that it is not always possible to distribute read and write activity evenly across partitions. In the example, a partition is experiencing not only peak demand but also consistent demand over and above its normal 100 RCUs and 100 WCUs. Suppose that this partition now requires 200 RCUs instead of 100 RCUs.

DynamoDB adaptive capacity enables your application to continue reading and writing to hot partitions without being throttled, provided that the total provisioned capacity for the table is not exceeded. DynamoDB allocates additional RCUs to the hot partition; in this case, 100 more. With adaptive capacity, you will still be throttled for a period of time, typically between 5–30 minutes, before adaptive capacity turns on or activates. So, for a portion of time, your application will be throttled; however, after adaptive capacity allocates the RCUs to the partition, DynamoDB is able to sustain the new higher throughput for your partition and table. Adaptive capacity is on by default, and there is no need to enable or disable it.

Retrieving Data from DynamoDB

Two primary methods are used to retrieve data from DynamoDB: Query and Scan.

Query

In DynamoDB, you perform Query operations directly on a table or an index. To run a Query, you must specify, at a minimum, the partition key value. If you are querying an index, you must specify both TableName and IndexName.

The following is a query on a Music table in DynamoDB using the Python SDK:

python
import boto3
import json
import decimal
from boto3.dynamodb.conditions import Key

# Helper class to convert a DynamoDB item to JSON.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
table = dynamodb.Table('Music')

print("A query with DynamoDB")

response = table.query(
    KeyConditionExpression=Key('Artist').eq('Sam Samuel')
)
for i in response['Items']:
    print(i['SongTitle'], "-", i['Genre'], i['Price'])

Scan

You can also perform Scan operations on a table or index. The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index. To have DynamoDB return fewer items, you can provide a FilterExpression operation. If the total number of scanned items exceeds the maximum dataset size limit of 1 MB, the scan stops, and the results are returned to the user as a LastEvaluatedKey value to continue the scan in a subsequent operation. The results also include the number of items exceeding the limit. A scan can result in no table data meeting the filter criteria. A single Scan operation reads up to the maximum number of items set (if you’re using the Limit parameter) or a maximum of 1 MB of data and then applies any filtering to the results by using FilterExpression. If LastEvaluatedKey is present in the response, you must paginate the result set.

Scan operations proceed sequentially; however, for faster performance on a large table or secondary index, applications can request a parallel Scan operation by providing the Segment and TotalSegments parameters. Scan uses eventually consistent reads when accessing the data in a table; therefore, the result set might not include the changes to data in the table immediately before the operation began. If you need a consistent copy of the data, as of the time that the Scan begins, you can set the ConsistentRead parameter to true. The following is a scan on a Movies table with the Python SDK:

python
# Return all of the data in the table, filtered by price
import boto3
import json
import decimal
from boto3.dynamodb.conditions import Attr

# Create the DynamoDB resource
dynamodb = boto3.resource('dynamodb', region_name='us-east-1')

# Use the Music table
table = dynamodb.Table('Music')

# Helper class to convert a DynamoDB Decimal to JSON.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, decimal.Decimal):
            if o % 1 > 0:
                return float(o)
            else:
                return int(o)
        return super(DecimalEncoder, self).default(o)

# Specify some filters for the scan.
# Here we are stating that the Price must be between 12 and 30.
fe = Attr('Price').between(12, 30)
pe = "#g, Price"
# Expression attribute names for the projection expression only.
ean = {"#g": "Genre"}

response = table.scan(
    FilterExpression=fe,
    ProjectionExpression=pe,
    ExpressionAttributeNames=ean
)
# Print all the items returned by the first page of results.
for i in response['Items']:
    print(json.dumps(i, cls=DecimalEncoder))

# If LastEvaluatedKey is present, paginate until the scan is complete.
while 'LastEvaluatedKey' in response:
    response = table.scan(
        ProjectionExpression=pe,
        FilterExpression=fe,
        ExpressionAttributeNames=ean,
        ExclusiveStartKey=response['LastEvaluatedKey']
    )
    for i in response['Items']:
        print(json.dumps(i, cls=DecimalEncoder))

As you can see from the Python code, the scan returns the Genre and Price attributes of all records with a Price between 12 and 30. The LastEvaluatedKey property is used to continue looping through the entire table.

Global Tables

Global tables build upon the DynamoDB global footprint to provide you with a fully managed, multiregion, multimaster database that delivers fast, local read-and-write performance for massively scaled, global applications. DynamoDB performs all the necessary tasks to create identical tables in the AWS Regions that you choose and to propagate ongoing data changes to all of them. The figure shows an example of how global tables can work with a global application and globally dispersed users.

Global tables


A global table is a collection of one or more DynamoDB tables, all owned by a single AWS account, identified as replica tables. A replica table (or replica, for short) is a single DynamoDB table that functions as a part of a global table. Each replica stores the same set of data items. Any given global table can have only one replica table per AWS Region, and every replica has the same table name and the same primary key schema. Changes made in one replica are recorded in a stream and propagated to the other replicas.

Replication flow in global tables


If your application requires strongly consistent reads, then it must perform all of its strongly consistent reads and writes in the same region. DynamoDB does not support strongly consistent reads across AWS Regions.

Conflicts can arise if applications update the same item in different regions at about the same time (concurrent updates). To ensure eventual consistency, DynamoDB global tables use a “last writer wins” reconciliation between concurrent updates whereby DynamoDB makes a best effort to determine the last writer. With this conflict resolution mechanism, all replicas agree on the latest update and converge toward a state in which they all have identical data.

To create a DynamoDB global table, perform the following steps:

  1. Create an ordinary DynamoDB table, with DynamoDB Streams enabled, in an AWS Region.
  2. Repeat step 1 for every other AWS Region where you want to replicate your data.
  3. Define a DynamoDB global table based on the tables that you have created.

The AWS Management Console automates these tasks so that you can create a global table quickly and easily.
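
If you prefer to script these steps, the following is a rough sketch using the Python SDK (boto3) and the original (2017.11.29) version of global tables. It assumes that Music tables with DynamoDB Streams enabled already exist in both Regions; the table name and Regions are examples.

python
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_global_table(
    GlobalTableName='Music',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'eu-west-1'}
    ]
)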

Object Persistence Model


The object persistence model enables you to map client-side classes in your application to DynamoDB tables, so that you work with objects rather than low-level API calls. Support for the object persistence model is available in the Java and .NET SDKs.

Amazon DynamoDB Local

DynamoDB Local is the downloadable version of DynamoDB that lets you write and test applications by using the Amazon DynamoDB API without accessing the DynamoDB web service. Instead, the database is self-contained on your computer. When you’re ready to deploy your application in production, you can make a few minor changes to the code so that it uses the DynamoDB web service. Having this local version helps you save on provisioned throughput, data storage, and data transfer fees. In addition, you don’t need an internet connection while you’re developing your application.
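
The following is a minimal sketch of pointing the Python SDK (boto3) at DynamoDB Local instead of the web service; it assumes DynamoDB Local is already running on the default port 8000.

python
import boto3

# Point the SDK at the local endpoint rather than the DynamoDB web service.
dynamodb = boto3.resource(
    'dynamodb',
    endpoint_url='http://localhost:8000',
    region_name='us-east-1'
)

# List the tables stored in the local database.
for table in dynamodb.tables.all():
    print(table.name)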

IAM and Fine-Grained Access Control

You can use AWS IAM to grant or restrict access to DynamoDB resources and API actions. For example, you could allow a user to execute the GetItem operation on a Books table. DynamoDB also supports fine-grained access control so that you can control access to individual data items and attributes. Suppose that you have a table of user profiles and you want each user to have access only to his or her own data. You can accomplish this with fine-grained access control by using a condition inside an IAM policy with the dynamodb:LeadingKeys property.

By using LeadingKeys, you can limit a user so that they can access only the items whose partition key matches their user ID. In the following example for a UserProfiles table, you want to restrict who can view the profile information to only the user to whom the data belongs:

json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LeadingKeysExample",
            "Effect": "Allow",
            "Action": [
                "dynamodb:GetItem",
                "dynamodb:BatchGetItem",
                "dynamodb:Query",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem",
                "dynamodb:DeleteItem",
                "dynamodb:BatchWriteItem"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:accountnumber:table/UserProfiles"
            ],
            "Condition": {
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [
                        "${www.amazon.com:user_id}"
                    ],
                    "dynamodb:Attributes": [
                        "UserId",
                        "FirstName",
                        "LastName",
                        "Email",
                        "Birthday"
                    ]
                },
                "StringEqualsIfExists": {
                    "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
                }
            }
        }
    ]
}

As you can see in the IAM policy, the user is allowed to access only items whose leading (partition) key matches their user ID, and only the subset of attributes defined in the dynamodb:Attributes section of the policy. Furthermore, the dynamodb:Select condition specifies that the application must request specific attributes, preventing it from requesting all attributes.

Backup and Restore

You can create on-demand backups and enable point-in-time recovery for your DynamoDB tables. On-demand backups create full backups of your tables, which you can restore at any time. These actions execute with zero impact on table performance or availability and without consuming any provisioned throughput on the table. Point-in-time recovery helps protect your DynamoDB tables from accidental write or delete operations. For example, suppose that a test script accidentally writes to a production DynamoDB table. With point-in-time recovery, you can restore that table to any point in time during the last 35 days because DynamoDB maintains incremental backups of your table. These operations do not affect performance or latency.
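
The following is a rough sketch of these operations with the Python SDK (boto3); the table and backup names are examples.

python
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

# Take an on-demand backup of the Music table.
client.create_backup(TableName='Music', BackupName='Music-backup-2021-01-01')

# Enable point-in-time recovery on the same table.
client.update_continuous_backups(
    TableName='Music',
    PointInTimeRecoverySpecification={'PointInTimeRecoveryEnabled': True}
)

# Restore the table to its latest restorable time as a new table.
client.restore_table_to_point_in_time(
    SourceTableName='Music',
    TargetTableName='Music-restored',
    UseLatestRestorableTime=True
)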

Encryption with Amazon DynamoDB

DynamoDB offers fully managed encryption at rest, and it is enabled by default. DynamoDB uses AWS KMS to encrypt data at rest. By default, DynamoDB uses an AWS-owned customer master key (CMK); however, you can also specify your own AWS KMS CMK. For more information on AWS KMS, see Chapter 5, "Encryption on AWS."
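
The following is a minimal sketch of specifying your own CMK when creating a table with the Python SDK (boto3); the table name and key alias are hypothetical.

python
import boto3

client = boto3.client('dynamodb', region_name='us-east-1')

client.create_table(
    TableName='Books',
    AttributeDefinitions=[{'AttributeName': 'BookId', 'AttributeType': 'S'}],
    KeySchema=[{'AttributeName': 'BookId', 'KeyType': 'HASH'}],
    BillingMode='PAY_PER_REQUEST',
    SSESpecification={
        'Enabled': True,
        'SSEType': 'KMS',
        'KMSMasterKeyId': 'alias/my-dynamodb-key'  # Your own customer managed CMK
    }
)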

Amazon DynamoDB Best Practices

Now that you understand what DynamoDB is and how you can use it to create a scalable database for your application, review some best practices for using DynamoDB.

Distribute Workload Evenly

The partition key portion of a table's primary key determines the logical partitions in which the table's data is stored. These logical partitions also affect the underlying physical partitions. As a result, you want to distribute your workload across the partitions as evenly as possible, reducing the number of "hot" partition issues that may arise.

The following table compares common partition key schemas and how uniformly each distributes data in DynamoDB.

Partition Key Value | Uniformity
User ID, where the application has many users | Good
Status code, where there are only a few possible status codes | Bad
Item creation date, rounded to the nearest time period (for example, day, hour, or minute) | Bad
Device ID, where each device accesses data at relatively similar intervals | Good
Device ID, where even if there are many devices being tracked, one is far more popular than all the others | Bad

Comparison of Query and Scan Operations

The Query operation finds items in a table based on primary key values. You must provide the name of the partition key attribute and the value of that attribute. You can provide a sort key attribute name and value to refine the search results (for example, all of the forums with this ID in the last seven days). By default, Query returns all of the data attributes for the items with the specified primary keys. The results are sorted by the sort key in ascending order, which can be reversed. Additionally, queries are eventually consistent by default, with an option to request strongly consistent reads if necessary. The Scan operation returns all of the item attributes by accessing every item in the table. It is for this reason that Query is more efficient than the Scan operation.