
AWS Object Storage Services

Now we are going to dive into object storage. An object is a piece of data, such as a document, image, or video, that is stored with some metadata in a flat structure. Object storage provides that data to applications via application programming interfaces (APIs) over the internet.

Amazon Simple Storage Service

Building a web application that delivers content to users by retrieving data through API calls over the internet is not a difficult task with Amazon S3. Amazon Simple Storage Service (Amazon S3) is storage for the internet. It is a simple storage service that offers software developers a highly scalable, reliable, and low-latency data storage infrastructure at low cost. AWS has seen enormous growth with Amazon S3, and AWS currently has customers who store terabytes and exabytes of data.

Amazon S3 is featured in many AWS certifications because it is a core enabling service for many applications and use cases.

To begin developing with Amazon S3, it is important to understand a few basic concepts.

Buckets

A bucket is a container for objects stored in Amazon S3. Every object is contained in a bucket. You can think of a bucket in traditional terminology similar to a drive or volume.

Limitations

The following are limitations of which you should be aware when using Amazon S3 buckets:

  • Do not use buckets as folders, because there is a maximum limit of 100 buckets per account.
  • You cannot create a bucket within another bucket.
  • A bucket is owned by the AWS account that created it, and bucket ownership is not transferable.
  • A bucket must be empty before you can delete it.
  • After you delete a bucket, its name becomes available for reuse. However, the name might not be available for you to reuse for various reasons; for example, someone else could take the name after you release it. If you expect to use the same bucket name, do not delete the bucket.

You can only create up to 100 buckets per account. Do not use buckets as folders or design your application in a way that could result in more than 100 buckets as your application or data grows.

Universal Namespace

A bucket name must be unique across all existing bucket names in Amazon S3 across all of AWS—not just within your account or within your chosen AWS Region. You must comply with Domain Name System (DNS) naming conventions when choosing a bucket name. The rules for DNS-compliant bucket names are as follows:

  • Bucket names must be at least 3 and no more than 63 characters long.
  • A bucket name must consist of a series of one or more labels, with adjacent labels separated by a single period (.).
  • A bucket name can contain only lowercase letters, numbers, and hyphens.
  • Each label must start and end with a lowercase letter or number.
  • Bucket names must not be formatted like IP addresses (for example, 192.168.5.4).
  • AWS recommends that you do not use periods (.) in bucket names. When using virtual hosted-style buckets with Secure Sockets Layer (SSL), the SSL wildcard certificate only matches buckets that do not contain periods. To work around this, use HTTP or write your own certificate verification logic.

Amazon S3 bucket names must be universally unique.

Versioning

Versioning is a means of keeping multiple variants of an object in the same bucket. You can use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket, including recovering deleted objects. With versioning, you can easily recover from both unintended user actions and application failures. There are several reasons that developers will turn on versioning of files in Amazon S3, including the following:

  • Protecting from accidental deletion
  • Recovering an earlier version
  • Retrieving deleted objects

Versioning is turned off by default. When you turn on versioning, Amazon S3 will create new versions of your object every time you overwrite a particular object key. Every time you update an object with the same key, Amazon S3 will maintain a new version of it.


As those additional writes apply to a bucket, you can retrieve any particular object that you need by issuing a GET on the object key name and the specific version. Amazon S3 versioning tracks the changes over time.

Amazon S3 versioning also protects against unintended deletes. If you issue a delete command against an object in a versioned bucket, AWS places a delete marker on top of that object, which means that if you perform a GET on it, you will receive an error as if the object does not exist. However, an administrator, or anyone else with the necessary permissions, can remove the delete marker and access the data. When a delete request is issued against a versioned bucket for a particular object, Amazon S3 still retains the data, but it removes the ability for users to retrieve it.

Versioning-enabled buckets let you recover objects from accidental deletion or overwrite. Your bucket's versioning configuration can also be MFA Delete–enabled for an additional layer of security. MFA Delete is discussed later in this chapter. If you overwrite an object, the result is a new object version in the bucket, and you can always restore from any previous version. In one bucket, for example, you can have two objects with the same key but different version IDs, such as photo.gif (version 111111) and photo.gif (version 121212).
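
As a minimal sketch of how this looks in practice, assuming the AWS SDK for Python (Boto3) and placeholder bucket, key, and version ID values, you can enable versioning on a bucket and then retrieve a specific version with a GET:

python
import boto3

s3 = boto3.client('s3')
bucket_name = 'my-bucket'   # placeholder bucket name

# Turn on versioning for the bucket.
s3.put_bucket_versioning(
    Bucket=bucket_name,
    VersioningConfiguration={'Status': 'Enabled'}
)

# List the versions stored for a particular key.
for version in s3.list_object_versions(Bucket=bucket_name, Prefix='photo.gif').get('Versions', []):
    print(version['Key'], version['VersionId'])

# Retrieve one specific version by its version ID (placeholder value shown).
response = s3.get_object(Bucket=bucket_name, Key='photo.gif', VersionId='111111')
data = response['Body'].read()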


Later in this chapter, we will cover lifecycle policies. You can use versioning in combination with lifecycle policies, applying rules based on whether an object is the current or a previous (noncurrent) version. If you are concerned about building up many versions and using space for a particular object, configure a lifecycle policy that deletes old versions of the object after a certain period of time.

It is easy to set up a lifecycle policy to control the amount of data that’s being retained when you use versioning on a bucket. If you need to discontinue versioning on a bucket, copy all of your objects to a new bucket that has versioning disabled and use that bucket going forward.


It is important to be aware of the cost implications of a versioning-enabled bucket. When calculating cost for your bucket, you must treat every version as a completely separate object that takes up the same amount of space as the object itself. As you can probably guess, this option might be cost prohibitive for things like large media files or objects that receive many updates.

Region

Amazon S3 creates buckets in a region that you specify. You can choose any AWS Region that is geographically close to you to optimize latency, minimize costs, or address regulatory requirements.

Objects belonging to a bucket that you create in a specific AWS Region never leave that region unless you explicitly transfer them to another region.

Operations on Buckets

There are a number of different operations (API calls) that you can perform on Amazon S3 buckets. We will summarize a few of the most basic operations in this section. For more comprehensive information on all of the different operations that you can perform, refer to the Amazon S3 API Reference document available in the AWS Documentation repository. In this section, we show you how to create a bucket, list buckets, and delete a bucket.
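
The following sketch shows these three basic bucket operations using the AWS SDK for Python (Boto3); the bucket name and region are placeholders:

python
import boto3

s3 = boto3.client('s3')

# Create a bucket (a LocationConstraint is required outside us-east-1).
s3.create_bucket(
    Bucket='my-example-bucket',
    CreateBucketConfiguration={'LocationConstraint': 'us-west-2'}
)

# List all buckets owned by the account.
for bucket in s3.list_buckets()['Buckets']:
    print(bucket['Name'])

# Delete the bucket (it must be empty first).
s3.delete_bucket(Bucket='my-example-bucket')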

Objects

You can store an unlimited number of objects in Amazon S3, but an individual object can range from 1 byte to 5 TB in size. If you have objects larger than 5 TB, use a file splitter to upload the file to Amazon S3 in chunks, and reassemble the parts after downloading them later. The largest object that can be uploaded in a single PUT is 5 GB. For objects larger than 100 MB, you should consider using multipart upload (discussed later in this chapter). For any object larger than 5 GB, you must use multipart upload.

Object Facets

An object consists of the following facets:

Key The key is the name that you assign to an object, which may include a simulated folder structure. Each key must be unique within a bucket (unless the bucket has versioning turned on). Amazon S3 URLs can be thought of as a basic data map between “bucket + key + version” and the web service endpoint. For example, in the URL http://doc.s3.amazonaws.com/2006-03-01/AmazonS3.wsdl, doc is the name of the bucket and 2006-03-01/AmazonS3.wsdl is the key.

Version ID Within a bucket, a key and version ID uniquely identify an object. If versioning is turned off, you have only a single version. If versioning is turned on, you may have multiple versions of a stored object.

Value The value is the actual content that you are storing. An object value can be any sequence of bytes, and objects can range in size from 1 byte up to 5 TB.

Metadata Metadata is a set of name-value pairs with which you can store information regarding the object. You can assign metadata, referred to as user-defined metadata, to your objects in Amazon S3. Amazon S3 also assigns system metadata to these objects, which it uses for managing objects.

Subresources Amazon S3 uses the subresource mechanism to store additional object-specific information. Because subresources are subordinates to objects, they are always associated with some other entity such as an object or a bucket. The subresources associated with Amazon S3 objects can include the following:

  • Access control list (ACL) A list of grants identifying the grantees and the permissions granted.
  • Torrent Returns the torrent file associated with the specific object.
  • Access Control Information You can control access to the objects you store in Amazon S3. Amazon S3 supports both resource-based access control, such as ACLs and bucket policies, and user-based access control.

Object Tagging

Object tagging enables you to categorize storage. Each tag is a key-value pair. Consider the following tagging examples. Suppose an object contains protected health information (PHI) data. You can tag the object using the following key-value pair:

PHI=True
or
Classification=PHI

While it is acceptable to use tags to label objects containing confidential data (such as personally identifiable information (PII) or PHI), the tags themselves should not contain any confidential information.

Suppose that you store project files in your Amazon S3 bucket. You can tag these objects with a key called Project and a value, as shown here: Project=Blue

You can add multiple tags to a single object, such as the following:

Project=SalesForecast2018
Classification=confidential

You can tag new objects when you upload them, or you can add tags to existing objects. Note the following limitations when working with tagging:

  • You can associate up to 10 tags with an object, and each tag associated with an object must have a unique tag key.
  • A tag key can be up to 128 Unicode characters in length, and tag values can be up to 256 Unicode characters in length.

Keys and values are case sensitive. Developers commonly categorize their files in a folder-like structure in the key name (remember, Amazon S3 has a flat file structure), such as the following:

photos/photo1.jpg
project/projectx/document.pdf
project/projecty/document2.pdf

This allows only one-dimensional categorization, meaning that everything under a prefix is one category. With tagging, you now have another dimension: if photo1.jpg belongs to project x, you can tag the object accordingly. In addition to data classification, tagging offers the following benefits (a short tagging sketch follows this list):

  • Object tags enable fine-grained access control of permissions. For example, you could grant an AWS Identity and Access Management (IAM) user permissions to read-only objects with specific tags.
  • Object tags enable fine-grained object lifecycle management in which you can specify a tag-based filter, in addition to key name prefix, in a lifecycle rule.
  • When using Amazon S3 analytics, you can configure filters to group objects together for analysis by object tags, by key name prefix, or by both prefix and tags.
  • You can also customize Amazon CloudWatch metrics to display information by specific tag filters. The following sections provide details.
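
As a sketch of both approaches, tagging at upload time and tagging an existing object, assuming placeholder bucket and key names and the AWS SDK for Python (Boto3):

python
import boto3

s3 = boto3.client('s3')

# Tag a new object at upload time (tags are passed as a URL-encoded string).
s3.put_object(
    Bucket='my-bucket',
    Key='project/projectx/document.pdf',
    Body=b'report contents',
    Tagging='Project=SalesForecast2018&Classification=confidential'
)

# Add or replace the tag set on an existing object.
s3.put_object_tagging(
    Bucket='my-bucket',
    Key='photos/photo1.jpg',
    Tagging={'TagSet': [
        {'Key': 'Project', 'Value': 'Blue'},
        {'Key': 'Classification', 'Value': 'PHI'}
    ]}
)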

Cross-Origin Resource Sharing

Cross-Origin Resource Sharing (CORS) defines a way for client web applications that are loaded in one domain to interact with resources in a different domain. With CORS support in Amazon S3, you can build client-side web applications with Amazon S3 and selectively allow cross-origin access to your Amazon S3 resources while avoiding the need to use a proxy.

Cross-Origin Request Scenario

Suppose that you are hosting a website in an Amazon S3 bucket named website. Your users load the website endpoint: http://website.s3-website-us-east-1.amazonaws.com.

Your website uses JavaScript on the web pages stored in this bucket to make authenticated GET and PUT requests against the same bucket through the Amazon S3 API endpoint for the bucket: website.s3.amazonaws.com.

A browser would normally block JavaScript from making those requests, but with CORS you can configure your bucket to explicitly allow cross-origin requests from website.s3-website-us-east-1.amazonaws.com.

Now suppose that you host a web font in your Amazon S3 bucket. Browsers require a CORS check (also referred to as a preflight check) before loading web fonts, so you would configure the bucket that is hosting the web font to allow any origin to make these requests.
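
A minimal sketch of a CORS configuration for the first scenario, applied with the AWS SDK for Python (Boto3) and assuming placeholder bucket and origin names:

python
import boto3

s3 = boto3.client('s3')

# Allow GET and PUT requests from the website origin against the bucket.
s3.put_bucket_cors(
    Bucket='website',
    CORSConfiguration={
        'CORSRules': [{
            'AllowedOrigins': ['http://website.s3-website-us-east-1.amazonaws.com'],
            'AllowedMethods': ['GET', 'PUT'],
            'AllowedHeaders': ['*'],
            'MaxAgeSeconds': 3000
        }]
    }
)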

There are no coding exercises as part of the exam, but these case studies can help you visualize how to use Amazon S3 and CORS.

Operations on Objects

There are a number of different operations (API calls) that you can perform on Amazon S3 objects. We will summarize a few of the most basic operations in this section. For more comprehensive information on all of the different operations that you can perform, refer to the Amazon S3 API Reference available in the AWS Documentation repository.

Writing an Object

python
import boto3

# Create an S3 client
s3 = boto3.client('s3')

filename = 'file.txt'
bucket_name = 'my-bucket'

# Uploads the given file using a managed uploader, which will split up large
# files automatically and upload parts in parallel.
s3.upload_file(filename, bucket_name, filename)

Reading Objects

java
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream objectData = object.getObjectContent();
// Process the objectData stream.
objectData.close();

Deleting Objects

You can delete one or more objects directly from Amazon S3. You have the following options when deleting an object:

Delete a Single Object Amazon S3 provides the DELETE API to delete one object in a single HTTP request.

Delete Multiple Objects Amazon S3 also provides the Multi-Object Delete API to delete up to 1,000 objects in a single HTTP request.
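
Both options are sketched below with the AWS SDK for Python (Boto3), using placeholder bucket and key names:

python
import boto3

s3 = boto3.client('s3')

# Delete a single object.
s3.delete_object(Bucket='my-bucket', Key='file.txt')

# Delete multiple objects (up to 1,000) in a single request.
s3.delete_objects(
    Bucket='my-bucket',
    Delete={
        'Objects': [{'Key': 'photos/photo1.jpg'},
                    {'Key': 'project/projectx/document.pdf'}],
        'Quiet': True
    }
)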

Storage Classes

There are several different storage classes from which to choose when using Amazon S3. Your choice will depend on your level of need for durability, availability, and performance for your application.

Amazon S3 Standard

Amazon S3 Standard offers high-durability, high-availability, and high-performance object storage for frequently accessed data. Because it delivers low latency and high throughput, Amazon S3 Standard is ideal for a wide variety of use cases, including the following:

  • Cloud applications
  • Dynamic websites
  • Content distribution
  • Mobile and gaming applications
  • Big data analytics

Amazon S3 Standard is designed to achieve durability of 99.999999999 percent of objects (designed to sustain the loss of data in two facilities) and availability of 99.99 percent over a given year (which is backed by the Amazon S3 Service Level Agreement). Essentially, the data in Amazon S3 is spread out over multiple facilities within a region. You can lose access to two facilities and still have access to your files.

Reduced Redundancy Storage

Reduced Redundancy Storage (RRS) (or Reduced_Redundancy) is an Amazon S3 storage option that enables customers to store noncritical, reproducible data at lower levels of redundancy than Amazon S3 Standard storage. It provides a highly available solution for distributing or sharing content that is durably stored elsewhere or for objects that can easily be regenerated, such as thumbnails or transcoded media. The RRS option stores objects on multiple devices across multiple facilities, providing 400 times the durability of a typical disk drive, but it does not replicate objects as many times as Amazon S3 Standard storage. RRS is designed to achieve availability of 99.99 percent (same as Amazon S3 Standard) and durability of 99.99 percent (designed to sustain the loss of data in a single facility).

Amazon S3 Standard-Infrequent Access

Amazon S3 Standard-Infrequent Access (Standard_IA) is an Amazon S3 storage class for data that is accessed less frequently but requires rapid access when needed. It offers the same high durability, throughput, and low latency of Amazon S3 Standard, but it has a lower per-gigabyte storage price and per-gigabyte retrieval fee. The ideal use cases for using Standard_IA include the following:

  • Long-term storage
  • Backups
  • Data stores for disaster recovery

Standard_IA is set at the object level and can exist in the same bucket as Amazon S3 Standard, allowing you to use lifecycle policies to transition objects automatically between storage classes without any application changes. Standard_IA is designed to achieve availability of 99.9 percent (a lower availability target than Standard, though retrieval latency remains low) and durability of 99.999999999 percent of objects over a given year (same as Amazon S3 Standard).

Amazon S3 One Zone-Infrequent Access

Amazon S3 One Zone-Infrequent Access (OneZone_IA) is similar to Amazon S3 Standard-IA. The difference is that the data is stored only in a single Availability Zone instead of a minimum of three Availability Zones. Because of this, storing data in OneZone_IA costs 20 percent less than storing it in Standard_IA. Because of this approach, however, any data stored in this storage class will be permanently lost in the event of an Availability Zone destruction.

Amazon Simple Storage Service Glacier

Amazon Simple Storage Service Glacier (Amazon S3 Glacier) is a secure, durable, and extremely low-cost storage service for data archiving that offers the same high durability as Amazon S3. Unlike Amazon S3 Standard's immediate retrieval times, Amazon S3 Glacier's retrieval times run from a few minutes to several hours. To keep costs low, Amazon S3 Glacier provides three archive access speeds, ranging from minutes to hours. This allows you to choose an option that will meet your recovery time objective (RTO) for backups in your disaster recovery plan. Amazon S3 Glacier can also be used to store archives that must be kept due to a compliance policy. For example, you may need to keep certain records for seven years before deletion and only need access to them during an audit. Amazon S3 Glacier lets you retain and retrieve those files when audits do occur, at an extremely low cost in exchange for slower access.

Vaults

Amazon S3 Glacier uses vaults as containers to store archives. You can view a list of your vaults in the AWS Management Console and use the AWS software development kits (SDKs) to perform a variety of vault operations, such as the following:

  • Create vault
  • Delete vault
  • Lock vault
  • List vault metadata
  • Retrieve vault inventory
  • Tag vaults for filtering
  • Configure vault notifications

You can also set access policies for each vault to grant or deny specific activities to users. You can have up to 1,000 vaults per AWS account. Amazon S3 Glacier provides a management console to create and delete vaults. All other interactions with Amazon S3 Glacier, however, require that you use the AWS CLI or write code.
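
For example, creating and listing vaults with the AWS SDK for Python (Boto3) might look like the following sketch; the vault name is a placeholder, and '-' refers to the current account:

python
import boto3

glacier = boto3.client('glacier')

# Create a vault in the current account.
glacier.create_vault(accountId='-', vaultName='my-archive-vault')

# List the vaults in the current account.
for vault in glacier.list_vaults(accountId='-')['VaultList']:
    print(vault['VaultName'], vault['NumberOfArchives'])

# Delete the vault (it must be empty).
glacier.delete_vault(accountId='-', vaultName='my-archive-vault')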

Vault Lock

Amazon S3 Glacier Vault Lock allows you to deploy and enforce compliance controls easily on individual Amazon S3 Glacier vaults via a lockable policy. You can specify controls such as Write Once Read Many (WORM) in a Vault Lock policy and lock the policy from future edits. Once locked, the policy becomes immutable, and Amazon S3 Glacier will enforce the prescribed controls to help achieve your compliance objectives.

Archives

An archive is any object, such as a photo, video, or document that you store in a vault. It is a base unit of storage in Amazon S3 Glacier. Each archive has a unique ID and optional description. When you upload an archive, Amazon S3 Glacier returns a response that includes an archive ID. This archive ID is unique in the region in which the archive is stored. You can retrieve an archive using its ID, but not its description.


To upload archives into your vaults, you must either use the AWS CLI or write code to make requests, using either the REST API directly or the AWS SDKs.

Maintaining Client-Side Archive Metadata

Except for the optional archive description, Amazon S3 Glacier does not support any additional metadata for the archives. When you upload an archive, Amazon S3 Glacier assigns an ID—an opaque sequence of characters—from which you cannot infer any meaning about the archive. Metadata about the archives can be maintained on the client side. The metadata can include identifying archive information such as the archive name.

If you use Amazon S3, when you upload an object to a bucket, you can assign the object an object key such as MyDocument.txt or SomePhoto.jpg. In Amazon S3 Glacier, you cannot assign a key name to the archives you upload.

If you maintain client-side archive metadata, note that Amazon S3 Glacier maintains a vault inventory that includes archive IDs and any descriptions that you provided during the archive upload. We recommend that you occasionally download the vault inventory to reconcile any issues in the client-side database that you maintain for the archive metadata. Amazon S3 Glacier takes vault inventory approximately once a day. When you request a vault inventory, Amazon S3 Glacier returns the last inventory it prepared, which is a point-in-time snapshot.

Using the AWS SDKs with Amazon S3 Glacier

AWS provides SDKs for developing applications for Amazon S3 Glacier in various programming languages. The AWS SDKs for Java and .NET offer both high-level and low-level wrapper libraries. The SDK libraries wrap the underlying Amazon S3 Glacier API, simplifying your programming tasks, and the low-level wrapper libraries map closely to the underlying REST API supported by Amazon S3 Glacier. To simplify application development further, these SDKs also offer a higher-level abstraction for some of the operations. For example, when uploading an archive using the low-level API, you need to provide a checksum of the payload, whereas the high-level API computes the checksum for you.

Encryption

All data in Amazon S3 Glacier will be encrypted on the server side using key management and key protection, which Amazon S3 Glacier handles using AES-256 encryption. If you want, you can manage your own keys and encrypt the data prior to uploading.

Restoring Objects from Amazon S3 Glacier

Objects in the Amazon S3 Glacier storage class are not immediately accessible and cannot be retrieved via copy/paste once they have been moved to Amazon S3 Glacier. Remember that Amazon S3 Glacier charges a retrieval fee for retrieving objects. When you restore an archive, you pay for both the archive and the restored copy. Because there is a storage cost for the copy, restore objects only for the duration that you need them. If you need a permanent copy of the object, create a copy of it in your Amazon S3 bucket.
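
For objects that were transitioned to the Amazon S3 Glacier storage class, a restore request is issued through the Amazon S3 API. The following sketch, using the AWS SDK for Python (Boto3) with placeholder names, requests a temporary copy for two days using the Standard retrieval tier:

python
import boto3

s3 = boto3.client('s3')

# Request a temporary, restored copy of an archived object.
s3.restore_object(
    Bucket='my-bucket',
    Key='archives/2017-backup.zip',
    RestoreRequest={
        'Days': 2,                                   # keep the restored copy for 2 days
        'GlacierJobParameters': {'Tier': 'Standard'} # Expedited, Standard, or Bulk
    }
)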

Archive Retrieval Options

There are several different options for restoring archived objects from Amazon S3 Glacier to Amazon S3.

The retrieval options and their typical retrieval times are as follows:

  • Expedited retrieval (1–5 minutes)
    • On-Demand: Processed immediately the vast majority of the time. During periods of high demand, the request may fail to process, and you will be required to repeat the request.
    • Provisioned: Guaranteed to process immediately. After purchasing provisioned capacity, all of your expedited retrievals are processed in this manner.
  • Standard retrieval (3–5 hours)
  • Bulk retrieval (5–12 hours): the lowest-cost option

Do not use Amazon S3 Glacier for backups if your RTO is shorter than the lowest Amazon S3 Glacier retrieval time for your chosen retrieval option. For example, if your RTO requires data retrieval of two hours in a disaster recovery scenario, then Amazon S3 Glacier standard retrieval will not meet your RTO.

Storage Class Comparison

The table shows a comparison of the Amazon S3 storage classes. This is an important table for the certification exam. Many storage decision questions on the exam center on the level of durability, availability, and cost. The table’s comparisons can help you make the right choice for a question, in addition to understanding trade-offs when choosing a data store for an application.

  • Designed for durability: 99.999999999% for all classes*
  • Designed for availability: Standard 99.99%; Standard_IA 99.9%; OneZone_IA 99.5%; Amazon S3 Glacier N/A
  • Availability SLA: Standard 99.9%; Standard_IA 99%; OneZone_IA 99%; Amazon S3 Glacier N/A
  • Availability Zones: Standard ≥3; Standard_IA ≥3; OneZone_IA 1; Amazon S3 Glacier ≥3
  • Minimum capacity charge per object: Standard N/A; Standard_IA 128 KB*; OneZone_IA 128 KB*; Amazon S3 Glacier N/A
  • Minimum storage duration charge: Standard N/A; Standard_IA 30 days; OneZone_IA 30 days; Amazon S3 Glacier 90 days
  • Retrieval fee: Standard N/A; Standard_IA, OneZone_IA, and Amazon S3 Glacier per GB retrieved*
  • First byte latency: Standard, Standard_IA, and OneZone_IA milliseconds; Amazon S3 Glacier minutes or hours*
  • Storage type: object (all classes)
  • Lifecycle transitions: yes (all classes)

* Because OneZone_IA stores data in a single Availability Zone, data stored in this storage class will be lost in the event of Availability Zone destruction. Standard_IA has a minimum object size of 128 KB; smaller objects will be charged for 128 KB of storage. Amazon S3 Glacier allows you to select from multiple retrieval tiers based upon your needs.

Data Consistency Model

When deciding whether to choose Amazon S3 or Amazon EBS for your application, one important aspect to consider is the consistency model of the storage option. Amazon EBS provides read-after-write consistency for all operations, whereas Amazon S3 provides read-after-write consistency only for PUTs of new objects. Amazon S3 offers eventual consistency for overwrite PUTs and DELETEs in all regions, and updates to a single key are atomic. For example, if you PUT an object to update an existing object and immediately attempt to read that object, you may read either the old data or the new data. For PUT operations with new objects not yet in Amazon S3, you will experience read-after-write consistency. For PUT updates that overwrite an existing file, or for DELETE operations, you will experience eventual consistency.

Amazon S3 does not currently support object locking. If two PUT requests are simultaneously made to the same key, the request with the latest time stamp wins. If this is an issue, you will be required to build an object locking mechanism into your application.

You may be wondering why Amazon S3 was designed with this style of consistency. The consistency, availability, and partition tolerance theorem (CAP theorem) states that a distributed storage design can fully achieve only two of the three properties.

CAP theorem


Think of partition tolerance in this equation as the storage durability. Amazon S3 was designed for high availability and high durability (multiple copies across multiple facilities), so the design trade-off is the consistency. When you PUT an object, you are not only putting the object into one location but into three, meaning that there is either a slightly increased latency on the read-after-write consistency of a PUT or eventual consistency on the PUT update or DELETE operations while Amazon S3 reconciles all copies. You do not know, for instance, which facility a file is coming from when you GET an object. If you had recently written an object, it may have propagated to only two facilities, so when you try to read the object right after your PUT, you may receive the old object or the new object.

Concurrent Applications

As a developer, it is critical to consider the way your application works and the consistency needs of your files. If your application requires read-after-write consistency on all operations, then Amazon S3 is not going to be the right choice for that application. If you are working with concurrent applications, it is important to know how your application performs PUT, GET, and DELETE operations concurrently to determine whether eventual consistency will be acceptable. In the first consistency example, both W1 (write 1) and W2 (write 2) complete before the start of R1 (read 1) and R2 (read 2). For a consistent read, R1 and R2 both return color = ruby. For an eventually consistent read, R1 and R2 might return color = red, color = ruby, or no results, depending on the amount of time that has elapsed.

Consistency example 1


In the next figure, W2 does not complete before the start of R1. Therefore, R1 might return color = ruby or color = garnet for either a consistent read or an eventually consistent read. Depending on the amount of time that has elapsed, an eventually consistent read might also return no results.

Consistency example 2


For a consistent read, R2 returns color = garnet. For an eventually consistent read, R2 might return color = ruby, color = garnet, or no results depending on the amount of time that has elapsed. In the next figure, client 2 performs W2 before Amazon S3 returns a success for W1, so the outcome of the final value is unknown (color = garnet or color = brick). Any subsequent reads (consistent read or eventually consistent) might return either value. Depending on the amount of time that has elapsed, an eventually consistent read might also return no results.

Consistency example 3


If you need a strongly consistent data store, choose a different data store than Amazon S3 or code consistency checks into your application.

Presigned URLs

A presigned URL is a way to grant access to an object. One way that developers use presigned URLs is to allow users to upload or download objects without granting them direct access to Amazon S3 or the account. For example, if you need to send a document hosted in an Amazon S3 bucket to an external reviewer outside of your organization, you do not want to grant them access to your bucket or objects using IAM. Instead, generate a presigned URL to the object and send it to the user to download your file. Another example is when you need someone external to your organization to upload a file. Perhaps a media company is designing the graphics for the website you are developing; you can create a presigned URL for them to upload their artifacts directly to Amazon S3 without granting them access to your Amazon S3 bucket or account. Anyone with valid security credentials can create a presigned URL. For the upload to succeed, however, the presigned URL must be created by someone who has permission to perform the operation upon which the presigned URL is based. The following Java code example demonstrates generating a presigned URL:

java
AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
java.util.Date expiration = new java.util.Date();
long msec = expiration.getTime();
msec += 1000 * 60 * 60; // Add 1 hour.
expiration.setTime(msec);
GeneratePresignedUrlRequest generatePresignedUrlRequest =
        new GeneratePresignedUrlRequest(bucketName, objectKey);
generatePresignedUrlRequest.setMethod(HttpMethod.PUT);
generatePresignedUrlRequest.setExpiration(expiration);
URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
// Use the presigned URL to upload an object.

Amazon S3 presigned URLs cannot be generated within the AWS Management Console, but they can be generated using the AWS CLI or AWS SDKs.

Encryption

Data protection refers to protecting data while in transit (as it travels to and from Amazon S3) and at rest (while it is stored on Amazon S3 infrastructure). As a best practice, all sensitive data stored in Amazon S3 should be encrypted, both at rest and in transit.

You can protect data in transit by using Amazon S3 SSL API endpoints, which ensures that all data sent to and from Amazon S3 is encrypted using the HTTPS protocol while in transit.

For data at rest in Amazon S3, you can encrypt it using different options of Server-Side Encryption (SSE). Your objects in Amazon S3 are encrypted with AES-256 at the object level as they are written to disk in the data centers and then decrypted for you when you access them.

You can also use client-side encryption, with which you encrypt the objects before uploading to Amazon S3 and then decrypt them after you have downloaded them. Some customers, for some workloads, will use a combination of both server-side and client-side encryption for extra protection.

Envelope Encryption Concepts

Before examining the different types of encryption available, we will review envelope encryption, which several AWS services use to provide a balance between performance and security.

The following steps describe how envelope encryption works:

Generating a data key

  1. A data key is generated by the AWS service at the time you request your data to be encrypted.

Encrypting the data

  2. The data key generated in step 1 is used to encrypt your data.

Encrypted data key

  3. The data key is then encrypted with a key-encrypting key unique to the service storing your data.

Encrypted data and key storage

  4. The encrypted data key and the encrypted data are then stored by the AWS storage service (such as Amazon S3 or Amazon EBS) on your behalf.

When you need access to your plain-text data, this process is reversed. The encrypted data key is decrypted using the key-encrypting key, and the data key is then used to decrypt your data.

The important point to remember regarding envelope encryption is that the key-encrypting keys used to encrypt data keys are stored and managed separately from the data and the data keys.
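
As an illustration only (not how Amazon S3 implements it internally), the following sketch uses AWS KMS to generate a data key and the third-party cryptography package to encrypt data locally, storing the encrypted data key alongside the ciphertext; the KMS key alias is a placeholder:

python
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM  # third-party package

kms = boto3.client('kms')

# 1. Generate a data key under a KMS master key (alias is a placeholder).
key = kms.generate_data_key(KeyId='alias/my-master-key', KeySpec='AES_256')
plaintext_key = key['Plaintext']        # used to encrypt the data, then discarded
encrypted_key = key['CiphertextBlob']   # stored alongside the data

# 2. Encrypt the data locally with the plaintext data key.
nonce = os.urandom(12)
ciphertext = AESGCM(plaintext_key).encrypt(nonce, b'my secret data', None)

# 3/4. Store the ciphertext, nonce, and the *encrypted* data key together.
# To decrypt later: kms.decrypt(CiphertextBlob=encrypted_key) returns the
# plaintext data key, which then decrypts the ciphertext.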

Server-Side Encryption (SSE)

You have three mutually exclusive options for how you manage your encryption keys when using SSE with Amazon S3.

SSE-S3 (Amazon S3 managed keys) You can set an API flag or check a box in the AWS Management Console to have data encrypted before it is written to disk in Amazon S3. Each object is encrypted with a unique data key. As an additional safeguard, this key is encrypted with a periodically rotated master key managed by Amazon S3. AES-256 is used for both object and master keys. This feature is offered at no additional cost beyond what you pay for using Amazon S3.

SSE-C (Customer-provided keys) You can use your own encryption key while uploading an object to Amazon S3. This encryption key is used by Amazon S3 to encrypt your data using AES-256. After the object is encrypted, the encryption key you supplied is deleted from the Amazon S3 system that used it to encrypt your data. When you retrieve this object from Amazon S3, you must provide the same encryption key in your request. Amazon S3 verifies that the encryption key matches, decrypts the object, and returns the object to you. This feature is also offered at no additional cost beyond what you pay for using Amazon S3.

SSE-KMS (AWS KMS managed encryption keys) You can encrypt your data in Amazon S3 by defining an AWS KMS master key within your account to encrypt the unique object key (referred to as a data key) that will ultimately encrypt your object. When you upload your object, a request is sent to AWS KMS to create an object key. AWS KMS generates this object key and encrypts it using the master key that you specified earlier. Then, AWS KMS returns this encrypted object key along with the plaintext object key to Amazon S3. The Amazon S3 web server encrypts your object using the plaintext object key, stores the now encrypted object (with the encrypted object key), and deletes the plaintext object key from memory.

To retrieve this encrypted object, Amazon S3 sends the encrypted object key to AWS KMS, which then decrypts the object key using the correct master key and returns the decrypted (plaintext) object key to Amazon S3. With the plaintext object key, Amazon S3 decrypts the encrypted object and returns it to you. Unlike SSE-S3 and SSE-C, using SSE-KMS does incur an additional charge. Refer to the AWS KMS pricing page on the AWS website for more information.

For maximum simplicity and ease of use, use SSE with AWS managed keys (SSE-S3 or SSE-KMS). Also, know the difference between SSE-S3, SSE-KMS, and SSE-C for SSE.
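
The following sketch shows how the three SSE options are requested at upload time with the AWS SDK for Python (Boto3); the bucket name and KMS key alias are placeholders, and the SSE-C key must be supplied again on every GET:

python
import os
import boto3

s3 = boto3.client('s3')

# SSE-S3: Amazon S3 manages the keys.
s3.put_object(Bucket='my-bucket', Key='doc1.txt', Body=b'data',
              ServerSideEncryption='AES256')

# SSE-KMS: encrypt under a KMS master key that you choose.
s3.put_object(Bucket='my-bucket', Key='doc2.txt', Body=b'data',
              ServerSideEncryption='aws:kms',
              SSEKMSKeyId='alias/my-master-key')

# SSE-C: you supply the 256-bit key; Amazon S3 does not store it.
customer_key = os.urandom(32)
s3.put_object(Bucket='my-bucket', Key='doc3.txt', Body=b'data',
              SSECustomerAlgorithm='AES256',
              SSECustomerKey=customer_key)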

Client-Side Encryption

Client-side encryption refers to encrypting your data before sending it to Amazon S3. You have two options for using data encryption keys.

Client-Side Master Key

The first option is to use a client-side master key of your own. When uploading an object, you provide a client-side master key to the Amazon S3 encryption client (for example, AmazonS3EncryptionClient when using the AWS SDK for Java). The client uses this master key only to encrypt the data encryption key that it generates randomly. When downloading an object, the client first downloads the encrypted object from Amazon S3 along with the metadata. Using the material description in the metadata, the client determines which master key to use to decrypt the encrypted data key. Then the client uses that master key to decrypt the data key and uses the data key to decrypt the object. The client-side master key that you provide can be either a symmetric key or a public/private key pair.

The process works as follows:

  1. The Amazon S3 encryption client locally generates a one-time-use symmetric key (also known as a data encryption key or data key) and uses this data key to encrypt the data of a single Amazon S3 object (for each object, the client generates a separate data key).
  2. The client encrypts the data encryption key using the master key that you provide.
  3. The client uploads the encrypted data key and its material description as part of the object metadata. The material description helps the client later determine which client-side master key to use for decryption (when you download the object, the client decrypts it).
  4. The client then uploads the encrypted data to Amazon S3 and also saves the encrypted data key as object metadata (x-amz-meta-x-amz-key) in Amazon S3 by default.

AWS KMS-Managed Customer Master Key (CMK)

The second option is to use an AWS KMS-managed customer master key (CMK). This process is similar to the SSE-KMS process described earlier, except that the encryption and decryption happen on the client side before the data is uploaded rather than on the Amazon S3 servers. An Amazon S3 encryption client is available in the AWS SDK for Java.

Access Control

By default, all Amazon S3 resources (buckets, objects, and related subresources such as lifecycle configuration and website configuration) are private. Only the resource owner, the AWS account that created the resource, can access it. The resource owner can optionally grant access permissions to others by writing an access policy. Amazon S3 offers access policy options broadly categorized as resource-based policies and user policies. Access policies that you attach to your resources (buckets and objects) are referred to as resource-based policies; for example, bucket policies and ACLs are resource-based policies. You can also attach access policies to users in your account; these are called user policies. You can choose to use resource-based policies, user policies, or some combination of both to manage permissions to your Amazon S3 resources. The following sections provide general guidelines for managing permissions.

Using Bucket Policies and User Policies

Bucket policies and user policies are two of the access policy options available for granting permissions to your Amazon S3 resources. Both use a JSON-based access policy language, as do all AWS services that use policies. A bucket policy is attached only to an Amazon S3 bucket, and it specifies which actions are allowed or denied for which principals on the bucket to which it is attached (for instance, allow user Alice to PUT but not DELETE objects in the bucket). A user policy is attached to IAM users to allow or deny actions on your AWS resources. For example, you may choose to grant an IAM user in your account access to one of your buckets and allow the user to add, update, and delete objects; you can grant them that access with a user policy.

Now we will discuss the differences between IAM policies and Amazon S3 bucket policies. Both are used for access control, and both are written in JSON using the AWS access policy language. However, unlike Amazon S3 bucket policies, IAM policies specify what actions are allowed or denied on what AWS resources (for example, allow ec2:TerminateInstance on the Amazon EC2 instance with instance_id=i8b3620ec). You attach IAM policies to IAM users, groups, or roles, which are then subject to the permissions that you have defined. Bucket policies, instead of being attached to users, groups, or roles, are attached to a specific resource, such as an Amazon S3 bucket.
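
As a sketch of a bucket policy that mirrors the Alice example above (allow PUT, deny DELETE), applied with the AWS SDK for Python (Boto3); the account ID, user name, and bucket name are placeholders:

python
import json
import boto3

s3 = boto3.client('s3')

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAlicePut",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/Alice"},
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-bucket/*"
        },
        {
            "Sid": "DenyAliceDelete",
            "Effect": "Deny",
            "Principal": {"AWS": "arn:aws:iam::123456789012:user/Alice"},
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::my-bucket/*"
        }
    ]
}

# Attach the policy to the bucket.
s3.put_bucket_policy(Bucket='my-bucket', Policy=json.dumps(policy))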

Managing Access with Access Control Lists

Access control lists (ACLs) are resource-based access policies that you can use to manage access to your buckets and objects, including granting basic read/write permissions to other accounts. There are limits to managing permissions using ACLs. For example, you can grant permissions only to other accounts; you cannot grant permissions to users in your account. You cannot grant conditional permissions, nor can you explicitly deny permissions using ACLs. ACLs are suitable only for specific scenarios (for example, when a bucket owner allows other accounts to upload objects), and permissions to those objects can be managed only through an object ACL by the account that owns the object.

You can only grant access to other accounts using ACLs—not users in your own account.

Defense in Depth—Amazon S3 Security

Amazon S3 provides comprehensive security and compliance capabilities that meet the most stringent regulatory requirements, and it gives you flexibility in the way that you manage data for cost optimization, access control, and compliance. With this flexibility, however, comes the responsibility of ensuring that your content is secure. You can use an approach known as defense in depth in Amazon S3 to secure your data. This approach uses multiple layers of security to ensure redundancy if one of the layers fails.

Figure 3.14 represents defense in depth visually. It contains several Amazon S3 objects (A) in a single Amazon S3 bucket (B). You can encrypt these objects on the server side or the client side, and you can also configure the bucket policy such that objects are accessible only through Amazon CloudFront, which you can accomplish through an origin access identity (C). You can then configure Amazon CloudFront to deliver content only over HTTPS in addition to using your own domain name (D).

Defense in depth on Amazon S3


To meet defense in depth requirements on Amazon S3:

  • Data must be encrypted at rest and during transit.
  • Data must be accessible only by a limited set of public IP addresses.
  • Data must not be publicly accessible directly from an Amazon S3 URL.
  • A domain name is required to consume the content.

You can apply policies to Amazon S3 buckets so that only users with appropriate permissions are allowed to access the buckets. Anonymous users (with public-read/public-read-write permissions) and authenticated users without the appropriate permissions are prevented from accessing the buckets. You can also secure access to objects in Amazon S3 buckets. The objects in Amazon S3 buckets can be encrypted at rest and during transit to provide end-to-end security from the source (in this case, Amazon S3) to your users.

Query String Authentication

You can provide authentication information using query string parameters. Using query parameters to authenticate requests is useful when expressing a request entirely in a URL. This method is also referred to as presigning a URL. With presigned URLs, you can grant temporary access to your Amazon S3 resources. For example, you can embed a presigned URL on your website, or alternatively use it in a command line client (such as Curl), to download objects.
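
A sketch of presigning with the AWS SDK for Python (Boto3); the resulting URL carries the authentication information as query string parameters and expires after the given number of seconds (bucket and key are placeholders):

python
import boto3

s3 = boto3.client('s3')

# Generate a URL that allows a GET on the object for one hour.
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'file.txt'},
    ExpiresIn=3600
)
print(url)  # can be opened in a browser or fetched with curl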

Hosting a Static Website

If your website contains static content and, optionally, client-side scripts, you can host your static website directly in Amazon S3 without the use of web-hosting servers. To host a static website, you configure an Amazon S3 bucket for website hosting and upload your website content to the bucket. The website is then available at the AWS Region-specific website endpoint of the bucket in one of the following formats:

<bucket-name>.s3-website-<AWS-region>.amazonaws.com
<bucket-name>.s3-website.<AWS-region>.amazonaws.com

Instead of accessing the website by using an Amazon S3 website endpoint, you can use your own domain (for instance, example.com) to serve your content. The following steps allow you to configure your own domain:

  1. Register your domain with the registrar of your choice. You can use Amazon Route 53 to register your domain name or any other third-party domain registrar.
  2. Create your bucket in Amazon S3 and upload your static website content.
  3. Point your domain to your Amazon S3 bucket using either of the following as your DNS provider:
    • Amazon Route 53
    • Your third-party domain name registrar

Amazon S3 does not support server-side scripting or dynamic content. We discuss other AWS options for that throughout this study guide.

Static websites can be hosted in Amazon S3.

MFA Delete

MFA Delete is another way to control deletes of your objects in Amazon S3. It adds another layer of protection against unintentional or malicious deletes by requiring an additional authenticated request to Amazon S3 to delete the object. MFA Delete requires a unique code from a token or an authentication device (virtual or hardware); this code is what allows you to delete the object. Figure 3.15 shows what is required for a user to execute a delete operation on an object when MFA Delete is enabled.

MFA Delete


Cross-Region Replication

Cross-region replication (CRR) is a bucket-level configuration that enables automatic, asynchronous copying of objects across buckets in different AWS Regions. We refer to these buckets as the source bucket and destination bucket. These buckets can be owned by different accounts.

To activate this feature, add a replication configuration to your source bucket to direct Amazon S3 to replicate objects according to the configuration. In the replication configuration, provide information including the following:

  • The destination bucket
  • The objects that need to be replicated
  • Optionally, the destination storage class (otherwise the source storage class will be used)

The replicas that are created in the destination bucket will have these same characteristics as the source objects:

  • Key names
  • Metadata
  • Storage class (unless otherwise specified)
  • Object ACL

All data is encrypted in transit across AWS Regions using SSL.

You can replicate objects from a source bucket to only one destination bucket. After Amazon S3 replicates an object, the object cannot be replicated again. For example, even after you change the destination bucket in an existing replication configuration, Amazon S3 will not replicate those objects again.

After Amazon S3 replicates an object using CRR, the object cannot be replicated again (such as to another destination bucket).

Requirements for CRR include the following:

  • Versioning is enabled for both the source and destination buckets.
  • Source and destination buckets must be in different AWS Regions.
  • Amazon S3 must be granted appropriate permissions to replicate files.
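
A sketch of such a replication configuration using the AWS SDK for Python (Boto3); the IAM role ARN and bucket names are placeholders, and both buckets must already have versioning enabled:

python
import boto3

s3 = boto3.client('s3')

s3.put_bucket_replication(
    Bucket='source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/crr-replication-role',
        'Rules': [{
            'ID': 'ReplicateEverything',
            'Status': 'Enabled',
            'Prefix': '',  # replicate all objects
            'Destination': {
                'Bucket': 'arn:aws:s3:::destination-bucket',
                'StorageClass': 'STANDARD_IA'  # optional destination storage class
            }
        }]
    }
)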

VPC Endpoints

A virtual private cloud (VPC) endpoint enables you to connect your VPC privately to Amazon S3 without requiring an internet gateway, network address translation (NAT) device, virtual private network (VPN) connection, or AWS Direct Connect connection. Instances in your VPC do not require public IP addresses to communicate with the resources in the service, and traffic between your VPC and Amazon S3 does not leave the Amazon network.

Amazon S3 uses a gateway type of VPC endpoint. The gateway is a target for a specified route in your route table, used for traffic destined for a supported AWS service. These endpoints are easy to configure, are highly reliable, and provide a secure connection to Amazon S3 that does not require a gateway or NAT instance. Amazon EC2 instances running in private subnets of a VPC can have controlled access to Amazon S3 buckets, objects, and API functions that are in the same region as the VPC. You can use an Amazon S3 bucket policy to indicate which VPCs and which VPC endpoints have access to your Amazon S3 buckets.

Using the AWS SDKs, AWS CLI, and AWS Explorers

You can use the AWS SDKs when developing applications with Amazon S3. The AWS SDKs simplify your programming tasks by wrapping the underlying REST API. The AWS Mobile SDKs and the AWS Amplify JavaScript library are also available for building connected mobile and web applications using AWS. In addition to AWS SDKs, AWS explorers are available for Visual Studio and Eclipse for Java Integrated Development Environment (IDE). In this case, the SDKs and AWS explorers are available bundled together as AWS Toolkits. You can also use the AWS CLI to manage Amazon S3 buckets and objects.

AWS has deprecated SOAP support over HTTP, but it is still available over HTTPS. New Amazon S3 features will not be supported over SOAP. We recommend that you use either the REST API or the AWS SDKs for any new development and migrate any existing SOAP calls when you are able.

Making Requests

Every interaction with Amazon S3 is either authenticated or anonymous. Authentication is the process of verifying the identity of the requester trying to access an AWS product (you are who you say you are, and you are allowed to do what you are asking to do). Authenticated requests must include a signature value that authenticates the request sender, generated in part from the requester’s AWS access keys. If you are using the AWS SDK, the libraries compute the signature from the keys that you provide. If you make direct REST API calls in your application, however, you must write code to compute the signature and add it to the request.

Stateless and Serverless Applications

Amazon S3 provides developers with secure, durable, and highly scalable object storage that can be used to decouple storage for use in serverless applications. Developers can also use Amazon S3 for storing and sharing state in stateless applications. Developers on AWS are regularly moving shared file storage to Amazon S3 for stateless applications. This is a common method for decoupling your compute and storage and increasing the ability to scale your application by decoupling that storage. We will discuss stateless and serverless applications throughout this study guide.

Data Lake

Traditional data storage can no longer provide the agility and flexibility required to handle the volume, velocity, and variety of data used by today’s applications. Because of this, many organizations are shifting to a data lake architecture.

A data lake is an architectural approach that allows you to store massive amounts of data in a central location for consumption by multiple applications. Because data can be stored as is, there is no need to convert it to a predefined schema, and you no longer need to know what questions to ask of your data beforehand.

Amazon S3 is a common component of a data lake solution on the cloud, and it can complement your other storage solutions. If you move to a data lake, you are essentially separating compute and storage, meaning that you are going to build and scale your storage and compute separately. You can take storage that you currently have on premises or in your data center and instead use Amazon S3, which then allows you to scale and build your compute in any desired configuration, regardless of your storage.

That design pattern is different from most applications available today, where the storage is tied to the compute. When you separate those two features and instead use a data lake, you achieve an agility that allows you to invent new types of applications while you are managing your storage as an independent entity.

In addition, Amazon S3 lets you grow and scale in a virtually unlimited fashion. You do not have to take specific actions to expand your storage capacity—it grows automatically with your data.

In the data lake diagram shown in Figure 3.16, you will see how to use Amazon S3 as a highly available and durable central storage repository. From there, a virtually unlimited number of services and applications, both on premises and in the cloud, can take advantage of using Amazon S3 as a data lake.

Data lakes


Performance

There are a number of actions that Amazon S3 takes by default to help you achieve high levels of performance. Amazon S3 automatically scales to thousands of requests per second per prefix based on your steady state traffic. Amazon S3 will automatically partition your prefixes within hours, adjusting to increases in request rates.

Consideration for Workloads

To optimize the use of Amazon S3 for mixed or GET-intensive workloads, you should become familiar with the following best practices for performance optimization.

Mixed request types If your requests are typically a mix of GET, PUT, DELETE, and GET Bucket (list objects), choosing appropriate key names for your objects ensures better performance by providing low-latency access to the Amazon S3 index.

GET-intensive workloads If the bulk of your workload consists of GET requests, you may want to use Amazon CloudFront, a content delivery service (discussed later in this chapter).

Tips for Object Key Naming

The way that you name your keys in Amazon S3 can affect the data access patterns, which may directly impact the performance of your application. It is a best practice at AWS to design for performance from the start. Even though you may be developing a new application, that application is likely to grow over time. If you anticipate your application growing to more than approximately 1,000 requests per second (including both PUTs and GETs on your objects), you will want to consider using a three- or four-character hash in your key names. If you anticipate your application receiving fewer than 1,000 requests per second and you don't see a lot of traffic in your storage, then you do not need to implement this best practice; your application will still benefit from Amazon S3's default performance.

In the past, customers would also add entropy in their key names. Because of recent Amazon S3 performance enhancements, most customers no longer need to worry about introducing entropy in key names.

A random hash should come before patterns, such as dates and sequential IDs.

Using a naming hash can improve the performance of heavy-traffic applications. Object keys are stored in an index in all regions. If you are constantly writing the same key prefix over and over again (for example, a key with the current year), all of your objects will be close to each other within the same partition in the index. When your application experiences an increase in traffic, it will be trying to read from the same section of the index, resulting in decreased performance as Amazon S3 tries to spread out your data to achieve higher levels of throughput.

Always first ensure that your application can accommodate a naming hash.

By putting the hash at the beginning of your key name, you are adding randomness. You could hash the key name and place it at the beginning of the object key, right after the bucket name. This ensures that your data is spread across different partitions and allows you to grow to a higher level of throughput without experiencing a re-indexing slowdown if you exceed peak traffic volumes.
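
A sketch of one way to add such a hash prefix (here, the first four characters of an MD5 digest of the key name), assuming placeholder bucket and key names:

python
import hashlib
import boto3

s3 = boto3.client('s3')

def hashed_key(key_name):
    # Prepend a short hash so that keys spread across index partitions.
    prefix = hashlib.md5(key_name.encode('utf-8')).hexdigest()[:4]
    return f"{prefix}/{key_name}"

key = hashed_key('2018/06/photos/photo1.jpg')  # hash prefix value varies by key name
s3.upload_file('photo1.jpg', 'my-bucket', key)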

Amazon S3 Transfer Acceleration

Amazon S3 Transfer Acceleration is a feature that optimizes throughput when transferring larger objects across larger geographic distances. Amazon S3 Transfer Acceleration uses Amazon CloudFront edge locations to assist you in uploading your objects more quickly in cases where you are closer to an edge location than to the region to which you are transferring your files.

Instead of using the public internet to upload objects from Southeast Asia, across the globe to Northern Virginia, take advantage of the global Amazon content delivery network (CDN). AWS has edge locations around the world, and you upload your data to the edge location closest to your location. This way, you are traveling across the AWS network backbone to your destination region, instead of across the public internet. This option might give you a significant performance improvement and better network consistency than the public internet.

To implement Amazon S3 Transfer Acceleration, you do not need to make any changes to your application. It is enabled by performing the following steps:

  1. Enable Transfer Acceleration on a bucket that conforms to DNS naming requirements and does not contain periods (.).
  2. Transfer data to and from the acceleration-enabled bucket by using one of the s3-accelerate endpoint domain names, as illustrated in the sketch that follows.
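
As a rough sketch of these two steps using the AWS SDK for Python (boto3), the following assumes a hypothetical bucket named examplebucket and a placeholder file name; the exact calls you use in your application may differ.

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Step 1: enable Transfer Acceleration on the bucket (hypothetical bucket name).
s3.put_bucket_accelerate_configuration(
    Bucket="examplebucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# Step 2: send transfers through the s3-accelerate endpoint.
s3_accelerated = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accelerated.upload_file("video.mp4", "examplebucket", "uploads/video.mp4")
```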

There is a small fee for using Transfer Acceleration. If your speed using Transfer Acceleration is no faster than it would have been going over the public internet, however, there is no additional charge.

The further you are from a particular region, the more benefit you will derive from transferring your files more quickly by uploading to a closer edge location. Figure 3.17 shows how accessing an edge location can reduce the latency for your users, as opposed to accessing content from a region that is farther away.

Figure 3.17: Using an AWS edge location

Multipart Uploads

When uploading a large object to Amazon S3 in a single-threaded manner, it can take a significant amount of time to complete. The multipart upload API enables you to upload large objects in parts to speed up your upload by doing so in parallel. To use multipart upload, you first break the object into smaller parts, upload the parts in parallel, and then complete the upload by sending Amazon S3 the list of parts that were uploaded. Amazon S3 will then assemble all of those individual pieces into a single Amazon S3 object. Multipart upload can be used for objects ranging from 5 MB to 5 TB in size.
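
As a hedged illustration, the following sketch uses boto3’s managed transfer settings, which split large files into parts and upload them in parallel on your behalf; the bucket name, file name, and size thresholds are assumptions.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Split objects larger than 100 MB into 25 MB parts and upload up to 10 parts in parallel.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
    max_concurrency=10,
)

s3.upload_file("backup.tar", "examplebucket", "backups/backup.tar", Config=config)
```

Under the hood, boto3 initiates the multipart upload, uploads each part, and completes the upload for you.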

Range GETs

Range GETs are similar to multipart uploads, but in reverse. If you are downloading a large object, you can use range GETs to retrieve it in multiple parts, tracking the byte offsets yourself, instead of downloading it as a single part. You can then download those parts in parallel and potentially see an improvement in performance.
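
As a minimal sketch, this boto3 call retrieves only the first mebibyte of an object by passing an HTTP Range header; the bucket and key names are placeholders, and a real downloader would issue several such requests in parallel and stitch the parts together.

```python
import boto3

s3 = boto3.client("s3")

# Fetch only bytes 0-1048575 (the first 1 MiB) of the object.
response = s3.get_object(
    Bucket="examplebucket",
    Key="backups/backup.tar",
    Range="bytes=0-1048575",
)
first_chunk = response["Body"].read()
```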

Amazon CloudFront

Using a CDN like Amazon CloudFront, you may achieve lower latency and higher-throughput performance. You also will not experience as many requests to Amazon S3 because your content will be cached at the edge location. Your users will also experience the performance improvement of having cached storage through Amazon CloudFront versus going back to Amazon S3 for each new GET on an object.

TCP Window Scaling

Transmission Control Protocol (TCP) window scaling allows you to improve network throughput performance between your operating system, application layer, and Amazon S3 by supporting window sizes larger than 64 KB. Although it can improve performance, it can be challenging to set up correctly, so refer to the AWS Documentation repository for details.

TCP Selective Acknowledgment

TCP selective acknowledgment is designed to improve recovery time after a large number of packet losses. It is supported by most newer operating systems, but it might have to be enabled. Refer to the Amazon S3 Developer Guide for more information.

Pricing

With Amazon S3, you pay only for what you use.

You pay for the following:

  • The storage that you use
  • The API calls that you make (PUT, COPY, POST, LIST, GET)
  • Data transfer out of Amazon S3

Data transfer out pricing is tiered, so the more you use, the lower your cost per gigabyte. Refer to the AWS website for the latest pricing.

Amazon S3 pricing differs from the pricing of Amazon EBS volumes in that if you create an Amazon EBS volume and store nothing on it, you are still paying for the storage space of the volume that you have allocated. With Amazon S3, you pay for the storage space that is being used—not allocated.

Object Lifecycle Management

To manage your objects so that they are stored cost effectively throughout their lifecycle, use a lifecycle configuration. A lifecycle configuration is a set of rules that defines actions that Amazon S3 applies to a group of objects. There are two types of actions:

Transition actions Transition actions define when objects transition to another storage class. For example, you might choose to transition objects to the STANDARD_IA storage class 30 days after you created them or archive objects to the GLACIER storage class one year after creating them.

Expiration actions Expiration actions define when objects expire. Amazon S3 deletes expired objects on your behalf.

When Should You Use Lifecycle Configuration?

You should use lifecycle configuration rules for objects that have a well-defined lifecycle. The following are some examples:

  • If you upload periodic logs to a bucket, your application might need them for a week or a month. After that, you may delete them.
  • Some documents are frequently accessed for a limited period of time. After that, they are infrequently accessed. At some point, you might not need real-time access to them, but your organization or regulations might require you to archive them for a specific period. After that, you may delete them.
  • You can upload some data to Amazon S3 primarily for archival purposes: for example, digital media archives; financial and healthcare records; raw genomics sequence data; long-term database backups; and data that must be retained for regulatory compliance.

With lifecycle configuration rules, you can tell Amazon S3 to transition objects to less expensive storage classes or archive or delete them.

Configuring a Lifecycle

A lifecycle configuration (an XML file) comprises a set of rules with predefined actions that you need Amazon S3 to perform on objects during their lifetime. Amazon S3 provides a set of API operations for managing lifecycle configuration on a bucket, and the configuration is stored by Amazon S3 as a lifecycle subresource that is attached to your bucket. You can also configure a lifecycle by using the Amazon S3 console, the AWS SDKs, or the REST API. The following lifecycle configuration specifies a rule that applies to objects with the key name prefix logs/. The rule specifies the following actions:

  • Two transition actions:
    • Transition objects to the STANDARD_IA storage class 30 days after creation.
    • Transition objects to the GLACIER storage class 90 days after creation.
  • One expiration action that directs Amazon S3 to delete objects a year after creation.
```xml
<LifecycleConfiguration>
  <Rule>
    <ID>example-id</ID>
    <Filter>
      <Prefix>logs/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>30</Days>
      <StorageClass>STANDARD_IA</StorageClass>
    </Transition>
    <Transition>
      <Days>90</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>365</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```
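
For comparison, here is a hedged sketch of applying an equivalent rule with the AWS SDK for Python (boto3); the bucket name is a placeholder, and the rule mirrors the XML configuration above.

```python
import boto3

s3 = boto3.client("s3")

# Apply a lifecycle rule equivalent to the XML configuration above (hypothetical bucket).
s3.put_bucket_lifecycle_configuration(
    Bucket="examplebucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "example-id",
                "Filter": {"Prefix": "logs/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```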

Figure: Amazon S3 lifecycle policies, which move files automatically from one storage class to another as they age past certain points in time.