
Cloud Data Migration

Data is the cornerstone of successful cloud application deployments. Your evaluation and planning process may highlight the physical limitations inherent to migrating data from on-premises locations into the cloud. To assist you with that process, AWS offers a suite of tools to help you move data via networks, roads, and technology partners in and out of the cloud through offline, online, or streaming models. The daunting realities of data transport apply to most projects: knowing how to move to the cloud with minimal disruption, cost, and time, and knowing the most efficient way to move your data.

To determine the best-case scenario for efficiently moving your data, use this formula:

Number of Days = (Total Bytes)/(Megabits per second * 125 * 1000 * Network Utilization * 60 seconds * 60 minutes * 24 hours)

For example, if you have a T1 connection (1.544 Mbps) and 1 TB (1024 × 1024 × 1024 × 1024 bytes) to move in or out of AWS, the theoretical minimum time that it would take to load over your network connection at 80 percent network utilization is 82 days. Instead of using up bandwidth and taking a long time to migrate, many AWS customers are choosing one of the data migration options that are discussed next.
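The formula above can be sketched as a small Python helper; the function name and the 80 percent default utilization are illustrative choices, not part of any AWS tool:

```python
def transfer_days(total_bytes: int, mbps: float, utilization: float = 0.8) -> float:
    """Theoretical days needed to move total_bytes over a link of mbps megabits/second."""
    # 1 Mbps = 125 * 1000 bytes per second; scale by average link utilization.
    bytes_per_second = mbps * 125 * 1000 * utilization
    return total_bytes / (bytes_per_second * 60 * 60 * 24)

# The T1 example from the text: 1 TB over a 1.544 Mbps link at 80 percent utilization.
one_tb = 1024 ** 4
print(round(transfer_days(one_tb, 1.544)))  # prints 82
```

Running the numbers this way makes it easy to see when a physical transfer option becomes worthwhile.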

Multiple-choice questions that ask you to choose two or three true answers require that all of your answers be correct. There is no partial credit for getting a fraction correct. Pay extra attention to those questions when doing your review.

AWS Storage Gateway

AWS Storage Gateway is a hybrid cloud storage service that enables your on-premises applications to use AWS cloud storage seamlessly. You can use this service for the following:

  • Backup and archiving
  • Disaster recovery
  • Cloud bursting
  • Storage tiering
  • Migration

Your applications connect to the service through a gateway appliance using standard storage protocols, such as NFS and Internet Small Computer Systems Interface (iSCSI). The gateway connects to AWS storage services, such as Amazon S3, Amazon S3 Glacier, and Amazon EBS, providing storage for files, volumes, and virtual tapes in AWS.

File Gateway

A file gateway supports a file interface into Amazon S3, and it combines a cloud service with a virtual software appliance that is deployed into your on-premises environment as a VM. You can think of file gateway as an NFS mount on Amazon S3, allowing you to access your data directly in Amazon S3 from on premises as a file share.

Volume Gateway

A volume gateway provides cloud-based storage volumes that you can mount as iSCSI devices from your on-premises application servers. A volume gateway supports cached mode and stored volume mode configurations. Note that the volume gateway represents the family of gateways that support block-based volumes, previously referred to as gateway-cached volumes and gateway-stored volumes.

Cached Mode

In the cached volume mode, your data is stored in Amazon S3, and a cache of the frequently accessed data is maintained locally by the gateway. This enables you to achieve cost savings on primary storage and minimize the need to scale your storage on premises while retaining low-latency access to your most used data.

Stored Volume Mode

In the stored volume mode, data is stored on your local storage with volumes backed up asynchronously as Amazon EBS snapshots stored in Amazon S3. This provides durable offsite backups.

Tape Gateway

A tape gateway can be used for backup to migrate off of physical tapes and onto a cost-effective and durable archive backup such as Amazon S3 Glacier. For a tape gateway, you store and archive your data on virtual tapes in AWS. A tape gateway eliminates some of the challenges associated with owning and operating an on-premises physical tape infrastructure. It can also be used for migrating data off of tapes that are nearing end of life into a more durable type of storage that still acts like tape.

AWS Import/Export

AWS Import/Export accelerates moving large amounts of data into and out of the AWS Cloud using portable storage devices for transport. It transfers your data directly onto and off of storage devices using Amazon’s high-speed internal network and bypassing the internet.

For significant datasets, AWS Import/Export is often faster than internet transfer and more cost-effective than upgrading your connectivity. You load your data onto your devices and then create a job in the AWS Management Console to schedule shipping of your devices. You are responsible for providing your own storage devices and the shipping charges to AWS.

It supports (in a limited number of regions) the following:

  • Importing and exporting of data in Amazon S3 buckets
  • Importing data into Amazon EBS snapshots

You cannot export directly from Amazon S3 Glacier. You must first restore your objects to Amazon S3 before exporting using AWS Import/Export.

AWS Snowball

AWS Snowball is a petabyte-scale data transport solution that uses physical storage appliances, bypassing the internet, to transfer large amounts of data into and out of Amazon S3. AWS Snowball addresses common challenges with large-scale data transfers, including the following:

  • High network costs
  • Long transfer times
  • Security concerns

The figure shows a physical AWS Snowball device.


When you transfer your data with AWS Snowball, you do not need to write any code or purchase any hardware. To transfer data using AWS Snowball, perform the following steps:

  1. Create a job in the AWS Management Console. The AWS Snowball appliance is shipped to you automatically.
  2. When the appliance arrives, attach it to your local network.
  3. Download and run the AWS Snowball client to establish a connection.
  4. Use the client to select the file directories that you need to transfer to the appliance. The client will then encrypt and transfer the files to the appliance at high speed.
  5. Once the transfer is complete and the appliance is ready to be returned, the E Ink shipping label automatically updates and you can track the job status via Amazon Simple Notification Service (Amazon SNS), text messages, or directly in the console.
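Step 1 above can also be performed programmatically. The sketch below assumes boto3 (the AWS SDK for Python) is installed, AWS credentials are configured, and that the bucket ARN, address ID, and role ARN placeholders are replaced with real values:

```python
def import_job_resources(bucket_arn: str) -> dict:
    """Build the Resources structure for an S3 import job."""
    return {"S3Resources": [{"BucketArn": bucket_arn}]}

def create_snowball_import_job(bucket_arn: str, address_id: str, role_arn: str):
    # boto3 is imported lazily so the helper above stays usable without it.
    import boto3  # assumes the AWS SDK for Python is installed
    client = boto3.client("snowball")
    return client.create_job(
        JobType="IMPORT",
        Resources=import_job_resources(bucket_arn),
        AddressId=address_id,   # shipping address previously registered with AWS
        RoleARN=role_arn,       # IAM role that grants AWS Snowball access to the bucket
        ShippingOption="SECOND_DAY",
    )
```

The `AddressId` comes from an address you register with AWS beforehand, and the appliance is shipped to that address once the job is created, just as in the console workflow.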

The table shows some common AWS Snowball use cases.

AWS Snowball Use Cases

  • Cloud migration: If you have large quantities of data that you need to migrate into AWS, AWS Snowball is often much faster and more cost-effective than transferring that data over the internet.
  • Disaster recovery: In the event that you need to retrieve a large quantity of data stored in Amazon S3 quickly, AWS Snowball appliances can help retrieve the data much more quickly than high-speed internet.
  • Data center decommission: There are many steps involved in decommissioning a data center to make sure that valuable data is not lost. AWS Snowball can help ensure that your data is securely and cost-effectively transferred to AWS during this process.
  • Content distribution: Use AWS Snowball appliances if you regularly receive or need to share large amounts of data with clients, customers, or business associates. Appliances can be sent directly from AWS to client or customer locations.

AWS Snowball Edge

AWS Snowball Edge is a 100-TB data transfer service with on-board storage and compute power for select AWS capabilities. In addition to transferring data to AWS, AWS Snowball Edge can undertake local processing and edge computing workloads. The figure shows a physical AWS Snowball Edge device.

Features of AWS Snowball Edge include the following:

  • An endpoint on the device that is compatible with Amazon S3
  • A file interface with NFS support
  • A cluster mode where multiple AWS Snowball Edge devices can act as a single, scalable storage pool with increased durability
  • The ability to run AWS Lambda powered by AWS IoT Greengrass functions as data is copied to the device
  • Encryption taking place on the appliance itself
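Because the device exposes an Amazon S3-compatible endpoint, you can point standard S3 tooling at it on the local network. The sketch below assumes boto3 is installed and uses placeholder device credentials; the HTTPS port 8443 is also an assumption:

```python
def device_endpoint(device_ip: str, port: int = 8443) -> str:
    """Build the local S3-compatible endpoint URL for a Snowball Edge device."""
    return f"https://{device_ip}:{port}"

def snowball_edge_s3_client(device_ip: str, access_key: str, secret_key: str):
    # boto3 is imported lazily so device_endpoint stays usable without it.
    import boto3  # assumes the AWS SDK for Python is installed
    return boto3.client(
        "s3",
        endpoint_url=device_endpoint(device_ip),  # the device, not an AWS Region
        aws_access_key_id=access_key,             # credentials obtained from the device
        aws_secret_access_key=secret_key,
    )
```

With such a client, ordinary S3 calls like listing buckets or uploading objects run against the appliance rather than the AWS Cloud.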

The transport of data is done by shipping the data in the appliances through a regional carrier. The appliance differs from the standard AWS Snowball because it can bring the power of the AWS Cloud to your local environment, with local storage and compute functionality. There are three types of jobs that can be performed with Snowball Edge appliances:

  • Import jobs into Amazon S3
  • Export jobs from Amazon S3
  • Local compute and storage-only jobs

Use AWS Snowball Edge when you need the following:

  • Local storage and compute in an environment that might or might not have an internet connection
  • To transfer large amounts of data into and out of Amazon S3, bypassing the internet

AWS Snowball Device Use Cases

  • Import data into Amazon S3: AWS Snowball and AWS Snowball Edge
  • Copy data directly from HDFS: AWS Snowball only
  • Export from Amazon S3: AWS Snowball and AWS Snowball Edge
  • Durable local storage: AWS Snowball Edge only
  • Use in a cluster of devices: AWS Snowball Edge only
  • Use with AWS IoT Greengrass: AWS Snowball Edge only
  • Transfer files through NFS with a GUI: AWS Snowball Edge only

AWS Snowmobile

AWS Snowmobile is an exabyte-scale data transfer service used to move extremely large amounts of data from on premises to AWS. You can transfer up to 100 PB per AWS Snowmobile, a 45-foot long ruggedized shipping container pulled by a semi-trailer truck. AWS Snowmobile makes it easy to move massive volumes of data to the cloud, including video libraries, image repositories, or even a complete data center migration. In 2017, one AWS customer moved 8,700 tapes with 54 million files to Amazon S3 using AWS Snowmobile. The figure shows an AWS Snowmobile shipping container being pulled by a semi-trailer truck.


How do you choose between AWS Snowmobile and AWS Snowball? To migrate large datasets of 10 PB or more in a single location, you should use AWS Snowmobile. For datasets that are less than 10 PB or distributed in multiple locations, you should use AWS Snowball.
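The 10 PB rule of thumb above can be captured in a small helper; the function name and structure are illustrative, and the logic encodes only the guidance in this section:

```python
PETABYTE = 10 ** 15

def transfer_service(dataset_bytes: int, single_location: bool) -> str:
    """Pick a transfer service per the guidance above:
    10 PB or more in a single location -> AWS Snowmobile, otherwise AWS Snowball."""
    if dataset_bytes >= 10 * PETABYTE and single_location:
        return "AWS Snowmobile"
    return "AWS Snowball"

print(transfer_service(100 * PETABYTE, single_location=True))   # AWS Snowmobile
print(transfer_service(2 * PETABYTE, single_location=True))     # AWS Snowball
print(transfer_service(20 * PETABYTE, single_location=False))   # AWS Snowball
```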

Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose lets you prepare and load real-time data streams into data stores and analytics tools. Although it has much broader uses for loading data continuously for data streaming and analytics, it can be used as a one-time tool for data migration into the cloud. Amazon Kinesis Data Firehose can capture, transform, and load streaming data into Amazon S3 and Amazon Redshift, which will be discussed further in Chapter 4, “Hello, Databases.” With Amazon Kinesis Data Firehose, you can avoid writing applications or managing resources. When you configure your data producers to send data to Amazon Kinesis Data Firehose, as shown in the figure, it automatically delivers the data to the destination that you specified. This is an efficient option to transform and deliver data from on premises to the cloud.


Destinations include the following:

  • Amazon S3
  • Amazon Redshift
  • Amazon Elasticsearch Service
  • Splunk

Key Concepts

As you get started with Amazon Kinesis Data Firehose, you will benefit from understanding the concepts described next.

Kinesis Data Delivery Stream

You use Amazon Kinesis Data Firehose by creating an Amazon Kinesis data delivery stream and then sending data to it.

Record

A record is the data that your producer sends to a Kinesis data delivery stream, with a maximum size of 1,000 KB.

Data Producer

Data producers send records to Amazon Kinesis data delivery streams. For example, your web server could be configured as a data producer that sends log data to an Amazon Kinesis delivery stream.
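As a sketch of a data producer, the snippet below packages a log line as a record and sends it with boto3's Firehose client. The stream name is a placeholder, and boto3 plus configured AWS credentials are assumed:

```python
def log_record(line: str) -> dict:
    """Package one log line as a Kinesis Data Firehose record."""
    return {"Data": (line + "\n").encode("utf-8")}

def send_log_line(stream_name: str, line: str):
    # boto3 is imported lazily so log_record stays usable without it.
    import boto3  # assumes the AWS SDK for Python is installed
    firehose = boto3.client("firehose")
    # PutRecord delivers a single record to the named delivery stream.
    return firehose.put_record(DeliveryStreamName=stream_name, Record=log_record(line))
```

Appending the newline keeps records separable when the stream is delivered to a destination such as Amazon S3.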

Buffer Size and Buffer Interval

Amazon Kinesis Data Firehose buffers incoming data to a certain size or for a certain period of time before delivering it to destinations. Buffer size is in megabytes, and buffer interval is in seconds.
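The flush behavior can be illustrated with a toy model: delivery is triggered when either the size threshold or the time interval is reached, whichever comes first. This is a simplified illustration of the concept, not the service's implementation:

```python
class BufferModel:
    """Toy model of Firehose buffering: flush when either the size
    threshold (megabytes) or the time interval (seconds) is reached."""

    def __init__(self, size_mb: float, interval_s: float):
        self.size_limit = size_mb * 1024 * 1024
        self.interval = interval_s
        self.buffered = 0
        self.elapsed = 0.0

    def add(self, record_bytes: int, seconds_since_last: float) -> bool:
        """Add a record; return True if this triggers a flush."""
        self.buffered += record_bytes
        self.elapsed += seconds_since_last
        if self.buffered >= self.size_limit or self.elapsed >= self.interval:
            self.buffered = 0
            self.elapsed = 0.0
            return True
        return False

buf = BufferModel(size_mb=5, interval_s=300)
print(buf.add(1024, 1.0))              # prints False: under both thresholds
print(buf.add(6 * 1024 * 1024, 1.0))   # prints True: size threshold reached
```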

The figure shows streaming data being delivered to Amazon S3.

AWS Direct Connect

Using AWS Direct Connect, you can establish private connectivity between AWS and your data center, office, or colocation environment, which in many cases can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than internet-based connections.

These benefits can then be applied to storage migration. Transferring large datasets over the internet can be time-consuming and expensive. When you use the cloud, you may find that transferring large datasets can be slow because your business-critical network traffic is contending for bandwidth with your other internet usage. One option for decreasing the amount of time required to transfer your data is to increase the bandwidth from your internet service provider; be aware that this frequently requires a costly contract renewal and minimum commitment.

VPN Connection

You can connect your Amazon VPC to remote networks by using a VPN connection. The table shows some of the connectivity options available to you.

Amazon VPC Connectivity Options

  • AWS managed VPN: Create an IP Security (IPsec) VPN connection between your VPC and your remote network. On the AWS side of the VPN connection, a virtual private gateway provides two VPN endpoints (tunnels) for automatic failover. You configure your customer gateway on the remote side of the VPN connection.
  • AWS VPN CloudHub: If you have more than one remote network (for example, multiple branch offices), create multiple AWS managed VPN connections via your virtual private gateway to enable communication between these networks.
  • Third-party software VPN appliance: Create a VPN connection to your remote network by using an Amazon EC2 instance in your VPC that's running a third-party software VPN appliance. AWS does not provide or maintain third-party software VPN appliances; however, you can choose from a range of products provided by partners and open-source communities.

You can also use AWS Direct Connect to create a dedicated private connection from a remote network to your VPC. You can combine this connection with an AWS managed VPN connection to create an IPsec-encrypted connection. You will learn more about VPN connections in subsequent chapters.