How can I use an Amazon VPC endpoint to activate a DataSync agent across AWS Regions or across accounts?

I use Amazon Virtual Private Cloud (Amazon VPC) to configure my environments, and I want to use AWS DataSync to transfer data over a private network.

Resolution

Important: The example configuration assumes the following:

  • The resources don't connect to the public internet. The only exception is the connection between the private endpoints and AWS.
  • The source of the data transfer is an on-premises or remote VPC environment with an NFS or SMB data source. The destination of the data transfer is in an Amazon VPC that has access to Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), or Amazon FSx. After you complete the setup, you can reverse the transfer direction based on the supported location combinations for DataSync. This setup also works with other on-premises sources, such as HDFS and object storage.

Set up the source network environment (NFS or SMB data source)

The DataSync agent runs in the source network, close to the NFS or SMB data source. For this configuration, the source network can be either on-premises or a private Amazon VPC.

Note: If you want to use VPC peering to set up transfers between VPCs, then review the limitations of VPC peering. Make sure that the feature supports your configuration.

Set up the destination network environment (Amazon S3, Amazon EFS, or Amazon FSx)

For this configuration, the destination network is a private Amazon VPC. The Amazon VPC must access a destination location, such as Amazon S3, Amazon EFS, or Amazon FSx.

On the destination private VPC, complete the following steps:

  1. Create a VPC endpoint for DataSync.
  2. Confirm that the subnet associated with the VPC endpoint has at least four IP addresses available for DataSync execution endpoints.
    Note: Each DataSync task uses four IP addresses for the task execution endpoints.
  3. Configure a security group for the DataSync VPC endpoints. The security group must allow the following traffic:
    Inbound traffic on TCP port 443 to the endpoint
    Outbound ephemeral traffic
    Inbound traffic on TCP port range 1024-1064 to the destination VPC endpoint
    To open an AWS Support channel, allow inbound traffic on TCP port 22
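
To check step 2 before you create tasks, you can compare the subnet's free address count with the number of addresses that your tasks need. The following shell sketch assumes four addresses for each task, as the preceding note states. The function names and the subnet ID in the usage line are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: check that a subnet has enough free IP addresses for DataSync tasks.
# Assumption: each task needs four addresses, per the note in step 2.

# Print the number of IP addresses that the given number of tasks needs.
# Usage: required_ips TASK_COUNT
required_ips() {
  echo $(( $1 * 4 ))
}

# Succeed (exit 0) if the available address count covers the planned tasks.
# Usage: subnet_has_capacity AVAILABLE_COUNT TASK_COUNT
subnet_has_capacity() {
  [ "$1" -ge "$(required_ips "$2")" ]
}

# Print the free address count that Amazon EC2 reports for a subnet.
# Usage: available_ips SUBNET_ID REGION  (requires AWS CLI credentials)
available_ips() {
  aws ec2 describe-subnets --subnet-ids "$1" --region "$2" \
    --query 'Subnets[0].AvailableIpAddressCount' --output text
}
```

For example, to check capacity for two tasks, you might run subnet_has_capacity "$(available_ips subnet-0example us-east-1)" 2 && echo "enough addresses".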

Set up the network connection between the source and destination environments

For this configuration, the data transfer is one of the following:

  • From a source on-premises environment to a destination private VPC
  • Between private VPCs that are in different AWS Regions
  • From a source that belongs to a different AWS account than the destination

Configure the following connection and network requirements between the source and destination environments:

  1. Set up an active network connection between the source environment and the destination VPC. For example, you can use AWS Direct Connect, VPC peering, or a transit VPC to set up this connection.
  2. Verify the CIDR blocks of the source and destination environments to confirm that their private network address spaces don't overlap.
  3. Confirm that the route table entries in both the source subnet and destination subnet route traffic between the two networks. For example, if you use VPC peering, then update your route tables for the peering connection.
  4. If there's a firewall between the source and destination networks, then configure the firewall to allow the following traffic:
    Traffic on TCP port 443 to the destination VPC endpoint subnets
    Traffic on TCP port range 1024-1064 to the destination VPC endpoint
    To open an AWS Support channel, allow traffic on TCP port 22
  5. Confirm that all security groups and firewalls allow ephemeral outbound traffic or the use of connection tracking tools.
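
To check the CIDR blocks for overlap, you can compare the address ranges directly. The following shell sketch covers IPv4 only and uses plain integer arithmetic. The function names and the CIDR blocks in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch: detect overlap between two IPv4 CIDR blocks.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
# Usage: ip_to_int 10.0.0.1
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
# Usage: cidr_range 10.0.0.0/16
cidr_range() {
  addr=${1%/*}
  bits=${1#*/}
  start=$(ip_to_int "$addr")
  size=$(( 1 << (32 - bits) ))
  # Align the start to the network boundary in case host bits are set.
  start=$(( start / size * size ))
  echo "$start $(( start + size - 1 ))"
}

# Succeed (exit 0) if the two CIDR blocks share at least one address.
# Usage: cidr_overlap CIDR_A CIDR_B
cidr_overlap() {
  # After set: $1=start A, $2=end A, $3=start B, $4=end B
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}
```

For example, cidr_overlap 10.0.0.0/16 10.0.128.0/20 && echo "overlap found" reports an overlap because the second block falls inside the first.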

Configure a machine to use to activate the DataSync agent

Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, then see Troubleshoot AWS CLI errors. Also, make sure that you're using the most recent AWS CLI version.

You can use a physical computer, a virtual machine, or an Amazon Elastic Compute Cloud (Amazon EC2) instance to activate the DataSync agent. Complete the following steps on the machine:

  1. Set up a connection to one of the private networks in the source or destination environment. You must configure valid network routes to both networks.
  2. Set up network access from the machine to the DataSync agent on TCP port 80 (HTTP). The machine uses this port to request the activation key from the agent.
  3. (Optional) Install cURL to get the activation key.
  4. Install the AWS CLI to activate the DataSync agent.
  5. Configure the AWS CLI with AWS Identity and Access Management (IAM) permissions that allow you to activate the DataSync agent. The permissions look similar to the following example:
    {
      "Version": "2012-10-17",
      "Statement": [{
          "Sid": "VisualEditor2",
          "Effect": "Allow",
          "Action": [
            "datasync:*"
          ],
          "Resource": "arn:aws:datasync:us-east-1:123456789012:*"
        },
        {
          "Sid": "VisualEditor3",
          "Effect": "Allow",
          "Action": [
            "ec2:*VpcEndpoint*",
            "ec2:*subnet*",
            "ec2:*security-group*"
          ],
          "Resource": "*"
        }
      ]
    }
    Note: If you use an Amazon EC2 instance to activate the agent, then attach the IAM role with the correct permissions to the instance profile.
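
If you attach the permissions to an IAM role, then you can generate the policy document for your Region and account and add it as an inline policy. The following shell sketch reuses the actions and resources from the preceding example. The function names and the DataSyncAgentActivation policy name are hypothetical.

```shell
#!/bin/sh
# Sketch: generate the example policy for a Region and account, then attach
# it inline to an existing IAM role. The policy name is a placeholder.

# Print the example policy document with the Region and account filled in.
# Usage: datasync_policy REGION ACCOUNT_ID
datasync_policy() {
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["datasync:*"],
      "Resource": "arn:aws:datasync:$1:$2:*"
    },
    {
      "Effect": "Allow",
      "Action": ["ec2:*VpcEndpoint*", "ec2:*subnet*", "ec2:*security-group*"],
      "Resource": "*"
    }
  ]
}
EOF
}

# Attach the policy inline to an existing role (requires IAM permissions).
# Usage: attach_policy ROLE_NAME REGION ACCOUNT_ID
attach_policy() {
  aws iam put-role-policy --role-name "$1" \
    --policy-name DataSyncAgentActivation \
    --policy-document "$(datasync_policy "$2" "$3")"
}
```

To review the document before you attach it, run datasync_policy us-east-1 123456789012.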

Activate the DataSync agent

Note: In the following commands, replace us-east-1 with your AWS Region.

To activate the DataSync agent, complete the following steps:

  1. Deploy the DataSync agent on a virtual machine (on-premises) or on an EC2 instance (private VPC).

  2. From the machine that you configured, run the following cURL command to get the DataSync agent's activation key:

    curl -vvv -G \
      --data-urlencode "activationRegion=us-east-1" \
      --data-urlencode "gatewayType=SYNC" \
      --data-urlencode "endpointType=PRIVATE_LINK" \
      --data-urlencode "privateLinkEndpoint=vpc_endpoint_ip_address" \
      --data-urlencode "redirect_to=https://us-east-1.console.aws.amazon.com/datasync/home?region=us-east-1#/agents/create" \
      "http://datasync_agent_ip"

    Note: You can optionally include --data-urlencode "no_redirect" to simplify and shorten the command and output. Or, you can use the local console to obtain the activation key.

  3. Note the activation key from the command output.

  4. Run the describe-vpc-endpoints command to get the VpcEndpointId, VpcId, SubnetIds, and security group GroupId for the destination VPC endpoint:

    aws ec2 describe-vpc-endpoints --region us-east-1

  5. Note the VpcEndpointId from the command output. The output looks similar to the following:

            {
                "VpcEndpointId": "vpce-0ba3xxxxx3752b63",
                "VpcEndpointType": "Interface",
                "VpcId": "vpc-aabb1122",
                "ServiceName": "com.amazonaws.us-east-1.datasync",
                ...
                "SubnetIds": [
                    "subnet-f0f6cd97",
                    "subnet-990da7c1",
                    "subnet-41241008"
                ],
                "Groups": [
                    {
                        "GroupId": "sg-8ae9abf1",
                        "GroupName": "default"
                    }
                ],
                ...

    Note: If you use the same subnet and security group for your DataSync agent, then skip the following optional steps.

  6. (Optional) Run the describe-security-groups command to get the security group ID of the destination VPC. The DataSync execution endpoints use this security group to connect to the DataSync VPC endpoint.

    aws ec2 describe-security-groups --region us-east-1

    Note: To reduce the complexity of the configuration, it's a best practice to use the same security group as the VPC endpoint.

  7. (Optional) Note the GroupId from the command output. The command output looks similar to the following:

    "GroupId": "sg-000e8edxxxx4e4701"

  8. (Optional) Run the describe-subnets command to get the subnet ID associated with the VPC endpoint:

    aws ec2 describe-subnets --region us-east-1

    Note: To reduce the complexity of the configuration, it's a best practice to use the same subnet as the VPC endpoint.

  9. (Optional) Note the SubnetArn from the command output. The command output looks similar to the following:

    "SubnetArn": "arn:aws:ec2:us-east-1:123456789012:subnet/subnet-03dc4xxxx6905bb76"

  10. Run the create-agent command to activate the DataSync agent:

    aws datasync create-agent \
      --agent-name your_agent_name \
      --vpc-endpoint-id vpce-0cxxxxxxxxxxxxf57 \
      --activation-key UxxxQ-0xxxB-LxxxL-AUxxV-JxxxN \
      --subnet-arns arn:aws:ec2:us-east-1:123456789012:subnet/subnet-0cxxxxxxxxxxxx3 \
      --security-group-arns arn:aws:ec2:us-east-1:123456789012:security-group/sg-xxxxxxxxxxxxxx \
      --region us-east-1

    Note: For agent-name, enter a name for your agent. For activation-key, enter the activation key that you noted. For vpc-endpoint-id, enter your value for VpcEndpointId. For subnet-arns, enter your value for SubnetArn. For security-group-arns, enter the ARN of the security group that your GroupId value identifies.

    The command returns the DataSync agent's ARN:

    {
        "AgentArn": "arn:aws:datasync:us-east-1:123456789012:agent/agent-0bxxxxxxxxxxxxxx57c"
    }

  11. Run the list-agents command to confirm that you created the agent successfully:

    aws datasync list-agents --region us-east-1

  12. Confirm that your DataSync agent's ARN is returned in the output:

    {
        "Agents": [
            {
                "AgentArn": "arn:aws:datasync:us-east-1:123456789012:agent/agent-0bxxxxxxxxxxxxxx57c",
                "Status": "ONLINE",
                "Name": "your_agent_name"
            }
        ]
    }

    After your DataSync agent is activated, use the DataSync console to create locations and tasks for your transfers.
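
The activation steps can also be sketched as a set of shell helper functions. The ARN formats follow the example values in the preceding steps. The function names are hypothetical, and the sketch assumes that the agent returns only the activation key in the response body when you include no_redirect.

```shell
#!/bin/sh
# Sketch: collect the activation key and build the ARNs for create-agent.
# All function names are placeholders, not part of the AWS CLI.

# Build a subnet ARN. Usage: subnet_arn REGION ACCOUNT_ID SUBNET_ID
subnet_arn() {
  echo "arn:aws:ec2:$1:$2:subnet/$3"
}

# Build a security group ARN.
# Usage: security_group_arn REGION ACCOUNT_ID GROUP_ID
security_group_arn() {
  echo "arn:aws:ec2:$1:$2:security-group/$3"
}

# Request the activation key from the agent over HTTP (TCP port 80).
# Assumption: with no_redirect, the response body contains only the key.
# Usage: get_activation_key AGENT_IP REGION VPC_ENDPOINT_IP
get_activation_key() {
  curl -s -G \
    --data-urlencode "activationRegion=$2" \
    --data-urlencode "gatewayType=SYNC" \
    --data-urlencode "endpointType=PRIVATE_LINK" \
    --data-urlencode "privateLinkEndpoint=$3" \
    --data-urlencode "no_redirect" \
    "http://$1"
}

# Activate the agent with the private activation key.
# Usage: activate_agent NAME REGION ACCOUNT_ID VPCE_ID SUBNET_ID GROUP_ID KEY
activate_agent() {
  aws datasync create-agent --agent-name "$1" --region "$2" \
    --vpc-endpoint-id "$4" --activation-key "$7" \
    --subnet-arns "$(subnet_arn "$2" "$3" "$5")" \
    --security-group-arns "$(security_group_arn "$2" "$3" "$6")"
}
```

To use the helpers, store the key first, for example key=$(get_activation_key datasync_agent_ip us-east-1 vpc_endpoint_ip_address), and then pass "$key" as the last argument to activate_agent.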

Troubleshoot errors during DataSync agent activation

You might encounter errors during DataSync agent activation. To troubleshoot, review the following information:

Traffic on TCP port 443 isn't allowed

The cURL command returns the following error, and doesn't return the activation key:

"errorType=PRIVATE_LINK_ENDPOINT_UNREACHABLE"

This error typically occurs when traffic on TCP port 443 isn't allowed to the VPC endpoint.
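
To narrow down the cause, you can extract the errorType value from the cURL output and then test the port from the activation machine. The following shell sketch assumes that nc (netcat) is installed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: read the errorType from saved cURL output and test a TCP port.

# Print the first errorType value found on standard input, if any.
# Usage: curl ... 2>&1 | extract_error_type
extract_error_type() {
  sed -n 's/.*errorType=\([A-Z_]*\).*/\1/p' | head -n 1
}

# Succeed (exit 0) if the port accepts a TCP connection within five seconds.
# Assumption: nc (netcat) is available on this machine.
# Usage: port_open HOST PORT
port_open() {
  nc -z -w 5 "$1" "$2"
}
```

For example, port_open vpc_endpoint_ip_address 443 || echo "TCP port 443 is blocked" checks the path from the machine where you run it. The agent's own path to the VPC endpoint can still differ, so also review the security groups and firewalls between the agent and the endpoint.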

Public activation key in the create-agent command

When you call the CreateAgent operation, you might receive an InvalidRequestException error similar to the following:

"Private link configuration is invalid: VPC Endpoint Id should remain unspecified for public-endpoint activation keys."

This error occurs when you enter the public activation key for the --activation-key parameter in the create-agent command. You must enter the private activation key for the private endpoint type in this configuration.

IAM identity has insufficient permissions

When you configure the DataSync agent activation, you might receive one of the following errors:

"An error occurred (InvalidRequestException) when calling the CreateAgent operation: Invalid EC2 subnet, ARN: arn:aws:ec2:us-east-1:123456789012:subnet/subnet-41xxxx08, reason: invalid subnet, StatusCode: 403"

-or-

"An error occurred (InvalidRequestException) when calling the CreateAgent operation: Invalid EC2 security group, ARN: arn:aws:ec2:us-east-1:123456789012:security-group/sg-000e8xxxx9d4e4701, reason: invalid security group, StatusCode: 403"

-or-

"An error occurred (InvalidRequestException) when calling the CreateAgent operation: Private link configuration is invalid: VPC endpoint vpce-0ba34edxxxx752b63 is not valid"

These errors occur when the IAM identity configured on your AWS CLI has insufficient permissions. Confirm that your IAM identity's policy grants permissions for ec2:*VpcEndpoint*, ec2:*subnet*, and ec2:*security-group*.
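
To decide which permission to review first, you can map the error message to the resource type that it names and confirm which IAM identity the AWS CLI uses. The following shell sketch matches only the message fragments that appear in this article. The function names and the returned labels are hypothetical.

```shell
#!/bin/sh
# Sketch: map a create-agent error message to the resource to review.

# Print the resource type that an InvalidRequestException message names.
# Usage: classify_create_agent_error "ERROR_MESSAGE"
classify_create_agent_error() {
  case $1 in
    *"Invalid EC2 subnet"*)          echo "subnet" ;;
    *"Invalid EC2 security group"*)  echo "security-group" ;;
    *"VPC endpoint"*"is not valid"*) echo "vpc-endpoint" ;;
    *)                               echo "unknown" ;;
  esac
}

# Print the ARN of the IAM identity that the AWS CLI currently uses.
# Usage: current_identity  (requires AWS CLI credentials)
current_identity() {
  aws sts get-caller-identity --query Arn --output text
}
```

After you identify the resource type, review the policy that's attached to the identity that current_identity returns.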

Related information

How AWS DataSync works

Using AWS DataSync agents with VPC endpoints

Requirements for AWS DataSync

AWS OFFICIAL
Updated a month ago