How to Easily Transfer & Sync Data Across Amazon S3 and Azure

Amazon S3 (Amazon Simple Storage Service) and Microsoft Azure Blobs are two of the most popular cloud object storage services available for storing unstructured data.  They’re massively scalable, offer excellent SLAs for uptime and data availability, and are hard to beat on price — if your data never moves. 

But what happens when you need to migrate files, change cloud providers, or download files on a per project basis?  Conventional transfer tools are inefficient, slow, unreliable — and come with a price. 

Many companies deploying hybrid clouds need to regularly transfer, replicate, or sync files across storage buckets. Hybrid clouds require high-performance replication solutions that can sync data quickly and efficiently, on demand and in real time, from on-prem to the cloud and across cloud regions.

Want to see how quickly, easily, and efficiently Resilio can replicate data across AWS and Azure for your organization? Schedule a demo with our team.

While AWS (Amazon Web Services) DataSync and Azure Data Factory are popular solutions for transferring data within and across their respective cloud platforms, they’re slow, unreliable, and limited in other critical features (e.g., they lack native security and management capabilities and provide limited types of synchronization). 

In addition, data transfer fees can add up quickly since traditional transfer solutions aren’t as efficient as possible when it comes to moving data across cloud regions.

Where these solutions fall short, our file synchronization software system — Resilio Connect — excels. Resilio is a superior solution for transferring and syncing data within or across AWS S3 and Azure Blobs because it provides:

  • High-performance, P2P synchronization: Resilio replicates data using a peer-to-peer replication architecture that enables it to sync data 3-10x faster than DataSync and Data Factory, sync in any direction, provide bulletproof reliability, and scale organically to support environments and files of any size.
  • WAN acceleration: Resilio utilizes a proprietary, UDP-based WAN acceleration protocol to quickly and efficiently transfer files across high-latency, loss-prone, and low-quality networks (such as WANs, low-grade consumer networks, and unreliable connections at the edge).
  • Efficient file access: Resilio can also be used as a file gateway that provides low-latency access to files stored in any cloud, block, or object storage (including S3 and Azure Blobs). You can sync and access data in a way that maximizes efficiency, minimizes data transfer costs, and enhances productivity across your organization.
  • Versatile deployment: You can install Resilio agents on just about any device, cloud storage platform, and operating system. This kind of flexibility means that you can deploy Resilio on your existing IT infrastructure with minimal operational interruption and begin replicating in as little as 2 hours.
  • Centralized management and automation: Resilio’s Management Console provides granular control over all of your cloud and on-premises endpoints from one centralized location. And you can use Resilio’s powerful automation and scripting capabilities to minimize management time and the need for human intervention.
  • Built-in security features: Unlike other replication solutions, Resilio includes built-in security features that protect your data at rest and in transit and eliminate the need to invest in 3rd-party security solutions.

In this article, we’ll discuss how to use Resilio Connect to replicate data across AWS and Azure storage accounts, as well as all of the capabilities that make Resilio the best option for cloud data sync and transfer. 

Organizations in gaming, media, retail, tech, and more use Resilio Connect to sync data across on-premises, cloud, and hybrid cloud environments. To learn more about how Resilio can provide your organization with fast, efficient, and automated data sync and access, schedule a demo with our team.

How to Transfer and Sync Data Across AWS and Azure with Resilio

Resilio Connect is an agent-based solution that works by installing agents on the cloud (or on-premises) endpoints you want to sync across.

After installing Resilio agents and connecting them to your Azure Blob containers and AWS S3 buckets, use the following steps to create a Distribution job for transferring data across cloud endpoints:

Step 1: Create a Job

  • Navigate to the “Jobs” tab in the Resilio Management Console
  • Select “Configure Jobs” 
  • Select “Create New Job”
  • Select “Distribution” for the job type to create a one-time or periodic data transfer. 

Note: If you want to create an ongoing sync job, follow the directions in this tutorial.

Resilio Connect: Jobs - Job Type (Synchronization)

Step 2: Specify Job Details

Fill in information such as Job Name and Description.

Resilio Connect: Jobs - Create New Job (Job Details)

Step 3: Select the Source Agent

Select the Source Agent. 

For example, if you’re transferring files from AWS to Azure, this will be your AWS agent. Be sure to select the right bucket name for the bucket you want to replicate from.

Create new job: Choose source agent

Step 4: Select Destination Agents

Select the agents that will be receiving the files.

Create new job: Choose destination groups

Step 5: Specify Path

Specify the source and destination share paths.

The source path must be an already existing folder on the Source Agent. On both the Source and Destination Agents, you can use the default path macros or specify custom paths.

Edit Source Path: Path Folder Name

Step 6: Specify Triggers

You can add triggers to Distribution jobs that make it possible to run a command before the job starts or after it completes. 

By default, all commands are executed in the root of the specified destination folder.

Create new job: Triggers
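As an illustration of what a post-completion trigger command might run, here is a minimal Python sketch that builds a SHA-256 manifest of every file under a folder. It is not part of Resilio; the function names `sha256_of` and `build_manifest` are hypothetical, and the sketch simply assumes the command is launched from the root of the destination folder, as described above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, block_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            manifest[path.relative_to(root).as_posix()] = sha256_of(path)
    return manifest
```

A trigger could call `build_manifest(Path.cwd())` after the job completes and compare the result with a manifest produced on the source side.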

Step 7: Create Job Schedule

You can configure a one-time replication job or schedule the job to occur periodically at a certain time. 

Click “Next >” when your desired schedule is configured.

Create new job: Job Scheduler

Step 8: Save the Job

Review your job summary to ensure everything is correct, and press “Save”.

Create new job: Review Summary - Details and Source Agent

High-Performance P2P Replication 

Traditional file replication solutions like AWS DataSync and Azure Data Factory sync data using point-to-point replication architectures. These architectures are deployed in one of two models:

  • Hub-and-spoke: This model consists of a hub server and several remote servers. The remote servers can’t share data directly with each other. Instead, they must first transfer files to the hub server, which then replicates the files to each remote server one by one.
  • Follow-the-sun: In this model, Server 1 syncs data with Server 2. Then Server 2 syncs with Server 3. Then Server 3 syncs with Server 4, and so forth.

Both of these models suffer from multiple weaknesses that make them poorly suited to many enterprise data replication scenarios, such as:

  • Slow synchronization: Since files are only synchronized between two servers at a time, synchronizing your entire environment can take a long time.
  • Limited sync types and directions: Most point-to-point solutions can only sync one-way or bi-directionally. And solutions like DataSync and Data Factory can only sync on a schedule, but can’t perform real-time sync.
  • Unreliability: Point-to-point solutions are unreliable and create single points of failure. Synchronization is limited by the slowest endpoint in your system (i.e., if one server is on a slow network, it can delay full sync for every other endpoint). And if your hub server goes down in the hub-and-spoke model, replication fails entirely.
  • Poor scalability: Since syncs can only occur between two servers at a time, point-to-point solutions scale poorly (i.e., it will take a long time to sync large files, large numbers of files, and/or many endpoints).

While Resilio can be deployed in a hub-and-spoke configuration, it syncs data with a P2P replication architecture that overcomes all of these limitations. In a P2P environment, every endpoint with a Resilio agent on it can share data directly with any other endpoint. And all endpoints can work together to sync data across your entire system simultaneously. 

This enables Resilio to provide high-performance synchronization that is:

Blazing Fast (3–10x Faster Than Point-to-Point)

Because of Resilio’s P2P architecture and a process known as file chunking, Resilio can sync objects at unparalleled speeds.

File chunking is the process of splitting a file up into multiple chunks that can transfer independently of each other. Every endpoint in your environment can then share file chunks with other endpoints concurrently.

For example, imagine you wanted to sync a file across five endpoints. Resilio could split that file into five chunks. Endpoint 1 can share the first chunk with Endpoint 2. Endpoint 2 can then share that first chunk with Endpoint 3, even before it receives the remaining chunks. Soon, every endpoint will be sharing file chunks at the same time, enabling Resilio to sync your environment 3–10x faster than most point-to-point solutions.
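The chunking idea can be sketched in a few lines of Python. This is a generic illustration, not Resilio's implementation: the `Chunk` type, the SHA-256 digest per chunk, and the equal-size split are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    index: int    # position of the chunk within the original file
    data: bytes   # the chunk's bytes
    digest: str   # SHA-256 of data, so a receiver can verify the chunk alone


def split_into_chunks(payload: bytes, chunk_count: int) -> list[Chunk]:
    """Split payload into at most chunk_count pieces of (nearly) equal size."""
    if chunk_count < 1:
        raise ValueError("chunk_count must be at least 1")
    size = max(1, -(-len(payload) // chunk_count))  # ceiling division
    chunks = []
    for index, start in enumerate(range(0, len(payload), size)):
        data = payload[start:start + size]
        chunks.append(Chunk(index, data, hashlib.sha256(data).hexdigest()))
    return chunks


def reassemble(chunks: list[Chunk]) -> bytes:
    """Rebuild the payload from chunks that may have arrived in any order."""
    ordered = sorted(chunks, key=lambda chunk: chunk.index)
    for chunk in ordered:
        if hashlib.sha256(chunk.data).hexdigest() != chunk.digest:
            raise ValueError(f"chunk {chunk.index} failed verification")
    return b"".join(chunk.data for chunk in ordered)
```

Because each chunk carries its own index and digest, an endpoint can verify and forward a chunk before the rest of the file has arrived, which is what lets several endpoints share pieces concurrently.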

P2P vs Client-Server architecture GIF

Resilio turns your infrastructure into a distributed mesh network, effectively giving every endpoint the power of a data center. It leverages every endpoint in your environment in order to distribute network and CPU load across your system, reduce the load on your servers and internet channels, and improve data availability.

And you can use a process known as scale-out replication to cluster nodes together in order to pool network resources and increase speeds linearly — our engineers have tested and achieved speeds of 100+ Gbps per cluster (though there is no design limit on how fast it can go).

Operations Team: Resilio Connect Management Console - Object Storage: London <> Australia

Sync In Any Direction

Resilio can sync in any direction, such as:

  • One-way: You can sync data from one endpoint to another, such as for simple backup jobs.
  • Bidirectional: You can sync data between two endpoints, such as in distributed collaboration scenarios.
  • One-to-many: Distribute files from a central server out to many endpoints. For example, you can distribute software updates or distribute mission-critical files to a fleet of vehicles.
  • Many-to-one: Consolidate files from many endpoints to one. This can be used to back up multiple endpoints or to collect important datasets from a fleet of vehicles.
  • N-way: You can keep files synchronized across multiple endpoints simultaneously.

N-way sync is especially powerful for use cases such as remote work and disaster recovery.

In remote and distributed collaboration scenarios, you can use N-way sync to keep files synchronized across multiple geographically distributed offices, remote employees, and cloud storage buckets. Anytime an employee makes a change to a file, that change is automatically synchronized across every other endpoint.

With N-way sync, you can effectively turn every endpoint in your environment into a backup server for hot-site disaster recovery. In the event of a disaster, files can be recovered from any endpoint. 

Plus, every endpoint can work together to bring your system back online, enabling Resilio to achieve sub-five-second RPOs (Recovery Point Objectives) and RTOs (Recovery Time Objectives) measured in minutes.

Hot/Live DR: Multi-site Active/Active; Warm DR: Active/Active; Cold DR: Active/Passive; Offsite Copy: Backup Copy

Sync Data Reliably

Resilio’s P2P replication architecture eliminates single points of failure. If any endpoint in your environment goes down, the required files or services can be retrieved from any other endpoint in your system. And Resilio can dynamically route around outages and downed servers.

Further enhancing Resilio’s reliability are two key features:

  • Automatic retries: If a file transfer fails for any reason, Resilio automatically retries the transfer until it is complete.
  • Checkpoint restart: If a file transfer is interrupted, Resilio can perform a checkpoint restart to resume the transfer at the point of interruption.
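To make the two reliability features concrete, here is a small Python sketch of a transfer loop that retries failed blocks and resumes from the last confirmed offset instead of starting over. It is a generic illustration, not Resilio's code: `transfer_with_resume`, the `send_block` callback, and the use of `ConnectionError` to signal a dropped link are all assumptions.

```python
from typing import Callable


def transfer_with_resume(
    payload: bytes,
    send_block: Callable[[int, bytes], None],
    block_size: int = 4,
    max_attempts: int = 5,
) -> int:
    """Send payload block by block; return the number of bytes delivered.

    offset acts as the checkpoint: it only advances after a block is
    confirmed, so a retry resumes there rather than at byte zero.
    """
    offset = 0
    attempts = 0
    while offset < len(payload):
        block = payload[offset:offset + block_size]
        try:
            send_block(offset, block)
        except ConnectionError:
            attempts += 1
            if attempts >= max_attempts:
                raise  # give up after too many consecutive failures
            continue  # automatic retry of the same block
        offset += len(block)
        attempts = 0  # a successful block resets the failure counter
    return offset
```

In a real system the checkpoint would be persisted so that a restarted process could also resume, but the in-memory version shows the control flow.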

Scale Organically to Support Environments of Any Size

In a point-to-point system, adding more endpoints increases the time it takes to reach full synchronization. 

But, since all endpoints in a P2P environment can work together, Resilio is organically scalable and can handle ever-increasing workloads. In other words, adding more endpoints only increases replication speed and resources (CPU, bandwidth, etc.).

For example, Resilio can sync 200 endpoints in roughly the same time it takes most point-to-point solutions to sync just two. And Resilio can also sync objects of any size, type, and number (our engineers successfully synchronized 450+ million files in a single job).
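A rough back-of-the-envelope model helps show why chunked, cooperative distribution scales differently from one-by-one copying. The Python sketch below is a deliberately simplified toy, not a measurement of Resilio or any other product: it assumes every link has the same speed, that a hub sends the full file to each receiver in turn, and that the P2P case behaves like a chain in which each endpoint forwards a chunk as soon as it has it. Both function names are hypothetical.

```python
def sequential_time(endpoints: int, transfer_seconds: float) -> float:
    """Hub copies the whole file to each of the other endpoints, one by one."""
    return (endpoints - 1) * transfer_seconds


def pipelined_chunk_time(endpoints: int, transfer_seconds: float, chunks: int) -> float:
    """Endpoints forward chunks along a chain as soon as they receive them.

    With `chunks` pieces and (endpoints - 1) hops, the last endpoint finishes
    after (chunks + hops - 1) chunk-sized time slots.
    """
    hops = endpoints - 1
    return (chunks + hops - 1) * (transfer_seconds / chunks)
```

Under these assumptions, a file that takes 100 seconds to copy once takes 400 seconds to reach four receivers sequentially, but about 103 seconds when split into 100 chunks and pipelined. Adding endpoints adds only a few chunk-sized slots instead of another full transfer.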

Sync Types

DataSync and Data Factory can only perform scheduled file syncs. Resilio can sync manually, on a fixed schedule, and in real-time.

Resilio uses optimized checksum calculations (identification markers that change when a change is made to a file) and notification events from the host OS to immediately detect and replicate file changes (and it replicates only the changed portions of files).
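The idea of replicating only the changed portions of a file can be sketched with per-block checksums. This Python example is a simplified illustration using fixed-size blocks and SHA-256, both of which are assumptions; it does not describe Resilio's actual checksum scheme, and fixed-size blocks would not handle bytes inserted in the middle of a file as well as rolling-hash approaches do.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[str]:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[start:start + block_size]).hexdigest()
        for start in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4) -> list[int]:
    """Indexes of blocks in `new` that differ from, or do not exist in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or old_digests[index] != digest
    ]
```

Only the blocks listed by `changed_blocks` would need to cross the network; unchanged blocks are already present at the destination.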

Fast, Efficient Transfers Across Any Network with WAN Acceleration

If you’re syncing data across AWS regions, Azure regions, or cloud storage platforms, you’ll likely need to transfer data over high-latency, loss-prone WANs.

Most file sync solutions provide few or no features for optimizing WAN transfers. DataSync, for example, provides a few WAN optimization features, such as incremental transfers, sparse file detection, and in-line compression. While these features can speed WAN transfers, they can’t properly maximize bandwidth utilization and ensure fast, predictable transfers across any WAN.

Resilio Connect utilizes a highly-resilient, UDP-based WAN transport protocol known as Zero Gravity Transport™ (ZGT). 

ZGT intelligently analyzes the underlying conditions of a network (such as latency, loss, and throughput over time) and automatically adjusts to maintain a consistent speed, maximize network utilization, and adapt to conditions in real-time. This allows ZGT to achieve 100x greater WAN transfer speeds than existing solutions. 

In fact, our engineers successfully transferred a 1 TB payload across Azure storage regions in 90 seconds.
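For a sense of scale, the average throughput implied by that figure can be worked out directly. The short Python sketch below assumes a decimal terabyte (10^12 bytes); the function name is hypothetical.

```python
def average_throughput_gbps(payload_bytes: int, seconds: float) -> float:
    """Average throughput in gigabits per second for a completed transfer."""
    bits = payload_bytes * 8
    return bits / seconds / 1e9


# 1 TB (assumed to be 10**12 bytes) moved in 90 seconds
one_tb_in_90s = average_throughput_gbps(10**12, 90)
```

Under that assumption the result is just under 89 Gbps, which is in the same range as the 100+ Gbps per-cluster figure quoted earlier.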

Resilio Connect vs Other WAN Optimizers

ZGT optimizes bandwidth utilization and consumption by using:

  • A congestion control algorithm: ZGT’s congestion control algorithm constantly probes the round-trip time (RTT) of a network to identify and maintain the ideal data packet send rate and maintain a uniform packet distribution over time.
  • Interval acknowledgments: ZGT sends acknowledgments for groups of packets to eliminate replication delays.
  • Delayed retransmission: ZGT retransmits lost packets once per RTT to reduce unnecessary retransmissions.
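ZGT itself is proprietary, so its algorithm is not shown here. As a loose illustration of the general idea behind RTT-based congestion control, the Python sketch below raises the send rate while the measured round-trip time stays near its observed minimum and backs off when the RTT climbs, which usually signals queues building up. Every name, threshold, and multiplier in it is an assumption chosen for the example.

```python
def next_send_rate(
    rate_mbps: float,
    rtt_ms: float,
    base_rtt_ms: float,
    tolerance: float = 0.25,
    step_up: float = 1.10,
    step_down: float = 0.80,
) -> float:
    """Pick the next send rate from the latest RTT sample.

    If the RTT exceeds the baseline by more than `tolerance`, treat it as
    congestion and slow down; otherwise probe for more bandwidth.
    """
    if rtt_ms > base_rtt_ms * (1 + tolerance):
        return rate_mbps * step_down
    return rate_mbps * step_up


def simulate(rtt_samples_ms: list[float], start_rate_mbps: float = 100.0) -> list[float]:
    """Run the controller over a series of RTT samples and return each rate."""
    base_rtt = float("inf")
    rate = start_rate_mbps
    history = []
    for rtt in rtt_samples_ms:
        base_rtt = min(base_rtt, rtt)  # lowest RTT seen so far is the baseline
        rate = next_send_rate(rate, rtt, base_rtt)
        history.append(rate)
    return history
```

Feeding `simulate` a series such as `[50, 50, 80, 50]` shows the rate climbing, dropping when the 80 ms sample arrives, and then recovering.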

Because of ZGT, Resilio can utilize any type of network connection, such as VSAT, cell (3G, 4G, 5G), WiFi, and any IP connection. And it can reliably and predictably sync data over intermittent, unreliable connections, such as at sea or in communities with underdeveloped network infrastructure — making Resilio a great solution for maritime file transfer and IoT data ingestion.

Efficient File Access with Resilio’s S3 File Gateway

You can also use Resilio Connect as an efficient file storage gateway for files stored in any cloud or on-premises file, block, or object storage. This means that Resilio enables you to sync and access files across all of your cloud and on-premises endpoints from one location.

When calculating storage costs for AWS and/or Azure, many organizations only consider cloud storage prices and forget to account for data egress costs. Moving data into both services is free. But both charge fees whenever you move data within their platforms (i.e., across cloud regions) as well as out over the internet.
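A quick estimate shows how transfer volume drives these fees. The Python sketch below compares re-sending a whole dataset every day with sending only the changed portion. The per-gigabyte rate is a placeholder argument, not a quoted AWS or Azure price, and the function names are hypothetical; substitute the rates from your provider's current pricing page.

```python
def egress_cost_usd(gigabytes: float, usd_per_gb: float) -> float:
    """Cost of moving `gigabytes` out of a region at a flat per-GB rate."""
    return gigabytes * usd_per_gb


def monthly_sync_cost(
    dataset_gb: float,
    daily_change_fraction: float,
    usd_per_gb: float,
    days: int = 30,
    delta_only: bool = True,
) -> float:
    """Estimate a month of daily syncs.

    delta_only=True sends only the changed fraction each day;
    delta_only=False re-sends the full dataset each day.
    """
    per_day_gb = dataset_gb * daily_change_fraction if delta_only else dataset_gb
    return egress_cost_usd(per_day_gb * days, usd_per_gb)
```

For a 1,000 GB dataset with 2% daily change and a hypothetical rate of 0.05 USD per GB, the delta-only estimate comes to about 30 USD for the month, against about 1,500 USD for full daily copies.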

Resilio’s file gateway is designed to be incredibly efficient in order to help organizations minimize cloud storage costs and maximize productivity. It accomplishes this through features such as:

  • Selective caching: Resilio enables you to choose which files you want to cache, so you can store frequently accessed files locally and keep infrequently accessed files in long-term cloud storage.  Doing so enables you to provide employees with faster access to mission-critical files while also reducing data transfer costs.
  • Policy-based automation: You can create policies that govern how files are synced, cached, downloaded, and purged. This enables you to automate these processes, which frees up employees and IT professionals to spend their time on more important tasks.
  • Partial downloads: Employees can download entire files/folders or download just the portions of files they need. This provides them with faster access to the data they need while also minimizing data transfer costs.
  • Unified interface: End-users can browse and access files (via SMB or NFS) through a unified file system interface that operates much like Microsoft OneDrive — ensuring everyone has a uniform view of files.
How to select the "Always keep on this device" option.
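Selective caching can be pictured as a local cache in which some files are pinned and the rest are evicted when space runs out. The Python sketch below is a generic least-recently-used cache with pinning; it is an illustration only, and the `SelectiveCache` class and its methods are hypothetical rather than Resilio's gateway logic.

```python
from collections import OrderedDict


class SelectiveCache:
    """Keep pinned files locally; evict the least recently used of the rest."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.pinned = set()
        self._entries = OrderedDict()  # file name -> size, oldest access first

    def pin(self, name: str) -> None:
        """Mark a file as always kept on this device."""
        self.pinned.add(name)

    def access(self, name: str, size: int) -> list:
        """Record an access and return the names evicted to make room."""
        self._entries[name] = size
        self._entries.move_to_end(name)  # most recently used goes last
        evicted = []
        for candidate in list(self._entries):
            if sum(self._entries.values()) <= self.capacity_bytes:
                break
            if candidate in self.pinned or candidate == name:
                continue  # never evict pinned files or the file just accessed
            del self._entries[candidate]
            evicted.append(candidate)
        return evicted

    def cached(self) -> list:
        """Names currently held locally, oldest access first."""
        return list(self._entries)
```

Evicted files would remain in cloud storage and be fetched again on the next access, so the policy trades local disk space against repeat transfer costs.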

Versatile Deployment 

As we’ve discussed, some organizations may want to keep data stored in both Azure and AWS services for extra redundancy or some other workflow-specific reason — i.e., access to features specific to either storage platform. If that’s the case (or if you want to also store data in any other cloud storage), vendor-specific solutions like DataSync and Data Factory aren’t a good option.

Vendor-specific solutions are designed to be used within their respective cloud and have limited/no functionality in other cloud platforms. For example, DataSync is designed primarily to sync objects across AWS and on-premises storage, with only limited support for Google Cloud Storage and Azure Blob Storage. So if you want to store data in both AWS and Azure (or in another cloud), you’ll need to invest in separate solutions for managing and syncing data across each, which increases the costs and complexity of managing your data environment.

The complexity of storing and managing your data in either cloud is already high, as you may be using some combination of additional AWS and Azure services — such as Amazon EC2, EFS, Azure VMs, Azure Data Lake Storage, and so on. Resilio helps reduce that complexity by enabling you to sync and manage data stored across your entire environment from one location.

Resilio Connect is an extremely versatile solution that supports any:

  • Cloud provider: Resilio works with just about any cloud storage solution, such as AWS S3, Azure Blobs, Google Cloud Platform, MinIO, Wasabi, Backblaze, and more.
  • Device: You can install Resilio agents on file servers, NAS/DAS/SAN devices, laptops, desktops, mobile devices (Resilio offers iOS and Android apps), IoT devices, and virtual machines.
  • Operating system: You can use Resilio with Windows, macOS, Linux (including Ubuntu), Unix, FreeBSD, OpenBSD, and more.

Resilio Connect works with many cloud storage providers, such as AWS, Google Cloud Platform, Microsoft Azure, Wasabi, MinIO, Oracle, and more.

Because of Resilio’s flexibility, you can easily install it on your existing IT infrastructure with minimal operational interruption and begin replicating in as little as two hours.

Centralized Management, Granular Control, and Easy Automation

Resilio enables you to manage replication across your entire data environment — on-premises and cloud — from one centralized location.

Resilio’s Management Console provides granular control over replication jobs and each endpoint. You can:

  • Manage files stored in any cloud
  • Create, control, and monitor replication jobs
  • Collect real-time performance metrics
  • Manage and monitor Resilio agents and job functions
  • Configure notifications to be delivered to email or Webhooks
  • Deploy instructions across private, public, or hybrid cloud storage
  • Adjust replication parameters, such as disk I/O, buffer size, hashing, and more

Resilio also provides granular control over bandwidth allocation. You can manually adjust bandwidth at each endpoint or create profiles that govern how much bandwidth is allocated to each endpoint at certain times of the day and on certain days of the week.
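A bandwidth profile of this kind is essentially a small lookup table keyed by day and hour. The Python sketch below shows one way such a schedule could be modeled. It is a hypothetical data structure for illustration, not the format Resilio uses, and the names `BandwidthRule` and `limit_for` are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BandwidthRule:
    days: frozenset     # weekdays the rule applies to; 0 = Monday, 6 = Sunday
    start_hour: int     # first hour covered (inclusive)
    end_hour: int       # hour at which the rule stops applying (exclusive)
    limit_mbps: float   # bandwidth cap while the rule is active


def limit_for(moment: datetime, rules: list, default_mbps: float) -> float:
    """Return the cap from the first matching rule, or the default."""
    for rule in rules:
        in_day = moment.weekday() in rule.days
        in_hours = rule.start_hour <= moment.hour < rule.end_hour
        if in_day and in_hours:
            return rule.limit_mbps
    return default_mbps
```

For example, a single rule covering weekdays from 9:00 to 18:00 at 50 Mbps, with a higher default, would throttle replication during office hours and let it run faster overnight and at weekends.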

Edit bandwidth schedule 'default'

With Resilio’s powerful REST API, you can automate replication jobs and script any type of functionality a job requires. Resilio provides three scripting triggers:

  • Before a job starts
  • After a job completes
  • After all jobs are complete

Bulletproof Native Data Security Features

Many file sync solutions don’t include native security features. This forces you to invest in 3rd-party security solutions and VPNs.

Resilio protects your data with built-in security features that were reviewed by 3rd-party security experts (Resilio holds TPN Blue and SOC2 certifications), such as:

  • End-to-end encryption: Resilio encrypts data at rest and in transit using AES 256-bit encryption.
  • Mutual authentication: Resilio requires each endpoint to provide an authentication key before receiving any files, ensuring your data is only delivered to approved endpoints.
  • Cryptographic integrity validation: Resilio validates files using cryptographic integrity validation to ensure files arrive at their destination intact and uncorrupted.
  • Forward secrecy: Resilio encrypts sessions with one-time session encryption keys.
  • Access controls: You can control permissions for who’s allowed to access specific files and folders.
Mutual Authentication: Data is only delivered to designated endpoints; In-Transit Encryption: Data can't be intercepted or hacked; Integrity Validation Process: Ensures data remains intact
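Cryptographic integrity validation generally means that the receiver recomputes a keyed digest over what arrived and compares it with the one the sender attached. The Python sketch below uses HMAC-SHA256 from the standard library to show the principle. The choice of HMAC-SHA256 and the function names are assumptions for illustration; the source does not state which construction Resilio uses.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the payload with a shared key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload: bytes, key: bytes, tag: str) -> bool:
    """Check the tag in constant time; False means tampering or corruption."""
    return hmac.compare_digest(sign(payload, key), tag)
```

A receiver that gets `False` from `verify` would reject the data and request it again, so a corrupted or modified file is never accepted as complete.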

Use Resilio to Sync Data Across Clouds and Cloud Regions

Resilio is the best solution for hybrid and multi-cloud file synchronization because it provides:

  • High-performance replication: Resilio’s P2P architecture enables it to sync data 3-10x faster than competing solutions, scale organically to support big data transfers and large environments, sync in any direction, and reliably deliver files to their destination. It can also perform manual, scheduled, and real-time syncs.
  • Network optimization: Resilio’s proprietary WAN acceleration protocol optimizes file transfers over any network, including high-latency, loss-prone WANs. It provides fast, predictable file transfers over any type of connection.
  • Efficient file access: Resilio can be used as an object storage gateway that provides low-latency access to files stored in any block, file, or object storage service. It’s designed to enhance efficiency, minimize cloud storage costs, and maximize productivity.
  • Versatile deployment: Resilio is an incredibly versatile solution that supports just about any type of device, cloud storage platform, and operating system. You can easily deploy Resilio on your existing IT infrastructure with minimal operational interruption.
  • Centralized management and automation: You can manage your entire hybrid or multi-cloud data environment from Resilio’s Management Console. It provides granular control over replication jobs, bandwidth allocation, and each endpoint. And you can automate replication jobs with Resilio’s REST API.
  • Native security: Resilio includes native security features that protect your data at rest and in transit.

Organizations in gaming, media, retail, tech, and more use Resilio Connect to sync data across on-premises, cloud, and hybrid cloud environments. To learn more about how Resilio can provide your organization with fast, efficient, and automated data sync and access, schedule a demo with our team.
