Rsync Alternative: Resilio Connect

Rsync Overview

Rsync is still a popular tool for synchronizing smaller data sets in basic one-way (and, with scripting, two-way) file sync scenarios. When Andrew “Tridge” Tridgell and Paul Mackerras developed rsync for Unix-like systems in the mid-1990s, file sizes and file systems were relatively small: capacities were counted in gigabytes (not petabytes), with no more than a few thousand files per file system.

For smaller data sets across relatively low-latency networks, rsync provides an efficient unidirectional approach. On each run, rsync walks the source and target file trees to build a file list and, by default, treats a file as changed when its size or modification time differs (rsync’s “quick check”). For each changed file, the target splits its existing copy into fixed-size “blocks” and sends the source a weak rolling checksum and a strong checksum for every block. The source then slides a window over its own version of the file, looks for matching blocks, and sends only references to the matched blocks plus the literal data that did not match. Rsync communicates over TCP, typically through an SSH connection or the rsync daemon.
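The block-matching step above can be sketched in a few lines. This is a simplified illustration of the rolling-checksum idea from the rsync algorithm, not rsync’s actual code: it assumes in-memory byte strings, uses MD5 as the strong checksum (real rsync negotiates its strong hash depending on version), and simply sends a trailing partial block as literal data.

```python
import hashlib

M = 1 << 16  # checksum components are kept modulo 2**16, as in the rsync paper


def weak_checksum(window: bytes) -> tuple:
    """Return the (a, b) components of the rolling checksum for one window."""
    n = len(window)
    a = sum(window) % M
    b = sum((n - i) * x for i, x in enumerate(window)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one byte to the right in O(1) time."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def key(a: int, b: int) -> int:
    return a + (b << 16)


def signatures(old: bytes, block_size: int) -> dict:
    """Target side: weak key -> list of (block index, strong digest)."""
    table = {}
    for start in range(0, len(old), block_size):
        block = old[start:start + block_size]
        table.setdefault(key(*weak_checksum(block)), []).append(
            (start // block_size, hashlib.md5(block).hexdigest()))
    return table


def delta(new: bytes, table: dict, block_size: int) -> list:
    """Source side: emit ("copy", index) for matched blocks, ("data", bytes) otherwise."""
    ops, literal, i, n = [], bytearray(), 0, block_size
    if len(new) >= n:
        a, b = weak_checksum(new[:n])
    while i + n <= len(new):
        match = None
        candidates = table.get(key(a, b), [])
        if candidates:  # weak hit: confirm with the strong hash before trusting it
            strong = hashlib.md5(new[i:i + n]).hexdigest()
            match = next((idx for idx, d in candidates if d == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += n
            if i + n <= len(new):
                a, b = weak_checksum(new[i:i + n])
        else:
            literal.append(new[i])
            if i + n < len(new):
                a, b = roll(a, b, new[i], new[i + n], n)
            i += 1
    literal.extend(new[i:])  # any tail shorter than one block is sent literally
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block_size: int) -> bytes:
    """Target side: rebuild the new file from its old copy plus the delta."""
    out = bytearray()
    for kind, value in ops:
        out += old[value * block_size:(value + 1) * block_size] if kind == "copy" else value
    return bytes(out)
```

Because the weak checksum can be rolled forward one byte at a time, the source can test every offset in a file cheaply and only pays for a strong hash on candidate matches.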

Rsync’s major limitation is the time it takes to scan the source and target file systems for changes and to compare and synchronize those changes over networks of varying quality. Delta encoding and compression give rsync some degree of optimization. Yet, without the ability to capture incremental file changes in real time, rsync is not a practical solution for larger file systems containing millions of files, nor is it well suited to more complex scenarios requiring multi-directional synchronization over WANs.

Depending on factors such as file system size and network conditions, rsync may be useful in scenarios such as:

  1. Sync files and folders between two (2) offices. The offices can be located anywhere in the world, as long as the file systems are small and network conditions are good (low latency and minimal packet loss).
  2. Distribute files from one office to one or several other offices. DevOps teams, for example, face this problem when delivering builds, videos, or other files from the main office to regional offices.
  3. Consolidate (ingest) data from one or several offices into a single office.

Why Resilio Connect instead of Rsync?

Resilio Connect is a superior alternative to rsync when enterprise customers require:

  • Real-time synchronization of files anywhere in the world, with updates captured and propagated in near real time.
  • Scalability to support large file systems (measured in TBs and PBs) containing many files (sometimes millions) of varying sizes, from small to very large.
  • Flexible N-way synchronization (uni-directional, bi-directional, multi-directional, or full mesh).
  • Moving data over unreliable networks such as cellular, satellite, or WAN links: Resilio Connect is WAN-optimized and works over any network, with built-in compression, delta detection, and efficient recovery from failures to minimize data transfer.
  • Centralized management: all jobs are centrally managed and easily configured for Distribution, Sync, Consolidation, and Scripting.
  • Automation: all data movement jobs can be automated, scheduled, scripted, or integrated into workflows through a complete REST API.
  • Multi-cloud readiness with your cloud storage vendor of choice, for on-prem, hybrid, or cloud-native deployments.

Other Rsync Limitations

In today’s world of massive data sets, rsync’s architecture poses a number of challenges to data-intensive global enterprises.

Some of these challenges include:

No real-time file change detection

As noted earlier, rsync is not optimized for real-time change detection across large numbers of files, and it is usually very slow to synchronize folders with millions of files. Each run is limited by the time it takes to scan large folders, find changes, and transfer those changes, so as the directory structure grows in size and complexity, rsync takes longer and longer to replicate changes. Resilio Connect takes a different approach: it monitors the file system in real time to detect and replicate changes as they happen.

Poor scalability

Rsync is notoriously slow at synchronizing folders with large numbers of files. As file systems grow into the millions of files, using rsync may become impractical, because every run must scan each folder, find changes, and only then transfer them.
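To see why scan time grows with file count, here is a minimal single-machine sketch of scan-based change detection. It compares each file’s size and modification time, the same criteria rsync’s default “quick check” uses, but it is an illustration rather than rsync’s implementation. The key point is that every run must stat every file, even when only one has changed.

```python
import os


def snapshot(root: str) -> dict:
    """Walk the whole tree and record (size, mtime_ns) for every regular file."""
    state = {}
    stack = [root]
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    state[os.path.relpath(entry.path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def changed_files(before: dict, after: dict) -> set:
    """Paths that were added, removed, or whose size or mtime differ between two scans."""
    return {p for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
```

With one million files, finding a single edited file this way still costs one million `stat` calls per run, which is the cost that OS-level change notifications avoid.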

Rsync and WAN connections

Rsync is slow over WANs and unreliable networks (cellular, VSAT, and so on) with long round-trip times and varying degrees of packet loss. Rsync uses TCP/IP as its transport. Standard TCP congestion control treats packet loss and acknowledgement timeouts as signs of network congestion and backs off the sending rate to reduce the load on the connection. This lets TCP-based applications share a network and collectively settle on the rate each can use. On a wide-area network, however, a delay or a packet loss does not necessarily mean the network is congested. As a result, the congestion-control logic that rsync inherits from TCP is poorly suited to WAN connectivity.
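The effect of loss and latency on a single TCP flow can be estimated with the well-known Mathis et al. approximation, BW ≤ (MSS / RTT) × (C / √p) with C = √(3/2). This is a rough ceiling for classic loss-based (Reno-style) congestion control; newer algorithms such as CUBIC or BBR behave differently, so treat the numbers as illustrative.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on one loss-based TCP flow's throughput, in bits/s.

    Mathis et al. model: BW <= (MSS / RTT) * (C / sqrt(p)), with C = sqrt(3/2).
    """
    c = math.sqrt(1.5)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# Same 0.1% loss rate and 1460-byte segments, increasingly distant endpoints.
for rtt_ms in (10, 50, 100, 200):
    mbps = mathis_throughput_bps(1460, rtt_ms / 1000, 0.001) / 1e6
    print(f"RTT {rtt_ms:>3} ms, 0.1% loss -> ~{mbps:.1f} Mbit/s per TCP flow")
```

At 100 ms RTT and 0.1% loss, the model caps a single flow at roughly 4.5 Mbit/s no matter how fast the underlying link is, and doubling the RTT halves that ceiling.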

Quickly transfer files to more than one destination

It is rare these days for an organization to send or copy files to just one location or server; most companies need to synchronize across multiple locations or servers. A common approach with rsync and FTP is therefore to “follow the sun”: jobs run one at a time, and each job starts only after the previous one completes. What was reasonably quick for a one-to-one transfer becomes very slow when it has to be repeated many times, usually from the command line, in sequence.
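A “follow the sun” job is often little more than a loop around rsync. The sketch below assumes rsync and SSH access are already set up; the destination hosts and paths are hypothetical placeholders, and the `runner` parameter exists only so the loop can be exercised without a network.

```python
import subprocess

# Hypothetical destinations, for illustration only.
DESTINATIONS = ["user@branch-a:/data/builds/", "user@branch-b:/data/builds/"]


def follow_the_sun(source: str, destinations: list, runner=subprocess.run) -> list:
    """Push `source` to each destination one after another.

    Returns (destination, exit code) pairs. Each transfer starts only after the
    previous one finishes, so the last destination waits for all earlier ones.
    """
    results = []
    for dest in destinations:
        # -a: archive mode (recursive, preserves metadata); -z: compress in transit
        proc = runner(["rsync", "-az", source, dest])
        results.append((dest, proc.returncode))
    return results
```

Calling `follow_the_sun("/srv/builds/latest/", DESTINATIONS)` runs the transfers strictly in sequence, so total time grows linearly with the number of destinations.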

Rsync and dynamic IPs

Rsync needs a static, known address to establish a connection: each job targets a fixed host name or IP address. If a machine gets a new IP, the transfer fails, and someone must intervene to resume it.

Rsync and remote script execution

Rsync can be wrapped in a script to perform additional operations after a file is delivered or folders are synchronized. However, this becomes tricky with more than one destination, especially when script execution must be coordinated across all destinations (e.g., a software patch that should be applied only once every machine has received it). Add a mix of operating systems (Linux, Unix, Windows, macOS), and coordinating events across platforms becomes even more complex.
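A minimal wrapper for the patch scenario has two phases: deliver the payload everywhere, then run the script only if every delivery succeeded. This sketch assumes Unix-like hosts reachable over SSH; the host names, directory, and command are hypothetical, and `runner` is injectable only for testing.

```python
import subprocess


def sync_then_run(source, hosts, remote_dir, command, runner=subprocess.run):
    """Phase 1: rsync `source` to every host. Phase 2: run `command` on every host.

    Phase 2 happens only if phase 1 succeeded everywhere. Returns None if the
    gate was not passed, otherwise a {host: exit code} dict for the command.
    """
    for host in hosts:
        if runner(["rsync", "-az", source, f"{host}:{remote_dir}"]).returncode != 0:
            return None  # at least one host lacks the payload, so run nowhere
    # Collect every exit code first so one failure doesn't stop the others.
    return {host: runner(["ssh", host, command]).returncode for host in hosts}
```

Even this small gate leaves open questions the wrapper does not answer: retries, partial failures in phase 2, and hosts without SSH (such as typical Windows machines) all need extra handling.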

Resilio Connect: An Rsync Alternative

Through real-time data synchronization and other key functionality, Resilio Connect scales out data movement in parallel over any network, delivering transfer performance up to 20x faster than rsync. Resilio enables true multidirectional (n-way) data movement to overcome transfer bottlenecks over any distance and between any locations.

Architecturally, Resilio Connect is an agent-based solution. Resilio Agents are installed on all devices participating in data movement jobs. Job types include Distribution, Consolidation, Scripting, and Synchronization. Resilio Connect agents support popular operating systems such as Windows, macOS, Linux, FreeBSD, and Android. Connect also supports popular virtualization platforms, servers, storage, NAS devices, networks, and cloud storage service providers.

The Resilio Connect Management Console is a centralized, web-based management system used to manage and monitor all job functions through an easy-to-administer graphical user interface. Resilio also offers a complete API that exposes and automates all functions performed by the Management Console. You can install and configure the Management Console on Windows and Linux servers.

Rsync Optimization

How to make rsync faster?

It’s hard. Rsync relies on a basic set of technologies, and its performance optimizations are essentially limited to delta encoding and compression.

To get substantially faster file transfers, you would need to replace rsync. Resilio Connect adds peer-to-peer scalable data transfer, WAN optimization, and real-time file system monitoring to speed up syncing for today’s enterprise.

Rsync & large file synchronization

It is possible but slow. To find differences, rsync must read each changed file in full and compute block checksums on both sides, which takes an extremely long time across large files and large file sets. It is also not very good at recovering from connection failures: unless it is run with --partial, an interrupted transfer discards the partially received file, so a large transfer may have to start over.

Resilio Connect optimizes its checksum calculations so that it can sync files of any size faster than rsync. It also moves files in small chunks, minimizing retransmission after a failure.
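Chunk-level recovery can be illustrated with a per-chunk hash manifest: after a dropped connection, the receiver verifies what it already holds and requests only the missing or corrupt chunks. This is a generic sketch of the technique, assuming SHA-256 and a caller-chosen chunk size; Resilio’s actual chunk size and hashing scheme are not described here.

```python
import hashlib


def manifest(data: bytes, chunk_size: int) -> list:
    """SHA-256 digest of each fixed-size chunk, computed by the sender."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def missing_chunks(expected: list, received: dict) -> list:
    """Chunk indices the receiver must still fetch: absent, or present but corrupt."""
    return [i for i, digest in enumerate(expected)
            if i not in received or hashlib.sha256(received[i]).hexdigest() != digest]


def assemble(received: dict, count: int) -> bytes:
    """Join verified chunks back into the complete file."""
    return b"".join(received[i] for i in range(count))


# Demo with tiny 8-byte chunks (real systems use far larger chunks).
data = bytes(range(40))
expected = manifest(data, 8)  # 5 chunks
received = {0: data[0:8], 1: data[8:16], 3: b"corrupt!"}  # state after a dropped connection
for i in missing_chunks(expected, received):  # resend only chunks 2, 3 and 4
    received[i] = data[i * 8:(i + 1) * 8]
assert assemble(received, len(expected)) == data
```

Only three of the five chunks cross the network again; the two that arrived intact before the failure are kept.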

Rsync & transferring folders

Rsync scans each folder file by file on every run, so a change is discovered only when a scan reaches it. On very large folders, it could take hours or even days before rsync discovers a changed file and transfers it to the destination.

Resilio Connect uses real-time file-change notifications from the host OS to detect changed files, so a changed file is delivered to its destination much faster than with rsync, regardless of folder size.

Rsync & end-to-end encryption

Rsync lacks built-in encryption: the rsync daemon protocol sends data unencrypted, so using rsync securely requires an additional encryption layer such as SSH (rsync’s default remote shell) or a VPN.

Any good rsync alternative should have built-in end-to-end encryption of data in transit. Resilio Connect uses AES256 in CTR mode to encrypt all the traffic sent between endpoints.

Rsync & static IP addresses

Dynamic network environments present a challenge to rsync, which requires static IP addresses and ports for both source and destination machines. As soon as an IP address changes, rsync fails.

Resilio Connect uses a dynamic routing approach. When a rule specifies that two machines need to exchange data, both machines use a tracker or multicast to discover the addresses of each other on the fly. No human intervention is necessary when a new IP is assigned.
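The tracker idea can be pictured as a directory that peers keep refreshing with their current address. The class below is a hypothetical, in-memory toy, not Resilio’s tracker protocol; it assumes peers re-announce periodically and that stale entries expire after a TTL. The IP addresses are from documentation-only ranges and the port is arbitrary.

```python
import time


class ToyTracker:
    """In-memory stand-in for an address-discovery tracker (hypothetical).

    Peers announce their current address; lookups return the latest address
    that has not expired, so a peer whose IP changes is found on its next announce.
    """

    def __init__(self, ttl_s: float = 60.0, clock=time.monotonic):
        self.ttl_s = ttl_s
        self.clock = clock
        self._peers = {}  # peer_id -> (address, time of last announce)

    def announce(self, peer_id: str, address: str) -> None:
        self._peers[peer_id] = (address, self.clock())

    def lookup(self, peer_id: str):
        entry = self._peers.get(peer_id)
        if entry is None or self.clock() - entry[1] > self.ttl_s:
            return None  # unknown peer, or its last announce is too old to trust
        return entry[0]
```

Because each connection starts with a lookup rather than a hard-coded address, a peer that moves to a new IP is reachable again as soon as it re-announces.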

Rsync & long-haul WAN connections

Rsync fails to use the available bandwidth over long, high-latency, or lossy connections, which leads to slow transfers. Long distances between offices make TCP round trips long (high latency) and increase the chance of packet loss from equipment failure or congestion. TCP slows down significantly on these types of networks.
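Latency alone also limits TCP, because a sender can have at most one window of unacknowledged data in flight per round trip. The arithmetic below is a minimal sketch assuming a fixed window; modern operating systems enable TCP window scaling and buffer autotuning, so real ceilings are usually higher, but the data that must be in flight still grows with RTT.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return bandwidth_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Maximum throughput when at most one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


# A 1 Gbit/s link with 100 ms RTT versus a 64 KiB window (the maximum without window scaling).
print(f"needed in flight: {bdp_bytes(1e9, 0.1) / 1e6:.1f} MB")
print(f"64 KiB window ceiling: {window_limited_bps(65535, 0.1) / 1e6:.2f} Mbit/s")
```

Keeping that link full requires about 12.5 MB in flight, while a 64 KiB window tops out near 5.2 Mbit/s, well under 1% of the link.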

Resilio Connect has a built-in, pre-configured WAN optimization module. With Connect you can utilize 100% of the available bandwidth in your network independent of distance, latency, or loss. Resilio Connect uses a unique, UDP-based protocol called uTP3 that uses bulk packet transfer with selective acknowledgment of lost packets.
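Selective acknowledgment means the receiver reports exactly which packets are missing, so the sender retransmits only those instead of everything after the first loss. The sketch below shows the general idea (similar in spirit to TCP SACK blocks); uTP3’s actual wire format is not described here, and the function name is illustrative.

```python
def missing_ranges(received: set, highest_sent: int) -> list:
    """Collapse gaps in sequence numbers 0..highest_sent into inclusive (start, end) ranges."""
    gaps, start = [], None
    for seq in range(highest_sent + 1):
        if seq not in received:
            if start is None:
                start = seq  # a new gap begins here
        elif start is not None:
            gaps.append((start, seq - 1))  # gap ended just before this packet
            start = None
    if start is not None:
        gaps.append((start, highest_sent))
    return gaps


# Packets 0..9 were sent in bulk; 3, 4 and 7 were lost in transit.
received = set(range(10)) - {3, 4, 7}
gaps = missing_ranges(received, 9)
to_resend = [seq for lo, hi in gaps for seq in range(lo, hi + 1)]
print(gaps, to_resend)
```

The sender resends three packets rather than the seven that a go-back-N scheme would repeat from packet 3 onward.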

Rsync & multiple destinations

Using rsync to synchronize to multiple destinations is very inefficient. One option is to run multiple rsync instances in parallel. This splits the network channel and increases the time to complete the transfer to any single destination. Another approach is “follow the sun”, where files are transferred first to one destination and then, once that transfer completes, to a second destination. This way, each transfer uses the full bandwidth, but the second destination must wait until the transfer to the first one is complete. Both of these solutions are slow and fragile, and both leave most of your network underutilized.

Resilio Connect uses a scale-out, peer-to-peer approach that leverages networking between all offices/servers and significantly speeds up data transfers. This approach splits each file into blocks and sends these blocks independently. Each recipient can forward a block to other recipients as soon as it has received it. This dramatically speeds up syncing operations: Resilio transfers to N destinations concurrently, and it makes efficient use of network capacity that would otherwise be left unused.
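The benefit of recipients relaying blocks can be seen in a toy round-based model. This is not Resilio’s scheduling algorithm; it assumes every node can upload one block and each receiver can download one block per round, and compares that with a source that must upload every block to every receiver itself.

```python
def serial_rounds(receivers: int, blocks: int) -> int:
    """Source uploads every block to every receiver itself, one block per round."""
    return receivers * blocks


def swarm_rounds(receivers: int, blocks: int) -> int:
    """Rounds until every receiver has every block when receivers also relay blocks.

    Toy model: node 0 is the source; each node uploads at most one block and
    each receiver downloads at most one block per round.
    """
    have = [set(range(blocks))] + [set() for _ in range(receivers)]
    rounds = 0
    while any(len(h) < blocks for h in have[1:]):
        rounds += 1
        busy, transfers = set(), []
        for sender, held in enumerate(have):
            for receiver in range(1, len(have)):
                if receiver == sender or receiver in busy:
                    continue
                wanted = held - have[receiver]
                if wanted:
                    transfers.append((receiver, min(wanted)))
                    busy.add(receiver)
                    break  # this sender's upload slot is used for the round
        for receiver, block in transfers:  # apply after all choices, as if simultaneous
            have[receiver].add(block)
    return rounds


print(serial_rounds(4, 100), swarm_rounds(4, 100))
```

In this model the source alone needs 400 rounds to serve four receivers, while relaying finishes in a small fraction of that, because every receiver’s upload capacity is put to work.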

Rsync & NAT

Unfortunately, rsync doesn’t mix well with NATs. You will need to forward ports for rsync to be able to connect to devices behind a NAT.

Unlike rsync, Resilio Connect uses NAT traversal techniques that establish a direct connection between computers without needing manual configuration.

Resilio also provides a Resilio Connect Proxy Server and other enhancements in release 2.12 and later.

Resilio Connect vs. Rsync

Here is a handy summary table of the features needed in a synchronization solution today, and how Resilio Connect stacks up as an rsync alternative.

Feature               Resilio Connect   Rsync
Delta encoding        Yes               Yes
Compression           Yes               Yes
Dynamic IP support    Yes               No
Encryption            Yes               No (requires SSH or VPN)
WAN optimization      Yes               No
NAT traversal         Yes               No
Cross-platform        Yes               Yes
1M+ files             Yes               No
Big folders           Yes               No
Real-time file sync   Yes               No

Are you interested in learning more to see if Resilio Connect is the Rsync replacement or alternative you’ve been looking for?

Or would you prefer to schedule a Resilio Connect demo or start a free trial to see how much faster your syncing could be?