What is rsync?
Before looking at rsync alternatives, it helps to understand what rsync is for. Rsync is one of the most popular tools for synchronizing and sending data over a network. It was the first tool to combine three technologies that matter for efficient data transfer between machines: file synchronization, delta encoding, and compression. It is commonly used to:
- Keep folders in sync between offices. DevOps and IT departments face this challenge when people in several offices, located anywhere in the world, work on the same documents and need them synchronized.
- Distribute files from one office to one or several others. DevOps teams usually face this when they need to deliver builds, videos, or other files from the main office to regional offices.
- Consolidate data from one or several offices. This usually means backing up or centralizing data from several offices in a central location.
That combination of technologies made rsync useful and fast when it was introduced. Today, however, rsync lacks several components needed for fast synchronization, which leads users to look for rsync alternatives.
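Delta encoding depends on a checksum that can slide across a file one byte at a time without being recomputed from scratch. The following is a minimal Python sketch of a weak rolling checksum in the style described for the rsync algorithm. The names `weak_checksum` and `roll` are illustrative assumptions, not rsync's own API, and the real tool pairs a checksum like this with a stronger hash to confirm matches.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two running sums (a, b) for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes carry larger weights: n for the first byte, 1 for the last.
    b = sum((n - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a window of length n one byte to the right in constant time."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


# Slide an 8-byte window across some sample data.
data = b"the quick brown fox jumps over the lazy dog"
window = 8
a, b = weak_checksum(data[:window])
for start in range(1, len(data) - window + 1):
    a, b = roll(a, b, data[start - 1], data[start + window - 1], window)
```

Because each step costs a constant amount of work, a receiver can test every byte offset of a file against the checksums of known blocks, which is what lets a delta transfer find unchanged data that has merely shifted position.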
Rsync and WAN connections
Rsync typically runs over a WAN when you need to send data across an ocean or to offices connected by mobile or satellite links. These connections have long round-trip times and can lose packets. Rsync uses TCP/IP as its transport mechanism, and TCP treats every lost packet as a sign of network congestion, so it backs off the sending rate to reduce the load on the connection. This behavior helps TCP-based applications share a network and collectively settle on the maximum speed each can use for data transfer. On a wide-area network (WAN), however, packet loss often comes from a failure on an intermediate device rather than from a congested channel. Slowing rsync down in response to packet loss is therefore not appropriate for WAN connectivity.
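To put rough numbers on this effect, a well-known approximation by Mathis et al. bounds steady-state TCP throughput by MSS / (RTT × √p), multiplied by a constant of order 1, where p is the packet loss rate. The Python sketch below applies that approximation. The 1460-byte segment size, the constant √1.5, and the example latency and loss figures are assumptions chosen for illustration, and newer congestion-control algorithms can behave differently.

```python
import math


def mathis_throughput_bps(
    mss_bytes: int,
    rtt_s: float,
    loss_rate: float,
    c: float = math.sqrt(1.5),
) -> float:
    """Approximate upper bound on steady-state TCP throughput, in bits per second."""
    if rtt_s <= 0 or not 0 < loss_rate <= 1:
        raise ValueError("rtt_s must be positive and loss_rate must be in (0, 1]")
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


# Hypothetical links: a clean LAN versus a lossy intercontinental WAN.
lan = mathis_throughput_bps(1460, rtt_s=0.001, loss_rate=0.0001)
wan = mathis_throughput_bps(1460, rtt_s=0.200, loss_rate=0.01)
print(f"LAN bound: {lan / 1e6:.1f} Mbit/s")
print(f"WAN bound: {wan / 1e6:.2f} Mbit/s")
```

The bound falls with both higher latency and higher loss, regardless of how much raw bandwidth the link has.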
TCP/IP guaranteed data delivery
TCP/IP guarantees delivery by requiring the recipient to acknowledge that each packet arrived at the destination. Once the recipient gets a packet, it sends a confirmation back to the sender acknowledging that specific packet. The time it takes for a packet to reach the receiver and for the acknowledgment to return is called the round-trip time (RTT). Over a local network (LAN) it is typically well under 1 ms, but over a WAN it can be as high as 800 ms or more. A sender that has filled its window of unacknowledged data may therefore wait up to a second or more before it can send another packet. Rsync inherits these TCP/IP deficiencies. Overcoming these bottlenecks requires specific hardware or software, like Resilio Connect.
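The cost of waiting for acknowledgments can be estimated with a simple bound: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput cannot exceed the window size divided by the RTT. Here is a small Python sketch of that arithmetic. The 64 KiB window (about the largest TCP allows without window scaling) and the three RTT values are assumptions for illustration only.

```python
def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return window_bytes * 8 / rtt_s


WINDOW_BYTES = 64 * 1024  # assumed window size

# Assumed round-trip times for three kinds of links.
for label, rtt in [("LAN", 0.001), ("WAN", 0.100), ("satellite", 0.800)]:
    mbps = max_throughput_bps(WINDOW_BYTES, rtt) / 1e6
    print(f"{label} (RTT {rtt * 1000:.0f} ms): at most {mbps:.2f} Mbit/s")
```

With a fixed window, every increase in RTT lowers the ceiling proportionally, which is why a transfer that saturates a LAN can crawl over a long-distance link of the same nominal bandwidth.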
Quickly transfer files to more than one destination
It is extremely rare for an organization to send or synchronize files to just one location or server. Usually there is more than one destination server in more than one location. In such a case, a common approach is to execute the jobs one by one: you send files to one location, then to another. The time adds up quickly. What was reasonably fast for a one-to-one transfer becomes very slow when it has to be repeated many times.
Real-time file change detection across millions of files
Rsync is not optimized for real-time change detection across a large number of files. It is usually very slow at synchronizing folders with millions of files, because it has to scan the whole folder to find changes before it can transfer them. A better approach is to use real-time file system monitoring to pick up changes on the fly, as Resilio Connect does.
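The scanning cost is easy to see in a sketch. The Python below detects changes the scan-based way: it walks the entire tree, records each file's size and modification time, and compares two snapshots. This is a simplified illustration with hypothetical function names, not rsync's actual implementation, although rsync's default check likewise compares size and modification time.

```python
import os


def snapshot(root: str) -> dict[str, tuple[int, int]]:
    """Walk the whole tree and record (size, mtime_ns) for every file."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            state[os.path.relpath(path, root)] = (info.st_size, info.st_mtime_ns)
    return state


def changed_paths(
    old: dict[str, tuple[int, int]],
    new: dict[str, tuple[int, int]],
) -> set[str]:
    """Return paths that were added, modified, or deleted between snapshots."""
    added_or_modified = {path for path, meta in new.items() if old.get(path) != meta}
    deleted = set(old) - set(new)
    return added_or_modified | deleted
```

Every call to `snapshot` touches every file, so the work grows with the size of the folder even when only one file has changed. Event-based monitoring avoids that by having the operating system report changes as they happen.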
Rsync and dynamic IPs
Rsync needs static IPs to establish a connection. If a machine has a new IP, rsync stops operations and needs human intervention to resume file transfers.
Rsync and remote script execution
Rsync can be wrapped in a script to perform additional operations after a file is delivered or folders are synchronized. However, this becomes tricky when there is more than one destination and script execution has to be coordinated across all of them (e.g., a software patch that should only be applied once all machines have it). If you add in a mix of operating systems (Linux, Windows, macOS), developing cross-platform synchronization of events becomes even more complex.
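One way to picture the coordination problem is as a gate: the post-delivery step runs only after every destination has confirmed that it holds the file. The Python sketch below models just that bookkeeping. `DeliveryGate` and the host names are hypothetical, and a real deployment would also need transport, timeouts, and failure handling.

```python
class DeliveryGate:
    """Tracks which destinations still need to confirm delivery."""

    def __init__(self, destinations):
        self._pending = set(destinations)
        if not self._pending:
            raise ValueError("at least one destination is required")

    def confirm(self, destination: str) -> bool:
        """Record a confirmation; return True once every destination has confirmed."""
        self._pending.discard(destination)
        return not self._pending

    @property
    def pending(self) -> frozenset:
        """Destinations that have not confirmed yet."""
        return frozenset(self._pending)


# Hypothetical hosts running different operating systems.
gate = DeliveryGate(["linux-01", "windows-01", "mac-01"])
for host in ["mac-01", "linux-01", "windows-01"]:
    if gate.confirm(host):
        print("All destinations have the file; safe to apply the patch.")
```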
Resilio Connect: An Rsync Alternative
Enough about problems; let’s talk solutions. Resilio Connect alleviates many of the limitations that organizations running rsync encounter. Using peer-to-peer synchronization rebuilt for the enterprise, Resilio Connect offers significant performance improvements over rsync in all sorts of scenarios. Let’s cover these in more detail.
How to make rsync faster?
It’s hard. Rsync’s performance is limited because its core technologies stop at delta encoding and compression.
To get faster file transfer speeds with rsync, you would need to use a replacement. Our solution, Resilio Connect, adds peer-to-peer data transfer, WAN optimization, smart routing, and real-time file system monitoring to speed up syncing for today’s enterprise.
Rsync & large file synchronization
It is possible but slow. Rsync doesn’t have an optimized way of calculating the checksum of files. This leads to an extremely long time to calculate file differences across large file sets.
Resilio Connect optimizes the checksum calculations so that it can sync faster than rsync, with files of any size.
Rsync & transferring folders
Rsync is designed to scan each folder file by file. This means it could take hours or days before rsync discovers the changed file and transfers it to the destination.
Connect uses real-time notification events from the OS to detect changed files. As a result, a changed file is delivered to its destination much faster than with rsync, and this holds true for any folder size.
Rsync & end-to-end encryption
Rsync lacks end-to-end encryption, making it insecure to use rsync without additional encryption. The lack of embedded traffic encryption requires people to install and configure additional encryption channels such as SSH or VPN.
Any good rsync alternative should have built-in end-to-end encryption of data in transit. Resilio Connect uses AES256 in CTR mode to encrypt all the traffic sent between clients.
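For comparison, the SSH workaround mentioned above usually means telling rsync to use `ssh` as its remote shell. The Python sketch below only builds such a command line and prints it, without running it. The host name, user, and paths are placeholders, and the flags shown (`-a`, `-z`, `-e`) are standard rsync options for archive mode, compression, and choosing the remote shell.

```python
import shlex


def rsync_over_ssh(source_dir: str, remote: str, dest_dir: str) -> list[str]:
    """Build an rsync command that tunnels the transfer through SSH."""
    return [
        "rsync",
        "-a",         # archive mode: recurse and preserve permissions, times, links
        "-z",         # compress data in transit
        "-e", "ssh",  # use ssh as the remote shell, which encrypts the traffic
        f"{source_dir.rstrip('/')}/",  # trailing slash: copy the directory's contents
        f"{remote}:{dest_dir}",
    ]


# Placeholder host and paths for illustration only.
cmd = rsync_over_ssh("/srv/builds", "deploy@office-b.example.com", "/srv/builds")
print(shlex.join(cmd))
```

Each destination still needs SSH access and key management set up separately, which is the configuration overhead described above.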
Rsync & static IP addresses
Dynamic network environments are a major challenge for rsync. Rsync requires static IP addresses and ports for both source and destination machines. As soon as an IP address changes, rsync fails.
Resilio Connect uses a dynamic routing approach. When a rule specifies that two machines need to exchange data, both machines use a tracker or multicast to discover the addresses of each other on the fly. No human intervention is necessary.
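The tracker idea can be sketched as a small registry: each machine announces its current address under a shared job identifier, and peers look each other up when they need to connect. The in-memory Python model below is a toy under that assumption. The `Tracker` class, its method names, and the example addresses are hypothetical and do not describe Resilio Connect's actual protocol.

```python
class Tracker:
    """Toy in-memory registry mapping a job to its peers' current addresses."""

    def __init__(self):
        self._peers = {}  # job_id -> {peer_id: (ip, port)}

    def announce(self, job_id: str, peer_id: str, address: tuple) -> None:
        """Register or refresh a peer's address; the latest announcement wins."""
        self._peers.setdefault(job_id, {})[peer_id] = address

    def peers(self, job_id: str, asking_peer: str) -> dict:
        """Return the other peers known for this job."""
        known = self._peers.get(job_id, {})
        return {peer: addr for peer, addr in known.items() if peer != asking_peer}


tracker = Tracker()
tracker.announce("nightly-builds", "office-a", ("192.0.2.10", 4000))
tracker.announce("nightly-builds", "office-b", ("198.51.100.7", 4000))

# office-b gets a new IP address and simply announces again.
tracker.announce("nightly-builds", "office-b", ("198.51.100.99", 4000))
print(tracker.peers("nightly-builds", asking_peer="office-a"))
```

Because peers are identified by name rather than by address, an address change only requires a fresh announcement instead of a configuration change.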
Rsync & long WAN connection
Rsync fails to utilize the available bandwidth over long, high-latency, or lossy connections, which leads to slow rsync transfer speeds. The long distance between offices makes TCP packet travel time long (high latency) and increases the chance of packet loss due to equipment failure or congestion. TCP slows down significantly on these types of networks.
Resilio Connect has a built-in WAN optimization module. With Connect you can utilize 100% of the available bandwidth in your network independent of distance, latency, or loss. Resilio Connect uses a unique UDP-based protocol called uTP2, which relies on bulk packet transfer with selective acknowledgment of lost packets.
Rsync & multiple destinations
Using rsync to synchronize to multiple destinations is very inefficient. One option is to run multiple rsync instances. This splits the network channel between them and increases the time to complete any single destination. Another approach is to transfer the file to one destination and then to the next. This way each transfer uses the full bandwidth, but the second destination has to wait until the transfer to the first one is done. Both of these solutions are slow and fragile, and both leave most of your network underutilized.
Resilio Connect uses a peer-to-peer approach that leverages networking between all offices and servers and significantly speeds up data transfer. This optimized approach splits each file into blocks and sends these blocks independently. Each recipient can pass a block on to other recipients as soon as it has received it. This dramatically speeds up syncing, since Resilio not only transfers to several machines concurrently but also makes efficient use of network capacity that would otherwise be left unused.
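The difference can be estimated with the standard textbook fluid model of file distribution, which gives lower bounds on the time to deliver one file to N destinations. The Python sketch below compares a single sender serving everyone with peers that relay blocks to each other. The file size and link speeds are assumptions, and the results are idealized bounds from that model, not measurements of rsync or Resilio Connect.

```python
def single_sender_time(file_bits: float, n: int, sender_up: float, min_down: float) -> float:
    """Lower bound, in seconds, when one sender uploads a full copy to each of n destinations."""
    return max(n * file_bits / sender_up, file_bits / min_down)


def p2p_time(file_bits: float, sender_up: float, peer_ups: list, min_down: float) -> float:
    """Lower bound, in seconds, when destinations also relay blocks to each other."""
    n = len(peer_ups)
    return max(
        file_bits / sender_up,                         # sender must upload the file once
        file_bits / min_down,                          # slowest receiver must download it
        n * file_bits / (sender_up + sum(peer_ups)),   # total upload capacity is shared
    )


# Assumed scenario: a 10 GB file, 10 destinations, 100 Mbit/s links everywhere.
FILE_BITS = 10e9 * 8
LINK_BPS = 100e6
DESTINATIONS = 10

one_by_one = single_sender_time(FILE_BITS, DESTINATIONS, LINK_BPS, LINK_BPS)
relayed = p2p_time(FILE_BITS, LINK_BPS, [LINK_BPS] * DESTINATIONS, LINK_BPS)
print(f"single sender: {one_by_one:.0f} s, peer-to-peer: {relayed:.0f} s")
```

In this model the single-sender time grows linearly with the number of destinations, whether the copies are sent in parallel or one after another, while the peer-to-peer bound stays close to the time of a single transfer because every new destination also adds upload capacity.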
Rsync & NAT
Rsync doesn’t mix well with NATs. You will need to forward ports for rsync to be able to connect to devices behind a NAT.
Unlike rsync, Resilio Connect uses NAT traversal techniques that establish a direct connection between computers without needing manual configuration.
Resilio Connect vs. Rsync
Here is a handy summary table of the features needed in a synchronization solution today, and how Resilio Connect stacks up as an rsync alternative.
|Feature|Resilio Connect|Rsync|
|Dynamic IP support|+|–|
|Real-time file sync|+|–|
Are you interested in learning more to see if Resilio Connect is the Rsync replacement or alternative you’ve been looking for?