Start synchronizing faster to multiple locations, handling large files and folders with ease, and overcoming WAN synchronization limitations
Rsync is one of the most well-known tools for synchronizing and sending data over the network. It was the very first tool that combined file synchronization with delta encoding and compression, three basic technologies required for optimal data transfer between machines. It is commonly used to:
- Keep folders in sync between two offices. DevOps and IT departments face this challenge when people in several offices work on the same documents and those documents need to be synchronized between offices. The offices can be located anywhere in the world.
- Distribute files from one office to another or several offices. Usually, DevOps face this problem when they need to deliver files from one location to another. A good example of this use case is delivering builds, videos or other files from the main office to regional offices.
- Consolidate data from one or several offices. This usually represents a need to back up or consolidate data from several offices to a central location.
While the initial combination of technologies made rsync useful and fast, in today’s world it lacks key components for delivering fast synchronization speeds, leading users to look for rsync alternatives.
Problems with Rsync
Working over long WAN connections
You usually see this when you need to send data across an ocean, or to offices connected over mobile or satellite links. These connections have very long round-trip times and can suffer packet loss. TCP, on which rsync is based, treats every packet loss as a sign of network congestion and reduces its sending rate to lighten the load on the connection. This approach helps TCP-based applications share a network and collectively converge on the maximum speed they can use for data transfer. On wide-area networks (WAN), however, packet loss often comes from a failure on an intermediate device rather than from a congested channel. Reducing speed after every packet loss is therefore not appropriate for WAN connectivity.
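To get a feel for how loss and latency combine, the sketch below uses the widely cited Mathis approximation for loss-based TCP congestion control, throughput ≈ (MSS / RTT) × (C / √p). It is a rough model and not a measurement of rsync; the segment size, loss rate, and link labels are illustrative assumptions.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate ceiling on steady-state TCP throughput, in bits per second.

    Uses the Mathis et al. approximation for loss-based congestion control.
    Treat the result as an order-of-magnitude estimate only.
    """
    c = math.sqrt(3.0 / 2.0)  # model constant, roughly 1.22
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Assumed values: 1460-byte segments and 0.1% packet loss on every link.
    for label, rtt_ms in [("LAN", 1), ("intercontinental", 100), ("satellite", 800)]:
        bps = mathis_throughput_bps(1460, rtt_ms / 1000, 0.001)
        print(f"{label:>16}: about {bps / 1e6:.2f} Mbit/s")
```

In this model the ceiling falls in direct proportion to RTT, so the same loss rate that is harmless on a LAN becomes a severe limit on a long link, regardless of how much bandwidth the link has.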
TCP/IP guaranteed data delivery
TCP guarantees delivery by requiring the recipient to acknowledge each packet that arrives. Once the recipient gets a packet, it sends a confirmation packet back to the sender. The time it takes for a packet to reach the receiver and for the acknowledgment to come back is called the round-trip time (RTT). On a local network (LAN) it is typically well under 1 ms, but on WAN links it can be 800 ms or more. A sender that has filled its window of unacknowledged data can therefore wait close to a second, or even longer, before it is able to send more. Rsync inherits these TCP/IP deficiencies, and overcoming them requires specialized hardware or software.
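The cost of waiting for acknowledgments can be estimated with simple arithmetic: a sender can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by RTT. The sketch below is a back-of-the-envelope calculation; the window size and link speed are assumed example values.

```python
def window_limited_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / rtt_s


def window_needed_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return link_bps * rtt_s / 8


if __name__ == "__main__":
    rtt = 0.8  # 800 ms, the WAN figure used in the text
    # 65535 bytes is the largest TCP window without the window scaling option.
    bound = window_limited_throughput_bps(65535, rtt)
    print(f"64 KiB window at 800 ms RTT: at most {bound / 1e6:.2f} Mbit/s")
    need = window_needed_bytes(1e9, rtt)
    print(f"Window needed to fill 1 Gbit/s at 800 ms RTT: {need / 1e6:.0f} MB")
```

Modern TCP stacks scale the window well beyond 64 KiB, but the relationship still holds: the longer the round trip, the more data must be in flight to keep the link busy.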
Transferring files to more than one destination
It is extremely rare that an organization needs to send or synchronize files to just one location or server. Usually there is more than one destination server, in more than one location. In such a case, a common approach is to execute jobs one by one: you send files to one location, then to another. The time adds up quickly, and what was fast for a one-to-one transfer can make rsync very slow.
Tens of millions of files and real-time change detection
Rsync is not optimized for a large number of files. It is usually very slow when you need to synchronize folders with a few million files, because it has to scan through the whole folder, find the changes, and only then transfer them. A better approach is to use real-time file system monitoring to pick up changes on the fly, without walking the whole directory tree.
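The cost of scan-based change detection is easy to see in a small sketch. The code below is not rsync's implementation; it is a minimal illustration, using hypothetical helper names, of detecting changes by walking a tree and comparing file size and modification time between two scans.

```python
import os


def snapshot(root: str) -> dict:
    """Walk the whole tree and record (size, mtime) for every file.

    The work grows with the total number of files, even if nothing changed.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            state[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return state


def diff_snapshots(old: dict, new: dict):
    """Return sorted lists of added, removed, and changed relative paths."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, removed, changed
```

Every call to `snapshot` touches every file, so a tree with millions of entries pays the full scan cost to find a single change. Event-based monitoring avoids this by having the operating system report only the paths that changed.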
Rsync and Dynamic IPs
Rsync needs static IPs to establish a connection. If a machine has a new IP, rsync stops operations and needs human intervention.
Rsync and remote script execution
Rsync can be wrapped in a script to perform additional operations after a file is delivered or folders are synchronized. However, this becomes tricky when there is more than one destination and script execution must be coordinated across all of them (e.g. a software patch that should only be applied once all machines have it). If you add in a mix of operating systems (Linux, Windows, macOS), developing cross-system synchronization of events becomes even more complex.
Enough about problems, let’s talk solutions. Resilio Connect alleviates many of the limitations that organizations running rsync encounter. It’s peer-to-peer synchronization, rebuilt for the enterprise, and it offers significant performance improvements over rsync in all sorts of scenarios. Let’s cover these in more detail.
How to make rsync faster?
It’s hard. Performance is limited because its basic set of technologies is limited to delta encoding and compression.
You would need to replace rsync to get faster performance. Our solution, Resilio Connect, adds peer-to-peer data transfer, WAN optimization, smart routing and real-time file system monitoring in order to speed up syncing for today’s enterprise.
Rsync & large files?
It is possible, but slow. Rsync doesn’t have an optimized way of calculating file checksums, so working out the differences in a large file can take an extremely long time.
Resilio Connect optimizes the checksum calculations so that it can sync very fast for files of any size.
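To make checksum-based difference detection concrete, here is a deliberately simplified sketch that hashes fixed-size blocks and reports which blocks differ between two versions of a file. It is neither rsync's algorithm nor Resilio Connect's: rsync pairs a rolling weak checksum with a strong hash so it can also match data that has shifted position, which this version does not attempt. The function names and the 4096-byte block size are assumptions for illustration.

```python
import hashlib


def block_digests(data: bytes, block_size: int = 4096) -> list:
    """Hash each fixed-size block of the data (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> list:
    """Return indexes of blocks in `new` that differ from, or are absent in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        index
        for index, digest in enumerate(new_digests)
        if index >= len(old_digests) or old_digests[index] != digest
    ]
```

Only the blocks returned by `changed_blocks` would need to cross the network, but note that both versions still have to be read and hashed in full, which is where the time goes for very large files.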
Rsync & large folders?
That would be extremely challenging. Rsync scans the folder file by file and it could take hours or days before rsync discovers the changed file and transfers it to the destination.
Connect uses real-time notification events from the OS to detect changed files. As a result, a changed file is delivered to its destination much faster than with rsync, and this holds true for any folder size.
Rsync & end-to-end encryption
It is insecure to use rsync without additional encryption. The lack of embedded traffic encryption requires people to install and configure additional encryption channels such as SSH or VPN.
Any good rsync replacement should include encryption in the product, so that no additional products are required. Connect uses AES-128 in CTR mode to encrypt all traffic sent between clients. This includes both the data and all the control traffic.
Rsync & static IP addresses
Dynamic network environments present a challenge to rsync. Rsync requires a static IP address and port for a destination. This exposes another problem: as soon as the IP address of the server changes, rsync fails to operate.
Resilio Connect uses a dynamic routing approach. When a rule specifies that machine A and B need to exchange data, both machines use tracker or multicast to discover IP:Port addresses on the fly. No human intervention necessary.
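The general idea of tracker-based discovery can be sketched with a toy in-memory registry: peers announce their current address under a shared job identifier and look each other up by identity rather than by a fixed IP. This is an illustrative model only and not Resilio Connect's tracker protocol; the `Tracker` class, its methods, and the 60-second expiry are all hypothetical.

```python
import time


class Tracker:
    """Toy registry mapping a share ID to the latest known address of each peer."""

    def __init__(self, ttl_s: float = 60.0):
        self._peers = {}  # share_id -> {peer_id: (ip, port, last_seen)}
        self._ttl_s = ttl_s

    def announce(self, share_id, peer_id, ip, port, now=None):
        """Record or refresh a peer's current address."""
        now = time.monotonic() if now is None else now
        self._peers.setdefault(share_id, {})[peer_id] = (ip, port, now)

    def lookup(self, share_id, asking_peer, now=None):
        """Return addresses of the other peers that announced within the TTL."""
        now = time.monotonic() if now is None else now
        return {
            peer_id: (ip, port)
            for peer_id, (ip, port, seen) in self._peers.get(share_id, {}).items()
            if peer_id != asking_peer and now - seen <= self._ttl_s
        }
```

Because a peer re-announces whenever its address changes, the other side picks up the new IP and port on its next lookup, with no configuration change.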
Rsync & long WAN connection
Rsync over a long or high-latency connection will fail to utilize the available bandwidth. The long distance between offices makes TCP packet travel time long and increases the chance of packet loss due to equipment failure or congestion. TCP slows down significantly on these types of networks, and because rsync is based on TCP, rsync will be slow too.
Any good rsync alternative should have WAN optimization built in, which Resilio Connect does. With Connect you can utilize 100% of the available bandwidth in your network independent of distance, latency, or loss. To achieve that, Connect uses a UDP-based protocol called uTP2, which relies on bulk packet transfer with selective acknowledgment of lost packets.
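The principle behind bulk transfer with selective acknowledgment can be shown with a toy simulation: the sender transmits everything that is outstanding, the receiver reports only the sequence numbers it is missing, and the sender resends just those. This is not the uTP2 protocol, whose details are not described here; the function name and the scripted loss plan are assumptions made so the example is deterministic.

```python
def transfer_with_selective_ack(packets, drop_plan):
    """Simulate rounds of bulk sending with selective retransmission.

    packets:   list of payloads, indexed by sequence number
    drop_plan: list of sets; drop_plan[r] holds the sequence numbers lost in
               round r (a scripted stand-in for real network loss)
    Returns (received, sent_per_round).
    """
    received = {}
    pending = list(range(len(packets)))
    sent_per_round = []
    round_no = 0
    while pending:
        lost = drop_plan[round_no] if round_no < len(drop_plan) else set()
        sent_per_round.append(len(pending))
        for seq in pending:
            if seq not in lost:
                received[seq] = packets[seq]
        # The receiver reports only what is still missing, so the next round
        # carries just those packets instead of restarting the stream.
        pending = [seq for seq in pending if seq not in received]
        round_no += 1
    return received, sent_per_round
```

With ten packets and losses of packets 2 and 7 in the first round, only those two are sent again; the other eight are never retransmitted and the sending rate is not cut in response to the loss.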
Rsync & multiple destinations
Rsync to multiple destinations is very inefficient. One option is to run multiple rsync instances. This splits the network channel and increases the time it takes to complete any single destination. Another approach is to transfer the file to one destination and then to the next. This way each transfer uses the full bandwidth, but the second destination has to wait until the transfer to the first one finishes. Both of these solutions are slow and fragile.
A good rsync alternative will use a peer-to-peer approach that leverages networking between all offices and significantly speeds up data transfer. This approach splits each file into blocks and sends these blocks independently. Each recipient can pass a block on to other recipients as soon as it has received it. This dramatically speeds up syncing, because the file is not only transferred concurrently to several machines, but the recipients’ network channels also take load off the sender’s network channel. This is the approach Resilio Connect takes.
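The difference between these strategies can be estimated with simple arithmetic. The sketch below compares one-by-one transfers, parallel transfers that share the sender's uplink, and the standard textbook lower bound for peer-to-peer distribution in which recipients re-share blocks. It is an idealized model that ignores latency, loss, and protocol overhead, and the example sizes and link speeds are assumptions.

```python
def sequential_time(size_mbit: float, uplink_mbps: float, destinations: int) -> float:
    """Seconds until the last destination finishes when jobs run one by one."""
    return destinations * size_mbit / uplink_mbps


def parallel_time(size_mbit: float, uplink_mbps: float, destinations: int) -> float:
    """Seconds until completion when all streams share the uplink equally."""
    return size_mbit / (uplink_mbps / destinations)


def p2p_lower_bound(size_mbit: float, sender_up: float, peer_ups: list,
                    min_down: float) -> float:
    """Idealized minimum distribution time when peers re-share blocks."""
    n = len(peer_ups)
    return max(
        size_mbit / sender_up,                         # sender must upload one copy
        size_mbit / min_down,                          # slowest downloader
        n * size_mbit / (sender_up + sum(peer_ups)),   # total upload capacity
    )


if __name__ == "__main__":
    size = 80_000  # a 10 GB file expressed in megabits
    print(sequential_time(size, 100, 5))                 # one by one
    print(parallel_time(size, 100, 5))                   # five streams at once
    print(p2p_lower_bound(size, 100, [100] * 5, 100))    # peers re-share blocks
```

In this model the sequential and parallel approaches both grow linearly with the number of destinations, while the peer-to-peer bound stays close to the time of a single transfer as long as the recipients can upload to each other.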
Rsync & NAT
Rsync doesn’t mix well with NAT. You need to forward ports for rsync to be able to reach devices behind a NAT.
Connect uses NAT traversal techniques that establish a direct connection between computers without manual configuration.
Here is a handy summary table of the features needed in a synchronization solution today, and how Resilio Connect stacks up against rsync.
|Feature|Resilio Connect|rsync|
|Dynamic IP support|+|–|
|Real-time file sync|+|–|
To see how much faster your syncing could be with Resilio Connect, fill out the “Schedule a demo” form below. We’ll help you calculate your time savings over the LAN and WAN combined.