Start synchronizing faster to multiple locations, handling large files and folders with ease, and overcoming WAN synchronization limitations
Rsync is one of the most well-known tools for synchronizing and sending data over the internet. It was among the first tools to combine file synchronization with delta encoding and compression, three basic technologies required for efficient data transfer between machines. It is commonly used to:
- Keep folders in sync between two offices. DevOps and IT departments face this challenge when people in several offices work on the same documents, which then need to be synchronized between offices. The offices can be located anywhere in the world.
- Distribute files from one office to another or to several offices. DevOps teams usually face this problem when they need to deliver files from one location to another. A good example of this use case is delivering builds, videos, or other files from the main office to regional offices.
- Consolidate data from one or several offices. This usually represents a need to back up or consolidate data from several offices to a central location.
While the initial combination of technologies made rsync useful and fast, in today’s world it lacks key components for delivering fast synchronization speeds, leading users to look for rsync alternatives.
Typical problems users run into with rsync:
- Working over long WAN connections. You usually see this when you need to send data across an ocean or to offices on mobile or satellite links. These connections have a very long round-trip time and can lose packets. TCP, on which rsync is based, treats every packet loss as a sign of network congestion and backs off its sending rate to reduce the load on the connection. This approach helps TCP-based applications share a network and collectively settle on the maximum speed they can use for data transfer. On a wide-area network (WAN), however, packet loss may be caused by a failure on an intermediate device while the channel itself is not congested. In that case, reducing speed in response to packet loss is not appropriate.
- Another problem in the WAN relates to the way TCP guarantees data delivery. The recipient must acknowledge that each packet arrived at the destination: once it gets a packet, it sends a confirmation packet back to the sender. The time it takes for a packet to reach the receiver and for the acknowledgment to come back is called the round-trip time (RTT). On a local network (LAN) it is typically well under 1 ms, but on a WAN it can be as high as 800 ms or more. Therefore, a sender that has filled its window can wait up to a second or even more before it’s able to send more data. Rsync inherits these TCP deficiencies, and overcoming them requires specific hardware or software.
- Synchronizing or delivering files to more than one destination. It is extremely rare that an organization needs to send or synchronize files to just one location or server. Usually, there is more than one destination server, in more than one location. In such a case, a common approach is to execute jobs one by one: you send files to one location, then to another. The time adds up quickly, and what was fast for a one-to-one transfer can make rsync very slow.
- Tens of millions of files and real-time change detection. Rsync is not optimized for a large number of files. It is usually very slow when you need to synchronize folders with a few million files, since it takes a long time to scan through the folder, find changes, and transfer them. A better approach is to use real-time file system monitoring to pick up changes on the fly without the need to walk through the whole directory tree.
- Smart data routing. Rsync needs static IP:port addresses to establish a connection to different machines. If a machine has a new IP:port or is not available, rsync stops operating and needs a human to reconfigure it.
- Scripting. Rsync can be wrapped in a more extensive script to perform additional operations when a file is delivered or two folders are synchronized. However, this becomes tricky with more than one destination. We need to detect that all destinations received the files and that no destination failed, and then trigger events on the remote machines to complete the operation. If you add in a mix of operating systems such as Linux, Windows, and Mac, it becomes even more complex to develop cross-system synchronization of events.
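To illustrate the scripting problem, here is a minimal Python sketch of a wrapper that pushes one folder to several destinations in turn and reports which ones failed. The function names, the host names in the usage note, and the injectable `runner` parameter are hypothetical and used only for illustration; the sketch assumes rsync is installed and that `-a` (archive) and `-z` (compress) are the desired options.

```python
import subprocess
from typing import Callable, Dict, List, Sequence


def default_runner(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code (0 means success)."""
    return subprocess.run(cmd).returncode


def push_to_all(
    source: str,
    destinations: List[str],
    runner: Callable[[Sequence[str]], int] = default_runner,
) -> Dict[str, int]:
    """Run one rsync job per destination, one after another."""
    results: Dict[str, int] = {}
    for dest in destinations:
        # -a keeps permissions and timestamps, -z compresses data in transit.
        results[dest] = runner(["rsync", "-az", source, dest])
    return results


def failed_destinations(results: Dict[str, int]) -> List[str]:
    """Return the destinations whose rsync job exited with a non-zero code."""
    return [dest for dest, code in results.items() if code != 0]
```

Calling `push_to_all("/data/", ["office-a:/data/", "office-b:/data/"])` would run the jobs sequentially, so total time grows with the number of destinations. Retrying failures and triggering follow-up events on each remote machine would still have to be written on top of this.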
Enough about problems; let’s talk solutions. Resilio Connect alleviates many of the limitations that organizations running rsync encounter. It’s peer-to-peer synchronization, rebuilt for the enterprise, and it offers significant performance improvements over rsync in all sorts of scenarios. Let’s cover these in more detail.
How to make rsync faster
It’s impossible to make rsync faster. Performance is limited because its basic set of technologies is limited to delta encoding and compression.
You would need to replace rsync to get faster performance. Our solution, Resilio Connect, adds peer-to-peer data transfer, WAN line optimization, smart routing and real-time file system monitoring in order to speed up syncing for today’s enterprise.
Rsync & large files
It is impossible to use rsync for large files. Rsync doesn’t have an optimized way of calculating the checksum of a file, which makes calculating file differences extremely slow.
Resilio Connect optimizes the checksum calculations so that it can sync very fast for files of any size.
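To see why checksum cost matters for large files, consider a simplified Python sketch of block-level change detection. It is not rsync's actual algorithm (rsync uses a rolling checksum so that it can also match data that has shifted position) and it is not how Resilio Connect works internally. The fixed block size and the function names are assumptions chosen for the example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4  # tiny block size for the demo; real tools use much larger blocks


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash every fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return indexes of blocks in `new` that differ from, or are missing in, `old`."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [
        i
        for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]
```

Only the changed blocks need to be sent, but every byte of both copies still has to be read and hashed to find them, so the work grows linearly with file size.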
Rsync & large folders
It is impossible to use rsync for large folders. Rsync scans a folder file by file, and it can take hours or days before rsync discovers a changed file and transfers it to the destination.
Connect uses real-time notification events from the OS to detect changed files. This guarantees that a changed file will be delivered to its destination much faster than with rsync, and this holds true for any folder size.
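The scanning approach can be sketched in a few lines of Python. This is an illustration of why a full scan is expensive, not a reproduction of rsync's code: it assumes that a change is detected by comparing modification times, and the function names are hypothetical.

```python
import os
from typing import Dict, List


def snapshot(root: str) -> Dict[str, float]:
    """Map every file path under root to its last-modified time."""
    state: Dict[str, float] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime
    return state


def changed_since(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return paths that are new or whose modification time differs."""
    return sorted(
        path for path, mtime in current.items() if previous.get(path) != mtime
    )
```

Each call to `snapshot` touches every file in the tree, even when only one file changed, so the cost scales with the total number of files. An event-driven design instead subscribes to file system notifications from the OS (for example, inotify on Linux) and does work only when something changes.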
Rsync & SSH
It is impossible to use rsync securely without additional encryption. The lack of embedded traffic encryption requires people to install and configure additional encrypted channels such as SSH or a VPN.
Any good rsync replacement should include encryption in the product, so no additional products are required. Connect uses AES-128 in CTR mode to encrypt all traffic sent between clients. This includes both data and all control traffic.
Rsync & static IP addresses
It is impossible to use rsync in a dynamic network environment. Rsync requires a static IP address and port for each destination. This exposes another problem with rsync: as soon as the IP address of the server changes, rsync fails to operate.
Resilio Connect uses a dynamic routing approach. When a rule specifies that machines A and B need to exchange data, both machines use a tracker or multicast to discover each other’s IP:port addresses on the fly. No human intervention is necessary.
Rsync & WAN connection
It is impossible to use rsync over long connections between offices located 3,000 miles or more apart. These networks are usually referred to as WANs. The long distance between offices makes TCP packet travel time long and increases the chance of packet loss due to equipment failure or congestion. TCP slows down significantly on these types of networks. Rsync is based on TCP, so rsync will be slow as well.
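The effect of latency and loss on a single TCP connection can be estimated with two well-known approximations, sketched below in Python. The first bounds throughput by the window size divided by the round-trip time; the second is the Mathis approximation for loss-limited throughput. The function names and the sample numbers in the usage note are assumptions for illustration, not measurements of rsync or Resilio Connect.

```python
import math


def window_limited_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound: at most one full window can be in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


def loss_limited_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Mathis approximation: throughput ~ (MSS / RTT) * sqrt(3/2) / sqrt(loss)."""
    return (mss_bytes * 8 / rtt_seconds) * math.sqrt(1.5) / math.sqrt(loss_rate)
```

For example, `window_limited_bps(65536, 0.8)` shows that a 64 KiB window over an 800 ms round trip caps a connection at roughly 0.66 Mbit/s, no matter how much bandwidth the link has. Both functions are rough models: real TCP stacks use window scaling and newer congestion control algorithms, so actual results will differ.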
Any good rsync alternative should support WAN networks, which Resilio Connect does. With Connect you can utilize 100% of the available bandwidth in your network, independent of distance, latency, or loss. To achieve that, Connect uses uTP2, a UDP-based protocol that uses bulk packet transfer with selective acknowledgment of lost packets.
Rsync & several destinations
It is impossible to use rsync to distribute data to several destinations unless you’re running several copies of rsync in parallel. The copies of rsync will split the network channel and reduce the transfer speed to each destination. Another approach is to transfer the file to one destination and then to the next. This way each transfer uses the full bandwidth, but the second destination needs to wait until the transfer to the first one finishes. Both of these solutions can be quite slow.
A good rsync alternative will use a peer-to-peer approach that leverages networking between all offices and significantly speeds up data transfer. This approach splits each file into blocks and sends these blocks independently. Each recipient can send a block on to other recipients once it has received it. This dramatically speeds up syncing, since we are not only transferring concurrently to several machines but also using other network channels to take load off the sender’s network channel. This is the approach Resilio Connect takes.
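The difference between the three strategies can be estimated with some back-of-the-envelope arithmetic, sketched here in Python. The peer-to-peer figure is the idealized lower bound from the standard fluid model of peer-to-peer distribution, assuming all recipients have the same upload and download speeds. It is not a measurement of Resilio Connect, and all names and numbers are illustrative assumptions.

```python
def sequential_time(size_bits: float, uplink_bps: float, n: int) -> float:
    """Send the file to each of n destinations, one after another."""
    return n * size_bits / uplink_bps


def parallel_time(size_bits: float, uplink_bps: float, n: int) -> float:
    """Run n transfers at once; each gets an equal share of the uplink."""
    return size_bits / (uplink_bps / n)


def p2p_lower_bound(
    size_bits: float,
    sender_up_bps: float,
    peer_up_bps: float,
    peer_down_bps: float,
    n: int,
) -> float:
    """Ideal peer-to-peer distribution time when recipients also upload blocks."""
    return max(
        size_bits / sender_up_bps,  # the sender must upload each bit at least once
        size_bits / peer_down_bps,  # each recipient must download the whole file
        n * size_bits / (sender_up_bps + n * peer_up_bps),  # total upload capacity
    )
```

With a 1 GB file (8e9 bits), four destinations, and 100 Mbit/s links everywhere, both rsync-style strategies need 320 seconds before the last destination has the file, while the peer-to-peer bound is 80 seconds. Real transfers add protocol overhead, so these figures are best read as relative comparisons.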
Rsync & NAT
It is impossible to use rsync to connect to a server behind NAT. NAT is usually implemented on a firewall that hides the server’s internal address and exposes the connection through an external IP address. You will need to open ports for incoming connections on that device so rsync can establish a connection.
Connect uses NAT traversal techniques that can establish a direct connection between computers without the need to open ports.
Here is a handy summary table of the features needed in a synchronization solution today, and how Resilio Connect stacks up against rsync.
To see how much faster your syncing could be with Resilio Connect, fill out the schedule-a-demo form below. We’ll help you calculate your time savings over the LAN and WAN combined.