Virtuos Games Uses Sync To Move Huge Datasets When Porting Titles

Jurgen Kluft, Senior Technical Director at Virtuos Games, takes us through the process of using Sync to move the huge amount of data generated when porting titles to PlayStation 4 and Xbox One:

Virtuos Games is an outsourcing company that provides co-development services to game companies. We consult on everything from art production and porting of existing titles to end-to-end game development projects. Our main development site is in Shanghai, and we’ve got team members in Chengdu as well as Paris.

Our current project involves porting two titles to PlayStation 4 and Xbox One and upgrading their visuals to the standard now expected on these platforms. Our major challenge in executing this project has been the huge amount of data: the current generation of consoles can handle at least 8 times more data than the previous generation. This project alone has 10 data ‘depots’ with over 5TB of LZ4-compressed data.

For some time we’ve been unsatisfied with managing code and data in Perforce. We love working with Mercurial because of its distributed nature and great branching functionality, and we wanted to use it to replace Perforce, but Mercurial is not great at managing large numbers of binary files, whether small or large. Our projects require anywhere from 20,000 to 160,000 files, so this was a real issue for us. Perforce also uses a client-server model that made moving data at peak times very cumbersome. The P2P infrastructure of Resilio Sync (formerly BitTorrent Sync) offered a potential solution: it would take advantage of the upload speed of all available clients to scale to even larger teams or datasets, and use active synchronization to avoid the bottleneck we were experiencing.

Our experience with Mercurial foreshadowed a couple of issues that would prevent us from easily using it with Resilio Sync to replicate binaries:

* Mercurial’s storage backend is tightly integrated, so it’s not easy to swap in another implementation.
* Mercurial is not good at managing binaries. It loads them into memory to access them, so manipulating large files is not possible.
* Mercurial has difficulty handling a large number of files; we have over 200,000 binaries in one repository alone.

We searched for existing solutions, but in the end decided to build something on our own to deal with all the binaries. Mercurial would ignore them and we would commit metadata to Mercurial to track the binaries. We called this solution Hgx.

Our approach to handling large numbers of small and large binary files is to store metadata (File Name, Length, Time, Number of Chunks, Array of Chunk Hashes) in custom data files that track the state and content of the files. We wrote a tool in C# on .NET 4.5 for x64 that manages the working directory and the chunk database. It supports a number of basic command-line commands and options, including Add, Commit, Update, Forget and Status. All files and directories that are tracked by Hgx are configured to be ignored by Hg.
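
As a rough illustration of the per-file metadata listed above, here is a minimal Python sketch. It is not the Hgx format: the class name `FileRecord`, its field names, and the JSON serialization are all assumptions made for the example (the actual tool is written in C# and uses custom data files).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FileRecord:
    """Hypothetical metadata record for one tracked binary file."""
    name: str                  # File Name (path relative to the working directory)
    length: int                # Length in bytes
    mtime: float               # Time (last modification, seconds since epoch)
    chunk_hashes: List[str] = field(default_factory=list)  # Array of Chunk Hashes (hex)

    @property
    def num_chunks(self) -> int:
        # Number of Chunks is derived from the hash array.
        return len(self.chunk_hashes)

    def to_json(self) -> str:
        # Serialize to text so the record can be committed like any small file.
        data = asdict(self)
        data["num_chunks"] = self.num_chunks
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "FileRecord":
        data = json.loads(text)
        data.pop("num_chunks", None)  # derived value, not a constructor argument
        return cls(**data)
```

A record like this is small and text-based, so it can live in version control while the binary content it describes is stored elsewhere.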

To hash and compress the content of binary files, we wrote a tool in C++ that takes advantage of multiple cores. We use Skein512-256 for hashing and LZ4 for compression, and files are split into fixed-size 256 KB chunks.
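
The chunking step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our C++ tool: `hashlib.blake2b` with a 32-byte digest stands in for Skein512-256 and `zlib` stands in for LZ4, because neither Skein nor LZ4 ships with the Python standard library. The function names, the thread pool, and the choice to hash the uncompressed chunk are also assumptions.

```python
import hashlib
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 256 * 1024  # fixed-size 256 KB chunks


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def process_chunk(chunk: bytes):
    """Return (hex digest, compressed bytes) for one chunk.

    Stand-ins: BLAKE2b-256 instead of Skein512-256, zlib instead of LZ4.
    Assumption: the hash is taken over the uncompressed chunk.
    """
    digest = hashlib.blake2b(chunk, digest_size=32).hexdigest()
    return digest, zlib.compress(chunk)


def process_file_bytes(data: bytes, workers: int = 4):
    """Hash and compress all chunks of a file, in order, using a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_chunk, split_chunks(data)))
```

Because the chunks are a fixed size and hashed independently, identical chunks produce identical hashes, so each distinct chunk only needs to be stored and transferred once.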

We hash the data to decouple the file from its content, storing the hashes as metadata and the actual data in a data store. Another reason is verification: we always verify the chunk data against the hash, so data corruption is easily detected. The data store is a plain key-value store where the key is the 256-bit hash and the value is the chunk data.
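
A hash-keyed store with verification on read might look like the following Python sketch. It is an in-memory toy, not the Hgx chunk database: the `ChunkStore` class and `restore` helper are hypothetical names, and BLAKE2b with a 256-bit digest is used as a stand-in for Skein512-256.

```python
import hashlib
from typing import Dict, Iterable


def chunk_key(chunk: bytes) -> bytes:
    # 256-bit key; BLAKE2b is a stand-in for Skein512-256.
    return hashlib.blake2b(chunk, digest_size=32).digest()


class ChunkStore:
    """Toy key-value store: key is the 256-bit hash, value is the chunk data."""

    def __init__(self) -> None:
        self._chunks: Dict[bytes, bytes] = {}

    def put(self, chunk: bytes) -> bytes:
        key = chunk_key(chunk)
        self._chunks[key] = chunk  # storing a duplicate chunk is a no-op
        return key

    def get(self, key: bytes) -> bytes:
        chunk = self._chunks[key]
        # Always verify the data against its key to detect corruption.
        if chunk_key(chunk) != key:
            raise ValueError("chunk data does not match its hash")
        return chunk


def restore(store: ChunkStore, keys: Iterable[bytes]) -> bytes:
    """Rebuild file content from the ordered hash list kept in the metadata."""
    return b"".join(store.get(k) for k in keys)
```

The store knows nothing about files: the order and ownership of chunks exist only in the metadata, which is what lets the two be versioned and synchronized separately.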

User commits are either pushed to a shared folder over the LAN or otherwise synchronized to the server using Resilio Sync. We set up Sync by adding the read-only key through the API and use it to synchronize the hash-chunk database, along with data and index files. Because the data only holds chunks, it’s impossible to reconstruct the binaries without access to the meta information, so we don’t have to worry about security or think about encryption. We have a convenient ‘install/setup’ utility that installs Sync, writes our config file with the API key, starts Sync, and then configures and adds every project to Sync.