Note that the dedupe/compression problems observed in this post were fixed by CrashPlan in the 3.03 software release in March of 2011. As of March 2011, the CogentCo routing issue remains (300-500 Kbps through CogentCo vs. 3+ Mbps through any other route).
Recent comments in a CrashPlan forum thread about compression/deduplication and CPU usage prompted me to run a series of tests with different configuration settings to see how (or whether) they impacted upload speeds to CrashPlan Central (their online backup archive). Below are the details of the tests I performed as well as the results.
Test file: 3GB ISO file of a DVD
Archive location: central5.crashplan.com
Online Archive Size: ~100k files, 2.4TB
Server OS: Windows Server 2008 R2
RAM: 12 GB
CPU: i7-860, 2.8 GHz Quad Core with Hyperthreading
System Drive: 64GB SSD
Source File Drives: 10x2TB RAID 6
[Given this configuration, if this server ends up being resource constrained, you can imagine that almost any other machine would be as well with a similarly large backup set]
Between each change of the settings below, I restarted the CrashPlan service using the restart command in the GUI. The speed results below are taken from watching Windows network monitoring over several minutes of sustained transfer, not from the CrashPlan GUI (which sometimes reports different throughput depending on compression and de-dupe). Throughout the tests I also monitored the WAN interface on my router to make sure nothing else on my network was consuming significant bandwidth. (A scripted version of this measurement is sketched below.)
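For anyone who'd rather script this measurement than eyeball the Windows network monitor, here's a rough sketch (my illustration only, not what I actually used). It samples the interface byte counters via the psutil library and averages the upload rate over a sustained window; the interface name is an assumption you'd adjust for your machine.

```python
# Sketch: measure sustained upload throughput on one interface (requires the psutil package).
import time
import psutil

INTERFACE = "Ethernet"   # hypothetical name; list yours with psutil.net_io_counters(pernic=True).keys()
WINDOW_SECONDS = 300     # several minutes of sustained transfer, as in the tests below

start = psutil.net_io_counters(pernic=True)[INTERFACE].bytes_sent
time.sleep(WINDOW_SECONDS)
end = psutil.net_io_counters(pernic=True)[INTERFACE].bytes_sent

mbps = (end - start) * 8 / WINDOW_SECONDS / 1_000_000
print(f"Average upload over {WINDOW_SECONDS} s: {mbps:.2f} Mbps")
```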
Lastly, this Windows install is brand new as of yesterday with only a handful of applications installed other than CrashPlan, so interference from other software should be minimal.
With all that said... the results are:
Test 1: (Control case)
Data de-dupe: Minimal
Compression: Off
Results: 2.1 Mbps upload
CPU usage: ~4% average across 8 threads
Test 2: (test 1 with compression set to On)
Data de-dupe: Minimal
Compression: *On*
Results: 1.95 Mbps upload (essentially the same as test 1)
CPU usage: ~4% average across 8 threads
Test 3: (test 1 with compression set to Auto)
Data de-dupe: Minimal
Compression: *Auto*
Results: 2.1 Mbps Upload
CPU usage: ~5% average across 8 threads
Test 4: (test 1 with de-dupe set to Full)
Data de-dupe: *Full*
Compression: Off
**Results: 530 Kbps Upload**
CPU usage: ~15% average across 8 threads (or effectively 100% of one thread)
Test 5: (test 1 with de-dupe set to Auto)
Data de-dupe: *Auto*
Compression: Off
***Results: 550 Kbps Upload***
CPU usage: ~15% average across 8 threads (or effectively 100% of one thread)
Conclusion: De-dupe set to anything but Minimal is an absolute throughput killer when backing up large compressed files and/or when backing up from a large backup set. This could be due to the size of my backup set (100k files and 2.4 TB is a lot to de-dupe against), but the limit appeared to be the de-dupe process itself, which was pegged at roughly 100% of a single core (beyond that, I observed low network usage, low overall CPU usage, low memory usage, and no disk queues).
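I don't have visibility into CrashPlan's actual de-dupe implementation, but as a rough illustration of why de-dupe can peg a single core: the work is essentially hashing every block of the source data and looking each hash up against an index of blocks already in the archive. If that loop runs on one thread, the hashing rate of a single core caps your effective upload rate no matter how fast your connection is. A toy sketch (block size and hash choice are my assumptions, not CrashPlan's):

```python
# Toy sketch of single-threaded block-level de-dupe (illustrative only; not CrashPlan's algorithm).
import hashlib

BLOCK_SIZE = 64 * 1024   # assumed block size, purely for illustration

def dedupe_pass(path, seen_blocks):
    """Return (new_bytes, duplicate_bytes) for one file, processed on a single thread."""
    new_bytes = dup_bytes = 0
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).digest()  # all hashing happens on this one core
            if digest in seen_blocks:
                dup_bytes += len(block)              # already in the archive: nothing to send
            else:
                seen_blocks.add(digest)
                new_bytes += len(block)              # unique block: would be uploaded
    return new_bytes, dup_bytes
```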
This was also a head-slap moment for me. I've been struggling for some time to figure out why my throughput to CrashPlan was so poor, and I had actually reimaged this server just to start fresh with CrashPlan and fix my speed issues once and for all. Duh...
Takeaway for me: I'm setting compression to Auto and de-dupe to Minimal. I can't tell you how happy I am to have finally figured out what was causing the performance bottleneck on this backup set. This might not work for everyone, but if you're in a similar scenario to mine, I'd encourage you to try these settings.
I love CrashPlan. It's the best backup software I've found, and I've tried several. Now that I have this figured out, it's even better. :-) Hopefully this knowledge helps others as well.
Finally, I'll close by mentioning one other well-known issue affecting upload speeds to CrashPlan Central: routing through the CogentCo network, which is discussed here and here (and probably elsewhere) on CrashPlan's forum. I wrote up a related post on how this bottleneck can be sidestepped by using a VPN to route around the CogentCo network.
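If you want a quick way to check whether your own route to CrashPlan Central goes through Cogent, you can trace the route and look for cogentco hops. A small sketch (it shells out to the standard Windows tracert tool; the hostname is the archive server from my setup above, and the string match is approximate):

```python
# Sketch: check whether the route to CrashPlan Central passes through the CogentCo network.
import subprocess

HOST = "central5.crashplan.com"  # archive host from this post; yours may differ

trace = subprocess.run(["tracert", HOST], capture_output=True, text=True).stdout
cogent_hops = [line for line in trace.splitlines() if "cogentco" in line.lower()]

if cogent_hops:
    print("Route passes through CogentCo:")
    print("\n".join(cogent_hops))
else:
    print("No CogentCo hops detected in this trace.")
```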