Back in April I wrote a post called Overhauling my Digital Life, in which (amongst other things) I wrote about signing up for a cloud backup service.
At the time I picked ADrive as our storage provider for a couple of reasons: the price is extremely reasonable ($25 a year for 100gb) and they support rsync, which makes it extremely easy to write a backup script or two and have the server run them periodically.
This week as I was taking a look at the logs from my backup script I noticed something alarming: I'd used up all of my 100gb quota and my backup jobs were failing as a result.
I thought for a while about what I should do about this. ADrive's next account level up offers 250gb of storage (2.5x as much) but is also 2.5x the price at $62.50 a year. If you survey the cloud backup marketplace as I did eight months ago you'll find this to be an extremely reasonable price, but it doesn't feel like good value to me for a couple of reasons. For one, I would prefer to see a reduction in the per-GB cost if I'm going to move up to a larger account, and on that basis there's no difference from what I pay now for my 100gb plan. For another, it's much more storage space than I actually need. Buying an extra 150gb of space to store the one or two extra gigabytes that don't fit in my 100gb plan just doesn't feel sensible.
When I looked at ADrive in the first place one of the alternatives I considered was Amazon's AWS. If you're not familiar, Amazon sell services like storage and cloud computing power and they have some pretty big customers: they're the service that powers Netflix and Instagram, amongst others. The reason I didn't choose Amazon in the first place is that they really aren't a consumer-focused service and you need to have a much higher degree of tech-savvy to be able to use them. They're also a little more expensive than ADrive for the storage volume I need (3¢ per GB per month comes out to $36 a year for my 100gb backup), but their pricing model places no upper limit on the amount of storage you could use and you pay only for what you do use. Perfect.
They also offer an option called Glacier which on the face of it seems perfect for what I want: it's a third of the regular price and it's designed explicitly to be backup storage. If you need to restore files then you may have a couple of hours of waiting before they can be made available. That would be fine, except I do incremental backups: each week, month or quarter (depending on what's being backed up) I synchronize the backup with what's on my server, sending only files that are newly created or changed. In order to do that my backup tool needs access to what's already in the backup so that it knows what it needs to send. Glacier was a non-starter for this reason.
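To make the incremental part concrete, here is a minimal sketch of the decision a sync tool has to make. It is a simplification: I'm assuming a file counts as changed when its size or modification time differs, and the `FileInfo` and `files_to_send` names are made up for illustration. Real tools apply their own, more careful rules.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    size: int   # file size in bytes
    mtime: int  # modification time, in seconds since the epoch


def files_to_send(local: Dict[str, FileInfo],
                  remote: Dict[str, FileInfo]) -> List[str]:
    """Return the paths that are new locally or differ from the backup copy."""
    changed = []
    for path, info in local.items():
        # A file is sent if the backup has never seen it, or if its
        # size or modification time no longer matches the backup copy.
        if path not in remote or remote[path] != info:
            changed.append(path)
    return sorted(changed)


if __name__ == "__main__":
    on_server = {
        "photos/a.jpg": FileInfo(1000, 100),
        "photos/b.jpg": FileInfo(2000, 250),
        "docs/c.txt": FileInfo(300, 300),
    }
    in_backup = {
        "photos/a.jpg": FileInfo(1000, 100),
        "photos/b.jpg": FileInfo(2000, 200),
    }
    print(files_to_send(on_server, in_backup))
```

The comparison only works if the tool can list what is already in the backup. With no listing to compare against, every file looks new and the whole lot gets sent again.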
Regardless, I'd all but decided to give AWS a try and I'd signed up for an account and created a storage "bucket." I was reading online about tools that offer rsync-like functionality but can upload to AWS storage. I'd found one that looked good, and had noticed that it supported a variety of storage providers in addition to AWS. One of the other providers supported was another cloud services company that you might have heard of: Google.
I use Google's consumer services pretty heavily (I have an Android phone and tablet, so it makes sense to), and in fact I'd used Google's App Engine service once before for a previous project. What I'd never really realized is that App Engine is part of a wider Google Cloud Platform offering, which includes a cloud storage service very similar to Amazon's S3 that costs just 2¢ per GB per month. This makes it cheaper than the service I was already getting from ADrive (by a whole dollar a year). I was sold, and I signed up.
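For anyone who wants to check my sums, here is a rough sketch of the comparison using only the prices quoted in this post. The function and plan labels are my own, and prices are kept in whole cents to sidestep floating-point rounding.

```python
def metered_annual_cost(gb_stored: int, cents_per_gb_month: int) -> float:
    """Dollars per year for pay-as-you-go storage billed per GB per month."""
    return gb_stored * cents_per_gb_month * 12 / 100


def per_gb_annual_cost(plan_dollars_per_year: float, plan_gb: int) -> float:
    """Dollars per GB per year for a flat-rate plan with a fixed quota."""
    return plan_dollars_per_year / plan_gb


if __name__ == "__main__":
    backup_gb = 100  # roughly the size of my backup

    annual_costs = {
        "ADrive 100gb plan": 25.00,
        "ADrive 250gb plan": 62.50,
        "Amazon at 3 cents per GB per month": metered_annual_cost(backup_gb, 3),
        "Google at 2 cents per GB per month": metered_annual_cost(backup_gb, 2),
    }
    for name, dollars in annual_costs.items():
        print(f"{name}: ${dollars:.2f} per year")

    # Both ADrive tiers work out to the same price per GB.
    print(per_gb_annual_cost(25.00, 100), per_gb_annual_cost(62.50, 250))
```

The metered options also scale with what is actually stored, so going a gigabyte or two over 100gb adds cents rather than forcing a jump to the next tier.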
The next step was to convert all my rsync-based backup scripts to send data to Google instead of ADrive, and this was extremely easy. Google offers a command-line utility called gsutil which can be used for a variety of functions, including incremental, rsync-style file copying. The whole thing (from signup to having the scripts done) took just a couple of hours, including the time it took me to find and read the documentation. The documentation was absolutely necessary here: in contrast to the intuitive ease of use I've come to expect from Google's consumer services, everything I did to set up my cloud platform storage seemed foreign and complicated. You really do need a decent amount of technical knowledge to understand it. That's not a comment against Google, necessarily: I would assume AWS is much the same. It feels complicated because it is complicated. Cloud Platform is a set of tools for developers to use however they see fit, not a single-task solution for consumers like I was used to.
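My real scripts aren't reproduced here, but the core of a converted backup job looks something like the sketch below. The directory and bucket name are placeholders, and wrapping the call in Python is just one way to do it. The only parts taken from gsutil itself are the `rsync` subcommand, its `-r` (recursive) option and the top-level `-m` option for parallel transfers.

```python
import subprocess
from typing import List


def gsutil_rsync_command(local_dir: str, bucket_url: str) -> List[str]:
    """Build the gsutil command that syncs a local directory to a bucket."""
    # -m runs transfers in parallel; rsync -r recurses into subdirectories.
    return ["gsutil", "-m", "rsync", "-r", local_dir, bucket_url]


def run_backup(local_dir: str, bucket_url: str) -> None:
    """Run the sync and raise if gsutil reports a failure."""
    subprocess.run(gsutil_rsync_command(local_dir, bucket_url), check=True)


if __name__ == "__main__":
    # Placeholder paths: substitute a real directory and bucket.
    print(" ".join(gsutil_rsync_command("/srv/photos",
                                        "gs://example-backup-bucket/photos")))
```

Using `check=True` means a failed sync raises an exception instead of passing silently, which is the kind of failure I only spotted this time by reading the logs.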
Anyway, everything was set up, and I was happy… except for one thing. I still had to get the 100gb or so of data from my home server to Google's. My backup scripts were done and would take care of that for me when they were next run, except I knew it was going to take a very long time for them to start from a blank slate. When I originally set up my ADrive storage I'm pretty sure it took several weeks to run the initial backup, and I'd had to run it only at night because sending that much data used up all our available bandwidth.
Really what I wanted was a method for importing data from ADrive to Google. If I could do that then I wouldn't have to send 100gb of data from our home server at all. I could just move things across and then my backup scripts would take care of any changes since the last successful ADrive backup. There's no such service, but wait! Google Cloud Platform is for developers to create their own services, so why not build what I needed?
When I'd signed up for Google Cloud Platform they'd given me $300 credit with a 60-day expiry, intended, I guess, to help me play around and get my app off the ground. I'd dismissed it: the only service I needed was cloud storage, and to chew through the $300 before it expired I'd have to store 7.5tb of data. But the credit allowed me to explore the other Cloud Platform offerings and more or less use whatever I wanted for free during those initial 60 days. In a couple of clicks I'd provisioned and started a Linux VM on Google's infrastructure and was at the command prompt. I wrote a two-line script to download my backup from ADrive (using rsync) to the VM, then re-upload it to Cloud Storage (using gsutil). Our home internet connection would probably max out at about 300kb/s upload, and less with ADrive, where their infrastructure also seems to be something of a limiting factor. Downloading my data from ADrive to Google's VM is not super-speedy either at an average of about 1mb/s, but re-uploading it to my Cloud Storage bucket races along at a pretty staggering 12mb/s and, most importantly, all this happens without clogging up my home internet connection in any way.
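The script on the VM really is just those two steps. As a sketch of the same idea (not the script itself), the version below builds both commands and runs them in order. The ADrive source, staging directory and bucket are all placeholders I've made up, and I'm assuming rsync's archive mode (`-a`) for the download.

```python
import subprocess
from typing import List


def migration_commands(adrive_source: str, staging_dir: str,
                       bucket_url: str) -> List[List[str]]:
    """Return the two commands: pull from ADrive, then push to Cloud Storage."""
    return [
        # Step 1: download the existing backup to the VM's disk.
        # The trailing slash on the source copies its contents, not the folder.
        ["rsync", "-a", adrive_source, staging_dir],
        # Step 2: upload the staged copy to the Cloud Storage bucket.
        ["gsutil", "-m", "rsync", "-r", staging_dir, bucket_url],
    ]


def migrate(adrive_source: str, staging_dir: str, bucket_url: str) -> None:
    """Run both steps, stopping at the first one that fails."""
    for command in migration_commands(adrive_source, staging_dir, bucket_url):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Placeholder values: none of these are real hosts or buckets.
    for command in migration_commands("user@rsync.example.com:backup/",
                                      "/mnt/staging/",
                                      "gs://example-backup-bucket"):
        print(" ".join(command))
```

Staging everything on the VM's disk first means the VM needs enough space for the whole backup, which is easy to arrange when the credit is covering the bill.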
The VM is running and doing its thing as I write this. I expect it to finish in about 10 hours' time, at which point I'll run the backup scripts on my home server to upload anything that was missing from ADrive and we'll be done.
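If you want to turn the speeds above into rough transfer times, the arithmetic is simple. This sketch assumes that "kb/s" and "mb/s" mean kilobytes and megabytes per second, that units are decimal (1gb = 1000mb), and that the rate holds steady, none of which is guaranteed in practice.

```python
def transfer_hours(gigabytes: float, megabytes_per_second: float) -> float:
    """Hours needed to move the given amount of data at a steady rate."""
    seconds = gigabytes * 1000 / megabytes_per_second
    return seconds / 3600


if __name__ == "__main__":
    backup_gb = 100
    rates = {
        "home upload (about 300kb/s)": 0.3,
        "ADrive to the VM (about 1mb/s)": 1.0,
        "VM to Cloud Storage (about 12mb/s)": 12.0,
    }
    for label, rate in rates.items():
        print(f"{label}: {transfer_hours(backup_gb, rate):.1f} hours")
```

The home upload figure assumes the connection is saturated around the clock, so a nights-only schedule would stretch it out considerably.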