
Cloud Backup, Episode III

I've written a couple of times before about what I do to back up all my important data.

My last post on the topic was more than a year ago though, so I'll forgive you if you've forgotten. Here's a recap: originally I was using a fairly traditional consumer backup service, ADrive. This worked well because they're one of the few services that provides access via rsync, which made it easy to run scripted backup jobs on my Linux server. Their account structure didn't really meet my needs, however: you pay for the storage you have available to you, not for what you use. When I hit the upper limit of my account the next tier up didn't make financial sense, so I switched.

About 15 months ago I moved my backups over to Google's Cloud Platform. This gives me an unlimited amount of storage space, and I just pay for what I use at a rate of $0.02/GB/month. This has been working well for me.

In December I found Backblaze B2. They offer a service very similar to Google's (or Amazon S3, or Microsoft Azure, or any of the other players in this space you may have heard of), except they cost a quarter of the price at $0.005/GB/month. There's even a free tier, so you don't pay anything for the first 10GB. When I first looked at them, their toolset for interacting with their storage buckets really wasn't where I needed it to be to make them viable, but they've been iterating quickly. I checked again this week, and I've already started moving some of my backups over.
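To give you a flavor of where the tooling is now, a backup job with their b2 command-line tool can be as short as the sketch below – the bucket name and path are made up, and the command names are as in recent versions of the tool:

    #!/bin/sh
    # Authorize once with the B2 account ID and application key,
    # then mirror a local folder up to a bucket.
    b2 authorize-account $B2_ACCOUNT_ID $B2_APP_KEY
    b2 sync /srv/documents b2://my-backup-bucket/documents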

In time, I plan to switch all my backups over. So far I've moved my documents folder and backups of my webserver, which together total about 2.5GB. That's nice, because it means I'm still within the free tier. The next thing to tackle is the backups of all our photos and music, which come to around 110GB. That means transferring 110GB of data, which is going to be a painful experience. I'm still thinking about the best way to do it, but the direction I'll probably go is to spin up a VPS and have it handle both the download of the backup from Google and the upload to Backblaze, so the transfer doesn't hog all the bandwidth I have on my home internet connection.
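If I do go that way, the whole VPS job would probably amount to something like this sketch (the bucket names are invented, and it assumes the VPS has enough disk to stage the copy):

    #!/bin/sh
    # Pull the existing backup down from Google Cloud Storage...
    gsutil -m rsync -r gs://my-gcs-backup/photos /mnt/transfer/photos
    # ...then push it up to Backblaze B2, all on the VPS's bandwidth.
    b2 sync /mnt/transfer/photos b2://my-b2-backup/photos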

The only other thing to think about with Backblaze is versioning. Google offers versioning on their storage buckets, but I have it disabled. With Backblaze there is no option (at least not that I've found) to disable this feature – meaning previous versions of files are retained and count toward my storage bill.

I'm torn on this. The easy thing to do would be to disable it, assuming a future iteration of Backblaze's service does in fact add the ability to turn it off. I'm thinking, though, that the smarter thing to do is to make use of it.

For me and my consumer needs, that will most likely mean putting together a PHP script or two to manage the version history more intelligently. Having some past versions around is nice, but some of the files in my backup change pretty frequently, and I definitely don't need an unlimited history.
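Whatever language the glue ends up in, the building blocks are already in the b2 CLI: you can list every stored version of a file and delete individual versions by ID. Roughly, with the bucket and file names made up:

    # List everything under a prefix, every version, with file IDs:
    b2 ls --long --versions my-backup-bucket documents/

    # Delete one specific stale version by name and ID:
    b2 delete-file-version documents/todo.txt <fileId-from-the-listing>

A pruning script would just walk that listing and delete all but the newest few versions of each file.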

Still, I'm very pleased with the price of B2, and watching the feature set rapidly improve over the past couple of months gives me confidence that I can move my backups over and keep them there for the long term, because the transition from one service to another is not something I want to put myself through too often.


Cloud Backup, Redux

Back in April I wrote a post called Overhauling my Digital Life, in which (amongst other things) I talked about signing up for a cloud backup service.

At the time I picked ADrive as our storage provider for a couple of reasons: the price is extremely reasonable ($25 a year for 100GB), and they support rsync, which makes it extremely easy to write a backup script or two and have the server run them periodically.

This week as I was taking a look at the logs from my backup script I noticed something alarming: I'd used up all of my 100GB quota and my backup jobs were failing as a result.

I thought for a while about what I should do about this. ADrive's next account level up offers 250GB of storage – 2.5x as much – but is also 2.5x the price at $62.50 a year. If you survey the cloud backup marketplace as I did eight months ago you'll find this to be an extremely reasonable price, but it doesn't feel like good value to me, for a couple of reasons. For one, I would prefer to see a reduction in the per-GB cost if I'm going to move up to a larger account, and on that basis there's no difference from what I pay now for my 100GB plan. But it's also much more storage space than I actually need: buying an extra 150GB to store the one or two extra gigabytes that don't fit in my 100GB plan just doesn't feel sensible.


When I looked at ADrive in the first place, one of the alternatives I considered was Amazon's AWS. If you're not familiar, Amazon sell services like storage and cloud computing power, and they have some pretty big customers – they're the service that powers Netflix and Instagram, amongst others. The reason I didn't choose Amazon in the first place is that they really aren't a consumer-focused service, and you need a much higher degree of tech-savvy to be able to use them. They're also a little more expensive than ADrive for the storage volume I need (3¢ per GB per month comes out to $36 a year for my 100GB backup), but their pricing model places no upper limit on the amount of storage you could use, and you pay only for what you do use. Perfect.

They also offer an option called Glacier which on the face of it seems perfect for what I want – it's a third of the regular price and it's designed explicitly to be backup storage: if you need to restore files then you may have a couple of hours of waiting before they can be made available. That would be fine, except I do incremental backups – each week, month or quarter (depending on what's being backed up) I synchronize the backup with what's on my server, sending only files that are newly created or changed. In order to do that my backup tool needs access to what's already in the backup so that it knows what it needs to send. Glacier was a non-starter for this reason.

Regardless, I'd all but decided to give AWS a try, and I'd signed up for an account and created a storage "bucket." I was reading online about tools that offer rsync-like functionality but can upload to AWS storage. I'd found one that looked good, and had noticed that it supported a variety of storage providers in addition to AWS. One of the other providers supported was another cloud services company that you might have heard of: Google.


I use Google's consumer services pretty heavily (I have an Android phone and tablet, so it makes sense to), and in fact I'd used Google's App Engine service once before for a previous project, but I'd never really realized that App Engine is part of a wider Google Cloud Platform offering that includes a cloud storage service very similar to Amazon's S3, but costing just 2¢ per GB per month. That makes it cheaper than the service I was already getting from ADrive (by a whole dollar a year). I was sold, and I signed up.

The next step was to convert all my rsync-based backup scripts to send data to Google instead of ADrive, and this was extremely easy. Google offers a command-line utility called gsutil which can be used for a variety of functions, including incremental, rsync-style file copying. The whole thing (from signup to having the scripts done) took just a couple of hours, including the time it took me to find and read the documentation. The documentation was absolutely necessary here: in contrast to the intuitive ease of use I've come to expect from Google's consumer services, everything I did to set up my cloud platform storage seemed foreign and complicated. You really do need a decent amount of technical knowledge to understand it. That's not a comment against Google, necessarily: I would assume AWS is much the same. It feels complicated because it is complicated. Cloud Platform is a set of tools for developers to use however they see fit, not a single-task solution for consumers like I was used to.
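To give you an idea of how simple the end result is, each of my jobs now boils down to a single gsutil call, along these lines (the local path and bucket name are invented for illustration):

    #!/bin/sh
    # Incrementally mirror the documents folder to Cloud Storage.
    # -r recurses into subdirectories; add -d only if you also want
    # remote files deleted when they disappear locally.
    gsutil rsync -r /home/files/documents gs://my-backup-bucket/documents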

Anyway, everything was set up, and I was happy... except for one thing. I still had to get the 100GB or so of data from my home server to Google's. My backup scripts were done and would take care of that for me when they were next run, except I knew it was going to take a very long time for them to start from a blank slate. When I originally set up my ADrive storage I'm pretty sure it took several weeks to run the initial backup, and I'd had to run the jobs only at night because sending that much data used up all our available bandwidth.

Really what I wanted was a method for importing data from ADrive to Google. If I could do that then I wouldn't have to send 100GB of data from our home server at all; I could just move things across, and then my backup scripts would take care of any changes since the last successful ADrive backup. There's no such service, but wait! Google Cloud Platform is for developers to create their own services, so why not build what I needed?

When I'd signed up for Google Cloud Platform they'd given me $300 of credit with a 60-day expiry, intended, I guess, to help me play around and get my app off the ground. I'd dismissed it – the only service I needed was cloud storage, and to chew through the $300 before it expired I'd have to store 7.5TB of data. But the credit allowed me to explore the other Cloud Platform offerings and more or less use whatever I wanted for free during those initial 60 days. In a couple of clicks I'd provisioned and started a Linux VM on Google's infrastructure and was at the command prompt. I wrote a two-line script to download my backup from ADrive (using rsync) to the VM, then re-upload it to Cloud Storage (using gsutil). Our home internet connection would probably max out at about 300KB/s upload – less with ADrive, where their infrastructure also seems to be something of a limiting factor. Downloading my data from ADrive to Google's VM is not super-speedy either, at an average of about 1MB/s, but re-uploading it to my Cloud Storage bucket races along at a pretty staggering 12MB/s and, most importantly, all this happens without clogging up my home internet connection in any way.
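The script itself really is nothing more than this shape (the ADrive host and bucket name here are placeholders):

    #!/bin/sh
    # Line one: pull the backup from ADrive onto the VM over rsync.
    rsync -avz user@adrive-host:backup/ /mnt/staging/
    # Line two: push it up to the Cloud Storage bucket with gsutil.
    gsutil -m rsync -r /mnt/staging gs://my-backup-bucket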

The VM is running and doing its thing as I write this. I expect it to finish in about 10 hours' time, at which point I'll run the backup scripts on my home server to upload anything that was missing from ADrive and we'll be done.


Overhauling my Digital Life

If you've been reading or watching the news recently you'll no doubt have heard about the Heartbleed bug that's been widely reported. It's a vulnerability in the OpenSSL library that many websites use to enable the SSL/TLS encryption that secures your traffic to that site, keeping your passwords and credit card information safe.

It's probably about time to go and update all your passwords (although you should wait until the site tells you to, because they need to patch the bug and reissue their SSL certificates before they're properly protected, and not all sites will have done this yet), but all this coincidentally comes when I'm in the midst of a plan to get my digital life in order.

There are a couple of things I've done recently that, in truth, I should have done a long time ago – and you should do them too.

Backup

First of all, an experiment: raise your hand if you think backing up your data to a remote location is a good idea, or perhaps even an essential practice. OK? Now keep your hand raised if you actually do this.

Right, that's what I thought. Until the start of this month I would also have sheepishly lowered my hand at the second question. At home we have a server that handles the syncing of documents between our several computers – the result being that all those really important files exist in a few places, including the server itself. That's not bad, but if there were some kind of catastrophe affecting our home then everything would be lost, because it's all in one physical location.

And it gets worse. The server has a 1TB drive that was big enough to back up all our photos, video and music when I bought it. It's still big enough to hold all that stuff, but as our collection of digital assets has grown there's no longer room on the individual computers to store everything. Not a big deal – all that stuff is on the server anyway and we can just stream it to whatever device we want to play things on. Fine, except now there's only one copy of all our photos and music. If the drive in the server failed we'd lose all of it. For me that's more than a decade of pictures.

In the past I've been unwilling to spend the money necessary to get enough cloud backup space to put all this stuff in, but prices have dropped recently (which is what prompted me to revisit my needs) and anyway, this really is something worth paying for.

I got myself 100GB of online storage from ADrive. I don't know that I'd recommend them for everyone, because the transfer speed I get when I upload stuff is pretty slow, but for me they're perfect: they're a good price ($25 a year) and I can upload files using rsync, which means it's extremely easy for me to set up automated backup jobs on our server without needing to install anything. I don't really need a super-fast transfer speed because my future backup jobs will be incremental (only uploading files that have changed), and syncing documents between computers is not a need – we already have that.
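If you want to do the same, a job like that needn't be anything fancier than an rsync call on a schedule. A sketch, with the paths and the ADrive login details as placeholders:

    #!/bin/sh
    # Incremental backup of the documents share to ADrive: rsync
    # only transfers files that are new or changed since last time.
    rsync -avz /srv/documents/ user@adrive-host:backup/documents/

    # A crontab entry to run it every Sunday at 2am:
    # 0 2 * * 0 /home/user/bin/backup-documents.sh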

Two-Factor Authentication

The other big upgrade to my digital life recently has been two-factor authentication. I've known of it for a while, although I hadn't used it at all until recently. Basically, it's for website logins, and the two factors it talks of are something you know (your password) and something you have (your cell phone).

I've turned on two-factor authentication wherever I could, using the Google Authenticator app from the Play Store where possible, and text messaging elsewhere. Essentially the way this works is that when you sign in to a website using your username and password, you're prompted to enter a code you get either from the app or texted to you – the point being that even if somebody knew your username and password, they couldn't log in without your phone.
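If you're curious what's inside those codes: the Authenticator app implements the standard TOTP scheme, which combines a shared secret with the current time. On a Linux box the oathtool utility can generate the same codes from the base32 secret you're shown when setting an account up – the secret below is a made-up example:

    # Print the current six-digit code for a base32 TOTP secret.
    oathtool --totp --base32 "JBSWY3DPEHPK3PXP"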

I've enabled this on Google (Gmail, Drive, etc.), Facebook, Twitter, Tumblr, Evernote, PayPal and anywhere else I could find that offers it.