Replacing Dropbox with BitTorrent Sync

[Edit 12/2015 – Since BitTorrent Sync hit 2.x, I’m no longer using it and I can no longer recommend it.]

Too many times, you’ve heard a cloud storage/sync product described as “like Dropbox.” There’s Box, OneDrive, Google Drive, iCloud Drive, Bitcasa, SpiderOak, Wuala, Transporter, and I’ve missed a bunch. It doesn’t matter, because they’re all pretty bad, and nearly all share the same problem: any data you upload can be decrypted by the provider. In the event of a bug or a breach, an attacker could gain access to your files.

BitTorrent Sync draws the inevitable comparison, but it’s different and better. It lets you sync folders between multiple machines, and it supports every major computing platform, but it works without a cloud component. It’s peer-to-peer, encrypted, and fast. Sync is in beta, but I replaced Dropbox with Sync over five months ago, and it’s been great. The most recent version even handles syncing OS X extended attributes through an intermediate Linux peer.

I’ve been using Sync to publish files to the web, replicate a Minecraft server, sync personal documents between my computers, access files on the go with my iPhone, automatically upload security camera footage offsite, and even back up my iPhone’s camera roll to a home computer. It works.

Sync makes ad-hoc sharing easy, with expiring and optionally read-only links. It’s one of the easiest and fastest ways to share large files.

The most intriguing feature of BitTorrent Sync is its ability to include peers that can sync without having a decryption key. I’ve taken advantage of that feature to keep a copy of my documents synchronized with my own cloud server. On that server, the file contents, names, and metadata are encrypted and I feel reasonably secure knowing that if someone hacked the server, my tax returns and security camera footage would remain private.

Synchronization is hard to get right, and BitTorrent Sync is impressive. On my wishlist: hosted plans for folks who need the always-on aspect of cloud storage and can’t roll their own, and a Dropbox-compatible SDK for mobile app developers.

TestDisk Data Recovery on OS X

One of the 4TB external USB hard drives I use for local backups started randomly disconnecting a few days ago. Today it failed completely. It’s a Seagate Backup Plus model, where the bottom of the enclosure consists of a small, removable shim that contains the USB & power connections and the USB to SATA converter chip. After trying different USB ports and cables without success, I decided to hook up the drive directly using SATA. After trimming a SATA cable with a utility knife to make it fit the narrow port opening, hooking it up, and rebooting… Finder offered to initialize an unreadable disk.

Disk Utility showed a single unreadable 500GB partition and a FAT partition table. The drive previously had a GUID partition table, not FAT. I have no idea what corrupted the disk in such an interesting way, but TestDisk was able to quickly scan the drive, locate the partitions and their types, and repair everything in just a few seconds. The user interface hails from the 1990s, but the software worked wonders, and it’s completely free and open-source. It also runs on Linux, Windows, and DOS.
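
For reference, the recovery went roughly like this on OS X (the disk identifier below is a placeholder; check diskutil list for yours):

diskutil list                          # find the external drive's identifier
sudo diskutil unmountDisk /dev/disk2   # release the disk so TestDisk can write to it
sudo testdisk /dev/disk2               # interactive: Analyse -> Quick Search -> Write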

All my backups are intact and valid. I haven’t figured out what to do with the drive, though. Anecdotal evidence from the Internet suggests USB/SATA adapters are prone to failure, but I suspect the real culprit is a cheap, poorly-designed power supply. I’m not sure it’s worth opening a support case with Seagate.

Migrating virtual machines from Amazon EC2 to Google Compute Engine

My Amazon EC2 discount contract is almost up, and I’ve been playing with Google Compute Engine (GCE). My initial impression is that it’s faster and costs less, particularly if you don’t want to pay up-front for EC2 reserved instances. Google’s web console is more modern than Amazon’s, though slightly less sophisticated. Google’s CLI tools are much faster and don’t require Java. Google’s API uses JSON instead of XML.

In terms of capabilities, GCE is not as advanced as EC2, but it’s vastly more powerful than Linode, DigitalOcean, and the like. One notable limitation: Google doesn’t permit sending SMTP directly from GCE instances. They have a partnership with SendGrid for that. I’m using Mandrill instead, and so far I’m very pleased with that choice.

Migration from EC2 to GCE without re-installation

It’s possible to migrate virtual machines from EC2 to GCE. This post explains how I migrated my production Ubuntu 12.04 LTS instance. It’s not a detailed guide. If you possess a good amount of Linux operations knowledge, I hope the information here will help you do your own migration quickly.

Assumptions

You’re migrating an EBS-backed Ubuntu 12.04 LTS instance, you’re comfortable working as root in a shell, and you already have working accounts and command-line tools on both EC2 and GCE.

Important differences between EC2 and GCE

EC2 uses Xen for virtualization. GCE uses KVM.

Most EC2 instances are paravirtualized (PV). They do not emulate actual PC hardware, and depend on Xen support in the kernel. Most of the time, EC2 instances use PVGRUB to boot. PVGRUB is part of the Amazon Kernel Image (aki-xxxxxxxx) associated with your instance. PVGRUB basically parses a GRUB configuration file in your root filesystem, figures out what kernel you want to boot, and tells Xen to boot it. You never actually run GRUB inside your instance.
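
For illustration, the menu.lst that PVGRUB parses looks something like this (the kernel version and filesystem label here are examples, not necessarily what your instance uses):

default 0
timeout 0

title Ubuntu 12.04 LTS
    root (hd0)
    kernel /boot/vmlinuz-3.2.0-60-virtual root=LABEL=cloudimg-rootfs ro console=hvc0
    initrd /boot/initrd.img-3.2.0-60-virtual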

With KVM, you have a full hardware virtual machine that emulates a PC. It requires a functioning bootloader in your boot disk image. Without one, you won’t boot. Fixing this, and using a kernel with the proper support, are the two main obstacles in migrating a machine from EC2 to GCE.

Let’s get started.

On EC2:

  • Snapshot your system before you do anything else. If you’re paranoid, create the snapshot while your system isn’t running.
  • Install a recent kernel. The Ubuntu 12.04 LTS kernel images don’t have the virtio SCSI driver needed by GCE. I used HPA’s 3.13.11 generic kernel; there’s a sketch of the install after this list. (These days it isn’t necessary to use a “virtual” kernel image. The generic ones have all the paravirtualized drivers and Xen/KVM guest support.)
  • Make sure your EC2 system still boots! If it doesn’t boot on EC2, it won’t do much good on GCE.
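
A sketch of the kernel install, assuming you grab the mainline .deb package from kernel.ubuntu.com (the URL and filename below are placeholders for whichever build you pick):

# on the EC2 instance
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11-trusty/linux-image-(exact build)_amd64.deb
sudo dpkg -i linux-image-*.deb
# the package's postinst hooks should refresh /boot/grub/menu.lst;
# verify the new kernel is listed before you reboot
grep ^title /boot/grub/menu.lst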

On GCE:

  • Create and boot a new (temporary) instance on GCE using one of their existing distribution bundles.
  • Create a new volume large enough to receive the boot volume you have at EC2, and attach it to your temporary instance.
  • Create an MBR partition table on the target volume, partition it, and create a root filesystem (a sketch of these steps follows this list).
  • Mount your new filesystem.
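
Here’s a sketch of the partitioning steps, assuming the blank target volume appears as /dev/sdb on the temporary instance (the device name, mount point, and label are examples):

# on the temporary GCE instance
sudo parted /dev/sdb mklabel msdos                              # MBR partition table
sudo parted -a optimal /dev/sdb mkpart primary ext4 1MiB 100%   # one big partition
sudo mkfs.ext4 -L myroot /dev/sdb1                              # label it for root=LABEL=...
sudo mkdir -p /mnt/target
sudo mount /dev/sdb1 /mnt/target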

On EC2:

  • Copy data to your new GCE filesystem. Use any method you like; consider creating a volume on EC2 from the snapshot you just created and using that as your source. That will make sure you copy device nodes and other junk you might otherwise overlook. Remember to use a method that preserves hard links, sparse files, extended attributes, ACLs, and so on; an rsync example follows this list.
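
For example, rsync over SSH preserves all of the above with the right flags. This sketch assumes the snapshot-derived volume is mounted at /mnt/source on the EC2 side, the target filesystem at /mnt/target on the temporary GCE instance, and that root SSH between the two machines is already arranged (the hostname is a placeholder):

# from the EC2 instance: -a archive, -H hard links, -A ACLs,
# -X extended attributes, -S sparse files, --numeric-ids keeps uid/gid numbers
sudo rsync -aHAXS --numeric-ids /mnt/source/ root@gce-temp:/mnt/target/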

On GCE:

  • Verify that your data arrived on the target volume and everything looks OK.
  • Bind-mount /proc and /dev into your target volume and chroot into it (a consolidated sketch of the chroot steps follows this list).
  • Install grub2 and grub-pc (or whatever provides grub2 on your distribution).
  • Remove any legacy EC2 GRUB packages you might have (e.g. grub-legacy-ec2).
  • Remove /boot/grub/menu.lst.
  • Add and edit the following in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,38400n8 ro root=LABEL=(your root fs label)"
GRUB_TERMINAL=serial
GRUB_SERIAL_COMMAND="serial --speed=38400 --unit=0 --word=8 --parity=no --stop=1"
  • Run update-grub.
  • Install grub onto your new volume (probably grub-install /dev/sdb).
  • Edit your fstab to disable any other disks you haven’t migrated over.
  • Edit the hostname (/etc/hostname).
  • Edit /etc/resolv.conf to use a valid resolver.
  • Uninstall any EC2-specific software packages.
  • Exit the chroot.
  • Unmount the bind mounts and the target filesystem.
  • Detach the target volume from the temporary instance.
  • Create a new GCE instance using the target volume as its boot disk, and boot!
  • If it boots, destroy your temporary instance. If it doesn’t, re-attach the target disk to the temporary instance and see what went wrong.
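
For reference, here’s roughly what the chroot portion of that list looks like, assuming the target filesystem is /dev/sdb1 mounted at /mnt/target and an Ubuntu-style package set:

sudo mount --bind /dev  /mnt/target/dev
sudo mount --bind /proc /mnt/target/proc
sudo chroot /mnt/target /bin/bash

# inside the chroot:
apt-get install grub-pc          # pulls in grub2
apt-get purge grub-legacy-ec2    # or whatever legacy GRUB package you have
rm -f /boot/grub/menu.lst
# ...edit /etc/default/grub as described above...
update-grub
grub-install /dev/sdb
exit

sudo umount /mnt/target/proc /mnt/target/dev
sudo umount /mnt/target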

These are the minimum changes required to boot the image on GCE. You’ll still want to clean things up and make changes according to Google’s suggestions.

Troubleshooting

Check the serial console output. Is the kernel starting?
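
If I remember correctly, gcutil will fetch it for you (the instance name is a placeholder):

gcutil getserialportoutput my-temporary-instance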

... KVM messages omitted ...
Booting from Hard Disk...
Booting from 0000:7c00
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.14.3-031403-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201405061153 SMP Tue May 6 15:54:50 UTC 2014

If you don’t see anything after “Booting from 0000:7c00”, then you haven’t installed GRUB properly.

If the kernel starts but the root filesystem doesn’t mount, check that the root disk is being detected, and that the root filesystem’s label matches the one in your GRUB configuration.
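
To inspect or fix the label on an ext2/3/4 filesystem (the device name is an example):

sudo e2label /dev/sdb1          # print the current label
sudo e2label /dev/sdb1 myroot   # set it to match root=LABEL=... in GRUB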

Please help me improve this post. Leave a comment below!

Use Dropbox to host public files on your own domain name

I’ve been using a Dropbox public folder and some Apache trickery to share files directly from Dropbox on my own domain at pub.noxon.cc. Dropbox is drag-and-drop file sharing at its finest, and by sharing my files on pub.noxon.cc instead of on dl.dropboxusercontent.com, my files are accessible to corporate folks who would otherwise find themselves blocked by an over-zealous web filter. Last but not least, if one of my files becomes too popular, Dropbox won’t shut down my account.

Dropbox doesn’t offer a custom hosting service, so I had to build it. I already have an Apache server, so I created a new virtual host and added some reverse proxy magic. I set up my virtual host as the origin server for the Amazon CloudFront content distribution network, ensuring a minimal load on my own server and the ability to handle virtually unlimited amounts of traffic.

Here’s a recipe for Apache 2.2, mod_proxy, and mod_rewrite:

DirectoryIndex disabled

ProxyRequests off

RewriteEngine on
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-d
RewriteRule ^/(.*) http://dl.dropboxusercontent.com/u/xxxxxx/$1 [P,L]
ProxyPassReverse / http://dl.dropboxusercontent.com/u/xxxxxx/

Header unset cache-control
Header unset Pragma
Header merge cache-control max-age=3600
Header merge cache-control must-revalidate
RequestHeader set User-Agent Mozilla

The cache-control settings dictate that CloudFront should cache my content for an hour (3600 seconds). CloudFront currently ignores the specified max-age for 404 results, instead preferring to cache them for about 10 minutes. I’d prefer a shorter lifetime for failed requests, but that’s not easy with Apache 2.2; with 2.4, it’s doable.
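
For what it’s worth, with 2.4 I believe a conditional Header directive along these lines would do it, though I haven’t tested this:

# Apache 2.4 only (untested sketch): cache 404 responses briefly
Header always set Cache-Control "max-age=60, must-revalidate" "expr=%{REQUEST_STATUS} == 404"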

The User-Agent override on proxied requests is necessary because Dropbox blocks requests that carry the Amazon CloudFront User-Agent.

Using mod_rewrite makes it possible to host overlapping content outside of Dropbox. If a file exists on the server, it gets served locally; if it’s missing, Apache tries to fetch it from Dropbox. I locally host the favicon, robots.txt, a 404 handler, and a couple of other things.

If you want to use your own 404 handler, you’ll need this:

ProxyErrorOverride On
ErrorDocument 404 /path/to/404.html

Before you deploy something like this, carefully consider the security implications and make the necessary adjustments. Do you want PHP code in a Dropbox folder running on your server?

Dropbox public folders are not available to users who signed up for Dropbox after July 31, 2012.