Encrypted backup client for Dropbox

DropboxBackupService is a Java program that encrypts and uploads local files to the Dropbox online service. DropboxBackupDecrypter lets you decrypt a raw backup to restore the original files.

All of this is accomplished using only pure Java code and the Dropbox HTTP API, without using the official Dropbox client application or their convenient SDK. This fully independent approach offers high security and confidentiality, in case you mistrust online cloud storage services and the client software that they provide.

Source code

DropboxBackupService.java
DropboxBackupDecrypter.java
Utils.java
The files above must be put into the package io.nayuki.dropboxbackup.
My JSON library is also required.

Instructions

The DropboxBackupService program takes a single configuration file as an argument. At the start, it scans the listed local directories and uploads all files to Dropbox (with encryption applied to the file names and file contents). After that phase is complete, it waits for an hour, then it rescans all the directories and uploads only the files that have changed. It cycles indefinitely between waiting and incremental upload, until terminated by the user.
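The incremental phase can be pictured with a small sketch. The source does not say how DropboxBackupService decides that a file has changed, so comparing modification timestamps is purely an assumption here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class IncrementalScanSketch {
    // Returns the paths in 'current' that are new, or whose recorded
    // modification time differs from the one in 'previous'.
    // Assumption: timestamp comparison; the real program may use another test.
    static List<String> changedPaths(Map<String, Long> previous, Map<String, Long> current) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long old = previous.get(e.getKey());
            if (old == null || !old.equals(e.getValue()))
                result.add(e.getKey());
        }
        return result;
    }
}
```

A caller would take one snapshot per scan, upload whatever changedPaths reports, then keep the new snapshot for the next cycle an hour later.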

The configuration file is in JSON syntax, and here is an example:

{
  "dropbox-access-token":
    "r9DhIqHfwLMVYwkctcx2Vi9_UuV6O3pHcwdVYt-gCVf2APyh_Z1xVzSLC7S-TDJp",
  
  "file-encryption-key":
    "000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F",
  
  "backup-paths": [
    {"local" : "/var/www/jane-doe",
     "remote": "/files/www-djane/"},
    
    {"local" : "C:\\Users\\John Smith\\My Documents",
     "remote": "/backup/docs-john-smith/"},
    
    {"local" : "D:\\Sales reports",
     "remote": "/backup/sales-reports/"}
  ]
}

Format notes:

The key must be 64 hexadecimal characters (256 bits), case-insensitive.
A backslash (\) in a path must be written as double-backslash (\\) due to the JSON format.
Each remote path must start and end with slash (/).
Each local path must be a directory, not a file.
Trailing commas are disallowed in JSON lists.
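
The format notes above can be checked mechanically before starting a backup. The following is a minimal sketch of such checks; the class and method names are hypothetical and are not taken from the actual source files.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

final class ConfigChecksSketch {
    // Parses the 64-character hexadecimal key (case-insensitive) into 32 bytes.
    static byte[] parseKey(String hex) {
        if (!hex.matches("[0-9A-Fa-f]{64}"))
            throw new IllegalArgumentException("Key must be 64 hexadecimal characters");
        byte[] key = new byte[32];
        for (int i = 0; i < key.length; i++)
            key[i] = (byte) Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        return key;
    }

    // A remote path must start and end with a slash.
    static boolean isValidRemotePath(String path) {
        return path.startsWith("/") && path.endsWith("/");
    }

    // A local path must be an existing directory, not a file.
    static boolean isValidLocalPath(String path) {
        return Files.isDirectory(Paths.get(path));
    }
}
```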

The security of this entire system depends solely on your 256-bit cipher key. Please generate this key securely, using a cryptographically secure random number generator or a cryptographic hash of a high-entropy secret!
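
One way to produce a suitable key is with java.security.SecureRandom. This is only a sketch (the class name is hypothetical); any cryptographically secure source of 32 random bytes would serve equally well.

```java
import java.security.SecureRandom;

final class KeyGenSketch {
    // Returns 32 random bytes as 64 uppercase hexadecimal characters,
    // ready to paste into the "file-encryption-key" field.
    static String generateHexKey() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        StringBuilder sb = new StringBuilder(64);
        for (byte b : key)
            sb.append(String.format("%02X", b));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(generateHexKey());
    }
}
```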

The Dropbox access token is obtained in a multi-step process. It begins by creating a custom app. The rest of the instructions can be found by searching online. Please keep your access token secret, or else other people can read and write files on your account!

The DropboxBackupDecrypter program takes as arguments a configuration file, an input directory, and an output directory. The configuration file contains the encryption key. The input directory contains encrypted files that were created by DropboxBackupService and subsequently downloaded to a local machine. The output directory must be empty. Simply run this program, and every encrypted file in the input dir will yield a new decrypted file in the output dir.

Encryption format

The custom encryption format produced by DropboxBackupService is designed to be secure, but also easy to understand and independently reimplement. For maximum confidentiality, both the file names and the file contents are encrypted.

File content encryption

The file content encryption is relatively simple, so it will be discussed first. The main goals are to achieve confidentiality and integrity. A minor goal is to hide partial updates by turning them into full updates (this assumes that files are small and bandwidth is plentiful). The algorithm appends PKCS #7 (RFC 5652) padding to the file data, prepends a randomly generated initialization vector (IV), then encrypts the whole thing using AES-256-CBC. Finally, this encrypted data is put through HMAC-SHA-256 (using the same cipher key) and the MAC is appended. Illustrated in plain text, the encryption process looks like this:

Step 0: [Message payload (variable-length)].
Step 1: [Random initialization vector (16 bytes)] +
        [Message payload (variable-length)] +
        [PKCS #7 padding (1 to 16 bytes)].
Step 2: [AES-256-CBC encrypted data (multiple of 16 bytes)].
Step 3: [AES-256-CBC encrypted data (multiple of 16 bytes)] +
        [HMAC-SHA-256 of encrypted data (32 bytes)].

Note that the encrypted file length is always a multiple of 16 bytes and at least 64 bytes long: a 16-byte IV, at least one 16-byte block of padded payload (an empty file yields 16 bytes of padding), and a 32-byte MAC. The system security doesn’t rely much on the initialization vector. The IV should be unique, but being guessable is okay (e.g. a simple incrementing counter). The only vulnerability with regard to the IV is that if it stays constant, then an eavesdropper can tell when two encrypted files (or two revisions of the same file) have the same plaintext prefix.
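
The steps above can be sketched with the standard javax.crypto classes. This is a reading of the description, not the actual DropboxBackupService code: the class name is hypothetical, and it assumes that the random block is encrypted as the first plaintext block with the CBC chain itself starting from an all-zero IV.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class FileCryptSketch {
    private static final int BLOCK = 16, MAC_LEN = 32;

    static byte[] encrypt(byte[] key, byte[] payload) {
        try {
            // Step 1: random 16-byte block followed by the payload.
            byte[] iv = new byte[BLOCK];
            new SecureRandom().nextBytes(iv);
            byte[] plain = Arrays.copyOf(iv, BLOCK + payload.length);
            System.arraycopy(payload, 0, plain, BLOCK, payload.length);
            // Step 2: AES-256-CBC; the JCA name "PKCS5Padding" applies
            // PKCS #7 padding for 16-byte blocks.
            byte[] encrypted = cbc(Cipher.ENCRYPT_MODE, key).doFinal(plain);
            // Step 3: append the HMAC-SHA-256 of the encrypted data.
            byte[] mac = hmac(key, encrypted, encrypted.length);
            byte[] out = Arrays.copyOf(encrypted, encrypted.length + MAC_LEN);
            System.arraycopy(mac, 0, out, encrypted.length, MAC_LEN);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] decrypt(byte[] key, byte[] file) {
        if (file.length < 64 || file.length % BLOCK != 0)
            throw new IllegalArgumentException("Invalid file length");
        try {
            // Verify the MAC (in constant time) before decrypting anything.
            int encLen = file.length - MAC_LEN;
            byte[] expected = hmac(key, file, encLen);
            byte[] actual = Arrays.copyOfRange(file, encLen, file.length);
            if (!MessageDigest.isEqual(expected, actual))
                throw new IllegalArgumentException("MAC mismatch");
            byte[] plain = cbc(Cipher.DECRYPT_MODE, key).doFinal(file, 0, encLen);
            // Drop the leading random block; padding was removed by the cipher.
            return Arrays.copyOfRange(plain, BLOCK, plain.length);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Assumption: all-zero IV for the CBC chain (see the note above the code).
    private static Cipher cbc(int mode, byte[] key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[BLOCK]));
        return c;
    }

    // HMAC-SHA-256 over the first 'len' bytes of 'data', using the cipher key.
    private static byte[] hmac(byte[] key, byte[] data, int len) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        mac.update(data, 0, len);
        return mac.doFinal();
    }
}
```

Under this sketch an empty payload produces exactly 64 bytes, matching the minimum length stated above.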

File name encryption

The name encryption scheme is more complex, as it faces more constraints from the environment. The goals are to hide patterns present in the plaintext, to minimize the length expansion, to use only safe characters, and to provide a deterministic mapping.

Each component of the file path (relative to the backup root) is encrypted independently. For example, if “hello” encrypts to “RVoJ8rD” and “world.txt” encrypts to “ff3q_lpNLq-x”, then the local path “/home/alice/to-backup/hello/world.txt” could map to the remote path “/dropbox/backup/RVoJ8rD/ff3q_lpNLq-x”.
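
The per-component mapping can be sketched as follows. The class name is hypothetical, and the name encryption itself is passed in as a function so that the sketch stays independent of the cipher details.

```java
import java.util.function.UnaryOperator;

final class PathMappingSketch {
    // remoteRoot starts and ends with '/'; relativePath uses '/' separators
    // and is relative to the backup root, e.g. "hello/world.txt".
    static String toRemotePath(String remoteRoot, String relativePath,
            UnaryOperator<String> encryptName) {
        StringBuilder sb = new StringBuilder(remoteRoot);
        String[] parts = relativePath.split("/");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0)
                sb.append('/');
            // Each component is encrypted on its own, independent of its parents.
            sb.append(encryptName.apply(parts[i]));
        }
        return sb.toString();
    }
}
```

Because each component is mapped independently, renaming a directory changes only that one component of every remote path beneath it.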

Note that the encryption of each name string only depends on the string content and the cipher key, not on anything else (file contents, full path, file vs. directory, randomization, etc.). Thus a file that stays at the same local path will always be mapped to a consistent encrypted path on Dropbox.

The name string to be encrypted is any Unicode string that does not contain the NUL character. The encrypted output is a base64url ASCII string using only the characters 0 to 9, A to Z, a to z, hyphen, and underscore (no period, space, slash, equal sign, etc.). Here is the full algorithm:

First, the input string is converted to an array of bytes according to the UTF-8 character encoding.

If the byte array's length is less than or equal to an AES block length (16 bytes), then it is padded with zeros and simply encrypted in ECB mode. (Because NUL is disallowed in the input string, a decrypter can strip off the padding precisely.)

Otherwise the length is strictly greater than a block. Then the bytes are encrypted using AES-256-CBC with an all-zero IV and with ciphertext stealing mode 3. The resulting ciphertext bytes are reversed, then sent through the same encryption algorithm a second time.

Finally, the encrypted bytes are encoded using base64url (RFC 4648) without padding.
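
The four steps above can be sketched in Java as follows. This is an illustration under stated assumptions rather than the code in DropboxBackupService.java: the class and method names are hypothetical, and "ciphertext stealing mode 3" is interpreted as CBC-CS3 from the NIST addendum to SP 800-38A (zero-pad, CBC-encrypt, always swap the last two blocks, truncate to the original length).

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class NameCryptSketch {
    private static final int BLOCK = 16;

    static String encryptName(byte[] key, String name) {
        if (name.indexOf('\0') >= 0)
            throw new IllegalArgumentException("NUL is not allowed in names");
        byte[] data = name.getBytes(StandardCharsets.UTF_8);
        try {
            byte[] enc;
            if (data.length <= BLOCK) {
                // Short name: zero-pad to one block and encrypt in ECB mode.
                Cipher ecb = Cipher.getInstance("AES/ECB/NoPadding");
                ecb.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
                enc = ecb.doFinal(Arrays.copyOf(data, BLOCK));
            } else {
                // Long name: encrypt, reverse the bytes, encrypt again.
                enc = cbcCs3(key, reverse(cbcCs3(key, data)));
            }
            return Base64.getUrlEncoder().withoutPadding().encodeToString(enc);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // CBC with an all-zero IV and ciphertext stealing (CS3 interpretation).
    // Requires data.length > 16; the output has the same length as the input.
    static byte[] cbcCs3(byte[] key, byte[] data) throws GeneralSecurityException {
        int n = (data.length + BLOCK - 1) / BLOCK;  // number of blocks, at least 2
        int d = data.length - (n - 1) * BLOCK;      // bytes in the last block, 1 to 16
        Cipher cbc = Cipher.getInstance("AES/CBC/NoPadding");
        cbc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
            new IvParameterSpec(new byte[BLOCK]));
        byte[] c = cbc.doFinal(Arrays.copyOf(data, n * BLOCK));  // zero-padded input
        byte[] out = new byte[data.length];
        System.arraycopy(c, 0, out, 0, (n - 2) * BLOCK);                    // C1 .. C(n-2)
        System.arraycopy(c, (n - 1) * BLOCK, out, (n - 2) * BLOCK, BLOCK);  // Cn
        System.arraycopy(c, (n - 2) * BLOCK, out, (n - 1) * BLOCK, d);      // first d bytes of C(n-1)
        return out;
    }

    static byte[] reverse(byte[] b) {
        byte[] r = new byte[b.length];
        for (int i = 0; i < b.length; i++)
            r[i] = b[b.length - 1 - i];
        return r;
    }
}
```

Under this sketch, any name of 16 bytes or fewer encodes to 22 characters, so the short encrypted strings in the earlier example path should be read as illustrative only.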

The goal of reducing length expansion is achieved by the lack of IV and the use of ciphertext stealing (instead of padding). Indeed, for strings at least 17 bytes long, the binary ciphertext has the same length as the binary plaintext. For strings 16 bytes or shorter, the use of ECB mode forces us to pad the data up to one block. Although using a stream cipher would prevent length expansion, it would be unacceptably malleable because we do not have a message authentication code. For ASCII strings where one character is one byte, the asymptotic expansion ratio due to base64 encoding is 4/3 (for long strings).
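
The resulting lengths are easy to compute. The sketch below (hypothetical class name) returns the number of characters in the encrypted name, given the UTF-8 length of the original name.

```java
final class NameLengthSketch {
    // Names of 16 bytes or fewer occupy one full AES block; longer names keep
    // their byte length. Unpadded base64 turns n bytes into ceil(4n / 3) characters.
    static int encodedLength(int utf8Bytes) {
        int cipherBytes = Math.max(utf8Bytes, 16);
        return (cipherBytes * 4 + 2) / 3;
    }
}
```

For instance, a 5-byte name becomes 22 characters, while a 300-byte name becomes 400 characters, which is the 4/3 ratio mentioned above.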

The use of two-pass CBC and message reversal also deserves an explanation. If only one-pass CBC were used, then the lack of IV would expose prefix patterns in the plaintext. For example, if two strings began with the same 32 bytes, then the first 1 or 2 (16-byte) blocks of ciphertext would be identical, which leaks information. Because every file in a directory has a unique name, any two names differ in at least one plaintext block, and in CBC mode that difference carries into every ciphertext block from that point onward. By reversing the ciphertext and encrypting it again, this difference will propagate to the other side of the ciphertext, thus making the entire ciphertext different. Thus any change in the plaintext will cause the whole ciphertext to change, obfuscating all patterns except the length.
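
The difference between one pass and two passes can be observed directly. The sketch below is self-contained and uses hypothetical names; as before, it assumes that "ciphertext stealing mode 3" means CBC-CS3 as defined in the NIST addendum to SP 800-38A.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class PrefixLeakDemo {
    private static final int BLOCK = 16;

    // One pass of CBC (all-zero IV) with ciphertext stealing; data.length > 16.
    static byte[] onePass(byte[] key, byte[] data) {
        try {
            int n = (data.length + BLOCK - 1) / BLOCK;
            int d = data.length - (n - 1) * BLOCK;
            Cipher cbc = Cipher.getInstance("AES/CBC/NoPadding");
            cbc.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(new byte[BLOCK]));
            byte[] c = cbc.doFinal(Arrays.copyOf(data, n * BLOCK));
            byte[] out = new byte[data.length];
            System.arraycopy(c, 0, out, 0, (n - 2) * BLOCK);
            System.arraycopy(c, (n - 1) * BLOCK, out, (n - 2) * BLOCK, BLOCK);
            System.arraycopy(c, (n - 2) * BLOCK, out, (n - 1) * BLOCK, d);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt, reverse the ciphertext bytes, then encrypt again.
    static byte[] twoPass(byte[] key, byte[] data) {
        byte[] first = onePass(key, data);
        byte[] reversed = new byte[first.length];
        for (int i = 0; i < first.length; i++)
            reversed[i] = first[first.length - 1 - i];
        return onePass(key, reversed);
    }

    // True if the two ciphertexts share their first 16-byte block.
    static boolean sameFirstBlock(byte[] a, byte[] b) {
        return Arrays.equals(Arrays.copyOf(a, BLOCK), Arrays.copyOf(b, BLOCK));
    }
}
```

For two 40-byte names that share their first 32 bytes, sameFirstBlock reports true on the onePass outputs, since the first ciphertext block depends only on the shared first plaintext block. On the twoPass outputs it reports false, barring a coincidence of negligible probability.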