Thursday, November 19, 2009

Finding the strongest security and best value in remote storage

Data can be priceless. No expense should be spared in pursuing the following two objectives:
  • Information privacy
  • Information safety and redundancy
I'm trying to come up with a strategy that addresses these needs, and I'd like a little feedback. Also, corrections are much appreciated and I'll do my best to keep this up to date based on any information in the comments.

Data needs to be private. Businesses count on trade secrets, and keeping that information private is what keeps a business running. Eventually, sensitive information will end up on a host that is in contact with the internet. The contact could be indirect, temporary, or both. Your privacy strategy should rest on the assumption that any host is in danger of being compromised if it is ever in contact with an internet-facing network. The only way to design a system with the highest possible degree of security is to operate with this assumption in mind.

Information also needs to be safe. Nothing less than nuclear holocaust should lead to information loss.

With the right application of freely available software, this is not only possible, but cheap.

The most important program needed is the GNU Privacy Guard, best known by its command-line name, gpg. gpg is a free implementation of the OpenPGP standard. When it comes to keeping your information private, OpenPGP is the de facto standard. As far as I can tell from the academic literature, even the theoretical weaknesses have been weeded out since 1996. For practical purposes this encryption is unbreakable, except possibly by the military or corporate black ops. The argument is computational: if you applied every supercomputer in the world to the task of analyzing data protected with this encryption, it would take lifetimes to find the key. OpenPGP also provides a second layer of protection: it allows you to sign your data so that any alteration in transit or in storage can be detected. With OpenPGP you are assured that the data you store is meaningless to anyone without the key. You are also assured that the data you retrieve is identical to the data in your original source (Schneier, Bruce (1995-10-09). Applied Cryptography. New York: Wiley. p. 587. ISBN 0471117099).
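As a rough illustration of the sign-and-encrypt step described above, here is a minimal Python sketch that only builds a gpg command line; it does not run it. The file names and the recipient address are hypothetical placeholders, and the sketch assumes a key pair for that recipient already exists in your keyring. The options used (--output, --recipient, --sign, --encrypt) are standard gpg options.

```python
from typing import List


def gpg_sign_and_encrypt_command(source: str, recipient: str, output: str) -> List[str]:
    """Build (but do not run) a gpg command that signs and encrypts one file.

    source    -- path of the plaintext file to protect (hypothetical name)
    recipient -- key ID or e-mail of the key to encrypt to (hypothetical)
    output    -- path where the encrypted, signed file would be written
    """
    return [
        "gpg",
        "--output", output,        # where to write the result
        "--recipient", recipient,  # whose public key encrypts the data
        "--sign",                  # sign with your default secret key
        "--encrypt",               # encrypt to the recipient
        source,
    ]


if __name__ == "__main__":
    cmd = gpg_sign_and_encrypt_command(
        "backup.tar", "backup@example.com", "backup.tar.gpg"
    )
    # Print the command for review; pass it to subprocess.run(cmd, check=True)
    # only once you have confirmed the key and paths are right.
    print(" ".join(cmd))
```

Only the resulting .gpg file would be uploaded to the remote host; the plaintext and the secret key stay on your own network.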

The benefit of secure encryption, from the perspective of remote backup, is that you do not need to trust ANY hosts outside of your network with your information. If your storage provider is in the news because its security was breached, you know your data will stay private because you've already taken precautions against the host becoming compromised.

Encryption is still very much relevant to locally stored, physical media. Locks can be picked. Cars can be stolen. Packages can get lost. Every piece of media that is not bolted down, and even some that are, could be the source of a potential leak.

The next topic is a discussion of value. The industry benchmark when it comes to the cost of storage is Amazon Simple Storage Service (S3). Storing a total of 1TB of data and transferring in 100GB of data per month costs $1920 per year. S3 features a pay-for-what-you-use model, which means your needs are guaranteed to be met as they grow. A cheaper option is to use a more restrictive backup solution like Backblaze. With Backblaze, you are purchasing an unlimited backup license for one computer for $50 per year. This computer must be running a Microsoft Windows operating system, so there is a one-time investment in a Windows license, and there must be a Windows host on your network. The disadvantage of requiring a Windows computer on your network is small compared to the license terms: if the computer you license to Backblaze provides services in your intranet, your compliance with Backblaze's terms is questionable, and your service could be terminated depending on the interpretation of their license agreement.
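The $1920 figure can be reproduced with a short calculation. The sketch below is only an illustration: the two rates ($0.15 per GB per month for storage and $0.10 per GB for inbound transfer) are assumptions chosen because they reproduce the number above, and 1TB is treated as 1000GB. Check the provider's current price list before relying on them.

```python
# Assumed rates, picked to reproduce the $1920/year figure quoted above.
STORAGE_USD_PER_GB_MONTH = 0.15
TRANSFER_IN_USD_PER_GB = 0.10


def s3_annual_cost(stored_gb: float,
                   transfer_in_gb_per_month: float,
                   storage_rate: float = STORAGE_USD_PER_GB_MONTH,
                   transfer_rate: float = TRANSFER_IN_USD_PER_GB) -> float:
    """Estimated yearly cost in USD: 12 months of storage plus inbound transfer."""
    monthly = stored_gb * storage_rate + transfer_in_gb_per_month * transfer_rate
    return 12 * monthly


if __name__ == "__main__":
    # 1TB stored (taken as 1000GB) and 100GB uploaded per month.
    print(round(s3_annual_cost(1000, 100), 2))
```

Under these assumptions storage accounts for $1800 of the total and inbound transfer for $120, so the stored volume dominates the bill.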

So, the cost of acquiring third-party remote storage services ranges widely, from $50 to about $2000 per year. The cost of implementing the same solution in house is somewhere in between these two numbers, but it's harder to pin down because it involves man-hours designing, building and maintaining storage systems with a wide range of strict requirements for data integrity.

When using third-party remote storage, you're basically covered if you subscribe to two distinct services; if one suffers a catastrophic failure, your data is still hosted on another site, and hopefully locally as well. When implementing backup in house, it's your responsibility to maintain data integrity as well as the physical integrity of the system. Parts and computers must be maintained, software must be updated, and the whole system must be protected from physical harm, such as a fire.

Let's estimate the cost of the hardware needed to back up 1TB of data redundantly at $800. At a rate of $18/hour, it only takes about 60 man-hours over the course of a year to match the cost of Amazon S3 services, and roughly 100 additional man-hours to match the cost of having two distinctly located buckets with S3. Consider that in order to approach the data safety offered by a service like S3, you would have to maintain distinctly located data repositories. This involves maintaining remote hosts and connections between the hosts. Alternatively, data could be physically moved on a regular schedule, but this puts an increased strain on the hardware and results in a higher hardware cost to protect the data from physical shock, purchase special cases, or increase hardware redundancy. From this perspective, S3 seems like a bargain. Costs of S3 are also likely to fall in proportion to the decreased cost of storage, but the cost of man-hours puts a price floor on the service; in other words, don't expect prices of S3 storage to fall below $1000 per TB per year in the next three to five years.
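The break-even estimate above can be checked with a few lines of Python. This is a sketch using only the figures already given ($800 of hardware, $18/hour, $1920 per year for one S3 bucket); the function name is made up for illustration, and the two-bucket case simply assumes the S3 bill doubles.

```python
def break_even_hours(service_cost_per_year: float,
                     hardware_cost: float,
                     hourly_rate: float) -> float:
    """Man-hours of labor at which in-house cost equals the service cost."""
    remaining_budget = service_cost_per_year - hardware_cost
    return max(0.0, remaining_budget / hourly_rate)


if __name__ == "__main__":
    one_bucket = break_even_hours(1920, 800, 18)       # a single S3 bucket
    two_buckets = break_even_hours(2 * 1920, 800, 18)  # two distinct buckets
    print(f"one bucket: {one_bucket:.1f} hours")
    print(f"two buckets: {two_buckets:.1f} hours")
    print(f"additional hours for the second bucket: {two_buckets - one_bucket:.1f}")
```

With these inputs the break-even point is a little over 62 hours for one bucket and just under 169 hours for two, which is where the rounded figures of 60 and 100 additional man-hours come from.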

For any organization that depends on data, information privacy and safety are crucial. Here is everything I need to know (I think) to make sure that these issues are addressed and policies are in place to meet these needs for a business.
