Storage Soup - A SearchStorage.com blog


A data storage blog offering commentary on the storage industry, as well as a behind-the-scenes look at developments in storage management, SAN, NAS, backup, disaster recovery and storage strategy.

Blog dialogue: Online vs. traditional backup

I was very happy to see one of my regular blog-stops, Anil Gupta’s Network Storage, pick up on a recent post I wrote–the one about HP’s new online storage services.

In his response post, Gupta picks up on this graf in particular:

Like most online storage offerings to date, this offering is small in scale and limited in its features when compared with on-premise products. Most analysts and vendors say online storage will be limited by bandwidth constraints and security concerns to the low end of the market, with most services on the market looking a lot like HP Upline.

And responds:

there is nothing unique in most Online Backup Services that couldn’t be in traditional backup for laptop/desktop. At least traditional backup also come with peace of mind that all backups are stored on company’s own infrastructure. In last few years, I tried over a dozen online backup services in addition to putting up with traditional backup clients for laptop/desktop and I don’t see much difference among the two.

IMO, most online backup services are just taking existing on-premise backup strategy for laptops/desktops and repackaging it to run backups to somebody else’s infrastructure instead of your own.

I see what he’s saying, but in my opinion Gupta probably has “too much” experience with backup clients to necessarily see things from the SMB customer’s point of view. For him, installing a backup client isn’t a big deal–for some, it might be enough of a reason to let somebody else deal with it. Or at least, backup SaaS vendors are hoping so.

A storage reporter’s shameful secret comes to an end

I feel the need to make a confession here. Up until yesterday, despite spending a generous portion of my waking hours covering data backup, disaster recovery and data protection, I myself did not have a backup plan.

I do digital photography in my spare time, and creative writing outside work, and I’ve been a digital music addict since the advent of Napster. So I have about 100 GB on two IDE drives inside a Windows XP machine custom-built for me by a highly geeky friend. And it’s just been sitting there, waiting to be snatched away into the ether.

Then another friend of mine told me about how his MacBook hard drive crashed. On his birthday. While he also had the flu.

He told me how his entire visual design portfolio, an important part of his resume for the business he’s in, has been lost, along with all of his digital photographs, many of which he didn’t have posted on Flickr or stored anywhere else.

He went on to tell me that his costs for trying to recover the data from the drive are going to run him upwards of $2,000–if he’s lucky. It could be cheaper, but that would mean less of his data has been recovered, and so now he finds himself in the position of hoping he’ll have to spend more money.

It’s a bittersweet subject for him that so many people he knows, myself included, have credited his experience with finally getting them off their butts and backing up. But that’s the reality.

I ended up going with the 500 GB Western Digital MyBook, because that’s what my friend also ordered once he learned his lesson the hard way, and he’s far more technical than me, so I trust his judgment. The MyBook came with Memeo’s AutoBackup and AutoSync software, of which I’m only using the former. It also came with a bunch of Google software including Google Desktop, which I found rather odd.

Having covered data storage for the enterprise, I’ve had a chuckle whenever I’ve checked on the initial backup job’s progress. Granted, it’s got a QoS feature that cedes system resources to the PC, but let’s just say I’m not seeing the kind of data transfer rates with this thing I’m used to hearing about. It’s been funny, after being immersed in systems that perform at 8 Gbit or 10 Gbit for a few years, to watch my little PC poke along at what seems like 1 MB/hr, if that.

But still. At least I have a backup. Finally. And I can finally rid my closet of that skeleton.

Now my issue becomes off-site disaster recovery. It’s far more likely that my hard drive(s) will crash than that my house will be napalmed or something (knock on wood), but no sooner had I told Tory that he could stop bugging me about backup, than he started bugging me about taking the drive to my office once the data transfer is done.

But the AutoBackup software, like so many low-end and consumer backup offerings, is set to automatically back up changed files, and what I told Tory was, I like having a low RPO over here. And I made that napalm comment, I’ll admit (I can just feel karma coming to get me). So I’m thinking about some kind of backup SaaS for off-site DR, but capacity with those services comes at a much higher premium than it does in 3.5-inch external SATA. And so you know what that means…data classification!

I may be poking along at 1 MB/hr, but it all feels like a slow-motion, small-scale version of the issues I cover every day. It’s interesting to see firsthand how “Digital Life™” is, in fact, blurring the boundaries between home and business computing.

What’s up with CDP for 2008

Some analysts touted CDP as being the dark horse technology for corporate adoption in 2007. As we all know, that didn’t occur and the multitude of CDP technologies ended up confusing analysts, press and IT alike as they each tried to sort out the differences between available CDP products and what CDP’s true value proposition was. All of these factors contributed to spoiling CDP’s debut.

However, I anticipate CDP will make a comeback in 2008 for two reasons: corporate needs for data replication and higher availability. Data replication has been around for a long time (only recently under the moniker of continuous), so it is a mature technology and well understood by storage professionals in the field.

“Higher availability” is the more important feature of CDP. Companies now must choose between high availability and semi-availability. High availability is associated with synchronous replication software and provides application recoveries in seconds or minutes, but at an extremely high cost. At the other extreme is backup software, which delivers only semi-availability, so it can take hours, days or even weeks to recover data. CDP delivers higher availability, an acceptable compromise between these two extremes: it can quickly recover data (typically in under 30 minutes) to any point in time, at a price that is competitive with backup software.

CDP also complements deduplication. While some may view CDP and deduplication as competing technologies (and in some respects they are), the real goal of data protection is data recovery.

This is where CDP and deduplication part ways. CDP captures all changes to data but keeps the data for shorter periods of time, typically 3 to 30 days, to minimize data stores. Deduplication’s primary objective is data reduction, not data recovery. Faster recoveries may be a byproduct of deduplication since the data is kept on disk but it is not the focus of deduplication so recoveries from deduplicated data do not approach the granularity that CDP provides.
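To make the contrast concrete, here is a minimal Python sketch of the journaling idea behind CDP: every write is logged with a timestamp, the data can be rebuilt as of any moment inside the retention window, and older history is folded away to keep the data store small. The class and method names (`CdpJournal`, `record_write`, `recover`, `prune`) are hypothetical and do not reflect any vendor's product or API. The sketch also assumes writes arrive in time order.

```python
from dataclasses import dataclass, field


@dataclass
class CdpJournal:
    """Toy continuous-data-protection journal (hypothetical, for illustration)."""
    retention_seconds: float
    base: dict = field(default_factory=dict)     # block id -> data at start of window
    entries: list = field(default_factory=list)  # (timestamp, block id, data), time-ordered
    window_start: float = 0.0

    def record_write(self, timestamp, block_id, data):
        # Every write is captured, not just the state at backup time.
        self.entries.append((timestamp, block_id, data))

    def recover(self, point_in_time):
        # Rebuild the data as it looked at the requested moment.
        if point_in_time < self.window_start:
            raise ValueError("requested time is older than the retention window")
        image = dict(self.base)
        for timestamp, block_id, data in self.entries:
            if timestamp > point_in_time:
                break
            image[block_id] = data
        return image

    def prune(self, now):
        # Fold writes older than the retention window into the base image.
        cutoff = now - self.retention_seconds
        while self.entries and self.entries[0][0] < cutoff:
            _, block_id, data = self.entries.pop(0)
            self.base[block_id] = data
        self.window_start = max(self.window_start, cutoff)
```

The trade-off described above shows up directly: recovery granularity is limited only by the journal, but anything older than the retention window can no longer be reached once `prune` has run.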

So what’s in store for CDP in 2008? The staying power of new data protection technologies is now largely determined by whether they are adopted by small and midsize businesses. If a technology is practical and works there, it will find its way into the enterprise, because more and more enterprises work as a conglomeration of small businesses despite corporate consolidations. So it is not a matter of if CDP will gain momentum in 2008; it is a question of how quickly it will become the predominant technology that companies use to protect all of their application data.

2008 recommendations for deduplication, encryption and VMware

As 2007 draws to a close, there are three technologies that appear near the top of many storage managers’ priority lists going into 2008.

· Deduplication

· Tape encryption

· VMware

The mix of old and new technologies is intriguing. One would think that as deduplication and VMware rise in importance, more companies would start to abandon storing data on tape. Yet that does not appear to be the case. Symantec’s Director of Product Marketing, Marty Ward, recently told me that the new encryption features in NetBackup 6.5 are its #2 most sought-after feature (deduplication is #1).

Don’t rush into a deduplication purchase decision. I have yet to talk to a user who doesn’t report faster backup times using a deduplicating backup appliance or backup software and ensuing reductions in data stores. However, I sense that users are rushing into purchasing decisions and not stepping back to look at what other options they have available.

ExaGrid Systems’ CEO, Bill Andrews, told me this past week that in 50% of its customer deals, the company is seeing no competition. I suspect this percentage probably holds true for Data Domain and Quantum as well. But storage managers should avoid rushing out and buying a deduplicating product to solve their backup problems. Taking just a few extra days to check out what other products are available, how each product adds more capacity and performance, and how viable the company behind the product is can save you some management headaches.

The big cautionary note with tape encryption is to verify how encryption keys are created and managed. I recommend using a third-party appliance to create and manage the encryption keys. Though appliances can encrypt the data themselves, more are starting to work in conjunction with backup software and tape drives to provide encryption keys. When companies encrypt data stored to tape, most are hoping they never have to access the data again. So managers need to think in terms of how best to manage the recovery of data in five years, not five days. Encryption appliances create highly secure encryption keys, manage the keys long-term, and give companies assurance that they can recover the data years later.

Storage managers also need to account for the very real storage problems that server virtualization creates. One of the best things you can do in 2008 to prevent VMware from negatively impacting your environment is to change the way you back up VMware virtual machines (VMs). One approach is to use the latest versions of backup software that support the VMware Consolidated Backup (VCB) framework, which backs up the VMDK files that hold the data for the VMs on a VMware server. The other is to install a host-based CDP or dedupe agent on each VM. This eliminates the overhead that traditional backup software agents introduce on each VM. I recommend using CDP. If you are going to change your backup approach anyway, choose the one that gives you the most granular recovery options.

Tape is the only option

Just when I think that I have heard every reason for keeping data on tape, new arguments keep emerging. Now the latest is that tape is more energy efficient than disk.

My first real insight into this came a few weeks ago when I was speaking to Spectra Logic’s director of technical marketing, Molly Rector, who had just returned to Denver after meetings with Spectra Logic channel partners, resellers and users in the New York and Boston area. The feedback that she received from her meetings was that some data centers in the Northeast were running low on power and no longer able to obtain new power. In these cases, the shortage of power was forcing their customers to choose tape because it was more energy efficient than disk even though they wanted to buy disk for their backup environment.

While it may be true that tape consumes less power than disk, it is disconcerting that some companies find themselves in this predicament of needing to choose tape over disk because of something as seemingly preventable as an inadequate supply of power.

Keeping data on tape costs businesses in ways that are sometimes hard to measure. Legal discoveries, the personnel needed to manage tape, and moving and storing tapes offsite all add to the costs of tape management and also consume power in more subtle ways. To conclude that the choice between disk and tape needs to start and stop with a company’s rate of energy consumption seems a bit archaic to me.

Tape may consume less power than disk, but that does not necessarily make tape a better choice. Disk and tape are both choices that companies need to have available to them, and either one, if managed properly and looked at from a total cost of ownership perspective, can save companies money and cut energy consumption in the process.

Companies in this situation are obviously looking at some hard choices in the near term, as the question is less about disk versus tape than about whether it is time to change how, and even where, they manage their data. In the Northeast, it appears some companies have already waited too long to make a decision, because when the number of outlets left in the wall dictates what storage media they need to buy, the only choices left are unpleasant ones.

Another day, another unencrypted backup tape lost

I have yet to get a letter from an institution with which I do business that starts like this:

Dear Current or Former PEIA, WVCHIP, or AccessWV Member:

We are writing to you because of a recent data security incident. On October 16, 2007, a mainframe computer tape containing your and your dependents’ name, address, and social security number was reported as lost by United Parcel Service (UPS) while en route to PEIA’s data analyst.

But the longer I stay on the storage beat, the more I feel like the day is coming.

CDP’s evolution takes shape

The use of continuous data protection in companies is taking shape. BakBone Software’s inclusion of CDP as a new feature in its NetVault:Backup 8.0 release adds it to the growing number of products, such as Asigra’s TeleVaulting and InMage Systems’ DR-Scout, that use CDP to protect Windows and Linux servers.

The rationale for including CDP in backup is simple. Easy backup and recovery of standalone Linux and Windows servers remains a significant challenge for administrators. Companies still have too many of this class of servers and too few administrators, who are struggling to find a cost-effective means to back up and recover them.

Using CDP as part of the backup client addresses this issue on several fronts. It replicates data to disk locally and remotely; it provides fast recoveries to any point in time within the retention window (typically 3 to 30 days); and by creating and keeping a complete copy of the data on disk on another host, it lets administrators manipulate this copy of data in multiple ways.

Storage newcomers deliver on predecessors’ promises

CDP and DPRM software are relative storage newcomers, but they may be the software that finally delivers on the promises of their SRM and storage virtualization software predecessors.

Storage resource management (SRM) and storage virtualization software have taken their turns sharing the storage spotlight over the past few years but have largely failed to deliver on their promise. Though companies may use them in some tactical way, such as doing LUN masking, fabric zoning or data migrations, neither has really delivered the simplified, automated storage management environments that vendors promised and customers hoped for.

I worked for a company that tried both. It saw the strategic value that SRM and storage virtualization software could deliver but never could figure out a way to turn that promise into a reality. When push came to shove, it became almost impossible to find a risk-averse and profitable way to transition from Excel-based FC SAN management to SAN management based on the use of these two software tools.

What my company needed, and what is still needed, is a method to segue from FC SANs managed by Excel spreadsheets to the introduction of SRM and storage virtualization software without a rip-and-replace strategy. So, it was while I was evaluating the latest generations of data protection and recovery management (DPRM) software and continuous data protection (CDP) software that I may have stumbled across a way for companies to make this transition.

Companies usually bring DPRM software in-house to report on the successes and failures of backup jobs. Though DPRM software still does that, it is quickly expanding to monitor and report on other components of the backup infrastructure, including server performance, fabric switches and virtual and physical tape libraries. Though the impetus for offering these features is to better troubleshoot systemic problems in the backup infrastructure as well as do capacity planning, companies are inadvertently using DPRM software in much the same way SRM software was intended to be used.

A similar pattern is emerging with CDP software at the high-end, with products such as EMC’s RecoverPoint, HP’s CIC and Symantec’s CDP/R. These CDP software appliances install into FC SAN fabrics and operate just like the original FC SAN-based storage virtualization products except CDP appliances journal all writes and are only used when production storage fails. But other than these characteristics, they are essentially the same as the original FC SAN-based storage virtualization appliances.

The reason users are now willing to introduce either CDP or DPRM software into their production environments is that they no longer feel like they are risking their production applications or stretching their budgets for products whose value proposition is dubious. CDP and DPRM products solve immediate corporate pain points, are justified with existing dollars and carry less risk – a win for both the vendors and the users.

Now the question is, will CDP and DPRM software eventually evolve to assume responsibilities that their SRM and storage virtualization software predecessors never really delivered on in the minds of customers? My guess is yes.

New data backup SaaS players emerge

Once upon a time in the storage market, storage service providers were all the rage. Then, the tech bubble burst and most of them went the way of the dodo bird.

But with storage growth in recent years forcing companies to consider new strategies for managing data, storage service providers are making a comeback. Within that market space, meanwhile, backup and recovery is the most popular area, as users struggle with the cost of protecting more and more data, the distraction of backup and recovery management from core business and IT operations, and ever-increasing regulation.

Naturally, this is the market where the lion’s share of new players are springing up. EMC Corp. and Symantec Corp. are among the heavy hitters that say they’re planning backup SaaS. But there are also some new and emerging vendors that are gaining attention in the market with the return of interest in outsourcing.

One of the companies that’s made its presence known in recent weeks is Nirvanix Inc., which is aiming to be a business-to-business outsourcer for large companies. It’s come out of the gate overtly challenging Amazon’s S3 service, saying it can overcome the performance issues that have been reported by some large S3 users. The service is also offering a 99.9% uptime SLA to customers.

Nirvanix claims it can offer better performance because it is constructing a global network of storage “nodes” based on the way Web content delivery systems work — by moving the storage and processing power closer to the user, cutting down on network latency.

Within each of Nirvanix’s storage nodes is a global namespace layered over a clustered file system, running on Dell servers residing in colocation facilities. These nodes also perform automatic load balancing by making multiple copies of “popular” files and spreading them over different servers within the cluster. With this storage infrastructure, the company is claiming that it can offer users a wider pipe as well as a faster one, allowing file transfers of up to 256 GB. Moving forward, according to CEO Patrick Harr, the company plans to offer an archival node for “cold storage” within 6 months.

One potential issue for the company in comparison to S3 is a lack of financial clout to match Amazon. Building out the storage node infrastructure will be an expensive proposition in comparison to creating software and running a typical data center, and so far, the company says it has received just $12 million in funding, some from venture capital firms and some from research grants. However, it also says 25 customers have already signed up for beta testing, and says one of those customers is supporting 50 million end users.

Base pricing for the service is 18 cents per stored gigabyte per month, a “slight premium” over Amazon’s price according to Harr. The company is hoping that it can increase sales volume and drive down the price.
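As a quick sanity check on what that rate means in practice, the sketch below multiplies capacity by the quoted 18 cents per gigabyte per month. The function name and the 100 GB and 1,000 GB capacities are illustrative assumptions, and the calculation covers stored capacity only, since the post does not describe any other fees.

```python
from decimal import Decimal

PRICE_PER_GB_MONTH = Decimal("0.18")  # base price quoted above, in dollars


def monthly_storage_cost(stored_gb):
    """Return the monthly storage charge in dollars for a given capacity in GB."""
    return Decimal(stored_gb) * PRICE_PER_GB_MONTH


print(monthly_storage_cost(100))        # 18.00 dollars per month for 100 GB
print(monthly_storage_cost(1000) * 12)  # 2160.00 dollars per year for 1,000 GB
```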

Meanwhile, on the consumer/SMB side, a company called Intronis LLC is souping up its features in the hopes of gaining traction in the low end of the storage market. Version 3.0 of its eSureIT backup service will allow users to create a tapelike rotation scheme for files, creating backup sets and setting policies for data retention on a weekly, monthly and yearly basis. The company has added a plugin it calls Before and After, which will allow users to create scripts dictating what their computer systems should do before, during or after engaging with the Intronis service — for example, the script can have the user’s machine shut down applications prior to backup and restart them after backup has finished. Another new plugin will allow mailbox and message-level backups and restores of Exchange databases, and adds a text search for email repositories.
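A tapelike rotation with weekly, monthly and yearly retention can be sketched roughly as follows. This is a generic illustration of the idea, not Intronis’s implementation: the function name, the default retention counts and the rule of keeping the newest backup in each period are all assumptions.

```python
from datetime import date


def backups_to_keep(backup_dates, weeks=4, months=12, years=7):
    """Pick which backup sets survive a weekly/monthly/yearly rotation."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set()
    tiers = (
        (weeks, lambda d: tuple(d.isocalendar())[:2]),  # (ISO year, ISO week)
        (months, lambda d: (d.year, d.month)),
        (years, lambda d: d.year),
    )
    for limit, period_of in tiers:
        seen = []
        for day in newest_first:
            period = period_of(day)
            if period in seen:
                continue  # a newer backup already represents this period
            if len(seen) == limit:
                break     # tier is full; older periods expire
            seen.append(period)
            keep.add(day)
    return sorted(keep)
```

Any backup date not returned by `backups_to_keep` would be eligible for deletion under the policy, which is how a retention scheme like this keeps capacity (and cost) from growing without bound.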

But the biggest new development, and the one that’s taken it the better part of two years to develop, according to Sam Gutmann, co-founder and CEO, is a feature the company is calling Intelliblox, which, like other enterprise-level backup services such as Asigra, backs up only changed blocks over the wire. The feature uses a set of checksum and hashing algorithms to identify blocks and keep them together with their corresponding files. (An existing feature of Intronis’s service is total separation between the company’s admins and users’ data: each user is given an encryption key to access its storage at Intronis’s data center, and Intronis says it has no way of reading any of its customer data.)
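For readers who want to see the general technique, here is a minimal sketch of changed-block detection by hashing. It is not Intelliblox’s actual algorithm, which the company has not detailed here; the fixed 4 KB block size, the use of SHA-256 and the function names are assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # bytes; an arbitrary choice for this sketch


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block index: bytes} for blocks that differ from the last backup."""
    changed = {}
    for index, digest in enumerate(block_hashes(data, block_size)):
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            offset = index * block_size
            changed[index] = data[offset:offset + block_size]
    return changed


original = b"A" * 8192 + b"B" * 4096
manifest = block_hashes(original)  # digests saved at the last backup
edited = original[:4096] + b"C" * 4096 + original[8192:]
print(sorted(changed_blocks(manifest, edited)))  # [1]
```

Only the middle block crosses the wire in this example, even though the file is three blocks long, which is the bandwidth saving that matters for an online service.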

This use of hashing algorithms also has this blogger wondering if they might also be able to offer fixed-content archiving down the road.

Much ado about Microsoft VSS

When Microsoft Windows Server 2003 was released about five years ago, there was much ado about its new Volume Shadow Copy Service (VSS) framework. This feature allows administrators to take snapshots of Windows volumes and then restore data from the snapshots. At the time, Microsoft claimed that it provided the backup infrastructure for Microsoft Windows XP and Windows Server 2003 servers, but companies have to date seen only negligible benefits from this technology. That is about to change.

The three sides of the VSS architecture (providers, requestors and writers) are coming together to give Windows administrators a powerful new alternative for backing up and recovering Microsoft servers using snapshot technology. This option is possibly as powerful as, or more powerful than, the much-hyped virtual tape libraries (VTLs) and data deduplication technology, and it may be an option companies already own.

The first side of the VSS triangle, the writer, is the application component. Applications such as Microsoft Exchange and SQL Server now accept calls from third-party software to quiesce them. However, these calls only work if the second side of the VSS triangle, the requestor, supports them.

A requestor is an application, such as backup software, that controls the entire snapshot process: pausing the application, initiating the snapshot, restarting the application and then backing up the newly created snapshot. Most backup software products now support VSS, and organizations may have this feature lying dormant in their backup software or can obtain it for an additional licensing fee.

The provider, which is the third side of the VSS triangle, actually generates the snapshot. Though snapshots can be taken on Windows XP and Windows Server 2003 themselves, VSS-compatible backup software can also initiate snapshots on storage systems from most vendors, including incumbents like EMC, HDS, HP, IBM and NetApp or, with some scripting, storage newcomers like Compellent, EqualLogic and LeftHand Networks.

Moving the creation of the snapshots from the Windows server to the storage system can also remove the server overhead normally associated with backup. Since the volume created by the snapshot can be presented to another server, the other server can then back up the data to tape.
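The pause, snapshot, resume and back up sequence that the backup software drives can be sketched as below. This is an in-memory stand-in only: it does not call the Windows VSS API, and the `Application` class, `snapshot_backup` function and the callables passed to it are hypothetical names chosen for illustration.

```python
class Application:
    """Stand-in for an application such as a mail or database server."""

    def __init__(self, data):
        self.data = dict(data)
        self.paused = False

    def pause(self):
        self.paused = True   # hold writes so the data is in a consistent state

    def resume(self):
        self.paused = False


def snapshot_backup(app, take_snapshot, back_up):
    """Pause the application, snapshot it, resume it, then back up the snapshot."""
    app.pause()
    try:
        snapshot = take_snapshot(app.data)  # meant to be near-instant
    finally:
        app.resume()                        # the application waits only for the snapshot
    back_up(snapshot)                       # the slow copy runs against the frozen image
    return snapshot


app = Application({"mailbox.db": "v1"})
tape = []
snapshot_backup(app, take_snapshot=dict, back_up=tape.append)
app.data["mailbox.db"] = "v2"  # later writes do not affect the backup
print(tape)  # [{'mailbox.db': 'v1'}]
```

Because `back_up` only ever sees the snapshot, that step could just as well run on a different server, which is the offloading benefit described above.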

The maturation of VSS technology is significant because, with all of the hype about VTLs and data deduplication, everyone seems to have forgotten about this low-cost or potentially free option that users may have available to them. Users willing to invest a little extra time to explore VSS may find that they can pay a fraction of the price of VTLs and data deduplication technology and achieve comparable or better results.