Storage Soup - A SearchStorage.com blog


A data storage blog offering commentary on the storage industry, as well as a behind-the-scenes look at developments in storage management, SAN, NAS, backup, disaster recovery and storage strategy.

The Storage Admin, DR, and the Down Market

The economy has been on the mind of just about everybody recently, and with good reason. With gas at near-record highs, unemployment rising, housing values reportedly dropping, the credit crunch and foreclosures numbering in the bazillions, it is easy to see why people are not exactly upbeat about the state of our economy.

In the storage market, however, it’s looking like a blockbuster year. EMC and others are reportedly on track to meet or beat financial analysts’ estimates, and that leads me to today’s blog.

As it turns out, the impetus for this blog post was my recent attendance at a DR seminar put on by Storage Decisions featuring Jon Toigo. Looking around the room, I couldn’t help but think of what it looked like in the early days of “network administrators” when people didn’t think of network pros as any different from the server guys. Today, the storage admin is being called on to be part lawyer, part business analyst, part networking guru and all-knowing about all things storage, but there are very few companies with a dedicated storage team (outside the Fortune 500 companies that have exabytes of storage to manage).

For the most part (and please chime in with your experience) storage folks are still viewed as “server guys”. This is, of course, changing, and I wouldn’t bring it up if there weren’t a bigger point to be made: if you do a quick scan of Monster, Dice or Jobcircle, there are more and more listings specifically calling for a “Storage Administrator”. Storage is fast becoming the segment to be in–the information infrastructure could not function without it, and it is increasingly becoming the focus of much planning and resource allocation, in terms of both time and money. Talk to most companies, and they have storage budgets that are going up even in a down market, and they are hiring people to dedicate to the task of storage. Storage pros are more highly valued, and their pay is going up.

So what does this have to do with DR? DR is, at its most basic level, moving data from one place to another on a regular basis, far enough away that if you had a disaster you could recover your data and continue operations. This, in almost every case that I can think of, requires storage, storage networking technologies and someone who knows enough about them to set it all up and keep it working in a changing environment. Hence all the storage pros in the room, rather than the business types who normally involve themselves in DR.

Toigo put on a great presentation. It was filled with a ton of valuable information and even if you have nothing to do with the DR planning and implementation at your company, I would enthusiastically recommend attending one in your area. I walked in thinking I had a passable grasp of DR best practices and walked out realizing I had barely scratched the surface, and that as a storage professional I needed to understand more about business practices as they relate to DR.

For example, Toigo discussed what a data model is, how to build one, and how to explain it to non-technical analysts, so that we could all use it together to build a workable DR plan around valuable data instead of assembling a set of technologies that makes our systems highly available but unable to really recover from a disaster. And it’s the storage guy who should be taking the lead on that.

Think of the value you bring to the table when you can not only provide the information infrastructure, but also assist in developing a DR plan that will keep the business functioning and generating revenue in a disaster. In the process, you can also create things that have intrinsic value to multiple business units–think of what information security can do if they know what a document or document type is worth as compared to other documents. My fellow storage pros, I’m seeing a bright future for us.

New data protection gadgetry hits the streets

Two storage-related announcements came out of CeBIT this week that have turned a few heads.

The first is the FlashBack Adapter from thumb-drive king SanDisk. The device fits into the ExpressCard slot of a user’s PC, and automatically and continuously backs up and encrypts data onto a flash memory card. This way, to quote SanDisk, when “you’re at a conference and someone spills coffee on your laptop PC, shorting out the system and cutting you off from your presentation and notes. Or your computer slips out of your hands and crashes to the floor,” you can extract the memory card from the smoking wreckage, find another PC and be on your way.

The second announcement comes from a UK company called Retrodata, which is reportedly getting ready to release a do-it-yourself drive recovery system. The beast, which has yet to be photographed, reportedly weighs 75 kg (165 lbs.) and will be priced at around $7000. But for all you Austin Powers fans out there, it does come equipped with…”lasers”.

According to techchee, a blog dedicated to high-tech products:

The device uses laser-guided positioning to help it accurately extract platters from any 3.5 inch hard drive with minimal user intervention. What’s unusual element is that such devices normally require highly skilled operators, whereas the System P. EX can be used by a relative novice at a data recovery company.

Maybe if Retrodata plays its cards right, it’ll get an order for…one million dollars.

Blackberry outage a storage issue?

As approximately the last person in the Western Hemisphere not to own a PDA, I escaped the Great Blackberry Outage of Aught Eight last week, and got to have that much more time to be smug about my lack of dependence on such a thing before I inevitably get one and grow so dependent on it I need Tommy John surgery on my thumbs.

This week, though, the plot thickened for storage folks as it was revealed that the outage was caused by a failure during a systems upgrade. According to Reuters, the outage was caused by an upgrade to a data routing system inside one of the company’s data centers. In the past, RIM suffered an outage to its Blackberry service because of cache upgrades. Drunken Data auteur Jon Toigo thinks they’re still having storage problems, and cites an AP report on MSNBC saying the failure happened during a system upgrade designed to increase capacity.

Meanwhile, Reuters seems to imply that at heart, data growth is what bit RIM. “RIM has been adding corporate, government and retail subscribers at a torrid pace and has had to expand its capacity in step to handle increased e-mail and other data traffic. Its total subscriber base sits at about 12 million according to latest available data.”

The fact of the matter is that no system is failproof–but I think Reuters brings up a good point. We’re opening up new frontiers in massive multi-tenancy and creating new and unprecedented demands on computer systems; we’re also consolidating data into the hands of service providers like RIM. My sense is we’re going to start seeing more of this kind of issue as these trends continue, especially as more and more new services come online. So maybe I’ll just rely on good old dinosaur Outlook for a little while longer.

Vengeful militant dolphins…and the Internet

That, friends, is without a doubt the best headline I’ve ever written.

As many of you are surely aware, underwater Internet cables in Asia were cut last week, one of them by an errant ship’s anchor, and another two (or three–I’ve seen stories that say there were a total of three cut cables, and stories that say there were four)…unexplained.

It all happened last week, but repairs are still ongoing in the region. The cable cut by the anchor has been fixed, and reportedly most of the region of Asia, the Middle East and North Africa that was Net-less has come back online (all those Saharan nomads are surely relieved wireless is back on their laptops again). Fixes to the other cables should be done Sunday according to authorities.

As always when human beings encounter the unknown, their immediate instinct is to fill it in with knowledge or theory as quickly as possible. This story is no exception, and according to this AFP piece, the conspiracy theories are flying fast and furious. Many suspect terrorism, yet no one knows how it would have been accomplished.

All of which leads to the following paragraph, which I will now quote verbatim:

Bloggers have speculated that the cutting of so many cables in a matter of days is too much of a coincidence and must be sabotage. Theories include a US-backed bid to cut off arch-foe Iran’s Internet access, terrorists piloting midget submarines or “vengeful militant dolphins.”

If this blog were the Daily Show, that right there would be your Moment of Zen. 

But in seriousness. While all this is happening, there are no doubt companies suffering a complete outage, and if the estimates for the repairs are true (personally I apply the same projection-to-reality formula for Internet fixes as I do to cable repair guy appointment times), these companies will have been suffering complete outages for at least a week to ten days.

Helpfully, IT companies are reminding us through press releases that most companies are not equipped to survive outages longer than seven days (per Gartner). They’re also reminding everyone that had these companies been using their product(s), and presumably a sufficiently distant secondary site, they would’ve been fine. How that would be if you don’t have a WAN to replicate and restore data, or a network through which to conduct commerce, is beyond me, but that’s really not the point; here in the trade press we expect to get press releases linking IT products to every conceivable natural or worldwide disaster, regardless of how tenuous the link may be.

The more I thought about it, the more I wondered…unless you’re a multinational company, how do you survive an outage that big? We’ve all heard about how 9/11 taught people to expand the scope of their DR plans, and Katrina taught people to expand the geographic area they consider potentially disaster-affected when sending tapes offsite. This type of disaster, though, is too big to be escaped by all but the biggest of global corporations. And it does beg the question–how far can DR go? How do you respond to a disaster of global or hemispheric proportions? Many companies are going through a painstaking process of broadening the scope of DR plans beyond their local area as a result of Katrina–should they start planning DR hot sites in Siberia instead?

Yet even as IT shops slowly inch toward better preparedness, disasters, and the global economy, wait for no man. Given our worldwide dependence on the Internet (and imagine what the effect would be if this had happened in North America and Europe), has this disaster suggested a practical limit to technical DR? If so, what’s the contingency plan for that?

Protecting millions of small files

Every week, I visit IT professionals and I often hear the same complaint about dealing with a file server environment that has grown out of control. The problem is that these file servers have millions of small files and customers are looking for ways to better protect this file data.

Second, disk-based archiving truly fixes areas of the backup that most D2D solutions do not. Customers are highly frustrated with backup applications stumbling over what I call the “millions of small files issue.” This is primarily caused by the never-ending growth of a standard file server’s data. Most backup applications struggle with this millions of files scenario. Customers are counting on D2D to help, and it will… a little. The target disk may be faster, but mostly it is much more forgiving than tape. Tape needs to stream, or be fed a constant flow of data, in order to reach maximum write performance. Millions of small files make it difficult for those tape drives to be fed consistently. Disk backup, on the other hand, will maintain the same write performance no matter how inconsistent the data feed is.

That solves half the backup problem. The other half of the performance problem with millions of small files backup is that the backup software still needs to walk those millions of small files, identifying which ones need to be backed up. This file system walk can be very time consuming. Then, the backup software needs to update its own database that tracks what files were backed up and where. Imagine adding millions of records to a database every night, as fast as possible. That database gets HUGE in a hurry, can easily be corrupted and again, even if everything goes right, is very time consuming. Lastly, with most D2D backup solutions you still need to send the entire data load across the network. Even with deduplication solutions, the entire data payload needs to get to the appliance before deduping happens. All of this consumes network bandwidth. Disk-based archiving may circumvent or delay the need to upgrade network bandwidth by clearing this old data out of the way.
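To make the "file system walk" cost concrete, here is a minimal sketch in Python of what an incremental backup has to do before it moves a single byte. It is not how any particular backup product works; the function name `scan_for_backup` and the use of modification time as the change test are assumptions for illustration only.

```python
import os

def scan_for_backup(root, last_backup_ts):
    """Walk `root` and return (changed_paths, files_examined).

    `last_backup_ts` is the time of the previous backup in epoch seconds.
    Hypothetical helper: real backup software tracks changes in its own catalog.
    """
    changed = []
    examined = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # file vanished or is unreadable; skip it
            examined += 1  # every file costs a stat call, changed or not
            if mtime > last_backup_ts:
                changed.append(path)
    return changed, examined
```

The point of returning `files_examined` is that it grows with the total number of files on the server, not with the number that changed. A file server with millions of untouched files still pays for millions of stat calls every night, which is the part that faster target disk does nothing to fix.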

Disk-based archiving eliminates the problem of moving most of these millions of files. With disk-based archiving, the “old” files are stored on the archive and no longer need to be backed up. They are safer on disk than they are on tape (thanks to data integrity checking and replication) and they are out of the way. The backup software no longer needs to walk those files to find which ones need to be protected or send them across the wire to be backed up, and they no longer consume disk space on the file server or the D2D backup target. Additionally, since the archive is disk and not tape, you can be more aggressive with what is archived.

With a classic tape-based archive, customers will wait for data to get very old before moving it to tape. In addition, they will invest in elaborate data movers to provide transparent access to tape. Lastly, data that has stopped changing but is still being referenced or viewed cannot move to tape at all. With a disk-based archive, the delivery back to the user is relatively fast, so you can be more aggressive with your move to archive disk storage and there is less of a need to build elaborate access schemes. Most disk-based archives simply show up as a share on the network and you can archive reference data, further eliminating the data that needs to be protected by traditional backup methods.

A disk-based archive is the perfect complement to D2D backup. It will reduce the investment in disk needed for backup, and an archive strategy may pay for itself on this reduction alone. This is because a disk-based archive clears out the fixed data (data that has stopped changing). That makes the software modules most backup applications require for D2D cheaper (since they are charged on stored capacity), and it reduces the disk capacity needed both on the disk backup target and on the primary (expensive) disk in the file server.

What does this look like in hard cost savings? Disk-based archiving can reduce primary storage requirements (at least a 10X dollar saving: $4 vs. $43/GB) and it can reduce backup requirements (fixed information is said to occupy, on average, 50% of most enterprise primary disk capacity), saving an additional $6/GB.
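As a rough worked example of those numbers, the sketch below plugs the per-GB figures into a small Python function. It rests on assumptions the figures above do not spell out: that $43/GB is the primary disk cost and $4/GB the archive disk cost, and that the $6/GB backup saving applies to every archived gigabyte. The 10,000 GB file server is purely hypothetical.

```python
PRIMARY_COST_PER_GB = 43.0   # assumed to be the primary disk figure
ARCHIVE_COST_PER_GB = 4.0    # assumed to be the archive disk figure
BACKUP_SAVING_PER_GB = 6.0   # assumed to apply per archived GB
FIXED_FRACTION = 0.5         # share of primary capacity that has stopped changing

def archive_savings(primary_gb, fixed_fraction=FIXED_FRACTION):
    """Estimate savings from moving fixed data off primary disk to an archive."""
    archived_gb = primary_gb * fixed_fraction
    primary_saving = archived_gb * (PRIMARY_COST_PER_GB - ARCHIVE_COST_PER_GB)
    backup_saving = archived_gb * BACKUP_SAVING_PER_GB
    return {
        "archived_gb": archived_gb,
        "primary_saving": primary_saving,
        "backup_saving": backup_saving,
        "total_saving": primary_saving + backup_saving,
    }

# Hypothetical 10,000 GB file server
print(archive_savings(10_000))
```

Under those assumptions, half of the 10,000 GB moves to the archive, saving $39 per GB on primary disk and $6 per GB on backup, for a total of $225,000. Your own per-GB costs and fixed-data share will differ, so treat this as a template for the arithmetic rather than a forecast.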

For more information please email me at georgeacrump@mac.com or visit the Storage Switzerland Web site at: http://web.mac.com/georgeacrump.

Iron Mountain’s CEO, users, debate data security

A report surfaced last week in ComputerWorld that Iron Mountain will be adding a security system called InControl to its delivery trucks that are carting around sensitive data. This week, I’ve talked to some users about how they feel about the program and also caught up with Iron Mountain’s CEO, Richard Reese, to talk about Iron Mountain’s point of view on security and chains of custody for the data it transports. In both cases, I heard some interesting comments.

The Iron Mountain updates, which come as the result of a $15 million investment over the last 18 months, will not require an additional fee, according to Reese. Bundled under the InControl umbrella are products, services and processes including more extensive background checks on employees and an employee training program on chain of custody procedures.

Reese also said the company has added on-board computers into the majority of its North American truck fleet. The computers will detect common human errors through sensors in the vehicle–a driver using a vehicle retrofitted with this system can’t start the truck if all doors aren’t locked and alarmed. If the system fails and the door somehow comes open anyway, an alarm will sound in the truck cab. The truck will also only allow one door to be open at a time if there are multiple doors on the vehicle, “so you can’t put the box [of tapes] down on the sidewalk and then go behind an open door and lose sight of it,” Reese said.

Drivers will also be given RFID fobs to keep on their keychains, so if they fail to lock the doors while making a delivery, an alarm will go off. Hand-held GPS-enabled scanners will report the whereabouts of shipments back to users through a Web portal that was already in place. The scanners will also alert drivers immediately to inconsistencies so that errors in shipment routing can be corrected more quickly.

Going forward, the program will be expanded to cover Iron Mountain’s international businesses. Right now retrofits have begun in the UK, and Reese said the company is studying legal regulations in other countries before it figures out how to roll out InControl everywhere.

The customer view of this depends on who you talk to. Dwayne Suizer, VP/Director of Technical Operations for First Independent Bank, said looking into the details of the plan put his mind more at ease. “At first, I thought they were just going to be able to track the trucks, but as I read more and understand how the driver proximity works and the dual ignition systems, it seems like these are all great steps forward.”

But another user, who declined to be named for legal reasons, said it’s “‘too little, too late’ for Iron Mountain.  Many companies have been affected by Iron Mountain’s losses of tapes in transport mishaps and the seemingly-avoidable fires at two of their UK facilities last year.  Two fires, so closely together, could be seen as unlucky or ill-prepared.  It’s up to Iron Mountain’s customers to choose.”

Meanwhile, Reese’s response to the criticism that InControl is a day late and a buck short is that it’s only been in the last 18 months or so that data privacy laws have necessitated this type of control over data.  “If you go back 2 to 5 years, customers were more concerned about driving down the cost of transportation than data loss–they could make three or four copies of a tape and if one got lost in transit, it wasn’t a big deal. Now they’re changing their own inside operations as well to deal with the new privacy regulations, and we’re trying to take on the same burden.”

Reese also said that there are premium services Iron Mountain users can pay for to have things like point-to-point dedicated routes for their deliveries and two drivers in order to guard against theft, and that Iron Mountain had, until the addition of InControl, been pushing its customers concerned about data security to purchase those extra safeguards. “They just wouldn’t do it. They preferred the common carriers.”

Not everybody’s buying it. “I see RFID tracking and a rigorously-enforced chain-of-custody as standard requirements for today’s off-site storage vendors.  RFID tracking can be implemented inexpensively,” said the user who spoke on condition of anonymity.

So why did it take several instances of data loss and destruction for Iron Mountain to begin this grand security scheme? “Let me be clear that there will be other instances,” responded Reese. “InControl will also not be 100%. Any process that involves humans will have errors, and customers also need to understand where their high-risk data is and apply the right solutions. Especially for this baseline service which we just improved radically at no additional cost to customers, I’m not going to guarantee perfection.”

Suizer did have one suggestion for better security: RFID tags in each tape shipment box, an idea Reese said is good in theory, but is “not technically or economically feasible.” RFID tags’ antennas “need to see the sky”, he said, in order to communicate. “Once they go in the loading dock somewhere, the tracking is useless.” Passive RFID tags, which don’t contain batteries, have a much smaller transmission range–5 or 6 feet–than active RFID tags, but the Catch-22 is that active RFID tags require batteries, which are not long-lived. “RFID is not a cure-all,” he said.

The Onion: Internet crashes; all data lost

I have a hard time imagining that anyone who reads this blog isn’t already aware of The Onion, but just in case you missed it, no one in storage–particularly backup–should miss this video report on wide-scale DR from America’s Finest News Source ™.

Be sure to watch until about 1:40 for that rarest of birds: storage-related humor on a mainstream website. Even rarer: backup-specific storage-related humor.

If only The Onion could fill in the rest of what would surely follow this story: a huge swath of the US workforce left to office-chair races to pass the time; dramatic TV footage of Al Gore flying in to help troubleshoot his invention; and of course, every storage vendor in the world putting out press releases about how if the government had been backing up the InterWebs with [insert product name here], none of this would’ve happened.

Unfortunately, given the nature of the disaster, they’d probably have to start hanging their announcements up on telephone poles.

Meanwhile, however, as anyone reading this post on company time is no doubt keenly aware, there’s another very real workplace problem facing our nation right now, which leaves almost no one unaffected. For more, see this report.

When Plan B fails

This morning Plan B failed. I have high speed internet access into my office, but I pay a small monthly fee to keep a pay-per-minute dial-up account just in case my high speed internet provider ever goes offline. So, this morning, when I lost my internet access, I mentally started preparing myself for 56K upload and download speeds. What I had not mentally prepared myself for was my phone lines also being down. Thankfully, I still had my cell phone and was able to reach the outside world and let some individuals know about my situation.

But it occurred to me that this was fairly typical of how disasters go. Not that losing internet access or phone service is necessarily a disaster, but disasters are rarely neat and tidy, they never happen when it is convenient, and you can generally count on them not to follow the plan you laid out.

In no way am I implying that companies should abandon either their data protection or disaster recovery planning efforts. What I am suggesting is that after you have backed up all your data, laid out your recovery plans and then tested them, introduce some reality back into the situation.

For instance, a concern that one records management provider recently expressed to me is that companies should evaluate their disaster preparedness after they have just finished a disaster recovery exercise. Tapes are out of order, the recovery environment is not properly configured and people are exhausted. How quickly and how well could your company recover in this situation if a disaster happened then?

Another important aspect to include in your plan is to identify someone who knows the plan but is not afraid to think outside the box. I was once in a disaster recovery situation where an entire production database had failed and there was not enough unallocated disk in the free disk pool at that site to recover the database. The plan called for us to recover to another site, but one individual asked “Do we have a SAN?” and “Can we move some allocated but unused disk on another server over to this one?” In both cases the answer was yes, and we were able to recover the application in 2 or 3 hours instead of 8 to 12 hours.

Disaster recovery plans are just that, plans – no more, no less. But like all plans, they were created at a past point in time and may not reflect the current reality. That is where having someone around who can assess the entire situation and not just follow the script becomes imperative if one is to turn the disaster into a recovery.

Storage Decisions Chicago: Blue-chips discuss DR, e-Discovery

Things have gotten kicked off in earnest out here in the Windy City at this year’s Storage Decisions conference in Chicago. Today was the first full day of sessions, and attendees heard discussions of hot topics from blue-chip companies including United Airlines, Federal Reserve Bank, and Bank of America.

Gary Pilafas, managing director of enterprise architecture for United Airlines (UAL), gave a presentation this morning about his company’s DR plans, much of which centered around classifying data according to criticality, and setting disaster recovery levels appropriately, a common trend in DR of late. Pilafas said he steered application admins away from insisting on Tier 1 DR (after all, no application admin wants to say his data isn’t of top importance) by emphasizing cost.

On this he was challenged by Michael Thomas, storage architect for the Federal Reserve, who said he’d seen that kind of planning go awry in some cases after 9/11 and Hurricane Katrina. “Some business units had [scaled back] DR plans based on cost, but then their SLAs didn’t match their true business requirements,” Thomas said. “They still expected IT to respond, and we did, but not in as timely a manner as they would have liked in some cases.”

Pilafas acknowledged that getting a true sense of business requirements and managing application interdependencies made tiering for DR a tricky project. However, he said UAL is currently testing service-bus software products including IBM’s Websphere MQ and BEA’s Aqualogic, layered over Hitachi Data Systems’ Universal Storage Platform for a services-oriented architecture. That plan, he said, will decouple data services from individual business units, specific applications or devices, eliminating the issue of application interdependencies. He said it will also go a long way toward addressing the confusion about business units and their priorities. “This way we can discuss each business unit’s priorities, map it back to services, and the higher-priority services float to the top,” he said. “It’s like taking the opposite of the lowest common denominator.”
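One way to read Pilafas’ “opposite of the lowest common denominator” remark is that a shared service inherits the most demanding DR tier that any business unit asks of it. The Python sketch below illustrates that reading only. It is not UAL’s implementation, and the function name, the unit names and the tier numbers (1 being most critical) are all hypothetical.

```python
def rank_services(unit_priorities):
    """Map each business unit's priorities back to shared services.

    `unit_priorities` is {unit_name: {service_name: tier}}, where tier 1 is
    the most critical. Each service takes the most demanding (lowest) tier
    requested by any unit, and services are returned most critical first.
    """
    service_tier = {}
    for services in unit_priorities.values():
        for service, tier in services.items():
            service_tier[service] = min(tier, service_tier.get(service, tier))
    # Sort by tier, then by name so ties come out in a stable order
    return sorted(service_tier.items(), key=lambda item: (item[1], item[0]))

# Hypothetical business units and services
priorities = {
    "reservations": {"booking": 1, "email": 3},
    "finance": {"email": 2, "payroll": 2},
}
print(rank_services(priorities))
```

In this made-up example, email ends up at tier 2 because finance needs it there, even though reservations would have settled for tier 3. That is how the higher-priority services “float to the top” without anyone arguing about individual applications.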

Thomas himself had a different approach to making DR plans more effective, which is to go back to the drawing board with testing. “One of the big problems in this industry is that a lot of people don’t really test their DR plans,” he said. “They send people out a week in advance and prepare, and then test.” Thomas advocated more spontaneous tests and recounted one test in an earlier position where employees were “toe-tagged” at random to more realistically simulate a disaster scenario. 

Meanwhile, if there’s anything that requires as much careful planning and precise procedure as DR, it’s e-discovery, and on hand with a keynote speech on that subject was Daniel Blair, e-discovery, investigation and incident support within the information security and business continuity division of Bank of America (say that five times fast).

Among the nuggets offered by Blair was the estimation that for every 1 GB of data produced for e-discovery, 6.25 GB of storage space is needed for multiple working copies, indexing and conversion to TIFF formats as well as the production of copies for opposing counsel. BOA’s approach to cutting down on storage costs is to put the original “golden” copy of data onto lower-performing, high-capacity SATA disk (backed up vigorously, of course) and use higher-performing FC storage for the processing.
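For capacity planning, Blair’s ratio reduces to a single multiplication. The short Python sketch below shows it. The 6.25 multiplier is his estimate, while the function name and the 200 GB case size are hypothetical, and the sketch does not try to split the total between SATA and FC because the talk did not give that breakdown.

```python
EDISCOVERY_MULTIPLIER = 6.25  # GB of storage per GB produced, per Blair's estimate

def ediscovery_storage_gb(produced_gb, multiplier=EDISCOVERY_MULTIPLIER):
    """Estimate total storage needed to support an e-discovery production."""
    return produced_gb * multiplier

# Hypothetical case producing 200 GB of data
print(ediscovery_storage_gb(200))
```

A 200 GB production would call for roughly 1,250 GB of working space by this estimate, which is why the choice of disk tier for each copy matters so much to the bill.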

Blair wasn’t able to discuss specifics because of the sensitive nature of corporate litigation, but he did say that so far, he has yet to find a single comprehensive product for e-discovery. He also said that BOA uses a combination of in-house work and outsourcing, specifically with TIFF conversion, to lighten the workload and save financially.

Ultimately, Blair said, while the new federal rules of civil procedure could make e-discovery a more bearable undertaking (since they recognize a “good faith” effort to preserve data), further attention on e-discovery means that more savvy practitioners will find new ways to key on process vulnerabilities during a lawsuit.

As the pressure grows, Blair said there’s plenty of room for improvement in the technology space. “Real-time indexing, content categorization, records management for the lifecycle, true policy-based management, and better scalability,” he listed off immediately when asked for ideas.

One other item of note: Compellent was the name on everybody’s lips during the expo on the show floor tonight. Users said they had always liked Compellent’s automated tiered storage feature, but it had taken some time to see more customer traction in the market and product maturity for the emerging company.

So, what are you hearing at the show? Give us your thoughts in the comments section.