Storage Soup - A SearchStorage.com blog


A data storage blog offering commentary on the storage industry, as well as a behind-the-scenes look at developments in storage management, SAN, NAS, backup, disaster recovery and storage strategy.

Data deduplication: no lifeguard on duty?

In the course of a conversation today with a new SRM vendor, Arxscan, CEO Mark Fitzsimmons mentioned a use case for the startup’s product that had me raising my eyebrows: basically, keeping data deduplication systems honest.

According to Fitzsimmons, a large pharma company wanted the Arxscan product to migrate data identified as redundant by the data deduplication system to another repository and present it for review through a centralized GUI, so that the customer could sign off on what data was to be deleted.

“So you’re replacing an automated process in the data center with a manual one?” was the confused reaction from one of my editors on the conference call.

“Well, we’re working on automating it,” was the answer. “But the customer found dedupe applications weren’t working so well, and wanted a chance to look at the data before it’s deleted.”

I’ve heard of some paranoia at the high end of the market about data deduplication systems, particularly when it comes to virtual tape libraries or large companies in sensitive industries like, well, pharmaceuticals. One question I’ve heard brought up more than once by high-end users is about backing up the deduplication index on tape, the better to be able to recover data from disk drives should the deduplicating array fail. But breaking apart the process for better supervision? That’s a new one for me.
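For readers trying to picture the review-before-delete workflow described above, here is a minimal sketch. It assumes file-level deduplication by SHA-256 digest, which is an illustration only; commercial dedupe systems typically work below the file level, and nothing here reflects how the Arxscan product is actually built. The function names `find_duplicates` and `stage_for_review` are hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group file names by the SHA-256 digest of their contents.

    `files` maps a name to its bytes. Only groups with more than
    one member (that is, actual duplicates) are returned.
    """
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return {d: sorted(names) for d, names in groups.items() if len(names) > 1}


def stage_for_review(files):
    """Split files into (keep, review).

    One copy per duplicate group stays in `keep`; the redundant
    copies move to `review` for sign-off instead of being deleted.
    """
    keep = dict(files)
    review = {}
    for names in find_duplicates(files).values():
        for name in names[1:]:  # the first name in sorted order is kept
            review[name] = keep.pop(name)
    return keep, review


files = {
    "a.doc": b"trial results",
    "b.doc": b"trial results",
    "c.doc": b"lab notes",
}
keep, review = stage_for_review(files)
print(sorted(keep))    # ['a.doc', 'c.doc']
print(sorted(review))  # ['b.doc']
```

The point of the second repository is that nothing is destroyed until a person approves the contents of `review`.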

Anyone else heard of anything like this? Or is the customer going overboard?

Oracle jumps into archiving; can Microsoft be far behind?

Oracle is getting into the archiving game with the Oracle Universal Online Archive, which will archive email as well as unstructured files. The product will use Oracle’s own database as the underlying infrastructure, with Oracle Fusion Middleware on top for data ingestion and user interface.

Despite the name, the product is on-site software. There will also be an email-only option, Oracle E-Mail Archive Service, which supports Exchange, Notes and SMTP mail. The products are expected to be available sometime this year. The Universal Archive goes for $20 per named user or $75,000 per CPU, while the Email Archive is priced at $50 per named user or $40,000 per CPU.
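To put those list prices in perspective, a quick back-of-the-envelope calculation shows where per-CPU licensing starts to beat per-user licensing. This is only a sketch: it assumes a single CPU and takes the quoted figures at face value, and the helper name `break_even_users` is made up for illustration. Actual licensing terms may differ.

```python
def break_even_users(per_user_price, per_cpu_price, cpus=1):
    """Named-user count at which per-CPU licensing costs the same as per-user."""
    return (per_cpu_price * cpus) / per_user_price


# Prices as quoted in the post; the single CPU is an assumption.
universal = break_even_users(20, 75_000)
email_only = break_even_users(50, 40_000)
print(universal)   # 3750.0
print(email_only)  # 800.0
```

Under those assumptions, a shop would need more than 3,750 named users on the Universal Archive, or more than 800 on the Email Archive, before the per-CPU option became the cheaper one.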

Not only am I not surprised to see Oracle get into the data archiving space, to be honest, I’m wondering what took them so long. And while writing the previous paragraph, I said “Ouch” a few times–when it was noted that Oracle can archive multiple content types in one repository, which most third-party archivers can’t do yet; when it was noted that Oracle can support not only Notes but SMTP on top of Exchange, which most third-party archivers can’t do yet; and again when I saw the steep pricing.

Be that as it may, it’s been well known that databases like SQL are the basis for most third-party archiving software today. It’s also been well known that customers are catching on to archiving for database data as well. Finally, it’s bleedin’ obvious that Exchange is the dominant email platform and the dominant focus in email archiving. And I’ve wondered for a long time why companies like Oracle and Microsoft didn’t get in on this, since they have what seems like a slam dunk: ownership of the application and core technology, and mighty brand power that could conceivably crush the third-party market.

Easy, there, killer, was the response from ESG analyst Brian Babineau, who studies the archiving space. He pointed out that database archiving systems have to understand both the underlying database structure and the overlaying application, something Oracle isn’t doing. They may have an 800-lb. gorilla brand, he said, “but they have a tougher fight because there are native database archiving and native enterprise application vendors.”

To me this still leaves open the question of why Microsoft doesn’t just add archiving to Exchange, but Babineau pointed out the folks from Redmond already dipped a toe into the archiving market with FrontBridge and didn’t get too far. But I still have trouble believing that the Exchange archiving market would last long if Microsoft were to make a stronger move, say by acquiring a company like Mimosa and making stubbing and archiving a part of the Exchange interface.

EMC hashes out enterprise archiving SaaS

If you’ve been following the data archiving and compliance markets, you’ve probably heard the consensus that the real boom in software as a service (SaaS) will come from small to midsized businesses (SMBs). That’s the prevailing wisdom among analysts, anyway, as far as I’ve heard.

But EMC revealed today at a Writers Summit in Boston that it intends to push its Fortress-based SaaS offering into the high end space with a hybrid approach to on-site and off-site archiving.

The event today was unusual, at least compared with the rest of my experience in the industry. There were no end users or high-profile industry analyst firms represented and hardly any trade press, either. Most of the attendees were technology writers from new media such as blogs and wikis. EMC executives explained that they wanted the summit to be an interactive discussion around industry trends (read: free help for their marketing research?).

It was an odd situation for me, since I’m used to listening and asking questions at industry events, rather than offering opinions. Along the way, though, the EMC execs dropped a few nuggets about their plans. Convergence was a pervasive theme–and the SaaS plans fit into it. EMC predicted a convergence not only between traditional technologies and new mobile technologies (that’s why they bought a stealth startup with no product on the market yet, in Pi) but between on-site and off-site data repositories.

The new aim of the Content Management and Archiving unit at EMC is to use Documentum to unify pieces of its archiving portfolio (CMA president Mark Lewis says EmaileXtender will be integrated into Documentum by mid-year), and also to unify on-site and off-site repositories. Lewis and Documentum founder Howard Shao, now EMC senior VP of CMA, said that in their view four factors influence this approach: enterprise content management and archiving place significant demands on outsourced infrastructures, especially when it comes to network bandwidth; companies are wary of letting sensitive, regulated data outside their firewalls; any application you’d want to deliver through SaaS is inextricable from applications that remain on-site; and the volume and value of archival storage dictate a tiered approach.

This sparked some debate among the pundits at the meeting. Carl Frappaolo, VP of market intelligence for enterprise content management (ECM) industry association AIIM, pointed out that the biggest reason companies resist deploying ECM is complexity. “Aren’t you just adding complexity to the equation?” he asked. Shao countered that a complex problem or a complex back-end doesn’t mean that management can’t be simple.

Kahn Consulting Inc.’s Barclay Blair piped up in support of Shao’s view that users will be wary of letting certain data outside their firewalls, but said “our clients would be attracted to a model that keeps the information on-site, but has the applications which manage the information being managed for them by someone else.”

Countered Frappaolo, “If EMC is doing its job right, shouldn’t users be willing to trust data to them? The whole idea is that you’re supposed to be better at security than me, and I should trust you to keep from exposing private data both inside and outside the data center.”

At any rate, the upshot according to Lewis will be a rollout of this hybrid ECM SaaS model by the end of this year. Another thing I got out of this discussion, with all its focus on security and privacy within a multitenant repository, is a clearer reason why EMC spent all that money on RSA.

HP buys records management partner

HP announced last night that it has bought its enterprise content management (ECM) partner, Tower Software, Australia-based makers of TRIM Context 6. TRIM is already sold with HP’s Information Access Platform (IAP–formerly RISS). Terms of the deal weren’t disclosed.

Tower’s software is tangential to digital data storage–it deals in paper records management and also offers workflow management similar to Documentum (though Documentum is a broader product), which doesn’t get much coverage on SearchStorage.com.

But HP is also framing the acquisition as an e-discovery play, according to Robin Purohit, vice president and general manager of information management for HP software. “The proposed deal will [give] HP software the broadest e-discovery capabilities and help manage the capture, collection and preservation of electronic records for government and highly regulated industries,” Purohit said.

Tower also has a good reputation when it comes to managing SharePoint, which Purohit predicted will be the next concern to hit the e-discovery market. “[The acquisition] allows HP software to address the next wave of e-discovery and compliance challenges posed by the explosion in business content stored in Microsoft SharePoint portals,” he said.

ESG analyst Brian Babineau said he agreed with that assessment, and said Tower’s work with Microsoft to integrate with SharePoint has been deeper than most. “Tower has been focused on integrating its application with other applications, from the desktop to the application server, and they’ve done a lot of work with Microsoft,” he said. An example of the integration Tower offers is the ability to mark files as TRIM records within the application, including Word and SharePoint documents.

“Everyone’s going to say they can archive SharePoint,” Babineau acknowledged. But “it’s a matter of how close you are with Microsoft.”

Tower’s going to have to get closer to HP, too, in Babineau’s estimation. Right now TRIM can draw from IAP as a content repository, but Babineau said he’d like to see TRIM and IAP work together to sort out data that’s being treated as a business record from data that’s being archived for storage management purposes, and to enforce policies on business records in tandem.

Learning this market space will also be a challenge for HP, Babineau predicted. “They need to understand the dynamics of records management and how to connect it to their software group,” he said. “They also need to figure out how to sell the technology.

“It’s not something they can’t handle, but it’s something they’ll have to learn,” he added. “As long as they can retain [Tower] people and figure out how to sell it, it’ll work.”

EMC Centera exec leaves for CAS startup

EMC’s Centera has been something of a question mark for many in the industry over the last 6-8 months. Rumors seem to continually swirl around a major overhaul or replacement for the first content-addressed storage (CAS) system to hit the market. Those rumors and speculation persist even after hardware and software refreshes, such as the introduction of CentraStar 4.0 software last week, and despite insistence from EMC officials that no further major overhauls to the system are planned.

So far Centera remains the leader in market share and the best-known CAS product in the industry, but as we all know, the archiving market is heating up like never before right now, and other big competitors like Hewlett-Packard and Hitachi Data Systems have been refreshing archiving systems to compete better, to say nothing of archiving startups (or re-starts) popping up like mushrooms all over the industry.

Today, in an interesting twist, one of those startups, Caringo, revealed that Centera’s director of technology, Jan Van Riel, has left EMC to be Caringo’s VP of Advanced Technology.

Execs leave EMC all the time, often for positions of higher responsibility at newer companies. But there’s a tangled, shared history between these players in particular. The founders of Caringo were also among the co-founders of FilePool, which became Centera when EMC acquired it in 2001. Van Riel was the CTO of FilePool prior to joining EMC as part of the acquisition.

Caringo’s CAS uses standard CIFS and NFS protocols to ingest data, rather than a proprietary API as Centera does. Caringo’s product can run on clusters of virtually any kind of hardware (one example they showed me was the software running on a Mac external drive). With this product, they find themselves in the strange position of launching attacks against what they view as the proprietary, hardware-bound nature of a competitive product that they themselves created.

Who knows if it really means anything that Van Riel has joined with his old buddies again, but he also made a public statement critical of EMC in the press release Caringo put out announcing the move: “With EMC scaling down the Centera unit and the future of Centera unclear, the chance to join Caringo, which understands the potential of CAS, and partner once again with Paul Carpentier was too good of an opportunity to pass up.”

The plot thickens…

Email archiving: focus or experience?

Last week I was briefed by Internet security software company Trend Micro on its new email archive offering, dubbed the Trend Micro Message Archiver, which was launched Monday.

The product, from a storage geek’s point of view, is about as bleeding-edge as its name. It has the usual checklist items we’ve been hearing about from earlier arrivals to this market, from indexing to .pst import. The product also does MD5 hashing for content-addressed storage, etc. At some points it feels like the email archiving players have all seen a Chinese menu somewhere, and they pick and choose certain features. There’s a superset of common product features so ubiquitous in that market it’s begun to feel commoditized.
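For anyone who hasn’t seen what “MD5 hashing for content-addressed storage” amounts to, here is a minimal sketch of the general idea. It is not Trend Micro’s implementation, which I haven’t seen; the `ContentStore` class and its methods are hypothetical, and the store lives in memory purely for illustration. MD5 appears only because it is the hash named above; it has known collision weaknesses.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: an object's address is the hash of its bytes."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store `data` and return its address (the MD5 hex digest)."""
        address = hashlib.md5(data).hexdigest()
        # Identical content maps to the same address, so it is stored once.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve content by address."""
        return self._blobs[address]

    def verify(self, address: str) -> bool:
        """Re-hash the stored bytes and confirm they still match the address."""
        return hashlib.md5(self._blobs[address]).hexdigest() == address

    def __len__(self):
        return len(self._blobs)


store = ContentStore()
first = store.put(b"Subject: Q3 forecast")
second = store.put(b"Subject: Q3 forecast")  # same message archived twice
print(first == second)      # True
print(len(store))           # 1
print(store.verify(first))  # True
```

Because the address is derived from the content, the same message archived twice occupies one slot, and any later change to the stored bytes shows up as a failed `verify`.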

What captured my attention when it came to TMMA isn’t the product but who’s offering it. Trend Micro is a 20-year-old, global, $848 million-a-year company. Since 2004, MSN Hotmail has been using Trend Micro to scan messages and attachments in its users’ accounts.

The first thing this means is that the product will be integrated as it matures with TM’s access controls, anti-spam and anti-virus filters, email certification and encryption features. Trend Micro’s not alone in this kind of integration (Lucid8 and others jump to mind), but they are pretty unique in terms of their size and brand recognition. And the times I’ve stepped out of my little storage-centric cave and spoken with people in adjacent markets–like, say, the e-discovery and legal compliance folks–I’ve heard many of them say that the storage guys aren’t getting it in some areas, like evidentiary standards that may apply to emails in court beyond what most email archivers offer today. It might be that a little expertise from other markets is what these products need.

This also might be where this new wave of non-storage vendors like Trend Micro making forays into the storage market will find a way to add value. For security-concerned customers, the TM product could offer a focus on security integration, delivery from an already-trusted vendor, and the ever-popular ‘one throat to choke’ as well.

But then again, the ultimate purpose of the product is to store and protect email data. The security features are nice, but secondary to the main function of the product. And many storage admins would probably rather go with a vendor that has experience in the core feature of the product, which is data protection.  

I’m also seeing this dichotomy emerge in another hot market–storage SaaS. In that market, there are also new offerings from experienced storage players competing against new ‘one stop shop’ offerings from adjacent players–EMC’s Fortress vs. new backup and hosted storage offerings from data center service providers like The Planet and Savvis.

I, for one, am curious to see which model users will find preferable as overlaps grow between the different disciplines of IT. Which will be more important: focus, as in focus on the existing relationship with the customer and consolidated vendor relationships, or experience, in designing and supporting storage products?

FRCP looking like a PITW (Pain in the Wallet)

I’m not sure how we get all mired in TLAs, but this FRCP is going to be a PITA (pain in the you-know-where), because it’s a four-letter acronym!

I’ve been fielding quite a few requests for legal holds recently, and I’ve been tracking the storage used by legal holds on our SAN and tape library. Out of curiosity, I started doing research on the average length of a trial, then tabulating the cost of storing the data requested on WORM for that time.

Guess what I’ve found? Some trials last a loooooooong time, and the costs are not insignificant. Now I see why Beth has been ringing the alarm about FRCP.

My company has been very lucky — we have a great risk and legal team as well as solid policy.  But people will still sue if you have a business address. The incidental cost of keeping someone’s mailbox around for five years or so while they litigate (then appeal when they lose) is high, but can a company afford not to do so? What happens when you can’t produce an email to back up your side of a dispute? Worse still, what if the other side accuses you of damaging their case by not providing them with the emails they’ve requested?
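The tabulation itself is simple arithmetic: capacity held, times a storage rate, times the length of the hold. The sketch below uses entirely hypothetical numbers (the mailbox sizes, the $0.50 per GB per month rate and the hold lengths are placeholders, not my real figures), and `legal_hold_cost` is just an illustrative helper.

```python
def legal_hold_cost(gigabytes, dollars_per_gb_month, months):
    """Cost of keeping `gigabytes` on hold for `months` at a flat monthly rate."""
    return gigabytes * dollars_per_gb_month * months


# Hypothetical holds: (label, size in GB, duration in months).
holds = [
    ("mailbox A", 10, 60),  # five years of litigation plus appeal
    ("mailbox B", 25, 36),
]
RATE = 0.50  # hypothetical WORM storage cost in dollars per GB per month

total = sum(legal_hold_cost(gb, RATE, months) for _, gb, months in holds)
print(total)  # 750.0
```

Swap in your own capacity and rate figures; the multiplier that hurts is the duration.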

There’s a “Safe Harbor” clause in the FRCP that absolves companies of responsibility if the company has — and strictly follows — a deletion and retention policy. This protects the company from falling afoul of the regulation, but does my act (as an end user) of deleting an email fall under the “Safe Harbor” clause?

Let me put on my lawyer hat. Okay, it’s on. I’ve seen some precedent that leads me to believe that simply having and following a policy is not enough. Say that, as a network administrator, I have a policy that strictly prohibits viewing pornography on a company network. I can communicate the policy, but if I don’t have measures in place to actively block pornography or follow up complaints about it, I may leave myself open to suit. Some of you may be thinking, “Why would you have a rule that you can’t look at pornography and not have a content filter in place?” My point exactly: Why have a deletion and retention policy, and allow people to do their own deleting and retaining?

This is going to get very esoteric and confusing (as many of our laws are), but what I took away from this article was this: If you allow me to do something, you may be implicitly approving of the behavior. Not to mention that while the employee viewing the pornography is breaking the rules and doesn’t have a case against me, what about the person walking by their terminal who sees it against their will?

So as it relates to e-discovery, if you allow me to delete my own emails, are you implicitly approving of me disobeying retention and deletion policy?

I started thinking about this a little deeper (which almost always spells trouble) and technically, it seems like I would have to have CDP in place and store every email entering and leaving every mailbox forever to be really covered against every contingency. Suppose I’m an end-user, and I delete an incriminating email, but then sue and claim I need the email to prove my case, and that you should have that email available. . .BUT my mailbox wasn’t backed up before I deleted the message. Are you, the respondent, still in hot water?

Implications abound here. Will SMBs that fall under some form of regulation — SOX, HIPAA, etc. — have to store every email forever? I’d love some readers to weigh in on this. Have any of you out there fought this battle with management? Do you know of any vendors that have products that address this particular issue?

I’m curious as to how deep this particular rabbit hole goes and how many folks have been forced to follow it to its logical end. Is there a crazy playing card there yelling “Off with their heads!!”?