For most companies, the demand for storage capacity is expanding at a rate never before experienced. Traditional scale-up storage arrays, and even the newer scale-out NAS arrays used for primary storage, are having difficulty keeping up with the growth. At the root of this problem is the explosion of unstructured data.
Traditionally, the term unstructured data has referred to office productivity files like Word, Excel, PowerPoint, and Google Docs. Today, however, what has really kicked the growth into high gear is the creation of rich media files that companies use in social media, training, communications, and advertising. Add to that the machine-generated files that come from sensors monitoring audio, video, temperature, pressure, and hundreds of other key data points that companies track for quality and security reasons. Video surveillance, drone, and body cameras now record in high definition and can generate tens of gigabytes to multiple terabytes of data per day. Tally it all up and it becomes easy to see how quickly the number and size of the files generated daily can fill up any primary storage array.
Another challenge with unstructured data files is that once they are created and used, they are rarely, if ever, used again. Companies are reluctant to delete them, however, because compliance or industry regulations dictate that they be retained for years or even decades. Additionally, the files may be reused or repurposed at some point in the future. So data files that have high value at the time of creation become inactive over time and lose most of their immediate value, yet they continue to occupy up to 70% of the expensive space on primary storage arrays.
The demand for additional capacity is not going to subside, so there needs to be a way to add it in a manageable and controlled manner. For most companies the “easy” fix is to add another primary storage array. This typically happens when the existing array passes 50% capacity utilization, the demarcation point for performance degradation. But this only addresses the need for capacity. There are four other considerations that are often not taken into account. One, the physical requirements for the new storage array include space, power, cooling, and networking. Two, the data on the new array needs to be included in the data protection scheme, which means a longer backup window and a larger backup target. Three, as files age and become inactive, they still occupy space on the new storage. And four, the acquisition of the new storage array, the physical infrastructure to house it, and the expanded data protection all add significant cost for the company.
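The 50% trigger point mentioned above can be expressed as a simple capacity check. This is a generic illustrative sketch, not any vendor's monitoring tool; the threshold constant simply reflects the figure cited here:

```python
import shutil

# Assumed threshold: the 50% utilization "demarcation point" cited above.
UTILIZATION_THRESHOLD = 0.50

def utilization(used: int, total: int) -> float:
    """Fraction of total capacity that is consumed."""
    return used / total

def needs_expansion(mount_point: str,
                    threshold: float = UTILIZATION_THRESHOLD) -> bool:
    """True when the volume has crossed the performance-degradation point."""
    usage = shutil.disk_usage(mount_point)
    return utilization(usage.used, usage.total) > threshold
```

In practice a check like this would feed a capacity-planning alert rather than an automatic purchase, but it shows how early the expansion decision arrives: the trigger fires with half the raw capacity still unused.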
One solution is to add an archive so that inactive files reside on a less expensive tier of storage. Depending upon the need for file access, that tier can consist of disk, tape, or a combination of the two. Object storage provides a great disk solution for an archive, one from which data files can be recalled quickly. And many object storage solutions, like Amplidata and Cloudian, are self-protecting because they use replication, erasure coding, and geo-dispersal, which means the data does not need to be included in the standard backup scheme. Crossroads’ StrongBox appliance uses a combination of disk and tape to move files off of primary storage: the most active files are cached on disk and the less active files are stored on tape. StrongBox uses the open-standard Linear Tape File System (LTFS) so that the tapes can be read in the future without the need for proprietary software. The tape archive is self-healing and, again, the backup scheme is highly optimized: up to 70% more efficient without all of this unstructured data included.
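The tiering behavior described above can be sketched as a simple age-based policy: files untouched for some period migrate from the primary tier to the archive tier. This is an illustration of the concept only, not StrongBox's actual logic; the 180-day cutoff and the use of last-access time are assumptions:

```python
import shutil
import time
from pathlib import Path

# Assumed policy value for illustration; real archiving products
# apply their own, often configurable, inactivity rules.
INACTIVE_AFTER_DAYS = 180

def find_inactive_files(primary: Path,
                        days: int = INACTIVE_AFTER_DAYS) -> list[Path]:
    """List files whose last access time is older than the cutoff."""
    cutoff = time.time() - days * 86400
    return [p for p in primary.rglob("*")
            if p.is_file() and p.stat().st_atime < cutoff]

def archive_inactive(primary: Path, archive: Path,
                     days: int = INACTIVE_AFTER_DAYS) -> int:
    """Move inactive files to the archive tier, preserving relative paths."""
    moved = 0
    for src in find_inactive_files(primary, days):
        dest = archive / src.relative_to(primary)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved += 1
    return moved
```

A production system would leave a stub or symlink behind so applications can still find the file, and would verify each copy before deleting the original; the sketch only shows the selection-and-move core of the policy.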
There are many companies in the archive space that can provide object storage and tape; Quantum, for example, offers a range of tiers, including Lattus object storage and Scalar tape libraries. Whether disk, tape, or a combination is the chosen solution, the implementation of an active archive will improve primary disk performance, reduce the need for data protection hardware and software, keep inactive unstructured data readily accessible, and reduce total storage costs in the long run.