Mar 15, 2021 | Paul Reeve
Archive data is some of the most valuable data an organisation owns. Losing it could result in huge fines from compliance regulators. Reworking a seismic survey would cost millions. Reshooting a film or advertisement may be prohibitively expensive. Whatever industry you are in, chances are you are already thinking about better ways to store your archives.
Organisations need to keep data in long-term archives for many reasons. One of the main reasons is the legal obligation to maintain certain records for minimum time periods, which applies to areas like medical, construction and financial services. Not only must they store this data safely and securely, they must also be able to retrieve it on demand. Valuable commercial data like seismic surveys, medical research or media content will also need to be stored for future analysis or rework, or as “gold copies”. And of course, there are national and other archives stored for posterity.
Let’s look at the top 6 considerations when using cloud for long-term data archives.
Archive data is vitally important but rarely accessed. Many organisations end up keeping this data on their primary storage systems, which creates major challenges, especially for large data sets. Primary storage systems are generally expensive and require complex and costly backup systems, both of which need constant expansion to meet long-term storage requirements. The 80:20 rule typically applies here: 80% of the data on these systems is likely not to have been accessed in the previous year.
Moving this 80% archive data to some other secure storage would clearly result in reduced operational cost and increased efficiency. The challenge is to ensure that the archive storage meets any compliance requirements and long-term data security.
Public or private cloud would seem to be an obvious choice for storing archive data. It gets the data off premises, has almost unlimited capacity and is a managed infrastructure with very high data protection. However, not all clouds are necessarily equal and there are important considerations in choosing these solutions.
1. Data storage
Most clouds use object storage, which is inherently designed with a high level of data protection. Object storage creates immutable sets of data, includes versioning, and supports geo-diverse data replication schemes. It also has enhanced metadata capabilities, which can be augmented directly or with cloud AI services. You will often see data durability quoted as 99.999999999% (“eleven nines”).
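To put “eleven nines” in perspective, here is a quick back-of-the-envelope calculation. This is only a sketch: the 10-million-object archive size is an illustrative assumption, not a figure from any provider.

```python
# Annual durability of 99.999999999% means the probability of losing
# a given object in a year is about 1e-11.
annual_loss_probability = 1 - 0.99999999999  # ~= 1e-11

# For an illustrative archive of 10 million objects:
objects_stored = 10_000_000

# Expected number of objects lost per year:
expected_losses_per_year = objects_stored * annual_loss_probability
print(f"Expected losses per year: {expected_losses_per_year:.4f}")

# Equivalently, the average number of years before losing one object:
years_per_loss = 1 / expected_losses_per_year
print(f"Average years between single-object losses: {years_per_loss:,.0f}")
```

Even at archive scale, the expected loss rate is a ten-thousandth of an object per year — but as the next section notes, that figure is only as good as the data centre infrastructure behind it.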
It is important to check the underlying data centre infrastructure supporting this durability. Leading providers operate regional clusters of geo-diverse data centres, so no single data centre failure can result in data loss or loss of access to data. Other providers may have only a single geographic data centre which, while highly reliable and without internal single points of failure, may not protect against a catastrophic disaster. If data sovereignty is an issue, you must ensure the data centre is in the correct geography and that any geo-dispersal does not take the data out of the sovereignty region.
2. Cost

Cost will be a major factor in choosing a cloud provider. Cloud cost calculations can be very complex, as they comprise storage costs, data ingress and egress charges, speed of access to data and minimum commitment periods. Large providers offer different storage tiers, which follow a basic principle:
The higher the storage cost per GB/month, the faster the access to data, the lower the ingress and egress charges, and the shorter the minimum commitments.
3. Cold storage

For archive data it is worth looking at the cold storage options. These have significantly lower storage costs per GB/month. Egress charges, however, will be much higher, and objects may carry minimum storage durations of three or six months. Most importantly, data retrieval may take up to 24 hours, though this can often be expedited at additional cost. Cold storage will suit many archives for the reasons already identified, but it is essential to consider all the cost implications.
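The trade-off above is easy to work through with a concrete example. The per-GB prices and the 5% annual retrieval rate below are assumptions chosen for the sketch, not any provider’s actual rates:

```python
# Illustrative tier comparison for a 100 TB archive.
# All prices are assumed values for the sketch, not real provider rates.
archive_gb = 100_000  # 100 TB expressed in GB

hot_storage_per_gb_month = 0.023   # assumed "hot" tier price per GB/month
cold_storage_per_gb_month = 0.004  # assumed "cold" tier price per GB/month
cold_retrieval_per_gb = 0.02       # assumed cold-tier retrieval charge per GB

annual_hot_cost = archive_gb * hot_storage_per_gb_month * 12
annual_cold_cost = archive_gb * cold_storage_per_gb_month * 12

# Suppose 5% of the archive is retrieved once a year — the 80:20 rule
# suggests most archive data is rarely touched.
annual_retrieval_cost = archive_gb * 0.05 * cold_retrieval_per_gb

annual_cold_total = annual_cold_cost + annual_retrieval_cost
print(f"Hot tier:  ${annual_hot_cost:,.0f}/year")
print(f"Cold tier: ${annual_cold_total:,.0f}/year (incl. retrievals)")
```

Under these assumptions the cold tier is several times cheaper even with retrieval charges included — but if the retrieval rate were much higher, the egress and retrieval fees would quickly erode the saving, which is why the access pattern must drive the tier choice.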
4. Regulatory compliance
It is also important to check whether the cloud provider meets your regulatory requirements. Larger providers will meet standards like PCI-DSS, HIPAA/HITECH, FedRAMP, GDPR (the successor to the EU Data Protection Directive), and FISMA.
5. Bandwidth considerations
Network infrastructure is also paramount. Network bandwidth must support the required data transfer rates and may need to be enhanced with additional routing capability for time-critical data transfers.
Cloud archive best practices
Many organisations are rightly concerned about the security of cloud storage, particularly public cloud. These concerns can be mitigated by following cloud security best practices: transfer data only over encrypted connections, preserve access controls (ACLs) when data moves to the cloud, and keep data in an open, native format so it remains recoverable without proprietary tools.
6. Moving data in and out of the cloud
Cloud providers want to make it easy for customers to use their services and provide specific tools or programmable APIs to do so. There are also bulk ingest tools that can be deployed for larger data sets. Typically, these tools require a significant amount of IT work, both for the initial ingest and for subsequent data management, and they may not offer any integration with local applications unless custom APIs are written.
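The “significant amount of IT work” is often plumbing like the sketch below: wrapping uploads in retry logic so transient network failures don’t abort a large ingest. The `upload_fn` callable is a hypothetical stand-in for whatever a provider SDK exposes; this is a generic pattern, not a specific cloud API.

```python
import time

def upload_with_retry(upload_fn, key, data, max_attempts=5, base_delay=1.0):
    """Retry a flaky upload with exponential backoff.

    `upload_fn` is a hypothetical stand-in for a provider SDK call;
    this sketches the retry pattern, not a specific cloud API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload_fn(key, data)
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            time.sleep(base_delay * 2 ** (attempt - 1))

# Usage with a stubbed upload that fails twice, then succeeds:
attempts = {"n": 0}
def flaky_upload(key, data):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network error")
    return f"stored:{key}"

result = upload_with_retry(flaky_upload, "archive/file.bin", b"...", base_delay=0.01)
print(result)  # stored:archive/file.bin
```

Multiplied across checksumming, manifest tracking and restart-after-failure handling, this kind of glue code is exactly the burden that third-party tools aim to remove.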
Third-party software solutions can also be used for moving data in and out of the cloud. Storage vendors like Dell and Hitachi build S3 connectors directly into their storage management tools. A number of companies provide storage gateway cache systems that cache local data and tier older data to the cloud. Some backup applications also have an S3 connector for storing long-term backups. The challenge with many of these systems is that they are often proprietary and rely on vendor-specific tools to recover data in a usable format.
Tiger Bridge is a software-only solution that integrates directly with on-premises file systems and uses automated policies to move data to any cloud object storage, including hot or cold tiers based on the age of the data. Policies are set by administrators and are completely transparent to users and applications. Integration with Active Directory services ensures that all ACLs are transferred to the object in the cloud, fully in line with the cloud security best practices already discussed. Data is transferred securely over HTTPS and stored in its native format, ensuring the data is always accessible with or without Tiger Bridge. Tiger Bridge also supports cloud AI, so enhanced metadata created on cloud objects can be automatically transferred to the local file system to enable advanced search capability.
Tiger Bridge can also allow you to sync cloud storage across multiple locations without transferring the actual data, which may be essential for a centralised archive solution for large geographically dispersed organisations.
Cloud storage offers many advantages for data archives. It is an inherently secure storage platform built on advanced data centre infrastructure. It could dramatically improve the efficiency of local IT infrastructures while meeting all the regulatory or commercial requirements for storing long-term data.
The key to building a secure cloud archive is choosing the provider and service level that meet your requirements, and following best practice guidelines for using cloud infrastructure.
Most importantly, consider how to manage the data in and out of the archive. Some organisations will prefer centralised controlled access while others will prefer user-accessible archives with local application integration.
Want to learn more about using Tiger Bridge to build cloud data archives? Check out our dedicated videos.
Ready to try it out for yourself? You can archive up to 1TB of managed data per month to any cloud provider free of charge with our Tiger Bridge Archive plan. View our Tiger Bridge subscription plans.