Data, data everywhere, but where to put everything? Here’s a look at five current and potential approaches to fast and high-capacity storage.
As companies continue to store huge amounts of information generated by people, businesses, vehicles and a virtually endless list of other sources, many wonder where they can keep all this data in an accessible, secure and cost-effective way.
The data storage business has evolved considerably over the past five years, and that transformation continues to expand. The big difference today is that while storage once revolved around hardware issues, such as SSDs, faster read/write speeds and capacity expansion, the cloud and other storage advances have shifted the market in the opposite direction.
“For most companies, storage is more about software, including software-defined storage, software management of virtualization, and integration of AI and ML to improve storage optimization,” said Scott Golden, a managing director in the enterprise data and analytics practice at global business and technology consulting firm Protiviti.
Here’s a quick look at five promising storage technologies that can, now or at some point in the foreseeable future, help businesses cope with their growing data storage needs.
1. Data Lakes
When it comes to managing and extracting value from large datasets, most customers start with data lakes, then leverage cloud services and software solutions to get more value out of them, Golden said. “Data lakes, such as Microsoft’s Azure Data Lake and Amazon’s S3, allow large volumes of structured, semi-structured and unstructured data to be collected and stored as BLOBs (Binary Large OBjects) or Parquet files for easy retrieval.”
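The core idea, collecting heterogeneous records under partitioned object prefixes for later retrieval, can be sketched without any cloud SDK. The following is a minimal, stdlib-only Python illustration (the `write_record`/`read_partition` helpers, the `dt=` partition naming and the `clickstream` dataset are all hypothetical, chosen to mimic a typical object-store layout, not any vendor’s API):

```python
import json
import tempfile
from pathlib import Path

def write_record(lake_root: Path, dataset: str, partition: str, key: str, record: dict) -> Path:
    """Store one record under a date-partitioned prefix, mimicking an object-store layout."""
    path = lake_root / dataset / f"dt={partition}" / f"{key}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record))
    return path

def read_partition(lake_root: Path, dataset: str, partition: str) -> list[dict]:
    """Retrieve every record stored under one partition prefix."""
    prefix = lake_root / dataset / f"dt={partition}"
    return [json.loads(p.read_text()) for p in sorted(prefix.glob("*.json"))]

# Raw semi-structured events land in the lake as-is; schema is applied on read.
lake = Path(tempfile.mkdtemp())
write_record(lake, "clickstream", "2024-01-01", "evt-001", {"user": "a", "page": "/home"})
write_record(lake, "clickstream", "2024-01-01", "evt-002", {"user": "b", "page": "/cart"})
print(read_partition(lake, "clickstream", "2024-01-01"))
```

In a real lake the same layout would typically use columnar Parquet files on S3 or Azure Data Lake rather than local JSON, but the partition-prefix organization is the same.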
2. Data Virtualization
Data virtualization allows users to query data across many systems without having to copy and replicate it. It can also make analytics simpler, faster and more accurate, because users always query the latest data at its source. “This means the data need only be stored once, with different views of that data presented for transactions, analytics and so on, as opposed to copying and restructuring the data for each use,” said David Linthicum, director of cloud strategy at business and technology advisory firm Deloitte Consulting.
Data virtualization has been around for some time, but with the increase in data usage, complexity and redundancy, the approach is gaining popularity. On the other hand, data virtualization can become a performance bottleneck if the abstractions, or data mappings, are too complex and require additional processing, Linthicum noted. There is also a learning curve for developers, often requiring additional training.
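The “store once, present many views” pattern Linthicum describes can be sketched in a few lines: adapters map each source’s native layout onto a shared schema at query time, so nothing is copied or restructured up front. This is a hypothetical toy (the `VirtualView` class, adapter names and sample CRM/billing data are all invented for illustration), not any product’s API:

```python
from typing import Callable, Iterator

# Two "source systems" holding data in different native shapes; neither is copied.
crm_rows = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
billing_rows = [(1, 120.0), (2, 340.5), (1, 80.0)]

class VirtualView:
    """Lazily federates heterogeneous sources behind one shared schema."""
    def __init__(self, sources: list[Callable[[], Iterator[dict]]]):
        self.sources = sources

    def rows(self) -> Iterator[dict]:
        # Rows are produced on demand from each live source, never materialized.
        for source in self.sources:
            yield from source()

    def where(self, predicate) -> list[dict]:
        return [r for r in self.rows() if predicate(r)]

# Adapters translate each source's layout into the shared schema at query time.
def crm_adapter() -> Iterator[dict]:
    return ({"customer_id": r["id"], "name": r["name"]} for r in crm_rows)

def billing_adapter() -> Iterator[dict]:
    return ({"customer_id": cid, "amount": amt} for cid, amt in billing_rows)

view = VirtualView([crm_adapter, billing_adapter])
print(view.where(lambda r: r["customer_id"] == 1))
```

The performance caveat is also visible here: every query re-runs the adapter mappings, so overly complex mappings translate directly into extra processing per query.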
3. Hyper-Converged Storage
Although not exactly cutting-edge technology, hyper-converged storage is being embraced by a growing number of companies. The technology usually comes as a component of a hyper-converged infrastructure, in which storage is combined with computing and networking in a single system, said Yan Huang, assistant professor of business technology at Carnegie Mellon University’s Tepper School of Business.
Huang noted that hyper-converged storage streamlines and simplifies data storage, as well as the processing of stored data. “It also allows compute and storage capacity to scale independently, in a disaggregated way,” she said. Another big advantage is that companies can build a hyper-converged storage solution using the increasingly popular NVMe over Fabrics (NVMe-oF) network protocol. “Because of the pandemic, remote work has become the new norm,” Huang said. “Because some organizations will always have part of their staff working remotely, hyper-converged storage is attractive because it is well suited to remote work.”
4. Computational Storage
A state-of-the-art technology, computational storage combines storage and processing, allowing applications to run directly on the storage medium. “Computational storage integrates processors and low-power ASICs onto the SSD, reducing data access latency by eliminating the need to move data,” said Nick Heudecker, senior strategy director at technology services provider Cribl.
Computational storage can benefit virtually any data-hungry use case. Observability data sources, such as logs, metrics, traces and events, dwarf other data sources in most companies, Heudecker noted. Searching and processing this data is already a challenge, even at low volumes. “It’s easy to see applications for computational storage in observability, where complex searches are pushed directly down to the SSD, reducing latency while improving performance and carbon efficiency,” he said.
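The payoff of pushing a search down to the device, rather than shipping every block to the host, can be shown with a simulation. This is a purely illustrative sketch (the `SimulatedDrive` class and its byte counter are invented for this example; a real device would run the predicate on its onboard ASIC):

```python
import json

class SimulatedDrive:
    """Toy stand-in for a computational storage device that can filter next to the data."""
    def __init__(self, records):
        self.blocks = [json.dumps(r).encode() for r in records]
        self.bytes_transferred = 0  # bytes "moved" to the host

    def read_all(self):
        # Conventional path: ship every block to the host, which filters afterward.
        for block in self.blocks:
            self.bytes_transferred += len(block)
            yield json.loads(block)

    def query(self, predicate):
        # Pushdown path: filter on-device, transfer only the matching blocks.
        for block in self.blocks:
            record = json.loads(block)
            if predicate(record):
                self.bytes_transferred += len(block)
                yield record

# An observability-style workload: mostly routine logs, one interesting event.
logs = [{"level": "info", "msg": f"event {i}"} for i in range(1000)]
logs.append({"level": "error", "msg": "disk failure"})

host = SimulatedDrive(logs)
host_matches = [r for r in host.read_all() if r["level"] == "error"]

pushdown = SimulatedDrive(logs)
pushdown_matches = list(pushdown.query(lambda r: r["level"] == "error"))

print(host.bytes_transferred, pushdown.bytes_transferred)
```

The two result sets are identical, but the pushdown path moves only the matching blocks, which is where the latency and energy savings come from.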
The main drawback of the technology is that applications need to be rewritten to take advantage of the new model. “It will take time and, before that happens, the space has to mature,” Heudecker said. In addition, the field is currently dominated by small startups and standards have not yet emerged, making it difficult to move beyond initial proofs of concept. “If organizations want to get involved, they can follow the work of the Storage Networking Industry Association’s Computational Storage Technical Work Group to monitor the development of standards,” he suggested.
5. DNA Data Storage
DNA-based data storage is a potentially revolutionary technology. Synthetic DNA promises unprecedented data storage density: a single gram of DNA can store well over 200PB of data. And the data is durable. “When stored under appropriate conditions, DNA can easily last 500 years,” Heudecker said.
In DNA data storage, digital bits (0s and 1s) are translated into nucleobase codes and then converted into synthetic DNA (no living organisms are involved). The DNA is then stored. “If you need to reproduce it, you can do so cheaply and easily with PCR (polymerase chain reaction), making millions of copies of the data,” Heudecker said. When it’s time to read it back, existing sequencing technology converts the nucleobases back into 0s and 1s.
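The bit-to-nucleobase translation is easy to sketch: with four bases (A, C, G, T), each base can carry two bits. The mapping below is illustrative only; production schemes add error correction and avoid problematic sequences such as long runs of one base:

```python
# Illustrative 2-bits-per-base mapping; not a standardized encoding.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    """Translate bytes into a strand of nucleobase codes (4 bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Sequence the strand back into bytes: the inverse of encode()."""
    bits = "".join(BITS_FOR_BASE[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"hi")
print(strand)  # → CGGACGGC
assert decode(strand) == b"hi"
```

Two bits per base is also where the density claim comes from: every nucleotide of synthetic DNA, a molecule weighing on the order of 10^-21 grams, carries a quarter of a byte.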
In a further step, enzymes can be used to process the data in its DNA representation. “Just as computational storage brings processing to the data, you can introduce enzymes into the DNA data, which gives you massive parallelization of processing across huge amounts of data,” he noted. “The enzymes then write new strands of DNA, which are sequenced and converted back into digital data.”
DNA data storage also offers the advantage of carbon efficiency. “Because these are all natural biological processes, there is minimal carbon impact,” Heudecker said. However, the technology’s drawbacks are significant. Creating enough synthetic DNA to store a meaningful amount of data is currently cost-prohibitive, but companies such as CATALOG are working on the problem, he noted.
Meanwhile, several companies looking to advance DNA storage technology, including Microsoft, Illumina and Twist Bioscience, are working to make it practical enough for routine use. “I anticipate that the first DNA drives will be available in a cloud delivery model within four years,” Heudecker said.