Store data in DNA? It's possible and it has amazing advantages
This year's European Inventor Award went to an invention with the potential to revolutionise data storage. ETH professor Robert Grass, together with his colleague Wendelin Stark, has found a method to store data in DNA and preserve it for centuries. In an interview, Grass explains what is still missing for the big breakthrough and in which area DNA as a data storage medium could soon be commonplace.
Oliver Bosse spoke to Robert Grass
You turn data into DNA. Can you explain to me how that works?
Actually, it's not that extraordinary once you start looking into it. Most people can probably best understand storage media such as hard drives or optical storage such as a DVD. One works magnetically, the other by having either a hole or no hole, i.e. 0 and 1. In nature, this can be broken down in a similar way: All living things store their genetic information in DNA. This is nothing more than a very long chemical molecule with up to three billion positions. At each of these positions, there are four different possibilities: A, C, T and G. This is how nature stores its information. We humans usually use the 26 letters of the alphabet to store information, computers use 0 and 1, and nature uses A, C, T and G.
And how can this be used?
In biology, the composition of this molecule is only copied, for example in the hereditary process. In order to store something specific in it, we have to be able to consciously determine the sequence of A, C, T and G and read it out again. This idea alone – to use DNA as a data storage medium – is not new, but probably as old as the knowledge of how DNA works. The challenge, however, was how to implement it on a large scale. There has been huge progress in this area in the last 10 years – and we are now using this to be able to write data into DNA and read it out again.
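To make the idea of determining the sequence and reading it out again concrete, here is a minimal sketch in Python. The two-bits-per-base mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; the interview does not describe the encoding Grass's group actually uses, and practical schemes add error correction and other safeguards that are left out here.

```python
# Hypothetical mapping for illustration only: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(data: bytes) -> str:
    """Turn bytes into a string of A, C, G and T (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(sequence: str) -> bytes:
    """Reverse the mapping: read the bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode_to_dna(b"Hi")
    print(strand)                   # CAGACGGC
    print(decode_from_dna(strand))  # b'Hi'
```

With four possible letters per position, each base can carry at most two bits, which is why one byte becomes four bases in this sketch.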
What is the advantage of not having zeros and ones, but DNA?
DNA brings two enormous technical advantages: One is the extreme compactness or density with which data can be stored. The theoretical limit has been calculated at 200 exabytes per gram. That is 200 million terabytes. This means that a tiny cube of DNA weighing one gram would, in theory, have the storage capacity of 200 million one-terabyte hard drives. The second big advantage is the durability of DNA. We know from fossil finds like bones that DNA can still be read after hundreds of thousands of years if properly preserved. This is a big contrast to modern storage media, which are far less stable. A hard disk might last 20 years, an SD card even less, and then the data is lost.
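The conversion behind the density figure can be checked in a few lines of Python. This is only a sketch of the arithmetic, assuming decimal (SI) units where one exabyte is 10^18 bytes and one terabyte is 10^12 bytes; the variable names are illustrative.

```python
# Assumption: decimal (SI) units, as used for hard-drive capacities.
EXABYTE = 10**18   # bytes
TERABYTE = 10**12  # bytes

# Theoretical limit quoted in the interview: 200 exabytes per gram of DNA.
density_bytes_per_gram = 200 * EXABYTE

# Number of one-terabyte hard drives that one gram would correspond to.
drives_per_gram = density_bytes_per_gram // TERABYTE

print(f"{drives_per_gram:,} one-terabyte drives per gram")
```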
Together with Wendelin Stark, you won the 2021 European Inventor Award in the research category for precisely this. You have found a method to preserve "data DNA" over centuries ...
Exactly, we have basically recreated the fossil I mentioned, in which DNA is protected from decay. In our process, the DNA is enclosed in glass that protects it from external chemical factors. Water in particular, including humidity, is a problem. Without the glass, the DNA would decay within years; with the glass, we can predict a stability of several hundred years, very similar to what we know from DNA in fossils.
Our background is nanotechnology. Glass, or silicon dioxide, is often used there because it can be polymerised. Glass can be used to coat all kinds of things. In addition, glass provides better protection than polymers, for example, and it is easier to build up molecularly than, say, ceramics. But we are currently doing further research, for example with calcium phosphate, which is an important component of bones.
In which areas can this durability of data be important, where does the potential lie?
There are a number of things. One is the storage of valuable information or data, valuable in social or economic terms, for which one wants to be sure that it will not be lost. There, costs play a decisive role, as DNA data storage is still relatively expensive. Great efforts are being made to make it cheaper. When this is achieved, the benefits will certainly extend from the very valuable data to data that should or must be archived. There, many hard disks and tape drives are still in use with a shelf life of around 50 years, which then have to be overwritten again and again.
Have you already implemented projects with this type of data storage?
We have already implemented some projects, for example with artists or Netflix, but more in the direction of PR projects. The band Massive Attack, for example, asked us if we could store their album as DNA. In the end, this is visually rather unspectacular; a glass bead containing the DNA, which you perceive as a white powder. We therefore thought about how we could also create an optical value. One member of Massive Attack is a graffiti artist. So we first turned the data into white "DNA powder", put it in the glass sphere and added graffiti paint to the powder. So we can actually put data somewhere as paint, which I personally find another exciting possibility for DNA data storage, but where exactly it could be of use is questionable.
Are there any other approaches?
There is the possibility of attaching data somewhere practically invisibly, because the glass beads are so tiny and almost transparent. For example, we have "hidden" a video in a spectacle lens. And we have already experimented with 3D printing. We printed the famous Stanford Bunny, and inside it is a polymer with DNA that encodes the instructions for building the bunny. So it is possible to break off a piece of the bunny, read the DNA and get the data to print a new bunny. Until now, data has always been bound to a fixed physical unit with a specific shape. With this method, one can break away from that.
What are the biggest difficulties in the process of DNA-based data storage?
The main difficulty is in writing the data, i.e. how DNA can be synthesised quickly in large quantities. For that, you need an appropriate device, of which there are very few in the world so far. More work is also needed on automating the read-out process. For our glass beads, it takes a day in the lab with large equipment to cleanly prepare and read the DNA. Small sequencers are already in use today, but these can only read out one viral genome per day, for example, which is much smaller than a human genome. Microsoft is working on automating this process, among other things. They have a prototype the size of a table that can store and read out "Hello World" in DNA fully automatically. However, that is still a long way from a hard drive. Another issue is the aforementioned costs.
What would have to happen for these to no longer be so high?
One hope, of course, is that the field of application and the corresponding demand for DNA as a data storage medium will one day increase and that this will drive the price development further. In the field of gene analysis, especially in the medical field, there has been great progress in this regard.
How do you create greater demand?
That is very difficult to say. Who is willing to pay more for a data storage device that is much more compact and lasts much longer? And for how much data? There are the aforementioned areas of application, such as archiving data, where one day an established data storage device could be replaced. It is interesting to note that we know from the development history of other data carriers that success often only comes with a new application. A good example is the SD card. Its breakthrough only came with the digital photo camera, although the technology had already existed for some time.
So your DNA data storage is still missing the right product, so to speak?
In a manner of speaking. There is the plannable path like the data archive, which we are currently pursuing. But of course something could also come about in a way that cannot be planned.
You are already active in the market with the ETH spin-off Haelixa, also with DNA, but in a somewhat different direction ...
The idea of Haelixa is to put DNA into a product and thus make it recognisable and traceable. We know barcodes. But these are only on the packaging. Before that, there is a gap. For example, Haelixa deals with cotton, which should be traceable over a long supply chain. With Haelixa, it is possible to invisibly mark the cotton with DNA, make it part of the product and thus trace its origin at any time. There is also a project with emeralds. Already in the mine, the raw stone is marked with DNA, which then enters the pores. It no longer matters how much it is worked and cut, the information remains in place.
Haelixa is a first example of how data as DNA is already being used today in an economic context with a realistic cost structure. However, data is only stored in very small quantities of perhaps 100 or 200 bits. This is correspondingly cheap.
And the readout works throughout the entire supply chain?
Yes, the read-out procedure is a bit easier there. In some cases, small sequencers can be used for this. But it is also possible with a simple PCR test, like the ones we all know from current coronavirus testing.
So things are already moving towards everyday use in this area?
Maybe not for us as private individuals, but at customs, in purchasing or in warehouses. In the future, it can also be an added value for buyers, for example of clothing. They may not see the marking, but they will know from a label, for example, that what they are buying is additionally protected by a marker.
About Robert Grass
Robert Grass, who comes from Austria, studied chemical engineering at ETH Zurich and completed his doctorate with a thesis on nanopowder synthesis and application. He has been a titular professor at the Department of Chemistry and Applied Biosciences at ETH since 2017. Robert Grass is currently working on using DNA data storage as a digital information technology and applying this technology to the long-term archiving of digital data, as well as on using DNA as a technology for integrating digital production information into physical objects. With the ETH spin-off Haelixa AG, the DNA storage method he co-developed is already being commercialised.