Luis Ceze – UW News

Popular third-party genetic genealogy site is vulnerable to compromised data, impersonations

Sarah McQuate — Tue, 29 Oct 2019 13:11:40 +0000

DNA testing services are making it easier for people to learn about their heritage. People can also use their genetic testing results to connect to potential relatives in their family trees by using third-party sites, like GEDmatch, where they can compare their DNA sequences to others in the database. Photo:

DNA testing services like 23andMe, Ancestry.com and MyHeritage are making it easier for people to learn about their ethnic heritage and genetic makeup. People can also use genetic testing results to connect to potential relatives by using third-party sites, like, where they can compare their DNA sequences to others in the database who have uploaded test results.

But a less happy ending is also possible. Researchers at the University of Washington have found that GEDmatch is vulnerable to multiple kinds of security risks. An adversary can use only a small number of comparisons to extract someone’s sensitive genetic markers. A malicious user could also construct a fake genetic profile to impersonate someone’s relative.

The team Oct. 29. The researchers have also had this research accepted at the and will present these results in February in San Diego.

“People think of genetic data as being personal — and it is. It’s literally part of their physical identity,” said lead author, a postdoctoral researcher in the UW Paul G. Allen School of Computer Science & Engineering. “This makes the privacy of genetic data particularly important. You can change your credit card number but you can’t change your DNA.”

UW researchers found that an adversary can use only a small number of comparisons on GEDmatch to extract sensitive genetic markers for someone and construct a fake genetic profile to impersonate someone’s relative. Shown here is a genetic pedigree of two parents and their two children, along with a third child (red) who falsely claims to be related to the father. Photo: Rebecca Gourley/University of Washington

The mainstream use of genetic testing results for genealogy is a relatively recent phenomenon. The initial benefits may have obscured some underlying risks, the researchers say.

“When we have a new technology, whether it is smart automobiles or medical devices, we as a society start with ‘What can this do for us?’ Then we start looking at it from an adversarial perspective,” said co-author, a professor in the Allen School. “Here we’re looking at this system and asking: ‘What are the privacy issues associated with sharing genetic data online?'”

To look for security issues, the team created a research account on GEDmatch. The researchers uploaded experimental genetic profiles that they created by mixing and matching genetic data from multiple databases of anonymous profiles. GEDmatch assigned these profiles an ID that people can use to do one-to-one comparisons with their own profiles.

For the one-to-one comparisons, GEDmatch produces graphics with information about how much of the two profiles match. One graphic is a bar for each of the 22 non-sex chromosomes. Each bar changes length depending on how similar the two profiles are for that chromosome. A longer bar shows that there are more matching regions, while a series of shorter bars means that there are short regions of similarity interspersed with areas that are different.

For the one-to-one comparisons, GEDmatch produces a bar for each of the 22 non-sex chromosomes that changes length depending on how similar the two profiles are for that chromosome. Shown here is an example of this graphic. A longer bar shows that there are more matching regions (top), while a series of shorter bars means that there are short regions of similarity interspersed with areas that are different (bottom). Photo: Rebecca Gourley/University of Washington
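The bar graphic can be modeled, very roughly, as runs of consecutive sites where two profiles share at least one allele. The sketch below is a hypothetical toy, not GEDmatch's algorithm: it assumes each profile is a list of unordered two-letter genotypes (for example "AG") at the same positions, and that any shared allele counts as a match. One long run corresponds to a long bar; several short runs correspond to the fragmented pattern.

```python
def shares_allele(g1: str, g2: str) -> bool:
    """True if two genotypes (e.g. "AG" and "GG") have at least one allele in common."""
    return bool(set(g1) & set(g2))


def matching_segments(a: list[str], b: list[str], min_len: int = 1) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges where a and b share an allele
    at every site, keeping only runs of at least min_len sites."""
    segments = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if shares_allele(x, y):
            if start is None:
                start = i  # a new matching run begins here
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the last compared site.
    end = min(len(a), len(b))
    if start is not None and end - start >= min_len:
        segments.append((start, end))
    return segments


one_piece = matching_segments(["AA", "AG", "GG", "AA"], ["AG", "GG", "GG", "AA"])
split = matching_segments(["AA", "AG", "GG", "AA", "AA"], ["AG", "GG", "AA", "AA", "AG"])
print(one_piece)  # [(0, 4)]
print(split)      # [(0, 2), (3, 5)]
```

In the second comparison, the single site where the genotypes share nothing ("GG" against "AA") is enough to break one bar into two.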

The team wanted to know if an adversary could use that bar to find out a specific DNA sequence within one region of a target’s profile, such as whether or not the target has a mutation that makes them susceptible to a disease. For this search, the team designed four “extraction profiles” that they could use for one-to-one comparisons with a target profile they created. Based on whether the bar stayed in one piece — indicating that the extraction profile and the target matched — or split into two bars — indicating no match — the team was able to deduce the target’s specific sequence for that region.
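The bar-splitting test behaves like a yes/no oracle. As a simplified, hypothetical model (not the researchers' four-profile design), assume a biallelic site with alleles A and G, and assume the bar stays in one piece exactly when the probe shares at least one allele with the target. Under those assumptions, two homozygous probes are enough to pin down the genotype at that one site:

```python
def bar_stays_whole(probe: str, target: str) -> bool:
    """Hypothetical oracle: the bar stays in one piece if probe and target
    share at least one allele at the site of interest."""
    return bool(set(probe) & set(target))


def infer_genotype(target: str, alleles: tuple[str, str] = ("A", "G")) -> str:
    """Recover a biallelic genotype using one homozygous probe per allele."""
    # An allele carried by the target makes its homozygous probe match.
    present = [a for a in alleles if bar_stays_whole(a * 2, target)]
    if len(present) == 1:
        return present[0] * 2          # only one probe matched: homozygous
    return "".join(sorted(present))    # both probes matched: heterozygous


for genotype in ["AA", "AG", "GA", "GG"]:
    print(genotype, "->", infer_genotype(genotype))
```

The target string is only ever passed to the oracle, mirroring the fact that the adversary sees the graphic, not the underlying data.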

“Genetic information correlates to medical conditions and potentially other deeply personal traits,” said co-author, a professor in the Allen School. “Even in the age of oversharing information, this is most likely the kind of information one doesn’t want to share for legal, medical and mental health reasons. But as more genetic information goes digital, the risks increase.”

Next the researchers wondered if an adversary could use a similar technique to acquire a target’s entire profile. The team focused on another GEDmatch graphic that describes how well the profiles match by showing a line of colored pixels that mark how well each DNA segment in the query matches the target: green for a complete match, yellow for a half match — when one strand of DNA matched but not the other — and red for no match.

Then the team played a game of 20 questions: They created 20 extraction profiles that they used for one-to-one comparisons on a target profile that they created. Based on how the pixel colors changed, they were able to pull out information about the target sequence. For five test profiles, the researchers extracted about 92% of a test’s unique sequences with about 98% accuracy.
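A toy version of the pixel-color leak is sketched below. It assumes, unlike the real site, that a color is reported for every individual site rather than for each segment, and that each site is biallelic with known alleles. Both are simplifying assumptions, which is why a single probe suffices here while the researchers needed 20 profiles and recovered about 92% of sequences.

```python
def pixel_color(query: str, target: str) -> str:
    """Green if genotypes are identical, yellow if they share one allele, red otherwise."""
    if sorted(query) == sorted(target):
        return "green"
    if set(query) & set(target):
        return "yellow"
    return "red"


def decode_site(color: str, ref: str, alt: str) -> str:
    """Invert the color for a probe that is homozygous for ref at this site."""
    return {"green": ref * 2, "yellow": ref + alt, "red": alt * 2}[color]


def extract_profile(target: list[str], sites: list[tuple[str, str]]) -> list[str]:
    probe = [ref * 2 for ref, _ in sites]                        # homozygous for ref everywhere
    colors = [pixel_color(q, t) for q, t in zip(probe, target)]  # what the adversary observes
    return [decode_site(c, ref, alt) for c, (ref, alt) in zip(colors, sites)]


sites = [("A", "G"), ("C", "T"), ("A", "G")]
secret = ["AA", "CT", "GG"]
print(extract_profile(secret, sites))  # ['AA', 'CT', 'GG']
```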

“So basically, all the adversary needs to do is upload these 20 profiles and then make 20 one-to-one comparisons to the target,” Ney said. “They could write a program that automatically makes these comparisons, downloads the data and returns the result. That would take 10 seconds.”

Once someone’s profile is exposed, the adversary can use that information to create a profile for a false relative. The team tested this by creating a fake child for one of their experimental profiles. Because children receive half their DNA from each parent, the researchers built the fake child’s profile so that it half-matched the parent profile. When the researchers did a one-to-one comparison of the two profiles, GEDmatch estimated a parent-child relationship.


An adversary could generate any false relationship they wanted by changing the fraction of shared DNA, the team said.
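The false-relative step can be sketched the same way. In this hypothetical toy, a fake profile copies one allele from the exposed target at a chosen fraction of sites (a contiguous block, loosely mimicking an inherited segment) and takes an unrelated donor's genotype elsewhere. With the fraction set to 1.0, the fake is half-identical to the target at every site, the pattern expected of a parent and child. Real relationship estimates depend on segment lengths in centimorgans, which this sketch ignores.

```python
import random


def half_identical_fraction(a: list[str], b: list[str]) -> float:
    """Fraction of sites where two profiles share at least one allele."""
    return sum(bool(set(x) & set(y)) for x, y in zip(a, b)) / len(a)


def fake_relative(target: list[str], donor: list[str], fraction: float, seed: int = 0) -> list[str]:
    """Copy one target allele over the first `fraction` of sites; use donor genotypes after that."""
    rng = random.Random(seed)
    n_copy = round(len(target) * fraction)
    fake = []
    for i, (t, d) in enumerate(zip(target, donor)):
        if i < n_copy:
            # One allele from the target and one from the donor, as in inheritance.
            genotype = rng.choice(t) + rng.choice(d)
        else:
            genotype = d
        fake.append("".join(sorted(genotype)))
    return fake


target = ["AA", "AG", "GG", "CT"]
donor = ["GG", "GG", "AA", "CC"]
child = fake_relative(target, donor, fraction=1.0)
print(half_identical_fraction(child, target))  # 1.0
```

Lowering `fraction` shrinks the shared block, which is the knob the team describes for faking more distant relationships.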

“If GEDmatch users have concerns about the privacy of their genetic data, they have the option to delete it from the site,” Ney said. “The choice to share data is a personal decision, and users should be aware that there may be some risk whenever they share data. Security is a difficult problem for internet companies in every industry.”

Prior to publishing their results, the researchers shared their findings with GEDmatch, which has been working to resolve these issues, according to the GEDmatch team. The UW researchers are not affiliated with GEDmatch, however, and can’t comment on the details of any fixes.

“We’re only beginning to scratch the surface,” Kohno said. “These discoveries are so fundamental that people might already be doing this and we don’t know about it. The responsible thing for us is to disclose our findings so that we can engage a community of scientists and policymakers in a discussion about how to mitigate this issue.”

This research was funded in part by the University of Washington, which receives support from the William and Flora Hewlett Foundation, the John D. and Catherine T. MacArthur Foundation, Microsoft, and the Pierre and Pamela Omidyar Fund at the Silicon Valley Community Foundation. This research also was funded by a grant from the Defense Advanced Research Projects Agency Molecular Informatics Program.

For more information, contact the team at dnasec@cs.washington.edu.

With a ‘hello,’ Microsoft and UW demonstrate first fully automated DNA data storage

UW News staff — Thu, 21 Mar 2019 13:21:01 +0000

Researchers from the University of Washington and Microsoft have demonstrated the first fully automated system to store and retrieve data in manufactured DNA — a key step in moving the technology out of the research lab and into commercial data centers.

In a simple proof-of-concept test, the team successfully encoded the word “hello” in snippets of fabricated DNA and converted it back to digital data using a fully automated end-to-end system, which is described in a published March 21 in Nature Scientific Reports.

DNA can store digital information in a space that is orders of magnitude smaller than data centers use today. It’s one promising solution for storing the exploding amount of data the world generates each day, from business records and cute animal videos to medical scans and images from outer space.

The team at the UW and Microsoft is exploring ways to close a looming gap between the that needs to be preserved and our capacity to store it. That includes developing algorithms and molecular computing technologies to , which could fit all the information currently stored in a warehouse-sized data center into a space roughly the size of a few board game dice.

“Our ultimate goal is to put a system into production that, to the end user, looks very much like any other cloud storage service — bits are sent to a data center and stored there and then they just appear when the customer wants them,” said principal researcher , a UW affiliate associate professor in the Paul G. Allen School of Computer Science & Engineering and a senior researcher at Microsoft. “To do that, we needed to prove that this is practical from an automation perspective.”

Information is stored in synthetic DNA molecules created in a lab, not DNA from humans or other living beings, and can be encrypted before it is sent to the system. While sophisticated machines such as synthesizers and sequencers already perform key parts of the process, many of the intermediate steps until now have required manual labor in the research lab. But that wouldn’t be viable in a commercial setting, said lead author , senior research scientist in the Allen School.

“You can’t have a bunch of people running around a data center with pipettes — it’s too prone to human error, it’s too costly and the footprint would be too large,” he said.

For the technique to make sense as a commercial storage solution, costs need to decrease for both synthesizing DNA — essentially custom-building strands with meaningful sequences — and the sequencing process that extracts the stored information. Trends are , researchers say.

Automation is another key piece of that puzzle, as it would enable storage at a commercial scale and make it more affordable, the team says.

Under the right conditions, DNA can last much longer than current archival storage technologies that degrade in a matter of decades. Some DNA has managed to persist in less-than-ideal storage conditions for tens of thousands of years in mammoth tusks and bones of early humans, and it should remain relevant as a storage format for as long as people are alive.

The automated DNA data storage system uses software developed by the team that converts the ones and zeros of digital data into the As, Ts, Cs and Gs that make up the building blocks of DNA. Then it uses inexpensive, largely off-the-shelf lab equipment to flow the necessary liquids and chemicals into a synthesizer that builds manufactured snippets of DNA and then pushes them into a storage vessel.
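The simplest possible version of the bits-to-bases conversion maps every two bits to one nucleotide. This is an illustrative assumption, not the team's encoder: practical DNA storage codes add error correction, addressing and constraints such as avoiding long runs of the same base, none of which appear here.

```python
BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    return "".join(BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0))


def decode(strand: str) -> bytes:
    """Reverse of encode: every four bases become one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"hello")
print(strand)          # CGGACGCCCGTACGTACGTT
print(decode(strand))  # b'hello'
```

At two bits per base, the five-byte word "hello" becomes a 20-base sequence.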

When the system needs to retrieve the information, it adds other chemicals to properly prepare the DNA and uses microfluidic pumps to push the liquids into a machine that “reads” the DNA sequences and converts them back to information that a computer can understand. The goal of the project was not to prove how fast or inexpensively the system could work, researchers say, but simply to demonstrate that automation is possible.

One immediate benefit of having an automated DNA storage system is that it frees researchers up to probe deeper questions, instead of spending time searching for bottles of reagents or repetitively squeezing drops of liquids into test tubes.

“Having an automated system to do the repetitive work allows those of us working in the lab to take a higher view and begin to assemble new strategies — to essentially innovate much faster,” said Microsoft researcher .

The team from the UW’s has already demonstrated that it can store cat photographs, great literary works, pop videos and archival recordings in DNA, and retrieve those files without errors in a research setting. To date they’ve been able to store 1 gigabyte of data in DNA, besting their .

The Molecular Information Systems Lab team. Photo: Dennis Wise/University of Washington

The researchers have also developed techniques to perform meaningful computation — like searching for and retrieving only images that contain an apple or a green bicycle — using the molecules themselves and without having to convert the files back into a digital format.

“We are definitely seeing a new kind of computer system being born here where you are using molecules to store data and electronics for control and processing. Putting them together holds some really interesting possibilities for the future,” said UW Allen School professor .

Unlike silicon-based computing systems, DNA-based storage and computing systems have to use liquids to move molecules around. But fluids are inherently different from electrons and require entirely new engineering solutions.

The researchers are developing a programmable system that automates lab experiments by harnessing the properties of electricity and water to move droplets around on a grid of electrodes. The full stack of software and hardware, nicknamed ,” can mix, separate, heat or cool different liquids and run lab protocols.


The goal is to automate lab experiments that are currently being done by hand or by expensive liquid handling robots — but for a fraction of the cost.

Next steps for the team include integrating the simple end-to-end automated system with technologies such as PurpleDrop and those that enable searching with DNA molecules. The researchers specifically designed the automated system to be modular, allowing it to evolve as new technologies emerge for synthesizing, sequencing or working with DNA.

“What’s great about this system is that if we wanted to replace one of the parts with something new or better or faster, we can just plug that in,” Nguyen said. “It gives us a lot of flexibility for the future.”

###

For more information, contact Ceze at luisceze@cs.washington.edu, Strauss at rrt@we-worldwide.com or Takahashi at cnt@cs.washington.edu.

Adapted from by Microsoft.

Lunar library to include photos, books stored in DNA

Sarah McQuate — Thu, 27 Sep 2018 16:11:18 +0000

A selection of images submitted to the #MemoriesInDNA project. Photo: University of Washington

People who have submitted photos to the have selected images of family members, favorite places and tasty food that will be preserved for years in the form of synthetic DNA. Now this collection — which currently contains more than 3,000 images and is still growing — will be headed to the final frontier: space.

The Lunar Library will also contain pages stored as (dime for scale). The team is still working on how the DNA contents of this library will be stored. Photo: Arch Mission Foundation

The , which creates archives that can survive for a long time in space, that it will be partnering with researchers at the University of Washington, Microsoft and Twist Bioscience to include media stored in DNA in its newest shipment, which is destined to go to the moon in less than two years.

Researchers at the at the University of Washington and Microsoft plan to provide both the #MemoriesInDNA project and a DNA archive of e-books for this mission. The Arch Mission Foundation’s will also include instructions for how to sequence DNA and how to access the contents of the archive.

To prepare the DNA for its life in space, the researchers have been developing new methods to package and protect the information it stores.

“Sending DNA into space is a great opportunity for us to make our storage system more robust,” said , a professor in the UW’s Paul G. Allen School of Computer Science & Engineering. “How can we protect the DNA so that it will still be readable thousands of years into the future?”

Researchers at the Molecular Information Systems Lab plan to provide both the #MemoriesInDNA project and a DNA archive of e-books for this mission. Photo: Dennis Wise/University of Washington

Storing electronic data in DNA molecules saves a lot of storage space. Data centers require acres of land and account for in the United States, but DNA molecules can store information millions of times more compactly using less energy.

“DNA is so dense that we can store a lot of information in a single gram,” said Ceze. “This is huge because room is so limited in space missions.”

The basic process converts digital data’s strings of ones and zeroes into the four basic building blocks of DNA sequences: adenine, guanine, cytosine and thymine. The team is working with to create synthetic DNA molecules in a lab. This DNA doesn’t come from living organisms. Instead, it is synthesized from scratch base by base (letter by letter).

In space, stray cosmic rays could break DNA strands, making them unreadable. So Ceze and his team have been working on methods to ensure that they can still decode all the information, even if some of the DNA degrades.

The first method, called physical redundancy, involves adding multiple copies of each strand of DNA to the archive. So if one copy is destroyed, there are still many other copies with the same information. The team is considering adding billions of copies of each strand to account for degradation over time, Ceze said.
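Why extra copies help can be shown with one line of arithmetic. Under the simplifying assumption (ours, not the team's) that each copy is destroyed independently with probability p, a strand is lost only if all n copies are destroyed, which happens with probability p^n:

```python
def prob_strand_lost(p: float, copies: int) -> float:
    """Probability that every copy of a strand is destroyed, assuming independent losses."""
    return p ** copies


for n in (1, 3, 10):
    print(n, prob_strand_lost(0.5, n))
```

Even with a pessimistic 50% chance of losing each copy, ten copies bring the chance of losing the strand below 0.1%; real damage is not perfectly independent, so this is an optimistic idealization.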

The second method, called , attaches information about the data within the DNA itself, like adding information about how two puzzle pieces go together. That way if all copies of a DNA strand go missing, the researchers can piece together what was lost and still get all of the data.

For example, to store two numbers — two and three — researchers would also store the information that two plus three equals five. So if something happened to the number two, the numbers five and three would still exist. That logic could be reversed to conclude that the missing information is five minus three — or two.
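The two-plus-three example maps directly onto parity coding. The sketch below uses bytewise XOR instead of addition, since XOR needs no carries and undoes itself; it is a stand-in to show the idea, not the team's actual logical-redundancy scheme, and it assumes strands of equal length with at most one missing.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def parity(strands: list[bytes]) -> bytes:
    """Extra strand storing the XOR of all data strands (the "five" in 2 + 3 = 5)."""
    return reduce(xor_bytes, strands)


def recover_missing(survivors: list[bytes], parity_strand: bytes) -> bytes:
    """XOR the parity with every surviving strand to rebuild the one that was lost."""
    return reduce(xor_bytes, survivors, parity_strand)


data = [b"two!", b"thr3", b"more"]
p = parity(data)
print(recover_missing([data[0], data[2]], p) == data[1])  # True
```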

Now that the team is working with the Arch Mission Foundation, it has a strict deadline to finalize all packaging and storage plans: The Lunar Library is expected to be .

“We’re proud that this partnership with Arch continues to push the boundaries of what’s possible in increasingly exciting ways and remarkable directions,” said collaborator , a senior researcher at Microsoft and a UW affiliate associate professor of computer science and engineering. “This is an incredibly exciting project and we have a great multidisciplinary team working on it: coding theorists, computer architects, engineers and molecular biologists, all coming together to make this new technology a reality.”

An image of the moon (top right) that is included in the #MemoriesInDNA project. Photo: University of Washington #MemoriesInDNA

For more details about how to include your own images in the #MemoriesInDNA project, visit the or email lunarlibrary@memoriesindna.com. Note: To be included in the DNA image collection, photographs cannot be copyrighted by any other party and must be free of violent or inappropriate content. The image dataset will be preserved in DNA indefinitely and shared with researchers worldwide.

###

For more information, contact misl-info@cs.washington.edu or visit the #MemoriesInDNA Project website: .

#MemoriesInDNA Project wants to store your photos in DNA for the benefit of science – and future generations

Jennifer Langston — Wed, 24 Jan 2018 16:56:41 +0000

A selection of images submitted to the #MemoriesInDNA Project. Photo: University of Washington

If you could pick an image to be preserved for thousands of years, what would it be? A picture of your family, an endangered landscape, a page of poetry, or a snapshot that sends a message to the future?

Researchers from the at the University of Washington and Microsoft are looking to collect 10,000 original images from around the world to preserve them indefinitely in synthetic DNA manufactured by . DNA holds promise as a revolutionary storage medium that lasts much longer and is many orders of magnitude denser than current technologies.

The team has already encoded important compositions in DNA molecules, including The Universal Declaration of Human Rights, the top 100 books of Project Gutenberg, songs from the and an .

The invites the public to submit original photographs that they’d like to see preserved in DNA for millennia. The images — which can be uploaded at the — will be encoded in synthetic DNA and made available to researchers worldwide. The researchers also are encouraging people to share their images on social media with the hashtag #MemoriesInDNA and include a story about why the photograph or video is important to them.

Lead researchers on the UW/Microsoft DNA data storage project include (left to right) Georg Seelig, UW associate professor of electrical engineering and of computer science and engineering; Luis Ceze, the Torode Family Career Development Professor of Computer Science & Engineering; and Karin Strauss, a Microsoft researcher and UW affiliate associate professor of computer science and engineering. Photo: Tara Brown Photography

“It’s your turn to show us what should be preserved in DNA forever,” said , professor in the UW’s Paul G. Allen School of Computer Science & Engineering. “We want people to go out and take a picture of something that they want the world to remember — it’s a fun opportunity to send a message to future generations and help our research in the process.”

DNA data storage has emerged as a potential solution to bridge the growing gap between the amount of digital data generated today — by everything from commercial video to space imagery to medical records — and our ability to affordably and efficiently store that data.

Unlike data centers, which require acres of land and account for in the United States, DNA molecules can store information millions of times more compactly. The basic process converts the strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences — adenine, guanine, cytosine and thymine. It employs synthetic DNA molecules created in a lab, not living DNA.

The team of UW computer scientists and electrical engineers, in collaboration with Microsoft researchers and working with Twist Bioscience, holds the for the amount of data stored in DNA. So far they have been able to encode photographic images and video in DNA and retrieve and convert those individual molecular “files” back into digital data.

Their next challenge involves exploring how to perform meaningful data processing directly in DNA — without having to convert the images back into their electronic form.

“Let’s suppose you have a trillion images encoded in DNA and want to find all the photographs that have a red car in them, or to find out whether a person’s face exists in those images,” said Ceze. “We want to be able to do that information processing in DNA directly — to search in a smart way and make the molecules themselves carry out that computer vision work.”

A digital microfluidics prototype in the UW’s Molecular Information Systems Lab. Photo: Dennis Wise/University of Washington

The team will encode approximately 10,000 of the crowdsourced images in manufactured snippets of DNA. The researchers’ approach to searching images directly in DNA relies on the fact that certain nucleotides stick to others — A binds to T and C binds to G.

They can introduce strips of DNA into the solution that contains a coded “query” — essentially, a string of complementary DNA that causes all photographs with a red car or certain facial features or whatever meets the criteria of the query to bind to it. By attaching magnetic nanoparticles to the query DNA, they can use a magnet to pull out all the similar images that have stuck to it.
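The base-pairing rule can be modeled at the string level. Because paired strands run in opposite directions, a query sticks to a strand that contains the query's reverse complement. The filter below is a hypothetical, idealized stand-in for the magnetic pull-down: it treats binding as an exact substring match, whereas real hybridization tolerates some mismatches and depends on temperature and sequence length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Pair A with T and C with G, then reverse to account for antiparallel strands."""
    return seq.translate(COMPLEMENT)[::-1]


def pull_down(pool: list[str], query: str) -> list[str]:
    """Return the strands the query would bind to under an exact-match model."""
    site = reverse_complement(query)
    return [strand for strand in pool if site in strand]


pool = ["TTGCATAA", "GGGGCCCC", "AGCATG"]
print(reverse_complement("ATGC"))  # GCAT
print(pull_down(pool, "ATGC"))     # ['TTGCATAA', 'AGCATG']
```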

“It is thrilling to bring computer science and molecular biology together in this project,” said Microsoft senior researcher and collaborator Karin Strauss. “There has been amazing progress recently in both areas and, when combined, they can be very powerful in tackling problems created by the massive amounts of data we’ve been generating.”

“Having a set of diverse images from around the world will help us invent new ways to make molecules work with each other to carry out these computations directly,” said Microsoft partner architect and collaborator .

An Illumina NextSeq flow cell, which is used by researchers in the UW’s Molecular Information Systems Lab to sequence DNA samples that contain digital data. Photo: Dennis Wise/University of Washington

The team will employ machine learning to devise methods to map and encode all the visual features contained in a photograph — such as colors, curves, lines and objects — in DNA. The main challenge is doing that in a way that allows scientists to extract similar things and perform meaningful data processing.

“We will use neural networks to explore ways to classify visual patterns in the images and video that we encode in DNA,” said , UW associate professor of electrical engineering and in the Allen School. “For example, are there more red cars than blue cars in a photograph? Or are there people riding bicycles?”

“With proof-of-concept achieved for DNA as a digital data storage media, we are working to drive down the cost of synthesizing DNA to enable its potential as a widely-available commercial solution for the growing body of precious data in digital format, such as archival data, financial and health record backups, and all long-term data retention where current media is not practical,” said Emily M. Leproust, CEO of Twist Bioscience. “MemoriesInDNA is a fabulous project to showcase the technological, scientific and cultural importance of DNA worldwide and we look forward to our role in this historic event.”

#MemoriesInDNA will provide an important library of images to be encoded in a separately funded project supported by the Defense Advanced Research Projects Agency (DARPA) . UW was recently awarded $6.3M to accelerate the pace at which data can be encoded in DNA, and to develop new capabilities to process this data through image search and classification. The work will build the foundation on which UW can advance its next-generation work in molecular information processing.

Note: To be included in the DNA image collection, photographs cannot be copyrighted by any other party and must be free of violent or inappropriate content. The image dataset will be preserved in DNA indefinitely and shared with researchers worldwide. For more details about how to upload and share images, visit the .

###

For more information, contact misl-info@cs.washington.edu or visit the #MemoriesInDNA Project website: .

DNA sequencing tools lack robust protections against cybersecurity risks

Jennifer Langston — Thu, 10 Aug 2017 13:10:03 +0000

UW researchers have demonstrated for the first time that it is possible to remotely compromise a computer using information stored in DNA. This test tube holds hundreds of billions of copies of the exploit code stored in synthetic DNA molecules, which has the potential to compromise a computer system when it is sequenced and processed. Photo: Dennis Wise/University of Washington

Rapid improvement in DNA sequencing has sparked a proliferation of medical and genetic tests that promise to reveal everything from one’s ancestry to fitness levels to microorganisms that live in your gut.

A new study from University of Washington researchers that finds evidence of poor computer security practices used throughout the field .

In the , which will be presented Aug. 17 in Vancouver, B.C., at the , the team also demonstrated for the first time that it is possible — though still challenging — to compromise a computer system with a malicious computer code stored in synthetic DNA. When that DNA is analyzed, the code can become executable malware that attacks the computer system running the software.

So far, the researchers stress, there’s no evidence of malicious attacks on DNA synthesizing, sequencing and processing services. But their analysis of software used throughout that pipeline found known security gaps that could allow unauthorized parties to gain control of computer systems — potentially giving them access to personal information or even the ability to manipulate DNA results.

“One of the big things we try to do in the computer security community is to avoid a situation where we say, ‘Oh shoot, adversaries are here and knocking on our door and we’re not prepared,’” said co-author , professor at the UW’s Paul G. Allen School of Computer Science & Engineering.

“Instead, we’d rather say, ‘Hey, if you continue on your current trajectory, adversaries might show up in 10 years. So let’s start a conversation now about how to improve your security before it becomes an issue,’” said Kohno, whose previous research has provoked high-profile discussions about vulnerabilities in emerging technologies, such as and .

Lee Organick (left), Karl Koscher (center) and Peter Ney (right) from the UW’s Molecular Information Systems Lab and the Security and Privacy Research Lab prepare the DNA exploit for sequencing. Photo: Dennis Wise/91̽��

“We don’t want to alarm people or make patients worry about genetic testing, which can yield incredibly valuable information,” said co-author and Allen School associate professor . “We do want to give people a heads up that as these molecular and electronic worlds get closer together, there are potential interactions that we haven’t really had to contemplate before.”

In the new paper, researchers from the University of Washington offer recommendations to strengthen computer security and privacy protections in DNA synthesis, sequencing and processing.

The research team identified several different ways that a nefarious person could compromise a DNA sequencing and processing stream. To start, they demonstrated a technique that is scientifically fascinating — though arguably not the first thing an adversary might attempt, the researchers say.

“It remains to be seen how useful this would be, but we wondered whether under semi-realistic circumstances it would be possible to use biological molecules to infect a computer through normal DNA processing,” said co-author and Allen School doctoral student .

DNA is, at its heart, a system that encodes information in sequences of nucleotides. Through trial and error, the team found a way to include executable code — similar to computer worms that occasionally wreak havoc on the internet — in synthetic DNA strands.

This output from a sequencing machine includes the UW team’s exploit, which is being sequenced with a number of unrelated strands. Each dot represents one strand of DNA in a given sample. Photo: Dennis Wise/University of Washington

To create optimal conditions for an adversary, they introduced a known security vulnerability into a software program that’s used to analyze and search for patterns in the raw files that emerge from DNA sequencing.

When that particular DNA strand is processed, the malicious exploit can gain control of the computer that’s running the program — potentially allowing the adversary to look at personal information, alter test results or even peer into a company’s intellectual property.

“To be clear, there are lots of challenges involved,” said co-author , a research scientist in the Molecular Information Systems Lab. “Even if someone wanted to do this maliciously, it might not work. But we found it is possible.”

In what might prove to be a more target-rich area for an adversary to exploit, the research team also discovered known security gaps in many open-source software programs used to analyze DNA sequencing data.

This data file tells researchers what sequence their DNA had as well as the quality of the read (with E higher quality than A). The team demonstrated that it is possible to place malicious code in a strand of DNA that, when sequenced, could attack the software used for analysis. Photo: Dennis Wise/University of Washington

Some were written in unsafe languages known to be vulnerable to attacks, in part because they were first crafted by small research groups who likely weren’t expecting much, if any, adversarial pressure. But as the cost of DNA sequencing has plummeted over the last decade, open-source programs have been adopted more widely in medical- and consumer-focused applications.

Researchers at the UW Molecular Information Systems Lab are working to create next-generation archival storage systems by . Although their system relies on DNA sequencing, it does not suffer from the security vulnerabilities identified in the present research, in part because the MISL team has anticipated those issues and because their system doesn’t rely on typical bioinformatics tools.

Recommendations to address vulnerabilities elsewhere in the DNA sequencing pipeline include: following best practices for secure software, incorporating adversarial thinking when setting up processes, monitoring who has control of the physical DNA samples, verifying sources of DNA samples before they are processed and developing ways to detect malicious executable code in DNA.

“There is some really low-hanging fruit out there that people could address just by running standard software analysis tools that will point out security problems and recommend fixes,” said co-author , a research scientist in the UW Security and Privacy Lab. “There are certain functions that are known to be risky to use, and there are ways to rewrite your programs to avoid using them. That would be a good initial step.”
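The quote does not name the risky functions; in C, classic examples are `gets` and `strcpy`, which copy input without checking the destination buffer's size. Python is memory-safe, so the minimal sketch below instead illustrates the complementary habit of treating sequencer output as untrusted input: a parser for four-line FASTQ records (the kind of data file pictured above) that rejects malformed records rather than processing them. `MAX_READ_LEN` and the function name are hypothetical; the Phred+33 quality offset is the common FASTQ convention, under which `E` (score 36) ranks above `A` (score 32), consistent with the caption.

```python
# Hypothetical cap on read length; real limits depend on the sequencing instrument.
MAX_READ_LEN = 10_000
VALID_BASES = set("ACGTN")


def parse_fastq(lines: list[str]) -> list[tuple[str, str, list[int]]]:
    """Parse 4-line FASTQ records, rejecting anything malformed instead of trusting it."""
    if len(lines) % 4 != 0:
        raise ValueError("truncated FASTQ record")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) > MAX_READ_LEN:
            raise ValueError(f"read at line {i + 2} exceeds {MAX_READ_LEN} bases")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 2}")
        if not set(seq) <= VALID_BASES:
            raise ValueError(f"unexpected characters in sequence at line {i + 2}")
        # Phred+33: quality character c encodes score ord(c) - 33.
        scores = [ord(c) - 33 for c in qual]
        if any(s < 0 or s > 93 for s in scores):
            raise ValueError(f"quality score out of range at line {i + 4}")
        records.append((header[1:], seq, scores))
    return records


print(parse_fastq(["@read1\n", "ACGT\n", "+\n", "EEAA\n"]))
# [('read1', 'ACGT', [36, 36, 32, 32])]
```

Failing loudly on anything unexpected (including lowercase bases, which this sketch deliberately rejects) is a conservative choice; a production tool might normalize some inputs instead.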

The research was funded by the University of Washington , the Short-Dooley Professorship and the Torode Family Professorship.

###

For more information, contact the research team at dnasec@cs.washington.edu.


Popular Science picks DNA data storage project for 2016 ‘Best of What’s New’ Award

Jennifer Langston — Wed, 19 Oct 2016 18:15:37 +0000

The award-winning Molecular Information Systems Lab research team includes: Front (left to right): Bichlien Nguyen, Lee Organick, Hsing-Yeh Parker, Siena Dumas Ang, Chris Takahashi; Back (left to right): James Bornholt, Yuan-Jyue Chen, Georg Seelig, Randolph Lopez, Luis Ceze, Karin Strauss. Not pictured: Doug Carmean, Rob Carlson. Photo: Tara Brown Photography/University of Washington

In the announced Wednesday, Popular Science recognized a technique developed by UW and Microsoft researchers to store and retrieve digital data in DNA as one of the most innovative and game-changing technologies of the year.

The team from the University of Washington announced in July that they had for the amount of digital data successfully encoded and retrieved in DNA molecules, which are a much denser and more durable long-term storage medium than current archival technologies like hard drives or magnetic tape.

They successfully encoded and decoded a , the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the Crop Trust’s seed database, among other things, all on strands of DNA. The researchers have developed a novel approach to converting the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences (adenine, guanine, cytosine and thymine), as well as the ability to retrieve specific files from those sequences.

The team is currently focusing on automating and scaling up the DNA data storage technique, which was recognized in Popular Science’s .

“T�� honor the innovations that shape the future,” said Kevin Gray, executive editor of Popular Science. “From life-saving technology to incredible space engineering to gadgets that are just breathtakingly cool, this is the best of what’s new.”

The award is shared by lead UW researchers , Torode Family Career Development Professor of computer science and engineering, and , associate professor of electrical engineering and of computer science & engineering; Microsoft principal project researchers Karin Strauss and Doug Carmean; and a team of two dozen students and researchers from both institutions.


UW, Microsoft researchers break record for DNA data storage

Jennifer Langston — Thu, 07 Jul 2016 12:34:30 +0000

University of Washington and Microsoft researchers have broken what they believe is the world record for the amount of digital data successfully stored — and retrieved — in DNA molecules.

The team of computer scientists and electrical engineers encoded and decoded this (featuring the craziest Rube Goldberg machine ever), the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the Crop Trust’s seed database, among other things, all on strands of DNA.


Here, , the UW’s Torode Family Career Development Professor of computer science and engineering and one of the project’s lead researchers, expands on the latest news-making accomplishment from the University of Washington:

Why are people interested in using DNA to store digital data?

LC: The world is producing data at an incredible rate, and storage technologies need to keep up. DNA is a remarkable storage molecule — it is millions of times denser than other storage media, it is incredibly durable (think millennia) and it never becomes obsolete. We humans, as DNA-based life forms, will always be interested in reading and writing DNA.

How quickly are we running out of room to warehouse all the data — from quirky cat videos to shopping preferences to essential medical records — the world is producing?

LC: Very quickly. Already today we can’t store all data produced. Sure, a lot of that data might not be so useful, but the gap is only increasing. That is especially true of all the video and genomic data that will be produced over the next decade.

How much data did the UW-Microsoft research team store and retrieve in DNA strands and what have you learned?

LC: We stored 200MB of data. This experiment led to several important breakthroughs that improved our ability to manipulate more complex pools of synthetic DNA. It allowed us to better understand what kinds of errors crop up and how to deal with them.

Why choose OK Go’s “This Too Shall Pass” video?

LC: We wanted to store something creative and in a modern format. HD video was a natural choice for format. And OK Go — being such a creative band — was a perfect fit. Also, there is an interesting connection between Rube Goldberg machines and molecular biology. Nature has produced incredible molecular machines, and when looked at closely enough might resemble a very complex but very reliable Rube Goldberg machine — without the soundtrack though!

Lead researchers on the UW/Microsoft DNA data storage project include (left to right) Georg Seelig, UW associate professor of electrical engineering and of computer science and engineering; Luis Ceze, the Torode Family Career Development Professor of Computer Science & Engineering; and Microsoft researcher Karin Strauss. Photo: Tara Brown Photography/University of Washington

How do you encode digital data — which is made up of 1s and 0s — in the building blocks of DNA?

LC: Interestingly, DNA already has a digital “flavor,” as it has four bases and molecules that “stick” to each other in a very programmable way. So the first step in storing digital data into DNA is to map strings of 1s and 0s into strings of As, Cs, Gs and Ts. Next, the DNA sequences are actually “manufactured” chemically, in a very parallel way. Our collaborator Twist Bioscience has a silicon-based DNA synthesis substrate that can make many different sequences in parallel. After the DNA molecules are manufactured, they are put in a test tube and dehydrated. And if protected from light and heat, they can last a long — and I mean very long — time.
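To make the first step concrete, here is a minimal Python sketch of the simplest possible mapping from bits to bases: two bits per base (00 to A, 01 to C, 10 to G, 11 to T). The table and function names are hypothetical, and this plain mapping is an illustration only, not the encoding the team actually uses.

```python
# Hypothetical 2-bit mapping: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, most significant bit pair first."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Invert bytes_to_dna: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(bytes_to_dna(b"Hi"))  # CAGACGGC
```

This mapping is maximally dense (two bits per base), but it happily produces long runs of a single base, such as `AAAA` for a zero byte, which is one reason practical schemes trade some density for sequences that are easier to synthesize and read.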

How can you find and retrieve the files you’re looking for?

LC: When one wants to read data, the DNA is re-suspended and read by a DNA sequencer, which determines what A, C, G, T letters comprise the molecules. From that, our algorithms recover the original digital data. Despite being reliable, DNA writing and reading have errors, just like hard drives and electronic memories have errors, so we needed to develop error-correcting codes to reliably retrieve data. We also developed a method for “random access,” which means you selectively read only the data you want and not the whole thing. We do that by borrowing from nature again and using DNA amplification — using specifically — to only amplify the desired data.
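One simple way to see how redundancy helps against read errors is to sequence the same strand several times and take a per-position majority vote. The sketch below is a stand-in for illustration, not the team's error-correcting code: it assumes the reads are already aligned, have equal length and contain only substitution errors, whereas real sequencer output also contains insertions and deletions.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Per-position majority vote over equal-length, pre-aligned reads of one strand."""
    if not reads or len({len(r) for r in reads}) != 1:
        raise ValueError("expected one or more reads of equal length")
    # zip(*reads) yields one tuple of bases per position across all reads.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


reads = ["ACGTAC", "ACGTTC", "AGGTAC"]  # two reads each carry one substitution error
print(consensus(reads))  # ACGTAC
```

A majority vote only works when most copies agree at each position; error-correcting codes go further by adding structured redundancy so that data survives even when some strands are missing or wrong.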

What’s next for the Molecular Information Systems Lab?

LC: There are still many challenges in making DNA storage mainstream. We will continue to focus on developing an end-to-end system and work with our Microsoft and Twist Bioscience collaborators to reduce the cost and increase the speed of writing and reading DNA.

For more information, read this on the DNA data storage project or contact Ceze at luisceze@cs.washington.edu. Follow Ceze on Twitter at @luisceze.

UW team stores digital images in DNA — and retrieves them perfectly

Jennifer Langston — Thu, 07 Apr 2016 15:18:12 +0000

All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube. Photo: Tara Brown Photography/University of Washington

Technology companies routinely build to store all the baby pictures, financial transactions, funny cat videos and email messages their users hoard.

But a new technique developed by University of Washington and Microsoft researchers could shrink the space needed to store digital data that today would fill a Walmart supercenter down to the size of a sugar cube.


In a presented in April at the , the team of computer scientists and electrical engineers has detailed one of the first complete systems to encode, store and retrieve digital data using DNA molecules, which can store information millions of times more compactly than current archival technologies.

Authors of the paper are UW computer science and engineering doctoral student , UW bioengineering doctoral student , UW associate professor of computer science and engineering , UW associate professor of electrical engineering and of computer science and engineering , and Microsoft researchers and UW CSE affiliate faculty Doug Carmean and Karin Strauss.

In one experiment outlined in the paper, the team successfully encoded digital data from four image files into the nucleotide sequences of synthetic DNA snippets. More significantly, they were also able to reverse that process — retrieving the correct sequences from a larger pool of DNA and reconstructing the images without losing a single byte of information.

The team has also encoded and retrieved data that authenticates archival video files from the UW’s project that contain interviews with judges, lawyers and other personnel from the Rwandan war crime tribunal.

“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works — it’s very, very compact and very durable,” said co-author , UW associate professor of computer science and engineering.

“We’re essentially repurposing it to store digital data — pictures, videos, documents — in a manageable way for hundreds or thousands of years.”

Lee Organick, a UW computer science and engineering research scientist, mixes DNA samples for storage. Each tube contains a digital file, which might be a picture of a cat or a Tchaikovsky symphony. Photo: Tara Brown Photography/University of Washington

The digital universe — all the data contained in our computer files, historic archives, movies, photo collections and the exploding volume of digital information collected by businesses and devices worldwide — is .

That’s a tenfold increase compared to 2013, and will represent enough data to fill more than . While not all of that information needs to be saved, the world is producing data faster than the capacity to store it.

DNA molecules can store information many millions of times more densely than existing technologies for digital storage — flash drives, hard drives, magnetic and optical media. Those systems also degrade after a few years or decades, while DNA can reliably preserve information for centuries. DNA is best suited for archival applications, rather than instances where files need to be accessed immediately.

The team from the housed in the UW Electrical Engineering Building, in close collaboration with , is developing a DNA-based storage system that it expects could address the world’s needs for archival storage.

First, the researchers developed a novel approach to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences — adenine, guanine, cytosine and thymine.

“How you go from ones and zeroes to As, Gs, Cs and Ts really matters because if you use a smart approach, you can make it very dense and you don’t get a lot of errors,” said co-author , a UW associate professor of electrical engineering and of computer science and engineering. “If you do it wrong, you get a lot of mistakes.”
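One widely cited example of such a trade-off is the rotating base-3 code described by Goldman et al. in 2013: data is rewritten in base 3, and each base-3 digit picks one of the three bases that differ from the previous base, so the strand never repeats a base back to back (long runs of one base are prone to sequencing errors). The minimal Python sketch below assumes six base-3 digits per byte and an arbitrary starting base of `A`; it illustrates the general idea and is not necessarily the team's scheme.

```python
BASES = "ACGT"


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first (3**6 = 729 >= 256)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Invert bytes_to_trits, six digits per byte."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode_rotating(data: bytes, previous: str = "A") -> str:
    """Each trit selects one of the three bases that differ from the previous base."""
    strand = []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != previous]
        previous = choices[trit]
        strand.append(previous)
    return "".join(strand)


def decode_rotating(strand: str, previous: str = "A") -> bytes:
    """Recover each trit from the base's position among the three allowed choices."""
    trits = []
    for base in strand:
        choices = [b for b in BASES if b != previous]
        trits.append(choices.index(base))
        previous = base
    return trits_to_bytes(trits)


print(encode_rotating(b"\x00"))                   # CACACA
print(decode_rotating(encode_rotating(b"DNA")))   # b'DNA'
```

The price is density: this sketch spends six bases per byte, versus the four that a plain two-bits-per-base mapping would need, in exchange for strands without repeated neighbouring bases.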

The digital data is chopped into pieces and stored by synthesizing a massive number of tiny DNA molecules, which can be dehydrated or otherwise preserved for long-term storage.
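Chopping data into pieces only works if the pieces can be put back in order. A minimal sketch, assuming fixed-size payloads and a fixed-width address written in base 4 with the same alphabet (A=0, C=1, G=2, T=3), is shown below; the function names, payload length and address width are hypothetical choices for illustration.

```python
import math

BASES = "ACGT"


def address_to_dna(index: int, width: int) -> str:
    """Write index as a fixed-width base-4 number using A=0, C=1, G=2, T=3."""
    digits = []
    for _ in range(width):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def dna_to_address(address: str) -> int:
    """Invert address_to_dna."""
    value = 0
    for base in address:
        value = value * 4 + BASES.index(base)
    return value


def fragment(strand: str, payload_len: int, width: int = 4) -> list[str]:
    """Chop a long strand into address-prefixed fragments of at most payload_len bases."""
    count = math.ceil(len(strand) / payload_len)
    if count > 4 ** width:
        raise ValueError("too many fragments for the address width")
    return [address_to_dna(n, width) + strand[n * payload_len:(n + 1) * payload_len]
            for n in range(count)]


def reassemble(fragments: list[str], width: int = 4) -> str:
    """Order fragments by their address prefix and concatenate the payloads."""
    ordered = sorted(fragments, key=lambda f: dna_to_address(f[:width]))
    return "".join(f[width:] for f in ordered)


pieces = fragment("ACGTTGCAACGT", 5)
print(pieces)                    # ['AAAAACGTT', 'AAACGCAAC', 'AAAGGT']
print(reassemble(pieces[::-1]))  # ACGTTGCAACGT
```

With a four-base address, a single file can hold up to 4**4 = 256 fragments; a wider address raises that limit at the cost of fewer payload bases per molecule.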

The 91̽��and Microsoft researchers are one of two teams nationwide that have also demonstrated the ability to perform “random access” — to identify and retrieve the correct sequences from this large pool of random DNA molecules, which is a task similar to reassembling one chapter of a story from a library of torn books.

The Molecular Information Systems Lab research team: Front (left to right): Bichlien Nguyen, Lee Organick, Hsing-Yeh Parker, Siena Dumas Ang, Chris Takahashi. Back (left to right): James Bornholt, Yuan-Jyue Chen, Georg Seelig, Randolph Lopez, Luis Ceze, Karin Strauss. Not pictured: Doug Carmean, Rob Carlson, Krittika d’Silva. Photo: Tara Brown Photography/University of Washington

To access the stored data later, the researchers also encode the equivalent of zip codes and street addresses into the DNA sequences. Using Polymerase Chain Reaction (PCR) techniques — commonly used in molecular biology — helps them more easily identify the zip codes they are looking for. Using DNA sequencing techniques, the researchers can then “read” the data and convert them back to a video, image or document file by using the street addresses to reorder the data.
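The zip-code-and-street-address idea can be sketched in a few lines of Python. This is an idealization under stated assumptions: real PCR amplifies molecules that bind a primer chemically, which is modelled here as an exact prefix match on a string, and each molecule is assumed to be laid out as primer, then a fixed-width base-4 address, then payload. The primers, addresses and function name are hypothetical.

```python
def retrieve(pool: list[str], primer: str, address_width: int) -> str:
    """Idealized random access: keep strands that start with the file's primer,
    strip it, order the rest by address, and join the payloads."""
    hits = [s[len(primer):] for s in pool if s.startswith(primer)]
    # Fixed-width base-4 addresses written with A < C < G < T sort correctly as plain strings.
    hits.sort(key=lambda s: s[:address_width])
    return "".join(s[address_width:] for s in hits)


pool = [
    "GATC" + "AC" + "TTAA",  # file with primer GATC, fragment 1
    "CTAG" + "AA" + "GGGG",  # a different file
    "GATC" + "AA" + "ACGT",  # file with primer GATC, fragment 0
]
print(retrieve(pool, "GATC", 2))  # ACGTTTAA
```

Only the strands for the requested file are touched, which is the point of random access: the rest of the pool never needs to be sequenced.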

Currently, the largest barrier to viable DNA storage is the cost and efficiency with which DNA can be synthesized (or manufactured) and sequenced (or read) on a large scale. But researchers say there’s no technical barrier to achieving those gains if the right incentives are in place.

Advances in DNA storage rely on techniques pioneered by the biotechnology industry, but also incorporate new expertise. The team’s encoding approach, for instance, borrows from error correction schemes commonly used in computer memory — which hadn’t been applied to DNA.
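A simple example of memory-style redundancy is XOR parity, the idea behind RAID parity: store two payloads A and B plus a third payload A XOR B, and any one of the three can be rebuilt from the other two. The minimal Python sketch below works on payload bytes before they are mapped to bases; whether it matches the team's actual scheme is an assumption, and the function names are hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length payloads."""
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(a: bytes, b: bytes) -> tuple[bytes, bytes, bytes]:
    """Store a, b and their XOR; any two of the three recover the third."""
    return a, b, xor_bytes(a, b)


a, b, parity = add_parity(b"DN", b"A!")
# Suppose the strand carrying `a` was lost: XOR the two survivors to rebuild it.
print(xor_bytes(parity, b))  # b'DN'
```

The overhead here is one extra payload for every two, which tolerates the loss of any single one of the three; stronger codes can survive more losses at a higher cost.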

“This is an example where we’re borrowing something from nature — DNA — to store information. But we’re using something we know from computers — how to correct memory errors — and applying that back to nature,” said Ceze.

“This multidisciplinary approach is what makes this project exciting. We are drawing from a diverse set of disciplines to push the boundaries of what can be done with DNA. And, as a result, creating a storage system with unprecedented density and durability,” said , a researcher at Microsoft and UW affiliate associate professor of computer science and engineering.

The research was funded by Microsoft Research, the National Science Foundation, and the David Notkin Endowed Graduate Fellowship.

For more information, contact Ceze at luisceze@cs.washington.edu or Seelig at gseelig@u.washington.edu. To reach Strauss, please contact TNRPR@we-worldwide.com.