Georg Seelig – UW News

With new ‘shuffling’ trick, researchers can measure gene activity in single cells

James Urton — Thu, 15 Mar 2018 18:13:33 +0000

For biologists, a single cell is a world of its own: It can form a harmonious part of a tissue, or go rogue and take on a diseased state, like cancer. But biologists have long struggled to identify and track the many different types of cells hiding within tissues.

Researchers at the University of Washington and the Allen Institute for Brain Science have developed a new method to classify and track the multitude of cells in a tissue sample. In a paper published March 15, the team reports that this new approach, known as SPLiT-seq, reliably tracks gene activity in a tissue down to the level of single cells.

“Cells differ from each other based on the activity of their genes: which genes are switched off or switched on,” said senior author Georg Seelig, a UW associate professor in both the Department of Electrical Engineering and the Paul G. Allen School of Computer Science & Engineering. “Using SPLiT-seq, it becomes possible to measure gene activity in individual cells, even if there are hundreds of thousands of different cells in a tissue sample.”

SPLiT-seq, which stands for Split Pool Ligation-based Transcriptome sequencing, combines a traditional approach to measuring gene expression with a new twist. For more than a decade, scientists have measured gene expression in tissues by sequencing the genetic “letters” of RNA, the DNA-like molecule produced in the first step of gene expression. This standard approach, known as RNA-sequencing, profiles RNA across the whole tissue, but it does not tell researchers how cells within the tissue differ from one another. Single-cell RNA-sequencing addresses this by sequencing RNA from isolated cells, but existing methods are costly and do not scale well.

SPLiT-seq! Photo: Georg Seelig

SPLiT-seq makes it possible to perform single-cell RNA-sequencing without ever isolating individual cells. The researchers put the cells through four rounds of “shuffling”: splitting them into separate pools and mixing them back together. At each shuffling step, they labeled the RNA in each pool with a DNA “barcode” unique to that pool. After four rounds of shuffling and labeling, the RNA from each cell carries, in effect, its own unique combination of barcodes, and that combination is read out along with the RNA when all the RNA in the tissue is sequenced in bulk.
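The arithmetic behind the barcoding can be sketched with a short simulation. This is a simplified model, not the authors' protocol: it assumes every round uses the same number of wells (96 here, a hypothetical choice, since the text gives only the number of rounds), that each cell lands in a uniformly random well in every round, and that all RNA inside one cell picks up the same barcode. With `wells` wells and `rounds` rounds there are `wells ** rounds` possible combinations, so two cells rarely share one as long as there are far fewer cells than combinations.

```python
import random
from collections import Counter
from typing import List, Tuple

Barcode = Tuple[int, ...]


def split_pool(n_cells: int, wells: int, rounds: int, seed: int = 0) -> List[Barcode]:
    """Simulate split-pool barcoding; returns one barcode combination per cell."""
    rng = random.Random(seed)
    barcodes: List[Barcode] = [() for _ in range(n_cells)]
    for _ in range(rounds):
        # Split: each cell drops into a random well, and that well's barcode
        # (here just the well index) is appended to all RNA in the cell.
        # Pool: re-mixing is implicit, because the next round's well
        # assignments are drawn independently of this one.
        barcodes = [bc + (rng.randrange(wells),) for bc in barcodes]
    return barcodes


def collision_fraction(barcodes: List[Barcode]) -> float:
    """Fraction of cells whose barcode combination is shared with another cell."""
    counts = Counter(barcodes)
    shared = sum(c for c in counts.values() if c > 1)
    return shared / len(barcodes)


wells, rounds, n_cells = 96, 4, 10_000  # hypothetical plate layout and cell count
print("possible combinations:", wells ** rounds)
print("cells sharing a barcode:", collision_fraction(split_pool(n_cells, wells, rounds)))
```

With the hypothetical 96 wells and four rounds there are 96**4 = 84,934,656 combinations, so 10,000 cells almost never collide; shrinking the number of wells or rounds quickly raises the collision fraction.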

“With these ‘split-pool barcoding steps,’ we solve a big problem in measuring gene expression: reliably identifying which RNA molecules came from which cell in the original tissue sample,” said Seelig, who is also a researcher in the UW Molecular Engineering & Sciences Institute.

“With that problem addressed, we can begin to ask biological questions about the different types of cells we define in the tissue,” said a co-author who is Associate Director of Molecular Genetics at the Allen Institute for Brain Science.

The team performed SPLiT-seq on brain and spinal cord tissue samples from laboratory mice. Using SPLiT-seq, they could measure the gene activity of over 156,000 cells. Based on patterns of gene activity, they estimated that more than 100 different types of cells were present in those tissue samples – including neurons and glial cells at various stages of development and differentiation.

SPLiT-seq can deliver this rich array of biological data at a cost of “just a penny per cell,” Seelig said, as quoted by the Allen Institute for Brain Science. According to the researchers, this is significantly cheaper than other single-cell RNA-sequencing approaches.

The researchers say that SPLiT-seq could answer important questions about how tissues develop, and identify minute changes in gene expression that precede the onset of complex diseases like Parkinson’s disease or cancer.

Co-lead authors on the paper are UW electrical engineering postdoctoral researcher Alexander Rosenberg and a UW doctoral student in the Department of Bioengineering. Additional UW co-authors are Richard Muscat, Anna Kuchina, Paul Sample and Sumit Mukherjee in the Department of Electrical Engineering; David Peeler in the Department of Bioengineering; Wei Chen in the Molecular Engineering & Sciences Institute; a UW professor of bioengineering; and Drew Sellers, a research assistant professor of bioengineering and scientist with the UW Institute for Stem Cell and Regenerative Medicine. Additional co-authors from the Allen Institute for Brain Science are Zizhen Yao and Lucas Gray. The research was funded by the National Institutes of Health, the National Science Foundation and the Allen Institute for Brain Science.

###

For more information, contact Rosenberg at alex.b.rosenberg@gmail.com or 773-294-4109 and Seelig at gseelig@uw.edu or 206-294-8180.

Popular Science picks DNA data storage project for 2016 ‘Best of What’s New’ Award

Jennifer Langston — Wed, 19 Oct 2016 18:15:37 +0000

The award-winning Molecular Information Systems Lab research team includes, front (left to right): Bichlien Nguyen, Lee Organick, Hsing-Yeh Parker, Siena Dumas Ang, Chris Takahashi; back (left to right): James Bornholt, Yuan-Jyue Chen, Georg Seelig, Randolph Lopez, Luis Ceze, Karin Strauss. Not pictured: Doug Carmean, Rob Carlson. Photo: Tara Brown Photography/UW

In its 2016 Best of What’s New awards, announced Wednesday, Popular Science recognized a technique developed by UW and Microsoft researchers to store and retrieve digital data in DNA as one of the most innovative and game-changing technologies of the year.

The team from the University of Washington announced in July that it had set a record for the amount of digital data successfully encoded in and retrieved from DNA molecules, a much denser and more durable long-term storage medium than current archival technologies such as hard drives or magnetic tape.

They successfully encoded and decoded the Universal Declaration of Human Rights in more than 100 languages, the top 100 books of Project Gutenberg and the Crop Trust’s seed database, among other things, all on strands of DNA. The researchers developed a novel approach to converting the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences (adenine, guanine, cytosine and thymine), as well as the ability to retrieve specific files from those sequences.

The team is currently focusing on automating and scaling up the DNA data storage technique, which was recognized in Popular Science’s 2016 Best of What’s New awards.

“[The awards] honor the innovations that shape the future,” said Kevin Gray, executive editor of Popular Science. “From life-saving technology to incredible space engineering to gadgets that are just breathtakingly cool, this is the best of what’s new.”

The award is shared by lead UW researchers Luis Ceze, Torode Family Career Development Professor of computer science and engineering, and Georg Seelig, associate professor of electrical engineering and of computer science & engineering; Microsoft principal project researchers Karin Strauss and Doug Carmean; and a team of two dozen students and researchers from both institutions.


UW team stores digital images in DNA, and retrieves them perfectly

Jennifer Langston — Thu, 07 Apr 2016 15:18:12 +0000

All the movies, images, emails and other digital data from more than 600 basic smartphones (10,000 gigabytes) can be stored in the faint pink smear of DNA at the end of this test tube. Photo: Tara Brown Photography/UW

Technology companies routinely build data centers to store all the baby pictures, financial transactions, funny cat videos and email messages their users hoard.

But a new technique developed by UW and Microsoft researchers could shrink the space needed to store digital data that today would fill a Walmart supercenter down to the size of a sugar cube.


In a paper presented in April, the team of computer scientists and electrical engineers detailed one of the first complete systems to encode, store and retrieve digital data using DNA molecules, which can store information millions of times more compactly than current archival technologies.

Authors of the paper are a UW computer science and engineering doctoral student; a UW bioengineering doctoral student; Luis Ceze, UW associate professor of computer science and engineering; Georg Seelig, UW associate professor of electrical engineering and of computer science and engineering; and Microsoft researchers and UW CSE affiliate faculty Doug Carmean and Karin Strauss.

In one experiment outlined in the paper, the team successfully encoded digital data from four image files into the nucleotide sequences of synthetic DNA snippets. More significantly, they were also able to reverse that process — retrieving the correct sequences from a larger pool of DNA and reconstructing the images without losing a single byte of information.

The team has also encoded and retrieved data that authenticates archival video files from a UW project containing interviews with judges, lawyers and other personnel from the Rwandan war crimes tribunal.

“Life has produced this fantastic molecule called DNA that efficiently stores all kinds of information about your genes and how a living system works; it’s very, very compact and very durable,” said co-author Luis Ceze, UW associate professor of computer science and engineering.

“We’re essentially repurposing it to store digital data — pictures, videos, documents — in a manageable way for hundreds or thousands of years.”

Lee Organick, a UW computer science and engineering research scientist, mixes DNA samples for storage. Each tube contains a digital file, which might be a picture of a cat or a Tchaikovsky symphony. Photo: Tara Brown Photography/UW

The digital universe, meaning all the data contained in our computer files, historic archives, movies, photo collections and the exploding volume of digital information collected by businesses and devices worldwide, is growing rapidly.

It is expected to reach ten times its 2013 size. While not all of that information needs to be saved, the world is producing data faster than the capacity to store it.

DNA molecules can store information many millions of times more densely than existing technologies for digital storage — flash drives, hard drives, magnetic and optical media. Those systems also degrade after a few years or decades, while DNA can reliably preserve information for centuries. DNA is best suited for archival applications, rather than instances where files need to be accessed immediately.

The team from the Molecular Information Systems Lab, housed in the UW Electrical Engineering Building, is working in close collaboration with Microsoft Research to develop a DNA-based storage system that it expects could address the world’s needs for archival storage.

First, the researchers developed a novel approach to convert the long strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences — adenine, guanine, cytosine and thymine.

“How you go from ones and zeroes to As, Gs, Cs and Ts really matters because if you use a smart approach, you can make it very dense and you don’t get a lot of errors,” said co-author Georg Seelig, a UW associate professor of electrical engineering and of computer science and engineering. “If you do it wrong, you get a lot of mistakes.”
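One widely used way to make that mapping “smart” is to avoid runs of the same base (homopolymers such as AAAA), which are error-prone to synthesize and sequence. The sketch below is an illustrative rotating ternary code in the style of earlier DNA-storage work, not necessarily the encoding this team used: each byte becomes six base-3 digits, and each digit selects one of the three bases that differ from the previous base, so no base ever repeats. The function names and the starting base "A" are assumptions.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits hold one byte


def bytes_to_trits(data: bytes) -> list:
    trits = []
    for byte in data:
        value, digits = byte, []
        for _ in range(TRITS_PER_BYTE):
            digits.append(value % 3)
            value //= 3
        trits.extend(reversed(digits))  # most significant digit first
    return trits


def trits_to_bytes(trits: list) -> bytes:
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for t in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + t
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map each base-3 digit to one of the three bases unlike the previous base."""
    prev, strand = start, []
    for t in bytes_to_trits(data):
        prev = [b for b in BASES if b != prev][t]
        strand.append(prev)
    return "".join(strand)


def decode(strand: str, start: str = "A") -> bytes:
    """Invert encode(): recover each digit from the base and its predecessor."""
    prev, trits = start, []
    for base in strand:
        trits.append([b for b in BASES if b != prev].index(base))
        prev = base
    return trits_to_bytes(trits)


strand = encode(b"UW")
print(strand, decode(strand))
```

The price of avoiding repeats is density: each base carries at most log2(3) ≈ 1.58 bits instead of 2, and this fixed six-digits-per-byte layout achieves 8/6 ≈ 1.33 bits per base, the kind of density-versus-error trade-off the quote describes.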

The digital data is chopped into pieces and stored by synthesizing a massive number of tiny DNA molecules, which can be dehydrated or otherwise preserved for long-term storage.

The UW and Microsoft researchers are one of two teams nationwide that have also demonstrated the ability to perform “random access”: identifying and retrieving the correct sequences from this large pool of random DNA molecules, a task similar to reassembling one chapter of a story from a library of torn books.

The Molecular Information Systems Lab research team: Front (left to right): Bichlien Nguyen, Lee Organick, Hsing-Yeh Parker, Siena Dumas Ang, Chris Takahashi. Back (left to right): James Bornholt, Yuan-Jyue Chen, Georg Seelig, Randolph Lopez, Luis Ceze, Karin Strauss. Not pictured: Doug Carmean, Rob Carlson, Krittika d’Silva. Photo: Tara Brown Photography/UW

To access the stored data later, the researchers also encode the equivalent of zip codes and street addresses into the DNA sequences. Polymerase Chain Reaction (PCR), a technique commonly used in molecular biology, lets them selectively copy the strands that carry the zip code they are looking for. Using DNA sequencing, the researchers can then “read” the data and convert it back into a video, image or document file, using the street addresses to put the pieces back in order.
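The zip-code and street-address idea can be sketched in a few lines. This is a toy model under stated assumptions, not the team's system: the `Strand` fields, the chunk size and the file tags are hypothetical, payloads stay as bytes rather than nucleotides, and PCR is modeled simply as a filter that keeps strands whose zip code matches.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Strand:
    zip_code: str   # file tag; in DNA, a primer-binding sequence (hypothetical field)
    address: int    # "street address": the chunk's position within its file
    payload: bytes  # a real system would encode this as nucleotides too


def store(zip_code: str, data: bytes, chunk_size: int = 4) -> List[Strand]:
    """Chop a file into fixed-size chunks, each tagged with zip code and address."""
    return [Strand(zip_code, i, data[start:start + chunk_size])
            for i, start in enumerate(range(0, len(data), chunk_size))]


def retrieve(pool: List[Strand], zip_code: str) -> bytes:
    # Stand-in for PCR: keep only strands that carry the requested zip code.
    selected = [s for s in pool if s.zip_code == zip_code]
    # Sequencing returns strands in no particular order; addresses restore it.
    selected.sort(key=lambda s: s.address)
    return b"".join(s.payload for s in selected)


pool = store("CAT", b"a picture of a cat") + store("SYM", b"a Tchaikovsky symphony")
random.Random(42).shuffle(pool)  # a pool of DNA molecules has no inherent order
print(retrieve(pool, "CAT"))
```

Keeping the file tag separate from the position index mirrors the two roles in the text: the tag selects which file to pull out of the shared pool, and the index reorders that file's pieces.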

Currently, the largest barrier to viable DNA storage is the cost and efficiency with which DNA can be synthesized (or manufactured) and sequenced (or read) on a large scale. But researchers say there’s no technical barrier to achieving those gains if the right incentives are in place.

Advances in DNA storage rely on techniques pioneered by the biotechnology industry, but also incorporate new expertise. The team’s encoding approach, for instance, borrows from error correction schemes commonly used in computer memory — which hadn’t been applied to DNA.

“This is an example where we’re borrowing something from nature — DNA — to store information. But we’re using something we know from computers — how to correct memory errors — and applying that back to nature,” said Ceze.
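To make the memory analogy concrete, here is a Hamming(7,4) code, a textbook error-correcting code from computer memory that fixes any single flipped bit in a 7-bit codeword. The text does not say which scheme the team adapted, so this is illustrative only, and the function names are hypothetical.

```python
from typing import List


def hamming74_encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword: List[int]) -> List[int]:
    """Correct up to one flipped bit and return the 4 data bits."""
    bits = list(codeword)
    syndrome = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            syndrome ^= position  # XOR of all positions holding a 1
    if syndrome:  # a nonzero syndrome is the 1-based position of the bad bit
        bits[syndrome - 1] ^= 1
    return [bits[2], bits[4], bits[5], bits[6]]


word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1  # simulate one corrupted bit
print(hamming74_decode(word))
```

One caveat when carrying this over to DNA: synthesis and sequencing also insert and delete bases, not just substitute them, and a plain Hamming code cannot correct insertions or deletions.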

“This multidisciplinary approach is what makes this project exciting. We are drawing from a diverse set of disciplines to push the boundaries of what can be done with DNA. And, as a result, creating a storage system with unprecedented density and durability,” said Karin Strauss, a researcher at Microsoft and UW affiliate associate professor of computer science and engineering.

The research was funded by Microsoft Research, the National Science Foundation, and the David Notkin Endowed Graduate Fellowship.

For more information, contact Ceze at luisceze@cs.washington.edu or Seelig at gseelig@u.washington.edu. To reach Strauss, please contact TNRPR@we-worldwide.com.

New UW model helps zero in on harmful genetic mutations

Jennifer Langston — Thu, 22 Oct 2015 16:05:10 +0000

A new UW model can help narrow down which genetic mutations affect how genes splice and contribute to disease. In this illustration, the cell’s splicing machinery is trying to pick which “cutting” sites (pictured as white flags) it should use. Photo: Jennifer Sunami

Between any two people, there are likely to be at least 10 million differences in the genetic sequence that makes up their DNA.

Most of these differences don’t alter the way cells behave or cause health problems. But some genetic variations greatly increase the likelihood that a person will develop cancer, diabetes, colorblindness or a host of other diseases.

Despite rapid advances in our ability to map an individual’s genome — the precise coding that makes up his or her genes — we know much less about which mutations or anomalies actually cause disease.

Now, a new model and Web tool developed by UW researchers can more accurately and quantitatively predict which genetic mutations significantly change how genes splice and may warrant increased attention from disease researchers and drug developers.

The model, the first to train a machine learning algorithm on vast amounts of genetic data created with synthetic biology techniques, is outlined in a paper published Oct. 22.

“Some people have variations in a particular gene, but what you really want to know is whether those matter or not,” said lead author Alexander Rosenberg, a UW electrical engineering doctoral student. “This model can help you narrow down the universe, hugely, of the mutations that might be most likely to cause disease.”

In particular, the model predicts how these genetic sequence variations affect alternative splicing — a critical process that enables a single gene to create many different forms of proteins by including or excluding snippets of RNA.
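A toy enumeration shows why including or excluding snippets multiplies the number of possible products: if a gene has n optional (“cassette”) exons that can each be kept or skipped independently, there are up to 2**n possible transcripts. The exon names and the assumption of independent choices are hypothetical; real splicing also involves alternative splice sites, retained introns and mutually exclusive exons.

```python
from itertools import product
from typing import List, Tuple

Exon = Tuple[str, bool]  # (name, is_optional)


def enumerate_isoforms(exons: List[Exon]) -> List[Tuple[str, ...]]:
    """List every transcript obtainable by keeping or skipping optional exons."""
    # Constitutive exons have a single choice (keep); optional exons have two.
    options = [((name,), ()) if optional else ((name,),) for name, optional in exons]
    return [tuple(name for part in combo for name in part) for combo in product(*options)]


gene = [("E1", False), ("E2", True), ("E3", False), ("E4", True), ("E5", False)]
for isoform in enumerate_isoforms(gene):
    print("-".join(isoform))
```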

“This is an avenue that’s unexplored to a large extent,” said Rosenberg. “It’s fairly easy to look at how mutations affect proteins directly, but people have not been able to look at how mutations affect proteins through splicing.”

For example, a scientist studying the genetic underpinnings of lung cancer or depression or a particular birth defect could type the most commonly shared DNA sequence in a particular gene into the Web tool, as well as multiple variations. The model will tell the scientist which mutations cause outsized differences in how the gene splices — which could be a sign of trouble — and which have little or no effect.

The researcher would still need to investigate whether a particular genetic sequence causes harmful changes, but the online tool can help rule out the many variations that aren’t likely to be of interest to health researchers. To validate the model’s predictive powers, the UW team tested it on a handful of well-understood mutations, such as those in the BRCA2 gene that have been linked to breast and ovarian cancer.
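The screening workflow above can be sketched as follows. Splicing changes are commonly summarized as “percent spliced in” (PSI), the fraction of transcripts that include a given exon, and a variant’s effect as the change in PSI (ΔPSI) relative to the reference sequence. In this sketch `predict_psi` is a hypothetical stand-in for the model, and `toy_predictor` is a made-up placeholder rule, not biology and not the UW tool’s interface.

```python
from typing import Callable, Dict, List, Tuple


def percent_spliced_in(inclusion_reads: int, exclusion_reads: int) -> float:
    """PSI from read counts: the share of transcripts that include the exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no reads")
    return inclusion_reads / total


def rank_variants(reference: str, variants: Dict[str, str],
                  predict_psi: Callable[[str], float],
                  threshold: float = 0.05) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Score each variant by predicted delta-PSI and flag those above a threshold."""
    ref_psi = predict_psi(reference)
    scored = [(name, predict_psi(seq) - ref_psi) for name, seq in variants.items()]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)  # largest effect first
    flagged = [name for name, delta in scored if abs(delta) >= threshold]
    return scored, flagged


def toy_predictor(seq: str) -> float:
    # Placeholder only: pretends PSI tracks G content.
    return seq.count("G") / len(seq)


scored, flagged = rank_variants(
    "ACGTACGTAC",
    {"v1": "ACGTACGTAG", "v2": "ACATACGTAC", "v3": "ACGTACGTAT"},
    toy_predictor,
)
print(scored, flagged)
```

Variants below the threshold (here `v3`, whose predicted PSI does not change) are the ones such a screen would set aside.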

Compared to previously published models, the UW approach is roughly three times more accurate at predicting the extent to which a mutation will cause genetic material to be included or excluded in the protein-making process, which can change how those proteins function and cause biological processes to go awry.

The UW team used synthetic biology and DNA sequencing techniques to create a massive library of genetic data. By training a machine learning algorithm on that large synthetic dataset, the UW model can make accurate predictions about the human genome. Photo: UW

That’s because the UW team used a new approach that combines synthetic biology and machine learning techniques to create the model.

Machine learning algorithms — which enable computers to infer rules and “learn” from vast amounts of data — become more accurate the more data they’re exposed to. But the human genome only has roughly 25,000 genes that create proteins.

Using common molecular biology techniques, the UW team created a library of over 2 million synthetic “mini-genes” that included random DNA sequences. Then they determined how each random sequence element affected where genes spliced and what types of RNA were produced, which ultimately determines which proteins get made.

That larger library of synthetic data essentially teaches the model to become smarter, said senior author Georg Seelig, a UW assistant professor of electrical engineering and of computer science & engineering.

“Our algorithm works super well because it was trained on these synthetic datasets. And the reason it works so well is because that synthetic dataset is orders of magnitude larger than the training set you get from the actual human genome,” said Seelig.

“It is remarkable that a model trained entirely on synthetic data can outperform models trained directly on the human genome on the task of predicting the impact of mutations in people,” he said.
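The data-size argument can be illustrated with a simplified simulation. It is not the team’s model: it assumes, hypothetically, that each short DNA “word” (k-mer) in a random sequence nudges splicing up or down by a fixed hidden amount, generates a synthetic library from those hidden rules with measurement noise, and fits a linear model on k-mer counts. Comparing a small training set with a large one shows the fitted model tracking held-out data better as the library grows.

```python
from itertools import product

import numpy as np

K = 3
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]
INDEX = {kmer: i for i, kmer in enumerate(KMERS)}
rng = np.random.default_rng(0)
TRUE_W = rng.normal(0.0, 0.5, size=len(KMERS))  # hidden, hypothetical k-mer effects


def kmer_counts(seq: str) -> np.ndarray:
    x = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        x[INDEX[seq[i:i + K]]] += 1
    return x


def synthetic_library(n: int, length: int = 20):
    """Random 'mini-gene' sequences and their noisy measured PSI."""
    seqs = ["".join(rng.choice(list("ACGT"), size=length)) for _ in range(n)]
    X = np.array([kmer_counts(s) for s in seqs])
    logit = X @ TRUE_W + rng.normal(0.0, 0.3, size=n)  # measurement noise
    return X, 1.0 / (1.0 + np.exp(-logit))


def fit(X: np.ndarray, psi: np.ndarray) -> np.ndarray:
    psi = np.clip(psi, 1e-9, 1 - 1e-9)
    y = np.log(psi / (1 - psi))  # back to the logit scale
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def held_out_correlation(w: np.ndarray, X: np.ndarray, psi: np.ndarray) -> float:
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))
    return float(np.corrcoef(pred, psi)[0, 1])


X_test, psi_test = synthetic_library(1_000)
w_small = fit(*synthetic_library(80))
w_large = fit(*synthetic_library(5_000))
r_small = held_out_correlation(w_small, X_test, psi_test)
r_large = held_out_correlation(w_large, X_test, psi_test)
print(f"r(80 sequences) = {r_small:.3f}, r(5,000 sequences) = {r_large:.3f}")
```

With 64 k-mer weights to estimate, 80 sequences barely constrain the fit, while thousands of random sequences pin the weights down, which is the intuition behind training on a synthetic library far larger than the natural genome provides.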

Next research steps include expanding the approach beyond alternative splicing to other processes that determine how genes are expressed.

In the meantime, by making the Web tool free and publicly available, the team hopes other scientists will use their alternative splicing model — and ultimately make progress in narrowing down which natural genetic variations are most meaningful when it comes to health and disease.

“Other research groups and companies can use our model to rank the areas of interest to them,” Seelig said. “We hope other people will take this further to more clinical applications.”

Co-authors include former UW doctoral student Rupali P. Patwardhan and an associate professor in the UW Department of Genome Sciences.

For more information, contact Seelig at gseelig@uw.edu.

UW engineers invent programming language to build synthetic DNA

Michelle Ma — Mon, 30 Sep 2013 15:06:54 +0000

Similar to using Python or Java to write code for a computer, chemists soon could be able to use a structured set of instructions to “program” how DNA molecules interact in a test tube or cell.

A team led by the University of Washington has developed a programming language for chemistry that it hopes will streamline efforts to design a network that can guide the behavior of chemical-reaction mixtures in the same way that embedded electronic controllers guide cars, robots and other devices. In medicine, such networks could serve as “smart” drug deliverers or disease detectors at the cellular level.

An artist’s rendering shows DNA structures and a chemical reaction “program” on the screen. A “chemical computer” executes the molecular program. Photo: Yan Liang, L2XY2.com

The findings were published this week (Sept. 29).

Chemists and educators teach and use chemical reaction networks, a century-old language of equations that describes how mixtures of chemicals behave. The 91̽��engineers take this language a step further and use it to write programs that direct the movement of tailor-made molecules.
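A chemical reaction network “program” is a list of reactions with rate constants, and under mass-action kinetics its behavior can be simulated numerically. The sketch below uses a simple fixed-step Euler integrator and the hypothetical reaction A + B → C. It only simulates the abstract network; the UW approach goes further and builds DNA molecules whose interactions realize such dynamics, which this code does not model.

```python
from typing import Dict, List, Tuple

Reaction = Tuple[Dict[str, int], Dict[str, int], float]  # reactants, products, rate constant


def simulate(reactions: List[Reaction], initial: Dict[str, float],
             t_end: float, dt: float = 1e-3) -> Dict[str, float]:
    """Integrate mass-action kinetics with forward Euler steps."""
    conc = dict(initial)
    for _ in range(int(round(t_end / dt))):
        change = {species: 0.0 for species in conc}
        for reactants, products, k in reactions:
            # Mass action: rate = k * product of reactant concentrations^stoichiometry.
            rate = k
            for species, n in reactants.items():
                rate *= conc.get(species, 0.0) ** n
            for species, n in reactants.items():
                change[species] = change.get(species, 0.0) - n * rate * dt
            for species, n in products.items():
                change[species] = change.get(species, 0.0) + n * rate * dt
        for species, delta in change.items():
            conc[species] = conc.get(species, 0.0) + delta
    return conc


program = [({"A": 1, "B": 1}, {"C": 1}, 1.0)]  # A + B -> C with rate constant 1
final = simulate(program, {"A": 1.0, "B": 1.0, "C": 0.0}, t_end=2.0)
print(final)
```

For this program with equal starting amounts of A and B, the exact solution is A(t) = A0 / (1 + k·A0·t), so A(2) = 1/3, which the Euler result approximates; swapping in a different reaction list “reprograms” the same simulator.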

“We start from an abstract, mathematical description of a chemical system, and then use DNA to build the molecules that realize the desired dynamics,” said corresponding author Georg Seelig, a UW assistant professor of electrical engineering and of computer science and engineering. “The vision is that eventually, you can use this technology to build general-purpose tools.”

Currently, when a biologist or chemist makes a certain type of molecular network, the engineering process is complex, cumbersome and hard to repurpose for building other systems. The 91̽��engineers wanted to create a framework that gives scientists more flexibility. Seelig likens this new approach to programming languages that tell a computer what to do.

“I think this is appealing because it allows you to solve more than one problem,” Seelig said. “If you want a computer to do something else, you just reprogram it. This project is very similar in that we can tell chemistry what to do.”

An example of a chemical program. Here, A, B and C are different chemical species. Photo: Yan Liang, L2XY2.com

Humans and other organisms already have complex networks of nano-sized molecules that help to regulate cells and keep the body in check. Scientists now are finding ways to design synthetic systems that behave like biological ones with the hope that synthetic molecules could support the body’s natural functions. To that end, a system is needed to create synthetic DNA molecules that vary according to their specific functions.

The new approach isn’t ready to be applied in the medical field, but future uses could include using this framework to make molecules that self-assemble within cells and serve as “smart” sensors. These could be embedded in a cell, then programmed to detect abnormalities and respond as needed, perhaps by delivering drugs directly to those cells.

Seelig and a colleague, a UW associate professor of electrical engineering, recently received support from the National Science Foundation as part of a national initiative to boost research in molecular programming. The new language will be used to support that larger initiative, Seelig said.

Co-authors of the paper are Yuan-Jyue Chen, a UW doctoral student in electrical engineering; David Soloveichik of the University of California, San Francisco; Niranjan Srinivas at the California Institute of Technology; and Neil Dalchau, Andrew Phillips and Luca Cardelli of Microsoft Research.

The research was funded by the National Science Foundation, the Burroughs Wellcome Fund and the National Centers for Systems Biology.

###

For more information, contact Seelig at gseelig@uw.edu.

Breakthrough in detecting DNA mutations could help treat tuberculosis, cancer

Michelle Ma — Sun, 28 Jul 2013 17:02:05 +0000

The slightest variation in a sequence of DNA can have profound effects. Modern genomics has shown that just one mutation can be the difference between successfully treating a disease and having it spread rampantly throughout the body.

This conceptual image shows probe and target complexes at different stages of the reaction that checks for mutations. The red dots represent mutations in a target base pair, while the illuminated green light indicates that no mutation was found. Photo: Yan Liang, L2XY2.com

Now, researchers have developed a new method that can look at a specific segment of DNA and pinpoint a single mutation, which could help diagnose and treat diseases such as cancer and tuberculosis. These small changes can be the root of a disease or the reason some infectious diseases resist certain antibiotics. The findings were published this week (July 28).

“We’ve really improved on previous approaches because our solution doesn’t require any complicated reactions or added enzymes, it just uses DNA,” said lead author Georg Seelig, a UW assistant professor of electrical engineering and of computer science and engineering. “This means that the method is robust to changes in temperature and other environmental variables, making it well-suited for diagnostic applications in low-resource settings.”

DNA is a type of nucleic acid, the biological molecule that gives all living things their unique genetic signatures. In a double strand of DNA, known as a double helix, a series of base pairs bond and encode our genetic information. As genomics research has progressed, it’s clear that a change of just one base pair – a sequence mutation, an insertion or a deletion – is enough to trigger major biological consequences. This could explain the onset of disease, or the reason some diseases don’t respond to usual antibiotic treatment.

Take, for example, tuberculosis – a disease that’s known to have drug-resistant strains. Its resistance to antibiotics often is due to a small number of mutations in a specific gene. If a person with tuberculosis isn’t responding to treatment, it’s likely because there is a mutation, Seelig said.

Now, researchers have the ability to check for that mutation preventatively.

Seelig, along with David Zhang of Rice University and Sherry Chen, a UW doctoral student in electrical engineering, designed probes that can pick out mutations in a single base pair in a target stretch of DNA. The probes allow researchers to look in much more detail for variations in long sequences (up to 200 base pairs), while current methods can detect mutations only in stretches of up to 20.

“In terms of specificity, our research suggests that we can do quadratically better, meaning that whatever the best level of specificity, our best will be that number squared,” said Zhang, an assistant professor of bioengineering at Rice University.

The testing probes are designed to bind with a sequence of DNA that is suspected of having a mutation. The researchers do this by creating a DNA sequence complementary to the double-helix strand in question. Then they mix molecules containing both sequences in salt water in a test tube, where the strands naturally match up with one another if the base pairs are intact. Unlike previous technologies, the probe molecule checks both strands of the target double helix for mutations rather than just one, which explains the increased specificity.
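Why checking both strands can square the specificity can be shown with a back-of-the-envelope thermodynamic sketch. It assumes, hypothetically, that a single-base mismatch costs a free-energy penalty ΔΔG each time a probe strand pairs against it and that the penalties on the two strands simply add. At equilibrium the match-to-mismatch binding ratio is exp(ΔΔG/RT), so doubling the penalty squares the ratio: exp(2ΔΔG/RT) = (exp(ΔΔG/RT))². The 3 kcal/mol figure is illustrative, not a value measured in this work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def discrimination_factor(ddg_kcal: float, temp_k: float = 298.15) -> float:
    """Equilibrium match/mismatch binding ratio for a mismatch penalty ddg (kcal/mol)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


DDG = 3.0  # hypothetical single-mismatch penalty, kcal/mol (illustrative)
one_strand = discrimination_factor(DDG)        # probe checks one strand
both_strands = discrimination_factor(2 * DDG)  # penalty paid on both strands
print(f"one strand: {one_strand:.0f}x, both strands: {both_strands:.0f}x")
```

This is the sense of Zhang’s “quadratically better”: whatever discrimination one strand provides, two independent checks multiply it by itself.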

The probe is engineered to emit a fluorescent glow if there’s a perfect match between it and the target. If it doesn’t illuminate, that means the strands didn’t match and there was in fact a mutation in the target strand of DNA.

The researchers have filed a patent on the technology and hope to integrate it into a paper-based diagnostic test for diseases that could be used in parts of the world with few medical resources.

The research was funded by the National Institutes of Health, the National Science Foundation and the Defense Advanced Research Projects Agency.

###

For more information, contact Seelig at gseelig@uw.edu.