data science – UW News
/news

UW introduces new minor in data science
/news/2020/11/18/uw-introduces-new-minor-in-data-science/
Wed, 18 Nov 2020

Responding to the burgeoning amounts of data being generated across disciplines, and the development of new tools for working with these data, the University of Washington now offers a minor track for non-STEM students in data science. It’s one of the first such programs in the country.

Data manipulation and analysis are at the heart of many recent scientific advances, industrial innovations and insights into the human experience. That’s why the new data science minor will provide students with opportunities to learn data science skills and analysis while also understanding the broader contextual and ethical implications of using data.

“The goal is to combine some of the technical skills that relate to the new developments of generating and analyzing large amounts of data. And then giving students the context and the critical thinking skills to do something meaningful with that,” said Ben Marwick, an associate professor of archaeology and director of the new data science minor.

The minor was developed, in particular, for students from the arts, social sciences and humanities. While not intended to make these students competitive for jobs whose exclusive role is data science, it will allow students whose primary expertise is in a particular domain to more effectively utilize data, communicate insights based on data analysis, and interact with colleagues using common tools for data analysis.

“We imagine that a graduate of this program will be perfectly suited to ‘translating’ roles in industry, where they can understand and speak the languages of the different specialists of data science,” Marwick said. “They might be the one who is in the meeting that includes the engineers, includes the programmer, includes policymakers or people writing about it. Our graduates might be the ones who are coming up, asking the challenging questions and sort of connecting the skills of the engineer to the work of the policy maker, kind of gluing things together to help everyone be effective.”

Data science is a cross-cutting and evolving area of scholarship, Marwick said. For the purposes of the minor, the scope of data science education consists of the union of two areas: 1. Education activities that develop competency in producing, managing or analyzing data; 2. Activities promoting the synthesis, contextualization and interpretation of the data.

“Data science education must distinguish itself by closely coupling the teaching of methods, tools, applications and meta-examination of data science practice,” Marwick said.

The new minor is designed to meet the needs of a wide range of students and is open to students in any major, Marwick said. “There is no required progression through courses.”

Students can opt for flexible pathways while avoiding bottlenecks caused by popular courses. Coursework is divided into three broad categories: ‘Data Studies’ courses teach foundational data literacy and explore the broader implications of the field; ‘Data Skills’ courses dive into basic programming, visualization, machine learning, data acquisition and management practices, software tools and qualitative analysis; and ‘Cross-cutting’ courses explore the potential of this new field in various domains and synthesize theories and questions in the context of a project-based learning environment. Students can declare the minor through their departmental adviser.

A minor in data science could provide students with an attractive credential for graduate programs, other degree programs and employment with non-profits, governments and companies that want employees with communication and critical thinking skills accompanied by general competence in data analysis.

For more information, take a look at the minor’s website or contact Marwick at bmarwick@uw.edu.
Partnerships for Impact: NSF Awards an additional $4M to the West Big Data Innovation Hub co-led by the UW eScience Institute
/news/2019/06/19/escience-institute-west-hub/
Thu, 20 Jun 2019

The National Science Foundation is awarding a second round of funding for the four regional Big Data Innovation Hubs — organizations launched in 2015 to build and strengthen data science partnerships across industry, academia, nonprofits and government to address scientific and societal challenges.

Each of the hubs will receive $4 million over four years for a total investment of $16 million, double the budget for the first round of Big Data Hubs awards. The University of Washington, in collaboration with the University of California, Berkeley and the University of California, San Diego, will continue to coordinate the West Big Data Innovation Hub, or West Hub.

“For more than a decade, the eScience Institute has worked to bridge government, business, and cross-disciplinary academia in order to advance data-intensive discovery in the broadest imaginable range of fields,” said Ed Lazowska, principal investigator of the West Hub, founding director of the eScience Institute and a UW professor of computer science and engineering. “Through our partnership with Berkeley and UCSD in leading the NSF’s West Hub, we have been extending this to the 13-state western region.”

The UW leadership team for this collaborative award also includes the West Hub’s deputy director and co-principal investigator, who is also executive director of the eScience Institute, along with a co-principal investigator who is an associate professor in the UW Information School.

The West Hub’s first three years of operation have included a diverse set of application-focused projects, from developing new data analysis tools to better understanding disease. The Hub also supports cross-cutting efforts to produce frameworks and resources useful to multiple areas of inquiry and practice, from data sharing and cloud computing to responsible data science.

The next four years will include an emphasis on developing and enabling translational data science, with signature initiatives including:

  • Fire and water: regional data collaboratives for the future of natural resource management. Building upon the momentum from regional roundtables, workshops and other open-to-all efforts, the West Hub will focus on collaborative, user-focused projects that leverage new shared data and open access tools. This summer and fall, with additional funding from the Water Foundation and the Leonardo DiCaprio Foundation, the West Hub will work with journalists from mainstream and ethnic media, offering fellowships that connect impacted communities with research and education efforts around water data.
  • Housing instability: trusted data collaborative for responsible data management. Racial biases in eviction practices, rapidly increasing housing prices and complex interactions between services to support homeless families have led to neighborhood-level inequities in urban environments and a lack of transparency in the efficacy of interventions. Through a partnership with the Bill & Melinda Gates Foundation, Microsoft and others, the West Hub will integrate data from multiple jurisdictions to study questions about how neighborhood change, service delivery and demographics influence outcomes for homeless families. In preliminary work, the West Hub supported a project that extracted information from thousands of eviction case reports and uncovered extreme racial disparity, leading directly to a policy change increasing the response time allowed to tenants. As part of this work, the West Hub is expanding the scope of a socio-technical platform for responsible data governance, initially used for mobility data, to support housing and population health data. The effort is designed to balance competing objectives among stakeholders, improving fairness in analytic methods, preserving privacy, protecting data owners’ proprietary information and promoting transparency.
  • Stress-testing access for road video: understanding risk and opportunity in data sharing. After hosting a six-month nationwide series of community problem-solving sessions, technology demonstrations and discussions focused on transportation safety, the West Hub will strengthen a partnership with the NSF and others to investigate the reversibility of tools used to de-identify video data from automobile drivers. Tied to a three-year data collection effort that produced data for more than 3,000 drivers, including 1,500 crashes and 3,000 near-crashes, this project will include community dialogue about privacy and bias.

“By catalyzing partnerships that integrate academic researchers into the fabric of communities across the U.S., we can accelerate and deepen the impact of basic research on a range of societal issues, from water management to efficient transportation systems,” said Beth Plale, one of the NSF program directors managing the Big Data Hubs awards.

Participants at a West Big Data Innovation Hub meeting. Photo: University of Washington

Leveraging lessons learned from four years of the UW Data Science for Social Good program, the West Hub will host a training course and develop a guide for organizations interested in creating programs pairing student fellows with data scientist mentors and project leads from academia, government or the private sector. The West Hub’s focus on societal-facing challenges will drive collaborations in topics such as transportation, public health, sustainable urban planning and disaster recovery.

“The continued support from the National Science Foundation for the West Big Data Innovation Hub confirms the importance of the Hub in bringing together diverse local, regional and national partners to engage in using modern data science to tackle societal challenges,” said UW Provost Mark Richards.

As part of their efforts to increase workforce readiness in the region, the West Hub will partner with The Carpentries to host data science Train-the-Trainer workshops, especially aiming to engage underrepresented groups and geographic areas that are not currently served by cognate programs. The partnership builds upon prior workshops that included local government leaders across the western region.

“Developing innovative, effective solutions to grand challenges requires linking scientists and engineers with local communities,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering. “The Big Data Hubs provide the glue to achieve those links, bringing together teams of data science researchers with cities, municipalities and anchor institutions.”

An example of the unique team formation resulting from West Hub community engagement can be found in the growth of an NSF-funded project at Boise State University’s School of Public Policy, which focused on criminal justice and police data, public safety and community trust.

“By facilitating data sharing with industry partners, and connecting researchers from Idaho with police departments in Washington and Arizona, we supported work that led to new levels of collaboration and new connections for national-scale initiatives,” said West Hub Executive Director and Co-Principal Investigator Meredith Lee.

Many of the West Hub’s continuing initiatives and collaborations will highlight challenges surrounding data ethics and responsible data science, bringing communities together through workshops and related efforts.

As a new service to the community, each Big Data Hub will maintain a seed fund for translational data science collaborations as part of its project budget. This seed fund will provide small grants to pilot early feasibility studies for innovative new solutions to grand challenges of importance to the region. The West Hub’s requests for collaborative seed projects will serve to gather compelling, timely and actionable community ideas throughout the year. Embarking on the next phase of growth and national coordination, the Hubs will also work with the NSF and additional partners to host an All Hands community data science meeting which will be open to the public as a signature event in 2020.

###

Three UW teams receive TRIPODS+X grants for research in data science
/news/2018/09/12/tripodsx-grants-data-science/
Wed, 12 Sep 2018

The National Science Foundation announced on Sept. 11 that it is awarding grants totaling $8.5 million to 19 collaborative projects at 23 universities for the study of complex and entrenched problems in data science. Three of these projects will be based at the University of Washington and led by researchers in the College of Engineering and the College of Arts & Sciences.

The grants build on 2017 awards in the Transdisciplinary Research in Principles of Data Science — or TRIPODS — program. These new grants make up the TRIPODS+X program, which expands these big-data projects into broader areas of science, engineering and mathematics. The lead faculty on these new projects are among the core founding faculty of the UW’s TRIPODS institute.

“The multidisciplinary approach for addressing the increasing volume and complexity of data enabled through the TRIPODS+X projects will have a profound impact on the field of data science and its use,” said Jim Kurose, NSF assistant director for Computer and Information Science and Engineering. “This impact will be sure to grow as data continues to drive scientific discovery and innovation.”

The TRIPODS program’s convergent and interdisciplinary approach emerged from the 2016 NSF TRIPODS workshop. Since then, the program has evolved into a community of institutes that share expertise and work together to advance the three NSF priorities central to TRIPODS: research, visioning and education. Research-track projects aim to develop new algorithms and fundamental approaches to data-driven challenges. Visioning projects focus on fostering collaboration across disciplines and help spawn well-integrated research teams that yield truly new perspectives. Education projects are pilot efforts that aim to drive workforce development in multiple disciplines and at multiple education levels. Each TRIPODS institute will have three years to use its award to expand efforts in one of these program tracks.

The first UW-led project, a research-track endeavor, is called “Safe Imitation Learning for Robotics” and is led by Zaid Harchaoui, an assistant professor of statistics and an eScience Institute data science fellow. This project will focus on imitation learning in robotics, a form of learning in which a system learns through demonstration. Researchers will design trust-building learning algorithms and lay the groundwork for safe imitation-learning approaches for beneficial human-machine interaction. Additional UW researchers on this project include associate professor of electrical engineering Maryam Fazel; an associate professor in both the Department of Statistics and the Paul G. Allen School of Computer Science & Engineering; and a professor who also holds an appointment in the Allen School.

Fazel will lead the second TRIPODS+X project at the UW: “Foundational Training in Neuroscience and Geoscience via Hack Weeks.” This project will enhance the successful “hack week” model as a tool for data science education and collaboration. Hack weeks blend elements of traditional lecture-style pedagogy and participant-driven projects. Two hack week formats, one for neuroscience and one for the geosciences, have already been organized and held by researchers at the eScience Institute. For this project, hack week leaders will work to incorporate training on core methods in statistics and optimization in order to promote a deeper understanding of methodologies along with hands-on experience with data-driven problems in the geosciences and in neuroscience. Additional UW researchers on this project are based at the eScience Institute and the Applied Physics Laboratory, along with an assistant professor of applied mathematics and Harchaoui.

The third UW-led TRIPODS+X project, “Scaling Up Descriptive Epidemiology and Metabolic Network Models via Faster Sampling,” is led by an assistant professor in the Allen School. This research track will focus on developing and disseminating practical analysis tools for public health and biological studies that involve large datasets and rely on accurate “sampling” — randomly drawing a subset of cases from a larger dataset in order to identify trends quickly and speed up analysis, as illustrated in the sketch below. To develop these tools, this project will evaluate current big-data projects in health metrics and systems biology. Additional researchers on this project are an associate professor at the UW and a professor of computing at Georgia Tech.
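
The sampling principle at the heart of that project can be shown in a few lines. This is a toy sketch over invented data, not the project’s method: it estimates a population average from a 1 percent random subsample rather than a full scan.

```python
import random

random.seed(42)

# Stand-in dataset: one numeric measurement per record in a large study.
population = [random.gauss(50, 10) for _ in range(1_000_000)]

# Randomly draw 1% of the cases instead of scanning every record.
sample = random.sample(population, 10_000)

estimate = sum(sample) / len(sample)
truth = sum(population) / len(population)
print(f"sampled mean {estimate:.2f} vs. full-scan mean {truth:.2f}")
```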

“TRIPODS+X is exciting not only for its near-term impact addressing some of society’s most important scientific challenges, but because of its potential for developing tools for future applications,” said Anne Kinney, NSF assistant director for Mathematical and Physical Sciences.

###

For more information, contact Joshua Chamot with the NSF at 703-292-4489 or jchamot@nsf.gov.

Adapted from a news release by the National Science Foundation.

Hack week: Study supports collaborative, participant-driven approach for researchers to learn data science from their peers
/news/2018/08/23/hack-week-pnas/
Thu, 23 Aug 2018
A scene from the 2018 Neurohackademy on Aug. 10, 2018, in the Alder Commons on the University of Washington campus. Photo: Alex Alspaugh/University of Washington

Each night, high-definition cameras mounted to telescopes collect terabytes of data about objects in the sky. Each day, scientists sequence the genomes of people, animals, plants and microbes for biomedical and evolutionary research. Each year, the Large Hadron Collider produces vast amounts of data on particle collisions.

Science has become a big-data endeavor. But scientists are not universally adept in “data science” — the computing and statistical skillsets needed to handle, sort, analyze and draw conclusions from big data. The shortage of know-how in data science can hamper research, medicine and other data-intensive fields.

Now a team from the University of Washington, New York University and the University of California, Berkeley has developed an interactive workshop in data science for researchers at multiple stages of their careers. The course format, called “hack week,” blends elements of traditional lecture-style pedagogy with participant-driven projects. The most recent was a neuroscience-themed event held in July on the UW campus. As the team reports in a paper published Aug. 20 in the Proceedings of the National Academy of Sciences, participants rated the hack weeks as opportunities to learn about new concepts, foster new connections, share data openly, and develop skills and work on problems that will positively affect their day-to-day research lives.

Participants work on their projects at the 2018 Neurohackademy on Aug. 10, 2018. Photo: Alex Alspaugh/University of Washington

“The idea behind hack week was to bring together people who were interested in data science and give them a place to meet, talk and exchange ideas,” said lead and corresponding author Daniela Huppenkothen, associate director of the UW’s astronomy-focused DiRAC Institute. “But instead of a traditional format with experts lecturing nonexperts, this would allow participants to mingle more and teach one another.”

Huppenkothen was involved in the inaugural hack week event, “Astro Data Hack Week,” held at the UW in 2014. That event brought together big-data researchers in astrophysics and cosmology. Since then, the team has held four additional Astro Hack Week events, three “Neuro Hack Week” events for neuroscience and two “Geo Hack Week” events for the geosciences.

All hack week events have the same basic design and organizing principles. They usually commence with some structured periods for instruction, and then shift toward time for participant-driven, open-ended projects, as well as peer networking and free discussion. The projects can resemble a hackathon, but with greater emphasis on collaboration and learning rather than specific outcomes. Hack week participants tackle their projects in smaller groups, with organizers circulating to observe and provide feedback or encouragement.

The projects range from experiments that the participants brought from their home institutions to ideas that come up during the course. One project from the inaugural Astro Hack Week, for example, eventually became Stingray, a software project to provide algorithms to analyze time-series data in astronomy. At last month’s Neurohackademy, a new two-week version of Neuro Hack Week, one team worked on developing common ways to analyze different types of MRI scans.

The events’ open-ended structure places greater responsibility on the organizers of each hack week.

Participants collaborating on chosen projects at the 2018 Neurohackademy on the UW campus. Photo: Alex Alspaugh/University of Washington

“A hack week takes a different kind of preparation, because you don’t have the security of ‘falling back’ on the structure of traditional talks and lectures,” said co-author Anthony Arendt, a research scientist with the University of Washington who has organized Geo Hack Week. “You have to set up ways to encourage participants at all levels of ability and comfort — creating a welcoming space for everyone to pitch ideas.”

Most hack weeks organized by the team cap the number of participants at 60. Organizers also strive to select participants to maximize diversity — including scientists of different abilities and backgrounds, and at different stages of their careers. Participants also agree to abide by a code of conduct that emphasizes respect and positive interactions.

In surveys conducted after eight hack weeks, participants ranked the events positively as spaces to learn, teach, network and foster relationships. More than three-quarters ranked the hack weeks as successful learning experiences, while two-thirds reported teaching skills to someone else. This feedback was consistent across different backgrounds, showing that the unique format of hack weeks helps all participants feel included, said Huppenkothen.

“Now we want other scientific communities to learn about our experiences and see how they might start organizing their own events,” said Huppenkothen. “We also want feedback from other communities — both good and bad — and to widen the dialogue about data science and skill development.”

Aftermath of a brainstorming session at the 2018 Neurohackademy. Photo: Alex Alspaugh/University of Washington

Their paper includes supplementary materials detailing the hack week experiences and advice for other groups interested in starting their own workshops.

Participants gave hack weeks high scores for promoting open-science principles — in which researchers publicly post and share their datasets, code and methods. Open science principles are critical to addressing challenges that researchers face in making their research more reproducible, said co-author Ariel Rokem, a data scientist with the University of Washington and co-organizer of the recent Neurohackademy along with a colleague at the University of Texas at Austin.

“One of our goals with the hack week format is to elevate the quality of science being done,” said Rokem. “The best way to do that is to try out ideas and share what you’ve learned.”

Additional co-authors are David Hogg with the NYU Center for Data Science; Karthik Ram at the Berkeley Institute for Data Science at the University of California, Berkeley; and Jake VanderPlas at the UW eScience Institute. The research was funded by the National Institutes of Health; the University of Washington; New York University; the University of California, Berkeley; the Charles and Lisa Simonyi Fund for Arts and Sciences; and the Washington Research Foundation.

###

For more information, contact Huppenkothen at dhuppenk@uw.edu and Rokem at arokem@gmail.com.

#MemoriesInDNA Project wants to store your photos in DNA for the benefit of science – and future generations
/news/2018/01/24/memoriesindna-project/
Wed, 24 Jan 2018
A selection of images submitted to the #MemoriesInDNA Project. Photo: University of Washington

If you could pick an image to be preserved for thousands of years, what would it be? A picture of your family, an endangered landscape, a page of poetry, or a snapshot that sends a message to the future?

Researchers from the Molecular Information Systems Lab at the University of Washington and Microsoft are looking to collect 10,000 original images from around the world to preserve them indefinitely in synthetic DNA manufactured by Twist Bioscience. DNA holds promise as a revolutionary storage medium that lasts much longer and is many orders of magnitude denser than current technologies.

The team has already encoded important compositions in DNA molecules, including the Universal Declaration of Human Rights, the top 100 books of Project Gutenberg, songs from the Montreux Jazz Festival and an OK Go music video.

The #MemoriesInDNA Project invites the public to submit original photographs that they’d like to see preserved in DNA for millennia. The images — which can be uploaded at the project website — will be encoded in synthetic DNA and made available to researchers worldwide. The researchers also are encouraging people to share their images on social media with the hashtag #MemoriesInDNA and include a story about why the photograph or video is important to them.

Lead researchers on the UW/Microsoft DNA data storage project include (left to right) Georg Seelig, UW associate professor of electrical engineering and of computer science and engineering; Luis Ceze, the Torode Family Career Development Professor of Computer Science & Engineering; and Karin Strauss, a Microsoft researcher and UW affiliate associate professor of computer science and engineering. Photo: Tara Brown Photography

“It’s your turn to show us what should be preserved in DNA forever,” said Luis Ceze, professor in the UW’s Paul G. Allen School of Computer Science & Engineering. “We want people to go out and take a picture of something that they want the world to remember — it’s a fun opportunity to send a message to future generations and help our research in the process.”

DNA data storage has emerged as a potential solution to bridge the growing gap between the amount of digital data generated today — by everything from commercial video to space imagery to medical records — and our ability to affordably and efficiently store that data.

Unlike data centers, which require acres of land and account for a significant share of energy use in the United States, DNA molecules can store information millions of times more compactly. The basic process converts the strings of ones and zeroes in digital data into the four basic building blocks of DNA sequences — adenine, guanine, cytosine and thymine. It employs synthetic DNA molecules created in a lab, not living DNA.
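
To make that conversion concrete, here is a minimal sketch — not the team’s actual codec — in which each pair of bits maps to one base. Real DNA storage pipelines add error-correcting codes and avoid sequences that are hard to synthesize or read (such as long runs of a single base), all of which this toy mapping ignores.

```python
BASE_FOR_BITS = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}

def encode(data: bytes) -> str:
    """Convert a byte string to a DNA sequence, four bases per byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):  # high-order bit pair first
            bases.append(BASE_FOR_BITS[(byte >> shift) & 0b11])
    return "".join(bases)

def decode(dna: str) -> bytes:
    """Invert encode(): every four bases become one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BITS_FOR_BASE[base]
        out.append(byte)
    return bytes(out)

strand = encode(b"hi")        # -> 'CGGACGGC'
assert decode(strand) == b"hi"
```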

The team of UW computer scientists and electrical engineers, in collaboration with Microsoft researchers and working with Twist Bioscience, holds the record for the amount of data stored in DNA. So far they have been able to encode photographic images and video in DNA and retrieve and convert those individual molecular “files” back into digital data.

Their next challenge involves exploring how to perform meaningful data processing directly in DNA — without having to convert the images back into their electronic form.

“Let’s suppose you have a trillion images encoded in DNA and want to find all the photographs that have a red car in them, or to find out whether a person’s face exists in those images,” said Ceze. “We want to be able to do that information processing in DNA directly — to search in a smart way and make the molecules themselves carry out that computer vision work.”

A digital microfluidics prototype in the UW’s Molecular Information Systems Lab. Photo: Dennis Wise/University of Washington

The team will encode approximately 10,000 of the crowdsourced images in manufactured snippets of DNA. The researchers’ approach to searching images directly in DNA relies on the fact that certain nucleotides stick to others — A binds to T and C binds to G.

They can introduce into the solution strips of DNA that contain a coded “query” — essentially, a string of complementary DNA that causes all photographs with a red car, or certain facial features, or whatever else meets the criteria of the query, to bind to it. By attaching magnetic nanoparticles to the query DNA, they can use a magnet to pull out all the matching images that have stuck to it.
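
The base-pairing rule behind those queries is simple to state in code. This is a sketch of just the pairing logic, with a made-up target sequence; designing real probes involves far more than this.

```python
# A strand binds the strand whose sequence is its reverse complement:
# A pairs with T, C pairs with G, read in the opposite direction.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the strand that would hybridize to `seq`."""
    return seq.translate(COMPLEMENT)[::-1]

target = "ACCGTTA"                  # hypothetical feature-encoding region
probe = reverse_complement(target)  # -> 'TAACGGT'
print(probe)
```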

“It is thrilling to bring computer science and molecular biology together in this project,” said Microsoft senior researcher and collaborator Karin Strauss. “There has been amazing progress recently in both areas and, when combined, they can be very powerful in tackling problems created by the massive amounts of data we’ve been generating.”

“Having a set of diverse images from around the world will help us invent new ways to make molecules work with each other to carry out these computations directly,” said a Microsoft partner architect collaborating on the project.

An Illumina NextSeq flow cell, which is used by researchers in the UW’s Molecular Information Systems Lab to sequence DNA samples that contain digital data. Photo: Dennis Wise/University of Washington

The team will employ machine learning to devise methods to map and encode all the visual features contained in a photograph — such as colors, curves, lines and objects — in DNA. The main challenge is doing that in a way that allows scientists to retrieve similar items and perform meaningful data processing.

“We will use neural networks to explore ways to classify visual patterns in the images and video that we encode in DNA,” said Georg Seelig, UW associate professor of electrical engineering and of computer science and engineering in the Allen School. “For example, are there more red cars than blue cars in a photograph? Or are there people riding bicycles?”
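
As one hedged illustration of the general idea — not the team’s method — a pretrained network (ResNet-18 here, chosen arbitrarily) can turn an image into a compact feature vector, which could then be quantized into bits and written to DNA with a codec like the one sketched above. The image path is hypothetical; this assumes torch and torchvision 0.13 or later.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Pretrained backbone with the classification head removed: the output is a
# 512-dimensional feature vector rather than class labels.
net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net.fc = torch.nn.Identity()
net.eval()

prep = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet stats
                         std=[0.229, 0.224, 0.225]),
])

image = prep(Image.open("red_car.jpg").convert("RGB")).unsqueeze(0)  # hypothetical file
with torch.no_grad():
    features = net(image).squeeze(0)  # shape: (512,)

# Crude 512-bit signature: threshold each feature at the vector's median.
bits = (features > features.median()).int()
print(bits.sum().item(), "bits set")
```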

“With proof-of-concept achieved for DNA as a digital data storage media, we are working to drive down the cost of synthesizing DNA to enable its potential as a widely-available commercial solution for the growing body of precious data in digital format, such as archival data, financial and health record backups, and all long-term data retention where current media is not practical,” said Emily M. Leproust, CEO of Twist Bioscience. “MemoriesInDNA is a fabulous project to showcase the technological, scientific and cultural importance of DNA worldwide and we look forward to our role in this historic event.”

#MemoriesInDNA will provide an important library of images to be encoded in a separately funded project supported by the Defense Advanced Research Projects Agency (DARPA). The UW was recently awarded $6.3 million to accelerate the pace at which data can be encoded in DNA, and to develop new capabilities to process this data through image search and classification. The work will build the foundation on which the UW can advance its next-generation work in molecular information processing.

Note: To be included in the DNA image collection, photographs cannot be copyrighted by any other party and must be free of violent or inappropriate content. The image dataset will be preserved in DNA indefinitely and shared with researchers worldwide. For more details about how to upload and share images, visit the project website.

###

For more information, contact misl-info@cs.washington.edu or visit the #MemoriesInDNA Project website.

UW researchers estimate poverty and wealth from cell phone metadata
/news/2015/11/30/uw-researchers-estimate-poverty-and-wealth-from-cell-phone-metadata/
Mon, 30 Nov 2015
The northern and western provinces are divided into cells (the smallest administrative unit of the country), and each cell is shaded according to the average predicted wealth of all cell phone owners in that region. The southern province is overlaid with a diagram that uses geographic identifiers in the call data to divide the region into several hundred thousand small partitions, each of which may be as small as a household or a microvillage. The darker the area, the greater the wealth. Photo: Joshua Blumenstock

In developing or war-ravaged countries where government censuses are few and far between, gathering data for public services or policymaking can be difficult, dangerous or near-impossible. Big data is, after all, mainly a First World opportunity.

But cell towers are easier to install than telephone land lines, even in such challenged areas, and mobile or cellular phones are widely used among the poor and wealthy alike.

Now, researchers with the University of Washington and the University of California, Berkeley have devised a way to estimate the distribution of wealth and poverty in an area by studying metadata from calls and texts made on cell phones. Such metadata contains information about the time, location and nature of the “mobile phone events” but not their content. Their paper was published Nov. 27 in the journal Science.

“Quantitative, rigorous measurements are key to making important decisions about social welfare allocation and the distribution of humanitarian aid,” said lead author Joshua Blumenstock, assistant professor in the UW Information School, who is also an adjunct professor in computer science and engineering. “But in a lot of developing countries high-quality data doesn’t exist.

“What we show in this paper, and I think fairly clearly, is that phone data can be used to estimate wealth and poverty.”

The research was performed in Rwanda, a nation of some 11 million people in East Africa. There in 2009, while still working on his dissertation, Blumenstock oversaw students from the Kigali Institute of Science and Technology as they conducted telephone interviews with 1,000 mobile phone owners chosen at random.

The questions were designed to learn where those individuals fell on the socioeconomic ladder and what the “signature” of wealth is in the metadata — that is, what cell phone habits are particular to those who are relatively wealthy.

“For those thousand people, we know roughly whether they’re rich or poor. That’s the ground truth that anchors the data to reality,” Blumenstock said.

The researchers then linked that information to metadata about mobile phone use provided by a Rwandan telephone company to determine the hallmarks of socioeconomic status in the data.

Simple patterns emerged — for instance that wealthier people tended to make more calls than poorer people. But that’s just one of thousands of bits of information that aid this process.

Other hints of wealth or poverty in metadata are:

  • The way people pre-pay for phone time; those buying $10 worth of time tend to be wealthier than those buying 50 cents of time.
  • The daily rhythm of calls made — those phoning during daytime business hours are systematically different from those who make irregular calls, perhaps because they are more likely to be “white-collar” workers.
  • The degree to which a person is more likely to make than receive phone calls. Since in Rwanda the caller pays for the call, poorer people tend to receive more calls than they make. This also reflects a phenomenon called “flashing,” where a poorer person calls a wealthier friend and quickly hangs up, thus sending the signal that they should call back.

“In practice it’s not simple,” Blumenstock said. “We use supervised machine learning algorithms to sort through thousands of patterns to figure out what is most correlated with wealth and poverty. But once we know which mobile phone patterns are indicative of wealth, we can extrapolate to the country’s one and a half million cell phone users. We just see for each person thereafter what pattern they follow — the wealthy pattern or the poor pattern.”
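
A schematic sketch of that pipeline, with invented feature names and synthetic data standing in for the call-record features (the actual study engineered thousands of them from the metadata): fit a supervised model on the roughly 1,000 surveyed subscribers, then score every other subscriber on the network.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-subscriber features derived from call metadata, e.g.
# calls made per day, typical airtime purchase, share of outgoing calls.
X_surveyed = rng.normal(size=(1000, 3))

# Synthetic stand-in for the survey "ground truth" (1 = relatively wealthy).
y_surveyed = (X_surveyed @ np.array([0.8, 0.5, 0.3])
              + rng.normal(scale=0.5, size=1000)) > 0

model = LogisticRegression().fit(X_surveyed, y_surveyed)

# Extrapolate to the rest of the network's subscribers.
X_all_users = rng.normal(size=(1_500_000, 3))
wealth_scores = model.predict_proba(X_all_users)[:, 1]
print(wealth_scores[:5])
```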

Blumenstock’s UW co-author, Gabriel Cadamuro, a graduate student in computer science and engineering, said the team tried not to bring expectations of which aspects of the metadata might be found useful for predicting wealth.

“Using the appropriate machine learning technique enabled us to determine which of these values were the most useful,” Cadamuro said, “and we noticed that in doing it this way that we picked up a lot we would have missed had we tried to go purely via our intuition.”

That information is then overlaid onto area maps to create a visual representation of the geographic distribution of wealth, from the district level to that of households or microvillages.

Blumenstock emphasized that the research is conducted in a way that respects ethical standards and the privacy of the callers, as well as the competitive interests of the phone company providing the data.

Not all governments are able to conduct population censuses and household surveys, and some go decades in between. In Rwanda, household surveys occur every three to five years. Blumenstock said that, judged against the government’s 2010 survey, the 2009 mobile phone metadata proved more effective at indicating wealth and poverty than the previous Rwandan government survey in 2007.

Blumenstock chosen for Gates Foundation’s “Grand Challenges Explorations”
The researcher has received a $100,000 grant for his research, “Billions of Transactions, Thousands of Photos: Combining Mobile Network Operator Data with Crowd-Sourced Photographs to Measure the Availability and Use of Digital Financial Services.” The work will take place in Ghana.

Blumenstock and colleagues suggest that governments might use this sort of survey process, which costs about $10,000, rather than spend millions on a formal countrywide census.

“We are saying, if you have nothing else and can’t survey the outer regions of the country, this creates an option to spend $10,000 and get interim estimates of what things look like, and to construct a higher-resolution estimate of the geographic distribution of wealth,” he said.

This early work is mostly “proof of concept” at this stage, Blumenstock said, but the researchers can envision many practical uses to come.

Cadamuro said, “We are hopeful that this broad approach to detecting signals means that the methodology would work even on different call networks from different countries.”

“What else could you measure that would be useful?” Blumenstock asked. “You could imagine using data from Twitter, Internet use, satellite and weather stations — all this data — to measure population vulnerability, or to make better policy,” he said.

“Maybe you could even detect with phone data whether people have been skipping meals — it doesn’t seem to me that far-fetched.”

The other co-author is Robert On, a graduate student at the University of California, Berkeley.

The research was funded by the NSF; the Institute for Money, Technology, and Financial Inclusion; and the Gates Foundation.

###

For more information, contact Blumenstock at 206-685-8746 or joshblum@uw.edu.

NSF grant #1025103
Institute for Money, Technology, and Financial Inclusion grant # 2010-2366
Gates Foundation grant # OPP1106936
Online ‘Legislative Explorer’ uses big data to track decades of lawmaking
/news/2014/04/25/online-legislative-explorer-uses-big-data-to-track-decades-of-lawmaking/
Fri, 25 Apr 2014

University of Washington political scientist John Wilkerson has matched data visualization with the study of lawmaking to create a new online tool for researchers and students called the Legislative Explorer.

Think of it as big data meeting up with How a Bill Becomes a Law.

“The goal was to get beyond the ‘Schoolhouse Rock’ narrative and let users discover the lawmaking process for themselves,” said Wilkerson, UW professor of political science and director of the Center for American Politics and Public Policy. The free tool is available online.

The data set is huge indeed: The Legislative Explorer tracks the progress of every bill and resolution introduced in Congress since 1973 — 250,000 in all. It notes each time a bill or resolution advances from one stage of the process to the next, in or out of committee or moves to the floor for consideration, totaling about 750,000 such movements.
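
One plausible way to organize that kind of event data — the schema here is invented for illustration, not drawn from the project — is one record per movement, which can then be aggregated by stage, chamber or Congress.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BillEvent:
    bill_id: str     # e.g. "hr1-113" for H.R. 1 in the 113th Congress
    congress: int
    chamber: str     # "House" or "Senate"
    from_stage: str  # e.g. "introduced"
    to_stage: str    # e.g. "referred to committee"

# Two toy records standing in for ~750,000 real movements.
events = [
    BillEvent("hr1-113", 113, "House", "introduced", "referred to committee"),
    BillEvent("s99-113", 113, "Senate", "passed chamber", "sent to other chamber"),
]

# Count how often each stage-to-stage transition occurs.
transitions = Counter((e.from_stage, e.to_stage) for e in events)
print(transitions.most_common())
```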

http://vimeo.com/91846611

Users can drill down through the graphically presented data in lots of ways, including by type of legislation, sponsors, party or chamber of origin. The tool also has filters allowing users to sort results many ways, including by gender of sponsoring legislator, committee affiliation and — perhaps most helpful of all — whether the legislation is considered major or minor.

“The basic idea was to apply the data-driven discovery methods used increasingly in the natural sciences to bring big data to the lawmaking process,” Wilkerson said.

John Wilkerson, creator of the Legislative Explorer

Wilkerson and collaborators suggest students or researchers might start by observing how many bills are introduced in each Congress and how many die along the way or are sent to the president and become law. They also suggest researchers should keep certain caveats in mind, including the reminder that bills vary in importance and get substantially changed or combined with others. The 906-page Affordable Care Act, Wilkerson noted, started out as a six-page bill on service members’ home ownership.

Wilkerson collaborated with a UW doctoral student in political science, who organized the data underlying the online tool. The two also hired a Seattle-based creative design and technology studio to create and maintain the site.

Wilkerson said those using the online tool may find Congress a good deal busier than they expected. “There’s still a lot happening in Congress, but more of it these days is getting stuck at the stage where the House and Senate have to reconcile their differences.”

Future improvements, Wilkerson said, may include tracking the impact of legislation that becomes law. For example, “What happens when Congress passes a law? How does it impact the existing authorities of the federal government and the regulatory activities of federal agencies?”

The project was funded in part by revenue from LegSim, which provides tools for legislative simulation courses, and by the National Science Foundation.

Wilkerson said the tool seeks to enable citizens to become better informed about the complex legislative process, beyond simplistic descriptions and media coverage centering mostly on Congressional controversies.

“It doesn’t address everything the people might want to know, but we think the Legislative Explorer will advance public interest and understanding of ‘their’ Legislature,” he said.

“But don’t worry,” Wilkerson added. “It’s not the end of ‘Schoolhouse Rock.'”

###

Find the Legislative Explorer online. For more information, contact Wilkerson at 206-543-8030 or jwilker@uw.edu. (NSF grant number is SES-1243917.)