Moore-Sloan Data Science Environment Summit: A Recap

This year's Moore-Sloan Data Science Environment Summit was held in the beautiful Cascade Mountains at the Suncadia Resort in Cle Elum, Washington.

Look how beautiful that is. Wow.

A number of sessions here were fairly typical “data science-y” fare: Data Structures for DS, Astrophysics Software, and Big Data Systems Tutorial. What I found most interesting at the summit, though, was the pervasive discussion of ethics and social good. I was pleasantly surprised that the participants were interested in engaging with topics so far outside the normal purview of coding problems, data analysis methods, and data gathering. It is another testament to the great multidisciplinary field that is Data Science and the wonderful people who populate it.

I was really inspired by a lightning talk on Monday morning by Ariel Rokem of the University of Washington’s eScience Institute on their Data Science for Social Good (DSSG) program, which ran its inaugural summer program this past June. Based on the program of the same name at the University of Chicago, the goal of the eScience Institute DSSG program is “to enable new insight by bringing together data and domain scientists to work on focused, collaborative projects that are designed to impact public policy for social benefit.”

The eScience Institute hosted four projects focusing on urban environments and urban science, across topics such as transportation, social justice, and sustainable urban planning. Each project was assigned a mentor from the eScience Institute, and each team was made up of a project lead, DSSG fellows, and students from Alliances for Learning and Vision for Underrepresented Americans (ALVA), a post-freshman-year internship. It was all about bringing together the Data Science fellows and faculty with project leads from industry and undergraduate students.

Taken from the eScience Institute’s DSSG webpage, the four projects were:

  1. Assessing Community Well-Being through Open Data and Social Media
    1. Our DSSG Fellows and ALVA students paired with Third Place Technologies to create neighborhood community report pages in the context of a hyperlocal, crowd-sourced community network. The objective was to help neighborhood communities better understand the factors that impact community well-being, and how they as a neighborhood compare with other neighborhoods on these factors. This helps them set the agenda for what to prioritize in promoting their well-being. A key aspect of this project was to explore novel ways to leverage diverse social media and open data sources to dynamically assess community-level well-being, in order to a) enable early identification of emerging social issues warranting a collective response, and to b) automatically identify and recommend the local community hubs best positioned to coordinate a community response.
  2. Open Sidewalk Graph for Accessible Trip Planning
    1. This project is an extension of the "Hackcessible" project that was awarded top prize in this year's "HackTheCommute" event in Seattle. Hackcessible has built an application that helps people with mobility challenges to navigate the streets of Seattle based on sidewalk characteristics and the presence of curb ramps. Expanding on these ideas, the DSSG team worked to utilize city sidewalk and street data to provide stakeholders with routing information, similar to what is currently provided by Google Maps, but that considers issues of accessibility. The goal of the effort was to provide rapid and convenient routing that avoids steep hills, uncrossable intersections, stairs or construction. The work was carried out in partnership with Dr. Anat Caspi of the Taskar Center for Accessible Technology at the University of Washington, and with various stakeholders with the City of Seattle and the Washington State Department of Transportation.
  3. Predictors of Permanent Housing for Homeless Families
    1. The Bill and Melinda Gates Foundation, together with Building Changes have partnered with King, Pierce and Snohomish counties to make homelessness in these counties rare, brief and one-time. The goal of this project was to take part in this multi-stakeholder collaboration, and to analyze data about enrollments of homeless families in these counties in programs serving the homeless population, to identify factors that predicted whether families would succeed in finding permanent housing, and to investigate the ways families transition between different programs and different episodes of homelessness.
  4. Rerouting Solutions and Expensive Ride Analysis for King County Paratransit
    1. The Paratransit team collaborated with King County Metro to improve operations of the Paratransit service, which is an on-demand public transportation program that provides door-to-door rides for people with limited ability who are unable to use traditional fixed route services. Currently, King County Metro paratransit trips cost approximately ten times as much as an equivalent trip using a fixed-route service, so the team concentrated their efforts on identifying costly routes, providing cost-driven recommendations for rescheduling broken buses, and better predicting service usage hours over quarterly periods. The team analyzed history data and observed rides whose cost per boarding was over $100, providing King County Metro with a method to update predictions of usage hours customized for each day of the week and a web app which provides cost comparison for the different options of handling a broken bus event: reschedule clients on an existing route, send a new bus, or serve them with a taxi. These tools aim to help the Paratransit operations better plan resources over longer periods of time and help dispatchers make informed decisions in case of emergency.

I love these projects. These students are committed to improving their communities by integrating what they know about all the multidisciplinary fields that make up data science. The real-world applications of their work are just incredible. I think this speaks to an almost moral obligation of science not only to contribute to the greater body of human knowledge, but also to improve the standard of living globally. For more on this, I’d point you to a great article by Alan Fritzler, project manager for the DSSG program at the University of Chicago.


I had a total Twilight Zone moment on Tuesday during a session entitled “Semantics of Data: Integrating Across Tools.” I attended because I thought the discussion would be about how the data scientists here want to communicate their tools, or possibly about creating a cross-institutional directory of tools to track outputs of the MSDSE.

I was pretty wrong. These scientists talked for AN HOUR AND A HALF on building standard vocabularies, ontologies, metadata schemas, using json-schema, and the semantic web (read: linked data). I was near-faint from surprise.

However, the tone of the conversation left me wondering--where else are the overlaps between science needs and library services? We’ve identified in the LIS field that things like infrastructure (institutional repositories, etc.) are research resources that should be housed in the library, but where are the boots-on-the-ground librarians? These collaborations are tricky, but maybe they are starting to reach that point of critical mass where we just have to get down to it. Where are my science metadata librarians at? I smell a new field...

That tweet being said, I firmly believe that this is something where librarians (those into metadata--here’s looking at you, Peggy) can collaborate with science to build these vocabularies and schemas. The plain fact of the matter is that the everyday researcher is not equipped to build these ontologies, nor do they really want to--and frankly I don’t blame them. Librarians (read: information professionals) have these skills, want to do the work, and LIS is a service industry. Take advantage of us, science!

There was also a lot of room in the schedule for hilarity. Between David Hogg’s constant delight in our “obedience” in following directions for lunch seating and another great lightning talk Monday morning on improving the quality of the field (see tweet below), the tone of this conference was jovial, scholarly, and just plain fun. I’m excited for NYU to host next year’s! Here’s hoping we get a place in the Catskills...

this right here...