Advancing the Human Work of Data Science

In my dissertation, I unpack data science as a social and situated practice, particularly focusing on the oft-invisible and under-articulated forms of human work constituting data science practices. Examples of such work include: translating high-level abstract goals into computationally tractable data-driven problems, improvising on algorithmic methods and mechanical rules in the face of empirical messiness, iteratively and collaboratively making sense of algorithmic results, and establishing the trustworthiness of data, algorithms, models, and numbers in data science projects. I study such forms of work ethnographically in the context of academic and corporate data science.

Research Supervision

Critical Design and Data Science

Supervised Maya Klabin – a Cornell Information Science senior – to study ways to effectively identify, visualize, and support forms of human work (choices, decisions, and assumptions) in data science practices. We used the methodology of critical and speculative design to identify possible solutions.

Data Science Decision Dashboards

Supervised Dou Mao – a Cornell graduate student in Information Science – to develop mid-/high-fidelity prototypes of a “Decision Dashboard” to make visible the choices and decisions comprising data science work. The InVision prototype drew on the work of Amazon’s Denis Batalov (see link: Amazon’s research blog).

Data Analysis

Supervised Information Science undergraduate students Sherry Ge and Emily Zhang to develop alternative data analytic approaches to interpret and visualize player performance for League of Legends players. A key component of this year-long project was to reflexively monitor Sherry and Emily’s own choices and decisions in the data analytic process, providing transparent and effective insights into the human work of data analysis.

Visualizing the Data Science Process

Supervised Information Science graduate students Dai Siqi, Chen Pan, Zhenyi Xia, and Val Mack to develop critical data science design solutions, gain experience in technical data science work, and implement a process workflow template to document and communicate forms of human data science work.

Selected Past Projects

Imitation Game

Worked at Cardiff University in 2011 with Harry Collins and Robert Evans on the Imitation Game (IMGAME) project, funded by a European Research Council (ERC) Advanced Research Grant. The IMGAME project develops a new method for cross-cultural and cross-temporal comparison of societies using a web version of the famous parlor game, played between two different, yet interrelated, social groups.

Click here for the poster that describes how the IMGAME project works and presents my analysis of the project’s development and challenges.

Digital Ressentiment

Research Master’s thesis for the CAST programme under the supervision of Jan de Roder. The thesis focused on public expressions of ressentiment within the Dutch debate on immigration. Building on Max Scheler’s sociology of ressentiment, the thesis used theoretical heuristics to analyze the Dutch sociopolitical and cultural landscape with respect to the issue of immigration. The thesis placed special emphasis on the nature and implications of the democratization of information, primarily through the advent of news media and internet technologies, for the social shaping of Dutch public opinion.

Internet Science: Online Privacy, Identity, Trust, and Reputation

Worked with Sally Wyatt as part of the eHumanities group at the Royal Netherlands Academy of Arts and Sciences (Koninklijke Nederlandse Akademie van Wetenschappen, KNAW) on the EU project Network of Excellence in Internet Science (EINS). The work involved researching the social shaping of the notions of privacy and trust in relation to online social media technologies and analyzing how various online technologies manage user expectations regarding the privacy of their data.

Online Bug Discussions on GitHub

This project (supervisor: Dan Cosley) aimed to understand how people discuss software bugs on GitHub. The outcome was a topic model of a set of related discussions that showcases ‘themes’ in people’s conversations about software bugs. The three main categories of identified topics were programming language syntax, integrative aspects of programming, and ways of finding and representing bugs.

The meta goal of this research was to experiment with the use of quantitative research as an add-on to qualitative research. The topic model provided an overview of specific themes in people’s conversations about bugs. This overview was used to inform conceptual discussions and to create a topic guide for interviewing users, programmers, and developers about software debugging practices.

Situating Software within Sociology of Testing

This project focused on an empirical case study of software development to highlight specific aspects concerning the negotiated, temporal, and situated character of software testing processes. When and where is software testing? What is the relation between testing, use, non-use, and the user? What is the distinction between software testing and software repair/maintenance? These are some of the questions that I dealt with as part of this research.

The project showed how an empirical case study of software development can help us think through some of the existing concepts and notions within the sociology of testing.