Advancing the Human Work of Data Science
In my doctoral dissertation, I unpack data science as a social and situated practice, particularly focusing on the oft-invisible and under-articulated forms of human work constituting data science practices. Examples of such work include: translating high-level abstract goals into computationally tractable data-driven problems, improvising on algorithmic methods and mechanical rules in the face of empirical messiness, iteratively and collaboratively making sense of algorithmic results, and establishing the trustworthiness of data, algorithms, models, and numbers in data science projects. I study such forms of work ethnographically in the context of academic and corporate data science.
and Data Science
Supervised Maya Klabin – a Cornell Information Science senior – to study ways to effectively identify, visualize, and support forms of human work (choices, decisions, and assumptions) in data science practices. We used the methodology of critical and speculative designs to identify possible solutions.
Supervised Dou Mao – Cornell graduate student in Information Science – to develop mid-/high-fidelity prototypes of a “Decision Dashboard” that makes visible the choices and decisions comprising data science work. The InVision prototype drew on work by Amazon’s Denis Batalov (see link: Amazon’s research blog).
Supervised Information Science undergraduate students Sherry Ge and Emily Zhang to develop alternative data analytic approaches to interpret and visualize player performance for League of Legends players. A key component of this year-long project was to reflexively monitor Sherry and Emily’s own choices and decisions in the data analytic process, providing transparent and effective insights into the human work of data analysis.
Data Science Process
Supervised Information Science graduate students Dai Siqi, Chen Pan, Zhenyi Xia, & Val Mack as they developed critical data science design solutions, gained experience in technical data science work, and implemented a process workflow template to document and communicate forms of human data science work.
A Few Past Research Projects
Worked as a junior researcher for four months in 2011 at Cardiff University, UK, with Harry Collins and Robert Evans on the Imitation Game (IMGAME) project, funded by a European Research Council (ERC) Advanced Grant. IMGAME is a new method for cross-cultural and cross-temporal comparison of societies using a web version of the famous parlor game, played between two different yet interrelated social groups.
I was involved in organizing IMGAME research experiments in Cardiff and Poland as part of the core organizing team, working closely with the team on project design and implementation. The work ranged from conceptualizing innovations concerning game design to exploring how to analyze quantitative data.
This was my Research Master’s thesis for the CAST programme under the supervision of Jan de Roder. The thesis, situated at the intersection of Sociology and Science and Technology Studies (STS), focused on public expressions of ressentiment within the Dutch debate on immigration. Building on Max Scheler’s sociology of ressentiment, the thesis used theoretical heuristics to analyze the Dutch sociopolitical and cultural landscape with respect to the issue of immigration. A special emphasis of the thesis was on the nature and implications of the democratization of information, primarily through the advent of news media and internet technologies, for the social shaping of Dutch public opinion.
For this project, I worked with Sally Wyatt as part of the eHumanities group at the Royal Netherlands Academy of Arts and Sciences (Koninklijke Nederlandse Akademie van Wetenschappen, KNAW). As part of this job, I worked on the EU project titled Network of Excellence in Internet Science (EINS). My work involved researching the social shaping of the notions of privacy and trust in relation to online social media technologies, as well as analyzing how various online technologies manage user expectations regarding the privacy of their data.
This project, under the supervision of Dan Cosley, was aimed at understanding how people discuss software bugs on GitHub. The outcome of this project was a topic model of a set of related discussions that showcase ‘themes’ in people’s conversations on software bugs. The three main categories of identified topics included programming language syntax, integrative aspects of programming, and ways of finding and representing bugs.
The meta goal of this research was to experiment with the use of quantitative research as an add-on to qualitative research. The topic model helped me get an overview of specific themes in people’s conversations on bugs. This overview was used to inform conceptual discussions and to create a topic guide for interviewing users, programmers, and developers about software debugging practices.
In this project, I focused on an empirical case study of software development to highlight specific aspects concerning the negotiated, temporal, and situated character of software testing processes. When and where is software testing? What is the relation between testing, use, non-use, and the user? What is the distinction between software testing and software repair/maintenance? These are some of the questions that I dealt with as part of this research.
Theoretically, this project drew on the works of Madeleine Akrich, Bruno Latour, Donald MacKenzie, Trevor Pinch, and Steve Woolgar. I showed how the empirical case study of software development can help us think through some of the existing concepts and notions within the sociology of testing.