In my doctoral dissertation project, I study how data scientists (learn to) approach, organize, and analyze the world with and through data structures, computational algorithms, and statistical techniques. I am particularly interested in the often invisible and under-articulated forms of human work that constitute data science research and practice. Examples of such work include data wrangling and preprocessing, as well as the decisions involved in steering and managing data science projects.
Using ethnographic methods, I explore questions such as: What forms of human and technical work are involved in data science and analytics? How do routine, creative, and expert data science work relate to one another? How do people situate and evaluate data science results to make them meaningful in business, technical, and social contexts? I study these forms of work in data science classrooms and workshops as well as in corporate data science teams. My aim is not only to produce a better understanding of data science, but also to use, design, and develop tools and methods that help us perform, demonstrate, and evaluate data science more effectively.
This work is currently supported by NSF grant CHS-1526155: Advancing the Human Work of Data Analytics.
Previously, this work was supported by Cornell University’s Department of Information Science and the Intel Science & Technology Center for Social Computing (ISTC).
Supervised Maya Klabin, a senior at Cornell majoring in Information Science, on a project exploring how to effectively identify and better support the myriad forms of human work (choices, decision making, assumptions, etc.) that go into data science. We used critical and speculative design to identify possible solutions that could not only help produce more effective analyses, but also provide new ways to visualize data and algorithmic work.
Supervised Dou Mao, an Information Science MPS graduate at Cornell University, on a project in which we developed mid- and high-fidelity prototypes of a “Decision Dashboard” that makes human choices and decisions within data science visible. The proof-of-concept InVision prototype developed as part of the project draws loosely on the work of Denis Batalov, a machine learning expert at Amazon, as described on Amazon’s research blog.
Supervised four Information Science MPS graduate students (Dai Siqi, Chen Pan, Zhenyi Xia, and Val Mack) as they designed and developed data analytic tools and carried out data science work. The students developed critical designs related to the human work of data science, gained hands-on experience in conducting data science, and designed and implemented a web template focused on forms of human data science work. The aim of this project was to create a series of critical designs and tools that highlight and support the different forms of human work and decision making that comprise data science practice.
In this project, I focused on an empirical case study of software development to highlight specific aspects of the negotiated, temporal, and situated character of software testing processes. When and where is software testing? What is the relation between testing, use, non-use, and the user? What distinguishes software testing from software repair and maintenance? These are some of the questions I addressed in this research.
Theoretically, this project drew on the works of Madeleine Akrich, Bruno Latour, Donald MacKenzie, Trevor Pinch, and Steve Woolgar. I showed how the empirical case study of software development can help us think through some of the existing concepts and notions within the sociology of testing.
This project, under the supervision of Dan Cosley, aimed to understand how people discuss software bugs on GitHub. The outcome was a topic model of a set of related discussions that surfaces ‘themes’ in people’s conversations about software bugs. The identified topics fell into three main categories: programming language syntax, integrative aspects of programming, and ways of finding and representing bugs.
The broader goal of this research was to experiment with using quantitative research as an add-on to qualitative research. The topic model gave me an overview of specific themes in people’s conversations about bugs. I used it to inform conceptual discussions and to create a topic guide for interviewing users, programmers, and developers about software debugging practices.
In this project, under the supervision of Phoebe Sengers, I analyzed students’ practices of annotating academic texts. My main focus was on the choices students make while accomplishing this practice: which part to annotate and for what purpose, what tool to use, what the use of a particular tool signifies, etc.
The study combined two qualitative research methods: focus groups and semi-structured interviews. Eight students, in pairs, each took part in an hour-long discussion with me. All participants, including myself, were enrolled in the same course. The discussions focused on the similarities and differences in our annotations: How and why do we annotate? What annotation tools do we use, and what do we use them for? How do these tools shape the ways in which we engage with the text?
This project is situated within my broader interest in how people accomplish everyday practices. To accomplish is to achieve some form of successful outcome through involvement with the world around us. An everyday practice, for a particular group of people, is one understood as part of everyday life: ordinary and mundane. Studying everyday practices helps us unpack this mundaneness to better understand the nature of our entanglement with the material world.
For this project, I worked with Sally Wyatt as part of the eHumanities group at the Royal Netherlands Academy of Arts and Sciences (Koninklijke Nederlandse Akademie van Wetenschappen, KNAW). In this role, I worked on the EU project titled Network of Excellence in InterNet Science (EINS). My work involved researching the social shaping of the notions of privacy and trust in relation to online social media technologies, as well as analyzing how various online technologies manage user expectations regarding the privacy of their data.
This project was my Research Masters thesis for the CAST programme, supervised by Jan de Roder. The thesis, situated at the intersection of Sociology and Science and Technology Studies (STS), focused on public expressions of ressentiment within the Dutch debate on immigration. Building on Max Scheler’s sociology of ressentiment, the thesis used his concepts as theoretical heuristics to analyze the Dutch sociopolitical and cultural landscape with respect to immigration. A special emphasis was placed on the nature and implications of the democratization of information, primarily through the advent of news media and internet technologies, for the social shaping of Dutch public opinion.
In 2011, I worked for four months as a junior research assistant at Cardiff University, UK. During this time I worked with Harry Collins and Robert Evans on the Imitation Game (IMGAME) project, funded by a European Research Council (ERC) Advanced Grant. The IMGAME project developed a new method for cross-cultural and cross-temporal comparison of societies, using a web version of the famous parlor game played between two different, yet interrelated, social groups.
As part of the core organizing team, I helped organize IMGAME research experiments in Cardiff and Poland and worked closely with the research team on the design and implementation of the project. The work ranged from conceptualizing innovations in the game’s design and implementation to exploring the many ways in which the gathered quantitative data could be analyzed. Another important strand of my work involved testing the IMGAME software and analyzing how users interacted with it.