Problem Formulation and Fairness

Conference Paper
Samir Passi, Solon Barocas
In ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*). January 29-31, 2019, Atlanta, Georgia.
Publication year: 2019

Abstract:

Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, despite the fact that different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team—and channeling ideas from critical data studies, science and technology studies, and early writing on knowledge discovery in databases—we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways—and why certain formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulation.

Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects

Conference Paper (Best Paper Award)
Samir Passi, Steven Jackson
In Proceedings of the ACM on Human-Computer Interaction, Vol. 2, CSCW, Article 136, (November 2018). ACM. New York, NY.
Publication year: 2018

Abstract:

The trustworthiness of data science systems in applied and real-world settings emerges from the resolution of specific tensions through situated, pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical data studies, and the history and sociology of science, as well as six months of immersive ethnographic fieldwork with a corporate data science team, we describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We show how organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility. Highlighting the collaborative and heterogeneous nature of real-world data science, we show how the management of trust in applied corporate data science settings depends not just on pre-processing and quantification, but also on negotiation and translation. We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.

Data Vision: Learning to See Through Algorithmic Abstraction

Conference Paper (Best Paper Award)
Samir Passi, Steven Jackson
In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW '17). ACM, New York, NY, USA, 2436-2447. DOI: https://doi.org/10.1145/2998181.2998331
Publication year: 2017

Abstract:

Learning to see through data is central to contemporary forms of algorithmic knowledge production. While often represented as a mechanical application of rules, making algorithms work with data requires a great deal of situated work. This paper examines how the often-divergent demands of mechanization and discretion manifest in data analytic learning environments. Drawing on research in CSCW and the social sciences, and ethnographic fieldwork in two data learning environments, we show how an algorithm’s application is seen sometimes as a mechanical sequence of rules and at other times as an array of situated decisions. Casting data analytics as a rule-based (rather than rule-bound) practice, we show that effective data vision requires would-be analysts to straddle the competing demands of formal abstraction and empirical contingency. We conclude by discussing how the notion of data vision can help better leverage the role of human work in data analytic learning, research, and practice.

Media coverage of this paper is available from Cornell Research.

Mapping EINS: An exercise in mapping the Network of Excellence in Internet Science

Conference Paper
Almila Akdag Salah, Sally Wyatt, Samir Passi, Andrea Scharnhorst
arXiv preprint arXiv:1304.5753, 2013.
Publication year: 2013

Abstract:

This paper demonstrates the application of bibliometric mapping techniques to funded research networks. We discuss how science maps can be used both to facilitate communication inside newly formed communities and to account for their activities to funding agencies. We present the mapping of EINS, an FP7-funded Network of Excellence, as a case study. Finally, we discuss how these techniques can serve as knowledge maps for experts engaged in interdisciplinary work.

Deconstructing the Time-Traveling Identity

Conference Paper
Samir Passi, Ranjit Singh
In Proceedings of the International Conference on the Philosophy of Computer Games, Athens, Greece.
Publication year: 2011