Formulating data science problems is an uncertain and difficult process. It requires various forms of discretionary work to translate high-level objectives or strategic goals into tractable problems, necessitating, among other things, the identification of appropriate target variables and proxies. While these choices are rarely self-evident, normative assessments of data science projects often take them for granted, despite the fact that different translations can raise profoundly different ethical concerns. Whether we consider a data science project fair often has as much to do with the formulation of the problem as any property of the resulting model. Building on six months of ethnographic fieldwork with a corporate data science team—and channeling ideas from critical data studies, science and technology studies, and early writing on knowledge discovery in databases—we describe the complex set of actors and activities involved in problem formulation. Our research demonstrates that the specification and operationalization of the problem are always negotiated and elastic, and rarely worked out with explicit normative considerations in mind. In so doing, we show that careful accounts of everyday data science work can help us better understand how and why data science problems are posed in certain ways—and why certain formulations prevail in practice, even in the face of what might seem like normatively preferable alternatives. We conclude by discussing the implications of our findings, arguing that effective normative interventions will require attending to the practical work of problem formulations.
The trustworthiness of data science systems in applied and real-world settings emerges from the resolution of specific tensions through situated, pragmatic, and ongoing forms of work. Drawing on research in CSCW, critical data studies, and history and sociology of science, and six months of immersive ethnographic fieldwork with a corporate data science team, we describe four common tensions in applied data science work: (un)equivocal numbers, (counter)intuitive knowledge, (in)credible data, and (in)scrutable models. We show how organizational actors establish and re-negotiate trust under messy and uncertain analytic conditions through practices of skepticism, assessment, and credibility. Highlighting the collaborative and heterogeneous nature of real-world data science, we show how the management of trust in applied corporate data science settings depends not just on pre-processing and quantification, but also on negotiation and translation. We conclude by discussing the implications of our findings for data science research and practice, both within and beyond CSCW.
While data science learning and research focus almost exclusively on the seemingly technical aspects of data science (code, algorithms, data structures, etc.), what remain less visible are the “human” forms of data science work comprising assumptions, choices, and decisions that data scientists navigate as they move through the messy terrain of data, carving what in retrospect looks like a linear narrative of knowledge discovery. I treat the human data science work not as separate from the more visible technical work, but as work that is deeply intertwined with algorithms and technicalities. In my work, I study such forms of human data science work ethnographically in two separate contexts: (a) academic data science (one year of ethnographic research conducted in graduate-level machine learning and data mining classrooms in a US university), and (b) corporate data science (six months of ethnographic research conducted at a US tech firm while working as a data scientist). For this workshop, I focus on the question of ‘stakes’ in data science. Specifically, I present my findings to showcase how commonly identified sociotechnical ‘stakes’ in data science (e.g., ontology, marginalization, and bias) appear in and get instantiated within everyday data science practices of data processing, modeling, analysis, visualization, and evaluation.
Critical data studies research has made visible the ‘design-use gap’—users (as people most affected by data science systems) often do not have a say in the system’s design. Much discussion thus focuses on the role and place of user participation in data science practices. In this piece, however, I focus on already-existing forms of participatory work in corporate data science practices. Corporate projects are highly participatory in nature (though not in the way we often define and expect participation). These projects necessitate diverse forms of work on the part of multiple personnel such as data scientists, project managers, business analysts, product managers, and business executives. Unpacking the collaborative work in corporate data science projects as forms of participation provides us with a different perspective on the design-use gap, helping us focus on different forms of participation in corporate data science practices.
The growth of data-driven applications has led to increased interest on the part of social scientists in developing critical forms of data literacy to help evaluate the knowledge produced with and through algorithms. Within this endeavor, in this presentation, I focus on critical forms of literacy for data visualizations. Contemporary social science work on data visualizations tends to focus extensively on data journalism, which remains but one part of data analytics’ visual discourse. In my own research on learning environments, I found that visuals also play a key role in how algorithms are demonstrated to and applied by would-be data analysts. In this paper, I build on social science work on vision (e.g., Goodwin) and scientific representational practices (e.g., Lynch) to show how learning data analysis also requires learning forms of ‘visual thinking’, i.e., thinking with and through visuals. An instance of this can be seen, for example, in the use of graphs/matrices to generate order and organization, enabling students to see data in forms amenable to human perception and action. I use two sets of empirics for this argument: participant-observation of (a) two semester-long graduate-level data analytic courses, and (b) a series of three data analytic workshops organized at a major U.S. East Coast university. My aim in this presentation is to show how the vocabulary of visual thinking enables us to unpack data visuals not just as representations, but also as sociomaterial artifacts constituting the very practice of data analysis – well beyond the immediate contexts of the classroom.
Learning to see through data is central to contemporary forms of algorithmic knowledge production. While often represented as a mechanical application of rules, making algorithms work with data requires a great deal of situated work. This paper examines how the often-divergent demands of mechanization and discretion manifest in data analytic learning environments. Drawing on research in CSCW and the social sciences, and ethnographic fieldwork in two data learning environments, we show how an algorithm’s application is seen sometimes as a mechanical sequence of rules and at other times as an array of situated decisions. Casting data analytics as a rule-based (rather than rule-bound) practice, we show that effective data vision requires would-be analysts to straddle the competing demands of formal abstraction and empirical contingency. We conclude by discussing how the notion of data vision can help better leverage the role of human work in data analytic learning, research, and practice.
In this workshop paper, we use an empirical example from our ongoing fieldwork to showcase the complexity and situatedness of the process of making sense of algorithmic results, i.e., how to evaluate, validate, and contextualize algorithmic outputs. So far, in our research work, we have focused on such sense-making processes in data analytic learning environments such as classrooms and training workshops. Multiple moments in our fieldwork suggest that meaning, in data analytics, is constructed through an iterative and reflexive dialogue between data, code, assumptions, prior knowledge, and algorithmic results. A data analytic result is nothing short of a sociotechnical accomplishment – one in which it is extremely difficult, if not at times impossible, to clearly distinguish between ‘human’ and ‘technical’ forms of data analytic work. We conclude this paper with a set of questions that we would like to explore further in this workshop.
Focusing on data analytic pedagogy, in this presentation I show how students learn to make sense of algorithmic output in relation to data, code, and prior knowledge. I showcase this by drawing out the relation and contrast between human and machine understanding of algorithmically outputted numbers. This presentation conceptualizes data analytics as a situated process: one that necessitates iterative decisions to adapt prior knowledge, code, contingent data, and algorithmic output to each other. Learning to master such forms of iteration, adaptation, and discretion, then, is an integral part of being a data analyst. I focus on the pedagogy of data analytics to demonstrate how students learn to make sense of algorithmic output in relation to underlying data and algorithmic code. While data analysis is often understood as the work of mechanized tools, I focus on the discretionary human work required to organize and interpret the world algorithmically, explicitly drawing out the relation between human and machine understanding of numbers, especially in the ways in which this relationship is enacted through class exercises, examples, and demonstrations. In a learning environment, there is an explicit focus on demonstrating established methods, tools, and theories to students. Focusing on data analytic pedagogy, then, helps us to not only better understand foundational data analytic practices, but also explore how and why certain forms of standardized data sensemaking processes come to be. To make my argument, I draw on two sets of empirics: participant-observation of (a) two semester-long senior/graduate-level data analytic courses, and (b) a series of three data analytic training workshops taught/organized at a major U.S. East Coast university. Conceptually, this paper draws on research in STS on social studies of algorithms, sociology of scientific knowledge, sociology of numbers, and professional vision.
When discussing the difficulties faced by interdisciplinary researchers, we often talk about issues of methodology (how to study what you want to study), of scope (how to limit the mess in our research), of access (how to interpret the work done in different disciplines), of audience (who we want to talk to), and of profession (what kind of job we can get or should aim for). All of these are important issues. However, I argue that there exists another set of pertinent, often overlooked, challenges faced by interdisciplinary researchers. These challenges are, in my opinion, especially – but not exclusively – salient for junior interdisciplinary researchers who constantly straddle two or more disciplines. They arise within everyday interpretations of the professional identity of these researchers. That is: not only the ways in which my colleagues interpret what I do, how I do it, and why I do it, but also what my colleagues expect of me and my work when they hear that I am from x or y department or that I am doing a PhD in x or y field. In this paper, I talk about how I have experienced this specific challenge in my own lived experiences of being an “Information Science PhD student.” In particular, I will describe this challenge within the explicit context of my own movement from the field of Science & Technology Studies (STS) to the field of Information Science (IS).
In this talk I focus on an empirical case study of software development to highlight the negotiated, temporal, and situated character of the various processes involved in software testing. What is and isn’t software testing? When and where is software testing? What is the relation between testing, use, non-use, and the user? What is the distinction between software testing and software repair/maintenance? These are some of the questions that I will touch upon in this talk. Theoretically, this talk is situated at the intersection of Information Science (IS) and Science & Technology Studies (STS). Within the sociology of testing, several scholars have worked on different aspects of technology testing, such as user configurations (Steve Woolgar), scripts (Madeleine Akrich), programs and anti-programs (Bruno Latour), similarity relationships (Donald MacKenzie), and the role of the user (Trevor Pinch). In this talk I will show how the particular case of software testing can help us think through some of these concepts in interesting and different ways.
This paper demonstrates the application of bibliometric mapping techniques in the area of funded research networks. We discuss how science maps can be used not only to facilitate communication inside newly formed communities, but also to account for their activities to funding agencies. We present the mapping of EINS, an FP7-funded Network of Excellence, as a case. Finally, we discuss how these techniques can serve as knowledge maps for experts working across disciplines.
This presentation will highlight privacy issues raised by increasing access to social networks made possible by various mobile applications. I will focus on the unintended consequences of the ability of third-party apps to interact not only with the online databases and services of social networks but also with a user’s personal data within the mobile device itself. Although such apps can be regulated on standardized app stores provided by Google or Apple, the ease of working with social and mobile platforms makes it increasingly difficult to manage and govern the intentionality of the large number of mobile apps that are developed each day. Social networks and mobile devices have now become ubiquitous tools that individuals use to manage their everyday lives, and mobile app development has become a substantial market in itself. In such a scenario, it is imperative to examine the implications of the ability of third-party applications to facilitate the large-scale convergence of user information in ways that are quite novel and non-traditional. At a time when ‘privacy as contextual integrity’ and ‘privacy by design’ feature prominently on the societal agenda, this presentation will provide insights into questions such as what contextual integrity translates to for the increasingly ubiquitous mobile medium, and what we must know before we start designing privacy into mobile apps and social platforms.