2020 NSF CISE-SBE VR on Harnessing the Computational and Social Sciences to Solve Critical Societal Problems - PC Group

The Virtual Roundtable on Harnessing the Computational and Social Sciences to Solve Critical Societal Problems is an invitation-only event jointly convened by the National Science Foundation directorates of Computer and Information Science and Engineering (CISE) and Social, Behavioral, and Economic Sciences (SBE). The roundtable was held on Tuesday, May 19, 2020, from 10:45 AM to 5:00 PM EDT.

Goals
The goals of the roundtable are to foster and scaffold collaborative scientific research across the computational and social-behavioral-economic spheres, to underscore the deep interdependence of technological and social systems, and to explore ideas for improving collaboration between academia and industry around data, science, and society.

Organization
The roundtable was structured around three thematic sessions:

  • Rendering visible, understanding, and ultimately reducing long-standing disparities

  • Improving the trustworthiness of the information ecosystem

  • Empowering and diversifying the technical workforce

We anticipate that this roundtable is an early step in fostering and scaffolding collaborative research programs between the NSF CISE and SBE directorates. In addition to understanding possible research agendas for each of these substantive areas, we will also focus attention on cross-cutting challenges such as the need for new research infrastructure, industry-academic partnerships, interdisciplinary collaborations, and training and education programs.

Background
It is increasingly apparent that many of the systems on which our society depends for its health, prosperity, and security are neither purely social nor purely technical. Rather, they are socio-technical systems. Workplace relationships, media markets, health delivery systems, and criminal justice organizations are all increasingly characterized by a complex mixture of human actors and institutions on the one hand, and digital platforms and algorithms on the other. Efforts to design, manage, audit, and ultimately improve these systems to the benefit of society therefore lie at the intersection of the computational sciences and the social-behavioral sciences.

This intersection has been cast into sharp relief by the COVID-19 pandemic, which--in addition to presenting humanity with an almost unprecedented global public health crisis--has exposed innumerable connections between interpersonal interaction and almost every other element of society. As we embark on this profound collective experiment in social distancing, technology is transforming commercial, social, professional, and cultural interactions and experiences--and being transformed by them--on the timescale of weeks rather than years or decades. Technology is also likely to emerge as a key component of any societal response to the pandemic--enabling, for example, real-time detection and contact tracing--with all the attendant concerns about individual privacy, social justice, and civil liberties.

In the short run, this existential crisis calls for an urgent all-hands-on-deck response from the scientific community in the form of expert advice and rapid analysis. In the longer run, however, we must also grapple with pressing questions about the organization of scientific research at the intersection of computational and social sciences. This area--often referred to as computational social science--has been gaining prominence in recent years, generating new conferences, funding opportunities, and jobs in both academia and industry. However, to deliver on the potential of collaborative research in socio-technical systems, including generating practical solutions to real-world socio-technical problems, a number of challenges must be overcome that will require new streams of funding, new models of interdisciplinary collaboration among researchers, and a new relationship among academia, government, and industry.

Core Roundtable Themes
The virtual roundtable was organized around three core substantive themes, each of which corresponds to a contemporary societal concern of widespread interest:

Rendering visible, understanding, and reducing historical disparities. Today, citizens of different demographic, geographic, and socio-economic backgrounds experience starkly disparate opportunities across a number of dimensions, including educational achievement, economic security, health, likelihood of incarceration, and longevity. In a number of cases, technological advances appear to be combining with historical legacies to increase these disparities rather than reduce them. Can we identify and frame a research program that would tease out the social impact of new technological developments? Can we identify concrete practices that the public and private sectors can use to increase opportunity for all Americans while reducing socially unproductive disparities? Can research, design, and innovation that reduce harmful disparities be incentivized?

Improving trust in the information ecosystem. To function properly, democratic societies require their citizens to have reliable access to accurate and trusted information on political, health, and scientific issues. Traditionally, this information has been produced by expert communities, disseminated by a relatively small number of publishing and broadcast organizations, and consumed in professionally curated packages and through interaction with trusted organizations and professions. In the past 30 years, this system of information production and consumption has been profoundly disrupted by new technologies, including the web, social media, and mobile devices. Information about essentially any topic can now be produced by any individual with an internet connection, disseminated to audiences of essentially any size without intermediation, and consumed in a wide variety of formats (video, podcast, SMS, tweet, etc.) on a wide variety of devices. Complicating matters, the distribution and consumption of content is increasingly a mixture of human and algorithmic decision making, further mediated by digital tools, raising concerns about the proliferation of biased, inaccurate, and outright false information. New methods are needed for identifying, classifying, quantifying, and tracking misinformation in all its manifestations, as well as for measuring its consequences for public understanding and opinion. Moreover, new partnerships are needed between academia and industry to design, build, and evaluate automated tools for mitigating the harmful effects of misinformation at scale and under real-world conditions.

Empowering the skilled technical workforce. America’s skilled technical workforce is critical to continuing American science leadership and to fueling the scientific advances that lead to quality-of-life improvements. However, by many estimates, a considerable number of today’s jobs will soon be eliminated or completely transformed. At the same time, the skills needed to provide many non-economic services and to sustain a wide range of highly impactful social interactions are likely to undergo similar change. How can research at the intersection of computer science and the social/behavioral sciences create tangible quality-of-life improvements by helping industry, government, academia, community organizations, and individuals better understand, and more effectively serve, today’s and tomorrow’s skilled technical workforce?

Cross-Cutting Challenges. In addition to understanding possible research agendas for each of these substantive areas, we also need to focus attention on four cross-cutting challenges:

Partnering with industry around data sharing and implementation
Over the past two decades technology companies have built a dizzying array of digital systems and platforms to facilitate interpersonal communication, social networking, e-commerce, information retrieval, publishing, collaborative work, and many other applications. In addition to transforming large swathes of social and economic life, these technologies have also generated a staggering volume and diversity of data that is of potential interest to social and behavioral scientists. Indeed, the explosion of research that has taken place over this time period in data science and computational social science has been overwhelmingly fueled by this “digital exhaust.”

Unfortunately, researchers’ access to these data is ad-hoc, unreliable, inequitable, and highly non-transparent. In some cases, researchers can use publicly facing programming tools (e.g. APIs) but are then subject to rules and restrictions that are typically determined without consideration of research-specific requirements. In other cases, access can only be granted to employees, requiring that researchers or their students work for the company under non-disclosure agreements. Finally, particular datasets are sometimes made available to individual researchers via social contacts or other one-off arrangements. As a result, the vast majority of researchers have no clear means of accessing the vast majority of data collected by companies. Moreover, research results are often impossible to replicate, either because the original data are unavailable or because the conditions under which they were collected have changed.

Increased accessibility and transparency around data sharing could also yield considerable benefits for industry. For medium and small technology businesses that lack their own research labs, collaboration with the academic research community could translate directly into valuable insights that would otherwise remain out of reach. Even for large, established tech companies, increased transparency and external collaboration could improve the quality of their services as well as increase public trust.

Building large-scale, shared data infrastructures for research purposes
Improved access to industry data would dramatically accelerate computational social and behavioral science. On its own, however, it would not be a panacea, for at least two reasons. First, data collected by companies and government agencies are often non-representative or biased in other ways (e.g. because some people generate far more data than others, or because different individuals choose to share different levels of information about themselves). Second, because commercial systems are designed to provide useful services, not to answer scientific questions, they may not collect the data of interest in the first place, or may collect it in ways that are difficult to use.

An alternative strategy to partnering with industry, therefore, is for the research community to build its own data infrastructures that are designed specifically to support research. Shared research infrastructure is a familiar concept in the physical, biological, and engineering sciences, encompassing examples such as the Large Hadron Collider, the Laser Interferometer Gravitational-Wave Observatory (LIGO), the Hubble Space Telescope, the Sloan Digital Sky Survey, and the Human Genome Project. In the social sciences, the main examples of shared data infrastructure are long-running surveys such as the General Social Survey (GSS), the American National Election Studies (ANES), and the Panel Study of Income Dynamics (PSID). However, the scale, complexity, and temporality of digital data are sufficiently different from survey data that whole new designs will be required.

Fostering collaborations between CISE and SBE researchers
In addition to more and better data, rapid progress in computational social and behavioral science will require new models of collaboration between CISE and SBE researchers. Currently, computer scientists and social/behavioral researchers use distinct publishing models: the former prioritize publishing in annual, peer-reviewed conferences and the latter in journals. Partly as a consequence, computer scientists tend to publish more frequently and with shorter turnaround times than social and behavioral scientists, for whom the time between initial submission and eventual publication can easily stretch into years. Computer science publications also tend to have more coauthors than their social/behavioral counterparts, where single-authored papers or books are still considered helpful (or in some cases required) for tenure and promotion. Finally, norms governing what is considered a valid contribution vary widely between the two communities, with social/behavioral scientists placing more emphasis on theory-driven explanations and computer scientists valuing accurate predictions and/or working systems. Moreover, the spectrum of SBE researchers includes quantitative researchers, qualitative researchers, and experimentalists, presenting both hurdles and new possibilities for cross-disciplinary collaboration.

Collaborations between computer and social/behavioral researchers are therefore complicated by conflicting incentives and inconsistent worldviews. SBE scientists may worry that conference proceedings with a large number of coauthors will not be valued by their peers, while computer scientists may be unwilling to wait two to four years to publish work in an unfamiliar journal when the same content could be published within a year at one of their own conferences. Accordingly, one outcome of this collaboration might be the creation of a new open-access journal that combines facets of computer science and social science publication norms as part of the necessary infrastructure for a new way of doing research.

Developing training and educational programs
In recent years, a number of computational social science courses (e.g. at Princeton), certificates (e.g. Stanford), master's programs (e.g. U Chicago), and summer training programs (e.g. Summer Institute on Computational Social Science, BIGSSS Computational Social Science Summer School, Santa Fe Institute Graduate Workshop on Computational Social Science) have appeared. More broadly, “CS+X” programs such as those at the University of Illinois and Northwestern University have created opportunities for computer science and other disciplines to interact via workshops, joint faculty hiring, and new undergraduate majors. Notwithstanding these welcome innovations, training at the PhD level remains highly segregated between the SBE and CISE communities. Although SBE students are increasingly engaging in informal cross-training, courses in data science and machine learning are not routinely included in their curricula. Reciprocally, CISE students receive little exposure to SBE-relevant topics such as causal inference, research design, and substantive theory. Building on existing efforts, guidelines for designing certificates, master's, and full PhD programs in computational social science would be extremely valuable.