June 16 to June 29, 2019 | Princeton University

Sponsored by the Russell Sage Foundation & The Alfred P. Sloan Foundation


From the evening of Sunday, June 16 to the morning of Saturday, June 29, 2019, the Russell Sage Foundation and the Alfred P. Sloan Foundation will sponsor the Summer Institute in Computational Social Science, to be held at Princeton University. The purpose of the Summer Institute is to bring together graduate students, postdoctoral researchers, and beginning faculty interested in computational social science. The Summer Institute is for both social scientists (broadly conceived) and data scientists (broadly conceived). The co-organizers and principal faculty of the Summer Institute are Christopher Bail and Matthew Salganik. In addition to the event at Princeton, alumni of the 2017 and 2018 Summer Institutes will run partner locations in Bamberg, Germany (University of Bamberg), Boston, MA (MIT), Cape Town, South Africa (University of Cape Town), Chicago, IL (Northwestern University), Istanbul, Turkey (Kadir Has University), Monterrey, Mexico (Universidad Autónoma de Nuevo León), Oxford, United Kingdom (Oxford University), New York, NY (Hunter College-CUNY), Research Triangle Park, NC (RTI International), and Zürich, Switzerland (ETH Zürich).

The instructional program will involve lectures, group problem sets, and participant-led research projects. There will also be outside speakers who conduct computational social science research in a variety of settings, such as academia, industry, and government. Topics covered include text as data, website scraping, digital field experiments, non-probability sampling, mass collaboration, and ethics. There will be ample opportunities for students to discuss their ideas and research with the organizers, other participants, and visiting speakers. Because we are committed to open and reproducible research, all materials created by faculty and students for the Summer Institute will be released open source.

Participation is restricted to Ph.D. students, postdoctoral researchers, and untenured faculty within 7 years of their Ph.D. Most participant costs during the workshop, including housing and most meals, will be covered, and most travel expenses will be reimbursed up to a set cap. We welcome applicants from all backgrounds and fields of study, especially applicants from groups currently under-represented in computational social science. About thirty participants will be invited, and participants are expected to fully attend and participate in the entire two-week program.

Application materials are due Wednesday, February 20, 2019.

Faculty

Matthew Salganik

Matthew Salganik is Professor of Sociology at Princeton University, and he is affiliated with several of Princeton’s interdisciplinary research centers: the Office of Population Research, the Center for Information Technology Policy, the Center for Health and Wellbeing, and the Center for Statistics and Machine Learning. His research interests include social networks and computational social science. He is the author of Bit by Bit: Social Research in the Digital Age.

Chris Bail

Chris Bail is the Douglas and Ellen Lowey Associate Professor of Sociology and Public Policy at Duke University and a member of the Interdisciplinary Program on Data Science, the Duke Network Analysis Center, and the Duke Population Research Institute. His research examines how non-profit organizations and other political actors shape social media discourse using large text-based datasets and apps for social science research. He is the author of Terrified: How Anti-Muslim Fringe Organizations Became Mainstream.

Speakers

Justin Grimmer

Justin Grimmer is a Professor in Stanford University’s Department of Political Science. His current research focuses on American political institutions, elections, and developing new machine-learning methods for the study of politics. He is the author of Representational Style in Congress: What Legislators Say and Why It Matters and The Impression of Influence: Legislator Communication, Representation, and Democratic Accountability.

Annie Liang

Annie Liang is an Assistant Professor of Economics at the University of Pennsylvania. She received her PhD from Harvard in 2016, and she spent 2016-17 as a post-doctoral researcher at Microsoft Research-New England. Her research is in economic theory (in particular, learning and information) and the application of machine learning methods for theory building and evaluation.

Beth Noveck

Beth Noveck is a Professor in the Technology, Culture, and Society department at the New York University Tandon School of Engineering, where she directs the Governance Lab. New Jersey governor Phil Murphy appointed her as the state’s first Chief Innovation Officer in 2018. Previously, Beth served as the first United States Deputy Chief Technology Officer and director of the White House Open Government Initiative under President Obama. UK Prime Minister David Cameron appointed her senior advisor for Open Government.

Chris Wiggins

Chris Wiggins is an associate professor of applied mathematics at Columbia University and the Chief Data Scientist at The New York Times. At Columbia he is a founding member of the executive committee of the Data Science Institute, and of the Department of Systems Biology, and is affiliated faculty in Statistics. He is a co-founder and co-organizer of hackNY, a nonprofit that since 2010 has organized once-a-semester student hackathons and the hackNY Fellows Program.

More coming soon

Teaching Assistants

Alex Kindel

Alex is a PhD candidate in the Department of Sociology at Princeton University. He is interested in computational social science, historical sociology, and the formal organization of knowledge. His dissertation traces the development of statistical tools, methods, and standards in applied research since the 1950s.

Cambria Naslund

Cambria is a graduate student in sociology at Princeton University. She uses computational methods to study questions in the sociologies of science, medicine, and technology. Her current research explores public understandings of medical knowledge and diagnoses using text and image data from newspapers and crowdfunding campaigns. She completed her B.A. in Social Research and Public Policy at NYU Abu Dhabi.

Simone Zhang

Simone is a graduate student in sociology at Princeton University. Her research examines how technology is reshaping how people interact with the organizations they encounter in everyday life. She draws on experiments and digital trace data to study the implications of these shifts for social inclusion, socioeconomic outcomes, and trust in institutions.

Pre-arrival

As we discussed in our call for applications, we have arranged two types of training prior to the event this summer. Some students have more sophisticated coding skills but little exposure to social science; other students have significant exposure to social science but lack coding skills.

Coding

The majority of the coding work presented at the 2019 SICSS will employ R. However, you are welcome to employ a language of your choice, such as Python, Julia, or other languages that are commonly used by computational social scientists. If you would like to work in R, we recommend that you complete the following courses within DataCamp, a website with courses on many topics related to data science. Obviously, you only need to complete the classes with material that you would like to learn.

We thank DataCamp for making these materials available to admitted participants through their DataCamp for the Classroom program.

If you would like a different way to learn similar material, we recommend Introduction to R for Social Scientists taught by Charles Lanfear. This course includes video, code, and assignments.

Reading List

The Summer Institute will bring together people from many fields, and therefore we think that asking you to do some reading before you arrive will help us use our time together more effectively. First, we ask you to read Matt’s book, Bit by Bit: Social Research in the Digital Age (Read online or purchase from Amazon, Barnes & Noble, IndieBound, or Princeton University Press), which is a broad introduction to computational social science. Parts of this book will be review for most of you, but if we all read this book ahead of time, then we can use our time together for more advanced topics.

Also, for students with little or no exposure to sociology, economics, or political science, we have assembled a collection of exemplary papers in the core areas addressed by the Russell Sage Foundation. Neither your work nor the work we develop together at the institute need map neatly onto these categories, but if those with less exposure to social science read these, we will increase the chances of interdisciplinary cross-pollination, which we view as critical to the future of computational social science.

Future of Work

Behavioral Economics

Race, Ethnicity, and Immigration

Social Inequality

Schedule and materials

Sunday June 16, 2019

  • Opening Dinner (Not open to public/No livestream)

Monday June 17, 2019 - Introduction and Ethics

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:30 Introductions (Not open to public/No livestream)

  • 9:30 - 10:00 Introduction to computational social science

  • 10:00 - 10:30 Why SICSS?

  • 10:30 - 10:45 Coffee Break

  • 10:45 - 11:30 Ethics: Principles-based approach

  • 11:30 - 12:15 Four areas of difficulty: informed consent, informational risk, privacy, and making decisions in the face of uncertainty

  • 12:15 - 12:30 Introduction to the group exercise

  • 12:30 - 1:30 Lunch (Not open to public/No livestream)

  • 1:30 - 3:45 Group exercise (Not open to public/No livestream)

  • 3:45 - 4:00 Break

  • 4:00 - 5:30 Possible guest speaker

  • 6:00 - 7:30 Dinner & discussion (Not open to public/No livestream)

Tuesday June 18, 2019 - Collecting Digital Trace Data

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:30 What is digital trace data?

  • 9:30 - 9:45 Strengths and weaknesses of digital trace data

  • 9:45 - 10:15 Screen-Scraping

  • 10:15 - 10:30 Coffee Break

  • 10:30 - 11:00 Application Programming Interfaces

  • 11:00 - 12:30 Building Apps and Bots for Social Science Research

  • 12:30 - 1:30 Lunch and Guest Speaker: Chris Wiggins

  • 1:30 - 3:45 Group Exercise (Not open to public/No livestream)

  • 3:45 - 4:00 Break

  • 4:00 - 5:30 Possible guest speaker

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Wednesday June 19, 2019 - Automated Text Analysis

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:30 History of quantitative text analysis

  • 9:30 - 9:45 Basic Text Analysis/GREP

  • 9:45 - 10:00 Dictionary-Based Text Analysis

  • 10:00 - 10:15 Coffee Break

  • 10:15 - 11:15 Topic models/Structural Topic Models

  • 11:15 - 11:20 Break

  • 11:20 - 12:30 Text Networks

  • 12:30 - 1:30 Lunch (Not open to public/No livestream)

  • 1:30 - 4:00 Group Exercise (Not open to public/No livestream)

  • 4:00 - 5:30 Possible guest speaker

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Thursday June 20, 2019 - Surveys in the Digital Age

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:45 Survey research in the digital age

  • 9:45 - 10:15 Probability and non-probability sampling

  • 10:15 - 10:30 Coffee break

  • 10:30 - 11:00 Computer-administered interviews and wiki surveys

  • 11:00 - 11:30 Combining surveys and big data

  • 11:30 - 12:00 Group exercise introduction

  • 12:00 - 12:30 Begin group exercise

  • 12:30 - 1:30 Lunch

  • 1:30 - 3:15 Continue group exercise (Not open to public/No livestream)

  • 3:15 - 3:45 Discuss activity and open-source data

  • 3:45 - 4:00 Break

  • 4:00 - 5:30 Guest speaker: Justin Grimmer

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Friday June 21, 2019 - Mass Collaboration

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:30 Mass collaboration

  • 9:30 - 9:45 Human computation

  • 9:45 - 10:00 Open call

  • 10:00 - 10:15 Distributed data collection

  • 10:15 - 10:30 Coffee break

  • 10:30 - 11:30 Introduction to the Fragile Families Challenge

  • 11:30 - 12:30 Working on the Fragile Families Challenge (Not open to public/No livestream)

  • 12:30 - 1:30 Lunch

  • 1:30 - 3:30 Fragile Families Challenge (Not open to public/No livestream)

  • 3:30 - 3:45 Discussion of the Fragile Families Challenge (Not open to public/No livestream)

  • 3:45 - 4:00 Break

  • 4:00 - 5:30 Possible guest speaker

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Saturday June 22, 2019 - Experiments

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 9:45 What, why, and which experiments?

  • 9:45 - 10:15 Moving beyond simple experiments

  • 10:15 - 10:30 Coffee break

  • 10:30 - 11:15 Four strategies for experiments

  • 11:15 - 11:45 Zero variable cost data and MusicLab

  • 11:45 - 12:15 3 Rs

  • 12:15 - 12:30 Logistics (Not open to public/No livestream)

  • 12:30 - 1:30 Lunch (Not open to public/No livestream)

  • Afternoon off

Sunday June 23, 2019 - Day off

Monday June 24, 2019 - Work on group projects

  • 9:00 - 9:15 Logistics (Not open to public/No livestream)

  • 9:15 - 10:30 Speed-dating and group formation (Not open to public/No livestream)

  • 12:30 - 1:30 Lunch and panel on book publishing: Meagan Levinson (Senior Editor, Princeton University Press), Eric Schwartz (Editorial Director, Columbia University Press), and Chris Bail (Editor of the Oxford University Press Series in Computational Social Science)

  • 4:00 - 5:30 Possible guest lecture

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Tuesday June 25, 2019 - Work on group projects

  • 12:30 - 1:30 Lunch and flash talks (Not open to public/No livestream)

  • 4:00 - 5:30 Guest speaker: Beth Noveck

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Wednesday June 26, 2019 - Work on group projects

  • 12:30 - 1:30 Lunch and flash talks (Not open to public/No livestream)

  • 4:00 - 5:30 Guest speaker: Annie Liang

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Thursday June 27, 2019 - Work on group projects

  • 12:30 - 1:30 Lunch and flash talks (Not open to public/No livestream)

  • 4:00 - 5:30 Possible guest speaker

  • 6:00 - 7:30 Dinner & Discussion (Not open to public/No livestream)

Friday June 28, 2019 - Present group projects

  • 1:30 - 5:15 Present group projects (Not open to public/No livestream)

  • 5:30 Closing dinner (Not open to public/No livestream)

Saturday June 29, 2019

  • Participants depart