June 17, 2018 - June 30, 2018 | Duke University

Sponsored by the Russell Sage Foundation & The Alfred P. Sloan Foundation

From the evening of Sunday, June 17 to the morning of Saturday, June 30, 2018, the Russell Sage Foundation and the Alfred P. Sloan Foundation will sponsor the Summer Institute in Computational Social Science, to be held at Duke University. The purpose of the Summer Institute is to bring together graduate students, postdoctoral researchers, and beginning faculty interested in computational social science. The Summer Institute is for both social scientists (broadly conceived) and data scientists (broadly conceived). The co-organizers and principal faculty of the Summer Institute are Christopher Bail and Matthew Salganik. There will also be seven partner locations run by alumni of the 2017 Summer Institute, which will be hosted at the following universities: Hunter College, New York University, Northwestern University, University of Cape Town, University of Colorado, University of Helsinki, and University of Washington.

The instructional program will involve lectures, group problem sets, and participant-led research projects. There will also be outside speakers who conduct computational social science research in academia, industry, and government. Topics covered include text as data, website scraping, digital field experiments, non-probability sampling, mass collaboration, and ethics. There will be ample opportunities for students to discuss their ideas and research with the organizers, other participants, and visiting speakers. Because we are committed to open and reproducible research, all materials created by faculty and students for the Summer Institute will be released open source.

Participation is restricted to Ph.D. students, postdoctoral researchers, and untenured faculty within 7 years of their Ph.D. Most participant costs during the workshop, including housing and most meals, will be covered, and most travel expenses will be reimbursed up to a set cap. About thirty participants will be invited. Participants with less experience with social science research will be expected to complete additional readings in advance of the Institute, and participants with less experience coding will be expected to complete a set of online learning modules on the R programming language. Students doing this preparatory work will be supported by a teaching assistant who will hold online office hours before the Institute.

Application materials were due Monday, February 19, 2018. We are no longer accepting applications.


Matthew Salganik

Matthew Salganik is Professor of Sociology at Princeton University, and he is affiliated with several of Princeton’s interdisciplinary research centers: the Office for Population Research, the Center for Information Technology Policy, the Center for Health and Wellbeing, and the Center for Statistics and Machine Learning. His research interests include social networks and computational social science. He is the author of the forthcoming book Bit by Bit: Social Research in the Digital Age.

Chris Bail

Chris Bail is the Douglas and Ellen Lowey Associate Professor of Sociology and Public Policy at Duke University and a member of the Interdisciplinary Program on Data Science, the Duke Network Analysis Center, and the Duke Population Research Institute. His research examines how non-profit organizations and other political actors shape social media discourse using large text-based datasets and apps for social science research. He is the author of Terrified: How Anti-Muslim Fringe Organizations Became Mainstream.


Deen Freelon

Deen Freelon is an Associate Professor in the School of Media and Journalism at the University of North Carolina at Chapel Hill, and directs the Computational Communication Research Lab.

David Lazer

David Lazer is Professor of Political Science and Computer and Information Science, Northeastern University & Harvard University.

Kristian Lum

Kristian Lum is the Lead Statistician at the Human Rights Data Analysis Group (HRDAG), where she leads the HRDAG project on criminal justice in the United States.

Sendhil Mullainathan

Sendhil Mullainathan is the Robert C. Waggoner Professor of Economics at Harvard University and the co-founder of the Abdul Latif Jameel Poverty Action Lab.

Cynthia Rudin

Cynthia Rudin is an Associate Professor of computer science, electrical and computer engineering, and statistics at Duke University, and directs the Prediction Analysis Lab.

Duncan Watts

Duncan Watts is a Principal Researcher at Microsoft Research and a founding member of the MSR-NYC lab. He is also an AD White Professor at Large at Cornell University.

Teaching Assistants

Friedolin Merhout

Friedolin Merhout is a doctoral student in the Duke Sociology department. He enjoys exploring how computational methods provide a new lens to view longstanding social science debates, and pondering the potential inherent in the wealth of digital trace data. Before starting the doctoral program at Duke, he earned a BA from Freie Universitaet in his hometown Berlin.

Marcus Mann

Marcus Mann is a doctoral student in the Duke Sociology department. He uses computational methods to examine politically partisan news ecologies on social media and maintains a general interest in the cultural differentiation of epistemic authorities and their corresponding audiences, communities, and social movements.

Janet Xu

Janet Xu is a doctoral student in the Princeton Sociology department.


Coming soon


As we discussed in our call for applications, we have arranged two types of training prior to the event this summer. Some students have more sophisticated coding skills but little exposure to social science; other students have significant exposure to social science but lack strong coding skills.


The majority of the coding work presented at the 2018 SICSS will employ R. However, you are welcome to use a language of your choice, such as Python, Julia, or another language commonly used by computational social scientists. If you would like to work in R, we recommend that you complete the following courses within DataCamp, a website that teaches people how to code. You only need to complete the classes that cover material you would like to learn.

If you cannot afford DataCamp, check out Chris Bail’s Intro to R slides at http://www.chrisbail.net/p/learn-comp-soc.html

Reading List

Our institute will bring together people from many fields, and therefore we think that asking you to do some reading before you arrive will help us use our time together more effectively. First, we ask you to read Matt’s book, Bit by Bit: Social Research in the Digital Age, which is a broad introduction to computational social science. Parts of this book will be review for most of you, but if we all read this book ahead of time, then we can use our time together for more advanced topics.

Also, for students with little or no exposure to sociology, economics, or political science, we have assembled a collection of exemplary papers in the core areas addressed by the Russell Sage Foundation. Neither your work nor the work we develop together at the institute need map neatly onto these categories, but if those with less exposure to social science read these, we will increase the chances of interdisciplinary cross-pollination, which we view as critical to the future of computational social science.

Future of Work

Behavioral Economics

Race, Ethnicity, and Immigration

Social Inequality

Schedule and materials

Sunday June 17, 2018

  • Opening Dinner

Monday June 18, 2018 - Introduction and Ethics

  • 9:00-9:15 Logistics (No livestream)

  • 9:15-9:30 Introduction to computational social science

  • 9:30-9:45 Why SICSS?

  • 9:45-10:00 Introductions

  • 10:00-10:45 Ethics: Principles-based approach

  • 10:45-11:00 Coffee Break

  • 11:00-12:00 Four areas of difficulty: informed consent, informational risk, privacy, and making decisions in the face of uncertainty

  • 1:00-4:00 Group Exercise (No livestream)

  • 4:00-5:30 Guest Speaker

  • 6:00-7:30 Dinner & discussion

Tuesday June 19, 2018 - Collecting Digital Trace Data

  • 9:00-9:15 What is digital trace data?

  • 9:15-9:30 Strengths and weaknesses of digital trace data

  • 9:30-10:00 Screen-Scraping

  • 10:00-10:15 Break

  • 10:15-11:00 Application Programming Interfaces

  • 11:00-12:00 Apps for Social Science Research

  • 1:00-4:00 Group Exercise (No livestream)

  • 4:00-5:30 Guest Speaker

  • Dinner & Discussion

Wednesday June 20, 2018 - Automated text analysis

  • 9:00-9:15 History of quantitative text analysis

  • 9:15-9:30 Strengths and weaknesses of quantitative text analysis

  • 9:30-9:45 Basic Text Analysis/GREP

  • 9:45-10:00 Dictionary-Based Text Analysis

  • 10:00-10:15 Break

  • 10:15-11:00 Topic models and Beyond

  • 11:00-12:00 Ngram Networks

  • 12:00-1:00 Lunch

  • 1:00-4:00 Group Exercise (No livestream)

  • 4:00-5:30 Guest Speaker

  • Dinner & Discussion

Thursday June 21, 2018 - Surveys

  • 9:00-9:15 Welcome and schedule

  • 9:15-9:30 Survey research in the digital age

  • 9:30-10:00 Probability and non-probability sampling

  • 10:00-10:15 Coffee break

  • 10:45-11:00 Combining surveys and big data

  • 11:00-12:30 Begin group exercise (No livestream)

  • 12:30-1:30 Lunch

  • 1:30-3:45 Continue group exercise (No livestream)

  • 3:45-4:00 Break

  • 4:00-5:30 Guest Speaker

  • Dinner & Discussion

Friday June 22, 2018 - Mass Collaboration

  • 9:00-9:10 Welcome and schedule

  • 9:10-9:30 Mass collaboration

  • 9:30-9:40 Human computation

  • 9:40-9:50 Open call

  • 9:50-10:00 Distributed data collection

  • 10:00-10:15 Design advice

  • 10:15-10:30 Coffee break

  • 10:30-4:00 Fragile Families Challenge

  • Dinner & Discussion

Saturday June 23, 2018 - Experiments

  • 9:00 - 9:15 Welcome and schedule

  • 9:15 - 9:45 What, why, and which experiments?

  • 9:45 - 10:15 Moving beyond simple experiments

  • 10:15 - 10:30 Coffee break

  • 10:30 - 11:15 Four strategies for experiments

  • 11:15 - 11:45 Zero variable cost data and musiclab

  • 11:45 - 12:15 3 Rs

  • Lunch

  • Afternoon off

Sunday June 24, 2018 - Day off

Monday June 25, 2018 - Work on projects

Tuesday June 26, 2018 - Work on projects

Wednesday June 27, 2018 - Work on projects

Thursday June 28, 2018 - Work on projects

Friday June 29, 2018 - Present final projects

  • Closing dinner

Saturday June 30, 2018

  • Students depart

Live Stream

For those unable to attend in person, we will be live-streaming each day from approximately 9:00am to 5:30pm ET. Group exercises and some of the visiting speakers’ lectures will not be live-streamed. No registration is required to watch the livestream. We will post additional information about the livestream here once it is available.