University of Colorado at Boulder
Summer Institute in Computational Social Science Partner Site

August 13, 2018 - August 17, 2018

Sponsored by The Russell Sage Foundation & The Alfred P. Sloan Foundation

From the morning of Monday, August 13 to the evening of Friday, August 17, 2018, the University of Colorado Boulder will host a satellite of the Summer Institute in Computational Social Science. The purpose of the Summer Institute is to bring together graduate students, postdoctoral researchers, and beginning faculty interested in computational social science. The Summer Institute is for both social scientists (broadly conceived) and data scientists (broadly conceived). This satellite program is co-organized by Brian Keegan and Allie Morgan.

The instructional program will involve lectures, group problem sets, and participant-led research projects. There will also be outside speakers who conduct computational social science research in academia, industry, and government. Topics covered include text as data, website scraping, digital field experiments, non-probability sampling, mass collaboration, and ethics. There will be ample opportunities for students to discuss their ideas and research with the organizers, other participants, and visiting speakers. Because we are committed to open and reproducible research, all materials created by faculty and students for the Summer Institute will be released open source.

Participation will be open to all students, faculty and staff of universities in Colorado. We are supported by the Russell Sage Foundation, the Alfred P. Sloan Foundation, the Center to Advance Research and Training in the Social Sciences (CARTSS), the Institute of Behavioral Science (IBS), and the Center for Research Data and Digital Scholarship (CRDDS).

How to Apply

If you are interested in attending SICSS Boulder 2018, please complete this application. Space for this inaugural summer institute will be limited to approximately 30 people. Meals and beverages during the institute will be provided, but we unfortunately cannot provide funding for travel or accommodation. Accepted participants should already have or be willing to develop basic proficiency with computational reasoning/programming skills in Python and/or R before attending the program. We very strongly encourage applications from people traditionally under-represented in computing.

Kindly send any inquiries to Brian Keegan. Applications will be due by June 1st.

Organizers

Brian Keegan

Brian C. Keegan is a computational social scientist whose research is at the intersection of human-computer interaction, network science and data science. His research explores the structure and dynamics of large-scale online communication and collaboration using socio-technical system log data. Brian is developing new methods, theories and tools to help people make better sense of bursts of information and design better responses to them. Before joining CU-Boulder as an Assistant Professor, Keegan was a research associate at the Harvard Business School’s HBX online learning platform and a postdoctoral researcher in computational social science at Northeastern University. He received his PhD in media, technology and society from Northwestern University’s School of Communication. He also earned SB degrees in Mechanical Engineering and Science, Technology and Society from the Massachusetts Institute of Technology.

Allison Morgan

Allison Morgan is pursuing her Ph.D. in computer science at the University of Colorado, Boulder. She is interested in using data mining, machine learning, and social network analysis to develop and test hypotheses about the origins and effects of gender imbalance within academia. She attended last year’s SICSS and is excited to build a computational social science community at CU Boulder via this satellite. She was recently awarded the National Science Foundation’s Graduate Research Fellowship. Prior to graduate school, Allison worked as a data scientist for two years at a small tech start-up in Portland, OR. She earned her B.A. in physics from Reed College.

Local Speakers

Aaron Clauset

Aaron Clauset is an Assistant Professor in the Department of Computer Science and the BioFrontiers Institute at the University of Colorado Boulder, and is External Faculty at the Santa Fe Institute. He received a PhD in Computer Science, with distinction, from the University of New Mexico, a BS in Physics, with honors, from Haverford College, and was an Omidyar Fellow at the prestigious Santa Fe Institute. In 2016, he was awarded the Erdős-Rényi Prize in Network Science. Clauset is an internationally recognized expert on network science, computational social science, and machine learning for complex systems. His work has appeared in many prestigious scientific venues, including Nature, Science, PNAS, JACM, WWW, ICWSM, STOC, SIAM Review, and Physical Review Letters. His work has also been covered in the popular press by the Wall Street Journal, The Economist, Discover Magazine, New Scientist, Wired, Miller-McCune, the Boston Globe and The Guardian.

Jake Hofman

Jake Hofman is a Senior Researcher at Microsoft Research in New York City, where he works in the field of computational social science. Prior to joining Microsoft, he was a Research Scientist in the Microeconomics and Social Systems group at Yahoo! Research. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University. He is an Adjunct Assistant Professor of Applied Mathematics and Computer Science at Columbia University and runs Microsoft’s Data Science Summer School to promote diversity in computer science. His work has been published in journals such as Science, Proceedings of the National Academy of Sciences, and Management Science, and has been featured in popular outlets including The New York Times, The Wall Street Journal, The Financial Times, and The Economist.

Daniel Larremore

Daniel Larremore is an Assistant Professor in the Department of Computer Science and the BioFrontiers Institute at the University of Colorado at Boulder. His research develops statistical and inferential methods for analyzing large-scale network data, and uses those methods to solve applied problems in diverse domains, including public health and academic labor markets. In particular, his work focuses on generative models for networks, the ongoing evolution of the malaria parasite and the origins of social inequalities in academic hiring and careers. Prior to joining the University of Colorado faculty, he was an Omidyar Fellow at the Santa Fe Institute 2015-2017 and a post-doctoral fellow at the Harvard T.H. Chan School of Public Health 2012-2015. He obtained his Ph.D. in Applied Mathematics from the University of Colorado at Boulder in 2012, and holds an undergraduate degree from Washington University in St. Louis.

Yotam Shmargad

Yotam Shmargad is a computational social scientist with an interest in political networks and privacy. In his research, he runs experiments, links and analyzes large datasets, and uses natural experiments to study how digital media augment the patterns of connectivity between people (the size, density, and diversity of our social networks) and the implications that these bigger networks have for our social and political lives. Shmargad’s recent projects look at how political candidates can overcome financial shortcomings with Twitter, and how the partisan composition of one’s social network influences the information they choose to share online. Before joining the University of Arizona as an Assistant Professor, Shmargad received his PhD in Marketing from Northwestern University’s Kellogg School of Management. He holds an MS in Operations Management from Columbia University and a BS in Mathematics from UCLA.

Amanda Stevenson

Amanda Jean Stevenson is a sociologist trained in demographic and computer science methods. She studies the impacts of and responses to abortion and family planning policy. She is an Assistant Professor of Sociology at the University of Colorado Boulder. In her current research, she uses demographic methods to study the impacts of reproductive health policies, and computational and qualitative methods to study social responses to these policies. At Boulder she leads a team using massive administrative data at the Census Bureau to evaluate the life course consequences of access to (as opposed to use of) highly effective contraception. She also contributes to a variety of ongoing evaluations of reproductive health policies and develops new strategies for measuring fertility with administrative data. Another line of research examines the social responses to reproductive health policies. In a current project, she uses Twitter responses, website content, media coverage, and in-depth interviews to examine the social movement response to Texas’ 2013 abortion restrictions. The case provides an opportunity to investigate how social movements negotiate intersectional critiques from within their ranks.

Chenhao Tan

Chenhao Tan is an assistant professor of computer science at the University of Colorado Boulder. He obtained his PhD degree in the Department of Computer Science at Cornell University and bachelor’s degrees in computer science and in economics from Tsinghua University. Prior to joining CU Boulder, he spent a year at the University of Washington as a postdoc. His research interests include natural language processing and computational social science. He has published papers primarily at ACL and WWW, and also at KDD, WSDM, ICWSM, etc. His work has been covered by many news media outlets, such as the New York Times and the Washington Post. He also won a Facebook fellowship and a Yahoo! Key Scientific Challenges award.

More speakers coming soon!

Participants

Zachary Cooper

I am a first year PhD student in Anthropology specializing in Archaeology here at the University of Colorado, Boulder. My advisor is Dr. Scott Ortman and my research interests include ancient migrations, diachronic linguistics, complex systems, and urban scaling.

Angela Cunningham

I am a PhD candidate, expecting to graduate in the winter of 2018. My dissertation focuses on critical military geographies of rural Americans during and after the Great War. Drawing on newly accessible comprehensive individual-level civilian and military data, my research employs the techniques of historical demography and spatial analysis within a theoretical framework of space-time as relative, relational and constitutive to argue that the life courses and relationships of individual soldiers bind home and front and necessitate a more nuanced appreciation of the far-reaching and persistent effects of militaristic ideologies and practices. I have presented portions of my work at the annual meetings of the Social Science History Association, American Association of Geographers, and the Population Association of America. My first sole-authored paper, forthcoming in Historical Methods, explores automated record linkage methodologies.

Ashlynn Daughton

Ashlynn Daughton is an information science PhD student at the University of Colorado, Boulder. She is interested in leveraging internet data to better inform public health decision makers. She is a graduate of the University of California, Berkeley (BS in molecular biology) and Boston University (MA in public health concentrating in maternal and child health and epidemiology). She works with Michael Paul at CU Boulder, and holds a position at Los Alamos National Laboratory in the Analytics, Intelligence, and Technology Division. Her current research focuses on using machine learning techniques to better understand human behavior, development of decision support tools for public health professionals, and methods to better incorporate Internet and social media data into traditional epidemiological models.

Emirhan Demirhan

Emirhan Demirhan is a PhD candidate in Sociology at the University of North Texas. His research interests include democratization, the obstacles to the functioning of democracies, and manipulation of public opinion using computational and quantitative methods. In his dissertation, Emirhan focuses on how Turkish democratization failed as a result of unrestrained defiance to the status quo, which led to the destruction of institutions.

Alia Gant

Alia Gant is currently a Diversity Resident Librarian at Penn State University. Prior to joining the university, Ms. Gant studied information science at the University of Texas at Austin where she focused her studies on academic librarianship. Ms. Gant also studied international studies in her graduate and undergraduate programs at the University of Iowa and American University respectively with an emphasis on Western Hemisphere studies and the European Union, focusing on Portuguese-speaking countries, Brazil and Portugal. Alia is looking forward to the Summer Institute in Computational Social Science at the University of Colorado at Boulder! She hopes to learn more about CSS and also avenues to intersect her librarianship skills with computational social science methods to enrich her professional and academic research goals.

Angelia Giannone

Angelia Giannone is a doctoral student of Information at the University of Arizona’s iSchool. Her research interests span across computational creativity broadly, with specialization in games studies and design, UX design, new media, and cultural studies. Before joining the iSchool, Angelia received her MA at the University of Arizona in Rhetoric, Composition, and the Teaching of English as well as a BS in Professional Writing from Worcester Polytechnic Institute. Angelia’s current research investigates STEM, maker, and participatory cultures, storytelling in games, and uses of computational social sciences to create media projects focused on socio-cultural agency.

Krishna Gouripeddi

Currently working as an HR data scientist in Lehi, Utah. Interested in learning and potentially applying CSS methods and concepts in the field of Human Resources.

Jordan Hale

Xiaowen Hu

Xiaowen Hu is currently a 2nd year PhD student in Finance at the University of Colorado, Boulder.

Juhi Huda

Juhi is a doctoral candidate in the Environmental Studies Program (policy core) at CU Boulder. She studies environmental policy and governance in the United States and India focusing on policy change in areas of food systems governance (agricultural biotechnology), disaster and hazards (wildfire), and climate change; and factors influencing policy change such as communication, advocacy, and stakeholders. Her dissertation research investigates the controversial issue of agricultural biotechnology policy in India.

Eaman Jahani

Eaman Jahani is a graduate research assistant pursuing a PhD degree in Social and Engineering Systems with a minor in Statistics at MIT IDSS. Prior to MIT, he was a software engineer at Google for 4 years. His main training is in statistics and computer science, but recently he has been appreciating econometrics and modeling in applied economics. His past research examined the extent of bubbles vs truth-seeking in cryptocurrency markets and socio-economic prediction in social networks. His current research focuses on structural factors such as networks or institutions that contribute to persistence of inequality.

Anais Landry

Stefani Langehennig

Stefani Langehennig is a PhD candidate at the University of Colorado Boulder studying American politics and political methodology. Her research focuses on institutions, policy making, and congressional organization. She earned her Bachelor of Arts in Government at the University of Texas at Austin and her Master of Science in Political Science at the University of Nebraska Omaha.

Eun Lee

I’ve just finished my Ph.D. in South Korea on structural inequality and its effects on dynamics. I am a huge fan of the complex systems group in Colorado, so I applied to the summer school to meet and learn from the group. My passion is to understand how the heterogeneous characteristics of social networks and people’s attributes affect human behavior and perception.

Huyen Le

I am a Vietnam Education Foundation (VEF) fellow and have been a PhD student in Computer Science at the University of Iowa since Fall 2013. Since Fall 2015 I have worked under the supervision of Professor Zubair Shafiq. My research interests are Social Media Analysis, Text Mining, and Applied Machine Learning. I received my M.S. in Computer Science from Pohang University of Science and Technology, South Korea in 2012 and my B.S. in Computer Science from Hanoi University of Science and Technology, Vietnam in 2010.

Nathan Lee-Ammons

Nathan is a PhD student in the Environmental Studies Program at the University of Colorado - Boulder. Nathan primarily studies factors that influence public acceptance of emerging food technologies in the United States, particularly food technologies that claim to make food systems more sustainable. He is also involved in a research project investigating the interplay of human and natural systems on the outcome of human migration in Bangladesh.

Ningzi Li

My research interests include organizational theory, economic sociology and non-market strategy. I particularly focus on integrated market and non-market strategies and online community management. I received my doctoral degree in sociology from Cornell University.

Nicholas Light

Nick is a PhD student in the Marketing division at the University of Colorado Leeds School of Business. Broadly, his research focuses on consumer judgement and decision making. Specifically, Nick studies consumer perceptions of understanding, simplicity/complexity, and anti-science beliefs.

Jeremiah Osborne-Gowey

I am a PhD student in the Environmental Studies (ENVS) and Environmental Design (ENVD) programs at CU Boulder. I work at the nexus of science, policy and natural resource management. I am particularly interested in collaborative governance approaches to managing interactions between humans and the rest of the natural world. My dissertation research focuses on understanding the role and evolution of network approaches to collaborative governance. My graduate research examines whether and how social learning networks build and foster adaptive capacity and resilience during transitions in complex social-environmental systems. I am currently working with the Fire Adapted Communities Learning Network (FAC Net). I have a diverse skillset with a background that includes statistics, ecology, behavioral interactions, community structure, impacts of introduced species, science communication, and policy and planning. At CU Boulder, I teach methods and planning courses in the ENVD program and have research appointments in the Center for Science and Technology Policy Research (CSTPR) and the Institute of Behavioral Science (IBS). Before joining the Goldstein lab at CU Boulder, I worked for more than 15 years as an aquatic/landscape ecologist with Federal and State agencies, universities and private and non-profit consulting firms throughout the Western United States. I earned an Honors Bachelor of Science degree in Fisheries and Wildlife (2003), a Master of Science in Quantitative Fish Ecology (2005), and a Master of Public Policy (2016) from Oregon State University. I enjoy spending time in the great outdoors with my partner and kids, friends and animals. My favorite activities include camping, backpacking, fishing, hunting, forest foraging, SCUBA diving, fly tying, traveling, photography, reading, gardening, geocaching, and homebrewing/distilling.

Marie Ouellet

Anthony Pinter

Anthony T. Pinter is a Ph.D. student in the Department of Information Science at the University of Colorado Boulder. He works with Dr. Jed R. Brubaker and the Identity Lab at CU, investigating online audiences’ role in identity disclosure and trans experiences in social media. Prior to CU, he completed his B.S. and M.S. in Information Sciences and Technology at Penn State, where he worked with Dr. Lynette Kvasny Yarger on research related to discrimination in STEM fields. In his free time, he is a prolific music consumer, avid mountain biker, and a high school track coach.

Katherine Runge

Katherine L. Runge is a PhD student of political science at the University of Colorado at Boulder. Her first field of study is American politics, and her second field of study is research methodology. Her specific areas of research focus on gender and politics in the American context, along with political psychology and voter behavior.

Joshua Sanders

I am a mathematician (MA, U.Colo.) turned statistician. After working in post-secondary math education for the last decade, I have recently started working with Amanda Stevenson and Katie Genadek at the Institute of Behavioral Science at the University of Colorado.

Sarah Shugars

A doctoral candidate in Northeastern’s Network Science program, Sarah uses network analysis and natural language processing to study political dialogue and deliberation. Her research focuses on developing a network methodology for deliberation: modeling the way an individual reasons as a network of interconnected ideas and studying deliberation as a process in which groups exchange ideas and collectively create new solutions. She received a BA cum laude in Physics from Clark University and an MA in Integrated Marketing Communications from Emerson College. She currently serves as senior editor for the Good Society: The Journal of Civic Studies and previously worked at Tufts University’s Tisch College of Civic Life.

Tara Streng Schroeter

I am a second year graduate student at the University of Colorado Boulder working towards my Ph.D. in Sociology. After completing my Bachelor’s degree at the University of Utah I became interested in programming and data science, and curious to explore the ways that these fields can enhance my skills as a social researcher. I am currently working on research related to health, mortality, campus sexual assault, and relevant policies.

Becca Wang

Becca Wang is a doctoral student in Sociology at Brown University, where she is also affiliated with the Population Studies and Training Center. Her research utilizes longitudinal analysis techniques to examine how population mobility is related to gender inequality, health, and urbanization. She also explores how computational methods and textual data can improve our understanding of social change over time. Prior to graduate school, she worked at Mathematica Policy Research and Mercy Corps.

Sari Widman

Sari Widman explores alternative models for STEM and digital literacy education for learners of all ages. Her research centers the development of equitable and humanizing practices in informal learning spaces, and how families and communities engage with educational opportunities and resources. Her current work focuses on multi and intergenerational learning in community settings, with a particular focus on libraries. She is currently a PhD student at the University of Colorado Boulder School of Education, in Learning Sciences and Human Development.

Rumei Yang

I am Rumei (May) Yang, an international nursing Ph.D. student from China. My early research interest mainly focused on patient safety, including fall prevention, medication errors and risk management using evidence-based practice models. My current research interest focuses on the quality of care for the elderly. Skiing, hiking, and swimming are my favorite activities.

Matthew Yarbrough

Matthew Yarbrough is a doctoral student at the University of Colorado-Boulder. His primary field of research is American Politics, with a minor field in Political Methodology. Matthew graduated with bachelor’s degrees in Political Science, History, and International Affairs from the University of Georgia; his research focuses on gender differences in legislative behavior in the US Congress. Additional research focuses include congressional committee politics, LGBTQ politics, and political ambition within minority communities. In addition to his role as a student, Matthew serves as a Tri-Chair of the Chancellor’s Advisory on Gender and Sexuality at CU-Boulder.

Joe Zamadics

Hello, I am Joe Zamadics. I am a political science PhD candidate at the University of Colorado with a focus in American politics and political methodology. Specifically, my dissertation focuses on issue salience in lawmaking. By analyzing state newspaper content, I am able to link trends in issue salience to lawmaker actions and policy outcomes. I am originally from West Chester, PA which is 30 minutes outside the city of Philadelphia, home of the Super Bowl LII Champion Philadelphia Eagles. I graduated from Susquehanna University in 2012 with a B.S. in Economics. I obtained an M.A. in political science from the University of Colorado in 2014.

Kai Zhu

Schedule and materials

Monday August 13, 2018 - Introduction and Ethics

  • 8:00 - 9:00: Breakfast

  • 9:00 - 10:00: Logistics, administrivia, introductions

  • 10:00 - 11:00: Round robin of participants’ research and how it relates to the topic of the day

  • 11:00 - 12:00: Lecture 1 - Principles of ethics, four areas of difficulty (consent, risk, privacy, uncertainty)

  • 12:00 - 1:00: Lunch

  • 1:00 - 2:00: Lecture 2 - Contextual integrity, value-sensitive design, regulatory models, de-anonymization, social engineering, incentives

  • 2:00 - 3:00: Case Study 1 - Users behaving badly

  • 3:00 - 3:30: Coffee break

  • 3:30 - 4:30: Case Study 2 - Researchers behaving badly

  • 4:30 - 5:30: Research Talk - Casey Fiesler

Tuesday August 14, 2018 - Collecting Digital Trace Data

  • 8:00 - 9:00: Breakfast

  • 9:00 - 10:00: Round robin of participants’ research and how it relates to the topic of the day

  • 10:00 - 11:00: Lecture 1 - Web scraping

  • 11:00 - 12:00: Case Study 1 - Scraping web sites (rate limits, messy data, TOS)

  • 12:00 - 1:00: Lunch

  • 1:00 - 2:00: Lecture 2 - Application programming interfaces (JSON, keys, rate limits)

  • 2:00 - 3:00: Case Study 2 - Application of API scraping

  • 3:00 - 3:30: Coffee break

  • 3:30 - 4:30: Research Talk - Yotam Shmargad

  • 4:30 - 5:30: Research Talk - Brian Keegan

Wednesday August 15, 2018 - Network Analysis

  • 8:00 - 9:00: Breakfast

  • 9:00 - 10:00: Round robin of participants’ research and how it relates to the topic of the day

  • 10:00 - 11:00: Lecture 1 - Types of networks (directed, bipartite, centrality) + attributes on networks (features, time, membership, strength)

  • 11:00 - 12:00: Case Study 1 - networkx and using ICON (https://icon.colorado.edu/)

  • 12:00 - 1:00: Lunch

  • 1:00 - 2:00: Lecture 2 - Generating networks from “raw” data

  • 2:00 - 3:00: Case Study 2 - Take a tabular dataset, discuss informative ways to turn this into a network, visualization

  • 3:00 - 3:30: Coffee break

  • 3:30 - 4:30: Research Talk - Aaron Clauset

  • 4:30 - 5:30: Research Talk - Dan Larremore

Thursday August 16, 2018 - Automated Text Analysis

  • 8:00 - 9:00: Breakfast

  • 9:00 - 10:00: Round robin of participants’ research and how it relates to the topic of the day

  • 10:00 - 11:00: Lecture 1 - Text processing (casing, tokenizing, n-grams, stemming, regex, TF-IDF)

  • 11:00 - 12:00: Lecture 2 / Case Study 1 - Dictionaries (named entity recognition), POS tagging, lexicons, and sentiment analysis

  • 12:00 - 1:00: Lunch

  • 1:00 - 2:00: Lecture 3 - N-gram networks, topic modeling, generative text analysis processes (HMM, neural networks)

  • 2:00 - 3:00: Case Study 2 - Project Gutenberg with cleaning and analysis or topic modeling

  • 3:00 - 3:30: Coffee break

  • 3:30 - 4:30: Research Talk - Michael Paul

  • 4:30 - 5:30: Research Talk - Chenhao Tan

Friday August 17, 2018 - Experiments / Causal Inference

  • 8:00 - 9:00: Breakfast

  • 9:00 - 10:00: Round robin of participants’ research and how it relates to the topic of the day

  • 10:00 - 11:00: Lecture 1 - Experimental design (RCTs, factorial design, validity etc.), A-B tests

  • 11:00 - 12:00: Lecture 2 - Causal inference from observational data (discontinuity, instrumental variable, matching, differencing)

  • 12:00 - 1:00: Lunch

  • 1:00 - 2:00: Case Study 2 - Examine natural experiments in time series data

  • 2:00 - 3:00: Research Talk 1 - Jake Hofman

  • 3:00 - 3:30: Coffee break

  • 3:30 - 4:30: Research Talk 2 - Amanda Stevenson

  • 4:30 - 5:30: Short research talks by participants, wrapping things up