New Directions: Bridging Natural Language Processing (NLP) and Survey Research at SurvAI-Day
The SurvAI Workshop is hosted by the Social Data Science Center (SoDa) and the Artificial Intelligence Interdisciplinary Institute (AIM) at the University of Maryland with support from the American Association for Public Opinion Research (AAPOR), the Washington-Baltimore Chapter of AAPOR (DC-AAPOR), and the American Statistical Association (ASA).
October 7, 2024
Workshop: 8:00 am – 5:00 pm
Samuel Riggs IV Alumni Center, Orem Alumni Hall
7801 Alumni Drive, University of Maryland, College Park, Maryland 20742
The workshop aims to strengthen the emerging connection between NLP and survey researchers. Survey researchers and data collectors are increasingly recognizing the need to integrate Large Language Models (LLMs) into their workflows. At the same time, NLP researchers see the importance of grounding their models with better data for accurate public representation as technology is becoming increasingly human-facing. This event seeks to foster joint work between the survey research and NLP communities, identify gaps and needs, and initiate a closer relationship between these interconnected fields.
Keynote Speakers
Barbara Plank
Professor and Co-director of the Center for Information and Language Processing at LMU Munich
Professor, IT University of Copenhagen
Vice-president elect, Association for Computational Linguistics (ACL)
Her research lab (Munich AI and NLP lab, MaiNLP) focuses on robust machine learning for Natural Language Processing with an emphasis on human-facing and data-centric approaches. Her research has been funded by distinguished grants, including an Amazon Research Award (2018), the Danish Research Council (Sapere Aude Research Leader Grant, 2020-2024), and the European Research Council (ERC Consolidator Grant, 2022-2027). Barbara is a Scholar of ELLIS (the European Laboratory for Learning and Intelligent Systems) and regularly serves on international committees. Currently, she is Vice President Elect of the Association for Computational Linguistics (ACL).
Frauke Kreuter
Co-Director, Social Data Science Center (SoDa)
Professor, Joint Program in Survey Methodology, University of Maryland
Chair of Statistics and Data Science in Social Sciences and the Humanities
Ludwig-Maximilians-University of Munich
President, American Association for Public Opinion Research (AAPOR)
She is an elected fellow of the American Statistical Association and the 2020 recipient of the Warren Mitofsky Innovators Award of the American Association for Public Opinion Research. In addition to her academic work, Dr. Kreuter is the Founder of the International Program for Survey and Data Science, developed in response to the increasing demand from researchers and practitioners for the appropriate methods and right tools to face a changing data environment. She is also Co-Founder of the Coleridge Initiative (coleridgeinitiative.org), whose goal is to accelerate data-driven research and policy around human beings and their interactions for program management, policy development, and scholarly purposes by enabling efficient, effective, and secure access to sensitive data about society and the economy, and Co-Founder of the German-language podcast Dig Deep.
Panelists / Moderators / Short Course Instructors
Please be sure to check back as the list will be updated as confirmations come in.
Trent Buskirk
Professor, School of Data Science
Old Dominion University
Trent D. Buskirk, Ph.D., has recently joined the new School of Data Science at Old Dominion University as one of several founding faculty members. Prior to this appointment, Trent was the Novak Family Distinguished Professor of Data Science and outgoing Chair of the Applied Statistics and Operations Research Department at Bowling Green State University. Dr. Buskirk is a Fellow of the American Statistical Association, and his research interests include big data quality; recruitment methods through social media; the use of big data and machine learning methods for health, social, and survey science design and analysis; mobile and smartphone survey designs; methods for calibrating and weighting non-probability samples; and fairness in AI models and interpretable ML methods. When Trent is not geeking out over data science, big data or survey methodology, you can find him playing a competitive game of Pickleball!
Stephanie Eckman
Stephanie Eckman has a PhD in Methodology and Statistics from the University of Maryland. She has collected survey data around the world for government, nonprofits, and industry. Her current research interest is in applying the lessons from surveys to collect more accurate and efficient training data for AI/ML models. Her work on this topic has been published at EMNLP and ICML.
Anna-Carolina Haensch
Assistant Research Professor, Joint Program in Survey Methodology, University of Maryland
Senior Researcher, Institute for Statistics, Ludwig-Maximilians-University of Munich
Anna-Carolina (“Caro”) Haensch is an assistant research professor at the Joint Program in Survey Methodology at the University of Maryland and a Senior Researcher at the Institute for Statistics at LMU Munich, Germany. She is interested in Synthetic Data, Multiple Imputation, Statistics and Data Science training and enjoys teaching quantitative courses.
Tobias Holtdirk
Doctoral Researcher
Leibniz Institute for the Social Sciences – Cologne
Tobias Holtdirk is a doctoral researcher in the Computational Social Science department at GESIS – Leibniz Institute for the Social Sciences in Cologne. Prior to joining GESIS, he studied at RWTH Aachen University, where he graduated with a Bachelor’s degree in Computer Science and a Master’s degree in Computational Social Systems. His current research projects focus on applying Large Language Models (LLMs) to political research, including fine-tuning LLMs to predict voting behaviour based on survey data and conducting an LLM-based analysis of German parliamentary debates.
David Jurgens
Associate Professor in the School of Information
Associate Professor in the department of Computer Science and Engineering
University of Michigan
He obtained his PhD in Computer Science from the University of California, Los Angeles. His research centers on language technologies for social understanding and on behavioral analysis through language.
His work has been recognized by the Cozzarelli Prize from the National Academy of Sciences, the Cialdini Prize from the Society for Personality and Social Psychology, multiple best paper awards and nominations (e.g., ACL, ICWSM), and an NSF CAREER award.
Claire Kelley
Senior Data Scientist, Co-Director for Data Science
Child Trends
Claire Kelley is a senior data scientist and co-director for data science at Child Trends. Her work focuses on the applications of data science and AI techniques to social science problems, particularly those in the domains of education and health. Her current work includes the use of AI to develop customized interactive tools, experiments with natural language processing approaches to qualitative data analysis, and research on how AI impacts educators and students in K-12 classrooms.
Sarah Kelley
Co-Director for Data Science
Child Trends
Sarah Kelley is co-director of data science at Child Trends. Sarah is a full stack data scientist with primary research interests in generative AI, natural language processing, and precision social science. At Child Trends she conducts research focused on the wellbeing of children and youth through the lens of AI and data science techniques. She is currently working on a wide range of projects, including developing interactive AI-powered explanations of risk scores in the juvenile justice system, conducting AI-supported meta-analyses, and experimenting with AI for text summarization at scale.
Max Melchior Lang
Ph.D. Student
University of Oxford
Max Lang is a PhD student in Population Health at the Big Data Institute, University of Oxford. His research focuses on modeling diseases in multimorbid settings and informing interventions in low-income countries. He also develops conversational AI for research and business applications, rethinking qualitative data collection through chat interfaces and interviewing agents that can conduct human-like interviews.
Joshua Y. Lerner
Research Methodologist
NORC at the University of Chicago
Josh Lerner is a Research Methodologist at NORC at the University of Chicago, where he works on problems related to the intersection of AI, NLP, causal inference, and political/social science research. He is currently working on exploring the ways AI and NLP tools can be used to improve aspects of the research pipeline and how this intersects with both survey research and quantitative program evaluation.
Vinodkumar Prabhakaran
Staff Research Scientist
Responsible AI and Human Centered Technologies
Google
Dr. Vinodkumar Prabhakaran is a staff research scientist in Google’s Responsible AI and Human Centered Technologies organization and co-leads the interdisciplinary Technology, AI, Society and Culture (TASC) team. Before Google, he was a postdoc at Stanford University, and he obtained his PhD from Columbia University. His prior research focused on building scalable ways of using language technologies to identify and address large-scale societal issues such as racial disparities in policing, workplace incivility, and online abuse. He has published over 50 articles in top-tier venues such as PNAS, ACL, TACL, NAACL, EMNLP, and FAccT.
Philip Resnik
MPower Professor, Institute for Advanced Computer Studies and Department of Linguistics
University of Maryland
Philip Resnik is a Professor at the University of Maryland in the Department of Linguistics and the Institute for Advanced Computer Studies. He earned his bachelor’s in Computer Science at Harvard and his PhD in Computer and Information Science at the University of Pennsylvania, and does research in computational linguistics. Prior to joining UMD, he was an associate scientist at BBN, a graduate summer intern at IBM T.J. Watson Research Center (subsequently awarded an IBM Graduate Fellowship) while at UPenn, and a research scientist at Sun Microsystems Laboratories. In 2020 he was designated a Fellow of the Association for Computational Linguistics.
Tyler Waite
Advisory Data Scientist
AI, Automation & Data Platform
IBM
Tyler Waite is an Advisory Data Scientist with the AI, Automation & Data Platform team at IBM. Tyler has a B.S. in Cognitive Psychology from the University of Utah and an M.A. in Human Factors from the University of Illinois, and studied Information Science and UX Research (ABD) at Indiana University. Tyler’s main focus at IBM is exploring new ways to use Python for survey data analysis and creating dynamic interactive dashboards of the findings using IBM’s Cloud Pak for Data dashboarding tool.
Xinpeng Wang
Ph.D. Student
Ludwig-Maximilians-University of Munich
Xinpeng Wang is a PhD student at Munich AI and NLP (MaiNLP) lab, Ludwig-Maximilians-