This course is an introduction to Computational Social Sciences (CSS). Over the last two
decades, the amount of data available in text format and the methods and computational power
available to analyze it have increased drastically. This course aims to introduce participants to
the use of computational methods to answer questions in the social sciences. The objective is to
demystify these methods and to show how social scientists, both qualitative and quantitative,
can take advantage of data collected from the internet and of text analysis methods. The class
covers a variety of methods used in CSS, including web scraping, text mining, topic
modelling, word embeddings, and supervised machine learning using Transformer models and
LLMs.
Objectives
This class has three main goals. First, we want to help students understand how to address
data limitations by automatically collecting textual data and choosing the right corpora for
their research interests. Second, we aim to expose students to various practical methods for
analyzing text quantitatively, enabling them to conduct their own research and potentially
prepare for more extensive projects like a Master's thesis. Lastly, we hope that students will
improve their programming skills and their grasp of quantitative reasoning, both of which can be
applied to different types of data, not just text.
The course takes place over five days and integrates lectures with hands-on lab sessions
in which students apply the methods. Each two-hour morning session is dedicated to
presenting comprehensive content on the methods employed. This includes an exploration of
the underlying logic, advantages, disadvantages, and typical applications of the methods discussed.
In the afternoon, the first session will focus on demonstrating the practical application
of these methods. The final session of each day is designed to actively engage students in
applying the knowledge they have acquired throughout the day.
Pre-requisites
The course requires basic knowledge of RStudio. In terms of coding skills, this course picks
up where the RStudio lab sessions for the Quantitative Methods II lecture of the School of
Research ended. The final day introduces students to a text analysis application in Python, but
no knowledge of this language is required.