K2SP 3500 - Decoding biases in Artificial Intelligence
The course invites students to explore issues of discrimination in AI. The deluge of public data and the
advent of deep learning techniques over the last two decades have raised high hopes for designing
more personalized and balanced public policies. However, using machine learning algorithms to make
public decisions about every citizen has drawn fierce criticism from activists and
citizens concerned about the way they are being computed.
In this class, we invite students to get their hands on data and code and investigate how and why public
data and state-of-the-art algorithms, far from being neutral and objective, may inherently produce
discrimination against specific populations. Conversely, we will also use state-of-the-art machine learning
models to audit how they may embed systematic racial or gender biases, while still considering how they
fit into a larger socio-technical context. These questions are particularly pertinent and increasingly complex
with the advent of generative AI models.
The objectives of the class are threefold: (i) discuss the ongoing debates on algorithmic fairness and its
application to data-driven policy, (ii) learn and practice large-scale data collection and manipulation, as well as
the main families of machine learning algorithms, (iii) create your own research design to investigate an original
dataset or existing algorithm and put it to the test in a collective empirical project.
Célia NOURI, Jean-Philippe COINTET
Lecture only
English
- In Class Presence: 2 hours a week / 24 hours a semester
- Online learning activities: 20 minutes a week / 4 hours a semester
- Reading and Preparation for Class: 45 minutes a week / 18 hours a semester
- Research and Preparation for Group Work: 2 hours a week / 24 hours a semester
- Research and Writing for Individual Assessment: 20 minutes a week / 4 hours a semester
Empirical explorations of real-world cases are central to this class. For these hands-on sessions, basic Python coding
skills are required: defining a function, importing an external library (we will use pandas extensively), and
manipulating lists and dictionaries.
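As a rough self-check on these prerequisites, the short sketch below combines them: it imports a module, defines a function, and manipulates a list of dictionaries to compare selection rates between two groups. The records, the group labels, and the `selection_rates` helper are all hypothetical and invented purely for illustration. The standard-library `collections` module stands in here for an external library such as pandas, so that the snippet runs without any installation.

```python
# Standard-library import, standing in for an external library such as pandas
from collections import defaultdict

# Hypothetical toy records, invented purely for illustration
records = [
    {"group": "A", "selected": True},
    {"group": "A", "selected": False},
    {"group": "B", "selected": True},
    {"group": "B", "selected": True},
]


def selection_rates(rows):
    """Return the share of selected rows for each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number selected, total]
    for row in rows:
        counts[row["group"]][0] += int(row["selected"])
        counts[row["group"]][1] += 1
    return {group: selected / total for group, (selected, total) in counts.items()}


rates = selection_rates(records)
print(rates)  # {'A': 0.5, 'B': 1.0}
```

Students who can read and modify a snippet of this kind should be comfortable with the coding level assumed in the hands-on sessions.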
Spring 2024-2025
There will be two main assessments during the semester. The most important one (2/3 of the grade) is a
collective project for which students are required to identify or generate a dataset online, design an
experimental plan to analyze its inherent biases, and finally visualize and reflect upon the systematic
discrimination embedded in the dataset. The final deliverable will take the form of a website. An individual
take-home paper will also be graded. Active participation during the class can also be rewarded with an
extra point.
The pedagogical format is strongly oriented toward a workshop-style class. Typically, the class will start
with a discussion of the reading, followed by a short lecture on the concepts of the session before the
class turns into applied mode, wherein students will practice data coding by themselves.
O'Neil, C. (2016). Weapons of math destruction: How big data increases inequality and threatens democracy. Broadway Books.
Garg, N., Schiebinger, L., Jurafsky, D., & Zou, J. (2018). Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), E3635-E3644.