OBME 2140 - Data management and data analysis for social science

This course introduces practical tools and methods for digital social science research. The first three sessions cover searching for data and references online and building structured bibliographies with Zotero. The core of the course is an introduction to the principles of "tidy data" as implemented in the R language and to the grammar of data visualization with ggplot. Strong emphasis is placed on the specific challenges of social science data formats, such as transforming wide survey data into a long format or interpreting a codebook. The final sessions present several applied uses of R programming, such as web scraping, geographical analysis or text mining.

Learning Outcomes

1. Basics of data analysis and data wrangling with R and the tidyverse

2. Basics of data visualization with ggplot

3. Reference management with Zotero

4. Writing data reports and code notebooks with R

Professional Skills

All the learning outcomes correspond to applied skills in a wide array of professional settings:

- Exploring and merging large datasets

- Producing high-quality data visualizations

- Writing interactive data reports

- Creating and curating a shared bibliographic database

Pierre-Carl LANGLAIS
Seminar
English
- In Class Presence: 2 hours a week / 24 hours a semester

- Online learning activities: 1 hour a week / 12 hours a semester

- Reading and Preparation for Class: 6 hours a week / 72 hours a semester

- Research and Preparation for Group Work: 2 hours a week / 24 hours a semester

- Research and Writing for Individual Assessments: 1.5 hours a week / 18 hours a semester

No prerequisites. The course is suited to beginners and does not require any previous background in programming.
Autumn and Spring 2024-2025
Evaluation is based on four assignments:

- Create and manage a Zotero library, a small-group assignment due by the end of September.

- Write an R script to analyze a predefined dataset, an individual assignment due by mid-October.

- Write an R script to analyze an original dataset, an individual assignment due by early November.

- Create a collective data notebook on an original dataset, a small-group assignment due by the end of November.

Each assignment counts for one quarter of the final grade.

The course adopts a hybrid format: one pre-recorded lecture per week, plus a collective interactive session on Zoom that covers the topics and issues raised in the course and provides continuous feedback on students' individual and group projects.

Course requirements focus on “real-life” situations (such as “cleaning” a dataset) that students are likely to encounter while writing their master's thesis or afterwards in professional life.

Hadley Wickham, Tidy Data, Journal of Statistical Software, vol. 59, n°10, 2014
Julia Silge & David Robinson, Text Mining with R: A Tidy Approach, O'Reilly, 2018, https://www.tidytextmining.com/
Hadley Wickham & Garrett Grolemund, R for Data Science, https://r4ds.had.co.nz/
Kieran Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2018