OBME 2140 - Data management and data analysis for social science

Course Description

This course introduces practical tools and methods for digital social science research. The first three sessions cover online searches for data and references and the creation of structured bibliographies with Zotero. The core of the course is an introduction to the principles of "tidy data" as implemented in the R language and to the grammar of data visualization with ggplot. Particular attention is given to the specific challenges of social science data formats, such as transforming wide survey data into a long format or interpreting a codebook. The final sessions present several applied uses of R programming, such as web scraping, geographical analysis, and text mining.

Learning Outcomes

1. Basics of data analysis and data wrangling with R and the tidyverse

2. Basics of data visualization with ggplot

3. Reference management with Zotero

4. Writing data reports and code notebooks with R

Professional Skills

All the learning outcomes correspond to applied skills in a wide array of professional settings:

- Exploring and merging large datasets

- Producing high-quality data visualizations

- Writing interactive data reports

- Creating and curating a shared bibliographic database

Instructor

Pierre-Carl LANGLAIS

Type

Seminar

Language of tuition

English

Workload

- In-Class Presence: 2 hours a week / 24 hours a semester

- Online learning activities: 1 hour a week / 12 hours a semester

- Reading and Preparation for Class: 6 hours a week / 72 hours a semester

- Research and Preparation for Group Work: 2 hours a week / 24 hours a semester

- Research and Writing for Individual Assessments: 1.5 hours a week / 18 hours a semester

Pre-requisite

None. The course is suited to beginners and does not require any previous programming background.

Semester

Autumn and Spring 2024-2025

Course assessment

Assessment is based on four assignments:

- Create and manage a Zotero library: a small-group assignment, due by the end of September.

- Write an R script to analyze a predefined dataset: an individual assignment, due by mid-October.

- Write an R script to analyze an original dataset: an individual assignment, due by early November.

- Create a collective data notebook on an original dataset: a small-group assignment, due by the end of November.

Each assignment counts for one quarter of the final grade.

Pedagogical format

The course adopts a hybrid pedagogy: one pre-recorded lecture per week, complemented by a collective interactive session on Zoom that covers the topics and issues raised in the course and provides continuous feedback on the students' individual or group projects.

Course requirements focus on “real-life” situations (such as “cleaning” a dataset) that students are likely to encounter while writing their master's thesis or later in their professional life.

Recommended readings

Main readings

Hadley Wickham, "Tidy Data", Journal of Statistical Software, vol. 59, no. 10, 2014, pp. 1-23

Julia Silge & David Robinson, Text Mining with R: A Tidy Approach, O'Reilly, 2018, https://www.tidytextmining.com/

Hadley Wickham & Garrett Grolemund, R for Data Science, O'Reilly, 2017, https://r4ds.had.co.nz/

Secondary readings

Kieran Healy, Data Visualization: A Practical Introduction, Princeton University Press, 2018