OBME 2195 - Artificial intelligence for corpus analysis in Social science and humanities

The course is an introduction to advanced techniques of corpus analysis based on artificial intelligence and machine learning. The first three introductory sessions describe the core principles of contemporary artificial intelligence algorithms (embeddings, neural networks), the extraction of online textual and visual corpora, and the use of Python (with Google Colab) and R (with RStudio) for corpus analysis. The second series of five sessions is devoted to textual analysis. Most examples in the course will be taken from sources relevant to the social sciences and humanities and from social media (Twitter, Instagram…).

Learning Outcomes

1. Extraction of textual and visual corpora from online sources and social media

2. Automated classification of textual corpora (unsupervised and supervised)

3. Automated classification of visual corpora (supervised)

4. Use of embeddings for semantic analysis of visual and textual corpora

5. Introduction to the latest developments in AI (BERT, CLIP)
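
As a rough illustration of the idea behind outcome 4, and not taken from the course material, the sketch below compares short texts by the cosine similarity of their vectors. It assumes plain word counts as a simple stand-in for learned embeddings such as those produced by BERT or CLIP. The three-document corpus and the function names are hypothetical, invented for this example.

```python
import math
from collections import Counter


def count_vector(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus (assumption: invented for illustration only).
corpus = {
    "doc_a": "the museum opens a new painting exhibition",
    "doc_b": "a new exhibition of painting opens at the museum",
    "doc_c": "the election results were announced last night",
}

vectors = {name: count_vector(text) for name, text in corpus.items()}
sim_ab = cosine_similarity(vectors["doc_a"], vectors["doc_b"])
sim_ac = cosine_similarity(vectors["doc_a"], vectors["doc_c"])

print(f"doc_a vs doc_b: {sim_ab:.2f}")
print(f"doc_a vs doc_c: {sim_ac:.2f}")
```

The two texts about the museum share most of their words and score higher than the unrelated pair. Learned embeddings replace the word counts with dense vectors, so that texts with related meaning but different vocabulary can also end up close together.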

Professional Skills

All the learning outcomes correspond to applied skills used in a wide array of professional settings:

• Retrieve and explore large textual and visual corpora.

• Create supervised models to automate corpus analysis.

• Draft and develop a code notebook for data analysis.

• Become familiar with the latest technological and research trends in artificial intelligence.

Pierre-Carl LANGLAIS
Seminar
English
- In Class Presence: 2 hours a week / 24 hours a semester

- Online learning activities: 1 hour a week / 12 hours a semester

- Reading and Preparation for Class: 6 hours a week / 72 hours a semester

- Research and Preparation for Group Work: 2 hours a week / 24 hours a semester

- Research and Writing for Individual Assessments: 1.5 hours a week / 18 hours a semester

An introduction to at least one major programming language for data science, ideally Python or R. The course has been partly conceived as a continuation of the R course Data Analysis and Data Management for the humanities.
Spring 2022-2023
The evaluation is based on three assignments:

Write a Python or R script to analyze a predefined textual corpus (individual assignment)

Write a Python or R script to analyze an original corpus (individual assignment)

Create a collective data notebook on an original visual or textual corpus (small-group assignment)

Each assignment counts for a third of the final grade.

The course will adopt a hybrid pedagogy: one pre-recorded lecture per week, plus a collective interactive session on Zoom that covers the topics and issues raised by the lecture and provides continuous feedback on the students' personal or collective projects.

Course requirements focus on “real-life” situations and applied examples of AI and machine learning that students are likely to meet in professional or research settings.

1. Julia Silge & David Robinson, Text Mining with R: A Tidy Approach, O'Reilly, 2018, https://www.tidytextmining.com/
2. Melanie Walsh, Introduction to Cultural Analytics & Python, https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html
3. Ankur A. Patel, Ajay Uppili Arasanipalai, Applied Natural Language Processing in the Enterprise, O'Reilly
4. François Chollet, Deep Learning with Python, Manning