
CORE talk: AI model finetuning and evaluation for the humanities and social sciences

CORE Talk by Guest Researcher Miguel Escobar Varela (National University of Singapore).

Info about event

Time

Thursday 4 September 2025, 10:00 - 11:00

Location

Jens Chr. Skous Vej 4, Building 1483-524 or via Zoom.

Abstract: The study of sociocultural phenomena often requires classifying images and texts at scale. Commercial AI models are sometimes well suited for these tasks, but they are hard to evaluate and reproduce (companies such as OpenAI can remove older models on a whim) and are prone to unexpected errors. On the other hand, open-weights models can be finetuned to achieve unprecedented levels of nuance and precision for a range of culturally and historically specific tasks. These small, task-constrained models are easier to evaluate, cite and reuse, and they are generally less energy-hungry than their commercial counterparts. Advances in techniques such as parameter-efficient finetuning (PEFT) have also dramatically lowered the difficulty of finetuning models, and it is now possible for small research teams to achieve remarkable accuracy.

The secret sauce for these models is having great training and evaluation data. These datasets don’t need to be large at all, but they must be diverse, consistent and representative. The great news is that the key skills for creating these datasets are attention to detail and context, which are core to the humanities and social sciences. In this talk, I will describe a range of projects that have used finetuning approaches for image and text classification, and best practices used by my research group, the Computational Cultural Heritage Group at the National University of Singapore. While most of my work is about the languages and artistic traditions of Southeast Asia, I am actively looking for collaborations that look at other cultural contexts.

Miguel Escobar Varela is Associate Professor and Deputy Director of the Centre for Computational Social Science and Humanities (CSSH) at the National University of Singapore. He is the author of Theater as Data (University of Michigan Press, 2021) and is associate editor of Computational Humanities Research. Links to papers, code and datasets can be found on his website https://miguelescobar.com

Zoom link: https://aarhusuniversity.zoom.us/j/69776930579