Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?
Procedia Computer Science,
Volume 264,
pages 346-355,
doi: 10.1016/j.procs.2025.07.145
- Aug 2025

Emotion recognition in conversations (ERC) focuses on identifying emotion shifts within interactions, representing a significant step toward advancing machine intelligence. However, ERC data remains scarce, and existing datasets face numerous challenges due to their highly biased sources and the inherent subjectivity of soft labels. Even though Large Language Models (LLMs) have demonstrated their quality in many affective tasks, they are typically expensive to train, and their application to ERC tasks, particularly in data generation, remains limited. To address these challenges, we employ a small, resource-efficient, and general-purpose LLM to synthesize ERC datasets with diverse properties, supplementing the three most widely used ERC benchmarks. For each benchmark, we generated two datasets of similar characteristics, totaling six datasets. We evaluate the utility of these datasets to (1) supplement existing datasets for ERC classification, and (2) analyze the effects of label imbalance in ERC. Our experimental results indicate that ERC classifier models trained on the generated datasets exhibit strong robustness and consistently achieve statistically significant performance improvements on existing ERC benchmarks.

@Article{KCW25a,
  author  = {Kaplan, Burak Can and Carneiro, Hugo and Wermter, Stefan},
  title   = {Can Large Language Models Generate Effective Datasets for Emotion Recognition in Conversations?},
  journal = {Procedia Computer Science},
  volume  = {264},
  pages   = {346--355},
  year    = {2025},
  month   = {Aug},
  doi     = {10.1016/j.procs.2025.07.145},
}