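Soft-hard data augmentation and interactive attention mixed spatiotemporal convolutional network for multimodal emotion recognition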

CLC number: TP391  Document code: A  Article ID: 1001-3695(2026)04-008-1028-10
doi:10.19734/j.issn.1001-3695.2025.08.0300
Multimodal emotion recognition with soft-hard data augmentation and interactive attention mixed spatiotemporal convolutional network
Zhao Kun1, Meng Fan1, Li Wenqiang2,3a,3b, Tang Guilin1, Xie Yaocong1† (1. School of Information & Control Engineering, Qingdao University of Technology, Qingdao, Shandong [postal code illegible], China; 2. Henan Mental Hospital [remainder of affiliations 2, 3a, and 3b illegible], Henan 453002, China)
Abstract: Existing multimodal emotion recognition methods suffer from complex computation, limited cross-modal collaborative modeling capabilities, and insufficient generalization in scenarios with incomplete modalities. To address these issues, this paper developed a novel model named hybrid spatiotemporal convolutional network with interactive attention and soft-hard data augmentation (MSCN-IA-SHDA). The proposed model first utilized MSCN to extract deep spatiotemporal features from the speech and vision modalities in parallel. Subsequently, the model introduced a gated IA mechanism to achieve gated cross-modal feature fusion. Finally, it incorporated the SHDA strategy to enhance the model's robustness against modal incompleteness. The experimental results on the RAVDESS, CH-SIMS, and CH-SIMSv2 datasets show that the MSCN-IA-SHDA model maintains a low computational complexity while ensuring recognition accuracy, and the accuracy rates of sentiment classification are 88.43%, 74.32%, and 77.87%, respectively, verifying the effectiveness of the proposed method.
Keywords: multimodal emotion recognition; soft-hard data augmentation; mixed spatiotemporal convolution; attention mechanism; lightweight
0 Introduction
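In recent years, multimodal emotion recognition (MER) has made significant progress [1,2]. Its core idea is to fuse information from multiple sources (such as text, speech, and vision) to improve the accuracy of emotion recognition, and it has therefore attracted wide attention in the field of affective computing [3].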