Research Outputs

Papers and Patents

[International Paper] [ Year 2 ] Improving Multimodal API Prediction via Adding Dialog State and Various Multimodal Gates
  • Published in: DSTC9, AAAI 2021 Workshop
  • Registered: 2021.05.12

In Dialog System Technology Challenge 9 (DSTC9) Track 4, Situated Interactive MultiModal Conversational AI (SIMMC), three subtasks were proposed. The data describe customers and assistants shopping in a Virtual Reality environment, and the subtasks perform API prediction, response generation, and dialog state tracking for the conversations between customer and assistant. To solve DSTC9 Track 4 subtask 1, we propose a new model that adds a Multimodal Adaptation Gate (MAG) and a MultiModal Interaction module (MMI) to an LSTM encoder and memory network. MAG and MMI determine how much of the text and metadata information (product images and product information) to take in, and which parts of that information to focus on through an attention mechanism. In addition, we feed the dialog acts and slots generated by GPT-2 into the model as embeddings. As a result, for the furniture and fashion domains, the model improves action accuracy by 3.13% and action attribute accuracy by 4.86% on average on subtask 1, which performs API prediction.
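
To make the gating idea concrete, the following is a minimal PyTorch sketch of a MAG-style fusion layer, not the authors' actual implementation; the class and parameter names (MultimodalAdaptationGate, text_dim, meta_dim) are illustrative assumptions. It shows how a sigmoid gate can control how much of a product-metadata representation is blended into the text representation before encoding.

```python
import torch
import torch.nn as nn

class MultimodalAdaptationGate(nn.Module):
    """Sketch of a MAG-style fusion layer: a sigmoid gate decides how much of
    the metadata (e.g., product image/attribute) vector is mixed into the
    text vector before it is passed on to the encoder."""

    def __init__(self, text_dim: int, meta_dim: int):
        super().__init__()
        self.gate = nn.Linear(text_dim + meta_dim, 1)    # how much metadata to admit
        self.meta_proj = nn.Linear(meta_dim, text_dim)   # map metadata into text space
        self.layer_norm = nn.LayerNorm(text_dim)

    def forward(self, h_text: torch.Tensor, h_meta: torch.Tensor) -> torch.Tensor:
        # h_text: (batch, text_dim), h_meta: (batch, meta_dim)
        g = torch.sigmoid(self.gate(torch.cat([h_text, h_meta], dim=-1)))
        shift = g * self.meta_proj(h_meta)               # gated metadata contribution
        return self.layer_norm(h_text + shift)


# Toy usage: fuse a 256-d utterance vector with a 128-d product-metadata vector.
fusion = MultimodalAdaptationGate(text_dim=256, meta_dim=128)
fused = fusion(torch.randn(4, 256), torch.randn(4, 128))
print(fused.shape)  # torch.Size([4, 256])
```

In this sketch the gate is a single scalar per example; the MMI module described above would additionally apply attention over the metadata elements to decide which parts to focus on.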