Grounding Natural Language Inference on Images

Vu Trong, Hoa

Natural language inference using image data (Czech title: Vyvozování v přirozeném jazyce s využitím obrazových dat)

master's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/101573

Identifiers

SIS: 191640

Thesis reviewer

Libovický, Jindřich

Faculty / unit

Faculty of Mathematics and Physics

Field of study

Mathematical Linguistics

Department / institute / clinic

Institute of Formal and Applied Linguistics

Date of defense

11 September 2018

Publisher

Charles University, Faculty of Mathematics and Physics

Language

English

Grade

Very good

Keywords (Czech)

vyvozování v přirozeném jazyce (natural language inference)

Keywords (English)

Grounding Natural Language Inference on Images

Hoa Trong VU, July 20, 2018

Abstract

Despite the surge of research interest in problems involving linguistic and visual information, the use of multimodal data for Natural Language Inference remains unexplored. Natural Language Inference, regarded as the basic step towards Natural Language Understanding, is extremely challenging due to the inherent complexity of human languages. However, we believe this difficulty can be alleviated by using multimodal data. Given an image and its description, our proposed task is to determine whether a natural language hypothesis contradicts, entails, or is neutral with regard to the image and its description. To address this problem, we develop a multimodal framework based on the Bilateral Multi-perspective Matching framework. Data is collected by mapping the SNLI dataset to the Flickr30k image dataset. The resulting dataset, made publicly available, has more than 565k instances. Experiments on this dataset show that the multimodal model outperforms the state-of-the-art textual model.
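
The data construction mentioned in the abstract (linking SNLI records to Flickr30k images) can be illustrated with a minimal sketch. It assumes that each SNLI record carries a `captionID` field of the form `<image file>#<caption index>`, as in the public SNLI JSON-lines release. The helper name `ground_snli_record` and the sample record are hypothetical and are not taken from the thesis, whose actual mapping procedure may differ.

```python
import json


def ground_snli_record(line):
    """Attach a Flickr30k image file name to one SNLI JSON line.

    Assumption: `captionID` looks like "<image file>#<caption index>".
    """
    record = json.loads(line)
    # Everything before the first "#" is treated as the image file name.
    image_file, _, _ = record["captionID"].partition("#")
    return {
        "image": image_file,                # image the premise describes
        "premise": record["sentence1"],     # image description
        "hypothesis": record["sentence2"],  # sentence to be judged
        "label": record["gold_label"],      # entailment / contradiction / neutral
    }


# Hypothetical sample record, written in the SNLI JSON-lines layout.
sample = json.dumps({
    "captionID": "0000000000.jpg#0",
    "sentence1": "A man is playing a guitar on stage.",
    "sentence2": "A person is making music.",
    "gold_label": "entailment",
})

instance = ground_snli_record(sample)
print(instance["image"], instance["label"])
```

Each resulting instance pairs one image and its description with a hypothesis and a three-way label, which matches the task formulation in the abstract.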
