Větné reprezentace s interpretací podobnosti

Svobodová, Zuzana

Sentence representations with similarity interpretation

diploma thesis (DEFENDED)

Files

Záznam o průběhu obhajoby (Record of the defense proceedings), 347.2 KB

Permanent link

http://hdl.handle.net/20.500.11956/188477

Identifiers

Study Information System: 256681

Referee

Libovický, Jindřich

Faculty / Institute

Faculty of Mathematics and Physics

Discipline

Computer Science - Artificial Intelligence

Department

Institute of Formal and Applied Linguistics

Date of defense

13 February 2024

Publisher

Univerzita Karlova, Matematicko-fyzikální fakulta (Charles University, Faculty of Mathematics and Physics)

Language

Czech

Grade

Excellent

Keywords (Czech)

neuronové sítě (neural networks)|větné embeddingy (sentence embeddings)

Keywords (English)

neural networks|sentence embeddings

Abstract (Czech, translated into English)

Sentence representations, so-called embeddings, obtained from neural network models form the core of many applications in both academia and industry. Although embeddings achieve excellent correlation with human perception of sentence similarity, an explanation of why a model judged sentences to be similar or dissimilar is often missing. In this thesis, we aim to increase the interpretability of embeddings by incorporating various semantic annotations into model training. We present a model trained in this way, SBERTslice, which produces embeddings able to distinguish various semantic properties of text, including negation, sentiment, named entities, emotional tone, and the semantic relations between the main verb of a sentence and the other words in it. We evaluated the embeddings generated by SBERTslice on semantic sentence similarity and text classification, where SBERTslice outperformed the original SBERT model in most cases.

Abstract (English)

Sentence representations (embeddings) obtained from neural network models are at the core of many applications in both academia and industry. Although embeddings correlate well with human judgments of sentence similarity, there is often no explanation of why a model considers two sentences similar. In this thesis, we strive to increase the interpretability of model embeddings by incorporating various sentence-level semantic annotations into the learning process. We introduce a model called SBERTslice that produces embeddings able to distinguish nuanced semantic variations in text, including negation, sentiment, named entities, emotional tone, and verb-oriented relations between words in a text. We evaluated SBERTslice embeddings on various text classification and semantic similarity tasks, and on the majority of them SBERTslice outperformed the original SBERT.
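The correlation with human similarity judgments mentioned in the abstract is conventionally measured on STS-style benchmarks: each sentence pair is scored by the cosine similarity of its two embeddings, and those scores are compared with human ratings using Spearman's rank correlation. The sketch below illustrates only that generic evaluation step under stated assumptions: the three-dimensional vectors and the 0 to 5 human ratings are invented toy values, not SBERT or SBERTslice outputs, and the thesis's exact evaluation protocol may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of equal values (ties).
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy embeddings for three sentence pairs (not real model outputs).
pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),  # near-paraphrase
    ([1.0, 0.0, 0.0], [0.5, 0.5, 0.0]),  # partially related
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),  # unrelated
]
human = [4.8, 2.5, 0.2]  # hypothetical human ratings on a 0 to 5 scale

model_scores = [cosine(u, v) for u, v in pairs]
rho = spearman(model_scores, human)
print(f"Spearman rho = {rho:.3f}")  # prints: Spearman rho = 1.000
```

Rank correlation is used rather than raw agreement because only the ordering of pairs matters: a model whose cosine scores are compressed into a narrow range can still match the human ranking perfectly, as in this toy example.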
