Větné reprezentace s interpretací podobnosti

Svobodová, Zuzana

Sentence representations with similarity interpretation

dc.contributor.advisor	Hudeček, Vojtěch
dc.creator	Svobodová, Zuzana
dc.date.accessioned	2024-04-08T09:36:34Z
dc.date.available	2024-04-08T09:36:34Z
dc.date.issued	2024
dc.identifier.uri	http://hdl.handle.net/20.500.11956/188477
dc.description.abstract	Sentence representations - embeddings - obtained from neural network models are the core part of many applications in both academia and industry. Although embeddings reach great results in correlation with human sense of sentence similarity, there is often a lack of explanation for why models choose sentences to be similar. In this thesis, we strive to increase the interpretability of model embeddings by incorporating different semantic sentence level annotations in the learning process. We introduce a model called SBERTslice that produces embeddings that can distinguish nuanced semantic variations in text, including elements like negation, sentiment, named entities, emotional tone, and verb-oriented relation between words in a text. We evaluated SBERTslice embeddings in various text classification and semantic similarity tasks and for a majority of them, SBERTslice outperformed the original SBERT.	en_US
dc.description.abstract	Větné reprezentace - tzv. embeddingy, získané z modelů neuronových sítí, tvoří jádro mnoha aplikací jak v akademickém prostředí, tak v průmyslu. Ačkoliv embeddingy dosahují vynikajících výsledků v korelaci s lidským vnímáním větné podobnosti, často chybí vysvětlení, proč modely rozhodly o větách, že jsou podobné či nepodobné. V této práci se snažíme zvýšit interpretovatelnost embeddingů začleněním různých sémantických anotací do průběhu tréninku modelu. Představujeme takto natrénovaný model SBERTslice, který vytváří embeddingy schopné rozlišovat různé sémantické vlastnosti textu, včetně prvků jako je negace, sentiment, jmenné entity, emocionální tón a sémantické vztahy mezi větným slovesem a dalšími slovy ve větě. Otestovali jsme embeddingy generované modelem SBERTslice v určování sémantické podobnosti vět a klasifikaci textu, kde SBERTslice ve většině případů překonal původní model SBERT.	cs_CZ
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	neuronové sítě\|větné embeddingy	cs_CZ
dc.subject	neural networks\|sentence embeddings	en_US
dc.title	Větné reprezentace s interpretací podobnosti	cs_CZ
dc.type	diplomová práce	cs_CZ
dcterms.created	2024
dcterms.dateAccepted	2024-02-13
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	256681
dc.title.translated	Sentence representations with similarity interpretation	en_US
dc.contributor.referee	Libovický, Jindřich
thesis.degree.name	Mgr.
thesis.degree.level	navazující magisterské	cs_CZ
thesis.degree.discipline	Computer Science - Artificial Intelligence	en_US
thesis.degree.discipline	Informatika - Umělá inteligence	cs_CZ
thesis.degree.program	Computer Science - Artificial Intelligence	en_US
thesis.degree.program	Informatika - Umělá inteligence	cs_CZ
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Informatika - Umělá inteligence	cs_CZ
uk.degree-discipline.en	Computer Science - Artificial Intelligence	en_US
uk.degree-program.cs	Informatika - Umělá inteligence	cs_CZ
uk.degree-program.en	Computer Science - Artificial Intelligence	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Větné reprezentace - tzv. embeddingy, získané z modelů neuronových sítí, tvoří jádro mnoha aplikací jak v akademickém prostředí, tak v průmyslu. Ačkoliv embeddingy dosahují vynikajících výsledků v korelaci s lidským vnímáním větné podobnosti, často chybí vysvětlení, proč modely rozhodly o větách, že jsou podobné či nepodobné. V této práci se snažíme zvýšit interpretovatelnost embeddingů začleněním různých sémantických anotací do průběhu tréninku modelu. Představujeme takto natrénovaný model SBERTslice, který vytváří embeddingy schopné rozlišovat různé sémantické vlastnosti textu, včetně prvků jako je negace, sentiment, jmenné entity, emocionální tón a sémantické vztahy mezi větným slovesem a dalšími slovy ve větě. Otestovali jsme embeddingy generované modelem SBERTslice v určování sémantické podobnosti vět a klasifikaci textu, kde SBERTslice ve většině případů překonal původní model SBERT.	cs_CZ
uk.abstract.en	Sentence representations - embeddings - obtained from neural network models are the core part of many applications in both academia and industry. Although embeddings reach great results in correlation with human sense of sentence similarity, there is often a lack of explanation for why models choose sentences to be similar. In this thesis, we strive to increase the interpretability of model embeddings by incorporating different semantic sentence level annotations in the learning process. We introduce a model called SBERTslice that produces embeddings that can distinguish nuanced semantic variations in text, including elements like negation, sentiment, named entities, emotional tone, and verb-oriented relation between words in a text. We evaluated SBERTslice embeddings in various text classification and semantic similarity tasks and for a majority of them, SBERTslice outperformed the original SBERT.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
thesis.grade.code	1
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O