A contrastive description of English and Czech using the methodology of n-gram extraction

Šebestová, Denisa

Kontrastivní popis angličtiny a češtiny s využitím metodologie n-gramů

dissertation thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/178511

Identifiers

Study Information System: 194144

Consultant

Milička, Jiří

Referee

Březina, Václav

Kopřivová, Marie

Faculty / Institute

Faculty of Arts

Discipline

English Language

Department

Institute of Linguistics

Date of defense

19. 9. 2022

Publisher

Univerzita Karlova, Filozofická fakulta

Language

English

Grade

Pass

Abstract (translated from Czech)

This dissertation examines phraseological patterns in three registers (parliamentary debates, newspapers and children's fiction) in English and Czech. It identifies and analyses recurrent word sequences using the n-gram method. The aim is to describe the phraseological properties of each register and to compare them cross-linguistically. The thesis also explores how the n-gram method can be adapted to the typological properties of the languages under study. Czech places higher demands on the method in this respect because of its high degree of morphological and word-order variability. The thesis consists of three case studies, each devoted to a different register. The first study applies and then compares n-grams of different lengths. It examines a small corpus of a narrowly specialised register, parliamentary debates. The study concludes that a comprehensive description of the register is best achieved by combining n-grams of different lengths. This, however, poses a methodological problem: n-grams of different lengths overlap extensively, which prevents precise quantification and causes some n-grams to be represented more than once. The study describes the various functions of phraseological patterns and notes their role in the organisation of discourse. The second study is devoted to newspaper texts and focuses on n-grams containing one of a set of preselected prepositions; for the study of phraseological patterns, prepositions are...

Abstract (English)

This dissertation examines phraseological patterns in three registers (parliamentary debates, newspaper reporting, children's fiction) in English and Czech. It identifies and analyses recurrent word sequences through n-gram extraction, aiming to characterise the phraseology of each register and compare them cross-linguistically, while observing how the n-gram method can be adapted to accommodate the typological properties of each language. Czech is particularly challenging in this respect due to its morphological and positional variability. The dissertation comprises three case studies, each focussed on a different register. The first case study explores different n-gram lengths using a small corpus of a specialised register, parliamentary debates, and suggests that different lengths should be combined for a comprehensive register characterisation. It notes the importance of discourse-structuring patterns and the problem of overlaps between n-grams. In the newspaper study, I extract n-grams containing prepositions, a convenient starting point given their frequency and involvement in text-structuring. N-grams are complemented with collocation analysis, revealing some evaluative prosodies and semantic preferences of patterns and suggesting that the newspaper register is not purely...
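As a rough illustration of what the abstract calls n-gram extraction and the problem of overlaps between n-grams, the following minimal Python sketch counts recurrent word sequences of several lengths and checks whether a shorter n-gram is contained in a longer one. It is not the procedure used in the dissertation: the function names, the frequency threshold and the toy sentence are all assumptions made up for this example.

```python
from collections import Counter


def extract_ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def recurrent_ngrams(tokens, lengths=(2, 3, 4), min_freq=2):
    """Count n-grams of each length; keep those seen at least min_freq times.

    The lengths and the threshold are illustrative assumptions only.
    """
    result = {}
    for n in lengths:
        counts = Counter(extract_ngrams(tokens, n))
        result[n] = {gram: c for gram, c in counts.items() if c >= min_freq}
    return result


def contained_in_longer(short, long_grams):
    """True if the shorter n-gram occurs inside any of the longer n-grams."""
    k = len(short)
    return any(
        short == lg[i:i + k]
        for lg in long_grams
        for i in range(len(lg) - k + 1)
    )


# Toy input invented for this sketch (not corpus data).
tokens = (
    "i would like to thank the minister "
    "and i would like to ask the minister"
).split()

patterns = recurrent_ngrams(tokens)

for n, grams in patterns.items():
    for gram, freq in grams.items():
        print(n, " ".join(gram), freq)
```

In this toy input the bigram "i would" is also part of a recurrent trigram and 4-gram, so counting every length separately represents the same stretch of text more than once, whereas "the minister" recurs only as a bigram.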
