Measuring readability of technical texts

Kriukova, Anna

Czech title: Měření čitelnosti odborných textů

Master's thesis (DEFENDED)

Permanent link

http://hdl.handle.net/20.500.11956/175521

Identifiers

SIS: 245686

Thesis opponent

Vidová Hladká, Barbora

Faculty / unit

Faculty of Mathematics and Physics

Field of study

Computer Science - Language Technologies and Computational Linguistics

Department / institute / clinic

Institute of Formal and Applied Linguistics

Date of defense

2 September 2022

Publisher

Charles University, Faculty of Mathematics and Physics

Language

English

Grade

Excellent

Keywords (Czech)

srozumitelnost|čitelnost|datová analýza|korpusová lingvistika

Keywords (English)

readability|technical texts|data analytics|corpus linguistics|comprehensibility

Title: Measuring readability of technical texts
Author: Anna Kriukova
Faculty of Mathematics and Physics: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Silvie Cinková, Ph.D., Institute of Formal and Applied Linguistics

Abstract: This research explores various approaches to measuring the readability of technical texts. The data I work with is provided by Hyperskill, an online educational platform dedicated mostly to Computer Science, where I did my internship. In the first part of my research, I examine classical readability formulas and try to find correlations between their values and the user statistics available for the texts. The results show no high correlations, so the standard formulas are not suitable for the task. The second part of the research is dedicated to experiments with machine learning algorithms. First, I use four sets of features to predict the average rating, completion time, and completion rate of a step. Then, I introduce a rule-based algorithm that relies on students' comments to split the texts into well-written and poorly written ones. However, a binary classifier trained on this division performs poorly and is not used in the final pipeline. The system suggested as the outcome of my work employs the user statistics' prediction for new texts and...
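The first experiment described in the abstract (scoring texts with a classical readability formula and correlating the scores with user statistics) can be illustrated with a minimal sketch. It is not the thesis's implementation: the choice of Flesch Reading Ease as the formula, the vowel-group syllable heuristic, and the function names `count_syllables`, `flesch_reading_ease`, and `pearson` are all assumptions made for illustration.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)
```

Under these assumptions, one would compute `flesch_reading_ease` for each text, collect a user statistic such as the average rating for the same texts, and pass both lists to `pearson`. A coefficient near zero would correspond to the "no high correlations" outcome the abstract reports.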
