Measuring readability of technical texts
Czech title: Měření čitelnosti odborných textů
Master's thesis (defended)
Permanent link: http://hdl.handle.net/20.500.11956/175521
Identifiers: SIS: 245686
Collection: Qualification theses
Author: Kriukova, Anna
Supervisor: Cinková, Silvie
Reviewer: Vidová Hladká, Barbora
Faculty: Faculty of Mathematics and Physics
Field of study: Computer Science - Language Technologies and Computational Linguistics
Department: Institute of Formal and Applied Linguistics
Date of defense: 2 September 2022
Publisher: Charles University, Faculty of Mathematics and Physics
Language: English
Grade: Excellent
Keywords: readability, comprehensibility, technical texts, data analytics, corpus linguistics
Title: Measuring readability of technical texts
Author: Anna Kriukova
Faculty of Mathematics and Physics: Institute of Formal and Applied Linguistics
Supervisor: Mgr. Silvie Cinková, Ph.D., Institute of Formal and Applied Linguistics
Abstract: This research explores various approaches to measuring the readability of technical texts. The data I work with is provided by Hyperskill, an online educational platform dedicated mostly to Computer Science, where I did my internship. In the first part of my research, I examine classical readability formulas and look for correlations between their values and the user statistics available for the texts. The results show no strong correlations; thus, the standard formulas are not suitable for the task. The second part of the research is dedicated to experiments with machine learning algorithms. First, I use four sets of features to predict the average rating, completion time, and completion rate of a step. Then, I introduce a rule-based algorithm that relies on students' comments to split the texts into well-written and poorly written ones. However, a binary classifier trained on this division performs poorly and is not used in the final pipeline. The system proposed as the outcome of my work uses predicted user statistics for new texts and...
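The abstract does not say which classical formulas or which correlation coefficient were used. As one illustration only, the sketch below computes the well-known Flesch Reading Ease score (206.835 - 1.015 x words per sentence - 84.6 x syllables per word) for each text and then a Spearman rank correlation between those scores and a per-text user statistic such as average rating. Assumptions: the syllable counter is a crude vowel-group heuristic, the function names are hypothetical, and the sample texts and ratings in the `__main__` block are invented placeholders, not Hyperskill data.

```python
import re
from math import sqrt


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): count groups of consecutive vowels."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # A final silent 'e' (as in "code") usually does not add a syllable.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    n_sentences = max(len(sentences), 1)
    n_words = max(len(words), 1)
    n_syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))


def rank(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


if __name__ == "__main__":
    # Hypothetical placeholder data, not real platform statistics.
    texts = [
        "A list stores items. You can add items to it.",
        "Polymorphic dispatch resolves method invocations dynamically at runtime.",
        "A loop repeats code. It stops when the condition is false.",
    ]
    avg_ratings = [4.6, 4.1, 4.4]
    scores = [flesch_reading_ease(t) for t in texts]
    print("Spearman rho:", spearman(scores, avg_ratings))
```

A rank correlation is used here because readability scores and ratings need not be linearly related; a value near zero for real data would match the abstract's finding that the standard formulas do not track the user statistics well.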