Korpus spontánní mluvené češtiny ORAL2013
The Corpus of Spontaneous Spoken Czech ORAL 2013
Research article
Charles University, Faculty of Arts, Prague
Source document
Časopis pro moderní filologii (Journal for Modern Philology) (web), ISSN: 2336-6591
Periodical publication year: 2015
Periodical volume: 2015
Periodical issue: 1
License terms
https://creativecommons.org/licenses/by-nc-nd/2.0/
Keywords (Czech)
jazykový korpus, složení korpusu, spontánní mluvený jazyk, čeština, transkripce
Keywords (English)
language corpus, corpus design, spontaneous spoken language, Czech, transcription
The paper presents ORAL2013, a corpus of spontaneous spoken Czech, together with its design principles and the practical solutions adopted during data collection. The corpus is designed to represent contemporary spontaneous spoken language used in informal, real-life situations across the whole of the Czech Republic. It consists of audio recordings and their transcriptions aligned with time stamps, and it features manual annotation and broad regional coverage with a large variety of speakers. ORAL2013 contains 835 recordings made between 2008 and 2011 with 2,544 speakers (of whom 1,297 are unique); the total length of the audio tracks is almost 300 hours, and the transcriptions total more than 3.28 million tokens. ORAL2013 is made publicly available by the Czech National Corpus at http://www.korpus.cz/.