
Corpus DIA1900: its Conception and Building
dc.contributor.author: Benešová, Lucie
dc.contributor.author: Kučera, Karel
dc.contributor.author: Najbrtová, Kateřina
dc.contributor.author: Pivoňková, Klára
dc.contributor.author: Stluka, Martin
dc.date.accessioned: 2023-07-19T12:57:57Z
dc.date.available: 2023-07-19T12:57:57Z
dc.date.issued: 2023
dc.identifier.uri: http://hdl.handle.net/20.500.11956/183037
dc.language.iso: cs_CZ
dc.publisher: Univerzita Karlova, Filozofická fakulta (Charles University, Faculty of Arts)
dc.subject: diachronic corpus
dc.subject: 19th-century Czech
dc.subject: morphological dictionary
dc.subject: lemmatization
dc.subject: morphological tagging
dc.subject: tagset
dc.title: Korpus DIA1900: jeho koncepce a vytváření
dc.type: Research article
dcterms.accessRights: openAccess
dcterms.license: http://creativecommons.org/licenses/by-nc-nd/2.0/
dc.title.translated: Corpus DIA1900: its Conception and Building
uk.abstract.en: The objective of the paper is to describe the principles for building the one-million-word DIA1900 Corpus, which consists of Czech texts published between 1851 and 1900 and is designed to be both balanced and representative. Two main goals determine the methods of corpus building and the decision to develop new tools tailored to the special needs of 19th-century Czech: 1) to present the variability of Czech in the second half of the 19th century (including spelling, morphology, and word-formation) and 2) to link the detected variants to the appropriate lemmas. The paper presents the phases of text processing, including transcription and manual pre-annotation, as well as the construction of a large morphological dictionary and the selection of a suitable set of paradigms. Further sections focus on annotation, morphological tagging, and manual disambiguation. The objective was to create a gold standard intended for use in the automatic annotation of both the DIA1900 corpus and the planned corpus of Czech texts from the years 1800–1850.
dc.publisher.publicationPlace: Prague
uk.internal-type: uk_publication
dc.identifier.doi: 10.14712/23366591.2023.1.8
dc.description.startPage: 121
dc.description.endPage: 140
dcterms.isPartOf.name: Časopis pro moderní filologii
dcterms.isPartOf.journalYear: 2023
dcterms.isPartOf.journalVolume: 2023
dcterms.isPartOf.journalIssue: 1
dcterms.isPartOf.issn: 2336-6591
dc.relation.isPartOfUrl: https://casopispromodernifilologii.ff.cuni.cz


© 2017 Charles University, Central Library, Ovocný trh 560/5, 116 36 Praha 1; email: admin-repozitar [at] cuni.cz

Each constituent part of Charles University is responsible for adherence to all provisions of the copyright law.

Notice: Any retrieved information shall not be used for any commercial purposes or claimed as results of studying, scientific or any other creative activities of any person other than the author.
