Shluky silně podobných textů

Diviš, Jiří

Clusters of closely related documents

dc.contributor.advisor	Holub, Martin
dc.creator	Diviš, Jiří
dc.date.accessioned	2017-03-31T09:41:43Z
dc.date.available	2017-03-31T09:41:43Z
dc.date.issued	2007
dc.identifier.uri	http://hdl.handle.net/20.500.11956/8139
dc.description.abstract	This thesis focuses on automatically finding clusters of topically similar texts in large text collections. We introduce an algorithm for finding the clusters and a method for optimizing its parameters using machine learning techniques. The algorithm is implemented and experimentally evaluated. For evaluation we use a manually annotated collection of Czech documents, which contains a set of sample clusters chosen and tagged by a human annotator, and a large collection of newspaper articles. Experiments show that the output of our algorithm fulfills our expectations and yields clusters of topically similar texts.	en_US
dc.description.abstract	Práce se věnuje automatizovanému hledání shluků tématicky podobných textových dokumentů v rozsáhlých textových kolekcích. V práci je navržen algoritmus pro nalezení těchto shluků a metoda pro optimalizaci jeho parametrů pomocí strojového učení. Byla provedena implementace a experimentální ověření funkčnosti navrženého postupu. Pro evaluaci je využita ručně anotovaná kolekce českých dokumentů obsahující množinu vzorových shluků a dále obsáhlá kolekce novinových článků. Provedené experimenty ukazují, že výstupem navrženého algoritmu jsou požadované shluky tématicky podobných textů.	cs_CZ
dc.language	Čeština	cs_CZ
dc.language.iso	cs_CZ
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.title	Shluky silně podobných textů	cs_CZ
dc.type	diplomová práce	cs_CZ
dcterms.created	2007
dcterms.dateAccepted	2007-02-05
dc.description.department	Ústav formální a aplikované lingvistiky	cs_CZ
dc.description.department	Institute of Formal and Applied Linguistics	en_US
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	40246
dc.title.translated	Clusters of closely related documents	en_US
dc.contributor.referee	Húsek, Dušan
dc.identifier.aleph	000858938
thesis.degree.name	Mgr.
thesis.degree.level	magisterské	cs_CZ
thesis.degree.discipline	Softwarové systémy	cs_CZ
thesis.degree.discipline	Software systems	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Informatics	en_US
uk.thesis.type	diplomová práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Ústav formální a aplikované lingvistiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Institute of Formal and Applied Linguistics	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Softwarové systémy	cs_CZ
uk.degree-discipline.en	Software systems	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Informatics	en_US
thesis.grade.cs	Výborně	cs_CZ
thesis.grade.en	Excellent	en_US
uk.abstract.cs	Práce se věnuje automatizovanému hledání shluků tématicky podobných textových dokumentů v rozsáhlých textových kolekcích. V práci je navržen algoritmus pro nalezení těchto shluků a metoda pro optimalizaci jeho parametrů pomocí strojového učení. Byla provedena implementace a experimentální ověření funkčnosti navrženého postupu. Pro evaluaci je využita ručně anotovaná kolekce českých dokumentů obsahující množinu vzorových shluků a dále obsáhlá kolekce novinových článků. Provedené experimenty ukazují, že výstupem navrženého algoritmu jsou požadované shluky tématicky podobných textů.	cs_CZ
uk.abstract.en	This thesis focuses on automatically finding clusters of topically similar texts in large text collections. We introduce an algorithm for finding the clusters and a method for optimizing its parameters using machine learning techniques. The algorithm is implemented and experimentally evaluated. For evaluation we use a manually annotated collection of Czech documents, which contains a set of sample clusters chosen and tagged by a human annotator, and a large collection of newspaper articles. Experiments show that the output of our algorithm fulfills our expectations and yields clusters of topically similar texts.	en_US
uk.file-availability	V
uk.publication.place	Praha	cs_CZ
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Ústav formální a aplikované lingvistiky	cs_CZ
dc.identifier.lisID	990008589380106986