Web page analyzer for scraping

Christozov, Valentín

Analyzátor webových stránek pro extrakci dat

dc.contributor.advisor	Macková, Kateřina
dc.creator	Christozov, Valentín
dc.date.accessioned	2023-07-24T18:46:31Z
dc.date.available	2023-07-24T18:46:31Z
dc.date.issued	2023
dc.identifier.uri	http://hdl.handle.net/20.500.11956/183127
dc.description.abstract	Web scraping is a technique used in a variety of applications to extract data from the web. To create a scraper, a developer first needs to analyze the target website using tools such as web devtools or Postman. This analysis is necessary to locate the data and to determine an effective way to scrape the website. The structure of websites varies greatly, so the analysis is tedious and time-consuming. The goal of this project is to create a tool that a non-developer could use to gain insight into where the data is stored on the website and how it can be scraped. The tool analyzes the input website and presents the results in a minimalist user interface. The output of the analysis can be used as a guide for configuring no-code web scraping tools as well as a baseline for developing a web scraper.	en_US
dc.description.abstract	Web scraping je technika používaná ve spoustě aplikací k získání dat z webových stránek. Pro vytvoření scraperu musí vývojář nejdříve provést analýzu webové stránky, ze které chce data stahovat. Tato analýza se dělá pomocí nástrojů jako web devtools nebo Postman a je zapotřebí na nalezení dat a na určení efektivního způsobu jak scrapovat webstránku. Struktury jednotlivých webových stránek se velmi liší, a proto je proces analýzy zdlouhavý a časově náročný. Cílem tohoto projektu je vytvořit nástroj, který by mohl použít i běžný uživatel, aby získal přehled o tom, jak lze data z dané webové stránky efektivně stáhnout. Tento nástroj provede analýzu vstupní webové stránky, jejíž výsledky jsou prezentovány v minimalistickém uživatelském rozhraní. Výstup analýzy může být použitý jako návod na konfiguraci web scrapingových nástrojů bez psaní kódu a rovněž jako podklad pro vývoj webového scraperu.	cs_CZ
dc.language	English	cs_CZ
dc.language.iso	en_US
dc.publisher	Univerzita Karlova, Matematicko-fyzikální fakulta	cs_CZ
dc.subject	web scraping|page analyser	en_US
dc.subject	extrakce dat z webu|analyzátor webových stránek	cs_CZ
dc.title	Web page analyzer for scraping	en_US
dc.type	bakalářská práce	cs_CZ
dcterms.created	2023
dcterms.dateAccepted	2023-06-29
dc.description.department	Katedra teoretické informatiky a matematické logiky	cs_CZ
dc.description.department	Department of Theoretical Computer Science and Mathematical Logic	en_US
dc.description.faculty	Faculty of Mathematics and Physics	en_US
dc.description.faculty	Matematicko-fyzikální fakulta	cs_CZ
dc.identifier.repId	258782
dc.title.translated	Analyzátor webových stránek pro extrakci dat	cs_CZ
dc.contributor.referee	Petříček, Tomáš
thesis.degree.name	Bc.
thesis.degree.level	bakalářské	cs_CZ
thesis.degree.discipline	Programování a softwarové systémy	cs_CZ
thesis.degree.discipline	Programming and Software Systems	en_US
thesis.degree.program	Informatika	cs_CZ
thesis.degree.program	Computer Science	en_US
uk.thesis.type	bakalářská práce	cs_CZ
uk.taxonomy.organization-cs	Matematicko-fyzikální fakulta::Katedra teoretické informatiky a matematické logiky	cs_CZ
uk.taxonomy.organization-en	Faculty of Mathematics and Physics::Department of Theoretical Computer Science and Mathematical Logic	en_US
uk.faculty-name.cs	Matematicko-fyzikální fakulta	cs_CZ
uk.faculty-name.en	Faculty of Mathematics and Physics	en_US
uk.faculty-abbr.cs	MFF	cs_CZ
uk.degree-discipline.cs	Programování a softwarové systémy	cs_CZ
uk.degree-discipline.en	Programming and Software Systems	en_US
uk.degree-program.cs	Informatika	cs_CZ
uk.degree-program.en	Computer Science	en_US
thesis.grade.cs	Dobře	cs_CZ
thesis.grade.en	Good	en_US
uk.abstract.cs	Web scraping je technika používaná ve spoustě aplikací k získání dat z webových stránek. Pro vytvoření scraperu musí vývojář nejdříve provést analýzu webové stránky, ze které chce data stahovat. Tato analýza se dělá pomocí nástrojů jako web devtools nebo Postman a je zapotřebí na nalezení dat a na určení efektivního způsobu jak scrapovat webstránku. Struktury jednotlivých webových stránek se velmi liší, a proto je proces analýzy zdlouhavý a časově náročný. Cílem tohoto projektu je vytvořit nástroj, který by mohl použít i běžný uživatel, aby získal přehled o tom, jak lze data z dané webové stránky efektivně stáhnout. Tento nástroj provede analýzu vstupní webové stránky, jejíž výsledky jsou prezentovány v minimalistickém uživatelském rozhraní. Výstup analýzy může být použitý jako návod na konfiguraci web scrapingových nástrojů bez psaní kódu a rovněž jako podklad pro vývoj webového scraperu.	cs_CZ
uk.abstract.en	Web scraping is a technique used in a variety of applications to extract data from the web. To create a scraper, a developer first needs to analyze the target website using tools such as web devtools or Postman. This analysis is necessary to locate the data and to determine an effective way to scrape the website. The structure of websites varies greatly, so the analysis is tedious and time-consuming. The goal of this project is to create a tool that a non-developer could use to gain insight into where the data is stored on the website and how it can be scraped. The tool analyzes the input website and presents the results in a minimalist user interface. The output of the analysis can be used as a guide for configuring no-code web scraping tools as well as a baseline for developing a web scraper.	en_US
uk.file-availability	V
uk.grantor	Univerzita Karlova, Matematicko-fyzikální fakulta, Katedra teoretické informatiky a matematické logiky	cs_CZ
thesis.grade.code	3
uk.publication-place	Praha	cs_CZ
uk.thesis.defenceStatus	O