Dissertation theses

Processing and searching in unstructured databases

Author: Ing. Roman Čerešňák
Supervisor: doc. Ing. Michal Kvet, PhD.
Date of defence: 19 August 2021
Study programme: Applied Informatics
Reviewer 1: doc. Ing. Jarmila ŠKRINÁROVÁ, PhD.
Reviewer 2: prof. Ing. Marcel HARAKAĽ, PhD.

Slovak abstract:

ČEREŠŇÁK, Roman: Processing and searching in unstructured databases [dissertation thesis] - University of Žilina. Faculty of Management Science and Informatics; Department of Informatics. - Supervisor: doc. Ing. Michal Kvet, PhD. - Qualification level: Doctor of Philosophy in the study field Informatics. Žilina: FRI ŽU in Žilina, 2021. The dissertation addresses the management and comprehensive design of a migration mechanism that manages the transformation rules needed to move not only data but also the schema from relational to non-relational databases. If necessary, the mechanism can also be applied in the opposite direction, from a non-relational database to a relational one. Analysing the existing options, we concluded that leaving data and data structure unconstrained in non-relational databases causes excessive inconsistency; when a transformation is required, this leads to lengthy waits for consolidation and thus prolongs the overall change. One advantage of the methods we propose is automatic consolidation of data based on a specified granularity, which reduces the time needed for transformation. Another advantage is the definition of rules that resolve mismatches between data types and the subsequent data conversion. We also defined a mechanism by which data entering the process ("real-time data") can influence data already accumulated in the system. We further created a concept of automatically adapting processes according to performance, which optimises parallelism and lets us add and remove running processes. A versioning method was also designed; in the event of a problem it substantially reduces the time lost, because the process can restart from the point of failure. The final part of the thesis is devoted to speeding up search in the non-relational databases MongoDB and DynamoDB, creating temporal indexes in memory, and improving the implementation of artificial intelligence. We experimentally estimated a rule for applying artificial-intelligence search to large volumes of data ("Big Data").
Key words: Structured database, unstructured database, big data, migration mechanism, search in unstructured databases, parallelism, influencing data in real time.

English abstract:

ČEREŠŇÁK, Roman: Processing and searching in unstructured databases [dissertation thesis] - University of Žilina. Faculty of Management Science and Informatics; Department of Informatics. - Supervisor: doc. Ing. Michal Kvet, PhD. - Qualification level: Philosophiae doctor in the study field Informatics. Žilina: FRI ŽU in Žilina, 2021. The main objective of this dissertation thesis is the management and comprehensive design of a migration mechanism that manages the transformation rules used to transfer not only data but also the schema from relational to non-relational databases. The mechanism can also be applied in the other direction, from a non-relational database to a relational one. Analysing the existing solutions, we concluded that preserving the flexibility of data and data structure in non-relational databases causes enormous inconsistencies. When a transformation is needed, these inconsistencies lengthen it, because the data are consolidated inefficiently. The advantages of our original methods are automatic consolidation of a database at a given granularity and a shorter transformation process. A further advantage lies in the definition of rules that resolve inconsistencies between data types and the data changes that follow. We also define a mechanism by which data entering the process while it runs (real-time data) can influence data already accumulated in the system. We created a concept of automatic scaling of processes based on performance, which we used to optimise parallelism; with this mechanism we can add or remove running processes. A new versioning method was also designed and implemented; it allows the process to restart from the point of failure, which reduces the time required as well. Finally, we deal with improving query performance in the non-relational databases MongoDB and DynamoDB, creating temporal in-memory indexes, and improving search performance with machine learning. We experimentally estimated a rule for applying machine learning to Big Data.
Key words: Structured database, unstructured database, big data, migration mechanism, search in unstructured databases, parallelism, influencing data in real time.

Dissertation summary (autoreferát)
Full text of the thesis
