Towards An Expressive Model For Safe and Efficient Manipulation of Graph Databases

Publisher

University of Tlemcen

Abstract

Over the past decade, graph databases have come into widespread use for modeling and analyzing highly interconnected data, because the relational model quickly reaches its limits when such data must be queried and analyzed. Despite that, the relational model still dominates on the Web. The first motivation of this thesis is to bridge this gap by presenting a framework for mapping and efficiently querying relational data within a graph paradigm, with contributions spanning theoretical foundations, scalability solutions, and practical implementations. First, we establish a rigorous formal foundation by introducing a complete mapping process that maps any relational database, along with its schema, data, constraints, and operations, to an equivalent graph database. This process is the first to holistically guarantee critical mapping properties: information and semantic preservation to prevent loss of meaning; query and update preservation through algorithms that automatically convert SQL (the relational query language) to Cypher (a graph query language); and monotonicity for efficient incremental updates. Next, we synthesize these foundations into R2G-ETL, an end-to-end software tool that automates the complete mapping process from relational systems to graph systems. As the first tool to enforce all formal preservation properties, it provides a validated and practical pipeline for real-world adoption.

The second contribution of this thesis addresses the computational complexity of graph pattern matching on large-scale data. Because graph querying is based on subgraph isomorphism, an NP-complete problem, querying large graph data can be time-consuming even with efficient systems such as Neo4j. One possible solution is to distribute the graph data across separate machines, execute the query on each machine, and then collect and merge the results. To our knowledge, no prior work has proposed a complete formalization of this problem. We therefore formalize a distributed querying framework for Cypher, the most widely used graph query language. The framework is centered on three algorithms: the first fragments the user query for distributed execution; the second assembles the partial results returned by each fragment; and the core algorithm coordinates these operations between the coordinator and the fragments, managing them in a parallel, multi-threaded manner. The distributed framework can be integrated seamlessly into any graph database system, such as Neo4j, overcoming the limitations of previous approaches.
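
The distribute, execute, collect, and merge cycle described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithms: the fragments are hypothetical in-memory edge lists rather than Neo4j instances, the "query" is a single-label edge match rather than a Cypher pattern, and all names (`FRAGMENTS`, `run_on_fragment`, `merge`, `distributed_query`) are assumptions made for illustration. It only shows the general shape of a coordinator that scatters work to fragments in parallel threads and assembles the partial results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy data: each fragment holds (source, label, target) edges.
# One edge is duplicated across fragments on purpose to exercise the merge step.
FRAGMENTS = [
    [("alice", "KNOWS", "bob"), ("bob", "WORKS_AT", "acme")],
    [("bob", "KNOWS", "carol"), ("alice", "KNOWS", "bob")],
    [("carol", "WORKS_AT", "acme")],
]


def run_on_fragment(fragment, label):
    """Stand-in for running a query on one machine: match edges by label."""
    return [edge for edge in fragment if edge[1] == label]


def merge(partials):
    """Assemble partial results: drop duplicates, sort for a stable order."""
    return sorted({edge for part in partials for edge in part})


def distributed_query(fragments, label, max_workers=4):
    """Coordinator: scatter the query to every fragment in parallel, then gather."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda f: run_on_fragment(f, label), fragments))
    return merge(partials)


if __name__ == "__main__":
    print(distributed_query(FRAGMENTS, "KNOWS"))
```

This toy only matches single edges, so every match lives entirely inside one fragment. Patterns that span several fragments are the harder case, and they are what the query fragmentation and result assembly algorithms summarized in the abstract are meant to handle.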
