Towards An Expressive Model For Safe and Efficient Manipulation of Graph Databases
Publisher
University of Tlemcen
Abstract
Over the past decade, graph databases have become widely used for modeling
and analyzing highly interconnected data, as the relational model quickly
reaches its limits when it comes to querying and analyzing such
data. Despite that, the relational model still dominates on the Web. The first
motivation of this thesis is to bridge this gap by presenting a framework for
mapping and efficiently querying relational data within a graph paradigm,
with contributions spanning theoretical foundations, scalability solutions, and
practical implementations. First, we establish a rigorous formal foundation by
introducing a complete mapping process that maps any relational database,
along with its schema, data, constraints, and operations, to an equivalent
graph database. This process is the first to holistically guarantee critical
mapping properties: information and semantic preservation to prevent loss of
meaning; query and update preservation through algorithms that automatically
convert SQL (relational query language) to Cypher (graph query language);
and monotonicity for efficient incremental updates. Next, we synthesize these
foundations into R2G-ETL, an end-to-end software tool that automates the
complete mapping process from relational systems to graph systems. As the
first tool to enforce all formal preservation properties, it provides a validated
and practical pipeline for real-world adoption. The second contribution of this
thesis deals with the computational complexity of graph pattern matching
on large-scale data. Because graph querying is based on subgraph
isomorphism, an NP-complete problem, querying a large graph may be
time-consuming even with efficient systems such as Neo4j. One
possible solution is to distribute the graph data across separate machines,
execute the query on each machine, and then collect and merge the results. To our
knowledge, no prior work has proposed a complete formalization of this problem. We
therefore formalize a distributed querying framework for Cypher, the most widely
used graph query language. This framework is centered on three algorithms: the
first handles fragmentation of the user query for distributed execution; the second
assembles the partial results returned from each fragment; and
the core algorithm coordinates all these operations between the coordinator
and the fragments, managing them in a parallel, multi-threaded manner. The
distributed framework can be integrated seamlessly with any graph database
system, such as Neo4j, overcoming the limitations of previous approaches.
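
The three roles described above (query fragmentation, partial-result assembly, and a multi-threaded coordinator) can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it assumes a toy setting in which the graph is a list of (source, label, target) edges partitioned across in-memory fragments, the query is a simple path of edge labels, and fragmentation means splitting that path into one-edge sub-queries. All function names (fragment_query, run_local, assemble, coordinate) are hypothetical, and the join uses plain path semantics rather than Cypher's full matching rules.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fragment_query(path_labels):
    """Split a path pattern into one-edge sub-queries (assumed strategy)."""
    return list(path_labels)


def run_local(fragment, label):
    """Evaluate a one-edge sub-query on a single data fragment."""
    return {(s, t) for (s, l, t) in fragment if l == label}


def assemble(per_hop):
    """Join hop results on their shared endpoint to rebuild full paths."""
    if not per_hop:
        return []
    paths = sorted(per_hop[0])
    for hop in per_hop[1:]:
        hop_rows = sorted(hop)
        paths = [p + (t,) for p in paths for (s, t) in hop_rows if s == p[-1]]
    return paths


def coordinate(fragments, path_labels):
    """Run every sub-query on every fragment in parallel, then assemble."""
    sub_queries = fragment_query(path_labels)
    # One task per (sub-query index, fragment index) pair.
    tasks = list(product(range(len(sub_queries)), range(len(fragments))))
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        results = list(
            pool.map(lambda t: run_local(fragments[t[1]], sub_queries[t[0]]), tasks)
        )
    # Union the partial results of each sub-query across all fragments.
    per_hop = [set() for _ in sub_queries]
    for (q, _), rows in zip(tasks, results):
        per_hop[q] |= rows
    return assemble(per_hop)


# Toy data: the matching path crosses the two fragments.
fragments = [
    [("alice", "KNOWS", "bob")],
    [("bob", "WORKS_AT", "acme"), ("carol", "KNOWS", "dave")],
]
matches = coordinate(fragments, ["KNOWS", "WORKS_AT"])
print(matches)
```

In this toy setting the pattern corresponds roughly to the Cypher path (a)-[:KNOWS]->(b)-[:WORKS_AT]->(c). Splitting the query into single-edge pieces lets a match be found even when its edges live on different fragments, at the cost of shipping more intermediate rows to the coordinator.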