SQLSIM is a strategy for executing analytical similarity queries and clustering directly within a relational DBMS (PostgreSQL) using user-defined functions (UDFs). Moving the processing logic to the data reduces the impedance mismatch between application code and the database, and it improves performance for specific analytical workloads.
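The sketch below shows the UDF-based idea in a self-contained form. SQLSIM itself targets PostgreSQL with PL/pgSQL, but this example uses Python's built-in sqlite3 module as a stand-in. It registers a distance function as a UDF so that the DBMS evaluates similarity inside the query. It then answers the two classic similarity queries: a range query (all rows within a radius) and a k-nearest-neighbor query. The table `points`, the UDF `dist`, and the helpers `knn_query`/`range_query` are hypothetical illustrations, not functions from main.sql.

```python
import math
import sqlite3


def euclidean(x1: float, y1: float, x2: float, y2: float) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(x1 - x2, y1 - y2)


def build_db() -> sqlite3.Connection:
    """Create an in-memory table and register the distance UDF (hypothetical names)."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function as a 4-argument SQL function named "dist",
    # so the database engine can call it while evaluating WHERE / ORDER BY.
    conn.create_function("dist", 4, euclidean)
    conn.execute("CREATE TABLE points (id INTEGER PRIMARY KEY, x REAL, y REAL)")
    conn.executemany(
        "INSERT INTO points (id, x, y) VALUES (?, ?, ?)",
        [(1, 0.0, 0.0), (2, 1.0, 1.0), (3, 3.0, 4.0), (4, 10.0, 10.0)],
    )
    return conn


def knn_query(conn: sqlite3.Connection, qx: float, qy: float, k: int) -> list:
    """k-NN query: ids of the k rows closest to (qx, qy), ties broken by id."""
    rows = conn.execute(
        "SELECT id FROM points ORDER BY dist(x, y, ?, ?), id LIMIT ?",
        (qx, qy, k),
    )
    return [row[0] for row in rows]


def range_query(conn: sqlite3.Connection, qx: float, qy: float, radius: float) -> list:
    """Range query: ids of all rows within `radius` of (qx, qy)."""
    rows = conn.execute(
        "SELECT id FROM points WHERE dist(x, y, ?, ?) <= ? ORDER BY id",
        (qx, qy, radius),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_db()
    print(knn_query(conn, 0.0, 0.0, 2))      # [1, 2]
    print(range_query(conn, 0.0, 0.0, 6.0))  # [1, 2, 3]
```

The filtering and ordering happen inside the SQL engine, so only the matching ids travel back to the client. This is the data-side processing that the UDF approach aims for. In PostgreSQL the equivalent would be a function created with `CREATE FUNCTION ... LANGUAGE plpgsql` and called the same way from `WHERE` and `ORDER BY`.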
This repository contains the source code and experiments developed for my Master's Thesis in Computer Science at the Federal University of Uberlândia (UFU).
- Thesis Title: SQLSIM: Analytical queries by similarity in relational DBMS
- Full Text: Access via UFU Repository
- Author: Lívio Mendonça
- Advisor: Prof. Dr. Humberto Luiz Razente
- Co-Advisor: Prof. Dra. Maria Camila Nardini Barioni
- /main.sql: Core implementation of the similarity and clustering algorithms in PL/pgSQL.
- /dataviz: Jupyter Notebooks used for data visualization and analysis of experiment results (contributed by Antonio Fernandes).
- /examples: Case studies, including the Breast Cancer dataset experiments.
- compose.yml: Docker Compose file that sets up the PostgreSQL environment with the necessary extensions.
- Docker & Docker Compose
- PostgreSQL Client (psql) or DBeaver
- Clone the repository:
  git clone https://github.com/liviomendonca/sqlsim.git
  cd sqlsim
- Start the database container:
  docker compose up -d
- Load the functions:
  psql -h localhost -U postgres -d sqlsim -f main.sql
- Database: PostgreSQL
- Language: PL/pgSQL (Server-side programming)
- Analysis: Python (Jupyter, Pandas, Matplotlib) for validation and visualization
Special thanks to Antonio Fernandes for his significant contributions to the data visualization modules (/dataviz) used in this project.