liviomendonca/sqlsim

SQLSIM: Analytical Queries by Similarity in Relational DBMS

SQLSIM is a strategy for executing analytical similarity queries and clustering directly within a relational DBMS (PostgreSQL) using user-defined functions (UDFs). Because the processing logic runs where the data is stored, this approach reduces impedance mismatch and can improve performance for specific analytical workloads.
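
To make the idea of a similarity query running inside the database concrete, here is a minimal sketch in PL/pgSQL. It is not taken from main.sql: the function `euclidean_distance`, the table `samples`, and the column `features` are hypothetical names chosen for illustration, and the algorithms in this repository may be structured quite differently.

```sql
-- Hypothetical distance UDF (illustrative only, not part of main.sql).
-- Assumes both arrays are one-dimensional and use the default 1-based indexing.
CREATE OR REPLACE FUNCTION euclidean_distance(a double precision[], b double precision[])
RETURNS double precision
LANGUAGE plpgsql IMMUTABLE STRICT
AS $$
DECLARE
    total double precision := 0;
BEGIN
    -- Vectors of different lengths cannot be compared.
    IF array_length(a, 1) IS DISTINCT FROM array_length(b, 1) THEN
        RAISE EXCEPTION 'vectors must have the same length';
    END IF;

    -- Accumulate the squared difference of each coordinate.
    FOR i IN 1 .. coalesce(array_length(a, 1), 0) LOOP
        total := total + (a[i] - b[i]) ^ 2;
    END LOOP;

    RETURN sqrt(total);
END;
$$;

-- Hypothetical table holding one feature vector per row.
CREATE TABLE samples (
    id       integer PRIMARY KEY,
    features double precision[] NOT NULL
);

INSERT INTO samples (id, features) VALUES
    (1, ARRAY[0.0, 0.0]::double precision[]),
    (2, ARRAY[3.0, 4.0]::double precision[]),
    (3, ARRAY[1.0, 1.0]::double precision[]);

-- k-nearest-neighbor query (k = 2) evaluated entirely inside PostgreSQL:
-- rank every row by its distance to the query point and keep the closest two.
SELECT id,
       euclidean_distance(features, ARRAY[0.0, 0.0]::double precision[]) AS distance
FROM samples
ORDER BY distance
LIMIT 2;
```

The query scans every row, which is enough to show the pattern: the distance computation is an ordinary function call in the `SELECT` list, so the data never leaves the database.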

🎓 Academic Context

This repository contains the source code and experiments developed for my Master's Thesis in Computer Science at the Federal University of Uberlândia (UFU).

Author & Advisors

  • Author: Lívio Mendonça
  • Advisor: Prof. Dr. Humberto Luiz Razente
  • Co-Advisor: Prof. Dr. Maria Camila Nardini Barioni

📂 Repository Structure

  • /main.sql: Core implementation of the similarity and clustering algorithms in PL/pgSQL.
  • /dataviz: Jupyter Notebooks used to visualize data and analyze experiment results. (Contributed by Antonio Fernandes)
  • /examples: Case studies, including the Breast Cancer dataset experiments.
  • compose.yml: Docker Compose file that sets up the PostgreSQL environment with the necessary extensions.

🚀 Getting Started

Prerequisites

  • Docker & Docker Compose
  • PostgreSQL Client (psql) or DBeaver

Installation

  1. Clone the repository:

    git clone https://github.com/liviomendonca/sqlsim.git
    cd sqlsim
  2. Start the database container:

    docker compose up -d
  3. Load the functions:

    psql -h localhost -U postgres -d sqlsim -f main.sql
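
Once main.sql has been loaded, one way to check that the functions were created is to query the system catalog from the same psql session. This is a minimal sketch that assumes the functions are created in the `public` schema; the README does not state which schema main.sql uses, so adjust the filter if needed.

```sql
-- List the functions defined in the public schema together with their argument types.
-- Assumption: main.sql installs its functions into "public".
SELECT p.proname AS function_name,
       pg_get_function_identity_arguments(p.oid) AS arguments
FROM pg_proc AS p
JOIN pg_namespace AS n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
ORDER BY p.proname;
```

Functions installed into `public` by extensions will also appear in this list, so expect more rows than main.sql alone defines.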

🛠 Technologies

  • Database: PostgreSQL
  • Language: PL/pgSQL (Server-side programming)
  • Analysis: Python (Jupyter, Pandas, Matplotlib) for validation and visualization

🤝 Acknowledgements

Special thanks to Antonio Fernandes for his significant contributions to the data visualization modules (/dataviz) used in this project.
