Fast Near-Duplicate Image Search and Delete using pHash, t-SNE and KDTree.
Updated Nov 22, 2022 - Python
Fast image similarity search with hash tables (Golang). Version 2 (LATEST)
Master Thesis - Trajectories Analysis
Audio fingerprinting and recognition in Python + similarity search on audio files
A really simple and fast Rust library that tells whether two GPS points are close to each other by using a cheap lower bound on the otherwise expensive geodesic distance.
Multi module project focused on near-duplicate search for images.
Fast image similarity search with hash tables (Golang). Version 1
A trivial approach for near-duplicate detection of audio files
Text fingerprinting: MinHash + LSH, SimHash, TLSH, ONNX semantic embeddings (BGE/E5/MiniLM), with byte-stable hash layouts and no_std + alloc default builds.
Testing Jaccard similarity and Cosine similarity techniques to calculate the similarity between two questions.
Partition-aware MinHash LSH deduplication library for large-scale text data curation on Apache Spark.
Convert text into a similarity hash based on SHA-256. Inspired by https://matpalm.com/resemblance/simhash/