@@ -539,7 +539,7 @@ Duplicate timestamps: Filtered
 mvn exec:java -Dexec.mainClass="examples.N11_AIS_Expand_Full"
 ```
 
-**Architecture**: INCREMENTAL BUILDING
+**Architecture**: Incremental Building
 ```
 Record → Create/Append to sequence → Sequence ALWAYS ready
 If the sequence needs more space → MEOS auto-expands it with memory optimization
@@ -615,6 +615,231 @@ String ewkt = geo_as_ewkt(utm, 6);
 
 ---
 
+#### 17. `N13_Aggregation_Demo` - SQL Aggregate Functions
+**Concepts**: Aggregate transfn/finalfn pattern, union operations
+
+Demonstrates the PostgreSQL aggregate function pattern (transition function + final function) for combining multiple temporal/spatial objects.
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N13_Aggregation_Demo"
+```
+
+**Three Aggregation Examples**:
+
+**1. IntSpan Union** (Simple aggregation)
+```
+Input: [1,5], [3,8], [10,15], [12,20]
+Process: Merge overlapping spans
+Output: {[1,8], [10,20]}
+```
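
A minimal plain-Java sketch of the same union on closed integer spans. It does not call JMEOS; `IntSpan` is a hypothetical record used only for illustration. The spans are sorted by lower bound and swept once, merging each span into the previous one when they overlap or, because these are integers, are adjacent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IntSpanUnion {
    /** Closed integer span [lo, hi] (hypothetical helper type, not the JMEOS API). */
    record IntSpan(int lo, int hi) {
        @Override public String toString() { return "[" + lo + "," + hi + "]"; }
    }

    /** Sorts spans by lower bound, then merges overlapping or adjacent ones in a single sweep. */
    static List<IntSpan> union(List<IntSpan> input) {
        List<IntSpan> sorted = new ArrayList<>(input);
        sorted.sort(Comparator.comparingInt(IntSpan::lo));
        List<IntSpan> result = new ArrayList<>();
        for (IntSpan s : sorted) {
            if (!result.isEmpty()) {
                IntSpan last = result.get(result.size() - 1);
                // For integers, [1,5] and [6,9] cover 1..9 with no gap, so adjacent spans merge too
                if ((long) s.lo() <= (long) last.hi() + 1) {
                    result.set(result.size() - 1, new IntSpan(last.lo(), Math.max(last.hi(), s.hi())));
                    continue;
                }
            }
            result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<IntSpan> in = List.of(new IntSpan(1, 5), new IntSpan(3, 8),
                                   new IntSpan(10, 15), new IntSpan(12, 20));
        System.out.println(union(in)); // prints [[1,8], [10,20]]
    }
}
```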
+
+**2. FloatSpanSet Grouped** (GROUP BY aggregation)
+```
+Input: 100 spansets, grouped by k % 10
+Process: 10 accumulators (one per group)
+Output: 10 FloatSpanSets
+```
+
+**3. TextSet Grouped** (Set aggregation)
+```
+Input: 100 TextSets, grouped by k % 10
+Process: Union the sets in each group
+Output: 10 TextSets
+```
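
The grouped examples keep one accumulator per key `k % 10`. A plain-Java sketch of the TextSet case, assuming a hypothetical `Row(k, values)` input and using a `TreeSet<String>` as each group's state in place of a MEOS set:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class GroupedSetUnion {
    /** Hypothetical input row: an integer key k and the text values of one set. */
    record Row(int k, Set<String> values) {}

    /** One accumulator per group (k % 10); each accumulator is the running union of its sets. */
    static Map<Integer, Set<String>> aggregate(List<Row> rows) {
        Map<Integer, Set<String>> groups = new TreeMap<>();
        for (Row row : rows) {
            // Transition step: fold this row's set into its group's state (created on first use)
            groups.computeIfAbsent(row.k() % 10, g -> new TreeSet<>()).addAll(row.values());
        }
        return groups; // Final step: for a set union the state already is the result
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(1, Set.of("a", "b")),
            new Row(11, Set.of("b", "c")),
            new Row(2, Set.of("x")));
        System.out.println(aggregate(rows)); // prints {1=[a, b, c], 2=[x]}
    }
}
```

Keying a sorted map by the group id mirrors `GROUP BY`: the number of accumulators equals the number of distinct keys, which is 10 for `k % 10` over non-negative keys.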
+
+**Pattern Explained**:
+```java
+// PHASE 1: Accumulation (transfn)
+Pointer state = null;                 // no state before the first row
+for (String row : rows) {
+    Pointer value = parse(row);       // parse: placeholder for the type's input function
+    state = transfn(state, value);    // accumulate, e.g. span_union_transfn(state, value)
+}
+
+// PHASE 2: Finalization (finalfn)
+Pointer result = finalfn(state);      // produce the result, e.g. spanset_union_finalfn(state)
+```
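
The same two-phase contract in plain Java, as a sketch rather than the MEOS implementation: the state starts as `null`, `transfn` folds one value in, and `finalfn` turns the state into the result. The state representation here is an assumption of the sketch: a sorted list of disjoint half-open float spans, so each `transfn` call inserts the new span and absorbs every span it overlaps or touches.

```java
import java.util.ArrayList;
import java.util.List;

public class TransFinalPattern {
    /** Half-open float span [lo, hi); a hypothetical stand-in for a MEOS FloatSpan. */
    record Span(double lo, double hi) {}

    /**
     * Transition function. As in a PostgreSQL aggregate, the state is null before the
     * first row. The state is a list of disjoint spans sorted by lo; the new span is
     * inserted in order and absorbs every span it overlaps or touches.
     */
    static List<Span> transfn(List<Span> state, Span value) {
        List<Span> next = new ArrayList<>();
        double lo = value.lo(), hi = value.hi();
        boolean placed = false;
        for (Span s : state == null ? List.<Span>of() : state) {
            if (s.hi() < lo) {
                next.add(s);                                  // entirely before the new span
            } else if (s.lo() > hi) {
                if (!placed) { next.add(new Span(lo, hi)); placed = true; }
                next.add(s);                                  // entirely after the new span
            } else {
                lo = Math.min(lo, s.lo());                    // overlapping or touching: merge
                hi = Math.max(hi, s.hi());
            }
        }
        if (!placed) next.add(new Span(lo, hi));
        return next;
    }

    /** Final function: returns the result, or null when no row was aggregated. */
    static List<Span> finalfn(List<Span> state) {
        return state == null ? null : List.copyOf(state);
    }

    public static void main(String[] args) {
        List<Span> state = null;                              // PHASE 1: accumulate
        for (Span v : List.of(new Span(1, 5), new Span(10, 15), new Span(3, 8), new Span(12, 20))) {
            state = transfn(state, v);
        }
        System.out.println(finalfn(state));                   // PHASE 2: finalize
        // prints [Span[lo=1.0, hi=8.0], Span[lo=10.0, hi=20.0]]
    }
}
```

Keeping the state normalized on every call makes `finalfn` trivial; the alternative design is a cheap append-only `transfn` with all sorting and merging deferred to `finalfn`.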
+
+**Key Functions**:
+- `span_union_transfn()` / `spanset_union_finalfn()` (IntSpan union)
+- `spanset_union_transfn()` / `spanset_union_finalfn()` (FloatSpanSet union)
+- `set_union_transfn()` / `set_union_finalfn()` (TextSet union)
+
+**Use cases**:
+- Merging availability time slots
+- Combining room occupancy periods
+- Aggregating sensor data ranges
+
+---
+
+#### 18. `N14_RTree_Index` - Spatial Indexing
+**Concepts**: RTree spatial index, bounding box searches, performance optimization
+
+Demonstrates RTree spatial indexing for fast spatial/temporal queries.
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N14_RTree_Index"
+```
+
+**The Problem**: Finding the boxes in a query region
+```
+Brute force: Check ALL 5,000,000 boxes → 400 ms
+RTree index: Check only the 200,000 boxes in the specified region/bounding box → 180 ms
+```
+
+**Program Flow**:
+1. **Build Index** - One-time cost
+2. **Search with RTree** - Fast
+3. **Search Brute Force** - Slow
+4. **Validate** - Both searches find the same 142 boxes
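
To show where the speed-up comes from, here is a deliberately small R-tree sketch in plain Java. It is not the MEOS RTree: boxes are 2D only (no time, no SRID), and the tree has just two levels, bulk-loaded with Sort-Tile-Recursive (STR) packing. A search skips every leaf whose bounding box misses the query, and `main` then checks that the result matches brute force, mirroring steps 1-4. A production R-tree adds more levels so that the leaf bounding boxes themselves are not scanned linearly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class TinyRTree {
    /** Axis-aligned 2D box: a simplified stand-in for an MEOS STBox (no time, no SRID). */
    record Box(double xmin, double ymin, double xmax, double ymax) {
        boolean overlaps(Box o) {
            return xmin <= o.xmax && o.xmin <= xmax && ymin <= o.ymax && o.ymin <= ymax;
        }
        double cx() { return (xmin + xmax) / 2; }
        double cy() { return (ymin + ymax) / 2; }
    }

    /** Leaf node: the ids of up to m boxes plus the bounding box covering all of them. */
    record Leaf(Box mbr, int[] ids) {}

    private final Box[] boxes;
    private final List<Leaf> leaves = new ArrayList<>();

    /** Bulk-loads the leaves with Sort-Tile-Recursive (STR) packing, m boxes per leaf. */
    TinyRTree(Box[] boxes, int m) {
        this.boxes = boxes;
        Integer[] ids = new Integer[boxes.length];
        for (int i = 0; i < ids.length; i++) ids[i] = i;
        // Sort by x and cut into vertical slices of about sqrt(#leaves) leaves each
        Arrays.sort(ids, Comparator.comparingDouble(i -> boxes[i].cx()));
        int leafCount = (ids.length + m - 1) / m;
        int sliceSize = (int) Math.ceil(Math.sqrt(leafCount)) * m;
        for (int s = 0; s < ids.length; s += sliceSize) {
            Integer[] slice = Arrays.copyOfRange(ids, s, Math.min(s + sliceSize, ids.length));
            // Within a slice, sort by y and pack consecutive runs of m boxes into leaves
            Arrays.sort(slice, Comparator.comparingDouble(i -> boxes[i].cy()));
            for (int t = 0; t < slice.length; t += m) {
                int[] leafIds = new int[Math.min(m, slice.length - t)];
                double x0 = Double.POSITIVE_INFINITY, y0 = Double.POSITIVE_INFINITY;
                double x1 = Double.NEGATIVE_INFINITY, y1 = Double.NEGATIVE_INFINITY;
                for (int k = 0; k < leafIds.length; k++) {
                    leafIds[k] = slice[t + k];
                    Box b = boxes[leafIds[k]];
                    x0 = Math.min(x0, b.xmin()); y0 = Math.min(y0, b.ymin());
                    x1 = Math.max(x1, b.xmax()); y1 = Math.max(y1, b.ymax());
                }
                leaves.add(new Leaf(new Box(x0, y0, x1, y1), leafIds));
            }
        }
    }

    /** Tests only the entries of leaves whose bounding box overlaps the query. */
    List<Integer> search(Box q) {
        List<Integer> hits = new ArrayList<>();
        for (Leaf leaf : leaves) {
            if (!leaf.mbr().overlaps(q)) continue;            // prune the whole leaf
            for (int id : leaf.ids()) if (boxes[id].overlaps(q)) hits.add(id);
        }
        return hits;
    }

    /** Brute force: tests every box. */
    static List<Integer> bruteForce(Box[] boxes, Box q) {
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < boxes.length; id++) if (boxes[id].overlaps(q)) hits.add(id);
        return hits;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        Box[] boxes = new Box[100_000];
        for (int i = 0; i < boxes.length; i++) {
            double x = rnd.nextDouble() * 1000, y = rnd.nextDouble() * 1000;
            boxes[i] = new Box(x, y, x + 1, y + 1);
        }
        Box query = new Box(100, 100, 150, 150);
        List<Integer> viaTree = new TinyRTree(boxes, 16).search(query);   // steps 1-2
        List<Integer> viaScan = bruteForce(boxes, query);                 // step 3
        viaTree.sort(null);                                  // leaf order differs from id order
        System.out.println("same result: " + viaTree.equals(viaScan) + ", hits: " + viaTree.size());
    }
}
```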
+
+**Note**: An RTree is not a silver bullet. For small datasets, brute force can outperform it because of:
+1. **Initialization cost**
+   - Building the index and managing native memory pointers adds a fixed overhead.
+   - For small datasets, this setup time can exceed the search time it saves, making brute force faster overall.
+2. **Search threshold**
+   - The RTree only pays off when the time saved by pruning the search space exceeds the time spent traversing the tree.
+
+Rule of thumb: use RTrees for large spatial datasets (such as the 5M boxes in this program) or for frequent, repeated queries on the same data.
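
The search threshold can be made concrete as a break-even count. Treating the 400 ms and 180 ms figures above as per-query times, and writing $B$ for the one-time build cost (which the example does not report), the index pays for itself once the accumulated savings cover $B$:

```math
n^{*} = \min\{\, n \in \mathbb{N} : n\,(t_{\text{bf}} - t_{\text{rt}}) \ge B \,\} = \left\lceil \frac{B}{t_{\text{bf}} - t_{\text{rt}}} \right\rceil,
\qquad t_{\text{bf}} - t_{\text{rt}} = 400\ \text{ms} - 180\ \text{ms} = 220\ \text{ms}
```

For instance, a hypothetical build time of $B = 1100$ ms would break even at $n^{*} = 5$ queries. If $t_{\text{bf}} \le t_{\text{rt}}$, as can happen on small datasets, the index never pays off.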
+
+**Key Functions**:
+```java
+Pointer rtree = rtree_create_stbox();                 // empty RTree over STBox keys
+rtree_insert(rtree, box, id);                         // index one box under an integer id
+Pointer ids = rtree_search(rtree, query, countPtr);   // countPtr receives the number of ids found
+```
+
+**Use cases**:
+- Maritime traffic queries
+- Event detection in regions
+
+---
+
+#### 19. `N15_TPoint_MakeCoords` - Construction from Coordinate Arrays
+**Concepts**: Alternative construction, coordinate arrays
+
+Demonstrates building temporal point sequences from coordinate arrays instead of individual instants.
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N15_TPoint_MakeCoords"
+```
+
+Pass the coordinate arrays directly to build the sequence, instead of creating each temporal point instant manually and then assembling the instants into a sequence:
+```java
+double[] x = {2.349, 2.350, 2.351};
+double[] y = {48.853, 48.854, 48.855};
+double[] z = {10.5, 12.3, 11.8};
+
+// Efficient: one call builds the entire sequence
+// (xPtr, yPtr, zPtr and timesPtr are native copies of x, y, z and the timestamps)
+Pointer seq = tpointseq_make_coords(xPtr, yPtr, zPtr, timesPtr, ...);
+```
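
For comparison, this is roughly what the array-based call saves you from writing by hand: the instant-by-instant text form of the same sequence, in the MEOS temporal point syntax `[POINT Z(x y z)@timestamp, ...]`. The sketch below builds that literal from the parallel arrays in plain Java. The SRID (4326), the one-minute timestamps, and the helper name `toSequenceLiteral` are assumptions for illustration only.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.StringJoiner;

public class CoordsToWkt {
    private static final DateTimeFormatter TS = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssxxx");

    /**
     * Builds the instant-by-instant text of a 3D temporal point sequence from parallel
     * arrays. A sequence needs equal-length inputs and strictly increasing timestamps.
     */
    static String toSequenceLiteral(double[] x, double[] y, double[] z, OffsetDateTime[] t, int srid) {
        int n = x.length;
        if (y.length != n || z.length != n || t.length != n)
            throw new IllegalArgumentException("coordinate and time arrays must have the same length");
        StringJoiner instants = new StringJoiner(", ", "SRID=" + srid + ";[", "]");
        for (int i = 0; i < n; i++) {
            if (i > 0 && !t[i].isAfter(t[i - 1]))
                throw new IllegalArgumentException("timestamps must be strictly increasing");
            // One instant per array index: POINT Z(x y z)@timestamp
            instants.add("POINT Z(" + x[i] + " " + y[i] + " " + z[i] + ")@" + TS.format(t[i]));
        }
        return instants.toString();
    }

    public static void main(String[] args) {
        double[] x = {2.349, 2.350, 2.351};
        double[] y = {48.853, 48.854, 48.855};
        double[] z = {10.5, 12.3, 11.8};
        OffsetDateTime t0 = OffsetDateTime.parse("2024-01-01T08:00:00Z");
        OffsetDateTime[] t = {t0, t0.plusMinutes(1), t0.plusMinutes(2)};
        System.out.println(toSequenceLiteral(x, y, z, t, 4326));
    }
}
```

The array-based constructor skips both the string building and the parsing step that this literal would need.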
+
+**Use cases**:
+- GPS logger data (CSV format)
+- Data conversion (GPS/CSV → MEOS)
+
+---
+
+#### 20. `N16_Clustering_KMeans` - K-means Clustering
+**Concepts**: K-means algorithm, centroid-based clustering
+
+Groups geographic points into K clusters based on proximity.
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N16_Clustering_KMeans"
+```
+
+- **Input**: `popplaces.csv` (30 cities)
+- **Output**: the same rows plus a cluster column (0-9)
+
+**Algorithm** (K=10):
+1. Choose 10 initial centers
+2. Assign each city to its nearest center
+3. Recompute each center as the mean of the cities assigned to it
+4. Repeat steps 2-3 until the assignments stop changing
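
Steps 1-4 are Lloyd's algorithm. A plain-Java sketch on 2D points, not the MEOS `geo_cluster_kmeans` implementation: the initial centers are simply the first k points, coordinates are treated as planar x/y, and `maxIter` caps the loop. It needs at least k points.

```java
import java.util.Arrays;

public class KMeans {
    /**
     * Lloyd's k-means on 2D points; returns one cluster id (0..k-1) per point.
     * Requires pts.length >= k. Initial centers are simply the first k points.
     */
    static int[] kmeans(double[][] pts, int k, int maxIter) {
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) centers[c] = pts[c].clone();   // 1. initial centers
        int[] assign = new int[pts.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < pts.length; i++) {                   // 2. nearest center
                int best = 0;
                double bestD = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double dx = pts[i][0] - centers[c][0], dy = pts[i][1] - centers[c][1];
                    double d = dx * dx + dy * dy;
                    if (d < bestD) { bestD = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break;                                     // 4. stable: stop
            double[][] sum = new double[k][2];                       // 3. recompute the means
            int[] count = new int[k];
            for (int i = 0; i < pts.length; i++) {
                sum[assign[i]][0] += pts[i][0];
                sum[assign[i]][1] += pts[i][1];
                count[assign[i]]++;
            }
            for (int c = 0; c < k; c++)                              // empty clusters keep their center
                if (count[c] > 0) centers[c] = new double[] {sum[c][0] / count[c], sum[c][1] / count[c]};
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10}};
        System.out.println(Arrays.toString(kmeans(pts, 2, 100))); // prints [0, 1, 0, 0, 1, 1]
    }
}
```

The choice of initial centers matters: a poor start can converge to a worse partition, which is why library implementations usually use a smarter seeding than "first k points".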
+
+**Key Functions**:
+```java
+Pointer clusterIds = geo_cluster_kmeans(geometries, count, k);  // one cluster id per input geometry
+```
+
+**Use cases**:
+- Delivery zones
+- Service areas
+
+---
+
+#### 21. `N17_Clustering_Topological` - Topological Clustering
+**Concepts**: Clustering by spatial relationships, number of clusters determined automatically
+
+Groups geometries based on spatial relationships (touching or proximity).
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N17_Clustering_Topological"
+```
+
+- **Input**: `regions.csv`
+- **Output**: `regions_new.csv` with clusters
+
+**Two Methods**:
+1. **ClusterIntersecting** - groups geometries that touch or overlap
+2. **ClusterWithin(1000m)** - groups geometries that lie within 1000 m of another member of the group
+
+**Key Difference**: unlike K-means, the number of clusters is not fixed in advance; it emerges from the data.
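
ClusterWithin amounts to finding connected components in a graph where two geometries are linked when their distance is at most the threshold, which is why the cluster count falls out of the data. A sketch using union-find, simplified to 2D points with Euclidean distance (the real regions are polygons, and `clusterWithin` is a hypothetical helper, not a MEOS function):

```java
import java.util.Arrays;

public class ClusterWithin {
    /** Finds the root of i, halving the path as it goes. */
    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    /**
     * Two points share a cluster if a chain of points links them with every hop <= d.
     * Returns cluster ids 0, 1, ... in order of first appearance. O(n^2) pair checks.
     */
    static int[] clusterWithin(double[][] pts, double d) {
        int n = pts.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
                if (dx * dx + dy * dy <= d * d) parent[find(parent, i)] = find(parent, j); // union
            }
        int[] label = new int[n];
        int[] rootLabel = new int[n];
        Arrays.fill(rootLabel, -1);
        int next = 0;
        for (int i = 0; i < n; i++) {
            int r = find(parent, i);
            if (rootLabel[r] < 0) rootLabel[r] = next++;   // new component: next cluster id
            label[i] = rootLabel[r];
        }
        return label;
    }

    public static void main(String[] args) {
        // (0,0) and (1600,0) are 1600 m apart but chained through (800,0)
        double[][] pts = {{0, 0}, {800, 0}, {1600, 0}, {5000, 0}};
        System.out.println(Arrays.toString(clusterWithin(pts, 1000))); // prints [0, 0, 0, 1]
    }
}
```

ClusterIntersecting follows the same structure with "distance at most d" replaced by an intersects test.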
+
+**Key Functions**:
+```java
+geo_cluster_intersecting(geometries, count, numClustersPtr);
+geo_cluster_within(geometries, count, distance, numClustersPtr);
+```
+
+**Use cases**:
+- Road networks (connected components)
+- Land parcels (adjacency)
+- Building blocks
+
+---
+
+#### 22. `N18_Clustering_DBSCAN` - Density-Based Clustering
+**Concepts**: DBSCAN algorithm, density clustering, outlier detection
+
+Finds clusters based on density and identifies isolated points as noise.
+
+```bash
+mvn exec:java -Dexec.mainClass="examples.N18_Clustering_DBSCAN"
+```
+
+- **Input**: `US.txt` (geonames dump, filtered to schools)
+- **Output**: `geonames_new.csv` with clusters
+
+**Parameters**:
+- eps: 2000 meters (neighbor distance)
+- minpoints: 5 (minimum number of points within eps for a point to count as a CORE point)
+
+**Point Types**:
+- **CORE**: ≥ 5 points within eps → forms a cluster
+- **BORDER**: within eps of a core point, but not core itself → joins that cluster
+- **NOISE**: not within eps of any core point → outlier
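
The three point types fall out of a short DBSCAN loop. This plain-Java sketch assumes, as PostGIS documents for `ST_ClusterDBSCAN`, that `minPoints` counts the point itself; it uses planar Euclidean distance and an O(n²) neighbor search, so it only suits small inputs. A border point keeps the id of the first cluster that reaches it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class Dbscan {
    static final int NOISE = -1;
    private static final int UNVISITED = Integer.MIN_VALUE;

    /** Indices of all points within eps of point i, including i itself. */
    static List<Integer> neighbors(double[][] pts, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            double dx = pts[i][0] - pts[j][0], dy = pts[i][1] - pts[j][1];
            if (dx * dx + dy * dy <= eps * eps) out.add(j);
        }
        return out;
    }

    /** Returns a cluster id (0, 1, ...) per point, or NOISE for outliers. */
    static int[] dbscan(double[][] pts, double eps, int minPoints) {
        int[] label = new int[pts.length];
        Arrays.fill(label, UNVISITED);
        int cluster = 0;
        for (int i = 0; i < pts.length; i++) {
            if (label[i] != UNVISITED) continue;
            List<Integer> seeds = neighbors(pts, i, eps);
            if (seeds.size() < minPoints) {          // not CORE: NOISE for now, may become BORDER later
                label[i] = NOISE;
                continue;
            }
            label[i] = cluster;                       // CORE point: start a new cluster
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (label[j] == NOISE) label[j] = cluster;    // reached from a core point: BORDER
                if (label[j] != UNVISITED) continue;          // already assigned
                label[j] = cluster;
                List<Integer> jn = neighbors(pts, j, eps);
                if (jn.size() >= minPoints) queue.addAll(jn); // j is CORE too: keep expanding
            }
            cluster++;
        }
        return label;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {100, 0}, {0, 100}, {100, 100}, {50, 50}, {2050, 0}, {9000, 9000}};
        // (2050, 0) has only 3 points within eps, so it is BORDER; (9000, 9000) is NOISE
        System.out.println(Arrays.toString(dbscan(pts, 2000, 5))); // prints [0, 0, 0, 0, 0, 0, -1]
    }
}
```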
+
+**Key Functions**:
+```java
+geo_cluster_dbscan(geometries, count, eps, minpoints, clusters);
+```
+
+**Advantages**:
+- Automatic cluster count
+- Clusters of arbitrary shape
+- Identifies outliers
+
+**Use cases**:
+- Urban planning (underserved areas)
+- Hot spot detection
+- Service gap analysis
+
+---
+
 ## Data Files
 
 All data files are in `src/main/java/examples/data/`:
@@ -635,8 +860,29 @@ All data files are in `src/main/java/examples/data/`:
 - `berlinmod_trips.csv` - 154 trips in HexWKB format
 - `brussels_communes.csv` - 19 Brussels municipalities
 - `brussels_region.csv` - Brussels boundary
+- `regions.csv` - BerlinMOD regions (for topological clustering)
 - Coordinate system: EPSG:3857 (Web Mercator meters)
 
+### Clustering Datasets
+- `popplaces.csv` - 30 populated places worldwide (for K-means)
+  - Format: `name,pop_max,geom`
+  - Natural Earth data: https://www.naturalearthdata.com/
+  - Coordinate system: EPSG:4326 (WGS84)
+
+### Geonames Dataset (US Schools)
+- `US.txt` - Full geonames dump for the USA (for DBSCAN)
+  - Download from: https://download.geonames.org/export/dump/US.zip
+  - Format: TSV with 19+ fields
+  - Used fields: `geonameid,name,admin1,lat,lon,fcode`
+  - Filter: `fcode='SCH'` (schools only)
+  - Size: ~2.5M records (~500 MB uncompressed)
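
A sketch of the filtering step in plain Java, streaming the dump line by line instead of loading ~500 MB at once. The column positions come from the geonames dump readme (0 geonameid, 1 name, 4 latitude, 5 longitude, 7 feature code, 10 admin1 code); the `School` record and `readSchools` helper are hypothetical and not part of the examples.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class GeonamesSchools {
    /** The six fields the DBSCAN example uses (hypothetical record). */
    record School(long geonameId, String name, String admin1, double lat, double lon, String fcode) {}

    /** Streams tab-separated geonames rows and keeps those whose feature code is SCH. */
    static List<School> readSchools(Reader in) throws IOException {
        List<School> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                String[] f = line.split("\t", -1);          // -1 keeps empty fields
                if (f.length < 11 || !"SCH".equals(f[7])) continue;
                out.add(new School(Long.parseLong(f[0]), f[1], f[10],
                                   Double.parseDouble(f[4]), Double.parseDouble(f[5]), f[7]));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Two synthetic rows in the 19-column geonames layout (values made up for illustration)
        String school = String.join("\t", "1", "Example School", "Example School", "", "40.0", "-75.0",
                "S", "SCH", "US", "", "PA", "", "", "", "0", "", "50", "America/New_York", "2020-01-01");
        String hill = String.join("\t", "2", "Example Hill", "Example Hill", "", "41.0", "-76.0",
                "T", "HLL", "US", "", "PA", "", "", "", "0", "", "300", "America/New_York", "2020-01-01");
        System.out.println(readSchools(new StringReader(school + "\n" + hill + "\n")));
    }
}
```

To read the real file, pass `Files.newBufferedReader(Path.of("US.txt"))` as the reader.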
+
+### Aggregation Test Data
+- `intspans.csv` - 10 integer spans
+- `floatspansets.csv` - 100 float span sets
+- `textsets.csv` - 100 text sets
+
+---
 
 ## Common Functions
 