Commit f3a907a

add: readme/guide file to accompany new users trying the example programs
1 parent 8cb3b12 commit f3a907a

1 file changed

Lines changed: 247 additions & 1 deletion

src/main/java/examples/EXAMPLES_GUIDE.md

@@ -539,7 +539,7 @@ Duplicate timestamps: Filtered
mvn exec:java -Dexec.mainClass="examples.N11_AIS_Expand_Full"
```

-**Architecture**: INCREMENTAL BUILDING
+**Architecture**: Incremental Building
```
Record → Create/Append to sequence → Sequence ALWAYS ready
If the sequence needs more space → MEOS auto-expands it with memory optimization
@@ -615,6 +615,231 @@ String ewkt = geo_as_ewkt(utm, 6);

---

#### 17. `N13_Aggregation_Demo` - SQL Aggregate Functions
**Concepts**: Aggregate transfn/finalfn pattern, union operations

Demonstrates the PostgreSQL aggregate function pattern (transition function + final function) for combining multiple temporal/spatial objects.

```bash
mvn exec:java -Dexec.mainClass="examples.N13_Aggregation_Demo"
```

**Three Aggregation Examples**:

**1. IntSpan Union** (Simple aggregation)
```
Input: [1,5], [3,8], [10,15], [12,20]
Process: Merge overlapping spans
Output: {[1,8], [10,20]}
```

**2. FloatSpanSet Grouped** (GROUP BY aggregation)
```
Input: 100 spansets, grouped by k % 10
Process: 10 accumulators (one per group)
Output: 10 FloatSpanSets
```

**3. TextSet Grouped** (Set aggregation)
```
Input: TextSets grouped by k % 10
Process: Union sets in each group
Output: 10 TextSets
```

**Pattern Explained**:
```java
// PHASE 1: Accumulation (transfn)
Pointer state = null;                 // no state exists before the first row
for (String row : rows) {             // rows, parse(), transfn(), finalfn() are placeholders
    Pointer value = parse(row);       // turn the row into a native value
    state = transfn(state, value);    // Accumulate
}

// PHASE 2: Finalization (finalfn)
Pointer result = finalfn(state);      // Produce result
```
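
To make the two phases concrete without the native library, here is a minimal plain-Java sketch of the IntSpan union example. It is an illustration only: `SpanUnionSketch`, its `transfn`/`finalfn` methods, and the `int[]{lo, hi}` span representation are hypothetical names chosen for this sketch, not the MEOS functions, and it merges only overlapping closed spans.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration of the transfn/finalfn split; not the MEOS implementation.
class SpanUnionSketch {

    // Transition function: fold one closed span {lo, hi} into the running state.
    static List<int[]> transfn(List<int[]> state, int[] span) {
        if (state == null) {
            state = new ArrayList<>(); // first row: create the accumulator
        }
        state.add(span);
        return state;
    }

    // Final function: sort the accumulated spans and merge the overlapping ones.
    static List<int[]> finalfn(List<int[]> state) {
        List<int[]> result = new ArrayList<>();
        if (state == null) {
            return result; // no rows were aggregated
        }
        List<int[]> sorted = new ArrayList<>(state);
        sorted.sort(Comparator.comparingInt((int[] s) -> s[0]));
        for (int[] span : sorted) {
            int[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && span[0] <= last[1]) {
                last[1] = Math.max(last[1], span[1]); // overlap: extend the previous span
            } else {
                result.add(new int[] {span[0], span[1]}); // gap: start a new span
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 5}, {3, 8}, {10, 15}, {12, 20}};
        List<int[]> state = null;
        for (int[] row : rows) {
            state = transfn(state, row); // PHASE 1: accumulate
        }
        for (int[] span : finalfn(state)) { // PHASE 2: finalize
            System.out.println("[" + span[0] + "," + span[1] + "]");
        }
    }
}
```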

**Key Functions**:
- `span_union_transfn()` / `spanset_union_finalfn()`
- `spanset_union_transfn()` / `spanset_union_finalfn()`
- `set_union_transfn()` / `set_union_finalfn()`

**Use cases**:
- Merging availability time slots
- Combining room occupancy periods
- Aggregating sensor data ranges

---

#### 18. `N14_RTree_Index` - Spatial Indexing
**Concepts**: RTree spatial index, bounding box searches, performance optimization

Demonstrates RTree spatial indexing for fast spatial/temporal queries.

```bash
mvn exec:java -Dexec.mainClass="examples.N14_RTree_Index"
```

**The Problem**: Finding boxes in a region
```
Brute force: Check ALL 5,000,000 boxes → 400 ms
RTree index: Check 200,000 boxes in the specified region/bounding box → 180 ms
```

**Program Flow**:
1. **Build Index** - One-time cost
2. **Search with RTree** - Fast
3. **Search Brute Force** - Slow
4. **Validate** - Both find the same 142 boxes

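
The brute-force baseline in step 3 amounts to a linear scan with an overlap test per box. The sketch below shows that idea in plain Java under simplifying assumptions: boxes are 2D and stored as `{xmin, ymin, xmax, ymax}` arrays, whereas the example program works on spatiotemporal boxes through native pointers. `BruteForceBoxSearch` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Linear-scan baseline over 2D boxes; an assumption-laden stand-in for the native search.
class BruteForceBoxSearch {

    // Two axis-aligned boxes {xmin, ymin, xmax, ymax} overlap when they overlap on both axes.
    static boolean overlaps(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2]
            && a[1] <= b[3] && b[1] <= a[3];
    }

    // Test every box against the query and collect the indexes of the matches.
    static List<Integer> search(double[][] boxes, double[] query) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < boxes.length; i++) {
            if (overlaps(boxes[i], query)) {
                ids.add(i);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        double[][] boxes = {
            {0.0, 0.0, 1.0, 1.0},
            {5.0, 5.0, 6.0, 6.0},
            {0.5, 0.5, 2.0, 2.0}
        };
        double[] query = {0.8, 0.8, 3.0, 3.0};
        System.out.println(search(boxes, query));
    }
}
```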
**Note**: RTree is not a silver bullet.

For small datasets, brute force can outperform the RTree because:
1. Initialization Cost
   - Building the index and managing native memory pointers adds a fixed overhead.
   - On a small dataset, this setup time can exceed the time the index saves during the search.
2. Search Threshold
   - The RTree pays off only when the time saved by pruning the search space exceeds the time spent traversing the tree structure.

Rule of thumb: Use an RTree for large-scale spatial datasets (like the 5M boxes in the program) or when making frequent, repeated queries on the same data.

**Key Functions**:
```java
Pointer rtree = rtree_create_stbox();                 // create an empty index over STBox values
rtree_insert(rtree, box, id);                         // add one box under its id
Pointer ids = rtree_search(rtree, query, countPtr);   // ids of matching boxes; count is written to countPtr
```

**Use cases**:
- Maritime traffic queries
- Event detection in regions

---

#### 19. `N15_TPoint_MakeCoords` - Coordinate Arrays Construction
**Concepts**: Alternative construction, coordinate arrays

Demonstrates building temporal point sequences from coordinate arrays instead of individual instants.

```bash
mvn exec:java -Dexec.mainClass="examples.N15_TPoint_MakeCoords"
```

Pass the coordinate arrays directly to build a TPoint sequence, instead of instantiating each instant manually and then assembling the instants into the final sequence.
```java
double[] x = {2.349, 2.350, 2.351};
double[] y = {48.853, 48.854, 48.855};
double[] z = {10.5, 12.3, 11.8};

// Efficient: One single call for the entire sequence
// (xPtr, yPtr, zPtr, timesPtr are assumed to be native pointers to the arrays above and their timestamps)
Pointer seq = tpointseq_make_coords(xPtr, yPtr, zPtr, timesPtr, ...);
```

**Use cases**:
- GPS logger data (CSV format)
- Data conversion (GPS/CSV → MEOS)

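
For the GPS/CSV use case, the preparatory step is turning rows into the parallel arrays that an array-based constructor expects. The sketch below covers only that step in plain Java. The row layout `x,y,z,ISO-8601 timestamp` is an assumption made for illustration (the guide does not specify the CSV format), `CoordArraysSketch` is a hypothetical helper, and copying the arrays into native memory is left out.

```java
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Collects CSV rows into parallel coordinate arrays; the row layout is an assumption.
class CoordArraysSketch {
    final double[] x;
    final double[] y;
    final double[] z;
    final long[] epochMillis;

    private CoordArraysSketch(int n) {
        x = new double[n];
        y = new double[n];
        z = new double[n];
        epochMillis = new long[n];
    }

    // Parses rows of the assumed form "x,y,z,timestamp" (timestamp in ISO-8601, UTC).
    static CoordArraysSketch fromCsv(List<String> rows) {
        CoordArraysSketch out = new CoordArraysSketch(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            String[] fields = rows.get(i).split(",");
            if (fields.length != 4) {
                throw new IllegalArgumentException("Expected 4 fields in row " + i);
            }
            out.x[i] = Double.parseDouble(fields[0].trim());
            out.y[i] = Double.parseDouble(fields[1].trim());
            out.z[i] = Double.parseDouble(fields[2].trim());
            out.epochMillis[i] = Instant.parse(fields[3].trim()).toEpochMilli();
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample rows reusing the coordinates shown above.
        CoordArraysSketch coords = fromCsv(Arrays.asList(
            "2.349,48.853,10.5,2024-01-01T08:00:00Z",
            "2.350,48.854,12.3,2024-01-01T08:00:01Z",
            "2.351,48.855,11.8,2024-01-01T08:00:02Z"));
        System.out.println(coords.x.length + " points parsed");
    }
}
```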
---

#### 20. `N16_Clustering_KMeans` - K-means Clustering
**Concepts**: K-means algorithm, centroid-based clustering

Groups geographic points into K clusters based on proximity.

```bash
mvn exec:java -Dexec.mainClass="examples.N16_Clustering_KMeans"
```

**Input**: `popplaces.csv` (30 cities)
**Output**: Same rows plus a cluster column (0-9)

**Algorithm** (K=10):
1. Choose 10 initial centers
2. Assign each city to its nearest center
3. Recalculate each center from its assigned cities
4. Repeat steps 2-3 until the assignments are stable

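
The four steps above can be sketched in plain Java as follows. This is a textbook Lloyd-style loop written for illustration, not the native implementation: `KMeansSketch` is a hypothetical name, the initial centers are simply the first K points (an assumption; real implementations usually pick them more carefully), and distances are planar Euclidean on the raw coordinates, which is only an approximation for longitude/latitude data.

```java
import java.util.Arrays;

// Textbook k-means on 2D points {x, y}; a sketch, not the MEOS implementation.
class KMeansSketch {

    // Returns one cluster label in [0, k) per point.
    static int[] cluster(double[][] points, int k, int maxIterations) {
        if (k < 1 || k > points.length) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Step 1: choose initial centers (assumption: the first k points).
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Step 2: assign each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double dx = points[i][0] - centers[c][0];
                    double dy = points[i][1] - centers[c][1];
                    double dist = dx * dx + dy * dy;
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            // Step 4: stop once no assignment changed.
            if (!changed) {
                break;
            }
            // Step 3: move each center to the mean of its assigned points.
            double[][] sums = new double[k][2];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                sums[labels[i]][0] += points[i][0];
                sums[labels[i]][1] += points[i][1];
                counts[labels[i]]++;
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    centers[c] = new double[] {sums[c][0] / counts[c], sums[c][1] / counts[c]};
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {10, 11}, {1, 0}, {11, 10}};
        System.out.println(Arrays.toString(cluster(points, 2, 100)));
    }
}
```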
**Key Functions**:
```java
Pointer clusters = geo_cluster_kmeans(geometries, count, k);   // one cluster number per input geometry
```

**Use cases**:
- Delivery zones
- Service areas

---

#### 21. `N17_Clustering_Topological` - Topological Clustering
**Concepts**: Clustering by spatial relationships, automatic K

Groups geometries based on spatial relationships (touching/proximity).

```bash
mvn exec:java -Dexec.mainClass="examples.N17_Clustering_Topological"
```

**Input**: `regions.csv`
**Output**: `regions_new.csv` with clusters

**Two Methods**:
1. **ClusterIntersecting** - Groups geometries that touch or overlap
2. **ClusterWithin(1000m)** - Groups geometries that lie within 1000 m of each other

**Key Difference**: The number of clusters emerges from the data (not a fixed K)

**Key Functions**:
```java
geo_cluster_intersecting(geometries, count, numClustersPtr);
geo_cluster_within(geometries, count, distance, numClustersPtr);
```
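
Conceptually, "cluster within a distance" is a connected-components problem: two items belong to the same cluster when a chain of neighbors closer than the threshold links them. The plain-Java sketch below shows this with a union-find structure. It is a simplification for illustration: it works on 2D points rather than arbitrary geometries, and `ClusterWithinSketch` is a hypothetical name, not part of the native API.

```java
import java.util.Arrays;

// Connected components under a distance threshold, using union-find on 2D points.
class ClusterWithinSketch {

    // Find the representative of i, halving the path on the way up.
    static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]];
            i = parent[i];
        }
        return i;
    }

    // Returns cluster ids 0..n-1, numbered in order of first appearance.
    static int[] cluster(double[][] points, double maxDistance) {
        int n = points.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
        }
        // Link every pair of points that lies within the threshold.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double d = Math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]);
                if (d <= maxDistance) {
                    parent[find(parent, i)] = find(parent, j);
                }
            }
        }
        // Relabel the roots as consecutive cluster ids.
        int[] idOfRoot = new int[n];
        Arrays.fill(idOfRoot, -1);
        int[] labels = new int[n];
        int next = 0;
        for (int i = 0; i < n; i++) {
            int root = find(parent, i);
            if (idOfRoot[root] < 0) {
                idOfRoot[root] = next++;
            }
            labels[i] = idOfRoot[root];
        }
        return labels;
    }

    public static void main(String[] args) {
        // The first three points form a chain with 500 m steps; the last one is isolated.
        double[][] points = {{0, 0}, {500, 0}, {1000, 0}, {5000, 0}};
        System.out.println(Arrays.toString(cluster(points, 600)));
    }
}
```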

**Use cases**:
- Road networks (connected components)
- Land parcels (adjacency)
- Building blocks

---

#### 22. `N18_Clustering_DBSCAN` - Density-Based Clustering
**Concepts**: DBSCAN algorithm, density clustering, outlier detection

Finds clusters based on density and identifies isolated points as noise.

```bash
mvn exec:java -Dexec.mainClass="examples.N18_Clustering_DBSCAN"
```

**Input**: `US.txt` (geonames schools)
**Output**: `geonames_new.csv` with clusters

**Parameters**:
- eps: 2000 meters (neighbor distance)
- minpoints: 5 (minimum number of neighbors within eps for a point to count as a CORE point)

**Point Types**:
- **CORE**: ≥5 neighbors within eps → Forms a cluster
- **BORDER**: Within eps of a CORE point but with fewer than 5 neighbors → Joins that cluster
- **NOISE**: Neither CORE nor BORDER → Outlier

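
The three point types fall out of the classic DBSCAN expansion loop, sketched below in plain Java. This is the textbook algorithm written for illustration, not the native implementation: `DbscanSketch` is a hypothetical name, neighbor lookup is a naive scan over 2D points, and a point is assumed to count as its own neighbor (implementations differ on this detail).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Textbook DBSCAN on 2D points {x, y}; a sketch, not the MEOS implementation.
class DbscanSketch {
    static final int NOISE = -1;
    static final int UNVISITED = -2;

    // All points within eps of point i (assumption: includes i itself).
    static List<Integer> neighbors(double[][] points, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < points.length; j++) {
            if (Math.hypot(points[i][0] - points[j][0], points[i][1] - points[j][1]) <= eps) {
                out.add(j);
            }
        }
        return out;
    }

    // Returns a cluster id >= 0 per point, or NOISE for outliers.
    static int[] cluster(double[][] points, double eps, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int nextId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> seeds = neighbors(points, i, eps);
            if (seeds.size() < minPoints) {
                labels[i] = NOISE; // may later be relabeled as a BORDER point
                continue;
            }
            int id = nextId++; // i is a CORE point: start a new cluster
            labels[i] = id;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) {
                    labels[q] = id; // BORDER: reachable from a core point, not core itself
                }
                if (labels[q] != UNVISITED) {
                    continue;
                }
                labels[q] = id;
                List<Integer> reach = neighbors(points, q, eps);
                if (reach.size() >= minPoints) {
                    queue.addAll(reach); // q is CORE too: keep expanding through it
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {1, 0}, {0, 1}, {1, 1}, {0.5, 0.5}, {100, 100}};
        System.out.println(Arrays.toString(cluster(points, 2.0, 5)));
    }
}
```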
**Key Functions**:
```java
geo_cluster_dbscan(geometries, count, eps, minpoints, clusters);
```

**Advantages**:
- Automatic cluster count
- Arbitrary shapes
- Identifies outliers

**Use cases**:
- Urban planning (underserved areas)
- Hot spot detection
- Service gap analysis

---

## Data Files

All data files are in `src/main/java/examples/data/`:
@@ -635,8 +860,29 @@ All data files are in `src/main/java/examples/data/`:
- `berlinmod_trips.csv` - 154 trips in HexWKB format
- `brussels_communes.csv` - 19 Brussels municipalities
- `brussels_region.csv` - Brussels boundary
- `regions.csv` - BerlinMOD regions (for topological clustering)
- Coordinate system: EPSG:3857 (Web Mercator meters)

### Clustering Datasets
- `popplaces.csv` - 30 populated places worldwide (for K-means)
- Format: `name,pop_max,geom`
- Natural Earth data: https://www.naturalearthdata.com/
- Coordinate system: EPSG:4326 (WGS84)

### Geonames Dataset (US Schools)
- `US.txt` - Full geonames dump for USA (for DBSCAN)
- Download from: https://download.geonames.org/export/dump/US.zip
- Format: TSV with 19+ fields
- Used fields: `geonameid,name,admin1,lat,lon,fcode`
- Filter: `fcode='SCH'` (schools only)
- Size: ~2.5M records (~500MB uncompressed)

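
The school filter described above can be sketched as a small TSV pass in plain Java. The column positions used here (0 = geonameid, 1 = name, 4 = latitude, 5 = longitude, 7 = feature code, 10 = admin1 code) follow the published GeoNames dump layout as I understand it; treat them as an assumption and check them against the readme shipped with the dump. `GeonamesSchoolFilter` is a hypothetical helper, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Keeps only rows with feature code SCH; column indexes are an assumption, see the lead-in.
class GeonamesSchoolFilter {

    // Returns {geonameid, name, admin1, lat, lon} for every school row.
    static List<String[]> schools(List<String> tsvLines) {
        List<String[]> out = new ArrayList<>();
        for (String line : tsvLines) {
            String[] f = line.split("\t", -1); // -1 keeps trailing empty fields
            if (f.length < 19) {
                continue; // skip malformed rows
            }
            if (!"SCH".equals(f[7])) {
                continue; // not a school
            }
            out.add(new String[] {f[0], f[1], f[10], f[4], f[5]});
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows in the 19-field layout.
        String school = String.join("\t", "1", "Example High School", "Example High School", "",
            "40.8", "-96.7", "S", "SCH", "US", "", "NE", "", "", "", "0", "", "358",
            "America/Chicago", "2010-01-01");
        String park = String.join("\t", "2", "Example Park", "Example Park", "",
            "40.9", "-96.8", "L", "PRK", "US", "", "NE", "", "", "", "0", "", "360",
            "America/Chicago", "2010-01-01");
        for (String[] row : schools(Arrays.asList(school, park))) {
            System.out.println(String.join(",", row));
        }
    }
}
```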
### Aggregation Test Data
- `intspans.csv` - 10 integer spans
- `floatspansets.csv` - 100 float span sets
- `textsets.csv` - 100 text sets

---

## Common Functions
