Semantic-Cpp: A Future-Oriented Intelligent Stream Processing Framework for C++
Semantic-Cpp is a modern C++ stream processing library, completely redesigned from the ground up, featuring a "multiple headers, zero external dependencies" modular architecture. Each header file has a clear, singular responsibility and is independently testable, together forming a complete stream processing ecosystem. This library innovatively blends the essence of multiple programming paradigms:
The Elegance and Fluidity of Java Stream API: Chained calls, declarative programming, making code as graceful as poetry
The Laziness and Flexibility of JavaScript Generators: Lazy evaluation, on-demand generation, memory-friendly
The Efficiency and Order of Database Indexing: Intelligent sorting, index-driven, a powerful tool for time-series data processing
The Batch Processing Philosophy of "Container-as-Element": Vectors, lists, maps... Any container can be a first-class citizen in the stream, flowing freely
It abstracts data processing as operations on "elements" and their "logical positions (indices)", much like "rows" and "primary keys" in a database. You can freely rearrange, offset, and reverse indices without touching the data itself; you can also pass any container (vector, map, array...) as an indivisible whole within the stream, and "unpack" it back to the element level at any time. This ability to freely switch between two granularities is absent in traditional stream frameworks.
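The "rows and primary keys" idea can be sketched without the library. The names below (Indexed, attachIndices, reverseIndices, offsetIndices, inIndexOrder) are hypothetical and only illustrate the concept; Semantic-Cpp's own representation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names for illustration only; not Semantic-Cpp's API.
using Timestamp = long long;

template <typename E>
using Indexed = std::vector<std::pair<Timestamp, E>>;

// Attach indices 0..n-1 to the values, like assigning primary keys to rows.
template <typename E>
Indexed<E> attachIndices(const std::vector<E>& values) {
    Indexed<E> rows;
    Timestamp index = 0;
    for (const E& value : values) rows.emplace_back(index++, value);
    return rows;
}

// Reverse the logical order by negating every index; values are untouched.
template <typename E>
void reverseIndices(Indexed<E>& rows) {
    for (auto& row : rows) row.first = -row.first;
}

// Shift every index by a fixed offset; again only the keys change.
template <typename E>
void offsetIndices(Indexed<E>& rows, Timestamp offset) {
    for (auto& row : rows) row.first += offset;
}

// Read the values back in ascending index order.
template <typename E>
std::vector<E> inIndexOrder(Indexed<E> rows) {
    std::stable_sort(rows.begin(), rows.end(),
                     [](const auto& a, const auto& b) { return a.first < b.first; });
    std::vector<E> values;
    for (auto& row : rows) values.push_back(std::move(row.second));
    return values;
}
```

In this sketch, reversing a stream costs one pass over the keys; the elements are only reordered when the result is finally read back in index order.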
Semantic-Cpp consists of seven core header files, layered progressively. Each file has a single responsibility and is independently testable. Five namespaces, each with its own remit, work together to form a complete pipeline from data source to final result:
The dependency chain is clear and logical, like a meticulously designed circuit diagram: current flows from the foundational type definitions upwards, with each layer depending only on the layers beneath it. Ultimately, all paths converge at semantic.h and semantics.h, forming the complete stream processing capability.
function.h: No dependencies, the type foundation
pool.h: Depends on function.h
charsequence.h: Independent module, Unicode processing
collector.h: Depends on function.h, pool.h
hash.h / less.h: Independent modules, standard library extensions
semantic.h: Depends on all of the above
semantics.h: Depends on semantic.h
Namespace Overview
Semantic-Cpp defines five namespaces, each like an independent "department" with its own responsibilities, all collaborating closely:
| Namespace | Header File | Responsibility | Core Types/Functions |
|---|---|---|---|
| function | function.h | Type system foundation | Timestamp, Module, Generator<T>, Supplier<R>, Consumer<T>, Predicate<T> etc. |
| | charsequence.h | Unicode processing | charset, Meta, Point, Charsequence, Builder, Buffer etc. |
| collector | collector.h | Terminal collection execution | Collector<E,A,R>, Identity<A>, Accumulator<A,E> etc. |
| collectable | semantic.h | Materialised data containers | Collectable<E>, OrderedCollectable<E>, UnorderedCollectable<E> etc. |
| semantic | semantic.h, semantics.h | Stream construction & intermediate operations | Semantic<E>, useRange(), useFrom() etc. |
Namespace Collaboration Flow
Data flow between namespaces is like an assembly line in a factory: raw material enters from semantic, undergoes processing layer by layer, and is finally packaged and shipped from collector. Each step has a clear boundary of responsibility:
semantic::useRange(0, 100)                  // semantic namespace: create stream
    .map([](int x) { return x * 2; })       // semantic namespace: intermediate transform
    .filter([](int x) { return x > 50; })   // semantic namespace: intermediate filter
    .toUnordered()                          // convert to collectable namespace
    .toVector();                            // invoke collector from collector namespace
Layer 1: function.h: Type Foundation
function.h defines the type system for the entire framework, the common foundation for all modules.
namespace function {
using Timestamp = long long;        // Index type, the "timestamp" of data in the stream
using Module = unsigned long long;  // Module/count type

template <typename T>
using Generator = std::function<void(
    std::function<void(T, Timestamp)>,  // accept: receive an element
    std::function<bool(T, Timestamp)>   // interrupt: should we stop?
)>;
}
Generator is the core abstraction of the entire stream system. It does not return data; instead, it accepts two callbacks: accept ("I'm ready, please accept this element") and interrupt ("should we stop?"). This inversion of control design means the data producer has no knowledge of the consumer; it simply "pushes" data at the appropriate moment. This is the essence of lazy evaluation: data only truly "flows" when accept is called; before that, everything is merely a description.
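A minimal sketch of how such a callback-driven generator might be written and consumed. Only the Timestamp and Generator aliases come from the definition above; rangeGenerator, mapGenerator and collectFirst are hypothetical helpers, and the choice to consult interrupt before each element is pushed is an assumption, not confirmed library behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Timestamp = long long;

template <typename T>
using Generator = std::function<void(
    std::function<void(T, Timestamp)>,  // accept: receive an element
    std::function<bool(T, Timestamp)>   // interrupt: should we stop?
)>;

// Hypothetical source: describes [start, end) but produces nothing until invoked.
// Assumption: interrupt is consulted before each element is pushed.
inline Generator<int> rangeGenerator(int start, int end) {
    return [start, end](std::function<void(int, Timestamp)> accept,
                        std::function<bool(int, Timestamp)> interrupt) {
        Timestamp index = 0;
        for (int value = start; value < end; ++value, ++index) {
            if (interrupt(value, index)) return;  // consumer asked to stop
            accept(value, index);                 // push the element downstream
        }
    };
}

// Hypothetical intermediate step: wraps a generator without running it.
inline Generator<int> mapGenerator(Generator<int> source, std::function<int(int)> mapper) {
    return [source, mapper](std::function<void(int, Timestamp)> accept,
                            std::function<bool(int, Timestamp)> interrupt) {
        source([&](int value, Timestamp index) { accept(mapper(value), index); },
               [&](int value, Timestamp index) { return interrupt(mapper(value), index); });
    };
}

// Hypothetical terminal step: runs the generator and keeps at most `limit` elements.
inline std::vector<int> collectFirst(const Generator<int>& generator, std::size_t limit) {
    std::vector<int> out;
    generator([&](int value, Timestamp) { out.push_back(value); },
              [&](int, Timestamp) { return out.size() >= limit; });
    return out;
}
```

Building `mapGenerator(rangeGenerator(0, 1000000), ...)` does no work by itself; elements are produced only inside `collectFirst`, and the interrupt callback stops the source after the requested number of elements.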
pool.h provides the global thread pool pool::pool, the concurrency engine for the entire framework. It employs a declarative parallelism design: when you write .parallel(4), it does not immediately launch four threads to start processing. This line of code is merely a "declaration", telling the framework "I intend to use 4 threads for parallel processing". Actual parallel execution occurs when a terminal operation is invoked, that is, when you call collection methods like toVector(), findFirst(), count(), etc.
| Feature | Description |
|---|---|
| Declarative Parallelism | .parallel(4) only declares "I want to use 4 threads", does not start immediately |
| Emergency Shutdown | Built-in emergencyShutdown() and std::set_terminate handler |
| Exception Propagation | submit() returns std::future, propagating exceptions safely to the main thread |
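The two ideas in the table, declaring parallelism first and surfacing worker exceptions through std::future, can be sketched with the standard library alone. DeferredSum and failingTask are hypothetical names for illustration; this is not pool::pool, and std::async stands in for the library's submit().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical type for illustration only; it is not pool::pool.
class DeferredSum {
public:
    explicit DeferredSum(std::vector<int> data) : data_(std::move(data)) {}

    // Declarative step: remember the requested thread count, start nothing.
    DeferredSum& parallel(std::size_t threads) {
        threads_ = threads == 0 ? 1 : threads;
        return *this;
    }

    // Terminal step: only now are tasks launched, one per chunk of the data.
    long long summate() const {
        const std::size_t chunk = (data_.size() + threads_ - 1) / threads_;
        std::vector<std::future<long long>> parts;
        for (std::size_t begin = 0; begin < data_.size(); begin += chunk) {
            const std::size_t end = std::min(begin + chunk, data_.size());
            const auto first = data_.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data_.begin() + static_cast<std::ptrdiff_t>(end);
            parts.push_back(std::async(std::launch::async, [first, last] {
                return std::accumulate(first, last, 0LL);
            }));
        }
        long long total = 0;
        for (auto& part : parts) total += part.get();  // get() rethrows task exceptions
        return total;
    }

private:
    std::vector<int> data_;
    std::size_t threads_ = 1;
};

// A task that fails: the exception is stored in the future and rethrown by get().
inline std::future<int> failingTask() {
    return std::async(std::launch::async, []() -> int {
        throw std::runtime_error("worker failed");
    });
}
```

Calling `parallel(4)` here only stores a number; the tasks exist solely for the duration of `summate()`. A caller that wraps `failingTask().get()` in a try block observes the worker's std::runtime_error on its own thread.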
Layer 3: charsequence.h: Unicode Character Sequences
charsequence.h is a complete Unicode processing module, providing functionality for creating, converting, and manipulating character sequences. It supports various encodings such as UTF-8, UTF-16 (LE/BE), UTF-32 (LE/BE), ASCII, and Latin1. It correctly detects and handles surrogate pairs, returning the standard U+FFFD replacement character for invalid code points.
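To make the surrogate-pair handling concrete, here is a small standalone decoder. decodeUtf16 is a hypothetical helper, not part of charsequence.h; it follows the standard UTF-16 rule (a high surrogate in 0xD800..0xDBFF followed by a low surrogate in 0xDC00..0xDFFF forms one code point) and substitutes U+FFFD for any unpaired surrogate.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not charsequence.h's API: decode UTF-16 code units
// into Unicode code points, replacing unpaired surrogates with U+FFFD.
inline std::u32string decodeUtf16(const std::u16string& units) {
    constexpr char32_t replacement = 0xFFFD;
    std::u32string points;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const char32_t unit = units[i];
        const bool isHigh = unit >= 0xD800 && unit <= 0xDBFF;
        const bool isLow = unit >= 0xDC00 && unit <= 0xDFFF;
        if (isHigh && i + 1 < units.size()) {
            const char32_t next = units[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {
                // Valid pair: combine the two 10-bit halves above U+10000.
                points.push_back(0x10000 + ((unit - 0xD800) << 10) + (next - 0xDC00));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        // Lone surrogates are invalid on their own; everything else passes through.
        points.push_back((isHigh || isLow) ? replacement : unit);
    }
    return points;
}
```

For example, the code units 0xD83D 0xDE00 decode to the single code point U+1F600, while a trailing 0xD800 with no partner becomes U+FFFD.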
Key Rule: A Semantic<E> must first be converted to a Collectable<E> via toUnordered(), toOrdered(), toWindow(), toStatistics(), or sort() before terminal methods can be called.
Five Materialisation Paths

| Conversion Method | Target Type | Underlying Data Structure | Performance Characteristic |
|---|---|---|---|
| toUnordered() | UnorderedCollectable | unordered_map | Average O(1) lookup |
| toOrdered() | OrderedCollectable | map | O(log n) lookup |
| sort() | OrderedCollectable | map (value-sorted) | O(log n) lookup |
| toWindow() | WindowCollectable | Inherits ordered collection | Supports slide/tumble |
| toStatistics<D>() | Statistics<E,D> | Inherits ordered collection | 30+ statistical methods |
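The slide/tumble distinction mentioned for the window path can be illustrated on a plain vector. The functions below are hypothetical stand-ins, not WindowCollectable's interface, and they assume that only complete windows are emitted; the library may treat a trailing partial window differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sliding windows of `size` elements, advancing by `step`.
// Assumption: incomplete trailing windows are dropped.
template <typename E>
std::vector<std::vector<E>> slide(const std::vector<E>& values,
                                  std::size_t size, std::size_t step) {
    assert(size > 0 && step > 0);
    std::vector<std::vector<E>> windows;
    for (std::size_t begin = 0; begin + size <= values.size(); begin += step) {
        windows.emplace_back(values.begin() + static_cast<std::ptrdiff_t>(begin),
                             values.begin() + static_cast<std::ptrdiff_t>(begin + size));
    }
    return windows;
}

// A tumbling window is a sliding window whose step equals its size,
// so consecutive windows never overlap.
template <typename E>
std::vector<std::vector<E>> tumble(const std::vector<E>& values, std::size_t size) {
    return slide(values, size, size);
}
```

With the input 1, 2, 3, 4, 5, sliding by one with size three yields three overlapping windows, while tumbling with size two yields the two disjoint windows (1, 2) and (3, 4).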
Collectable: All Terminal Methods (Alphabetical Order)

| Method | Return Type | Description |
|---|---|---|
| allMatch(predicate) | bool | All elements match condition |
| anyMatch(predicate) | bool | Any element matches condition |
| average<D>() | D | Average |
| average<D>(mapper) | D | Average after mapping |
| collect(identity, acc, comb, fin) | R | Custom four-stage collection |
| collect(identity, interrupt, acc, comb, fin) | R | Custom interruptible collection |
| count() | Module | Total number of elements |
| empty() | bool | Is the stream empty? |
| error() | void | Output to stderr (supports delimiter/prefix/suffix/converter) |
| findAny() | std::optional<E> | Find any (random) element |
| findAt(index) | std::optional<E> | Find element at specified index (supports negative) |
| findFirst() | std::optional<E> | Find the first element |
| findLast() | std::optional<E> | Find the last element |
| findMaximum() | std::optional<E> | Find the maximum element |
| findMaximum(comparator) | std::optional<E> | Find maximum with custom comparator |
| findMinimum() | std::optional<E> | Find the minimum element |
| findMinimum(comparator) | std::optional<E> | Find minimum with custom comparator |
| forEach(consumer) | void | Perform side-effect for each element |
| group(keyExtractor) | unordered_map<K, vector<E>> | Group by key |
| groupBy(keyExtractor, valueExtractor) | unordered_map<K, vector<V>> | Group by key and extract value |
| join() | Charsequence | Join with default format |
| join(delimiter) | Charsequence | Join with custom delimiter |
| join(prefix, delimiter, suffix) | Charsequence | Join with fully custom format |
| noneMatch(predicate) | bool | No element matches condition |
| out() | Charsequence | Output to stdout (supports delimiter/prefix/suffix/converter) |
| partition(size) | vector<vector<E>> | Partition by fixed size |
| partitionBy(keyExtractor) | vector<vector<E>> | Partition by index key |
| partitionBy(keyExtractor, valueExtractor) | vector<vector<V>> | Partition by index key and extract value |
| range<D>() | D | Numeric range (max - min) |
| range<D>(mapper) | D | Numeric range after mapping |
| reduce(accumulator) | std::optional<E> | Reduction without identity |
| reduce(identity, accumulator) | E | Reduction with identity |
| reduce(identity, acc, comb) | R | Fully custom reduction |
| summate<D>() | D | Summation |
| summate<D>(mapper) | D | Summation after mapping |
| toArray<N>() | std::array<E, N> | Collect into fixed-size array |
| toDeque() | std::deque<E> | Collect into deque |
| toForwardList() | std::forward_list<E> | Collect into forward_list |
| toList() | std::list<E> | Collect into list |
| toMap(keyExtractor) | std::map<K, E> | Collect into map by key |
| toMap(keyExtractor, valueExtractor) | std::map<K, V> | Collect into map with custom key & value |
| toMultimap(keyExtractor) | std::multimap<K, E> | Collect into multimap by key |
| toMultimap(keyExtractor, valueExtractor) | std::multimap<K, V> | Collect into multimap with custom key & value |
| toMultiset() | std::multiset<E> | Collect into multiset |
| toPriorityQueue() | std::priority_queue<E> | Collect into priority_queue |
| toQueue() | std::queue<E> | Collect into queue |
| toSet() | std::set<E> | Collect into set (unique & sorted) |
| toStack() | std::stack<E> | Collect into stack |
| toUnorderedMap(keyExtractor, valueExtractor) | std::unordered_map<K, V> | Collect into unordered_map |
| toUnorderedMultimap(keyExtractor) | std::unordered_multimap<K, E> | Collect into unordered_multimap by key |
| toUnorderedMultimap(keyExtractor, valueExtractor) | std::unordered_multimap<K, V> | Collect into unordered_multimap with custom key & value |
| toUnorderedMultiset() | std::unordered_multiset<E> | Collect into unordered_multiset |
| toUnorderedSet() | std::unordered_set<E> | Collect into unordered_set |
| toVector() | std::vector<E> | Collect into vector |
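The four-stage collect(identity, acc, comb, fin) entry follows a familiar collector pattern, sketched here as a free function over a vector. The collect and averageOf functions and the Running struct are hypothetical, and the exact callback signatures are assumptions; in particular, splitting the input into two halves only simulates two workers and runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-function sketch of a four-stage collection:
//   identity    creates an empty accumulation
//   accumulator folds one element into an accumulation
//   combiner    merges two partial accumulations
//   finisher    converts the accumulation into the final result
template <typename E, typename Identity, typename Accumulator,
          typename Combiner, typename Finisher>
auto collect(const std::vector<E>& elements, Identity identity,
             Accumulator accumulator, Combiner combiner, Finisher finisher) {
    // Two halves stand in for two workers; here they run one after the other.
    const auto middle = elements.begin() +
                        static_cast<std::ptrdiff_t>(elements.size() / 2);
    auto left = identity();
    for (auto it = elements.begin(); it != middle; ++it) left = accumulator(left, *it);
    auto right = identity();
    for (auto it = middle; it != elements.end(); ++it) right = accumulator(right, *it);
    return finisher(combiner(left, right));
}

// Running totals used as the intermediate accumulation type.
struct Running {
    double sum;
    std::size_t count;
};

// Example use: an average expressed as the four stages (0.0 for empty input).
inline double averageOf(const std::vector<int>& values) {
    return collect(
        values,
        [] { return Running{0.0, 0}; },
        [](Running acc, int value) { acc.sum += value; ++acc.count; return acc; },
        [](Running a, Running b) { return Running{a.sum + b.sum, a.count + b.count}; },
        [](Running acc) { return acc.count == 0 ? 0.0 : acc.sum / acc.count; });
}
```

The combiner is what makes the pattern suitable for parallel execution: each worker accumulates privately, and partial results are merged once at the end.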
Statistics<E,D>: Statistical Methods

| Method | Return Type | Description |
|---|---|---|
| summate() | D | Summation |
| average() | D | Average |
| minimum() | std::optional<D> | Minimum value |
| maximum() | std::optional<D> | Maximum value |
| range() | D | Range (max - min) |
| variance() | D | Population variance |
| standardDeviation() | D | Population standard deviation |
| median() | std::optional<D> | Median |
| mode() | std::optional<E> | Mode |
| percentile(p) | std::optional<D> | p-th percentile |
| firstQuartile() | std::optional<D> | First quartile (Q1) |
| thirdQuartile() | std::optional<D> | Third quartile (Q3) |
| interquartileRange() | std::optional<D> | Interquartile range (IQR) |
| skewness() | D | Skewness |
| kurtosis() | D | Kurtosis |
| frequency() | map<E, complex> | Frequency domain features |
| distribute() | map<E, complex> | Spatial distribution features |
| dft() | vector<complex<double>> | Discrete Fourier Transform |
| idft() | vector<complex<double>> | Inverse Discrete Fourier Transform |
| fft() | vector<complex<double>> | Fast Fourier Transform |
| ifft() | vector<complex<double>> | Inverse Fast Fourier Transform |
| gradient(...) | vector<double> | Gradient descent |
Each of the methods above also has a version that takes an optional mapper parameter.
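For reference, the population variance, population standard deviation and median listed above can be written out in a few lines. These free functions are hypothetical and independent of Statistics<E,D>; they use the textbook definitions (variance divides by n, not n - 1) and require C++17 for std::optional.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helpers, not the library's Statistics<E,D> methods.

// Population variance: mean squared deviation from the mean, divided by n.
inline double populationVariance(const std::vector<double>& values) {
    assert(!values.empty());
    double sum = 0.0;
    for (double value : values) sum += value;
    const double mean = sum / static_cast<double>(values.size());
    double squared = 0.0;
    for (double value : values) squared += (value - mean) * (value - mean);
    return squared / static_cast<double>(values.size());
}

// Population standard deviation: square root of the population variance.
inline double populationStandardDeviation(const std::vector<double>& values) {
    return std::sqrt(populationVariance(values));
}

// Median: middle value of the sorted data, or the mean of the two middle
// values for an even count; empty input has no median.
inline std::optional<double> median(std::vector<double> values) {
    if (values.empty()) return std::nullopt;
    std::sort(values.begin(), values.end());
    const std::size_t mid = values.size() / 2;
    if (values.size() % 2 == 1) return values[mid];
    return (values[mid - 1] + values[mid]) / 2.0;
}
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5, the squared deviations sum to 32, so the population variance is 4 and the standard deviation is 2; the median is 4.5.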
Semantic Intermediate Operation Methods

| Category | Method | Description |
|---|---|---|
| Element Transform | map | One-to-one mapping transformation |
| Element Transform | flatMap | One-to-many mapping and flattening |
| Element Transform | flat | Flatten nested streams (supports Semantic and containers) |
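The difference between flatMap and flat is easiest to see on plain vectors. The two functions below are hypothetical eager equivalents written for illustration; the library's versions operate lazily on Semantic streams.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical eager sketch of flatMap: each element expands into a vector,
// and the expansions are concatenated in order.
template <typename E, typename F>
auto flatMap(const std::vector<E>& input, F expand) {
    using Inner = std::decay_t<decltype(expand(input.front()))>;  // unevaluated
    Inner out;
    for (const E& element : input) {
        Inner part = expand(element);
        out.insert(out.end(), part.begin(), part.end());
    }
    return out;
}

// Hypothetical eager sketch of flat: remove one level of nesting.
template <typename E>
std::vector<E> flat(const std::vector<std::vector<E>>& nested) {
    std::vector<E> out;
    for (const auto& inner : nested) out.insert(out.end(), inner.begin(), inner.end());
    return out;
}
```

Here flatMap is map followed by flat: expanding each n into n copies of itself turns 1, 2, 3 into 1, 2, 2, 3, 3, 3.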
Layer 6: semantics.h: Stream Construction Factories
Numeric Range Generation

| Method | Description |
|---|---|
| useRange(start, end) | Generate range [start, end) |
| useRange(start, end, step) | Range with step (supports negative) |
| useRangeClosed(start, end) | Generate closed range [start, end] |
| useRangeClosed(start, end, step) | Closed range with step |
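The half-open and closed range semantics, including a negative step, can be pinned down with a small eager sketch. makeRange, range and rangeClosed are hypothetical names, and rejecting a zero step is an assumption; the library's useRange and useRangeClosed produce lazy streams rather than vectors.

```cpp
#include <cassert>
#include <vector>

// Hypothetical eager sketch of the range factories.
// Assumption: a zero step is rejected; a negative step counts downwards.
inline std::vector<long long> makeRange(long long start, long long end,
                                        long long step, bool closed) {
    assert(step != 0);
    std::vector<long long> out;
    if (step > 0) {
        for (long long v = start; closed ? v <= end : v < end; v += step) out.push_back(v);
    } else {
        for (long long v = start; closed ? v >= end : v > end; v += step) out.push_back(v);
    }
    return out;
}

// Half-open range [start, end).
inline std::vector<long long> range(long long start, long long end, long long step = 1) {
    return makeRange(start, end, step, false);
}

// Closed range [start, end].
inline std::vector<long long> rangeClosed(long long start, long long end, long long step = 1) {
    return makeRange(start, end, step, true);
}
```

Under these assumptions, counting down from 10 to 0 in steps of -5 includes 0 only in the closed form.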
Infinite Stream Generation

| Method | Description |
|---|---|
| useInfinite(seed, generator) | Infinite iteration from seed value |
| useGenerate(supplier) | Infinite calls to supplier |
| useGenerate(supplier, limit) | Finite number of calls to supplier |
| useIterate(seed, generator) | Infinite iteration from seed value |
| useIterate(seed, generator, limit) | Finite number of iterations |
| useRandom() | Infinite stream of random integers |
| useRandom(min, max) | Random number stream in specified range |
| useRandom(min, max, count) | Random number stream with specified range and count |
Container & Element Construction

| Method | Description |
|---|---|
| useEmpty() | Create an empty stream |
| useOf(element) | Create stream from a single element |
| useOf(e1, e2) | Create stream from two elements |
| useOf(e1, e2, e3) | Create stream from three elements |
| useOf({...}) | Create stream from initialiser list |
| useFrom(container) | Create stream from standard container |
| useFrom({...}) | Create stream from initialiser list |
| useRepeat(element, count) | Repeat element n times |
Text & Unicode Processing

| Method | Description |
|---|---|
| useBlob(text) | Split string into char stream by bytes |
| useBlob(text, start, end) | Split specified range by bytes |
| useBlob(istream) | Read from input stream line by line |
| useBlob(istream, delimiter) | Read from input stream by delimiter |
| useText(text) | Whole text stream (Charsequence) |
| useText(text, delimiter) | Split text by delimiter |
| useText(istream) | Read entire content from input stream |
| useSequence(charsequence) | Create code point stream from character sequence |
| useSequence(text, encoding) | Create code point stream from text with specified encoding |
| useCharsequence(charsequence) | Character sequence as a whole stream |
| useCharsequence(charsequence, delimiter) | Split character sequence by delimiter |
Layer 7: hash.h / less.h: The Universal Language of the Container World
Provides complete hash and comparison support for all standard library containers (including nested containers), pair, tuple, optional, variant, chrono time types, complex numbers, and more. Containers nested to any depth and in any combination can now be used as keys in unordered_set or elements in set.
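One way such nested hashing can be built is with a recursive hasher that falls back to std::hash for leaf types. NestedHash and combineHash below are hypothetical and cover only vector and pair; the mixing formula is the widely used boost-style hash_combine, chosen here for illustration rather than taken from hash.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

// Boost-style mixing step (an illustrative choice, not necessarily hash.h's).
inline std::size_t combineHash(std::size_t seed, std::size_t next) {
    return seed ^ (next + 0x9e3779b9 + (seed << 6) + (seed >> 2));
}

// Hypothetical recursive hasher: leaf types defer to std::hash.
template <typename T>
struct NestedHash {
    std::size_t operator()(const T& value) const { return std::hash<T>{}(value); }
};

// Vectors hash their size and then every element, recursively.
template <typename T>
struct NestedHash<std::vector<T>> {
    std::size_t operator()(const std::vector<T>& values) const {
        std::size_t seed = values.size();
        for (const T& value : values) seed = combineHash(seed, NestedHash<T>{}(value));
        return seed;
    }
};

// Pairs hash both members, recursively.
template <typename A, typename B>
struct NestedHash<std::pair<A, B>> {
    std::size_t operator()(const std::pair<A, B>& value) const {
        return combineHash(NestedHash<A>{}(value.first), NestedHash<B>{}(value.second));
    }
};
```

With this in place, a type such as `std::vector<std::vector<int>>` can serve as the key of a `std::unordered_set` by passing `NestedHash` of that type as the hasher; equality still comes from the container's own `operator==`.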
Performance Optimisation Tips
Choose the Right Container: Use toUnordered() if order doesn't matter, toOrdered() or sort() if sorting is needed.
Leverage Parallelism: Use parallel() for large datasets.
Utilise Lazy Evaluation: takeWhile and limit can terminate early.
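The early-termination tip can be illustrated with a standalone sketch of an infinite iteration cut short by a takeWhile-style predicate. iterateWhile is a hypothetical function, not the library's useIterate or takeWhile; the `produced` counter exists only to show how few values are ever generated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch: iterate seed, next(seed), next(next(seed)), ... and stop
// at the first value that fails `keep`. `produced` reports how many values
// were examined, to make the early termination visible.
inline std::vector<long long> iterateWhile(long long seed,
                                           std::function<long long(long long)> next,
                                           std::function<bool(long long)> keep,
                                           std::size_t& produced) {
    std::vector<long long> out;
    produced = 0;
    for (long long value = seed;; value = next(value)) {
        ++produced;
        if (!keep(value)) break;  // the unbounded source is abandoned here
        out.push_back(value);
    }
    return out;
}
```

Doubling from 1 while the value stays below 1000 keeps the ten powers of two from 1 to 512 and examines only eleven values, even though the source itself has no end.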
Semantic-Cpp: Building efficient, clear data processing pipelines with modern C++.