[WWW'26] StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning

StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning

Python 3.8+ | PyTorch 1.12+ | License: MIT | Conference: WWW '26

📢 News

  • [April 2026] StreamFP has been accepted to The Web Conference 2026 (WWW '26)!

📖 Overview

StreamFP is a stream learning framework for non-stationary data streams, designed for efficiency and robustness against catastrophic forgetting. It introduces learnable fingerprints, compact parameter vectors that summarize the model state, to guide data selection.

Key challenges in Stream Learning (SL) addressed by StreamFP:

  1. Data Redundancy: Incoming streams often contain redundant data that wastes computation.
  2. Catastrophic Forgetting: Incremental updates can overwrite earlier knowledge.
  3. Efficiency: Traditional model-based selection is often too computationally expensive for real-time streams.

StreamFP achieves superior accuracy and efficiency compared to state-of-the-art methods (e.g., Camel, ER, GradMatch) across varying data arrival rates.

🚀 Methodology

StreamFP consists of three key components driven by a shared set of learnable fingerprints:

[Figure: StreamFP Framework]

  1. Fingerprint-based Coreset Selection (FCS): Selects informative samples from incoming batches based on fingerprint similarity, prioritizing data that balances novelty and familiarity.
  2. Fingerprint-based Buffer Update (FBU): Dynamically maintains the replay buffer by preserving representative historical samples and discarding redundant ones.
  3. Fingerprint Attunement (FA): A lightweight plugin that uses pre-trained ViT attention to calibrate fingerprints online with negligible overhead.
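
To make the selection idea concrete, here is a minimal, self-contained sketch of fingerprint-guided coreset selection. It is not the repository's implementation. The scoring rule (keep the samples whose best cosine similarity to any fingerprint is closest to a mid-range `target`), the function names, and the `target` parameter are illustrative assumptions standing in for "balancing novelty and familiarity".

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_coreset(features, fingerprints, k, target=0.5):
    """Return indices of the k samples whose best fingerprint similarity is closest to `target`.

    Hypothetical scoring rule: similarity near 1 means "already familiar",
    similarity near 0 means "very novel"; `target` picks a point in between.
    """
    ranked = []
    for idx, feat in enumerate(features):
        best_sim = max(cosine(feat, fp) for fp in fingerprints)
        ranked.append((abs(best_sim - target), idx))
    ranked.sort()  # smallest distance to target first; ties broken by index
    return sorted(idx for _, idx in ranked[:k])

# Toy batch: three 2-D feature vectors and a single fingerprint.
batch = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fingerprints = [[1.0, 0.0]]
print(select_coreset(batch, fingerprints, k=1))
```

In this toy batch the third sample is selected: its similarity to the fingerprint (about 0.71) is neither a near-duplicate of what the fingerprint already encodes nor entirely unrelated to it. A real implementation would operate on model features and learned fingerprints rather than hand-written vectors.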

🛠️ Installation

Prerequisites

  • Linux or macOS
  • Python 3.8+
  • PyTorch 1.12+ and CUDA 11.3+

Setup

# Clone the repository
git clone https://github.com/CGCL-codes/StreamFP.git
cd StreamFP

# Create and activate conda environment
conda env create -f environment.yml
conda activate sl

# (Optional) Install FastMoE (main path: build without NCCL)
# NOTE: FastMoE builds a CUDA extension. If you see errors like "nccl.h: No such file or directory",
# you can build without NCCL by setting USE_NCCL=0 (recommended unless you need NCCL-based distributed comm).
conda install -y cmake ninja

git clone --recursive https://github.com/laekov/fastmoe.git
cd fastmoe

# Option 1 (recommended): build without NCCL, which disables distributed features
USE_NCCL=0 python setup.py install

# Option 2 (alternative to Option 1): build with NCCL to enable distributed features
python setup.py install

# Quick check
python -c "import fmoe, fmoe_cuda; print('FastMoE installed:', fmoe_cuda.__file__)"
cd ..
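
After setup, it can help to confirm that the environment meets the prerequisites listed above. The following is a small sketch, not a script shipped with the repository; the helper names `version_tuple` and `check_environment` are hypothetical. It relies only on `sys.version_info`, `torch.__version__`, and `torch.version.cuda`, and reports PyTorch as missing rather than failing if it is not installed.

```python
import sys

def version_tuple(version):
    """Parse (major, minor) out of strings such as '1.12.1+cu113' or '11.3'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)

def check_environment():
    """Report whether the stated minimums are met: Python 3.8+, PyTorch 1.12+, CUDA 11.3+."""
    report = {"python_ok": sys.version_info >= (3, 8)}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so neither PyTorch nor CUDA can be verified.
        report["torch_ok"] = False
        report["cuda_ok"] = False
        return report
    report["torch_ok"] = version_tuple(str(torch.__version__)) >= (1, 12)
    # CUDA version PyTorch was built against; None for CPU-only builds.
    cuda = torch.version.cuda
    report["cuda_ok"] = cuda is not None and version_tuple(cuda) >= (11, 3)
    return report

print(check_environment())
```

Note that `torch.version.cuda` reflects the CUDA toolkit PyTorch was compiled with, not the installed driver, so a passing check does not guarantee that a GPU is usable.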

📂 Datasets

Create a data/ directory in the project root, then run the dataset script (for example, core50.sh for CORe50):

sh core50.sh

⚡ Quick Start

Basic Usage

To run a standard experiment (e.g., on Clear10), use the scripts provided in experiments/:

# Run Clear10 experiment
sh experiments/clear10.sh

# Run Stream-51 experiment
sh experiments/stream51.sh

Custom Configuration

You can customize training by passing command-line arguments to run.py. Key arguments include:

  • --selection_method: Strategy for coreset selection (e.g., StreamFP, Camel, Random).
  • --update_method: Strategy for buffer update (e.g., StreamFP, ER, GSS).
  • --skip_batch: Enable batch skipping for high-speed streams (default: 1).
  • --traintime_limit: Simulate real-time constraints.

Example command:

python -u run.py --config configs/clear10.yaml \
  --repeat 1 --overwrite 1 \
  --selection_method StreamFP --update_method StreamFP \
  --mem_size 102 --traintime_limit 10
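
To compare several selection strategies under the same settings, the example command can be generated programmatically. The sketch below only reuses the flags and values shown above; the helper `build_command` is hypothetical, and whether every selection method can be paired with the StreamFP buffer update is an assumption to verify against run.py.

```python
import shlex

def build_command(selection_method, update_method, config="configs/clear10.yaml",
                  mem_size=102, traintime_limit=10):
    """Assemble the argv list for one run.py invocation, using the flags from the example command."""
    return [
        "python", "-u", "run.py",
        "--config", config,
        "--repeat", "1", "--overwrite", "1",
        "--selection_method", selection_method,
        "--update_method", update_method,
        "--mem_size", str(mem_size),
        "--traintime_limit", str(traintime_limit),
    ]

# Vary the selection strategy while holding the buffer update fixed.
commands = [build_command(method, "StreamFP") for method in ("StreamFP", "Camel", "Random")]
for argv in commands:
    # Printing only; pass argv to subprocess.run(argv, check=True) to launch a run.
    print(shlex.join(argv))
```

Building the command as a list rather than a single string avoids shell-quoting issues if a config path ever contains spaces.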

📊 Results

StreamFP outperforms the ER baseline in both accuracy and forgetting on Stream-51 and Clear10, at a runtime cost of about 8.8% on each dataset:

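| Dataset | Method | Accuracy (%) | Forgetting (%) | Runtime (s) |
|---|---|---|---|---|
| Stream-51 | ER | 59.99 | 3.70 | 1883.75 |
| Stream-51 | StreamFP | 64.44 | 2.25 | 2049.52 |
| Clear10 | ER | 51.90 | 1.09 | 412.50 |
| Clear10 | StreamFP | 54.94 | 0.82 | 448.80 |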
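
The trade-off in the table can be made explicit with a little arithmetic. This sketch only recomputes differences from the numbers reported above; the `compare` helper is hypothetical and is not part of the repository.

```python
# (accuracy %, forgetting %, runtime s), copied from the results table.
results = {
    "Stream-51": {"ER": (59.99, 3.70, 1883.75), "StreamFP": (64.44, 2.25, 2049.52)},
    "Clear10": {"ER": (51.90, 1.09, 412.50), "StreamFP": (54.94, 0.82, 448.80)},
}

def compare(dataset):
    """Return (accuracy gain in points, forgetting reduction in points, runtime overhead in %)."""
    er_acc, er_fgt, er_time = results[dataset]["ER"]
    fp_acc, fp_fgt, fp_time = results[dataset]["StreamFP"]
    return (
        round(fp_acc - er_acc, 2),
        round(er_fgt - fp_fgt, 2),
        round(100.0 * (fp_time - er_time) / er_time, 1),
    )

for name in results:
    print(name, compare(name))
```

On Stream-51 this gives a gain of 4.45 accuracy points and 1.45 fewer forgetting points; on Clear10 the figures are 3.04 and 0.27. Runtime overhead relative to ER is 8.8% on both datasets.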

Detailed results can be found in the results_log/ directory after training.

📜 Citation

If you find this work useful for your research, please cite our WWW '26 paper:

@inproceedings{li2026streamfp,
  title={StreamFP: Fingerprint-guided Data Selection for Efficient Stream Learning},
  author={Li, Changwu and Shi, Tongjun and Zhang, Shuhao and Chen, Binbin and He, Bingsheng and Liao, Xiaofei and Jin, Hai},
  booktitle={Proceedings of the ACM Web Conference 2026 (WWW '26)},
  year={2026},
  publisher={ACM},
  address={Dubai, United Arab Emirates},
  doi={10.1145/XXXXXXXXXXXX}
}

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

This research is supported by Huazhong University of Science and Technology and Singapore University of Technology and Design. We thank the authors of Clear Benchmark, CORe50, and Stream-51 for their datasets.

