yifu-ding/FastDeploy

 
 

FastDeploy with SQAttn Integration

This repository is a fork of PaddlePaddle/FastDeploy, extended with SQAttn support for efficient long-context Transformer inference.

FastDeploy is a high-performance deployment framework for large language models and vision-language models based on PaddlePaddle. This fork integrates SQAttn into the FastDeploy inference stack, enabling sparse-quantized attention computation for ultra-long-context scenarios.

Main Changes in This Fork

Compared with the upstream FastDeploy repository, this fork makes the following changes:

  • Added SQAttn support for long-context Transformer inference.
  • Integrated SQAttn-related attention components into the FastDeploy execution workflow.
  • Adapted the inference path to support sparse-quantized attention computation.
  • Removed unused and intermediate files left over from the integration.
  • Applied minor fixes and adjustments for compatibility and usability.
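
This README does not describe how SQAttn works internally, so the following is only a toy sketch of what "sparse-quantized attention" can mean in general, not the SQAttn implementation or any FastDeploy API. It assumes symmetric per-vector int8 quantization of the query and keys, and block-level top-k selection of keys as the sparsity pattern. The names `quantize_int8` and `sparse_quantized_attention`, and the parameters `block_size` and `top_blocks`, are hypothetical and chosen for illustration.

```python
import math


def quantize_int8(vec):
    """Symmetric per-vector int8 quantization (an assumed scheme).

    Returns the integer codes and the scale needed to dequantize them.
    """
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vec], scale


def sparse_quantized_attention(q, keys, values, block_size=2, top_blocks=1):
    """Single-query attention with quantized scores and block sparsity.

    Hypothetical illustration only: scores come from int8 dot products,
    and only the `top_blocks` highest-scoring key blocks are attended to.
    """
    d = len(q)
    q_int, q_scale = quantize_int8(q)

    # Quantized scores: integer dot product, rescaled back to float.
    scores = []
    for k in keys:
        k_int, k_scale = quantize_int8(k)
        dot = sum(a * b for a, b in zip(q_int, k_int))
        scores.append(dot * q_scale * k_scale / math.sqrt(d))

    # Sparsity: rank contiguous key blocks by their best score, keep the top ones.
    n_blocks = math.ceil(len(keys) / block_size)
    block_best = [
        max(scores[b * block_size:(b + 1) * block_size]) for b in range(n_blocks)
    ]
    kept_blocks = sorted(range(n_blocks), key=lambda b: block_best[b], reverse=True)
    kept_blocks = sorted(kept_blocks[:top_blocks])
    kept = [
        i
        for b in kept_blocks
        for i in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]

    # Numerically stable softmax over the kept keys only.
    m = max(scores[i] for i in kept)
    weights = [math.exp(scores[i] - m) for i in kept]
    total = sum(weights)

    # Weighted sum of the corresponding value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, kept):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out, kept


# Example call with four keys split into two blocks of two.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, -1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
out, kept = sparse_quantized_attention(q, keys, values, block_size=2, top_blocks=1)
print(kept, out)
```

In this sketch the cost of the softmax and value aggregation scales with the number of kept keys rather than the full context length, and the score computation uses small integers. Those are the two generic ingredients the term "sparse-quantized" suggests; the actual kernels in this fork may differ in every detail.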


Languages

  • Python 49.8%
  • C++ 26.5%
  • Cuda 23.0%
  • Other 0.7%