Authors: Hui Lin*, Chao Zhang*, Danfeng Hong, Kexin Dong, and Congcong Wen†
* Equal Contribution    † Corresponding Author.
Journal: IEEE Geoscience and Remote Sensing Magazine (GRSM), 2025
Remote sensing image classification is essential for various applications, including agricultural monitoring, urban planning, and land use classification. However, remote sensing data is often distributed across multiple institutions, and due to privacy concerns and data-sharing restrictions, leveraging large-scale datasets in a centralized training framework is challenging. Federated learning offers a promising solution by enabling collaborative model training across distributed data sources without requiring data centralization. Nevertheless, current Vision-Language Models (VLMs), which typically contain billions of parameters, incur substantial communication costs under traditional federated learning approaches that transmit full model parameter updates. In this paper, we propose FedRSCLIP, the first federated learning framework designed for remote sensing image classification based on a VLM, specifically CLIP. FedRSCLIP addresses the challenges of data heterogeneity and large-scale model transmission in federated environments by introducing Prompt Learning, which optimizes only a small set of tunable parameters. The framework introduces a dual-prompt mechanism, comprising Shared Prompts for global knowledge sharing and Private Prompts for client-specific adaptation. To maintain semantic coherence between shared and private prompts, we propose the Dual Prompt Alignment Constraint, which balances global consistency and local adaptability across diverse client distributions. Additionally, to enhance cross-modal representation learning, we introduce the Cross-Modal Feature Alignment Constraint to align multimodal features between text and image prompts. To validate the effectiveness of our proposed model, we construct the Fed-RSIC dataset based on three existing remote sensing image classification datasets, specifically designed to simulate various federated learning configurations.
Importantly, the Fed-RSIC dataset includes two partitioning schemes: a uniform version, where data is evenly distributed across clients, and a heterogeneous version, where each client holds only a subset of class labels. This dual design enables comprehensive evaluation under both idealized and realistic federated learning scenarios. Experimental results on the Fed-RSIC dataset demonstrate the effectiveness and superiority of FedRSCLIP in addressing the key challenges of federated remote sensing image classification.
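As a rough intuition for the two constraints, both can be viewed as alignment penalties between paired feature vectors. The sketch below uses a plain cosine distance; the function names, vector dimensions, and the exact loss form are illustrative assumptions, not the paper's precise formulation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def dual_prompt_alignment_loss(shared_prompt: np.ndarray,
                               private_prompt: np.ndarray) -> float:
    # Keep a client's private prompt semantically close to the
    # globally shared prompt: zero when the vectors are parallel.
    return 1.0 - cosine_similarity(shared_prompt, private_prompt)

def cross_modal_alignment_loss(text_feature: np.ndarray,
                               image_feature: np.ndarray) -> float:
    # Pull the text-prompt and image-prompt features toward a
    # common cross-modal representation space.
    return 1.0 - cosine_similarity(text_feature, image_feature)
```

During federated training, such penalties would be weighted and added to the CLIP classification objective, while only the small prompt tensors (not the frozen CLIP backbone) are exchanged between clients and the server.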
- We propose FedRSCLIP, the first framework to integrate Vision-Language Models into federated learning for remote sensing image classification. FedRSCLIP enhances representation and classification performance while optimizing communication efficiency, addressing challenges such as high communication costs and data heterogeneity.
- We introduce Prompt Learning for VLMs in federated learning, which optimizes a small set of tunable parameters instead of transmitting the entire model. Furthermore, we propose a dual-prompt mechanism comprising Shared Prompts for global knowledge sharing and Private Prompts for client-specific customization, enabling the model to balance global consistency and local flexibility.
- We develop two innovative constraints to enhance prompt alignment and representation learning in federated learning with VLMs. The first is the Dual Prompt Alignment Constraint, which ensures semantic consistency between shared and private prompts by aligning their representations during training. The second is the Cross-Modal Feature Alignment Constraint, which aligns multimodal features between text and image representations, enhancing the model's ability to capture relevant features and improving its classification performance.
- We construct the Fed-RSIC dataset by integrating three widely used remote sensing classification benchmarks. It includes both uniform and heterogeneous partitioning strategies to simulate diverse federated learning scenarios and enables comprehensive evaluation. Experiments show that FedRSCLIP achieves state-of-the-art performance in remote sensing image classification across various federated learning configurations.
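A minimal sketch of the two partitioning schemes described above — uniform (every client sees every class) versus heterogeneous (each client holds only a subset of class labels). The function name, parameters, and defaults here are illustrative assumptions, not the released dataset-generation code:

```python
import random
from collections import defaultdict

def partition(labels, num_clients, scheme="uniform",
              classes_per_client=3, seed=0):
    """Split sample indices across clients.

    labels: list of class labels, one per sample index.
    Returns a dict {client_id: [sample indices]}.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    clients = defaultdict(list)
    if scheme == "uniform":
        # Round-robin over shuffled indices: an even split in which
        # every client receives samples from all classes.
        for i, idx in enumerate(indices):
            clients[i % num_clients].append(idx)
    else:
        # Heterogeneous: each client is assigned a random subset of
        # classes and only receives samples from those classes.
        all_classes = sorted(set(labels))
        owned = {c: set(rng.sample(all_classes, classes_per_client))
                 for c in range(num_clients)}
        for idx in indices:
            eligible = [c for c in range(num_clients)
                        if labels[idx] in owned[c]]
            if eligible:
                clients[rng.choice(eligible)].append(idx)
    return dict(clients)
```

The heterogeneous split is what makes the federated setting realistic: clients disagree on label coverage, so naive global averaging degrades, which is the scenario the dual-prompt design targets.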
Please cite the paper if this code is useful and helpful for your research.
@article{fedrsclip,
  title={FedRSCLIP: Federated learning for remote sensing scene classification using vision-language models},
  author={Lin, Hui and Zhang, Chao and Hong, Danfeng and Dong, Kexin and Wen, Congcong},
  journal={IEEE Geoscience and Remote Sensing Magazine},
  year={2025},
  volume={},
  number={},
  pages={2-18},
  publisher={IEEE}
}