# PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders (NeurIPS 2024 Spotlight)
Xiangdong Zhang*, Shaofeng Zhang* and Junchi Yan
- Aug 2024: PCP-MAE is available on arXiv.
- Sept 2024: PCP-MAE is accepted by NeurIPS 2024 as a spotlight.
- Oct 2024: The corresponding checkpoints are released on Google Drive; the code will follow soon.
- Oct 2024: The code has been released.
- Nov 2024: The introduction to PCP-MAE has been added.
- June 2025: Our work Point-PQAE is accepted by ICCV 2025; it introduces a new paradigm for point cloud self-supervised learning.
- Complete the introduction for the PCP-MAE project.
- Publish the training and inference code.
- Release the checkpoints for pre-training and finetuning.
In this paper, we present a motivating empirical result: when the centers of masked patches are fed directly to the decoder, without any information from the encoder, the decoder still reconstructs the masked patches well. In other words, the patch centers carry significant information, and the reconstruction objective does not necessarily rely on the encoder's representations, which prevents the encoder from learning semantic representations.
In short, 2D MAE and Point-MAE differ in several aspects, as shown in the figure below. Therefore, directly transferring 2D MAE operations to the 3D domain is inappropriate.
Based on this key observation, we propose a simple yet effective method, learning to Predict Centers for Point Masked AutoEncoders (PCP-MAE), which guides the model to predict these significant centers and uses the predicted centers in place of the directly provided ones.
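To make the idea concrete, below is a rough, self-contained sketch of a center-prediction head. The module name, the cross-attention design, the tensor shapes, and the MSE objective are illustrative assumptions for exposition only, not the official PCP-MAE implementation; please refer to the paper and the released code for the actual objective.

```python
import torch
import torch.nn as nn

class CenterPredictionHead(nn.Module):
    """Conceptual sketch (not the official PCP-MAE code): learnable mask queries
    cross-attend to encoded visible tokens and regress the 3D center of each
    masked patch."""
    def __init__(self, dim=384, num_heads=6):
        super().__init__()
        self.mask_query = nn.Parameter(torch.zeros(1, 1, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.center_head = nn.Linear(dim, 3)

    def forward(self, visible_tokens, num_masked):
        # visible_tokens: (B, V, dim) features of visible patches from the encoder
        B = visible_tokens.shape[0]
        queries = self.mask_query.expand(B, num_masked, -1)          # (B, M, dim)
        attended, _ = self.cross_attn(queries, visible_tokens, visible_tokens)
        return self.center_head(attended)                            # (B, M, 3) predicted centers

# Training signal (sketch): regress the ground-truth masked centers, then use the
# predicted centers instead of the directly provided ones during reconstruction.
# pred_centers = head(visible_tokens, num_masked)
# center_loss = torch.nn.functional.mse_loss(pred_centers, gt_masked_centers)
```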
Our method achieves high pre-training efficiency compared to alternatives and improves considerably over Point-MAE, surpassing it by 5.50% on OBJ-BG, 6.03% on OBJ-ONLY, and 5.17% on PB-T50-RS for 3D object classification on the ScanObjectNN dataset.
To ensure a fair time comparison, the code for Point-MAE should be modified slightly in two ways:
- Add "config.dataset.train.others.whole = True" to the training to align Point-FEMAE and our method.
- Replace KNN_CUDA with the knn_point function (refer to the official code of ReCon, Point-FEMAE, or our PCP-MAE), which uses pure torch operations, to align with Point-FEMAE and our approach; this significantly increases training speed. A sketch of such a function is shown below.
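For reference, a minimal torch-only knn_point-style function might look like the following; the exact signature and implementation in ReCon, Point-FEMAE, and PCP-MAE may differ, so treat this as an illustrative sketch.

```python
import torch

def knn_point(k, xyz, query_xyz):
    """Torch-only k-nearest-neighbor lookup (a sketch, not the official code).

    Args:
        k: number of neighbors per query point.
        xyz: (B, N, 3) source points.
        query_xyz: (B, M, 3) query points (e.g., patch centers).
    Returns:
        (B, M, k) indices of the k nearest source points for each query.
    """
    # Pairwise distances between queries and source points: (B, M, N)
    dist = torch.cdist(query_xyz, xyz, p=2)
    # Indices of the k smallest distances along the source dimension
    _, idx = torch.topk(dist, k, dim=-1, largest=False)
    return idx
```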
| Task | Dataset | Config | Acc. | Checkpoints Download |
|---|---|---|---|---|
| Pre-training | ShapeNet | base.yaml | N.A. | Pre-train |
| Classification | ScanObjectNN | finetune_scan_objbg.yaml | 95.52% | OBJ_BG |
| Classification | ScanObjectNN | finetune_scan_objonly.yaml | 94.32% | OBJ_ONLY |
| Classification | ScanObjectNN | finetune_scan_hardest.yaml | 90.35% | PB_T50_RS |
| Classification | ModelNet40(1k) w/o voting | finetune_modelnet.yaml | 94.1% | ModelNet40_1K |
| Classification | ModelNet40(1k) w/ voting | finetune_modelnet.yaml | 94.4% | ModelNet40_1K_voting |
| Part Segmentation | ShapeNetPart | segmentation | 84.9% Cls.mIoU | TBD |
| Scene Segmentation | S3DIS | semantic_segmentation | 61.3% mIoU | TBD |
| Task | Dataset | Config | 5w10s (%) | 5w20s (%) | 10w10s (%) | 10w20s (%) | Download |
|---|---|---|---|---|---|---|---|
| Few-shot learning | ModelNet40 | fewshot.yaml | 97.4 ± 2.3 | 99.1 ± 0.8 | 93.5 ± 3.7 | 95.9 ± 2.7 | FewShot |
The checkpoints and logs have been released on Google Drive. To fully reproduce our reported results, we recommend fine-tuning the pre-trained ckpt-300 with different random seeds (typically 8) and recording the best performance, a protocol also adopted by peer methods (e.g., Point-MAE and ReCon). Occasionally, ckpt-275 may outperform ckpt-300, so we encourage fine-tuning from both ckpt-300 and ckpt-275. A sketch of this multi-seed protocol is shown below.
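The snippet below is only a sketch of that protocol, assuming the ScanObjectNN fine-tuning command from the usage section further down; the checkpoint path, config, and experiment-name pattern are placeholders.

```python
import random
import subprocess

# Fine-tune the same pre-trained checkpoint with several random seeds and keep
# the best result afterwards. Placeholder paths; adjust to your setup.
CKPT = "path/to/ckpt-300.pth"
CONFIG = "cfgs/finetune_scan_hardest.yaml"

for _ in range(8):
    seed = random.randint(0, 2**16 - 1)
    subprocess.run([
        "python", "main.py",
        "--config", CONFIG,
        "--finetune_model",
        "--ckpts", CKPT,
        "--exp_name", f"hardest_seed{seed}",
        "--seed", str(seed),
    ], check=True)
```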
PyTorch >= 1.7.0 < 1.11.0; python >= 3.7; CUDA >= 9.0; GCC >= 4.9; torchvision;
# Quick Start
conda create -n pcpmae python=3.10 -y
conda activate pcpmae
# Install pytorch
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.8 -c pytorch -c nvidia
# pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
# Install required packages
pip install -r requirements.txt
# Install the extensions
# Chamfer Distance & emd
cd ./extensions/chamfer_dist
python setup.py install --user
cd ../emd
python setup.py install --user
# PointNet++
pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
We use ShapeNet, ScanObjectNN, ModelNet40, ShapeNetPart and S3DIS in this work. See DATASET.md for details.
To pre-train PCP-MAE on the ShapeNet training set, run the following command. To try different models, masking ratios, etc., first create a new config file and pass its path via --config.
CUDA_VISIBLE_DEVICES=<GPU> python main.py --config cfgs/pretrain/base.yaml --exp_name <output_file_name>
Fine-tuning on ScanObjectNN, run:
# Select one config from finetune_scan_objbg/objonly/hardest.yaml
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_scan_hardest.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM
# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_scan_hardest.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
Fine-tuning on ModelNet40, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/finetune_modelnet.yaml \
--finetune_model --exp_name <output_file_name> --ckpts <path/to/pre-trained/model> --seed $RANDOM
# Test with fine-tuned ckpt
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model>
Voting on ModelNet40, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --test --config cfgs/finetune_modelnet.yaml \
--exp_name <output_file_name> --ckpts <path/to/best/fine-tuned/model> --seed $RANDOM --vote
Few-shot learning, run:
CUDA_VISIBLE_DEVICES=<GPUs> python main.py --config cfgs/fewshot.yaml --finetune_model \
--ckpts <path/to/pre-trained/model> --exp_name <output_file_name> --way <5 or 10> --shot <10 or 20> --fold <0-9> --seed $RANDOM
Part segmentation on ShapeNetPart, run:
cd segmentation
python main.py --gpu <gpu_id> --ckpts <path/to/pre-trained/model> \
--log_dir <log_dir> --learning_rate 0.0002 --epoch 300 \
--root <path/to/data> \
--seed $RANDOM
Semantic segmentation on S3DIS, run:
cd semantic_segmentation
python main.py --ckpts <path/to/pre-trained/model> \
--root path/to/data --learning_rate 0.0002 --epoch 60 --gpu <gpu_id> --log_dir <log_dir>
Simple visualization, run:
python main_vis.py --config cfgs/pretrain/base.yaml --exp_name final_vis \
--ckpts <path/to/pre-trained/model> --test
In addition to the simple method mentioned above for visualizing point clouds, we use the PointFlowRenderer repository to render high-quality point cloud images.
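For a quick local preview without a renderer, a plain matplotlib scatter plot is enough. The file name and the (N, 3) .npy format below are placeholder assumptions; adapt them to whatever main_vis.py saves.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older matplotlib)

# Quick preview of a point cloud stored as an (N, 3) NumPy array.
points = np.load("vis/point_cloud.npy")

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(points[:, 0], points[:, 1], points[:, 2], s=1)
ax.set_axis_off()
plt.show()
```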
If you have any questions related to the code or the paper, feel free to email Xiangdong (zhangxiangdong@sjtu.edu.cn) or Shaofeng (sherrylone@sjtu.edu.cn).
PCP-MAE is released under the MIT License. See the LICENSE file for more details. In addition, the licensing information for the pointnet2 modules is available here.
This codebase is built upon Point-MAE, ReCon, and Pointnet2_PyTorch.
If you find our work useful in your research, please consider citing:
@article{zhang2024pcp,
title={PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders},
author={Zhang, Xiangdong and Zhang, Shaofeng and Yan, Junchi},
journal={arXiv preprint arXiv:2408.08753},
year={2024}
}

