\[[Project Page](https://x-decoder-vl.github.io/)\] \[[Paper](https://arxiv.org/pdf/2212.11270.pdf)\] \[[HuggingFace All-in-One Demo](https://huggingface.co/spaces/xdecoder/Demo)\] \[[HuggingFace Instruct Demo](https://huggingface.co/spaces/xdecoder/Instruct-X-Decoder)\] \[[Video](https://youtu.be/nZZTkYM0kd0)\]

by [Xueyan Zou*](https://maureenzou.github.io/), [Zi-Yi Dou*](https://zdou0830.github.io/), [Jianwei Yang*](https://jwyang.github.io/), [Zhe Gan](https://zhegan27.github.io/), [Linjie Li](https://scholar.google.com/citations?user=WR875gYAAAAJ&hl=en), [Chunyuan Li](https://chunyuan.li/), [Xiyang Dai](https://sites.google.com/site/xiyangdai/), [Harkirat Behl](https://harkiratbehl.github.io/), [Jianfeng Wang](https://scholar.google.com/citations?user=vJWEw_8AAAAJ&hl=en), [Lu Yuan](https://scholar.google.com/citations?user=k9TsUVsAAAAJ&hl=en), [Nanyun Peng](https://vnpeng.net/), [Lijuan Wang](https://scholar.google.com/citations?user=cDcWXuIAAAAJ&hl=zh-CN), [Yong Jae Lee^](https://pages.cs.wisc.edu/~yongjaelee/), [Jianfeng Gao^](https://www.microsoft.com/en-us/research/people/jfgao/?from=http%3A%2F%2Fresearch.microsoft.com%2Fen-us%2Fum%2Fpeople%2Fjfgao%2F) in **CVPR 2023**.

## :hot_pepper: Getting Started

We release the following contents for **both SEEM and X-Decoder**:exclamation:
- [x] Demo Code
- [x] Model Checkpoint
- [x] Comprehensive User Guide
- [x] Training Code
- [x] Evaluation Code

:point_right: **One-Line SEEM Demo with Linux:**
```sh
git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git && cd Segment-Everything-Everywhere-All-At-Once && sh assets/scripts/run_demo.sh
```
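If you prefer HTTPS over SSH for the clone (e.g., no GitHub SSH key on the machine), the one-liner above expands to the following equivalent steps; this is a sketch of the same command, not a separate entry point:

```sh
# Same demo as the one-liner above, cloned over HTTPS instead of SSH
git clone https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git
cd Segment-Everything-Everywhere-All-At-Once
sh assets/scripts/run_demo.sh   # demo launcher script referenced in this README
```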
:round_pushpin: *[New]* **Getting Started:**

* [INSTALL.md](assets/readmes/INSTALL.md) <br>
* [DATASET.md](assets/readmes/DATASET.md) <br>
* [TRAIN.md](assets/readmes/TRAIN.md) <br>
* [EVAL.md](assets/readmes/EVAL.md) <br>
* [INFERENCE.md](assets/readmes/INFERENCE.md) <br>
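As a quick orientation before diving into those readmes, a minimal first-time setup might look like the sketch below; the `pip` line and the requirements-file path are assumptions for illustration, and [INSTALL.md](assets/readmes/INSTALL.md) remains the authoritative guide:

```sh
# Hypothetical first-time setup -- see INSTALL.md for the exact, supported steps
git clone https://github.com/microsoft/X-Decoder.git
cd X-Decoder
pip install -r requirements.txt   # assumed dependency file; follow INSTALL.md if it differs
```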
:round_pushpin: *[New]* **Latest Checkpoints and Numbers:**
|                 |            |           | COCO |       |        | Ref-COCOg |        |        | VOC     |         | SBD     |         |
|-----------------|------------|-----------|------|-------|--------|-----------|--------|--------|---------|---------|---------|---------|
| Method          | Checkpoint | Backbone  | PQ ↑ | mAP ↑ | mIoU ↑ | cIoU ↑    | mIoU ↑ | AP50 ↑ | NoC85 ↓ | NoC90 ↓ | NoC85 ↓ | NoC90 ↓ |
| X-Decoder       | [ckpt](https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focalt_last.pt) | Focal-T   | 50.8 | 39.5 | 62.4 | 57.6 | 63.2 | 71.6 | -    | -    | -    | -    |
| X-Decoder-oq201 | [ckpt](https://huggingface.co/xdecoder/X-Decoder/resolve/main/xdecoder_focall_last.pt) | Focal-L   | 56.5 | 46.7 | 67.2 | 62.8 | 67.5 | 76.3 | -    | -    | -    | -    |
| SEEM_v0         | [ckpt](https://huggingface.co/xdecoder/SEEM/resolve/main/seem_focalt_v0.pt)             | Focal-T   | 50.6 | 39.4 | 60.9 | 58.5 | 63.5 | 71.6 | 3.54 | 4.59 | *    | *    |
| SEEM_v0         | -                                                                                        | Davit-d3  | 56.2 | 46.8 | 65.3 | 63.2 | 68.3 | 76.6 | 2.99 | 3.89 | 5.93 | 9.23 |
| SEEM_v0         | [ckpt](https://huggingface.co/xdecoder/SEEM/resolve/main/seem_focall_v0.pt)             | Focal-L   | 56.2 | 46.4 | 65.5 | 62.8 | 67.7 | 76.2 | 3.04 | 3.85 | *    | *    |
| SEEM_v1         | [ckpt](https://huggingface.co/xdecoder/SEEM/resolve/main/seem_focalt_v1.pt)             | Focal-T   | 50.8 | 39.4 | 60.7 | 58.5 | 63.7 | 72.0 | 3.19 | 4.13 | *    | *    |
| SEEM_v1         | [ckpt](https://huggingface.co/xdecoder/SEEM/resolve/main/seem_samvitb_v1.pt)            | SAM-ViT-B | 52.0 | 43.5 | 60.2 | 54.1 | 62.2 | 69.3 | 2.53 | 3.23 | *    | *    |
| SEEM_v1         | [ckpt](https://huggingface.co/xdecoder/SEEM/resolve/main/seem_samvitl_v1.pt)            | SAM-ViT-L | 49.0 | 41.6 | 58.2 | 53.8 | 62.2 | 69.5 | 2.40 | 2.96 | *    | *    |

**SEEM_v0:** supports training and inference with a single interactive object <br>
**SEEM_v1:** supports training and inference with multiple interactive objects
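Each `ckpt` link above resolves to a direct Hugging Face download, so a checkpoint can be fetched from the command line; for example, the SEEM_v1 Focal-T weights (any other linked checkpoint works the same way):

```sh
# Download the SEEM_v1 (Focal-T) checkpoint from the table above into the current directory
wget https://huggingface.co/xdecoder/SEEM/resolve/main/seem_focalt_v1.pt
```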
## :fire: News
* **[2023.10.04]** We are excited to release :white_check_mark: [training/evaluation/demo code](#hot_pepper-getting-started), :white_check_mark: [new checkpoints](#hot_pepper-getting-started), and :white_check_mark: [comprehensive readmes](#hot_pepper-getting-started) for ***both X-Decoder and SEEM***!
* **[2023.09.24]** We provide a new demo command/code for inference ([DEMO.md](asset/DEMO.md))!
* **[2023.07.19]** :roller_coaster: We are excited to release the X-Decoder training code ([INSTALL.md](asset/INSTALL.md), [DATASET.md](asset/DATASET.md), [TRAIN.md](asset/TRAIN.md), [EVALUATION.md](asset/EVALUATION.md))!
* **[2023.07.10]** We release [Semantic-SAM](https://github.com/UX-Decoder/Semantic-SAM), a universal image segmentation model that can segment and recognize anything at any desired granularity. Code and checkpoints are available!