---
title: 'MedMultiPoints'
desc: 'A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging'
thumbnail: /thumbnails/MedMultiPoints.png
publication: https://arxiv.org/abs/2505.16647
github: https://github.com/Simula/PointDetectCount
tags:
  - medical
  - multimodal
  - detection
  - localization
  - counting
  - microscopy
  - endoscopy
---

The **MedMultiPoints** dataset is a curated **multimodal medical imaging benchmark** designed for **multi-task learning**, spanning **object detection**, **localization**, and **counting** tasks.
It integrates data from both **endoscopic** (HyperKvasir) and **microscopic** (VISEM-Tracking) modalities to reflect real-world clinical diversity and imaging conditions.

It is introduced in the paper:
**"Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models"**
📍 *Presented at IEEE CBMS 2025, Madrid, Spain*
→ [Project Page & Code](https://github.com/Simula/PointDetectCount)
→ [📄 Paper (arXiv)](https://arxiv.org/abs/2505.16647)

---

## 🧩 Dataset Summary

| Component | Details |
|-----------|---------|
| **Images** | 10,600 endoscopic and microscopic medical images |
| **Tasks** | Object Detection • Point Localization • Object Counting |
| **Annotations** | Bounding boxes, point coordinates, count labels, and class labels |
| **Modalities** | Endoscopy (GI) and Microscopy |
| **Format** | JSONL instruction-style annotations for VLM and multi-task pipelines |
| **Intended Use** | Multi-task, instruction-based, and multimodal medical AI research |

---

## 📚 Features

- 🩻 **Multi-type annotations** per image:
  - `bbox_2d`: Bounding boxes for detection
  - `point_2d`: Points for localization
  - `count`: Object counts
- 🔗 Designed for **Vision-Language Models (VLMs)** and **instruction-tuned frameworks**
- 🧠 Enables **cross-task supervision**: learning jointly from counting, detection, and localization (see the sketch below)
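
As a rough illustration of how the three annotation types can be fused into one training target, here is a minimal sketch; the key layout is hypothetical (the released JSONL files define the actual instruction format), while the values come from the example entry later in this card:

```python
# Hypothetical multi-task target assembled from one annotated image
# (values taken from the example entry below; layout is illustrative only).
target = {
    "bbox_2d": [[38, 5, 430, 338]],  # detection: one box per object
    "point_2d": [[234, 171.5]],      # localization: one point per object
    "count": 1,                      # counting: total object count
}
```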

---

## 🧾 Data Schema

| Field | Type | Description |
|-------|------|-------------|
| `image` | Image | Raw medical image |
| `image_sha256` | string | SHA-256 hash of the image, for integrity checks |
| `img_size` | [int, int] | Original image width and height |
| `points` | list | List of `[x, y]` point annotations |
| `bbox` | list | List of `[x1, y1, x2, y2]` bounding boxes |
| `count` | int | Number of objects in the image |
| `label` | string | Object/class label (e.g., `polyps`, `sperm`) |
| `collection_method` | string | Task type (`counting`, `detection`, etc.) |
| `classification` | string | Annotation description (`pathological-findings`, etc.) |
| `organ` | string | Target organ (`Lower GI`, `Microscopy`, etc.) |
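
Given this schema, a record can be sanity-checked directly. A minimal sketch; whether every record carries all three annotation types is an assumption that should be verified against `collection_method`:

```python
# Minimal per-record sanity checks against the schema above.
# Assumption: points, boxes, and count describe the same objects, as in
# the example entry below; records may differ per collection_method.
def check_record(sample):
    w, h = sample["img_size"]
    for x, y in sample["points"]:
        assert 0 <= x <= w and 0 <= y <= h, "point outside image"
    for x1, y1, x2, y2 in sample["bbox"]:
        assert 0 <= x1 <= x2 <= w and 0 <= y1 <= y2 <= h, "invalid box"
    assert sample["count"] == len(sample["points"]) == len(sample["bbox"])
```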

---

## 🎯 Supported Tasks

- 🔲 **Object Detection**: bounding-box prediction
- 📍 **Localization**: point coordinate prediction
- 🔢 **Counting**: regression on object counts
- 🧠 **Multimodal Instruction-Based Learning**: unified multi-task training (see the sketch below)
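
One annotated record can feed all of these at once by emitting one instruction pair per task. A rough sketch; the prompt wording is hypothetical, and the released JSONL files contain the actual instructions:

```python
# Turn one annotated record into three instruction pairs for unified
# multi-task training. Prompts are illustrative placeholders.
def to_instruction_pairs(sample):
    label = sample["label"]
    return [
        (f"Detect every {label} and return bounding boxes.", sample["bbox"]),
        (f"Mark each {label} with a point.", sample["points"]),
        (f"How many {label} are visible in the image?", sample["count"]),
    ]
```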

---

## 💾 Download

### Hugging Face Dataset

**Dataset:**
[https://huggingface.co/datasets/SimulaMet/MedMultiPoints](https://huggingface.co/datasets/SimulaMet/MedMultiPoints)

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub and grab one sample.
ds = load_dataset("SimulaMet/MedMultiPoints")["train"]
sample = ds[0]

image = sample["image"]    # decoded PIL image
bbox = sample["bbox"]      # list of [x1, y1, x2, y2] boxes
points = sample["points"]  # list of [x, y] point annotations
count = sample["count"]    # integer object count
```

**Instruction-Fused JSONL Files** (a loading sketch follows the list):
- [`multi-task-train.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-train.jsonl)
- [`multi-task-test.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl)
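
These files are plain JSON Lines and can be read with the standard library. A minimal sketch that downloads the training file and inspects the record keys rather than assuming them:

```python
import json
from urllib.request import urlretrieve

# Download the instruction-formatted training file and parse it line by
# line; each non-empty line is one JSON record.
url = ("https://huggingface.co/datasets/SimulaMet/MedMultiPoints"
       "/resolve/main/instruction_dataset/multi-task-train.jsonl")
path, _ = urlretrieve(url, "multi-task-train.jsonl")

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(len(records), "training records")
print(sorted(records[0].keys()))  # inspect the actual instruction schema
```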

---

## 🧠 Example Entry

```json
{
  "image_sha256": "71179abc4b011cc99bddb3344e3e114765b32bdf77e78892f046026d785a4bdb",
  "img_size": [622, 529],
  "points": [[234, 171.5]],
  "bbox": [[38, 5, 430, 338]],
  "count": 1,
  "label": "polyps",
  "collection_method": "counting",
  "classification": "pathological-findings",
  "organ": "Lower GI"
}
```
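
To eyeball an annotation like this one, the box and point can be overlaid on the image. A minimal sketch with Pillow, reusing the Hub loading shown above:

```python
from datasets import load_dataset
from PIL import ImageDraw

# Overlay the bounding boxes and points of one sample on its image.
sample = load_dataset("SimulaMet/MedMultiPoints")["train"][0]
img = sample["image"].convert("RGB")
draw = ImageDraw.Draw(img)

for x1, y1, x2, y2 in sample["bbox"]:
    draw.rectangle([x1, y1, x2, y2], outline="red", width=3)
for x, y in sample["points"]:
    draw.ellipse([x - 4, y - 4, x + 4, y + 4], fill="yellow")

img.save("annotated_sample.png")
```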

---

## 📜 Terms of Use

Released under **CC BY-NC 4.0**, for non-commercial research and educational use.

---

## 🧾 Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{gautam2025point,
  author    = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title     = {Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models},
  booktitle = {2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)},
  publisher = {IEEE},
  year      = {2025},
  pages     = {18--20},
  doi       = {10.1109/CBMS65348.2025.00090}
}
```

---

## 📫 Contact

For questions, please reach out to:
📧 [sushant@simula.no](mailto:sushant@simula.no)