Commit 800de62

Update datasets (#19)

* Clarify contact information for dataset questions: updated contact information for dataset inquiries.
* Update datasets

1 parent: d2f4ef3

6 files changed: 303 additions & 3 deletions

Lines changed: 5 additions & 3 deletions
```diff
@@ -1,8 +1,9 @@
 ---
 title: 'Kvasir-VQA-x1'
 desc: 'A Large-Scale Multi-Task Benchmark for GI Tract Visual Question Answering'
-thumbnail: /thumbnails/kvasir.jpg
-publication:
+thumbnail: /thumbnails/kvasir-vqa-v1.png
+publication: https://doi.org/10.1007/978-3-032-08009-7_6
+github: https://github.com/simula/Kvasir-VQA-x1
 tags:
 - gastrointestinal
 - endoscopy
@@ -105,4 +106,5 @@ location = {Daejeon, Korea (Republic of)}
 
 ## Contact
 
-sushant@simula.no, michael@simula.no, vajira@simula.no, steven@simula.no or paalh@simula.no
+Please contact sushant@simula.no, michael@simula.no, vajira@simula.no, steven@simula.no or paalh@simula.no for any questions regarding the dataset.
+
```

datasets/medmultipoints.md

Lines changed: 147 additions & 0 deletions

---
title: 'MedMultiPoints'
desc: 'A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging'
thumbnail: /thumbnails/MedMultiPoints.png
publication: https://arxiv.org/abs/2505.16647
github: https://github.com/Simula/PointDetectCount
tags:
- medical
- multimodal
- detection
- localization
- counting
- microscopy
- endoscopy
---

The **MedMultiPoints** dataset is a curated **multimodal medical imaging benchmark** designed for **multi-task learning**, spanning **object detection**, **localization**, and **counting** tasks.
It integrates data from both **endoscopic** (HyperKvasir) and **microscopic** (VISEM-Tracking) modalities to reflect real-world clinical diversity and imaging conditions.

It is introduced in the paper:
**"Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models"**
📍 *Presented at IEEE CBMS 2025, Madrid, Spain*
[Project Page & Code](https://github.com/Simula/PointDetectCount)
[📄 Paper (arXiv)](https://arxiv.org/abs/2505.16647)

---

## 🧩 Dataset Summary

| Component | Details |
|-----------|---------|
| **Images** | 10,600 endoscopic and microscopic medical images |
| **Tasks** | Object Detection • Point Localization • Object Counting |
| **Annotations** | Bounding boxes, point coordinates, count labels, and class labels |
| **Modalities** | Endoscopy (GI) and Microscopy |
| **Format** | JSONL instruction-style annotations for VLM and multi-task pipelines |
| **Intended Use** | Multi-task, instruction-based, and multimodal medical AI research |

---

## 📚 Features

- 🩻 **Multi-type annotations** per image:
  - `bbox_2d`: Bounding boxes for detection
  - `point_2d`: Points for localization
  - `count`: Object counts
- 🔗 Designed for **Vision-Language Models (VLMs)** and **instruction-tuned frameworks**
- 🧠 Enables **cross-task supervision**, learning jointly from counting, detection, and localization

---

## 🧾 Data Schema

| Field | Type | Description |
|-------|------|-------------|
| `image` | Image | Raw medical image |
| `image_sha256` | string | SHA-256 hash for integrity |
| `img_size` | [int, int] | Original image width and height |
| `points` | list | List of `[x, y]` point annotations |
| `bbox` | list | List of `[x1, y1, x2, y2]` bounding boxes |
| `count` | int | Number of objects in the image |
| `label` | string | Object/class label (e.g., `polyps`, `sperm`) |
| `collection_method` | string | Task type (`counting`, `detection`, etc.) |
| `classification` | string | Annotation description (`pathological-findings`, etc.) |
| `organ` | string | Target organ (`Lower GI`, `Microscopy`, etc.) |
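The schema above can be exercised with a small consistency check. This is a sketch, not part of the official tooling; it assumes, as in the repo's example entry, that `count` matches the number of point annotations:

```python
# A minimal sanity check for one record, assuming the field names above.
def validate_record(rec):
    w, h = rec["img_size"]
    assert rec["count"] == len(rec["points"]), "count should match the point annotations"
    for x, y in rec["points"]:
        assert 0 <= x <= w and 0 <= y <= h, "point outside image bounds"
    for x1, y1, x2, y2 in rec["bbox"]:
        assert 0 <= x1 < x2 <= w and 0 <= y1 < y2 <= h, "malformed bounding box"
    return True

# Values taken from the dataset's published example entry
record = {
    "img_size": [622, 529],
    "points": [[234, 171.5]],
    "bbox": [[38, 5, 430, 338]],
    "count": 1,
}
validate_record(record)
```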
---

## 🎯 Supported Tasks

- 🔲 **Object Detection** — bounding-box prediction
- 📍 **Localization** — point coordinate prediction
- 🔢 **Counting** — regression on object counts
- 🧠 **Multimodal Instruction-Based Learning** — unified multi-task training

---

## 💾 Download

### Hugging Face Dataset

**Dataset:**
[https://huggingface.co/datasets/SimulaMet/MedMultiPoints](https://huggingface.co/datasets/SimulaMet/MedMultiPoints)

```python
from datasets import load_dataset

# Load the training split and inspect one annotated sample
ds = load_dataset("SimulaMet/MedMultiPoints")["train"]
sample = ds[0]

image = sample["image"]
bbox = sample["bbox"]
points = sample["points"]
count = sample["count"]
```
**Instruction-Fused JSONL Files**

- [`multi-task-train.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-train.jsonl)
- [`multi-task-test.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl)

---

## 🧠 Example Entry

```json
{
  "image_sha256": "71179abc4b011cc99bddb3344e3e114765b32bdf77e78892f046026d785a4bdb",
  "img_size": [622, 529],
  "points": [[234, 171.5]],
  "bbox": [[38, 5, 430, 338]],
  "count": 1,
  "label": "polyps",
  "collection_method": "counting",
  "classification": "pathological-findings",
  "organ": "Lower GI"
}
```
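The `image_sha256` field supports integrity checks after download. A minimal sketch using Python's `hashlib`; the placeholder bytes below stand in for the raw encoded bytes of `sample["image"]`, and the expected value would in practice come from the record itself:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest in the same format as the dataset's `image_sha256` field."""
    return hashlib.sha256(data).hexdigest()

# Placeholder bytes; in practice `raw` is the image file's bytes and
# `expected` is the record's `image_sha256` value.
raw = b"placeholder-image-bytes"
expected = sha256_hex(raw)
assert sha256_hex(raw) == expected   # integrity check passes
assert len(expected) == 64           # hex-encoded SHA-256, as in the entry above
```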
---

## 📜 Terms of Use

Released under **CC BY-NC 4.0** — for research and educational use.

---

## 🧾 Citation

If you use this dataset, please cite:
```bibtex
@inproceedings{Gautam,
  author    = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title     = {Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models},
  booktitle = {2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)},
  publisher = {IEEE},
  pages     = {18--20},
  doi       = {10.1109/CBMS65348.2025.00090}
}
```
---

## 📫 Contact

For questions, please reach out to:
📧 [sushant@simula.no](mailto:sushant@simula.no)

datasets/soccerchat.md

Lines changed: 151 additions & 0 deletions

---
title: 'SoccerChat'
desc: 'A Multimodal Video-Text Dataset for Natural Language Soccer Game Understanding'
thumbnail: /thumbnails/SoccerChat.png
publication: https://arxiv.org/abs/2505.16630
github: https://github.com/simula/SoccerChat
tags:
- soccer
- video
- multimodal
- text
- event-detection
- reasoning
- synthetic
---

**SoccerChat** is a multimodal dataset for **video–language understanding** in the context of **soccer match analysis**.
It enables training and evaluation of large vision–language models (VLMs) for **event detection**, **temporal reasoning**, and **natural language generation** over real-world broadcast video clips.

Introduced in the paper:
📄 **"SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding"**
📰 *arXiv preprint, May 2025*
[Paper (arXiv:2505.16630)](https://arxiv.org/abs/2505.16630)
[GitHub Project Page](https://github.com/simula/SoccerChat)
[Trained Model (Qwen2-VL-7B)](https://huggingface.co/SimulaMet/SoccerChat-qwen2-vl-7b)
[Web Demo (Colab)](https://colab.research.google.com/github/Simula/SoccerChat/blob/main/notebooks/WebUI.ipynb)

---

## ⚽ Dataset Summary

| Component | Details |
|-----------|---------|
| **Total Examples** | 89,845 (train: 85,220 / validation: 4,625) |
| **Modality** | Video + Text |
| **Tasks** | Event Detection • Video Question Answering • Text Generation |
| **Languages** | English |
| **Video Format** | Short broadcast snippets (~5–15 seconds) |
| **Total Size** | ~48 GB (videos) |
| **Annotation Fields** | Video clip, natural language query, model response, and event tags |
| **License** | Research use (CC BY-NC 4.0) |

Each example includes:
- 🎞️ `video` — soccer match video snippet
- 💬 `query` — natural language question or prompt
- 🧠 `response` — generated or annotated answer
- 🏷️ `events` — list of SoccerNet event tags (e.g., `Goal`, `Card`, `Foul`)
- 📂 `path` — relative file path within `/videos/`
---

## 📁 Dataset Structure

| Split | Examples | Size |
|-------|----------|------|
| **train** | 85,220 | 36.7 MB (metadata only) |
| **validation** | 4,625 | 1.47 MB (metadata only) |

Videos must be downloaded separately (see below).

---

## 💾 Download Instructions

Clone from Hugging Face using Git LFS:

```bash
git lfs install
git clone https://huggingface.co/datasets/SimulaMet/SoccerChat
```

> 📦 Videos are stored under `SoccerChat/videos/` (~48 GB total)
---

## 🧮 Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `video` | Video | Video snippet of a soccer event |
| `query` | string | Natural language question |
| `response` | string | Natural language answer |
| `events` | list[string] | Associated SoccerNet event types |
| `path` | string | Relative path to the video file |
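The `events` list also makes it easy to index examples by event type. A small sketch over hypothetical in-memory records (real rows come from `load_dataset("SimulaMet/SoccerChat")`):

```python
from collections import defaultdict

# Hypothetical stand-ins for dataset rows, using the fields above
examples = [
    {"query": "What happened here?", "response": "A yellow card is shown.",
     "events": ["Card"], "path": "videos/clip_a.mp4"},
    {"query": "Describe the play.", "response": "A goal is scored.",
     "events": ["Goal"], "path": "videos/clip_b.mp4"},
]

# Map each SoccerNet event tag to the indices of matching examples
by_event = defaultdict(list)
for i, ex in enumerate(examples):
    for tag in ex["events"]:
        by_event[tag].append(i)

assert by_event["Goal"] == [1]
assert sorted(by_event) == ["Card", "Goal"]
```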
---

## 🔄 Convert to JSONL (for MS-Swift or other VLMs)

```python
import os

from datasets import load_dataset

base = "/content/SoccerChat/videos"
ds = load_dataset("SimulaMet/SoccerChat")

for split, out_file in [("train", "SoccerChat_train.jsonl"), ("validation", "SoccerChat_valid.jsonl")]:
    df = ds[split].to_pandas()
    # Prefix each query with the <video> placeholder expected by MS-Swift
    df["query"] = "<video>" + df["query"]
    # Resolve each relative clip path against the local videos directory
    df["videos"] = df["path"].apply(lambda p: [os.path.join(base, os.path.basename(p))])
    df[["query", "response", "videos"]].to_json(out_file, orient="records", lines=True)
```
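Independent of the full download, each emitted JSONL line is one object with `query` (carrying the `<video>` placeholder), `response`, and a `videos` list of absolute clip paths. A sketch of that record shape, with a hypothetical clip name:

```python
import json
import os

base = "/content/SoccerChat/videos"  # assumed clone location, as above
path = "videos/clip_0001.mp4"        # hypothetical clip name

record = {
    "query": "<video>" + "What event occurs in this clip?",
    "response": "A goal is scored.",
    "videos": [os.path.join(base, os.path.basename(path))],
}
line = json.dumps(record)  # one such object per line of the .jsonl file

parsed = json.loads(line)
assert parsed["query"].startswith("<video>")
assert parsed["videos"] == ["/content/SoccerChat/videos/clip_0001.mp4"]
```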
---

## 🧠 Training & Evaluation (Example with MS-Swift)

### 🏋️ Training Example (Qwen2-VL-7B)

```bash
NFRAMES=24 MAX_PIXELS=100352 NPROC_PER_NODE=4 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
  --sft_type lora \
  --dataset SoccerChat_train.jsonl \
  --num_train_epochs 5 \
  --batch_size 14 \
  --deepspeed default-zero2 \
  --eval_steps 100 \
  --dataset_test_ratio 0.05
```

### 📊 Evaluation

```bash
NFRAMES=24 MAX_PIXELS=100352 swift infer \
  --ckpt_dir checkpoint-dir \
  --load_dataset_config true \
  --merge_lora true \
  --val_dataset SoccerChat_valid.jsonl
```
---

## 📜 Terms of Use

Released under **CC BY-NC 4.0** — for non-commercial research and educational purposes only.

---

## 🧾 Citation

If you use this dataset, please cite:
```bibtex
@article{Gautam2025May,
  author  = {Gautam, Sushant and Midoglu, Cise and Thambawita, Vajira and Riegler, Michael A. and Halvorsen, P{\aa}l and Shah, Mubarak},
  title   = {{SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding}},
  journal = {arXiv},
  year    = {2025},
  month   = may,
  eprint  = {2505.16630},
  doi     = {10.48550/arXiv.2505.16630}
}
```
---

## 📬 Contact

For any queries or collaborations, please contact:
📧 [sushant@simula.no](mailto:sushant@simula.no)
🌐 [sushant.info.np](https://sushant.info.np)
public/thumbnails/SoccerChat.png (binary image, 207 KB)