Commit 800de62

Update datasets (#19)

* Clarify contact information for dataset questions: updated contact information for dataset inquiries.
* Update datasets

1 parent: d2f4ef3

6 files changed: 303 additions & 3 deletions

Lines changed: 5 additions & 3 deletions
```diff
@@ -1,8 +1,9 @@
 ---
 title: 'Kvasir-VQA-x1'
 desc: 'A Large-Scale Multi-Task Benchmark for GI Tract Visual Question Answering'
-thumbnail: /thumbnails/kvasir.jpg
-publication:
+thumbnail: /thumbnails/kvasir-vqa-v1.png
+publication: https://doi.org/10.1007/978-3-032-08009-7_6
+github: https://github.com/simula/Kvasir-VQA-x1
 tags:
 - gastrointestinal
 - endoscopy
@@ -105,4 +106,5 @@ location = {Daejeon, Korea (Republic of)}
 
 ## Contact
 
-sushant@simula.no, michael@simula.no, vajira@simula.no, steven@simula.no or paalh@simula.no
+Please contact sushant@simula.no, michael@simula.no, vajira@simula.no, steven@simula.no or paalh@simula.no for any questions regarding the dataset.
+
```

datasets/medmultipoints.md

Lines changed: 147 additions & 0 deletions

---
title: 'MedMultiPoints'
desc: 'A Multimodal Dataset for Object Detection, Localization, and Counting in Medical Imaging'
thumbnail: /thumbnails/MedMultiPoints.png
publication: https://arxiv.org/abs/2505.16647
github: https://github.com/Simula/PointDetectCount
tags:
- medical
- multimodal
- detection
- localization
- counting
- microscopy
- endoscopy
---

The **MedMultiPoints** dataset is a curated **multimodal medical imaging benchmark** designed for **multi-task learning**, spanning **object detection**, **localization**, and **counting** tasks.
It integrates data from both **endoscopic** (HyperKvasir) and **microscopic** (VISEM-Tracking) modalities to reflect real-world clinical diversity and imaging conditions.

It is introduced in the paper:
**"Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models"**
📍 *Presented at IEEE CBMS 2025, Madrid, Spain*
[Project Page & Code](https://github.com/Simula/PointDetectCount)
[📄 Paper (arXiv)](https://arxiv.org/abs/2505.16647)

---

## 🧩 Dataset Summary

| Component | Details |
|-----------|---------|
| **Images** | 10,600 endoscopic and microscopic medical images |
| **Tasks** | Object Detection • Point Localization • Object Counting |
| **Annotations** | Bounding boxes, point coordinates, count labels, and class labels |
| **Modalities** | Endoscopy (GI) and Microscopy |
| **Format** | JSONL instruction-style annotations for VLM and multi-task pipelines |
| **Intended Use** | Multi-task, instruction-based, and multimodal medical AI research |

---

## 📚 Features

- 🩻 **Multi-type annotations** per image:
  - `bbox_2d`: Bounding boxes for detection
  - `point_2d`: Points for localization
  - `count`: Object counts
- 🔗 Designed for **Vision-Language Models (VLMs)** and **instruction-tuned frameworks**
- 🧠 Enables **cross-task supervision**, learning jointly from counting, detection, and localization

---

## 🧾 Data Schema

| Field | Type | Description |
|-------|------|-------------|
| `image` | Image | Raw medical image |
| `image_sha256` | string | SHA-256 hash for integrity |
| `img_size` | [int, int] | Original image width and height |
| `points` | list | List of `[x, y]` point annotations |
| `bbox` | list | List of `[x1, y1, x2, y2]` bounding boxes |
| `count` | int | Number of objects in the image |
| `label` | string | Object/class label (e.g., `polyps`, `sperm`) |
| `collection_method` | string | Task type (`counting`, `detection`, etc.) |
| `classification` | string | Annotation description (`pathological-findings`, etc.) |
| `organ` | string | Target organ (`Lower GI`, `Microscopy`, etc.) |
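The schema above can be exercised with a small consistency check. This is a sketch, not part of the official tooling; it assumes, as in the repo's example entry, that `count` matches the number of point annotations:

```python
# A minimal sanity check for one record, assuming the field names above.
def validate_record(rec):
    w, h = rec["img_size"]
    assert rec["count"] == len(rec["points"]), "count should match the point annotations"
    for x, y in rec["points"]:
        assert 0 <= x <= w and 0 <= y <= h, "point outside image bounds"
    for x1, y1, x2, y2 in rec["bbox"]:
        assert 0 <= x1 < x2 <= w and 0 <= y1 < y2 <= h, "malformed bounding box"
    return True

# Values taken from the dataset's published example entry
record = {
    "img_size": [622, 529],
    "points": [[234, 171.5]],
    "bbox": [[38, 5, 430, 338]],
    "count": 1,
}
validate_record(record)
```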
---

## 🎯 Supported Tasks

- 🔲 **Object Detection** — bounding-box prediction
- 📍 **Localization** — point coordinate prediction
- 🔢 **Counting** — regression on object counts
- 🧠 **Multimodal Instruction-Based Learning** — unified multi-task training

---

## 💾 Download

### Hugging Face Dataset

**Dataset:**
[https://huggingface.co/datasets/SimulaMet/MedMultiPoints](https://huggingface.co/datasets/SimulaMet/MedMultiPoints)

```python
from datasets import load_dataset

# Load the training split and inspect one annotated sample
ds = load_dataset("SimulaMet/MedMultiPoints")["train"]
sample = ds[0]

image = sample["image"]
bbox = sample["bbox"]
points = sample["points"]
count = sample["count"]
```
**Instruction-Fused JSONL Files**

- [`multi-task-train.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-train.jsonl)
- [`multi-task-test.jsonl`](https://huggingface.co/datasets/SimulaMet/MedMultiPoints/resolve/main/instruction_dataset/multi-task-test.jsonl)

---

## 🧠 Example Entry

```json
{
  "image_sha256": "71179abc4b011cc99bddb3344e3e114765b32bdf77e78892f046026d785a4bdb",
  "img_size": [622, 529],
  "points": [[234, 171.5]],
  "bbox": [[38, 5, 430, 338]],
  "count": 1,
  "label": "polyps",
  "collection_method": "counting",
  "classification": "pathological-findings",
  "organ": "Lower GI"
}
```
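The `image_sha256` field supports integrity checks after download. A minimal sketch using Python's `hashlib`; the placeholder bytes below stand in for the raw encoded bytes of `sample["image"]`, and the expected value would in practice come from the record itself:

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex digest in the same format as the dataset's `image_sha256` field."""
    return hashlib.sha256(data).hexdigest()

# Placeholder bytes; in practice `raw` is the image file's bytes and
# `expected` is the record's `image_sha256` value.
raw = b"placeholder-image-bytes"
expected = sha256_hex(raw)
assert sha256_hex(raw) == expected   # integrity check passes
assert len(expected) == 64           # hex-encoded SHA-256, as in the entry above
```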
---

## 📜 Terms of Use

Released under **CC BY-NC 4.0** — for research and educational use.

---

## 🧾 Citation

If you use this dataset, please cite:
```bibtex
@inproceedings{Gautam,
  author    = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
  title     = {Point, Detect, Count: Multi-Task Medical Image Understanding with Instruction-Tuned Vision-Language Models},
  booktitle = {2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS)},
  publisher = {IEEE},
  pages     = {18--20},
  doi       = {10.1109/CBMS65348.2025.00090}
}
```
---

## 📫 Contact

For questions, please reach out to:
📧 [sushant@simula.no](mailto:sushant@simula.no)

datasets/soccerchat.md

Lines changed: 151 additions & 0 deletions

---
title: 'SoccerChat'
desc: 'A Multimodal Video-Text Dataset for Natural Language Soccer Game Understanding'
thumbnail: /thumbnails/SoccerChat.png
publication: https://arxiv.org/abs/2505.16630
github: https://github.com/simula/SoccerChat
tags:
- soccer
- video
- multimodal
- text
- event-detection
- reasoning
- synthetic
---

**SoccerChat** is a multimodal dataset for **video–language understanding** in the context of **soccer match analysis**.
It enables training and evaluation of large vision–language models (VLMs) for **event detection**, **temporal reasoning**, and **natural language generation** over real-world broadcast video clips.

Introduced in the paper:
📄 **"SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding"**
📰 *arXiv preprint, May 2025*
[Paper (arXiv:2505.16630)](https://arxiv.org/abs/2505.16630)
[GitHub Project Page](https://github.com/simula/SoccerChat)
[Trained Model (Qwen2-VL-7B)](https://huggingface.co/SimulaMet/SoccerChat-qwen2-vl-7b)
[Web Demo (Colab)](https://colab.research.google.com/github/Simula/SoccerChat/blob/main/notebooks/WebUI.ipynb)

---

## ⚽ Dataset Summary

| Component | Details |
|-----------|---------|
| **Total Examples** | 89,845 (train: 85,220 / validation: 4,625) |
| **Modality** | Video + Text |
| **Tasks** | Event Detection • Video Question Answering • Text Generation |
| **Languages** | English |
| **Video Format** | Short broadcast snippets (~5–15 seconds) |
| **Total Size** | ~48 GB (videos) |
| **Annotation Fields** | Video clip, natural language query, model response, and event tags |
| **License** | Research use (CC BY-NC 4.0) |

Each example includes:
- 🎞️ `video` — soccer match video snippet
- 💬 `query` — natural language question or prompt
- 🧠 `response` — generated or annotated answer
- 🏷️ `events` — list of SoccerNet event tags (e.g., `Goal`, `Card`, `Foul`)
- 📂 `path` — relative file path within `/videos/`
---

## 📁 Dataset Structure

| Split | Examples | Size |
|-------|----------|------|
| **train** | 85,220 | 36.7 MB (metadata only) |
| **validation** | 4,625 | 1.47 MB (metadata only) |

Videos must be downloaded separately (see below).

---

## 💾 Download Instructions

Clone from Hugging Face using Git LFS:

```bash
git lfs install
git clone https://huggingface.co/datasets/SimulaMet/SoccerChat
```

> 📦 Videos are stored under `SoccerChat/videos/` (~48 GB total)
---

## 🧮 Data Fields

| Field | Type | Description |
|-------|------|-------------|
| `video` | Video | Video snippet of a soccer event |
| `query` | string | Natural language question |
| `response` | string | Natural language answer |
| `events` | list[string] | Associated SoccerNet event types |
| `path` | string | Relative path to the video file |
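The `events` list also makes it easy to index examples by event type. A small sketch over hypothetical in-memory records (real rows come from `load_dataset("SimulaMet/SoccerChat")`):

```python
from collections import defaultdict

# Hypothetical stand-ins for dataset rows, using the fields above
examples = [
    {"query": "What happened here?", "response": "A yellow card is shown.",
     "events": ["Card"], "path": "videos/clip_a.mp4"},
    {"query": "Describe the play.", "response": "A goal is scored.",
     "events": ["Goal"], "path": "videos/clip_b.mp4"},
]

# Map each SoccerNet event tag to the indices of matching examples
by_event = defaultdict(list)
for i, ex in enumerate(examples):
    for tag in ex["events"]:
        by_event[tag].append(i)

assert by_event["Goal"] == [1]
assert sorted(by_event) == ["Card", "Goal"]
```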
---

## 🔄 Convert to JSONL (for MS-Swift or other VLMs)

```python
import os

from datasets import load_dataset

base = "/content/SoccerChat/videos"
ds = load_dataset("SimulaMet/SoccerChat")

for split, out_file in [("train", "SoccerChat_train.jsonl"), ("validation", "SoccerChat_valid.jsonl")]:
    df = ds[split].to_pandas()
    # Prefix each query with the <video> placeholder expected by MS-Swift
    df["query"] = "<video>" + df["query"]
    # Resolve each relative clip path against the local videos directory
    df["videos"] = df["path"].apply(lambda p: [os.path.join(base, os.path.basename(p))])
    df[["query", "response", "videos"]].to_json(out_file, orient="records", lines=True)
```
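Independent of the full download, each emitted JSONL line is one object with `query` (carrying the `<video>` placeholder), `response`, and a `videos` list of absolute clip paths. A sketch of that record shape, with a hypothetical clip name:

```python
import json
import os

base = "/content/SoccerChat/videos"  # assumed clone location, as above
path = "videos/clip_0001.mp4"        # hypothetical clip name

record = {
    "query": "<video>" + "What event occurs in this clip?",
    "response": "A goal is scored.",
    "videos": [os.path.join(base, os.path.basename(path))],
}
line = json.dumps(record)  # one such object per line of the .jsonl file

parsed = json.loads(line)
assert parsed["query"].startswith("<video>")
assert parsed["videos"] == ["/content/SoccerChat/videos/clip_0001.mp4"]
```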
---

## 🧠 Training & Evaluation (Example with MS-Swift)

### 🏋️ Training Example (Qwen2-VL-7B)

```bash
NFRAMES=24 MAX_PIXELS=100352 NPROC_PER_NODE=4 swift sft \
  --model_type qwen2-vl-7b-instruct \
  --model_id_or_path qwen/Qwen2-VL-7B-Instruct \
  --sft_type lora \
  --dataset SoccerChat_train.jsonl \
  --num_train_epochs 5 \
  --batch_size 14 \
  --deepspeed default-zero2 \
  --eval_steps 100 \
  --dataset_test_ratio 0.05
```

### 📊 Evaluation

```bash
NFRAMES=24 MAX_PIXELS=100352 swift infer \
  --ckpt_dir checkpoint-dir \
  --load_dataset_config true \
  --merge_lora true \
  --val_dataset SoccerChat_valid.jsonl
```
---

## 📜 Terms of Use

Released under **CC BY-NC 4.0** — for non-commercial research and educational purposes only.

---

## 🧾 Citation

If you use this dataset, please cite:
```bibtex
@article{Gautam2025May,
  author  = {Gautam, Sushant and Midoglu, Cise and Thambawita, Vajira and Riegler, Michael A. and Halvorsen, P{\aa}l and Shah, Mubarak},
  title   = {{SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding}},
  journal = {arXiv},
  year    = {2025},
  month   = may,
  eprint  = {2505.16630},
  doi     = {10.48550/arXiv.2505.16630}
}
```
---

## 📬 Contact

For any queries or collaborations, please contact:
📧 [sushant@simula.no](mailto:sushant@simula.no)
🌐 [sushant.info.np](https://sushant.info.np)
public/thumbnails/SoccerChat.png (binary image, 207 KB)