Website Access Logs — Cloud and Distributed Systems Laboratory

This repository contains web server access logs and error logs from the websites operated by the Cloud and Distributed Systems Laboratory (CDSL). Access logs cover April 2023 to February 2026; error logs are available from October 2024 onwards.

Privacy

All IP addresses in the logs have been anonymized. Each original IP address has been replaced with a unique identifier in the format IP###### (e.g., IP001234). The mapping is consistent within the dataset — the same original IP address always maps to the same anonymized identifier — so per-client request patterns remain analyzable without exposing real addresses.
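Because the mapping is consistent, anonymized identifiers can be used directly as grouping keys. The following is a minimal sketch, assuming identifiers are always `IP` followed by exactly six digits and appear as the first whitespace-separated field of each line. The function name `requests_per_client` and the first two sample lines are hypothetical and are not taken from the dataset.

```python
import re
from collections import Counter

# Anonymized client identifier as described above: "IP" followed by six digits
# (assumption: the width is always exactly six).
ANON_IP = re.compile(r"^IP\d{6}$")


def requests_per_client(lines):
    """Count log lines per anonymized client.

    Lines whose first field is not an IP###### identifier are skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields and ANON_IP.match(fields[0]):
            counts[fields[0]] += 1
    return counts


# The first two lines are made-up illustrations, not real log entries.
sample = [
    'IP001234 - - [31/Dec/2025:00:04:20 +0900] "GET / HTTP/1.1" 200 512 "-" "curl/8.0"',
    'IP001234 - - [31/Dec/2025:00:04:21 +0900] "GET /about HTTP/1.1" 200 734 "-" "curl/8.0"',
    "IP277299 [21/Dec/2025:10:30:27 +0900] TCP 502 0 0 3.012",
]
print(requests_per_client(sample).most_common())
```

Since the identifier is the first field in both the HTTP and the TCP stream formats, the same counting works for either kind of file.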

Directory Structure

Logs are organized by month, then by daily backup snapshot:

YYYYMM/
  log-backup-YYYYMMDDHHMMSS/       # Access logs
    access.<service>.log.1
    ...
  log-error-backup-YYYYMMDDHHMMSS/ # Error logs (available from October 2024)
    error.<service>.log.1
    ...
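
The layout above can be traversed with a short script. This is a sketch under the assumption that month directories are named with exactly six digits and that the 14 digits in a snapshot directory name form a valid `YYYYMMDDHHMMSS` timestamp. The helper names `parse_snapshot_name` and `iter_log_files` and the timestamp in the usage line are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path

# Snapshot directory names: log-backup-<14 digits> or log-error-backup-<14 digits>.
SNAPSHOT_DIR = re.compile(r"^log-(error-)?backup-(\d{14})$")


def parse_snapshot_name(name):
    """Return (kind, timestamp) for a snapshot directory name, or None if it does not match."""
    m = SNAPSHOT_DIR.match(name)
    if m is None:
        return None
    kind = "error" if m.group(1) else "access"
    try:
        ts = datetime.strptime(m.group(2), "%Y%m%d%H%M%S")
    except ValueError:
        # 14 digits that do not form a valid date and time
        return None
    return kind, ts


def iter_log_files(root):
    """Yield (kind, snapshot_time, path) for every file under root/YYYYMM/<snapshot>/."""
    for month_dir in sorted(Path(root).glob("[0-9]" * 6)):
        if not month_dir.is_dir():
            continue
        for snap_dir in sorted(month_dir.iterdir()):
            parsed = parse_snapshot_name(snap_dir.name)
            if parsed is None or not snap_dir.is_dir():
                continue
            kind, ts = parsed
            for path in sorted(snap_dir.iterdir()):
                if path.is_file():
                    yield kind, ts, path


# Hypothetical directory name, used only to show the return shape.
print(parse_snapshot_name("log-error-backup-20241001000000"))
```

Calling `iter_log_files(".")` from the repository root would then yield every log file together with its kind (`"access"` or `"error"`) and snapshot time.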

Log Format

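All HTTP access logs are based on the Nginx combined log format. The format string below adds a trailing "$http_x_forwarded_for" field, which the standard combined format does not include: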

$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$http_x_forwarded_for"

Example:

IP143527 - - [31/Dec/2025:00:04:20 +0900] "HEAD / HTTP/1.1" 405 0 "https://doktor.tak-cslab.org/" "Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"
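
A line in this format can be parsed with a regular expression. The sketch below treats the trailing `"$http_x_forwarded_for"` field as optional, because the format string lists it but the example line does not carry it. It also assumes quoted fields never contain a literal double quote. The names `LINE_RE` and `parse_access_line` are hypothetical.

```python
import re
from datetime import datetime

# One named group per variable in the format string above.
# Assumption: the trailing "$http_x_forwarded_for" field may be absent.
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" '
    r'"(?P<http_user_agent>[^"]*)"'
    r'(?: "(?P<http_x_forwarded_for>[^"]*)")?$'
)

# Layout of $time_local, e.g. 31/Dec/2025:00:04:20 +0900
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_access_line(line):
    """Parse one HTTP access log line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["body_bytes_sent"] = int(rec["body_bytes_sent"])
    rec["time_local"] = datetime.strptime(rec["time_local"], TIME_FORMAT)
    return rec


# The example line shown above.
example = (
    'IP143527 - - [31/Dec/2025:00:04:20 +0900] "HEAD / HTTP/1.1" 405 0 '
    '"https://doktor.tak-cslab.org/" '
    '"Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)"'
)
rec = parse_access_line(example)
print(rec["status"], rec["request"], rec["time_local"].isoformat())
# prints: 405 HEAD / HTTP/1.1 2025-12-31T00:04:20+09:00
```

When the trailing field is absent, `rec["http_x_forwarded_for"]` is `None`. Returning `None` for non-matching lines lets a caller count and inspect them instead of aborting on the first irregular entry.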

The TCP stream proxy logs (access.nisl-1-stream.log.1) use a different format produced by the Nginx stream module:

$remote_addr [$time_local] $protocol $status $bytes_sent $bytes_received $session_time

Example:

IP277299 [21/Dec/2025:10:30:27 +0900] TCP 502 0 0 3.012
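
The stream format has no quoted fields, so a simpler pattern is enough. This is a sketch that assumes every field is present and separated by single spaces, as in the example line. The names `StreamRecord` and `parse_stream_line` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class StreamRecord(NamedTuple):
    remote_addr: str
    time_local: datetime
    protocol: str
    status: int
    bytes_sent: int
    bytes_received: int
    session_time: float  # seconds, as logged by $session_time


# One named group per variable in the stream format string above.
STREAM_RE = re.compile(
    r'(?P<remote_addr>\S+) \[(?P<time_local>[^\]]+)\] '
    r'(?P<protocol>\S+) (?P<status>\d{3}) '
    r'(?P<bytes_sent>\d+) (?P<bytes_received>\d+) '
    r'(?P<session_time>\d+(?:\.\d+)?)$'
)


def parse_stream_line(line: str) -> Optional[StreamRecord]:
    """Parse one TCP stream proxy log line, or return None if it does not match."""
    m = STREAM_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return StreamRecord(
        remote_addr=m["remote_addr"],
        time_local=datetime.strptime(m["time_local"], "%d/%b/%Y:%H:%M:%S %z"),
        protocol=m["protocol"],
        status=int(m["status"]),
        bytes_sent=int(m["bytes_sent"]),
        bytes_received=int(m["bytes_received"]),
        session_time=float(m["session_time"]),
    )


# The example line shown above.
example = "IP277299 [21/Dec/2025:10:30:27 +0900] TCP 502 0 0 3.012"
rec = parse_stream_line(example)
print(rec.protocol, rec.status, rec.session_time)
# prints: TCP 502 3.012
```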

Services

Access Logs

| Filename | URL | Available From | Description |
| --- | --- | --- | --- |
| access.doktor.log.1 | https://doktor.tak-cslab.org | Apr 2023 | Academic paper search web service built on a microservice architecture. Source code: cdsl-research/doktor-v2 |
| access.wp.log.1 | https://ja.tak-cslab.org/ | Apr 2023 | Public-facing WordPress website. Contains the lab blog and research introduction pages. |
| access.rudder.log.1 | https://rudder.tak-cslab.org/ | Apr 2023 | Internal lab introduction website. Access is controlled via Google Account SSO. |
| access.nisl.tak-cslab.org.log.1 | http://nisl.tak-cslab.org | Apr 2023 | HTTP reverse proxy that fronts a Kubernetes cluster exposed for joint research purposes. |
| access.lily.log.1 | | May 2023 | Internal website operated by the lab. |
| access.clematis.tak-cslab.org.log.1 | https://clematis.tak-cslab.org | Nov 2023 | Internal website operated by the lab. |
| access.nisl-1-stream.log.1 | | Mar 2024 | TCP-level stream proxy for direct Kubernetes API access (port 6443). Uses Nginx stream module format. |
| access.forsythia.tak-cslab.org.log.1 | https://forsythia.tak-cslab.org | Apr 2024 | Internal website operated by the lab. |
| access.harvest.tak-cslab.org.log.1 | https://harvest.tak-cslab.org | Nov 2024 | Experimental website used for research purposes. |

Error Logs

Error logs are available from October 2024 onwards. They are stored in separate log-error-backup-* directories within each monthly folder.

| Filename | Description |
| --- | --- |
| error.nisl.tak-cslab.org.log.1 | Nginx error log for the NISL HTTP reverse proxy |
| error.nisl-1-stream.log.1 | Nginx error log for the NISL TCP stream proxy |
| error.clematis.tak-cslab.org.log.1 | Nginx error log for the Clematis site |
| error.forsythia.tak-cslab.org.log.1 | Nginx error log for the Forsythia site |
| error.harvest.tak-cslab.org.log.1 | Nginx error log for the Harvest site |
| error.lily.log.1 | Nginx error log for the Lily site |

Dataset Statistics

Total: 6,082 files / 5,736,834 log entries

Access Logs

| Service | Files | Entries |
| --- | ---: | ---: |
| doktor | 623 | 529,155 |
| wp | 953 | 4,786,964 |
| rudder | 478 | 478 |
| nisl (HTTP) | 780 | 30,232 |
| lily | 546 | 38,287 |
| clematis | 543 | 230,985 |
| nisl-1-stream (TCP) | 331 | 5,857 |
| forsythia | 378 | 56,270 |
| harvest | 218 | 38,715 |
| Total | 4,850 | 5,716,943 |

Error Logs

| Service | Files | Entries |
| --- | ---: | ---: |
| nisl (HTTP) | 206 | 1,273 |
| lily | 203 | 2,030 |
| clematis | 204 | 246 |
| nisl-1-stream (TCP) | 231 | 5,847 |
| forsythia | 203 | 406 |
| harvest | 185 | 10,089 |
| Total | 1,232 | 19,891 |
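
The totals can be checked by summing the per-service rows. The short script below simply re-adds the figures listed in the two statistics tables and compares them with the stated totals. It is an arithmetic check only and does not read the log files; the variable and function names are hypothetical.

```python
# (files, entries) per service, copied from the statistics tables above.
access = {
    "doktor": (623, 529_155),
    "wp": (953, 4_786_964),
    "rudder": (478, 478),
    "nisl (HTTP)": (780, 30_232),
    "lily": (546, 38_287),
    "clematis": (543, 230_985),
    "nisl-1-stream (TCP)": (331, 5_857),
    "forsythia": (378, 56_270),
    "harvest": (218, 38_715),
}
error = {
    "nisl (HTTP)": (206, 1_273),
    "lily": (203, 2_030),
    "clematis": (204, 246),
    "nisl-1-stream (TCP)": (231, 5_847),
    "forsythia": (203, 406),
    "harvest": (185, 10_089),
}


def totals(table):
    """Return (total_files, total_entries) for one table."""
    files = sum(f for f, _ in table.values())
    entries = sum(e for _, e in table.values())
    return files, entries


access_total = totals(access)
error_total = totals(error)
grand_total = (access_total[0] + error_total[0], access_total[1] + error_total[1])

print(access_total)  # (4850, 5716943)
print(error_total)   # (1232, 19891)
print(grand_total)   # (6082, 5736834)
```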

Citation

If you use this dataset in your research, please cite it as follows:

BibTeX:

@misc{cdsl-website-access-logs,
  author       = {{Cloud and Distributed Systems Laboratory}},
  title        = {Website Access Logs --- Cloud and Distributed Systems Laboratory},
  year         = {2026},
  howpublished = {\url{https://github.com/cdsl-research/website-access-logs-public}},
  note         = {Accessed: \today}
}

APA:

Cloud and Distributed Systems Laboratory. (2026). Website Access Logs — Cloud and Distributed Systems Laboratory [Dataset]. GitHub. https://github.com/cdsl-research/website-access-logs-public

Coverage by Service

| Service | Access Log Period | Error Log Period |
| --- | --- | --- |
| doktor | Apr 2023 – Feb 2026 | |
| wp (WordPress) | Apr 2023 – Feb 2026 | |
| rudder | Apr 2023 – Feb 2026 | |
| nisl (HTTP) | Apr 2023 – Feb 2026 | Oct 2024 – Feb 2026 |
| lily | May 2023 – Feb 2026 | Oct 2024 – Feb 2026 |
| clematis | Nov 2023 – Feb 2026 | Oct 2024 – Feb 2026 |
| nisl-1-stream (TCP) | Mar 2024 – Feb 2026 | Oct 2024 – Feb 2026 |
| forsythia | Apr 2024 – Feb 2026 | Oct 2024 – Feb 2026 |
| harvest | Nov 2024 – Feb 2026 | Oct 2024 – Feb 2026 |
