<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />
  <meta name="viewport" content="width=device-width, initial-scale=1" />
  <meta name="description" content="HDC: A Hierarchical Diffusion Model for Choreography — Project page" />
  <meta name="keywords" content="HDC, choreography generation, diffusion, dance, music" />
  <title>HDC: Hierarchical Diffusion Model for Choreography</title>

  <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600;800&display=swap" rel="stylesheet" />
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bulma@0.9.4/css/bulma.min.css" />
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.0/css/all.min.css" />

  <style>
    :root { --accent: #111827; }
    html, body { font-family: Inter, system-ui, -apple-system, Segoe UI, Roboto, Noto Sans, Helvetica, Arial, sans-serif; }
    .hero.is-primary { background: linear-gradient(135deg, #111827 0%, #374151 100%); }
    .hero .title, .hero .subtitle { color: #fff; }
    .figure-placeholder {
      border: 2px dashed #c4c4c4;
      border-radius: 12px;
      aspect-ratio: 16/9;
      width: 100%;
      display: grid;
      place-items: center;
      background: #fafafa;
      color: #6b7280;
      margin: 1rem auto;
    }
    .figure-caption { text-align: center; color: #6b7280; font-size: 0.95rem; }
    .pub-links .button { margin: 0.25rem; }
    footer a { color: #4a5568; }
    /* New Sections Layout */
    .triple-video-row { display: flex; gap: 1rem; flex-wrap: wrap; }
    .triple-video-row .video-col { flex: 1 1 300px; text-align: center; }
    .triple-video-row video { width: 100%; border-radius: 8px; background:#000; }
    .video-caption { margin-top: .5rem; font-size: 0.9rem; color:#4b5563; font-weight:500; }
    .dataset-flex { display:flex; flex-wrap:wrap; gap:1.5rem; align-items:flex-start; }
    .dataset-media { flex: 1 1 340px; }
    .dataset-media img, .dataset-media video { width:100%; border-radius:8px; background:#000; }
    .dataset-desc { flex: 1 1 380px; }
    .section-sep { height:2px; background:linear-gradient(90deg,#e5e7eb,#9ca3af,#e5e7eb); margin:3rem 0 2rem; border:none; }
    @media (max-width: 900px){
      .triple-video-row { flex-direction:column; }
      .dataset-flex { flex-direction:column; }
    }
  </style>
</head>
<body>

<nav class="navbar is-spaced" role="navigation" aria-label="main navigation">
  <div class="navbar-brand">
    <a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false" data-target="navMenu">
      <span aria-hidden="true"></span>
      <span aria-hidden="true"></span>
      <span aria-hidden="true"></span>
    </a>
  </div>
  <div id="navMenu" class="navbar-menu">
    <div class="navbar-start" style="flex-grow: 1; justify-content: center;">
      <a class="navbar-item" href="#abstract"><i class="fa-regular fa-file-lines"></i>&nbsp;Abstract</a>
      <a class="navbar-item" href="#figure"><i class="fa-regular fa-image"></i>&nbsp;Figure</a>
      <a class="navbar-item" href="#method"><i class="fa-solid fa-diagram-project"></i>&nbsp;Method</a>
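      <a class="navbar-item" href="#experiment"><i class="fa-solid fa-flask"></i>&nbsp;Experiment</a>
      <a class="navbar-item" href="#dataset"><i class="fa-regular fa-folder-open"></i>&nbsp;Dataset</a>
      <a class="navbar-item" href="#bibtex"><i class="fa-solid fa-code"></i>&nbsp;BibTeX</a>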
    </div>
  </div>
</nav>

<section class="hero is-primary">
  <div class="hero-body">
    <div class="container is-max-desktop has-text-centered">
      <h1 class="title is-1">HDC: Hierarchical Diffusion Model for Choreography</h1>
      <p class="subtitle is-4">CVPR 2026 Submission</p>
      <div class="is-size-5" style="margin-top: 1rem;"><strong>Jungsu Kim</strong></div>
      <div class="pub-links" style="margin-top: 1.25rem;">
        <!-- <a class="button is-dark is-rounded" href="#"><i class="fas fa-file-pdf"></i>&nbsp;Paper</a>
        <a class="button is-dark is-rounded" href="#"><i class="ai ai-arxiv"></i>&nbsp;arXiv</a> -->
        <a class="button is-dark is-rounded" href="#"><i class="fab fa-github"></i>&nbsp;Code</a>
        <!-- <a class="button is-dark is-rounded" href="#"><i class="fa-regular fa-folder-open"></i>&nbsp;Data</a> -->
      </div>
    </div>
  </div>
</section>

<section class="section" id="abstract">
  <div class="container is-max-desktop">
    <h2 class="title is-3"><i class="fa-regular fa-file-lines"></i> Abstract</h2>
    <div class="content">
      <p>Choreography is not a mere sequence of movements improvised to music. Rather, it is a systematic creative process grounded in the structural interpretation of music: choreographers design movements at the phrase level, tune them to musical beats and energy, and connect phrases seamlessly. In this paper, we propose HDC, a Hierarchical Diffusion Model for Choreography, a framework inspired by the real-world workflow of choreography creation. Given in-the-wild music as input, HDC (1) divides the music into phrases and generates a beat-aware dance for each phrase, (2) enhances each phrase-level dance using waveform-based features, and (3) concatenates all phrases into the full choreography, which it then enhances with deep features extracted from Jukebox to resolve inter-phrase discontinuities and strengthen plausibility. Building on this hierarchical generation framework, we further introduce semantic text stylization. A semantic prompt such as “Like Michael Jackson” is interpreted by a large language model and mapped to controllable dance features that modulate the strength of each enhancement, enabling intuitive, high-level style conditioning. To evaluate whether the generated choreography adheres to real-world choreographic primitives, we propose three new metrics: Phrase Diversity, Feature Alignment, and Text Stylization Score. Experimental results demonstrate that HDC outperforms state-of-the-art models in choreography generation and exhibits strong text stylization capability.</p>
    </div>
  </div>
</section>

<section class="section" id="figure">
  <div class="container is-max-desktop">
    <figure class="image" style="margin:0 auto;">
      <img src="image/cvpr_fig1_revise.png"
           alt="HDC main figure"
           style="width:100%; height:auto; border-radius:8px;">
    </figure>
    <p class="figure-caption" style="margin-top:0.75rem;">Figure 1. Overview of HDC: phrase-level beat-aware generation, waveform-aware enhancement, and global plausibility enhancement.</p>
  </div>
</section>
<section class="section" id="method">
  <div class="container is-max-desktop">
    <h2 class="title is-3"><i class="fa-solid fa-diagram-project"></i> Method</h2>
    <div class="content">
      <p>Inspired by choreographic creation theory, HDC interprets music and progressively completes the choreography through a hierarchical pipeline comprising four stages:</p>
      <ol>
        <li><strong>Phrase-level Beat-aware Generation:</strong> The input music is segmented into phrases—the minimal choreographic units—and an initial beat-aware dance is generated for each phrase.</li>
        <li><strong>Waveform-aware Enhancement:</strong> Each beat-aware dance is refined by incorporating waveform information that reflects changes in musical energy.</li>
        <li><strong>Global Plausibility Enhancement:</strong> All phrase-level dances are concatenated, and inter-phrase discontinuities are resolved to improve global naturalness and detail.</li>
        <li><strong>Choreography Stylization with Semantic Text:</strong> Text prompts such as “Like Michael Jackson” are interpreted to modulate enhancement strengths, achieving high-level style conditioning.</li>
      </ol>
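      <p>The four stages can be read as a simple control flow. The Python sketch below is illustrative only and is not the authors' implementation: the phrase dictionaries, the callables <code>generate_beat_dance</code> and <code>enhance</code>, and the <code>strengths</code> keys are hypothetical placeholders. Stage 4 enters only through <code>strengths</code>, which, under this assumption, a language model would set from the text prompt.</p>
      <pre><code class="language-python">
```python
from typing import Callable, Dict, List, Sequence

Motion = List[List[float]]  # hypothetical format: one pose vector per frame


def hierarchical_choreography(
    phrases: Sequence[dict],
    generate_beat_dance: Callable[[dict], Motion],
    enhance: Callable[[Motion, object, float], Motion],
    strengths: Dict[str, float],
    global_features: object,
) -> Motion:
    """Control-flow sketch of the HDC stages; every callable is a placeholder."""
    phrase_dances = []
    for phrase in phrases:
        # Stage 1: beat-aware generation for one phrase
        dance = generate_beat_dance(phrase)
        # Stage 2: refine the phrase with its waveform (energy) features
        dance = enhance(dance, phrase["waveform"], strengths["waveform"])
        phrase_dances.append(dance)
    # Stage 3: concatenate all phrases in time, then run one global pass
    full_dance = [frame for dance in phrase_dances for frame in dance]
    return enhance(full_dance, global_features, strengths["plausibility"])
```
      </code></pre>
      <p>In this sketch every enhancement shares one interface (reference dance, conditioning features, strength), reflecting the idea that the stages differ mainly in which signal they inject and how strongly.</p>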
      <p>Here, enhancement means generating a target choreography that combines a reference dance with a target intention, i.e., a new conditioning signal. This is done by the <strong>DanceEdit</strong> module, which modulates the diffusion timesteps without any additional training, so that the intentions carried by the reference and by the condition are fused in a single output.</p>
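      <p>The page does not specify how DanceEdit modulates timesteps. One well-known training-free mechanism with the same goal is SDEdit-style editing: partially noise the reference to an intermediate timestep, then denoise it under the new condition, so that a larger strength moves further from the reference. The Python sketch below assumes that mechanism, a cosine noise schedule, and deterministic DDIM updates; <code>predict_x0</code> stands in for a hypothetical conditional motion diffusion model, and none of it should be read as the actual DanceEdit algorithm.</p>
      <pre><code class="language-python">
```python
import math
import random


def alpha_bar(t, T, s=0.008):
    """Cumulative signal level of a cosine noise schedule (assumed, not from HDC)."""
    def f(u):
        return math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def start_timestep(strength, T):
    """Map an enhancement strength in [0, 1] to the timestep where editing begins."""
    strength = min(max(strength, 0.0), 1.0)
    return round(strength * T)


def edit_from_reference(reference, cond, strength, predict_x0, T=1000, seed=0):
    """SDEdit-style editing (assumed stand-in for DanceEdit): partially noise the
    reference motion, then denoise it with a conditional model so the result keeps
    the reference's coarse structure while following the new condition."""
    t0 = start_timestep(strength, T)
    if t0 == 0:
        return list(reference)  # zero strength: keep the reference unchanged
    rng = random.Random(seed)
    ab = alpha_bar(t0, T)
    # Forward process q(x_t0 | x_0) applied to the reference motion
    x = [math.sqrt(ab) * r + math.sqrt(1 - ab) * rng.gauss(0.0, 1.0) for r in reference]
    for t in range(t0, 0, -1):
        ab_t, ab_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        x0_hat = predict_x0(x, t, cond)  # conditional model's clean-motion estimate
        eps = [(xi - math.sqrt(ab_t) * x0i) / math.sqrt(1 - ab_t) for xi, x0i in zip(x, x0_hat)]
        # Deterministic DDIM update (eta = 0) from step t to step t - 1
        x = [math.sqrt(ab_prev) * x0i + math.sqrt(1 - ab_prev) * e for x0i, e in zip(x0_hat, eps)]
    return x
```
      </code></pre>
      <p>Under this assumption, the text-derived strengths of stage 4 would simply choose the starting timestep of each enhancement: a strength of 0 returns the reference unchanged, while a strength of 1 starts from nearly pure noise and relies almost entirely on the condition.</p>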
    </div>
  </div>
</section>

<section class="section" id="experiment">
  <div class="container is-max-desktop">
    <h2 class="title is-3"><i class="fa-solid fa-flask"></i> Experiment</h2>
    <div class="content">
      <p>The following comparative visualization illustrates the hierarchical refinement stages in our choreography generation pipeline.</p>
      <div class="triple-video-row">
        <div class="video-col">
          <video src="video/fakelove_beat.mp4" autoplay muted loop playsinline></video>
          <div class="video-caption">Left: Beat-aware generation (phrase-level initialization)</div>
        </div>
        <div class="video-col">
          <video src="video/fakelove_wav.mp4" autoplay muted loop playsinline></video>
          <div class="video-caption">Center: Waveform-aware enhancement (energy-aligned refinement)</div>
        </div>
        <div class="video-col">
          <video src="video/fakelove_all.mp4" autoplay muted loop playsinline></video>
          <div class="video-caption">Right: Global plausibility enhancement (global continuity &amp; detail)</div>
        </div>
      </div>
      <p style="margin-top:1.25rem; font-size:0.95rem; color:#374151;">From left to right, the videos show the output after each successive stage: beat-aware generation, waveform-aware enhancement, and global plausibility enhancement, the last of which yields globally coherent choreography.</p>
    </div>
  </div>
</section>

<section class="section" id="dataset">
  <div class="container is-max-desktop">
    <h2 class="title is-3"><i class="fa-regular fa-folder-open"></i> Dataset</h2>
    <div class="content">
      <div class="dataset-flex">
        <div class="dataset-media">
          <img src="image/image_sample.jpg" alt="Sample dataset visualization" />
        </div>
        <div class="dataset-media">
          <video src="video/gt_goodboy.mp4" controls muted loop playsinline></video>
        </div>
        <div class="dataset-desc">
          <p><strong>Dataset Overview.</strong> Provide an overview of the curated music–dance pairs, annotation strategy (phrases, beats, waveform proxies), and style labels utilized for semantic modulation.</p>
        </div>
      </div>
    </div>
  </div>
</section>

<section class="section" id="bibtex">
  <div class="container is-max-desktop content">
    <h2 class="title is-3"><i class="fa-solid fa-code"></i> BibTeX</h2>
<!-- <pre><code>@inproceedings{kim2026hdc,
  author    = {Jungsu Kim},
  title     = {HDC: Hierarchical Diffusion Model for Choreography},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026},
}</code></pre> -->
  </div>
</section>

<footer class="footer">
  <div class="container">
    <div class="content has-text-centered">
      <!-- <p>© <span id="year"></span> Jungsu Kim. Built with Bulma.</p> -->
    </div>
  </div>
</footer>

<script>
  (function() {
    var burger = document.querySelector('.navbar-burger');
    var menu = document.getElementById('navMenu');
    if (burger && menu) {
      burger.addEventListener('click', function(){
        burger.classList.toggle('is-active');
        menu.classList.toggle('is-active');
      });
    }
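    // The footer's #year span is currently commented out, so guard against null
    var year = document.getElementById('year');
    if (year) {
      year.textContent = new Date().getFullYear();
    }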
  })();
</script>

</body>
</html>