-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathindex.html
270 lines (243 loc) · 10.1 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Contrastive Mean-Shift Learning for Generalized Category Discovery</title>
<!-- Bootstrap -->
<link href="css/bootstrap-4.4.1.css" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
<!-- <link href='http://fonts.googleapis.com/css?family=Open+Sans:400italic,700italic,800italic,400,700,800' rel='stylesheet' type='text/css'>
<link rel="stylesheet" type="text/css" href="css/project.css" media="screen" />
<link rel="stylesheet" type="text/css" media="screen" href="css/iconize.css" />
<script src="js/google-code-prettify/prettify.js"></script> -->
<script>
MathJax = {
tex: {
inlineMath: [['$', '$'], ['\\(', '\\)']]
},
svg: {
fontCache: 'global'
}
};
</script>
<script type="text/javascript" id="MathJax-script" async
src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-svg.js">
</script>
</head>
<!-- cover -->
<section>
<div class="jumbotron text-center mt-0">
<div class="container">
<div class="row">
<div class="col-12">
<h2>Contrastive Mean-Shift Learning for Generalized Category Discovery</h2>
<h4 style="color:#5a6268;">CVPR 2024 </h4>
<hr>
<h6>
<a href="http://sua-choi.github.io" target="_blank">Sua Choi</a>,
<a href="http://dahyun-kang.github.io" target="_blank">Dahyun Kang</a>,
<a href="https://cvlab.postech.ac.kr/~mcho/" target="_blank">Minsu Cho</a>
</h6>
<p>
Pohang University of Science and Technology (POSTECH)
</p>
<div class="row justify-content-center">
<div class="column">
<p class="mb-5"><a class="btn btn-large btn-light" href="http://arxiv.org/abs/2404.09451" role="button"
target="_blank">
<i class="fa fa-file"></i> Paper</a> </p>
</div>
<div class="column">
<p class="mb-5"><a class="btn btn-large btn-light" href="https://github.com/sua-choi/CMS"
role="button" target="_blank">
<i class="fa fa-github-alt"></i> Code </a> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<!-- abstract -->
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<br>
<h2>Abstract</h2>
<!-- <hr style="margin-top:0px"> -->
<p class="text-left">
We address the problem of generalized category discovery (GCD) that aims to partition a partially labeled collection of images;
only a small part of the collection is labeled and the total number of target classes is unknown. To address this generalized
image clustering problem, we revisit the mean-shift algorithm, i.e., a classic, powerful technique for mode seeking, and
incorporate it into a contrastive learning framework. The proposed method, dubbed Contrastive Mean-Shift (CMS) learning,
trains an image encoder to produce representations with better clustering properties by an iterative process of mean shift and
contrastive update. Experiments demonstrate that our method, both in settings with and without the total number of clusters
being known, achieves state-of-the-art performance on six public GCD benchmarks without bells and whistles.
</p>
<br>
</div>
</div>
</div>
</section>
<br>
<style>
.image {
width: 100%;
}
@media (min-width: 1200px) {
.image {
width: 80%;
}
}
</style>
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h2>Preliminary</h2>
<!-- <hr style="margin-top:0px"> -->
</div>
<br>
<div class="col-12">
<p class="text-left">
<center>
<img src="images/preliminary.png" class="image" style="margin-top:10px; margin-bottom:20px;">
</center>
<!-- <h4>Mean-Shift</h4> -->
<!-- We revisit the mean-shift and introduce contrastive mean-shift (CMS) learning for generalized category discovery. -->
Mean-shift is a classic, powerful technique for mode seeking and clustering analysis. It assigns each data point a
corresponding mode through iterative shifts by kernel-weighted aggregation of neighboring points. The set of data
points that converge to the same mode defines the basin of attraction of that mode, and this naturally relates to
clustering: the points in the same basin of attraction are associated with the same cluster.
<br>
</p>
<br>
</div>
</div>
</section>
<br>
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h2>Methods</h2>
<!-- <hr style="margin-top:0px"> -->
</div>
<br>
<div class="col-12">
<p class="text-left">
<h4>Learning framework: Contrastive Mean-Shift learning (CMS)</h4>
<center>
<img src="images/overview.png" style="width:100%; margin-top:10px; margin-bottom:20px;">
</center>
<!-- We revisit the mean-shift and introduce contrastive mean-shift (CMS) learning for generalized category discovery. -->
Given a collection of images, each initial image embedding $\boldsymbol{v}_i$ from an image encoder takes a single step of
mean shift to be $\boldsymbol{z}_i$ by aggregating its $k$ nearest neighbors with a weight kernel $\varphi(\cdot)$. The
encoder network is then updated by contrastive learning with the mean-shifted embeddings, which draws a mean-shifted embedding
of image $x_{i}$ and that of its augmented image $x_{i}^{+}$ closer and pushes those of distinct images apart from each other.
<br>
<br>
<br>
<h4>Validation: Estimating the number of clusters</h4>
<center>
<img src="images/validation.png" class="image" style="margin-top:10px; margin-bottom:20px;">
</center>
During training, we estimate the number of clusters K at the end of every epoch for a fairer and efficient validation. We apply
agglomerative clustering on the validation set to obtain clustering results for different number of clusters. Among them, the
highest clustering accuracy on the labeled images is recorded as the validation performance, and the corresponding number of
clusters is determined as the estimated number of clusters.
<br>
<br>
<br>
<h4>Inference: Iterative Mean-Shift (IMS)</h4>
<center>
<img src="images/inference.png" class="image" style="margin-top:10px; margin-bottom:20px;">
</center>
To improve the final clustering property of the embeddings, we perform multi-step mean shift on the embeddings before
agglomerative clustering. Starting from the initial embeddings from the learned encoder, we update them to $t$-step
mean-shifted embeddings until the clustering accuracy on the labeled data converges. The final cluster assignment is obtained
by performing agglomerative clustering on the multi-step mean-shifted embeddings.
<br>
</p>
<br>
</div>
</div>
</section>
<br>
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h2>Experiments</h2>
</div>
<div class="col-12">
<p class="text-left">
<h4>Evaluation on GCD</h4>
<img src="images/results.png" style="width:100%; margin-top:10px; margin-bottom:10px;">
<br>
Comparison with the state of the arts on GCD using <i>DINO-ViT-B/16</i>, evaluated with or without the ground-truth
class number K for clustering.
<img src="images/results2.png" style="width:100%; margin-top:20px; margin-bottom:10px;">
Comparison with the state of the arts on GCD using <i>CLIP-ViT-B/16</i>, evaluated with or without the ground-truth
class number K for clustering.
<br>
<br>
<h4>Ablation study</h4>
<img src="images/ablations.png" style="width:100%; margin-top:10px; margin-bottom:10px;">
Effectiveness of each component of our method. SSK denotes semi-supervised k-means clustering and IMS denotes
iterative mean-shift.
<br>
</p>
</div>
</div>
</div>
</section>
<br>
<section>
<div class="container">
<div class="row">
<div class="col-12 text-center">
<h2>Qualitative results</h2>
</div>
<div class="col-12">
<p class="text-left">
<img src="images/qualitative.png" style="width:100%; margin-top:10px; margin-bottom:10px;">
kNN retrieved images of the initial embedding $\boldsymbol{v}$ and mean-shifted embedding $\boldsymbol{z}$ on CUB-200-2011.
Green denotes the correct class and red an incorrect class.
</div>
<div class="col-12">
<p class="text-left">
</p>
</div>
</div>
</div>
</section>
<br>
<!-- citing -->
<div class="container">
<div class="row ">
<div class="col-12">
<h3>Citation</h3>
<hr style="margin-top:0px">
<pre style="background-color: #e9eeef;padding: 1.25em 1.5em">
<code>
@inproceedings{choi2024contrastive,
title={Contrastive Mean-Shift Learning for Generalized Category Discovery},
author={Choi, Sua and Kang, Dahyun and Cho, Minsu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2024}
}
</code>
</pre>
<hr>
</div>
</div>
</div>
<footer class="text-center" style="margin-bottom:10px">
Thanks to <a href="https://lioryariv.github.io/" target="_blank">Lior Yariv</a> for the website template.
</footer>
</body>
</html>