<!DOCTYPE html>
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta charset="utf-8">
<title>Fanyi Xiao</title>
<link href="./templates/css" rel="stylesheet" type="text/css">
<link rel="stylesheet" href="./templates/normalize.css">
<link rel="stylesheet" href="./templates/skeleton.css">
<link rel="stylesheet" href="./templates/fyx.css">
<style>
#me { border : 0 solid black; border-radius : 10px; }
</style>
</head>
<body>
<div class="container">
<div class="row">
<div class="column" style="margin-top: 10%">
<table>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./me3.jpg' width=150px id = 'me'></div>
</td>
<td width="75%">
<h1 style="margin-bottom:0">Fanyi Xiao</h1>
<h6>Email: fyxiao at ucdavis dot edu</h6>
<!-- <h5>PhD student <br>computer vision and AI</h5> -->
</td>
</tr>
</table>
<p>
<!-- I am an Applied Scientist working on computer vision and machine learning at <a href="https://www.amazon.science/computer-vision">Amazon AI</a>. Previously, I finished my PhD at the University of California Davis, advised by Prof. Yong Jae Lee. Before that, I completed my master's degree in Robotics at Carnegie Mellon University. -->
I am a Research Scientist working on computer vision at <a href="https://ai.facebook.com">Meta AI</a>. Previously, I worked at Amazon AI on the AWS Rekognition team. Before that, I finished my PhD at the University of California, Davis, advised by Prof. Yong Jae Lee.
<br><br>
During my PhD, I was very fortunate to spend time at Disney Research working with Prof. Leonid Sigal, at NVIDIA Research with Dr. Xiaodong Yang and Dr. Ming-Yu Liu, and at Facebook AI Research (FAIR) with Dr. Christoph Feichtenhofer, Prof. Kristen Grauman, and Prof. Jitendra Malik.
<br><br>
I'm mostly interested in multimodal learning with minimal human supervision, as well as video understanding. <a href="https://drive.google.com/open?id=15fho-xyIZzAHtvjM_VF_AC_rN43gwmLS">Here</a> is a recent talk I gave.
<!-- <br><br>
<font color="red">We have internship openings to work on a broad range of topics including visual language pretraining for object detection, low-shot and efficient detection, etc. Drop me an email if you're interested!</font> -->
<!-- At a high level, my research interests can be categorized into three aspects: <b>1)</b> Learning to understand videos. This includes developing algorithms for recognition, detection and segmentation in videos. <b>2)</b> Learning video representations with minimal human supervision (i.e., weakly- and self-supervised learning). <b>3)</b> Learning across modalities (e.g., image, video, language and audio). -->
<!-- <br><br> -->
<!-- <font color="red">I'm on job market seeking full-time position for computer vision and deep learning. Please do not hesitate to drop me an email me if you're interested in my research.</font> -->
<!-- I have spent a great summer working with <a href="https://www.cs.ubc.ca/~lsigal/">Dr. Leonid Sigal</a> at Disney Research in 2016. This summer I am continuing my adventure at NVIDIA Research. -->
<br><br>
<!-- <a href="mailto:fanyix.cs@gmail.com">Email</a> / <a href="./cv.pdf">CV</a> / <a href="https://scholar.google.com/citations?user=cuqP0dYAAAAJ&hl=en">Scholar</a> / <a href="https://github.com/fanyix">Github</a> -->
<a href="./cv.pdf">CV</a> / <a href="https://scholar.google.com/citations?user=cuqP0dYAAAAJ&hl=en">Scholar</a> / <a href="https://github.com/fanyix">Github</a>
</p>
<h3>News</h3>
3/22 -- We are releasing the <a href="https://ai.facebook.com/blog/advancing-first-person-perception-with-2022-ego4d-challenge#egoobjects">EgoObjects dataset</a> -- the first large-scale dataset focused on object detection in egocentric video. Check it out!
<br>
3/22 -- Our paper on hierarchical pretraining for movie understanding is accepted to CVPR 2022!
<br>
12/21 -- I have recently joined Meta AI as a Research Scientist focusing on object and scene understanding for Augmented Reality.
<br>
09/20 -- Our paper on adaptive anti-aliasing won <strong>best paper award</strong> at BMVC 2020!
<br>
<!-- 08/20 -- I have recently graduated and joined the Amazon AI team!
<br> -->
<!-- 11/19 -- YOLACT won <strong>Most Innovative Award</strong> at COCO Object Detection Challenge, ICCV 2019.
<br> -->
<!-- 07/19 -- Two papers (one oral one poster) get accepted at ICCV 19.
<br>
05/19 -- I will be joining Facebook AI Research for a summer internship!
<br>
12/18 -- Code for our video object detection work is now available on Github. Give it a try!
<br>
07/18 -- Our work on video object detection is accepted at ECCV 18, see you in Munich!
<br>
06/18 -- I am awarded a <strong>Best Graduate Researcher Award</strong> by CS Dept. of UC Davis.
<br> -->
<p></p>
<h3>Papers</h3>
<table width="100%" align="center" border="0" cellspacing="0" cellpadding="20">
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/hssl.png' width=260px height=200px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/2204.03101">
<papertitle>Hierarchical Self-supervised Representation Learning for Movie Understanding</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Kaustav Kundu, Joseph Tighe, Davide Modolo<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2022 <br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./project/modist/files/modist.gif' width=260px height=200px></div>
</td>
<td valign="top" width="75%">
<p><a href="./project/modist/modist.html">
<papertitle>MaCLR: Motion-aware Contrastive Learning of Representations for Videos</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Joseph Tighe, Davide Modolo<br>
<font color="red">Surprising effectiveness of a simple motion prior for video SSL</font><br>
<em>European Conference on Computer Vision (ECCV)</em>, 2022 <br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/yolactedge.jpeg' width=250px height=200px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/2012.12259">
<papertitle>YolactEdge: Real-time Instance Segmentation on the Edge</papertitle></a><br>
Haotian Liu*, Rafael A. Rivera-Soto*, <strong>Fanyi Xiao</strong>, Yong Jae Lee<br>
<em>IEEE International Conference on Robotics and Automation (ICRA)</em>, 2021 <br>
[<a href="https://arxiv.org/abs/2012.12259">arXiv</a>] [<a href="https://github.com/haotian-liu/yolact_edge">Code</a>] [<a href="https://www.youtube.com/watch?v=-JgTd-lrsqs">Talk</a>] [<a href="https://www.youtube.com/watch?v=GBCK9SrcCLM">Demo</a>] [<a href="https://colab.research.google.com/drive/1nEZAYnGbF7VetqltAlUTyAGTI71MvPPF?usp=sharing">Colab Notebook</a>]
<br>
<font color="red">Run instance segmentation on your Jetson device</font>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/filters.gif' width=250px height=120px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/2008.09604">
<papertitle>Delving Deeper into Anti-aliasing in ConvNets</papertitle></a><br>
Xueyan Zou, <strong>Fanyi Xiao</strong>, Zhiding Yu, Yong Jae Lee<br>
<em>British Machine Vision Conference (BMVC)</em>, 2020 <br>
[<a href="https://maureenzou.github.io/ddac/">Project</a>] [<a href="https://github.com/MaureenZOU/Adaptive-anti-Aliasing">Code</a>] [<a href="https://www.youtube.com/watch?v=R8eSs6Cljvc">Talk</a>]
<br>
<!-- <b>oral presentation</b> -->
<a href="./pics/bmvc_award.jpg"><font color="red">Best Paper Award</font></a>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/avslowfast.png' width=250px height=200px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/2001.08740">
<papertitle>Audiovisual SlowFast Networks for Video Recognition</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Yong Jae Lee, Kristen Grauman, Jitendra Malik, Christoph Feichtenhofer<br>
<!-- <em>preprint</em>, 2019 <br> -->
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/yolact++.png' width=250px height=150px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/1912.06218">
<papertitle>YOLACT++: Better Real-time Instance Segmentation</papertitle></a><br>
Daniel Bolya*, Chong Zhou*, <strong>Fanyi Xiao</strong>, Yong Jae Lee<br>
<em>IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI)</em><br>
<!-- * equal contribution <br> -->
<a href="https://github.com/dbolya/yolact"><font color="red">YOLACT++ (v1.2) code released</font></a>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/disentangle.png' width=250px height=150px></div>
</td>
<td valign="top" width="75%">
<p><a href="./papers/disentangle.pdf">
<papertitle>Identity from here, Pose from there: Self-supervised Disentanglement and Generation of Objects using Unlabeled Videos</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Haotian Liu, Yong Jae Lee<br>
<em>International Conference on Computer Vision (ICCV)</em>, 2019 <br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/yolact.png' width=250px height=150px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/1904.02689">
<papertitle>YOLACT: Real-time Instance Segmentation</papertitle></a><br>
Daniel Bolya, Chong Zhou, <strong>Fanyi Xiao</strong>, Yong Jae Lee<br>
<em>International Conference on Computer Vision (ICCV)</em>, 2019 <br>
<b>oral presentation</b> <br>
<a href="https://github.com/dbolya/yolact"><font color="red">code available!</font></a>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/step.png' width=250px height=150px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/1904.09288">
<papertitle>STEP: Spatio-Temporal Progressive Learning for Video Action Detection</papertitle></a><br>
Xitong Yang, Xiaodong Yang, Ming-Yu Liu, <strong>Fanyi Xiao</strong>, Larry Davis, Jan Kautz<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2019 <br>
<b>oral presentation</b>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/stmn.png' width=250px height=80px></div>
</td>
<td valign="top" width="75%">
<p><a href="./project/stmn/project.html">
<papertitle>Video Object Detection with an Aligned Spatial-Temporal Memory</papertitle></a><br>
<strong>Fanyi Xiao</strong> and Yong Jae Lee<br>
<em>European Conference on Computer Vision (ECCV)</em>, 2018<br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/socialtree.png' width=250px height=140px></div>
</td>
<td valign="top" width="75%">
<p><a href="https://arxiv.org/abs/1705.09275">
<papertitle>Who Will Share My Image? Predicting the Content Diffusion Path in Online Social Networks</papertitle></a><br>
Wenjian Hu, Krishna Kumar Singh*, <strong>Fanyi Xiao</strong>*, Jinyoung Han, Chen-Nee Chuah and Yong Jae Lee<br>
<em>ACM International Conference on Web Search and Data Mining (WSDM)</em>, 2018<br>
* equal contribution
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/weakalign.png' width=250px height=110px></div>
</td>
<td valign="top" width="75%">
<p><a href="./project/weakalign/project.html">
<papertitle>Weakly-supervised Visual Grounding of Phrases with Linguistic Structures</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Leonid Sigal and Yong Jae Lee<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2017
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/videoseg.png' width=250px height=135px></div>
</td>
<td valign="top" width="75%">
<p><a href="./project/videoseg/project.html">
<papertitle>Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals</papertitle></a><br>
<strong>Fanyi Xiao</strong> and Yong Jae Lee<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2016 <br>
<b>spotlight presentation</b>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/transfer2.jpg' width=250px height=135px></div>
</td>
<td valign="top" width="75%">
<p><a href="http://krsingh.cs.ucdavis.edu/krishna_files/papers/track_transfer/track_transfer.html">
<papertitle>Track and Transfer: Watching Videos to Simulate Strong Human Supervision for Weakly-Supervised Object Detection</papertitle></a><br>
Krishna Singh, <strong>Fanyi Xiao</strong> and Yong Jae Lee<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2016 <br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/discovery.png' width=250px></div>
</td>
<td valign="top" width="75%">
<p><a href="./project/discovery/project.html">
<papertitle>Discovering the Spatial Extent of Relative Attributes</papertitle></a><br>
<strong>Fanyi Xiao</strong> and Yong Jae Lee<br>
<em>International Conference on Computer Vision (ICCV)</em>, 2015 <br>
<b>oral presentation</b>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/bilinear3.png' width=250px></div>
</td>
<td valign="top" width="75%">
<p><a href="./papers/bilinear.pdf">
<papertitle>Efficient Model Evaluation with Bilinear Separation Model</papertitle></a><br>
<strong>Fanyi Xiao</strong> and Martial Hebert<br>
<em>Winter Conference on Applications of Computer Vision (WACV)</em>, 2015 <br>
</td>
</tr>
<!-- <tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/recommendation.png' width=250px></div>
</td>
<td valign="top" width="75%">
<p><a href="./papers/recommendation.pdf">
<papertitle>Runtime Model Recommendation for Exemplar-based Object Detection</papertitle></a><br>
<strong>Fanyi Xiao</strong>, Martial Hebert, Yaser Sheikh, Yair Movshovitz-Attias, Mei Chen and Denver Dash<br>
<em>Tech Report, Carnegie Mellon University</em>, 2014 <br>
</td>
</tr> -->
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/transitive.png' width=250px></div>
</td>
<td valign="top" width="75%">
<p><a href="./papers/transitive.pdf">
<papertitle>Transitive Distance Clustering with K-Means Duality</papertitle></a><br>
Zhiding Yu, Chunjing Xu, Deyu Meng, Zhuo Hui, <strong>Fanyi Xiao</strong>, Wenbo Liu, Jianzhuang Liu<br>
<em>Computer Vision and Pattern Recognition (CVPR)</em>, 2014 <br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./papers/marvin.png' width=250px></div>
</td>
<td valign="top" width="75%">
<p><a href="./papers/marvin.pdf">
<papertitle>Physical Querying with Multi-modal Sensing</papertitle></a><br>
Iljoo Baek, Taylor Stine, Denver Dash, <strong>Fanyi Xiao</strong>, Yaser Sheikh, Yair Movshovitz-Attias, Mei Chen, Martial Hebert, and Takeo Kanade<br>
<em>Winter Conference on Applications of Computer Vision (WACV)</em>, 2014 <br>
</td>
</tr>
</table>
<h3>Industry Experience</h3>
<table width="100%" align="center" border="0" cellspacing="0" cellpadding="20">
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/fair.png' width=170px></div>
</td>
<td valign="top" width="75%">
<br>
<papertitle><strong>Facebook AI Research</strong> (Summer 2019)</papertitle><br>
Developed an audiovisual network architecture for video understanding<br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/nvidia.png' width=150px></div>
</td>
<td valign="top" width="75%">
<br>
<papertitle><strong>NVIDIA Research</strong> (Summer 2017)</papertitle><br>
Developed a novel method for action detection<br>
</td>
</tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/disney.png' width=150px></div>
</td>
<td valign="top" width="75%">
<br>
<papertitle><strong>Disney Research</strong> (Summer 2016)</papertitle><br>
Developed a novel model for free-form language grounding on images<br>
</td>
<!-- </tr>
<tr>
<td width="25%">
<div class="two" id = 'ccc_image'><img src='./pics/intel.png' width=150px></div>
</td>
<td valign="top" width="75%">
<br>
<papertitle><strong>Intel Science and Technology Center</strong> (Sept 2012 - Aug 2013)</papertitle><br>
Developed an object detection system for first-person video streams<br>
</td>
</tr> -->
</div>
</div>
</div>
</body></html>