SELECTED PUBLICATIONS

Full paper list can be found at DBLP and Google Scholar.

JOURNAL PAPERS

Video Pivoting Unsupervised Multi-modal Machine Translation

We propose a novel method to leverage visual contents to synthesize additional pseudo-pivoting for unsupervised multimodal machine translation.

Mingjie Li, Poyao Huang, Xiaojun Chang*, Junjie Hu, Yi Yang and Alex Hauptmann

IEEE Trans. Pattern Anal. Mach. Intell. 45(3):3918-3932 (2023)

pdf

When Object Detection Meets Knowledge Distillation:A Survey

This paper provides a comprehensive survey of recent research on using knowledge distillation (KD) techniques to improve the efficiency and accuracy of object detection models.

Zhihui Li, Pengfei Xu, Xiaojun Chang, Luyao Yang, Yuanyuan Zhang, Lina Yao and Xiaojiang Chen

IEEE Trans. Pattern Anal. Mach. Intell. 45(8):10555-10579 (2023)

pdf

Attribute-guided Collaborative Learning for Partial Person Re-identification

We present a novel attribute-guided collaborative learning scheme for partial person ReID. It jointly integrates noisy keypoint restraint, structured multi-modal representation aggregation, and robust pedestrian representation learning into a unified framework.

Haoyu Zhang, Meng Liu, Yuhong Li, Ming Yan, Zan Gao, Xiaojun Chang, Liqiang Nie

IEEE Trans. Pattern Anal. Mach. Intell. 45(12):14144-14160 (2023)

pdf

Simple Primitives with Feasibility- and Contextuality-Dependence for Open-World Compositional Zero-shot Learning

we model the dependence of compositions via feasibility and contextuality

Zhe Liu, Yun Li, Lina Yao, Xiaojun Chang, Wei Fang, Xiaojun Wu, Abdulmotaleb El Saddik

IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), 2023

pdf

DNA Family:Boosting Weight-Sharing NAS with Block-Wise Supervisions

We use a generalization boundedness tool to attribute weight-sharing NAS’s ineffectiveness to unreliable architecture ratings caused by large search space.

Guangrun Wang, Changlin Li, Liuchun Yuan, Jiefeng Peng, Xiaoyu Xian, Xiaodan Liang, Xiaojun Chang and Liang Lin

IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), 2023

pdf

TN-ZSTAD:Transferable Network for Zero-Shot Temporal Activity Detection

We propose a novel framework TN-ZSTAD for zero-shot temporal activity detection, which combines a graph activity transformer (AGT) and a zeroshot detection subnet (ZSDN) together to infer a set of unseen activity instances for a input video directly.

Lingling Zhang, Xiaojun Chang*, Jun Liu, Minnan Luo, Zhihui Li, Lina Yao, and Alex Hauptmann

IEEE Trans. Pattern Anal. Mach. Intell. 45(3):3848-3861 (2023)

pdf

A Comprehensive Survey of Scene Graphs:Generation and Application

This survey conducts a comprehensive investigation of the current scene graph research

Xiaojun Chang, Pengzhen Ren, Pengfei Xu, Zhihui Li, Xiaojiang Chen and Alex Hauptmann

IEEE Trans. Pattern Anal. Mach. Intell. 45(1):1-26 (2023)

pdf

ZeroNAS:Differentiable Generative Adversarial Networks Search for Zero-Shot Learning

Considering the varieties in datasets and tasks, we make the first attempt to bring NAS techniques into the realm of ZSL, and thus propose ZeroNAS to formulate the GAN architecture design for ZSL as a NAS problem.

Caixia Yan, Xiaojun Chang*, Zhihui Li, Weili Guan, Zongyuan Ge, Lei Zhu and Qinghua Zheng

IEEE Trans. Pattern Anal. Mach. Intell. 44(12):9733-9740 (2022)

pdf | code

DS-Net++:Dynamic Weight Slicing for Efficient Inference in CNNs and Vision Transformers

propose dynamic weight slicing scheme, achieving good hardware-efficiency by predictively slicing network parameters at test time with respect to different inputs.

Changlin Li, Guangrun Wang, Xiaodan Liang, Zhihui Li, and Xiaojun Chang*

IEEE Trans. Pattern Anal. Mach. Intell. 45(4):4430-4446 (2023)

pdf

Semantics-Guided Contrastive Network for Zero-Shot Object Detection

We develop a novel semantics-guided contrastive net- work for ZSD, underpinned by a new mapping- contrastive strategy superior to the conventional mapping-transfer strategy. To the best of our knowl- edge, this is the first work that applies contrastive learning mechanism for ZSD.

Caixia Yan, Xiaojun Chang*, Minnan Luo, Huan Liu, Xiaoqin Zhang*, and Qinghua Zheng

IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), DOI:10.1109/TPAMI.2021.3140070, 2021

pdf

One-Shot Neural Architecture Search:Maximising Diversity to Overcome Catastrophic Forgetting

To improve transferability, we further devised a variant of NSAS, called NSAS-C, which searches for deeper architectures in the convolutional cell search.

Miao Zhang, Huiqi Li, Shirui Pan, Xiaojun Chang, Chuan Zhou, Zongyuan Ge, and Steven Su

IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), 2020

pdf | code

Semantic Pooling for Complex Event Analysis in Untrimmed Videos

design the informed nearly-isotonic SVM classifier (NI-SVM) that is able to exploit the carefully constructed ordering information

Xiaojun Chang, Yaoliang Yu, Yi Yang, Eric P. Xing

IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI) 39(8) 1617-1632 (2017)

pdf | code

A Comprehensive Survey of Neural Architecture Search:Challenges and Solutions

we provide a new perspective, beginning with an overview of the characteristics of the earliest NAS algorithms, summarizing the problems in these early NAS algorithms, and then providing solutions for subsequent related research work

Pengzhen Ren, Yun Xiao, Xiaojun Chang, Po-Yao Huang, Zhihui Li, Xiaojiang Chen and Xin Wang

Accepted by ACM Computing Surveys, 2021

pdf

Multi-Class Active Learning by Uncertainty Sampling with Diversity Maximization

propose a semi-supervised batch mode multi-class active learning algorithm for visual concept recognition

Yi Yang, Zhigang Ma, Feiping Nie, Xiaojun Chang, Alexander G. Hauptmann

International Journal of Computer Vision (IJCV) 113(2) 113-127 (2015)

pdf

Semantics Preserving Graph Propagation for Zero-Shot Object Detection

propose a semantics preserving graph propagation model based on GCN for the challenging ZSD task, which leverages both the semantic description and structural knowledge in prior category graphs to facilitate the semantic coherency of region graph.

Caixia Yan, Qinghua Zheng, Xiaojun Chang, Minnan Luo, ChungHsing Yeh and Alexander G. Hauptmann

IEEE Transactions on Image Processing 29:8163-8176 (2020)

pdf

Self-supervised Deep Correlation Tracking

formulate a multi-cycle consistency loss based selfsupervised learning manner to pre-training the deep feature extraction network, which can take advantage of extensive unlabeled video samples rather than limited manually annotated samples

Di Yuan, Xiaojun Chang, Po-Yao Huang, Qiao Liu, and Zhenyu He

IEEE Transactions on Image Processing 30:976-985 (2021)

pdf

CONFERENCE PAPERS

Cross-modal Clinical Graph Transformer For Ophthalmic Report Generation

we present an effective cross-modal clinical graph transformer for ophthalmic report generation

M. Li, W. Cai, K. Verspoor, S. Pan, X. Liang and X. Chang*

CVPR 2022

pdf

Beyond Fixation:Dynamic Window Visual Transformer

propose a novel plug-and-play module with a dynamic multi-scale window for multi-head selfattention in transformer

P. Ren, C. Li, G. Wang, Y. Xiao, Q. Du, X. Liang, and X. Chang

CVPR 2022

pdf

Self-Supervised Global-Local Structure Modeling for Point Cloud Domain Adaptation with Reliable Voted Pseudo Labels

propose a global scaling-up-down prediction method and a local 3D-2D-3D projectionreconstruction method for point cloud domain adaptation

H. Fan, X. Chang, W. Zhang, Y. Cheng, Y. Sun, and M. Kankanhalli

CVPR 2022

pdf

Automated Progressive Learning for Efficient Training of Vision Transformers

develop a strong manual baseline for progressive learning of ViTs, by introducing MoGrow, a momentum growth strategy to bridge the gap brought by model growing

C. Li, B. Zhuang, G. Wang, X. Liang, X. Chang, and Y. Yang

CVPR 2022

pdf

BaLeNAS:Differentiable Architecture Search via Bayesian Learning Rule

this paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions

M. Zhang, S. Pan, X. Chang, S. Su, J. Hu, R. Haffari, and B. Yang

CVPR 2022

pdf

Dual-AI:Dual-path Actor Interaction Learning for Group Activity Recognition

propose a distinct Dual-path Actor Interaction (DualAI) framework, which flexibly arranges spatial and temporal transformers in two complementary orders, enhancing actor relations by integrating merits from different spatiotemporal paths

M. Han, D. J. Zhang, Y. Wang, R. Yan, L. Yao, X. Chang*, and Y. Qiao

CVPR 2022

pdf

Knowledge Distillation via the Target-aware Transformer

propose a novel one-to-all spatial matching knowledge distillation approach

S. Lin, H. Xie, B. Wang, K, Yu, X. Chang, X. Liang, and G. Wang

CVPR 2022

pdf

An Efficient Spatio-Temporal Pyramid Transformer for Action Detection

propose an efficient and effective Spatio-Temporal Pyramid Transformer (STPT) for action detection, which reduces the huge computational cost and redundancy while capturing long-range dependency in spatio-temporal representation learning

Y. Weng, Z. Pan, M. Han, X. Chang and B. Zhuang

ECCV 2022

pdf

FFA-IR:Towards an Explainable and Reliable Medical Report Generation Benchmark

present a new benchmark, FFA-IR, towards an explainable and reliable MRG benchmark based on FFA Images and Reports

M. Li, W. Cai, R. Liu, Y. Weng, X. Zhao, C. Wang, X. Chen, Z. Liu, C. Pan, M. Li, Y. Zheng, Y. Liu, F. D. Salim, K. Verspoor, X. Liang, X. Chang*

NeurIPS 2021

openreview

Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation

introduce the inter-channel correlation, with the characteristics of being invariant to the spatial dimen- sion, to explore and measure both the feature diversity and homology to help the student for better represen- tation learning

L. Liu, Q. Huang, S. Lin, H. Xie, B. Wang, X. Chang* and X. Liang

ICCV 2021

arxiv | pdf | poster

Vision-Language Navigation with Random Environmental Mixup

propose the Random Environmen- tal Mixup (REM) method, which generates cross-connected house scenes as augmented data via mixuping environment

C. Liu, F. Zhu, X. Chang, X. Liang, Z. Ge and Y. Shen

ICCV 2021

arxiv | pdf

BossNAS:Exploring Hybrid CNN-transformers with Block-wisely Self-supervised Neural Architecture Search

present Block-wisely Self-supervised Neural Architecture Search (BossNAS), an unsupervised NAS method that addresses the problem of in- accurate architecture rating caused by large weight-sharing space and biased supervision in previous methods

C. Li, T. Tang, G. Wang, J. Peng, B. Wang, X. Liang and X. Chang

ICCV 2021

arxiv | pdf | CODE

Dynamic Slimmable Network

propose a new dynamic network routing regime, achieving good hardware-efficiency by predictively adjusting filter numbers of networks at test time with respect to different inputs.

C. Li, G. Wang, B. Wang, X. Liang, Z. Li, and X. Chang

CVPR 2021

paper | ORAL | CODE

SOON:Scenario Oriented Object Navigation with Graph-based Exploration

propose a task named Scenario Oriented Object Navigation (SOON), in which an agent is instructed to find an object in a house from an arbitrary starting position

F. Zhu, X. Liang, Y. Zhu, X. Chang, Q. Yu, and X. Liang

CVPR 2021

paper

UPDeT:Universal Multi-agent RL via Policy Decoupling with Transformers

propose a universal policy decoupling transformer model that extends MARL to a much broader scenario

S. Hu, F. Zhu, X. Chang and X. Liang

ICLR 2021

paper

iDARTS:Differentiable Architecture Search with Stochastic Implicit Gradients

This paper deepens our understanding of the hypergradient calculation in the differentiable NAS.

M. Zhang, S. Su, S. Pan, X. Chang, E. Abbasnejad and R. Haffari

ICML 2021

paper

Mining Inter-Video Proposal Relations for Video Object Detection

design a novel Inter-Video Proposal Relation method, which can effectively leverage inter-video proposal relation to learn discriminative representations for video object detection

M. Han, Y. Wang, X. Chang and Y. Qiao

ECCV 2020

paper | code

Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting

investigate how to utilize visual content for disambiguation and latent space alignment in unsupervised MMT

P. Huang, J. Hu, X. Chang and A. Hauptmann

ACL 2020

paper

Vision-language Navigation with Self-Supervised Auxiliary Reasoning Tasks

Introducing Auxiliary Reasoning Navigation (AuxRN), a framework with four self-supervised auxiliary reasoning tasks to take advantage of the additional training signals derived from the semantic information

F. Zhu, Y. Zhu, X. Chang, X. Liang

CVPR 2020

paper | DEMO | CODE | Oral

ZSTAD:Zero-Shot Temporal Activity Detection

Proposing a novel problem setting for temporal activity detection in which activities that are not seen during the training stage can be recognized and localized simultaneously

L. Zhang, X. Chang, J. Liu, S. Wang, Z. Ge, M. Luo, A. Hauptmann

CVPR 2020

paper

Unity Style Transfer for Person Re-Identification

smooth the style disparities within the same camera and across different cameras

C. Liu, X. Chang, Y. Shen

CVPR 2020

paper

Neural Architecture Search by Block-wisely Distilling Architecture Knowledge

modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained

C. Li, J. Peng, L. Yuan, G. Wang, X. Liang, L. Lin, X. Chang

CVPR 2020

paper | code

Vision Dialogue Navigation by Exploring Cross-modal Memory

learning an agent endowed with the capability of constant conversation for help with natural language and navigating according to human responses
propose a Cross-modal Memory Network (CMN) for remembering and understanding the rich information relevant to historical navigation actions

Y. Zhu, F. Zhu, Z. Zhan, B. Lin, J. Jiao, X. Chang, X. Liang

CVPR 2020

paper | code

Overcoming Multi-Model Forgetting in One-Shot NAS with Diversity Maximization

formulate the supernet training in the One-Shot NAS as a constrained optimization problem of continual learning that the learning of current architecture should not degrade the performance of previous architectures during the supernet training

M. Zhang, H. Li, S. Pan, X. Chang, S. Su

CVPR 2020

paper | code

Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement

enhance the intelligent exploration of differentiable Neural Architecture Search in the latent space

M. Zhang, H. Li, S. Pan, X. Chang, Z. Ge and S. Su

NeurIPS 2020

paper | code

Hierarchical Neural Architecture Search for Deep Stereo Matching

leverage the volumetric stereo matching pipeline and allow the network to automat70 ically select the optimal structures for both the Feature Net and the Matching Net

X. Cheng, Y. Zhong, M. Harandi, Y. Dai, X. Chang, H. Li, T. Drummond, and Z. Ge

NeurIPS 2020

paper | code

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

leveraging visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations

P. Huang, X. Chang, A. G. Hauptmann

EMNLP 2019

paper

Annotation Efficient Cross-Modal Retrieval with Adversarial Attentive Alignment

propose a novel framework to leverage automatically extracted regional semantics from un-annotated images as additional weak supervision to learn visual-semantic embeddings

P. Huang, G. Kang, W. Liu, X. Chang, A. G. Hauptmann

ACM MM 2019

paper

RCAA:Relational Context-Aware Agents for Person Search

made the earliest attempt to address the person search problem and built the first deep reinforcement learning based person search framework

X. Chang, P. Huang, Y. Shen, X. Liang, Y. Yang and A. G. Hauptmann

ECCV 2018

paper

Reinforcement Cutting-Agent Learning for Video Object Segmentation

make a pioneer effort to formulate the video object segmentation problem as a Markov Decision Process and propose a novel reinforcement cutting-agent learning framework to tackle this problem

J. Han, L. Yang, D. Zhang, X. Chang, X. Liang

CVPR 2018

paper

Complex Event Detection by Identifying Reliable Shots from Untrimmed Videos

simultaneously learns a linear SVM classifier and infers a binary indicator for each instance in order to select reliable training instances from each positive or negative bag

H. Fan, X. Chang, D. Cheng, Y. Yang, D. Xu, A. G. Hauptmann

ICCV 2017

paper

They are Not Equally Reliable:Semantic Event Search Using Differentiated Concept Classifiers

combine the concept classifiers based on a principled estimate of their accuracy on the unlabeled test videos

X. Chang, Y. Yu, Y. Yang, E. P. Xing

CVPR 2016

paper

Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM

define a novel notion of semantic saliency that assesses the relevance of each shot with the event of interest

X. Chang, Y. Yang, E. P. Xing, Y. Yu

ICML 2015

paper | code

Searching Persuasively:Joint Event Detection and Evidence Recounting with Limited Supervision

propose a joint framework that simultaneously detects high-level events and localizes the indicative concepts of the events

X. Chang, Y. Yu, Y. Yang, A. G. Hauptmann

ACM MM 2015

paper

PREPRINTS

Vision Language Navigation with Multi-granularity Observation and Auxiliary Reasoning Tasks

we propose Multi-granularity Auxiliary Reason- ing Navigation (MG-AuxRN) to facilitate navigation learning. MG-AuxRN perceives multi-granularity input which combining dense object features and global image features.

Fengda Zhu, Yi Zhu, Yanxin Long, Xiaojun Chang, and Xiaodan Liang

Submitted to IEEE Trans. Pattern Anal. Mach. Intell. (T-PAMI), 2021

pdf | code