Chao Chen

I develop robust and trustworthy learning methods for modern biomedical data and beyond. My research draws on the following domains.

Robust and Trustworthy Machine Learning: backdoor attacks, adversarial attacks, label noise, uncertainty.
Biomedical informatics: digital pathology, multi-omics data analytics, spatial and topological analysis of tissue microenvironment.
Topological data analysis: learning with topological features, topology-informed image segmentation and analysis.

Below are a few selected ongoing projects.

Robust and Trustworthy Machine Learning

Modern machine learning faces new challenges. Large language models, vision-language models, and foundation models exhibit unprecedented representational power, achieving state-of-the-art performance in prediction and generation tasks. However, their highly complex architectures and massive model sizes also introduce significant vulnerabilities.

Since 2021, I have been deeply engaged in the study of backdoor attacks (also known as Trojan attacks). Our team is particularly interested in investigating backdoor-induced structural signals, such as the geometry and topology of data and neuron connectivity. Notably, we were the first to identify and systematically analyze the abnormal concentration of attention on trigger words in backdoored BERT models [Lyu et al., 2022]. Building on this insight, we later developed novel data-efficient attack methods that exploit this phenomenon [Lyu et al., 2023].
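As a rough illustration of what "attention concentration on a trigger word" means, the sketch below measures how much attention mass the query tokens of one head place on a single token. It is a minimal toy, not the analysis pipeline from the cited papers: the function name `trigger_attention_share` and both attention matrices are made up for illustration.

```python
def trigger_attention_share(attention, trigger_index):
    """Average fraction of attention that query tokens place on one key token.

    attention: list of rows; row i holds the attention weights of query
               token i over all key tokens (each row is assumed to sum to 1).
    trigger_index: position of the suspected trigger token.
    """
    if not attention:
        raise ValueError("attention matrix is empty")
    return sum(row[trigger_index] for row in attention) / len(attention)


# Hypothetical attention weights for a 4-token sentence.
# Token 2 plays the role of the trigger word.
clean_head = [
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
    [0.25, 0.25, 0.25, 0.25],
]
backdoored_head = [
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.05, 0.80, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.10, 0.80, 0.05],
]

print("clean head:     ", trigger_attention_share(clean_head, 2))
print("backdoored head:", trigger_attention_share(backdoored_head, 2))
```

On this toy data the share rises from 0.25 in the uniform head to about 0.825 in the head that focuses on the trigger token. In practice such a statistic would be computed from real attention maps across many heads and inputs.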

Beyond the direct abnormality in attention patterns, we explored higher-order structural changes caused by backdoor attacks. We discovered that backdoored models exhibit unique topological loops in neuron connectivity, resulting from shortcuts between shallow and deep-layer neurons. This discovery led to one of the earliest backdoor detection methods [Zheng et al., 2021]. Additionally, we introduced a novel topological loss function to analyze trigger connectivity in trigger-inversion-based detection methods [Hu et al., 2022]. Our exploration of geometry and topology in robust learning also extends to adversarial robustness [Zhang et al., 2022] and label noise robustness [Wu et al., 2020].
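To make the notion of a "loop" in neuron connectivity concrete, the following sketch counts independent cycles (the first Betti number, E - V + C) in a graph obtained by thresholding pairwise neuron-connectivity scores. This is a deliberate simplification: it uses a single threshold rather than persistent homology over all thresholds, and the helper names and connectivity scores are hypothetical.

```python
def threshold_edges(weights, tau):
    """Keep the neuron pairs whose connectivity score is at least tau."""
    return [pair for pair, w in weights.items() if w >= tau]


def count_independent_loops(num_nodes, edges):
    """First Betti number of an undirected graph: E - V + C."""
    parent = list(range(num_nodes))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Ignore self-loops and duplicate edges.
    unique_edges = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    components = num_nodes
    for u, v in unique_edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            components -= 1
    return len(unique_edges) - num_nodes + components


# Hypothetical scores for 4 neurons ordered from shallow (0) to deep (3).
clean = {(0, 1): 0.90, (1, 2): 0.80, (2, 3): 0.85, (0, 3): 0.10}
# Same network with a strong shallow-to-deep shortcut between neurons 0 and 3.
backdoored = {(0, 1): 0.90, (1, 2): 0.80, (2, 3): 0.85, (0, 3): 0.95}

print(count_independent_loops(4, threshold_edges(clean, 0.5)))
print(count_independent_loops(4, threshold_edges(backdoored, 0.5)))
```

In the toy example, the shortcut edge closes a cycle through the layer-by-layer chain, so the loop count goes from 0 to 1.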

Our research continues to evolve alongside advancements in language and vision-language models. In a recent study, we found that backdoor attacks on generative models often unintentionally disrupt the underlying language structure, compromising the coherence of triggered outputs. To mitigate this issue, we proposed "regulating" the output by ensuring that vision-language model embeddings align closely with those of a vanilla large language model [Lyu et al., 2024]. While this approach restores model competence, it also limits adaptability to downstream tasks due to the imposed constraints on embeddings.

Can we do better in creating competent backdoored models? The key might lie in understanding the reverse process: how structural abnormalities are removed in a global backdoor mitigation process. Our earlier work explored backdoor mitigation through knowledge distillation [Pang et al., 2023]. We demonstrated that frameworks like knowledge distillation can "smooth out" structural abnormalities introduced by backdoors without explicitly identifying them.

Building on this, further research into the interplay between backdoor effectiveness, model generalization, and structural stability could offer deeper insights into both attack and defense mechanisms. By analyzing how backdoor mitigation methods alter model architecture and neuron connectivity, we may uncover strategies to design more adaptive and resilient backdoored models. Additionally, exploring alternative regularization techniques or fine-tuning approaches could help balance stealth, adaptability, and performance in these models.
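For readers unfamiliar with the knowledge distillation mentioned above, here is a minimal sketch of the standard temperature-softened distillation loss, in which a student is trained to match a teacher's output distribution. It shows the generic technique only and is not the specific procedure of [Pang et al., 2023]; the function names, logits, and temperature are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Hypothetical logits for one input over three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 1.0]

print(distillation_loss(teacher, teacher))  # identical outputs give zero loss
print(distillation_loss(teacher, student))  # mismatch gives a positive loss
```

Note that this loss needs only the teacher's and student's outputs on an input, not a ground-truth label.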

Selected Publications:

[Lyu et al. 2022] Weimin Lyu, Songzhu Zheng, Tengfei Ma, Chao Chen: "A Study of the Attention Abnormality in Trojaned BERTs", in The 2022 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022.
[Lyu et al. 2023] Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen: "Attention-Enhancing Backdoor Attacks Against BERT-based Models", in EMNLP Findings, 2023
[Zheng et al. 2021] Songzhu Zheng, Yikai Zhang, Hubert Wagner, Mayank Goswami, Chao Chen: "Topological Detection of Trojaned Neural Networks", in the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), 2021, (acceptance rate 26%)
[Hu et al. 2022] Xiaoling Hu, Xiao Lin, Michael Cogswell, Yi Yao, Susmit Jha, Chao Chen: "Trigger Hunting with a Topological Prior for Trojan Detection", in the International Conference on Learning Representations (ICLR), 2022
[Zhang et al. 2022] Wenjia Zhang, Yikai Zhang, Xiaoling Hu, Mayank Goswami, Chao Chen, Dimitris Metaxas: "A Manifold View of Adversarial Risk", in International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
[Wu et al. 2020] Pengxiang Wu, Songzhu Zheng, Mayank Goswami, Dimitris Metaxas, Chao Chen: "A Topological Filter for Learning with Label Noise", in the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), 2020, (acceptance rate 20.1%)
[Lyu et al. 2024] Weimin Lyu, Lu Pang, Tengfei Ma, Haibin Ling, Chao Chen: "TrojVLM: Backdoor Attack Against Vision Language Models", in European Conference on Computer Vision (ECCV), 2024
[Lyu et al. 2025] Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen: "Backdooring Vision-Language Models with Out-Of-Distribution Data", in The International Conference on Learning Representations (ICLR), 2025
[Pang et al. 2023] Lu Pang, Tao Sun, Haibin Ling, Chao Chen: "Backdoor Cleansing with Unlabeled Data", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.

Topology-Informed Imaging Informatics

Thanks to decades of technology development, we can now visualize complex biomedical structures such as neurons, vessels, trabeculae, and breast tissues in high quality. We need innovative methods to fully exploit these structures, which encode important information about underlying biological mechanisms. We propose principled approaches to seamlessly incorporate topological information, i.e., connected components, handles, loops, and branches, into different parts of a learning pipeline. Under the hood is a formulation of the topological computation as a differentiable operator, based on the theory of topological data analysis. This leads to a series of novel methods for segmentation, generation, and analysis of these topology-rich biomedical structures.

Selected Publications:

Fan Wang, Saarthak Kapse, Steven Liu, Prateek Prasanna, and Chao Chen: "TopoTxR: A Topological Biomarker for Predicting Treatment Response in Breast Cancer", in International Conference on Information Processing in Medical Imaging (IPMI), 2021
Xiaoling Hu, Yusu Wang, Li Fuxin, Dimitris Samaras, Chao Chen: "Topology-Aware Segmentation Using Discrete Morse Theory", in the Ninth International Conference on Learning Representations (ICLR), 2021, (Spotlight, acceptance rate for spotlight+oral = 5.6%)
Shahira Abousamra, Minh Hoai Nguyen, Dimitris Samaras, Chao Chen: "Localization in the Crowd with Topological Constraints", in The 35th AAAI Conference on Artificial Intelligence (AAAI), 2021, (acceptance rate 21%)
Fan Wang, Huidong Liu, Dimitris Samaras, Chao Chen: "TopoGAN: A Topology-Aware Generative Adversarial Network", in European Conference on Computer Vision (ECCV), 2020, (Oral, acceptance rate 2.1%).
Xiaoling Hu, Fuxin Li, Dimitris Samaras, Chao Chen: "Topology-Preserving Deep Image Segmentation", in the Thirty-third Conference on Neural Information Processing Systems (NeurIPS), 2019, (acceptance rate 21.2%)
Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dimitris Metaxas, Leon Axel: "Optimal Topological Cycles and Their Application in Cardiac Trabeculae Restoration", in the 25th Biennial International Conference on Information Processing in Medical Imaging (IPMI), 2017, (Oral presentation, acceptance rate 14.32%)

Graph Neural Networks

Graph neural networks (GNNs) have shown strong learning power for graph-structured data. We improve GNNs by introducing topological and geometric constructs such as persistent homology and graph Ricci curvature as high-order multi-scale information. These methods have shown state-of-the-art performance in tasks such as node classification, link prediction, and relation prediction. Most recently, we proposed a novel cycle-centric GNN, which learns useful representations of rules in knowledge graphs through the space of cycles.
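As a small illustration of how a curvature value can be attached to each edge of a graph, the sketch below computes the simplest combinatorial Forman-Ricci curvature of an unweighted graph, 4 - deg(u) - deg(v). This variant is chosen here only because it fits in a few lines; other notions of graph Ricci curvature, such as Ollivier-Ricci curvature, require solving an optimal transport problem per edge. The function name and example graph are hypothetical.

```python
from collections import defaultdict


def forman_curvature(edges):
    """Simple Forman-Ricci curvature per edge of an unweighted, undirected graph.

    For an edge (u, v) the value is 4 - deg(u) - deg(v); triangle
    contributions used by augmented variants are ignored here.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {
        (u, v): 4 - len(neighbors[u]) - len(neighbors[v])
        for u, v in edges
    }


# Hypothetical graph: node 1 is a small hub, node 4 hangs off node 3.
edges = [(0, 1), (1, 2), (1, 3), (3, 4)]

for edge, value in forman_curvature(edges).items():
    print(edge, value)
```

Edges between well-connected nodes receive more negative values. Such per-edge numbers could then serve, for example, as additional edge features or message weights in a GNN layer.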

Selected Publications:

Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, Chao Chen: "A Topological View of Rule Learning in Knowledge Graphs", arXiv, 2021
Zuoyu Yan, Tengfei Ma, Liangcai Gao, Zhi Tang, Chao Chen: "Persistence Homology for Link Prediction: An Interactive View", in International Conference on Machine Learning (ICML), 2021 (acceptance rate 21.5%)
Qi Zhao, Ze Ye, Yusu Wang, Chao Chen: "Persistence Enhanced Graph Neural Network", in International Conference on Artificial Intelligence and Statistics (AISTATS), 2020
Ze Ye, Kin Sum Liu, Tengfei Ma, Jie Gao, Chao Chen: "Curvature Graph Network", in the Eighth International Conference on Learning Representations (ICLR), 2020, (acceptance rate 26.5%).

Digital Pathology

We develop novel methods based on deep learning techniques and topological data analysis theory for cell detection, classification, and tumor microenvironment analysis.

Selected Publications:

Shahira Abousamra, David Belinsky, John Van Arnam, Felicia Allard, Eric Yee, Rajarsi Gupta, Tahsin Kurc, Dimitris Samaras, Joel Saltz, Chao Chen: "Multi-Class Cell Detection Using Spatial Context Representation", in International Conference on Computer Vision (ICCV), 2021 (Oral, acceptance rate 3%)
Danielle J Fassler, Shahira Abousamra, Rajarsi Gupta, Chao Chen, Maozheng Zhao, David Paredes, Syeda Areeha Batool, Beatrice S Knudsen, Luisa Escobar-Hoyos, Kenneth R Shroyer, Dimitris Samaras, Tahsin Kurc, Joel Saltz: "Deep learning-based image analysis methods for brightfield-acquired multiplex immunohistochemistry images", in Diagnostic Pathology, 2020
Andrew Aukerman, Mathieu Carrière, Chao Chen, Kevin Gardner, Raúl Rabadán, Rami Vanguri: "Persistent Homology Based Characterization of the Breast Cancer Immune Microenvironment: A Feasibility Study", in International Symposium on Computational Geometry (SoCG), 2020.
Shahira Abousamra, Danielle Fassler, Le Hou, Yuwei Zhang, Rajarsi Gupta, Tahsin Kurc, Luisa F. Escobar-Hoyos, Dimitris Samaras, Beatrice Knudsen, Kenneth Shroyer, Joel Saltz, Chao Chen: "Weakly-Supervised Deep Stain Decomposition for Multiplex IHC Images", in IEEE International Symposium on Biomedical Imaging (ISBI), 2020