Ziyi 'Zoe' Wang
Hi! I’m Ziyi 👋, a Master’s student in Human-Computer Interaction at the University of Maryland, College Park.
My research interests lie in Generative AI and Human-centered AI.
Before my research journey, I spent two years interning as an AI Product Manager and UX Designer, using design and data to create user-centric solutions.
News & Updates
Jun 2025 | We have a new paper accepted to ECML PKDD 2025 on leveraging LLMs for few-shot graph OOD detection; see our Preprint!
May 2025 | We have a new paper on zero-shot graph OOD detection using foundation models (GLIP-OOD); see our Preprint!
May 2025 | We have a new paper introducing GOE-LLM, a framework using LLMs to generate synthetic OOD nodes for graph OOD detection without requiring real OOD data. See our Preprint!
Apr 2025 | We have a new paper on jailbreak detection for MLLMs—JailDAM proposes adaptive memory updates for generalizing to unseen jailbreaks. See our Preprint!
Feb 2025 | Joined the FORTIS Lab at the University of Southern California; glad to be working with Prof. Yue Zhao.
Feb 2025 | We have a new paper introducing Frontend Diffusion, a multi-stage AI system that turns sketches into website code for junior researchers and designers. See our Preprint!
Nov 2024 | Joined the Human-Computer Interaction Lab at the University of Maryland, College Park.
Sep 2024 | Started my M.S. in Human-Computer Interaction at the University of Maryland, College Park.
Publications
Please check my Google Scholar for my complete publication record.
* indicates equal contribution
† indicates corresponding author
Graph Synthetic Out-of-Distribution Exposure with Large Language Models
Haoyan Xu*, Zhengtao Yao*, Ziyi Wang, Zhan Cheng, Xiyang Hu, Mengyuan Li, Yue Zhao†
arXiv, 2025
Out-of-distribution (OOD) detection in graphs is critical for ensuring model robustness in open-world and safety-sensitive applications. Existing approaches to graph OOD detection typically involve training an in-distribution (ID) classifier using only ID data, followed by the application of post-hoc OOD scoring techniques. Although OOD exposure - introducing auxiliary OOD samples during training - has proven to be an effective strategy for enhancing detection performance, current methods in the graph domain generally assume access to a set of real OOD nodes. This assumption, however, is often impractical due to the difficulty and cost of acquiring representative OOD samples. In this paper, we introduce GOE-LLM, a novel framework that leverages Large Language Models (LLMs) for OOD exposure in graph OOD detection without requiring real OOD nodes. GOE-LLM introduces two pipelines: (1) identifying pseudo-OOD nodes from the initially unlabeled graph using zero-shot LLM annotations, and (2) generating semantically informative synthetic OOD nodes via LLM-prompted text generation. These pseudo-OOD nodes are then used to regularize the training of the ID classifier for improved OOD awareness. We evaluate our approach across multiple benchmark datasets, showing that GOE-LLM significantly outperforms state-of-the-art graph OOD detection methods that do not use OOD exposure and achieves comparable performance to those relying on real OOD data.
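The abstract leaves the training objective at a high level; one standard way to use pseudo-OOD nodes for exposure is to pair the ID cross-entropy loss with a term that pushes pseudo-OOD predictions toward low confidence. Below is a minimal PyTorch sketch under that assumption; the paper's exact regularizer may differ, and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def ood_exposure_loss(id_logits, ood_logits, labels, lam=0.5):
    """Cross-entropy on labeled ID nodes plus a uniformity regularizer
    on pseudo-OOD nodes (illustrative sketch, not the paper's exact
    objective)."""
    ce = F.cross_entropy(id_logits, labels)  # standard ID classification loss
    num_classes = ood_logits.size(-1)
    uniform = torch.full_like(ood_logits, 1.0 / num_classes)
    # Push pseudo-OOD predictions toward the uniform distribution,
    # i.e. low confidence on every ID class for exposed nodes.
    reg = F.kl_div(F.log_softmax(ood_logits, dim=-1), uniform,
                   reduction="batchmean")
    return ce + lam * reg
```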
GLIP-OOD: Zero-Shot Graph OOD Detection with Foundation Model
Haoyan Xu*, Zhengtao Yao*, Xuzhi Zhang, Ziyi Wang, Langzhou He, Yushun Dong, Philip S. Yu, Mengyuan Li, Yue Zhao†
arXiv, 2025
Out-of-distribution (OOD) detection is critical for ensuring the safety and reliability of machine learning systems, particularly in dynamic and open-world environments. In the vision and text domains, zero-shot OOD detection - which requires no training on in-distribution (ID) data - has made significant progress through the use of large-scale pretrained models such as vision-language models (VLMs) and large language models (LLMs). However, zero-shot OOD detection in graph-structured data remains largely unexplored, primarily due to the challenges posed by complex relational structures and the absence of powerful, large-scale pretrained models for graphs. In this work, we take the first step toward enabling zero-shot graph OOD detection by leveraging a graph foundation model (GFM). We show that, when provided only with class label names, the GFM can perform OOD detection without any node-level supervision - outperforming existing supervised methods across multiple datasets. To address the more practical setting where OOD label names are unavailable, we introduce GLIP-OOD, a novel framework that employs LLMs to generate semantically informative pseudo-OOD labels from unlabeled data. These labels enable the GFM to capture nuanced semantic boundaries between ID and OOD classes and perform fine-grained OOD detection - without requiring any labeled nodes. Our approach is the first to enable node-level graph OOD detection in a fully zero-shot setting, and achieves state-of-the-art performance on four benchmark text-attributed graph datasets.
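To make the label-name-only setting concrete: if a graph foundation model embeds nodes and label names in a shared space, a node can be scored by how much softmax mass falls on the LLM-generated pseudo-OOD labels. This is a toy sketch of that idea, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def zero_shot_ood_score(node_emb, id_label_embs, ood_label_embs, tau=0.07):
    """Probability mass assigned to pseudo-OOD labels under a softmax
    over label similarities; higher means more likely OOD. Assumes a
    shared graph/text embedding space (illustrative only)."""
    labels = torch.cat([id_label_embs, ood_label_embs], dim=0)
    sims = F.cosine_similarity(node_emb.unsqueeze(0), labels, dim=-1) / tau
    probs = sims.softmax(dim=-1)
    return probs[len(id_label_embs):].sum().item()  # OOD probability mass
```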
Few-Shot Graph Out-of-Distribution Detection with LLMs
Haoyan Xu*, Zhengtao Yao*, Yushun Dong, Ziyi Wang, Ryan A. Rossi, Mengyuan Li, Yue Zhao†
ECML PKDD 2025
Existing methods for graph out-of-distribution (OOD) detection typically depend on training graph neural network (GNN) classifiers using a substantial amount of labeled in-distribution (ID) data. However, acquiring high-quality labeled nodes in text-attributed graphs (TAGs) is challenging and costly due to their complex textual and structural characteristics. Large language models (LLMs), known for their powerful zero-shot capabilities in textual tasks, show promise but struggle to naturally capture the critical structural information inherent to TAGs, limiting their direct effectiveness.
To address these challenges, we propose LLM-GOOD, a general framework that effectively combines the strengths of LLMs and GNNs to enhance data efficiency in graph OOD detection. Specifically, we first leverage LLMs' strong zero-shot capabilities to filter out likely OOD nodes, significantly reducing the human annotation burden. To minimize the usage and cost of the LLM, we employ it only to annotate a small subset of unlabeled nodes. We then train a lightweight GNN filter using these noisy labels, enabling efficient predictions of ID status for all other unlabeled nodes by leveraging both textual and structural information. After obtaining node embeddings from the GNN filter, we can apply informativeness-based methods to select the most valuable nodes for precise human annotation. Finally, we train the target ID classifier using these accurately annotated ID nodes. Extensive experiments on four real-world TAG datasets demonstrate that LLM-GOOD significantly reduces human annotation costs and outperforms state-of-the-art baselines in terms of both ID classification accuracy and OOD detection performance.
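As a concrete illustration of the informativeness-based selection step, here is a minimal entropy-based version in PyTorch; the entropy criterion and all names are assumptions for illustration, not the paper's released API:

```python
import torch

def select_for_annotation(id_probs, k):
    """Pick the k nodes whose predicted class distribution (from the
    lightweight GNN filter) is most uncertain, i.e. highest entropy,
    as candidates for precise human labeling. Entropy is one common
    informativeness criterion; the paper may use a different one."""
    eps = 1e-12
    entropy = -(id_probs * (id_probs + eps).log()).sum(dim=-1)
    return torch.topk(entropy, k).indices

# hypothetical usage: probs = gnn_filter(graph).softmax(-1)
#                     idx = select_for_annotation(probs, k=50)
```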
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model
Yi Nian*, Shenzhe Zhu*, Yuehan Qin, Li Li, Ziyi Wang, Chaowei Xiao, Yue Zhao†
arXiv, 2025
Multimodal large language models (MLLMs) excel in vision-language tasks but also pose significant risks of generating harmful content, particularly through jailbreak attacks. Jailbreak attacks refer to intentional manipulations that bypass safety mechanisms in models, leading to the generation of inappropriate or unsafe content. Detecting such attacks is critical to ensuring the responsible deployment of MLLMs. Existing jailbreak detection methods face three primary challenges: (1) many rely on model hidden states or gradients, limiting their applicability to white-box models, where the internal workings of the model are accessible; (2) they involve high computational overhead from uncertainty-based analysis, which limits real-time detection; and (3) they require fully labeled harmful datasets, which are often scarce in real-world settings. To address these issues, we introduce a test-time adaptive framework called JAILDAM. Our method leverages a memory-based approach guided by policy-driven unsafe knowledge representations, eliminating the need for explicit exposure to harmful data. By dynamically updating unsafe knowledge at test time, our framework improves generalization to unseen jailbreak strategies while maintaining efficiency. Experiments on multiple VLM jailbreak benchmarks demonstrate that JAILDAM delivers state-of-the-art performance in harmful content detection, improving both accuracy and speed.
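The adaptive-memory idea can be pictured with a toy detector that keeps a bank of unsafe-concept embeddings, flags inputs near the bank, and memorizes flagged-but-novel patterns at test time. The thresholds and update rule below are assumptions for illustration, not the paper's method:

```python
import torch
import torch.nn.functional as F

class AdaptiveUnsafeMemory:
    """Toy memory-based jailbreak detector: score inputs by similarity
    to stored unsafe-concept embeddings and grow the memory at test
    time (illustrative thresholds and update rule)."""
    def __init__(self, concept_embs, flag_thresh=0.6, novelty_thresh=0.8):
        self.memory = F.normalize(concept_embs, dim=-1)
        self.flag_thresh = flag_thresh        # above this: flag as jailbreak
        self.novelty_thresh = novelty_thresh  # below this: memorize pattern

    def score_and_update(self, emb):
        emb = F.normalize(emb, dim=-1)
        sim = (self.memory @ emb).max().item()  # closest unsafe concept
        flagged = sim >= self.flag_thresh
        if flagged and sim < self.novelty_thresh:
            # Unsafe but unlike anything stored: add it so the memory
            # generalizes to this previously unseen jailbreak pattern.
            self.memory = torch.cat([self.memory, emb.unsqueeze(0)])
        return flagged, sim
```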
Frontend Diffusion: Empowering Self-Representation of Junior Researchers and Designers Through Agentic Workflows
Zijian Ding, Qinshi Zhang, Mohan Chi, Ziyi Wang
arXiv, 2025
As generative AI's logical reasoning abilities continue to develop, its growing code-generation potential poses challenges for both technical and creative professionals. But how can these advances be directed toward empowering junior researchers and designers, who often require additional help to build and express their professional and personal identities? We present Frontend Diffusion, a multi-stage agentic system that transforms user-drawn layouts and textual prompts into refined website code, thereby supporting self-representation goals. A user study with 13 junior researchers and designers positions AI as a human capability enhancer rather than a replacement and highlights the importance of bidirectional human-AI alignment. We then discuss future directions, such as leveraging AI for career development and fostering bidirectional human-AI alignment at the intent level.