Research

A Survey of Data Attribution: Methods, Applications, and Evaluation in the Era of Generative AI
Junwei Deng*, 
Yuzheng Hu*, 
Pingbang Hu*, 
Ting-Wei Li*, 
Shixuan Liu*, 
Jiachen T. Wang, 
Hao Huang, 
Dan Ley, 
Qirun Dai, 
Benhao Huang, 
Jin Huang, 
Cathy Jiao, 
Hoang Anh Just, 
Yijun Pan, 
Jingyan Shen, 
Yiwen Tu, 
Weiyi Wang, 
Xinhe Wang, 
Shichang Zhang, 
Ruoxi Jia, 
Himabindu Lakkaraju, 
Hao Peng, 
Weijing Tang, 
Chenyan Xiong, 
Jieyu Zhao, 
Hanghang Tong, 
Han Zhao, 
Jiaqi W. Ma
Sep 24th 2025
In Submission
#Data Attribution

A survey on data attribution with a focus on generative AI.

SSRN
A Reliable Cryptographic Framework for Empirical Machine Unlearning Evaluation
Yiwen Tu*, 
Pingbang Hu*, 
Jiaqi W. Ma
Sep 18th 2025
NeurIPS 2025
#Trustworthy
#Data Attribution
#Unlearning

We design the first efficient machine unlearning evaluation metric with provable guarantees.

arXiv
GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection
Pingbang Hu, 
Joseph Melkonian, 
Weijing Tang, 
Han Zhao, 
Jiaqi W. Ma
Sep 18th 2025
NeurIPS 2025
#Data Attribution
#Optimization

We propose an efficient gradient compression algorithm to accelerate and scale gradient-based data attribution methods to billion-scale models.

arXiv
Talk
Poster
Slide
GitHub
Adversarial Attack on Data Attribution
Xinhe Wang, 
Pingbang Hu, 
Junwei Deng, 
Jiaqi W. Ma
Jan 22nd 2025
ICLR 2025
#Trustworthy
#Data Attribution

We study adversarial attacks on training data attribution methods.

arXiv
Poster
dattri: A Library for Efficient Data Attribution
Junwei Deng*, 
Ting-Wei Li*, 
Shiyuan Zhang, 
Yijun Pan, 
Hao Huang, 
Xinhe Wang, 
Pingbang Hu, 
Xingjian Zhang, 
Jiaqi W. Ma
Sep 26th 2024
NeurIPS 2024 D&B (Spotlight)
#Data Attribution
#Library

We developed an efficient library for data attribution, aiming to streamline the development of data attribution algorithms.

arXiv
Poster
GitHub
Most Influential Subset Selection: Challenges, Promises, and Beyond
Yuzheng Hu, 
Pingbang Hu, 
Han Zhao, 
Jiaqi W. Ma
Sep 25th 2024
NeurIPS 2024
#Data Attribution
#Learning Theory

We provide a comprehensive study of the common practices in the Most Influential Subset Selection (MISS) problem.

arXiv
Poster
GitHub
Pseudo-Non-Linear Data Augmentation via Energy Minimization
Pingbang Hu, 
Mahito Sugiyama
Sep 7th 2024
In Submission
#Information Geometry
#Data Augmentation

We propose a new non-linear data augmentation framework powered by information geometry.

arXiv
GitHub
Travel the Same Path: A Novel TSP Solving Strategy
Pingbang Hu
Oct 12th 2022
Side Project
#Optimization

Exploring a novel approach that uses imitation learning to solve the traveling salesman problem (TSP), an NP-hard combinatorial optimization problem, exactly.

arXiv
GitHub
Last Updated on Oct 23rd 2025