A Unified Theory of Random Projection for Influence Functions

Posted: Feb 11th 2026
Abstract

A unified theory of random projection for influence functions.

arXiv | GitHub

Brief Summary

Influence functions and related data attribution scores take the form of inverse-sensitive bilinear functionals $g^{\top}F^{-1}g'$, where $F \succeq 0$ is a curvature operator and $g, g'$ are training and test gradients. In modern overparameterized models, forming or inverting $F \in \mathbb{R}^{d \times d}$ is prohibitive, motivating scalable influence computation via random projection with a sketch $P \in \mathbb{R}^{m \times d}$. This practice is commonly justified via the Johnson-Lindenstrauss (JL) lemma, which ensures approximate preservation of Euclidean geometry for a fixed dataset. However, preserving pairwise distances does not address how sketching behaves under inversion. Furthermore, no existing theory explains how sketching interacts with other widely used techniques, such as ridge regularization (replacing $F^{-1}$ with $(F + \lambda I)^{-1}$) and structured curvature approximations.

We develop a unified theory characterizing when projection provably preserves influence functions, with a focus on the required sketch size $m$. When $g, g' \in \mathrm{range}(F)$, we show that:

  1. Unregularized projection: exact preservation holds if and only if $P$ is injective on $\mathrm{range}(F)$, which necessitates $m \geq \mathrm{rank}(F)$;
  2. Regularized projection: ridge regularization fundamentally alters the sketching barrier, with approximation guarantees governed by the effective dimension of $F$ at the regularization scale $\lambda$. This dependence is both sufficient and worst-case necessary, and can be substantially smaller than $\mathrm{rank}(F)$;
  3. Factorized influence: for Kronecker-factored curvatures $F = A \otimes E$, the guarantees continue to hold for decoupled sketches $P = P_A \otimes P_E$, even though such sketches exhibit structured row correlations that violate canonical i.i.d. assumptions; the analysis further reveals an explicit computational-statistical trade-off inherent to factorized sketches.

Beyond this range-restricted setting, we analyze out-of-range test gradients and quantify a sketch-induced leakage term that arises when test gradients have components in $\ker(F)$. This yields guarantees for influence queries on general, unseen test points.
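To see the leakage phenomenon concretely, here is a toy NumPy check (again my own illustration, not the paper's construction). The exact ridge influence is blind to any $\ker(F)$ component of the test gradient, since $(F + \lambda I)^{-1}$ preserves the range/kernel split; the sketched version generally is not, because the sketch mixes the kernel component back into the sketched range.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, lam, m = 200, 10, 1e-2, 20

# Low-rank PSD curvature with an orthonormal basis U of its range
U, _ = np.linalg.qr(rng.standard_normal((d, r)))
s = rng.uniform(1.0, 5.0, size=r)
F = U @ np.diag(s) @ U.T

g = U @ rng.standard_normal(r)     # training gradient in range(F)
v = rng.standard_normal(d)
g_ker = v - U @ (U.T @ v)          # test-gradient component in ker(F)

def ridge_inf(F, a, b, lam):
    """Ridge influence a^T (F + lam I)^{-1} b."""
    return a @ np.linalg.solve(F + lam * np.eye(F.shape[0]), b)

P = rng.standard_normal((m, d)) / np.sqrt(m)

# Exact ridge influence ignores the kernel component entirely ...
exact_leak = ridge_inf(F, g, g_ker, lam)
# ... but the sketched version picks up a nonzero leakage term
sketch_leak = ridge_inf(P @ F @ P.T, P @ g, P @ g_ker, lam)
```

Here `exact_leak` is zero up to floating-point error, while `sketch_leak` is generically nonzero, which is exactly the sketch-induced leakage the analysis quantifies.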

Overall, this work characterizes when projection provably preserves influence functions and provides principled, instance-adaptive guidance for choosing the sketch size in practice.
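The Kronecker-factored setting (point 3 above) can be spot-checked in the same way. In this hypothetical NumPy example, a decoupled sketch $P = P_A \otimes P_E$ with $m_A \geq \mathrm{rank}(A)$ and $m_E \geq \mathrm{rank}(E)$ remains injective on $\mathrm{range}(A \otimes E)$, so unregularized influence is still preserved exactly despite the sketch's correlated rows.

```python
import numpy as np

rng = np.random.default_rng(2)
dA, dE, rA, rE = 12, 9, 4, 3  # factor dimensions and ranks

def low_rank_psd(dim, rank):
    """Random PSD matrix of the given rank."""
    Q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return Q @ np.diag(rng.uniform(1.0, 3.0, size=rank)) @ Q.T

A, E = low_rank_psd(dA, rA), low_rank_psd(dE, rE)
F = np.kron(A, E)  # rank(F) = rA * rE

# Gradients in range(F)
g = F @ rng.standard_normal(dA * dE)
g_prime = F @ rng.standard_normal(dA * dE)

exact = g @ np.linalg.pinv(F) @ g_prime

# Decoupled sketch P = P_A kron P_E with m_A >= rA and m_E >= rE
mA, mE = 6, 5
PA = rng.standard_normal((mA, dA)) / np.sqrt(mA)
PE = rng.standard_normal((mE, dE)) / np.sqrt(mE)
P = np.kron(PA, PE)

sketched = (P @ g) @ np.linalg.pinv(P @ F @ P.T) @ (P @ g_prime)
```

The trade-off mentioned above shows up in the sketch budget: the factorized sketch only needs $m_A \times m_E$ total rows, but its rows are structured rather than i.i.d.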

Citation

@misc{hu2026unified,
      title={A Unified Theory of Random Projection for Influence Functions},
      author={Pingbang Hu and Yuzheng Hu and Jiaqi W. Ma and Han Zhao},
      year={2026},
      eprint={2602.10449},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2602.10449},
}
Last Updated on Feb 12th 2026