Dataset Distillation using Random Feature Approximation
Dataset distillation compresses a large dataset into a much smaller synthetic coreset that retains most of the original's performance, reducing the storage and computational cost of processing the full dataset. The present disclosure provides an improved algorithm that uses a non-deterministic (random) feature approximation of neural network Gaussian process (NNGP) kernels, or other trained kernels, which reduces the kernel matrix computation to O(|S|), where |S| is the size of the synthetic coreset. When combined with a modified Platt scaling loss, the disclosed algorithm can provide at least a 100-fold speedup over the Kernel-Inducing Points (KIP) algorithm and can run on a single graphics processing unit. The disclosed Random Feature Approximation Distillation (RFAD) algorithm can match the accuracy of other dataset condensation algorithms across a range of large-scale datasets, in both kernel regression and finite-width network training. The disclosed techniques can also be applied to tasks such as model interpretability and data privacy preservation.
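The idea behind a random feature approximation can be illustrated with a minimal sketch. It is not the disclosed RFAD implementation. It assumes a single hidden layer of ReLU units with Gaussian weights, whose kernel has a known closed form, and the names `relu_features` and `exact_relu_kernel` are hypothetical. Each point is mapped once to a feature vector, and kernel values are then recovered as inner products of those vectors, so the expensive step scales with the number of points rather than the number of pairs.

```python
import math
import random


def dot(a, b):
    """Plain inner product of two equal-length lists."""
    return sum(u * v for u, v in zip(a, b))


def relu_features(x, weights):
    """Map x to phi(x) = relu(W x) / sqrt(M) for a fixed random matrix W."""
    scale = 1.0 / math.sqrt(len(weights))
    return [scale * max(0.0, dot(w, x)) for w in weights]


def exact_relu_kernel(x, y):
    """Closed form of E[relu(w.x) * relu(w.y)] for w ~ N(0, I)."""
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    cos_t = max(-1.0, min(1.0, dot(x, y) / (nx * ny)))
    theta = math.acos(cos_t)
    return nx * ny * (math.sin(theta) + (math.pi - theta) * cos_t) / (2 * math.pi)


rng = random.Random(0)          # fixed seed so the sketch is reproducible
dim, num_features = 4, 20000    # illustrative sizes, not taken from the source

# One shared set of random weights defines the feature map.
weights = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]

x = [1.0, 0.5, -0.3, 0.2]
y = [0.4, -0.1, 0.8, 0.6]

# Features are computed once per point; kernel entries are cheap inner products.
phi_x = relu_features(x, weights)
phi_y = relu_features(y, weights)
approx = dot(phi_x, phi_y)
exact = exact_relu_kernel(x, y)

print(f"random-feature estimate: {approx:.4f}")
print(f"closed-form kernel:      {exact:.4f}")
```

The estimate converges to the closed-form value as the number of random features grows, with error shrinking roughly as one over the square root of that number.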
Systems and Methods for Efficient Dataset Distillation Using Non-Deterministic Feature Approximation
United States of America | Published application