Dataset Distillation using Random Feature Approximation

Dataset distillation compresses a large dataset into a much smaller synthetic coreset that retains performance, reducing the storage and computational burden of processing the entire original dataset. The present disclosure provides an improved algorithm that uses a random (non-deterministic) feature approximation of neural network Gaussian process (NNGP) kernels, or other trained kernels, which reduces the kernel matrix computation to O(|S|), where |S| is the size of the synthetic coreset. When combined with a modified Platt scaling loss, the disclosed algorithm can provide at least a 100-fold speedup over the Kernel-Inducing Points (KIP) algorithm and can run on a single graphics processing unit. The disclosed Random Feature Approximation Distillation (RFAD) algorithm can perform competitively in accuracy with other dataset condensation algorithms over a range of large-scale datasets, both in kernel regression and in finite-width network training. The disclosed techniques can also be effective on tasks such as model interpretability and data privacy preservation.
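The core idea of a random feature approximation can be illustrated with a minimal sketch. It is not the disclosed RFAD implementation: it assumes a single hidden-layer ReLU network with Gaussian weights (whose expected feature inner product is the first-order arc-cosine kernel), toy data, and a generic temperature-scaled cross-entropy standing in for the modified Platt scaling loss. All function names, sizes, and the temperature value are hypothetical.

```python
import numpy as np

def random_relu_features(x, weights):
    """Map inputs (n, d) to random features (n, m) with a random ReLU layer."""
    m = weights.shape[1]
    return np.maximum(x @ weights, 0.0) / np.sqrt(m)

def exact_relu_kernel(x, y):
    """Closed form of E_w[relu(w.x) relu(w.y)] for w ~ N(0, I)."""
    nx = np.linalg.norm(x, axis=1, keepdims=True)
    ny = np.linalg.norm(y, axis=1, keepdims=True)
    cos = np.clip((x @ y.T) / (nx * ny.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny.T) * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)

def kernel_regression_predict(phi_t, phi_s, y_s, reg=1e-3):
    """Kernel ridge regression from the support set S onto a target batch T."""
    k_ss = phi_s @ phi_s.T          # |S| x |S| approximate kernel
    k_ts = phi_t @ phi_s.T          # |T| x |S| approximate kernel
    alpha = np.linalg.solve(k_ss + reg * np.eye(len(phi_s)), y_s)
    return k_ts @ alpha

def platt_style_loss(preds, labels_onehot, tau):
    """Cross-entropy on temperature-scaled predictions (assumed stand-in loss)."""
    logits = preds / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -(labels_onehot * log_probs).sum(axis=1).mean()

rng = np.random.default_rng(0)
d, m = 8, 20000                      # input dimension, number of random features

# Toy synthetic support set S and target batch T, normalized to unit length.
x_s = rng.normal(size=(4, d))
x_t = rng.normal(size=(5, d))
x_s /= np.linalg.norm(x_s, axis=1, keepdims=True)
x_t /= np.linalg.norm(x_t, axis=1, keepdims=True)
y_s = np.eye(2)[[0, 1, 0, 1]]
y_t = np.eye(2)[[0, 1, 1, 0, 1]]

# Features are computed once per sample; kernel entries are then inner products.
weights = rng.normal(size=(d, m))
phi_s = random_relu_features(x_s, weights)
phi_t = random_relu_features(x_t, weights)

k_ts_approx = phi_t @ phi_s.T
max_err = np.abs(k_ts_approx - exact_relu_kernel(x_t, x_s)).max()

preds = kernel_regression_predict(phi_t, phi_s, y_s)
loss = platt_style_loss(preds, y_t, tau=0.5)
print(f"max kernel approximation error: {max_err:.4f}, loss: {loss:.4f}")
```

In this sketch each sample passes through the random network once, and every kernel entry is then an inner product of precomputed features, so no pairwise kernel function is evaluated. In a distillation loop, the loss would be differentiated with respect to the synthetic samples, which is omitted here.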

Researchers

Daniela Rus / Noel Loo / Ramin Hasani / Alexander Amini

Departments: Department of Electrical Engineering & Computer Science, Computer Science & Artificial Intelligence Lab
Technology Areas: Artificial Intelligence (AI) and Machine Learning (ML)
Impact Areas: Connected World

  • Systems and Methods for Efficient Dataset Distillation Using Non-Deterministic Feature Approximation
    United States of America | Published application
