Faster Streaming Algorithms for Deterministic Low-Rank Matrix Approximations

Applications

The Inventors have developed low-rank matrix approximation algorithms, designed for approximating large dimensionality, streaming, potentially distributed data in the matrix format, with fixed memory footprint. This technology can be used for a variety of applications that utilize the collection and analysis of extremely large data sets; for example, it can be used for tasks in image analysis or text-processing to reduce the required time and computation cost of such problems.

Problem Addressed

Other low-rank matrix approximation algorithms that perform SVD/rank-k SVD on a nxm matrix A (n>>m) require a O(nm2) runtime and O(nm) run space to compute SVD, which can be infeasible requirements for large matrices. The algorithms the Inventors discover do not require saving all existing data in the past. Instead, given a low-rank approximation of existing data, they update approximation on new data without recursively recomputing SVD on the entire dataset, saving storage and computational cost.

Technology

This invention is an expansion of recently discovered matrix sketching technique. The first algorithm, Incremental Frequent Directions has a running time of O(nml) and is based on an algorithm called Frequent Directions II that has a running time of O(nml2). It replaces the full SVD step in Frequent Directions II with a rank-one SVD update. The second algorithm, named Truncated Incremental Frequent Directions II, is a form of incremental frequent directions, with lower running time in trade of the precision for O(nl3). The third algorithm, Truncated Incremental Frequent Directions, uses the same idea from Truncated Incremental Frequent Directions II to Frequent Directions; instead of performing the truncated SVD update for each row of matrix A, it does a batch computation with ½ rows each iteration. This algorithm works with O(nml).

All three algorithms are fast and memory efficient; furthermore, both truncated incremental frequent direction and II are more accurate than the corresponding incremental frequent direction and II in terms of accuracy when measuring relative error.  

Advantages

  • Inventors’ algorithm demonstrates faster runtime than other existing methods
  • Algorithm also demonstrates empirically lower relative error rate than other methods