This invention is a highly efficient and flexible algorithm for data compression. Data compression has a wide range of applications, from compressing data stores of new and existing data types, to improving end-to-end performance for stored and streaming network applications, to enabling increased privacy and content protection at upstream sites, among many others.
Data compression reduces the costs of data transmission and storage by reproducing the same or nearly the same data using more succinct descriptions. A compression system comprises an upstream encoder and a downstream decoder, the internal algorithms of which must both be carefully designed to maximize compression efficiency. To operate, the encoder and/or the decoder typically use knowledge about the special characteristics of the data. However, wherever such specialized knowledge is embedded in the system, the downstream components become dependent on that particular choice, which reduces flexibility in system design and network transport, prevents future improvement without a significant overhaul of the entire system, and risks losing compatibility with already compressed data. The Inventors have developed a technology to compress data in a novel way that defers nearly all assumed knowledge about the data until the downstream decoder—without losing compression efficiency.
The compression of data is predicated on the existence of certain predictable or typical statistical characteristics in the data, which must be identified by experts or inferred algorithmically. This information is called the source model, because it models the statistical source from which the data is presumed to be drawn. The technology the Inventors present, called “compression with model-free encoding,” is based on the surprising discovery that the source model, a key piece of information for any compression system, does not need to be known at the upstream encoder at all for efficient compression to take place. This recognition is counterintuitive and offers a substantially different way of performing compression than current methods. The technology therefore entails new algorithms for encoders, decoders, and other system components, which may have applications beyond data compression.
Compression with model-free encoding is performed by “blindly” decimating the original data in the upstream encoder down to an agreed-upon rate. The output of this process is a set of succinct but potentially non-unique specifications on the data: this is the compressed data stream. In the downstream decoder, these specifications are combined with the statistical characteristics of the data type, i.e., the source model, to recover a solution that agrees with the original data.
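The encode-blindly/decode-with-a-model idea can be illustrated with a toy sketch. The sketch below is an assumption-laden illustration, not the Inventors' actual algorithm: it uses random parity checks as the model-agnostic "specifications" and a simple sparsity prior as the decoder's source model, with a brute-force search standing in for a real inference procedure.

```python
import itertools
import random

def make_parity_matrix(n_checks, n_bits, seed=0):
    # Agreed-upon, data-independent parity checks shared by encoder and
    # decoder. The number of checks (n_checks < n_bits) sets the rate.
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_checks)]

def encode(bits, parity_matrix):
    # "Blind" encoder: computes parity constraints on the data without any
    # knowledge of its statistics. The syndrome is shorter than the data,
    # so it is a succinct but potentially non-unique specification.
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in parity_matrix]

def decode(syndrome, parity_matrix, n_bits):
    # Model-based decoder: among all sequences consistent with the received
    # specifications, pick the one most probable under the source model
    # (here, an assumed sparsity prior: fewer ones is more likely).
    # Brute force for clarity; a practical decoder would use real inference.
    best = None
    for cand in itertools.product([0, 1], repeat=n_bits):
        if encode(cand, parity_matrix) == syndrome:
            if best is None or sum(cand) < sum(best):
                best = cand
    return list(best)

# Usage: a sparse 12-bit message compressed to a 7-bit syndrome.
H = make_parity_matrix(n_checks=7, n_bits=12, seed=1)
data = [0] * 12
data[3] = data[9] = 1
compressed = encode(data, H)
recovered = decode(compressed, H, n_bits=12)
```

Note that the encoder never touches the sparsity prior: the source model could be upgraded at the decoder later without re-encoding or invalidating already compressed streams, which is the flexibility claimed above.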
- The system presents a low barrier not only to improvement but also to the initial design.
- Design principles can be applied to compress any type of data, providing valuable adaptability to handle “big data” fields (e.g., bioscience, finance) and to upgrade compression of traditional data (e.g., video, audio, images, natural language text, and databases).