A Software System to Synthetically Generate Test Data for Testing Large Scale Software Applications
A relational database is transformed to obfuscate the secure and/or private aspects of the data it contains, while preserving the salient elements of that data to facilitate analysis. The restructured database is then modeled generatively, and the model is sampled to create synthetic data that maintains sufficiently similar (or the same) mathematical properties and relations as the original data stored in the database. In one example, various statistics at the intersection of related database tables are determined by modeling the data with an iterative multivariate approach. Synthetic data may be sampled from any part of the modeled database, and the synthesized data is “realistic” in that it statistically mimics the original data in the database. Generating such synthetic data allows bulk data to be published freely and on demand (e.g., for data analysis purposes) without the risk of security or privacy breaches.
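The idea of fitting a generative model and sampling synthetic rows from it can be illustrated with a minimal sketch. This is not the patented method: it assumes a single, already-joined table of two numeric columns and swaps in a plain multivariate normal model in place of the iterative multivariate approach described above. The column meanings (`age`, `spend`), the function names, and the data itself are hypothetical and exist only for illustration.

```python
import numpy as np

def fit_gaussian_model(rows):
    """Estimate the mean vector and covariance matrix of numeric rows.

    Assumption: a multivariate normal stands in for the generative model;
    the source does not specify the model family.
    """
    data = np.asarray(rows, dtype=float)
    return data.mean(axis=0), np.cov(data, rowvar=False)

def sample_synthetic(mean, cov, n, seed=0):
    """Draw n synthetic rows from the fitted model."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)

# Hypothetical "original" table: two correlated numeric columns.
rng = np.random.default_rng(42)
age = rng.normal(40, 10, size=1000)
spend = 50 + 2.0 * age + rng.normal(0, 5, size=1000)
original = np.column_stack([age, spend])

# Fit the model on the original rows, then sample new rows from it.
mean, cov = fit_gaussian_model(original)
synthetic = sample_synthetic(mean, cov, n=1000)

# The synthetic rows are new draws, not copies of original records,
# yet the correlation between the two columns is approximately preserved.
orig_corr = np.corrcoef(original, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"original corr:  {orig_corr:.3f}")
print(f"synthetic corr: {synth_corr:.3f}")
```

A model this simple only captures means and pairwise linear relationships. Handling categorical fields, relationships across multiple tables, or formal privacy guarantees would need considerably more than this sketch provides.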
Methods and apparatus for transforming and statistically modeling relational databases to synthesize privacy-protected anonymized data
United States of America | Granted | 10,713,384