A Software System to Synthetically Generate Test Data for Testing Large-Scale Software Applications

A relational database is transformed to obfuscate secure and/or private aspects of the data it contains, while preserving the salient elements of the data needed for analysis. The restructured database is generatively modeled, and the model is sampled to create synthetic data whose mathematical properties and relations are the same as, or sufficiently similar to, those of the original data. In one example, statistics at the intersection of related database tables are determined by modeling the data with an iterative multivariate approach. Synthetic data may be sampled from any part of the modeled database, and it is “realistic” in that it statistically mimics the original data. Generating such synthetic data allows bulk data to be published freely and on demand (e.g., for data analysis) without risking security or privacy breaches.
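The abstract does not specify the model, so the following is only a simplified, hypothetical sketch of the general idea, not the patented method. It makes several assumptions: there are two toy tables (`customers` as parent, `orders` as child, joined on a foreign key); a parent table is "restructured" by appending per-row statistics of its related child rows (row count and mean order amount); a single multivariate Gaussian is fitted to the restructured rows; and synthetic rows are sampled from that Gaussian. All table names, columns, and helper functions are illustrative. With this approach the synthetic rows share the means and covariances of the real rows, but no real record is copied.

```python
import math
import random
from statistics import mean

# Hypothetical toy tables: a parent table (customers) and a child table
# (orders) linked by the foreign key "customer_id". Values are illustrative.
customers = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 51},
    {"id": 3, "age": 27},
    {"id": 4, "age": 45},
]
orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 35.0},
    {"customer_id": 2, "amount": 80.0},
    {"customer_id": 3, "amount": 15.0},
    {"customer_id": 3, "amount": 10.0},
    {"customer_id": 3, "amount": 12.0},
    {"customer_id": 4, "amount": 60.0},
    {"customer_id": 4, "amount": 55.0},
]


def extend_parent(parents, children, fk, parent_cols, child_col):
    """Restructure the parent table: append the count and the mean of a
    numeric column of each parent's related child rows."""
    rows = []
    for p in parents:
        vals = [c[child_col] for c in children if c[fk] == p["id"]]
        base = [float(p[col]) for col in parent_cols]
        rows.append(base + [float(len(vals)), mean(vals) if vals else 0.0])
    return rows


def fit_gaussian(rows):
    """Estimate the mean vector and the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a, jitter=1e-9):
    """Lower-triangular factor L with L @ L.T ~= a; a small jitter on the
    diagonal guards against round-off in near-singular matrices."""
    d = len(a)
    chol = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                chol[i][j] = math.sqrt(max(a[i][i] - s, 0.0) + jitter)
            else:
                chol[i][j] = (a[i][j] - s) / chol[j][j]
    return chol


def sample(mu, chol, n, rng):
    """Draw n synthetic rows as mu + L z, where z ~ N(0, I)."""
    d = len(mu)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        out.append([mu[i] + sum(chol[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


rng = random.Random(0)
table = extend_parent(customers, orders, "customer_id", ["age"], "amount")
mu, cov = fit_gaussian(table)
chol = cholesky(cov)
synthetic = sample(mu, chol, 1000, rng)

# Compare real and synthetic column means (age, order count, mean amount).
synth_mu = [sum(r[j] for r in synthetic) / len(synthetic) for j in range(len(mu))]
print("real means:     ", [round(m, 2) for m in mu])
print("synthetic means:", [round(m, 2) for m in synth_mu])
```

A single Gaussian is the simplest choice that preserves pairwise correlations. A practical system would also need to handle categorical columns, non-Gaussian distributions, and regenerating child rows from the synthesized parent-level statistics. None of these are shown here.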

Researchers

Kalyan Veeramachaneni / Neha Patki / Jeffery Wilkinson / Kishore Durg / Sunder Nochilur

Departments: Laboratory for Information and Decision Systems
Technology Areas: Artificial Intelligence (AI) and Machine Learning (ML) / Computer Science: Cybersecurity
Impact Areas: Uncharted Frontiers

  • Methods and apparatus for transforming and statistically modeling relational databases to synthesize privacy-protected anonymized data
    United States of America | Granted | 10,713,384
