Generative artificial intelligence, such as Generative Adversarial Networks (GANs), has been used to amplify data for scientific analysis, allowing for data generation in reduced computing time.
Data amplification seems at odds with the principle that information cannot be obtained for free: the gain factor can exceed one even though the information content of the training data is unchanged.
This study presents a mathematical bound on data amplification that depends on the number of generated and training events, and identifies the conditions under which the bound holds.
While the resolution of variables in amplified data is not improved, the increase in sample size can improve statistical significance.
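
As a toy illustration of why the information content stays tied to the training events, the sketch below replaces the GAN with a much simpler stand-in generator: a Gaussian fitted to the training sample. This is an assumption made purely for illustration and is not the bound or the method of the study. The function name `amplified_mean_spread` and all of its parameters are hypothetical. The sketch repeats the toy experiment many times and measures how much the mean of the amplified sample fluctuates, compared with the precision available from the training events alone.

```python
import random
import statistics


def amplified_mean_spread(n_train, n_gen, n_trials=400, seed=0):
    """Spread of the amplified-sample mean over repeated toy experiments."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_trials):
        # Training events drawn from the (normally unknown) truth, N(0, 1).
        train = [rng.gauss(0.0, 1.0) for _ in range(n_train)]
        # Stand-in "generator": a Gaussian fitted to the training events.
        mu = statistics.fmean(train)
        sigma = statistics.stdev(train)
        # Amplified sample: n_gen events drawn from the fitted model.
        generated = [rng.gauss(mu, sigma) for _ in range(n_gen)]
        means.append(statistics.fmean(generated))
    # Standard deviation of the amplified mean across the repeated experiments.
    return statistics.pstdev(means)


n_train = 20
# Precision on the mean obtainable from the training events alone.
floor = 1.0 / n_train ** 0.5
for n_gen in (20, 200, 2000):
    spread = amplified_mean_spread(n_train, n_gen)
    print(f"n_gen={n_gen:5d}  spread={spread:.3f}  training floor={floor:.3f}")
```

In this toy setting the spread shrinks as more events are generated, but it levels off near the training floor rather than continuing to fall, because every generated event inherits the fluctuations of the fitted parameters.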