The Dark Side of Synthetic Data: Reality Gaps, Model Collapse and Hallucinative Analytics

MS S El Namaki

Abstract

Data lie at the heart of the artificial intelligence revolution. Massive volumes of data hold the key to the generative AI analytical processes that produce artificial intelligence outcomes. Data are drawn from a wide variety of sources and, as a result, form an amorphous whole. They are, with a measure of simplification, neither homogeneous, generic nor malleable. Nor are they always available to sustain an argument or complete an algorithm. Synthetic data emerge to fill these gaps, yet their use could induce reality gaps, model collapse and hallucinations.

What are synthetic data and how do they emerge? And could they lead to what we label the dark side? These questions are the focus of this article.

The article is qualitative in approach. It starts by identifying the trigger of the problem and the emergence of the need for synthetic data. It then defines synthetic data, classifies them according to a set of criteria, analyzes their framework and outlines their possible impact on reality gaps, model collapse and hallucinative analytical outcomes.

Keywords

synthetic data, AI hallucination, AI, Generative AI



