RollData Synthetic Data

We generate synthetic tabular datasets designed to preserve the statistical structure of the source data and improve downstream ML utility. Below are reproducible LightGBM benchmarks comparing models trained on real data, on synthetic data, and on real plus synthetic data combined, each evaluated on a held-out test set.
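The comparison described above can be sketched as a small harness that trains under each of the three regimes and scores every model on the same held-out test set. This is a minimal illustration, not the benchmark code itself: `compare_regimes`, `majority_class_accuracy`, and the row layout are hypothetical names introduced here, and the majority-class scorer is only a stand-in for a callable that would fit a LightGBM model and return its test metric.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row layout: (feature vector, integer class label).
Row = Tuple[Sequence[float], int]
# A scorer trains on the first argument and returns a metric on the second.
Scorer = Callable[[List[Row], List[Row]], float]


def compare_regimes(
    real_train: List[Row],
    synthetic: List[Row],
    test: List[Row],
    fit_and_score: Scorer,
) -> Dict[str, float]:
    """Score one model per training regime on the same held-out test set."""
    regimes = {
        "real": list(real_train),
        "synthetic": list(synthetic),
        "real+synthetic": list(real_train) + list(synthetic),
    }
    # The test set is never used for training in any regime.
    return {name: fit_and_score(train, test) for name, train in regimes.items()}


def majority_class_accuracy(train: List[Row], test: List[Row]) -> float:
    """Stand-in scorer (assumption): predict the most common training label.

    A real benchmark would fit a gradient-boosted model here instead.
    """
    majority_label = Counter(label for _, label in train).most_common(1)[0][0]
    hits = sum(1 for _, label in test if label == majority_label)
    return hits / len(test)
```

Passing the scorer in as a callable keeps the regime logic independent of the model library, so the same harness can be reused with whatever metric a given notebook reports.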

Utility Preservation Summary

Utility preservation across datasets, comparing models trained on synthetic data against the real-data baseline. Values above 100% indicate that synthetic data improved downstream ML utility in this setup.
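One plausible reading of this percentage, given that values above 100% mean the synthetic-trained model did better, is the synthetic-trained score divided by the real-trained baseline score. The sketch below assumes exactly that, along with a higher-is-better metric such as accuracy or AUC; the page does not state the formula, and `utility_preservation` and the example scores are hypothetical.

```python
def utility_preservation(synthetic_score: float, real_score: float) -> float:
    """Return the synthetic-trained score as a percentage of the real baseline.

    Assumes a higher-is-better metric and a strictly positive baseline.
    """
    if real_score <= 0:
        raise ValueError("real_score must be positive")
    return 100.0 * synthetic_score / real_score


# Made-up scores for illustration only; these are not benchmark results.
example = utility_preservation(synthetic_score=0.90, real_score=0.80)
print(f"{example:.1f}%")  # prints "112.5%"
```

For a lower-is-better metric such as RMSE, the ratio would need to be inverted for "above 100%" to keep the same meaning.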

[Figure: Utility Preservation by Dataset]

Get Started

View the per-dataset notebooks, rendered as static HTML, with direct downloads of the real, synthetic, train, and test data and of the notebooks themselves.

Dig In →
Artifact downloads are served via CDN-backed object storage. If you have questions or want an enterprise evaluation, contact us.