[Speaker and organizer affiliations: Stanford University, MIT, Princeton, MIT, Meta, Stanford / NVIDIA, MBZUAI, UC Berkeley, NVIDIA, UCL, MBZUAI]
Foundation models (FMs) have become central to modern machine learning, achieving impressive performance through large-scale data and complex architectures. Yet this progress comes with escalating computational and environmental costs. Recent work highlights the critical role of data, not just its scale but its information density, as a promising path toward more efficient and sustainable learning. This workshop brings together researchers and practitioners to explore data-centric challenges unique to FMs, with a focus on compact, high-quality data curation techniques.
We welcome submissions and discussions on a wide range of topics related to data-centric research for foundation models, including but not limited to:
Exploring training strategies that shift the focus from model-centric to data-centric approaches, emphasizing how data quality and selection influence model performance.
Techniques for reducing data redundancy, accelerating training pipelines, and achieving high performance with fewer data samples through compression, selection, or pruning.
Techniques for producing synthetic data aligned with foundation model objectives, covering approaches like distillation, simulation-based generation, and structured augmentation.
Strategies that prioritize informative or uncertain samples to improve model robustness, adaptability, and performance under distribution shifts, adversarial settings, or data scarcity.
Developing metrics and evaluation protocols to measure how informative and diverse a dataset is, aiming to better understand and improve data utility.
Theoretical frameworks that support efficient data utilization, such as generalization bounds, information-theoretic limits, and optimal sub-sampling theories.
Addressing the origin, tracking, and responsible use of data, including licensing, traceability, reproducibility, and regulatory compliance.
Methods for identifying, measuring, and mitigating bias in training data, with the goal of improving generalization and representation across subgroups.
Event | Date |
---|---|
Suggested Submission Deadline | August 22, 2025 (AoE) |
Accept/Reject Notification | September 22, 2025 (AoE) |
Workshop Date | December 6, 2025 |
You must format your submission using the NeurIPS 2025 LaTeX style file. Use \usepackage{neurips_2025} (no options) to ensure anonymity. Submissions that violate the NeurIPS style or the page limits will be rejected without review.
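For reference, a minimal submission skeleton might look like the sketch below. The title, abstract, and body are placeholders, and it assumes the neurips_2025.sty file from the NeurIPS 2025 website is in your project directory:

```latex
\documentclass{article}

% Loading the style file with no options keeps the paper in
% anonymized submission mode, as required for double-blind review.
\usepackage{neurips_2025}

\title{Placeholder Title}
% Author names are suppressed in anonymized submission mode.

\begin{document}

\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Introduction}
Placeholder body text.

\end{document}
```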
Page limits (excluding references and appendix):
Papers should be submitted via the OpenReview website.
All submissions will receive double-blind review by at least three reviewers. Submissions must be anonymized and must not contain identifying information that would violate the double-blind policy.
(1) Non-archival: This workshop is non-archival and will not result in proceedings; workshop submissions may therefore also be submitted to other venues.
(2) Dual Submission: We welcome papers that have already been accepted at NeurIPS 2025 but are also relevant to this workshop, as well as papers under review at other venues (e.g., ICLR, ICML).
(3) Contact: For any questions or submission-related issues, please contact tdel.workshop@gmail.com.
Time | Event |
---|---|
9:00 - 9:10 | Opening Remarks |
9:10 - 9:40 | Invited Talk 1 (Chelsea Finn) |
9:40 - 10:10 | Invited Talk 2 (Song Han) |
10:10 - 11:10 | Poster Session 1 & Coffee Socials |
11:10 - 11:40 | Invited Talk 3 (Danqi Chen) |
11:40 - 12:10 | Invited Talk 4 (Antonio Torralba) |
12:10 - 12:20 | Oral Presentation 1 |
12:20 - 12:30 | Oral Presentation 2 |
12:30 - 13:30 | Lunch Break |
13:30 - 14:00 | Invited Talk 5 (Zhuang Liu) |
14:00 - 14:30 | Invited Talk 6 (Yejin Choi) |
14:30 - 15:30 | Poster Session 2 & Coffee Socials |
15:30 - 15:40 | Oral Presentation 3 |
15:40 - 15:50 | Oral Presentation 4 |
15:50 - 16:50 | Panel Discussion |
16:50 - 17:20 | Networking Session |
17:20 - 17:30 | Closing Remarks |