Towards Data-centric Efficient Learning for
Foundational Models

📍 Co-located with NeurIPS 2025
📅 December 6, 2025
📌 San Diego Convention Center

Speakers

Chelsea Finn

Stanford University

Song Han

MIT

Danqi Chen

Princeton

Antonio Torralba

MIT

Zhuang Liu

Meta

Yejin Choi

Stanford / NVIDIA

Organizers

Zhiqiang Shen

MBZUAI

Sewon Min

UC Berkeley

Hongxu Yin

NVIDIA

Xinyi Shang

UCL

Jiacheng Cui

MBZUAI

About

Foundation models (FMs) have become central to modern machine learning, achieving impressive performance through large-scale data and complex architectures. Yet this progress comes with escalating computational and environmental costs. Recent work highlights the critical role of data, not just its scale but its information density, as a promising path toward more efficient and sustainable learning. This workshop brings together researchers and practitioners to explore data-centric challenges unique to FMs, with a focus on compact, high-quality data curation techniques.

Call For Papers

Workshop Scope

We welcome submissions and discussions on a wide range of topics related to data-centric research for foundational models, including but not limited to:

🧠 Topic 1: Large-Scale Data-Driven Training Paradigms

Exploring training strategies that shift the focus from model-centric to data-centric approaches, emphasizing how data quality and selection influence model performance.

📦 Topic 2: Data Efficiency and Compression

Techniques for reducing data redundancy, accelerating training pipelines, and achieving high performance with fewer data samples through compression, selection, or pruning.

🧪 Topic 3: Synthetic Data for Foundational Models

Techniques for producing synthetic data aligned with foundational model objectives, covering approaches like distillation, simulation-based generation, and structured augmentation.

⚖️ Topic 4: Active Data Selection for Robustness

Strategies that prioritize informative or uncertain samples to improve model robustness, adaptability, and performance under distribution shifts, adversarial settings, or data scarcity.

📚 Topic 5: Quantifying Data Information Density

Developing metrics and evaluation protocols to measure how informative and diverse a dataset is, aiming to better understand and improve data utility.

📐 Topic 6: Theoretical Foundations of Data Efficiency

Theoretical frameworks that support efficient data utilization, such as generalization bounds, information-theoretic limits, and optimal sub-sampling theories.

🛡️ Topic 7: Data Provenance and Governance

Addressing the origin, tracking, and responsible use of data, including licensing, traceability, reproducibility, and regulatory compliance.

🔍 Topic 8: Bias Detection and Mitigation in Training Data

Methods for identifying, measuring, and mitigating bias in training data, with the goal of improving generalization and representation across subgroups.

Submission Guidelines

Important Dates
Suggested Submission Deadline: August 22, 2025 (AoE)
Accept/Reject Notification: September 22, 2025 (AoE)
Workshop Date: December 6, 2025
Submission Format

You must format your submission using the NeurIPS 2025 LaTeX style file.
Use \usepackage{neurips_2025} (without options) to produce the anonymized submission version. Submissions that violate the NeurIPS style or page limits will be rejected without review.
Page limits (excluding references and appendix):

Submission Link

Papers should be submitted via the OpenReview website.

Review Process

All papers will undergo double-blind review by at least three reviewers. Submissions must be anonymized and must not contain identifying information that would violate the double-blind policy.

Additional Details

(1) Non-archival: This workshop is non-archival and will not produce proceedings; workshop submissions may subsequently be submitted to other venues.
(2) Dual Submission: We welcome papers that have already been accepted at NeurIPS 2025 but are also relevant to this workshop, as well as papers under review at other venues (e.g., ICLR, ICML).
(3) Contact: For any questions or submission-related issues, please contact tdel.workshop@gmail.com.

Schedule

Time Event
9:00 - 9:10 Opening Remarks
9:10 - 9:40 Invited Talk 1 (Chelsea Finn)
9:40 - 10:10 Invited Talk 2 (Song Han)
10:10 - 11:10 Poster Session 1 & Coffee Socials
11:10 - 11:40 Invited Talk 3 (Danqi Chen)
11:40 - 12:10 Invited Talk 4 (Antonio Torralba)
12:10 - 12:20 Oral Presentation 1
12:20 - 12:30 Oral Presentation 2
12:30 - 13:30 Lunch Break
13:30 - 14:00 Invited Talk 5 (Zhuang Liu)
14:00 - 14:30 Invited Talk 6 (Yejin Choi)
14:30 - 15:30 Poster Session 2 & Coffee Socials
15:30 - 15:40 Oral Presentation 3
15:40 - 15:50 Oral Presentation 4
15:50 - 16:50 Panel Discussion
16:50 - 17:20 Networking Session
17:20 - 17:30 Closing Remarks