[Speaker and organizer affiliations: Stanford University, MIT, Princeton, MIT, Meta, Stanford / NVIDIA, MBZUAI, UC Berkeley, NVIDIA, UCL, MBZUAI]
Foundation models (FMs) have become central to modern machine learning, achieving impressive performance through large-scale data and complex architectures. Yet this progress comes with escalating computational and environmental costs. Recent work highlights the critical role of data, not just its scale but its information density, as a promising path toward more efficient and sustainable learning. This workshop brings together researchers and practitioners to explore data-centric challenges unique to FMs, with a focus on compact, high-quality data curation techniques.
We welcome submissions and discussions on a wide range of topics related to data-centric research for foundation models, including but not limited to:
Exploring training strategies that shift the focus from model-centric to data-centric approaches, emphasizing how data quality and selection influence model performance.
Techniques for reducing data redundancy, accelerating training pipelines, and achieving high performance with fewer data samples through compression, selection, or pruning.
Techniques for producing synthetic data aligned with foundation model objectives, covering approaches like distillation, simulation-based generation, and structured augmentation.
Strategies that prioritize informative or uncertain samples to improve model robustness, adaptability, and performance under distribution shifts, adversarial settings, or data scarcity.
Developing metrics and evaluation protocols to measure how informative and diverse a dataset is, aiming to better understand and improve data utility.
Theoretical frameworks that support efficient data utilization, such as generalization bounds, information-theoretic limits, and optimal sub-sampling theories.
Addressing the origin, tracking, and responsible use of data, including licensing, traceability, reproducibility, and regulatory compliance.
Methods for identifying, measuring, and mitigating bias in training data, with the goal of improving generalization and representation across subgroups.
Event | Date |
---|---|
Suggested Submission Deadline | August 22, 2025 (AoE) |
Accept/Reject Notification | September 22, 2025 (AoE) |
Workshop Date | December 6, 2025 |
You must format your submission using the NeurIPS 2025 LaTeX style file. Use \usepackage{neurips_2025} (no options) to ensure anonymity. Submissions that violate the NeurIPS style or the page limits will be rejected without review.
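For reference, a minimal submission skeleton might look like the sketch below. The title, abstract, and body are placeholders, and it assumes the neurips_2025.sty file from the NeurIPS 2025 website is in your project directory:

```latex
\documentclass{article}

% Loading the style file with no options keeps the paper in
% anonymized submission mode, as required for double-blind review.
\usepackage{neurips_2025}

\title{Placeholder Title}
% Author names are suppressed in anonymized submission mode.

\begin{document}

\maketitle

\begin{abstract}
Placeholder abstract.
\end{abstract}

\section{Introduction}
Placeholder body text.

\end{document}
```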
Page limits (excluding references and appendix):
Papers should be submitted via the OpenReview website.
All submissions will receive double-blind review by at least three reviewers. Submissions must be anonymized and must not contain identifying information that would violate the double-blind policy.
(1) Non-archival: This workshop is non-archival and will not result in proceedings; workshop submissions may therefore also be submitted to other venues.
(2) Dual Submission: We welcome papers that have already been accepted at NeurIPS 2025 but are also relevant to this workshop, as well as papers under review at other venues (e.g., ICLR, ICML).
(3) Contact: For any questions or submission-related issues, please contact tdel.workshop@gmail.com.
Time | Event |
---|---|
9:00 - 9:10 | Opening Remarks |
9:10 - 9:40 | Invited Talk 1 (Chelsea Finn) |
9:40 - 10:10 | Invited Talk 2 (Song Han) |
10:10 - 11:10 | Poster Session 1 & Coffee Socials |
11:10 - 11:40 | Invited Talk 3 (Danqi Chen) |
11:40 - 12:10 | Invited Talk 4 (Antonio Torralba) |
12:10 - 12:20 | Oral Presentation 1 |
12:20 - 12:30 | Oral Presentation 2 |
12:30 - 13:30 | Lunch Break |
13:30 - 14:00 | Invited Talk 5 (Zhuang Liu) |
14:00 - 14:30 | Invited Talk 6 (Yejin Choi) |
14:30 - 15:30 | Poster Session 2 & Coffee Socials |
15:30 - 15:40 | Oral Presentation 3 |
15:40 - 15:50 | Oral Presentation 4 |
15:50 - 16:50 | Panel Discussion |
16:50 - 17:20 | Networking Session |
17:20 - 17:30 | Closing Remarks |