[Special Track] Agent Environment Design and Evaluation
The special track focuses on our core theme: agent environment design and evaluation. We welcome work that advances how environments are specified, generated, measured, and shared.
- Task & World Specification: formalization, compositionality, affordance modeling, procedural generation, simulator integration.
- Evaluation Methodologies: multi‑step interaction metrics, generalization tests, open‑ended benchmarks, curriculum scaling, human‑in‑the‑loop assessments.
- Environment Exemplars (illustrative, non‑exhaustive):
  - CodeArena: multi‑language software‑engineering sandbox for agent tool‑use benchmarking.
  - HouseWorld: 3‑D embodied household simulator with spatial reasoning tasks.
  - WebShop‑X: dynamic e‑commerce website emulator for goal‑conditioned browsing and checkout.
  - SocialTown: multi‑agent social environment for coordination, negotiation, and role‑play evaluation.
- Artifacts & Reproducibility: dataset/specification releases, leaderboards, reproducibility studies.
[General Track] Other Relevant Topics
The general track welcomes research broadly related to scaling environments for agents, including but not limited to:
- LLMs in Interactive Environments: policy learning, planning, reward shaping, hybrid training (e.g. RLHF, PPO), interaction‑based fine‑tuning.
- Tool‑Use and Software Environments: agents as programmers, API orchestration, agentic debugging, self‑healing code, software manipulation, web navigation.
- Multi‑Agent & Social Environments: population scaling, emergent behaviors, communication, coordination, competition, social alignment and safety.
- Embodiment & Grounding: perception‑action loops, physical simulation, spatial reasoning, robotics integration, sim‑to‑real transfer.
- Sim2Real & Deployment: domain adaptation, real‑world API integration, robustness under scale, safety, large‑scale deployment.
Awards
All accepted papers will be presented in a poster session. Up to four outstanding papers (two per track) will be invited for oral presentations. Each track will confer its own Best Paper Award.
Submission Guidelines
Paper submissions are managed through OpenReview. The review process is double‑blind, so submissions must be anonymized. We welcome work that is (1) original and unpublished, (2) recently published, or (3) still in progress. The workshop is non‑archival: accepted submissions will not be indexed or included in archival proceedings.
Please use the NeurIPS 2025 LaTeX style file; it includes a preprint option for non‑anonymous preprints posted online (see additional formatting details here). Submissions should be PDFs of ≤ 9 pages (excluding references and appendices).
Important Dates (Anywhere on Earth)
Paper Submission Deadline | |
Notification of Acceptance | |
Camera‑ready Paper Submission | |
Workshop at NeurIPS |