NeurIPS CDMX 2025: Accepted Workshops and Tutorials
We are thrilled to announce the accepted workshops and tutorials happening at NeurIPS Mexico City. Workshops will be held on November 30th and December 1st, and Tutorials on December 2nd.
Mexico City Workshops on Sunday 30 Nov
Vision Language Models: Challenges of Real World Deployment
Vision language models (VLMs) have demonstrated remarkable capabilities in integrating visual perception with natural language understanding, powering applications such as multimodal assistants, robotics, autonomous systems, and accessibility tools. However, their real-world deployment faces significant challenges in efficiency, scalability, and reliability. This workshop will bring together researchers and practitioners from academia and industry to highlight cutting-edge research, systems-level optimizations, and evaluation methodologies that are often overlooked yet pivotal for robust real-world integration. Efficiency, robustness, and reliability will be emphasized as core design principles, essential to advancing VLMs from experimental systems to dependable deployed technologies. By convening researchers at the intersection of multimodal learning, efficient inference and training, robustness and uncertainty estimation, and large-scale systems design, the workshop aims to establish concrete pathways toward building VLMs that can operate reliably under practical constraints. We hope this workshop will serve as a venue for exchanging insights on model design, efficiency techniques, and robustness evaluation that bridge the gap between research and real-world systems.
NeurIPS 2025 Workshop on Embodied and Safe-Assured Robotic Systems
This workshop focuses on advancing safe and quality-assured embodied robotic systems. Embodied systems—including autonomous robots, self-driving vehicles, robotic arms, and humanoid robots—are increasingly deployed in safety-critical real-world scenarios. Ensuring their trustworthiness—encompassing safety, reliability, and predictable behavior—remains a pressing challenge. Despite notable progress in perception, reasoning, and control, many AI-based robotic systems still operate as “black boxes,” often exhibiting unpredictable behaviors. Failures can emerge from complex sensorimotor interactions, adversarial inputs, or novel environments, leading to safety incidents and diminished user trust.
NeurIPS 2025 Workshop Research Development AI Mexico
The Research Development of AI in Mexico: Main Applications workshop seeks to showcase, strengthen, and connect the most impactful developments in Artificial Intelligence (AI) and Data Science emerging from Mexico and the broader Latin American region. Over the past four decades, Mexico has cultivated a robust research community in AI through pioneering contributions in areas such as computational intelligence, autonomous robotics, fuzzy systems, and natural language processing, led by institutions including CIC–IPN, INAOE, UNAM, ITESM, CINVESTAV, and Universidad Veracruzana.

Today, the region is undergoing a strategic transformation, shifting from foundational research to the development of applied AI technologies addressing real-world needs in healthcare, education, agriculture, smart cities, cybersecurity, and sustainability. This evolution has been further propelled by increased access to open data, advances in computing infrastructure, and growing collaborations between academia, government, and industry.

Despite these advances, Latin America faces distinctive challenges in the development and deployment of AI. These include limited funding, underrepresentation in global AI initiatives, digital inequality, and the need for responsible, inclusive, and culturally relevant AI systems. Additionally, emerging concerns related to AI ethics, algorithmic bias, and regulatory frameworks must be addressed proactively to ensure equitable and trustworthy technology adoption.

This workshop aims to create a forum for researchers, students, practitioners, and policymakers to engage in meaningful dialogue about the current landscape and future directions of AI in Mexico and Latin America. By promoting interdisciplinary collaboration, the workshop will highlight impactful case studies, emerging research trajectories, and opportunities for cross-border cooperation, while fostering a shared vision for AI that is ethical, sustainable, and aligned with regional priorities.
NeurIPS 2025 Workshop on Socially Responsible and Trustworthy Foundation Models
The Socially Responsible and Trustworthy Foundation Models (ResponsibleFM) Workshop at NeurIPS 2025 Mexico City is envisioned as a vital interdisciplinary forum dedicated to advancing ethical, inclusive, and socially conscious research practices in the rapidly evolving field of foundation models, including language models and multimodal models. As foundation models profoundly reshape human communication, decision-making, and societal infrastructures, there is a growing recognition of the profound impacts these systems can have, both positive and negative, on individuals and communities. In particular, previous research has documented a wide range of risks and harms associated with foundation models, including but not limited to bias and discrimination, misinformation propagation, privacy violations, environmental concerns, and unintended social consequences.
Mexico City Workshops on Monday 1 Dec
First Workshop on LLM Persona Modeling
Large language models (LLMs) are increasingly used to simulate human-like personas for applications in research, education, healthcare, and interactive AI systems. While such persona modeling creates opportunities for interdisciplinary innovation, it raises challenges around authenticity, consistency, bias, and ethical deployment. This workshop brings together perspectives from AI, psychology, cognitive science, and human–computer interaction to advance robust methods, standardized evaluation frameworks, and responsible practices for persona modeling in LLMs. Through invited talks, panels, posters, and discussions, the event will chart a roadmap for interdisciplinary collaboration and future research in this emerging area.
Centering Low-Resource Languages and Cultures in the Age of Large Language Models
Large Language Models (LLMs) have transformed NLP research and applications, yet they are still predominantly trained on high-resource, globally dominant languages. This imbalance leads to poor performance and limited applicability for low-resource languages, which are rich in tone, morphology, and cultural meaning. As a result, current AI systems risk reinforcing linguistic inequality, cultural erasure, and lack of accessibility in critical domains like education and healthcare.

This workshop aims to reframe language technology by centering low-resource languages, cultures, and epistemologies in the age of LLMs. We seek to bring together researchers, linguists, developers, healthcare professionals, and technologists to share insights and develop strategies for building inclusive, culturally grounded, and linguistically robust language models. The workshop emphasizes collaboration across disciplines and regions to ensure both technical advancement and social relevance.

Key areas of focus include developing LLM architectures tailored to low-resource linguistic features, ethical and community-centered dataset collection, and multilingual benchmarks designed specifically for underrepresented languages. We also highlight the importance of healthcare and medical machine translation to support equitable access to information and improve public health outcomes. Ultimately, this workshop aims to advance responsible AI innovation that empowers low-resource language communities and shapes a more inclusive future for global language technologies.
NORA: The First Workshop on Knowledge Graphs & Agentic Systems Interplay
Agents have experienced significant growth in recent years, largely due to the rapid technological advancements of Large Language Models (LLMs). Although these agents benefit from LLMs’ advanced generation proficiency, they still suffer from catastrophic forgetting and from context windows too small for the contextual information they need. Knowledge Graphs (KGs) are a powerful paradigm for structuring and managing connected pieces of information while unlocking deeper insights than traditional methods. Their value is immense for tasks that require context, integration, and reasoning. However, this power comes at the cost of significant upfront and ongoing investment in construction, curation, and specialized expertise. This first edition of the workshop aims to analyze and discuss emerging and novel practices, ongoing research, and validated or deployed innovative solutions that showcase the growing synergy between LLM agents and KGs.
Holistic Video Understanding (HVU) Workshop 2025
This workshop aims to advance the field of video understanding by fostering discussions around holistic and generalist video foundation models. Building upon the Holistic Video Understanding (HVU) initiative and dataset introduced in 2019, we have successfully organized eight HVU workshops and tutorials at top-tier venues such as CVPR and ICCV, uniting researchers, practitioners, and students from around the world. These efforts have played a central role in moving the community beyond narrow action recognition tasks toward multi-faceted, semantic, and generalist video understanding.

With the emergence of large-scale foundation models and video large language models (Video-LLMs), the landscape of video understanding is rapidly evolving. These models enable unified reasoning across spatial, temporal, and multimodal dimensions, yet introduce new challenges in scalability, efficiency, interpretability, and responsible deployment.

The HVU Workshop 2025 will provide a platform to explore these frontiers, discussing topics such as multimodal representation learning, long-context reasoning, evaluation of general-purpose video systems, efficient adaptation and scaling laws, and the ethical and societal implications of video AI. Our goal is to bring together a diverse and inclusive community to define the next chapter of holistic, generalist, and responsible video understanding.
Mexico City Tutorials on Tuesday 2 Dec
Efficient Transformers: State of the art in pruning, sparse attention, and transformer funneling
Transformer architectures consume the lion's share of the computational budgets behind today's most powerful language and vision models, making research into greater computational efficiency a hot and essential direction. This tutorial surveys the bleeding edge of three complementary research threads that together comprise a significant part of the current industrial toolkit for achieving computational efficiency in Transformers: (1) pruning, the structured or unstructured removal of weights, layers, and heads; (2) sparse attention and routing, including block-sparse, sliding-window, and locality-sensitive-hashing attention; and (3) funneling, which pools intermediate representations to shorten sequences through depth. We will then feature an expert industrial and academic panel of speakers from Caltech, MIT, Anthropic, Google DeepMind, and Microsoft, hearing about the latest trends seen in top industrial labs. Attendees will leave with actionable recipes for building sub-10B-parameter models that match or exceed dense baselines on language, vision, and multimodal benchmarks. The tutorial targets researchers and practitioners who build or deploy Transformer models and assumes familiarity with basic deep-learning concepts but not with any specific efficiency method. All slides and publication materials will be released under a permissive license.
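To make the sparse-attention thread concrete, here is a minimal PyTorch sketch of a sliding-window attention mask. This is our own illustration of the general pattern, not code from the tutorial materials:

```python
import torch

def sliding_window_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask that is True where attention is allowed: each query
    position i may attend to key positions j with |i - j| <= window,
    giving O(seq_len * window) nonzeros instead of O(seq_len ** 2)."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= window

def sliding_window_attention(q, k, v, window: int):
    # q, k, v: (batch, seq_len, dim). A dense emulation of the sparse
    # pattern; production kernels skip the masked blocks entirely.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    mask = sliding_window_mask(q.shape[1], window).to(q.device)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 16, 32)
out = sliding_window_attention(q, k, v, window=2)  # (1, 16, 32)
```

The same masking idea underlies block-sparse and LSH variants; what changes is which key positions each query is allowed to see.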
Geospatial Foundation Models: Overview, Application and Benchmarking
Geospatial foundation models (GeoFMs) are a class of large-scale deep learning models, typically based on the transformer architecture, that are pre-trained on vast, diverse datasets of Earth Observation data to learn a general, transferable understanding of the Earth's surface. These models help address long-standing challenges in Earth Observation by dramatically reducing the need for manually labeled data, handling vast and diverse data streams (e.g., optical, SAR, multispectral, LiDAR), and enabling robust performance across time, space, and sensor types. In this tutorial, we will give an overview of the recent advancements in GeoFMs, highlighting the main challenges in developing these models and differences from foundation models developed for other domains. We will also show practical examples of fine-tuning GeoFMs and running inference for different downstream tasks using the TerraTorch open-source framework, which facilitates the use of publicly available GeoFMs such as SatMAE, Prithvi-EO, DOFA, Galileo and TerraMind. Finally, we will introduce best practices for systematic and reproducible benchmarking of GeoFMs using the TerraTorch Iterate plug-in and its integration with GEO-Bench.
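As a flavor of what transfer learning with a GeoFM looks like, here is a generic PyTorch sketch of the common freeze-the-backbone, train-a-small-head recipe. The `PretrainedGeoBackbone` and `SegmentationHead` classes below are simplified stand-ins of our own, not TerraTorch's actual API:

```python
import torch
import torch.nn as nn

class PretrainedGeoBackbone(nn.Module):
    # Hypothetical stand-in for a pretrained GeoFM encoder that maps a
    # multispectral tile to a coarse feature map.
    def __init__(self, in_channels=6, dim=256):
        super().__init__()
        self.encoder = nn.Conv2d(in_channels, dim, kernel_size=16, stride=16)

    def forward(self, x):            # x: (B, C, H, W)
        return self.encoder(x)       # (B, dim, H/16, W/16)

class SegmentationHead(nn.Module):
    # Lightweight task head trained on top of the frozen backbone.
    def __init__(self, dim=256, num_classes=4):
        super().__init__()
        self.proj = nn.Conv2d(dim, num_classes, kernel_size=1)

    def forward(self, feats):
        return self.proj(feats)

backbone, head = PretrainedGeoBackbone(), SegmentationHead()
for p in backbone.parameters():      # freeze the pretrained encoder;
    p.requires_grad = False          # only the small head is trained

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)
x = torch.randn(2, 6, 64, 64)                 # dummy batch of tiles
y = torch.randint(0, 4, (2, 4, 4))            # dummy per-pixel labels
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
opt.step()
```

This is exactly the kind of workflow TerraTorch packages up behind configuration files, with real pretrained backbones in place of the toy encoder above.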
From Tuning to Guarantees: Statistically Valid Hyperparameter Selection
The performance and reliability of modern machine learning systems depend critically on hyperparameter selection. Whether tuning a large language model, configuring a vision pipeline, or deploying AI in safety-critical environments, the choice of hyperparameters is decisive. Current tuning strategies such as grid or random search and Bayesian optimization are powerful for empirical optimization, but they do not provide statistical guarantees on the reliability of the selected configuration after deployment. This gap becomes critical when models must satisfy strict performance, safety, or fairness requirements. This tutorial introduces a rigorous and practical framework that treats hyperparameter selection as a statistical testing problem. By constructing valid p- or e-values for candidate configurations and applying multiple hypothesis testing (MHT) procedures, practitioners can control deployment risk with finite-sample guarantees. We begin with the Learn-Then-Test (LTT) methodology for average-risk control and build up to several key extensions: controlling quantile risk with quantile LTT (QLTT), multi-objective optimization through Pareto Testing (PT), incorporating prior information through reliability graphs, and data-efficient selection through adaptive LTT (aLTT). Throughout the tutorial, we emphasize conceptual clarity, plain-language explanations of assumptions, and hands-on demonstrations with minimal, reproducible notebooks. Attendees will gain a drop-in toolkit for augmenting existing tuning workflows with statistically valid selection. They will learn how to formalize relevant risk functions, generate valid evidence, choose appropriate error-rate controls (FWER/FDR), and navigate the trade-offs between statistical conservatism and power under limited data. No prior expertise in multiple hypothesis testing is required.
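For a concrete feel, here is a minimal sketch of the core LTT recipe with a Bonferroni correction, assuming losses bounded in [0, 1] and using a one-sided Hoeffding p-value; the function names are ours, not from any particular library:

```python
import numpy as np

def hoeffding_pvalue(losses: np.ndarray, alpha: float) -> float:
    """Valid p-value for H0: E[loss] > alpha, with losses in [0, 1].
    Under H0, the chance of a sample mean this far below alpha is at
    most exp(-2 n (alpha - mean)^2) by Hoeffding's inequality."""
    n, mean = len(losses), losses.mean()
    if mean >= alpha:
        return 1.0
    return float(np.exp(-2 * n * (alpha - mean) ** 2))

def learn_then_test(loss_table: dict, alpha: float, delta: float) -> list:
    """Bonferroni-corrected LTT: return the configurations certified to
    have risk <= alpha, simultaneously valid with probability >= 1 - delta.
    loss_table maps each candidate config to its held-out loss samples."""
    m = len(loss_table)
    return [cfg for cfg, losses in loss_table.items()
            if hoeffding_pvalue(np.asarray(losses), alpha) <= delta / m]

# Toy usage with synthetic losses for three hypothetical configs.
rng = np.random.default_rng(0)
table = {f"lambda_{i}": rng.uniform(0, 0.2 + 0.3 * i, size=500)
         for i in range(3)}
print(learn_then_test(table, alpha=0.2, delta=0.05))  # ['lambda_0']
```

The quantile, multi-objective, and adaptive extensions covered in the tutorial swap out the p-value construction or the correction step while keeping this same select-by-testing structure.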
How to Build Agents to Generate Kernels for Faster LLMs (and Other Models!)
The compute demanded by modern AI has been exploding since 2016; the FLOPs used to train frontier models have grown at a rate of 2.4x per year, and the inference side is growing even faster—already an estimated 80% of total AI electricity use. Large language models and other deep networks rely on highly tuned GPU kernels to achieve state-of-the-art performance; these efficient kernels directly translate to cost and energy savings. In this 2.5-hour in-person tutorial, we demonstrate how LLM-powered agents can generate and optimize GPU kernels for CUDA, HIP/ROCm, and Triton. We begin with a unified primer on GPU-programming fundamentals and common tooling (memory hierarchy, occupancy, profilers), then introduce an agentic loop: prompt engineering, compiler/profiler feedback as tools, iterative kernel refinement, correctness validation, and automated benchmarking. We will provide additional benchmarking examples on HIP and Triton, building on Stanford's KernelBench, which covers CUDA, and on KernelBot, a reliable source of human-curated data for heterogeneous GPU code, and show how to turn runtime and profiler metrics into reward signals that drive kernel optimizations. On top of this loop, we build an inference-scaling framework in which the LLM proposes candidate kernels, compiles them, measures latency/throughput/energy, and feeds those signals back as rewards. By combining test-time scaling techniques, the agent iteratively discovers increasingly accurate and efficient kernels. Attendees will compare generated code against expert kernels and inspect wins and losses. By the end, participants will walk away with a reproducible pipeline for LLM-driven GPU-kernel optimization.
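To preview the agentic loop, here is a compressed Python sketch of the propose-compile-repair cycle. Everything here is illustrative: `llm_propose` is a stub standing in for a real LLM client, the loop assumes `nvcc` is on the PATH, and profiling and reward shaping are elided:

```python
import subprocess
import tempfile
from pathlib import Path

SEED_KERNEL = r"""
extern "C" __global__ void add(const float* a, const float* b,
                               float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}
"""

def llm_propose(prompt: str) -> str:
    # Stub for a real LLM call; a production loop would send `prompt`
    # (task description plus compiler/profiler feedback) to a model API.
    return SEED_KERNEL

def compile_kernel(src: str) -> tuple[bool, str]:
    """Compile with nvcc; the stderr text becomes feedback for the agent."""
    path = Path(tempfile.mkdtemp()) / "kernel.cu"
    path.write_text(src)
    proc = subprocess.run(
        ["nvcc", "-cubin", "-arch=sm_80", str(path), "-o",
         str(path.with_suffix(".cubin"))],
        capture_output=True, text=True)
    return proc.returncode == 0, proc.stderr

def agent_loop(task: str, rounds: int = 4) -> str | None:
    prompt = task
    for _ in range(rounds):
        candidate = llm_propose(prompt)
        ok, feedback = compile_kernel(candidate)
        if ok:
            return candidate  # next steps: validate outputs, profile, and
                              # use latency/energy as the reward signal
        prompt = f"{task}\n\nCompiler errors:\n{feedback}\n\nFix the kernel."
    return None
```

The tutorial builds this skeleton out into the full framework: correctness checks against reference outputs, profiler metrics as rewards, and test-time scaling over many candidate kernels.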
Positional Encoding: Past, Present, and Future
Positional encoding is a foundational yet often opaque component of Transformer architectures, underpinning how self-attention mechanisms capture sequence order in language, vision, and multimodal models. Despite its centrality to the success of modern LLMs and other attention-reliant architectures, the mathematical intuition behind positional encoding remains challenging and inaccessible to many researchers and practitioners. This tutorial aims to demystify positional encoding by bridging formal theory with intuitive understanding and practical experimentation. Through a series of guided lectures, visual demonstrations, and hands-on coding sessions, participants will explore the operational principles behind effective positional representations, the evolution of key methods (from sinusoidal and learned embeddings to rotary and relative encodings), and open challenges that motivate current research directions. We will also provide open-source code implementations, mathematical visualizations, and collaborative ideation sessions for fostering new positional encoding concepts. By easing the barrier to entry for this mathematically intensive yet crucial topic, the tutorial seeks to foster deeper understanding, interdisciplinary exchange, and novel contributions to the future of positional encoding and Transformer design.
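As a small taste of the hands-on material, here is a NumPy sketch of the original sinusoidal scheme from "Attention Is All You Need", one of the methods the tutorial traces; the function name is ours:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Classic sinusoidal encoding:
        PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
        PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    Each dimension pair oscillates at a different wavelength, so every
    position gets a unique, smoothly varying fingerprint."""
    pos = np.arange(seq_len)[:, None]           # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]       # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=128, d_model=64)
print(pe.shape)  # (128, 64), added to token embeddings before attention
```

Rotary and relative encodings, covered later in the tutorial, move this positional information out of the embeddings and into the attention computation itself.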
Science of Trustworthy Generative Foundation Models
We are living through a moment that once belonged to science fiction: generative foundation models can write, reason, design, diagnose, and increasingly, decide. They are no longer just predicting the next word — they are shaping knowledge, influencing choices, and becoming collaborators in science, medicine, education, and daily life. But here’s the tension: as their capabilities accelerate, our ability to trust them has not kept pace. Trustworthiness can’t remain a “patch after the failure” or a moral hope layered on top of engineering. It must evolve into a science—a discipline as rigorous as the one that created these models in the first place. In this tutorial, we explore what that science looks like: how we understand model behaviors, measure and stress-test trust, and design systems that earn it. We’ll build the foundations together, then step into the frontier—where models begin to exhibit human-like cognitive behaviors that inspire wonder, but also demand responsibility and new forms of alignment. This session is an invitation: to move beyond building models that impress us, toward building models we can trust with what matters.