Software provider NVIDIA recently revealed new AI and simulation tools and workflows for robot developers, including humanoid developers, at the Conference for Robot Learning (CoRL) in Munich, Germany.
The lineup includes the general availability of the NVIDIA Isaac Lab robot learning framework; six new humanoid robot learning workflows for Project Generalist Robot 00 Technology (GR00T); and new world-model development tools for video data curation and processing, including the NVIDIA Cosmos tokenizer and NVIDIA NeMo Curator for video processing.
Open-source Isaac Lab can scale robot training
NVIDIA Isaac Lab is an open-source robot learning framework built on NVIDIA Omniverse, a platform for developing OpenUSD applications for industrial digitalization and physical AI simulation.
Developers can use Isaac Lab to train robot policies at scale. This open-source, unified robot learning framework can apply to any embodiment, including humanoids, quadrupeds, and cobots, to handle increasingly complex movements and interactions.
Commercial robot makers, robotics application developers, and robotics research entities around the world are adopting Isaac Lab, including 1X, Agility Robotics, The AI Institute, Berkeley Humanoid, Boston Dynamics, Field AI, Fourier, Galbot, Mentee Robotics, Skild AI, Swiss-Mile, Unitree Robotics, and XPENG Robotics.
NVIDIA Isaac Lab 1.2 is available now and is open source on GitHub. For researchers and developers learning to use Isaac Lab, new getting started developer guides and tutorials are now available, including an Isaac Gym to Isaac Lab migration guide.
Project GR00T can help humanoid developers address challenges
Building advanced humanoids is extremely difficult, demanding multilayered, interdisciplinary approaches so that robots can perceive, move, and learn skills effectively for robot-environment and human-robot interactions.
Project GR00T is an initiative to develop accelerated libraries, foundation models, and data pipelines to support the global humanoid robot developer ecosystem.
Six new Project GR00T workflows now provide humanoid developers with blueprints to help realize the most challenging humanoid robot capabilities. They include:
- GR00T-Gen for building generative AI-powered, OpenUSD-based 3D environments
- GR00T-Mimic for robot motion and trajectory generation
- GR00T-Dexterity for robot dexterous manipulation
- GR00T-Control for whole-body control
- GR00T-Mobility for robot locomotion and navigation
- GR00T-Perception for multimodal sensing
“Humanoid robots are the next wave of embodied AI,” said Jim Fan, NVIDIA senior research manager of embodied AI. “NVIDIA research and engineering teams are collaborating across the company and our developer ecosystem to build Project GR00T to help advance the progress and development of global humanoid robot developers.”
The new NVIDIA Project GR00T workflows are coming soon to help robot companies build humanoid robot capabilities with greater ease. Developers can apply to join the NVIDIA Humanoid Robot Developer Program.
Cosmos tokenizers simplify encoding for world model building
Today, robot developers are building world models: AI representations of the world that can help predict how objects and environments respond to a robot’s actions. Building these world models is incredibly compute- and data-intensive, with models requiring thousands of hours of curated, real-world image or video data.
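The core idea of a world model, predicting the next state of the environment from the current state and a robot action, can be sketched in a few lines. The linear dynamics and all names below are illustrative assumptions for this toy example, not NVIDIA's implementation.

```python
# Toy world model: learn to predict next_state from (state, action).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth dynamics the environment obeys:
# next_state = A @ state + B @ action
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

# Collect (state, action, next_state) transitions, as a real pipeline
# would from large volumes of curated video and telemetry data.
states = rng.normal(size=(1000, 2))
actions = rng.normal(size=(1000, 1))
next_states = states @ A.T + actions @ B.T

# Fit a linear world model by least squares on [state, action] features.
X = np.hstack([states, actions])
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

# The learned model can now predict how the environment would respond
# to a candidate action before the robot executes it.
s, a = np.array([0.5, -0.2]), np.array([0.3])
pred = np.concatenate([s, a]) @ W
print(np.allclose(pred, A @ s + B @ a, atol=1e-6))  # → True
```

Real world models replace the linear map with large video-generation networks, which is why tokenized, compressed representations of the input data matter so much for training cost.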
NVIDIA Cosmos tokenizers can provide efficient, high-quality encoding and decoding to simplify the development of these world models. The company said Cosmos offers minimal distortion and can address temporal instability, enabling high-quality video and image reconstructions.
The open-source Cosmos tokenizer provides robotics developers with visual tokenization by breaking down images and videos into high-quality tokens with high compression rates. NVIDIA said Cosmos runs its visual reconstructions up to 12 times faster than current tokenizers.
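A minimal vector-quantization sketch illustrates what a visual tokenizer does: split an image into patches and map each patch to the index of its nearest codebook vector. This is a generic illustration of the idea, not the actual Cosmos tokenizer architecture or API, and the codebook here is random rather than learned.

```python
# Toy visual tokenizer: patchify an image, quantize patches to tokens.
import numpy as np

rng = np.random.default_rng(1)
patch = 4                                  # 4x4 pixel patches
image = rng.random((32, 32)).astype(np.float32)

# A learned codebook in practice; random here for illustration.
codebook = rng.random((256, patch * patch)).astype(np.float32)

# Encode: flatten each patch, pick the nearest codebook entry.
patches = (image.reshape(32 // patch, patch, 32 // patch, patch)
                .transpose(0, 2, 1, 3)
                .reshape(-1, patch * patch))
dists = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = dists.argmin(axis=1)              # one integer token per patch

# Decode: look up each token's codebook vector and reassemble the image.
recon = (codebook[tokens]
         .reshape(32 // patch, 32 // patch, patch, patch)
         .transpose(0, 2, 1, 3)
         .reshape(32, 32))

# 1,024 float pixels become 64 small integer tokens.
print(tokens.shape, image.size / tokens.size)  # → (64,) 16.0
```

Production tokenizers achieve far better fidelity by learning the codebook and encoder jointly, and extend the same scheme across time for video, which is where the temporal-stability claims above come in.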
1X, a humanoid robot company, has updated its 1X World Model Challenge dataset to use the Cosmos tokenizer.
“NVIDIA Cosmos tokenizer achieves really high temporal and spatial compression of our data while still retaining visual fidelity,” said Eric Jang, 1X Technologies VP of AI. “This allows us to train world models with long horizon video generation in an even more compute-efficient manner.”
Other humanoid and general-purpose robot developers, including XPENG Robotics and Hillbot, are developing with the NVIDIA Cosmos tokenizer to manage high-resolution images and videos.
NVIDIA Cosmos tokenizer is available now on GitHub and Hugging Face.
NeMo Curator features automatic pipeline orchestration for video
NeMo Curator now includes a video processing pipeline, enabling robot developers to improve their world-model accuracy by processing large-scale text, image, and video data. NVIDIA said NeMo Curator can process video data up to seven times faster than unoptimized pipelines.
Curating video data poses challenges due to the massive size of the data, requiring scalable pipelines and efficient orchestration for load balancing across GPUs. Additionally, models for filtering, captioning, and embedding need optimization to maximize throughput.
NeMo Curator can overcome these challenges by streamlining data curation with automatic pipeline orchestration to reduce processing time. It supports linear scaling across multi-node multi-GPU systems, with the capability to handle over 100 petabytes of data. NVIDIA said this feature can help simplify AI development, reducing costs and accelerating time to market.
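The staged curation described above, filtering clips, captioning the keepers, embedding them, with work balanced across accelerators, can be sketched as a simple pipeline. The stage logic, class, and function names below are illustrative assumptions, not NeMo Curator's API.

```python
# Hedged sketch of a staged video-curation pipeline with round-robin
# load balancing across workers (GPUs, in practice).
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: int
    seconds: float
    caption: str = ""
    embedding: tuple = ()

def filter_stage(clips, min_seconds=2.0):
    # Drop clips too short to be useful training data.
    return [c for c in clips if c.seconds >= min_seconds]

def caption_stage(clips):
    # A captioning model would run here; stubbed out for illustration.
    for c in clips:
        c.caption = f"clip-{c.clip_id}"
    return clips

def embed_stage(clips):
    # An embedding model would run here; trivial placeholder values.
    for c in clips:
        c.embedding = (c.clip_id, c.seconds)
    return clips

def shard(clips, n_workers):
    # Round-robin assignment keeps per-worker load roughly balanced.
    shards = [[] for _ in range(n_workers)]
    for i, c in enumerate(clips):
        shards[i % n_workers].append(c)
    return shards

def run_pipeline(clips, n_workers=4):
    kept = filter_stage(clips)
    # Each shard could be processed on its own GPU in parallel.
    shards = shard(kept, n_workers)
    done = [embed_stage(caption_stage(s)) for s in shards]
    return [c for s in done for c in s]

clips = [Clip(i, seconds=i * 0.5) for i in range(10)]
curated = run_pipeline(clips)
print(len(curated))  # → 6; clips shorter than 2 s were filtered out
```

Filtering first is the key ordering decision: discarding unusable clips before the expensive captioning and embedding stages is what keeps throughput high at petabyte scale.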
NeMo Curator for video processing will be available at the end of November 2024.
Robot learning community involvement at CoRL
Hugging Face and NVIDIA announced at CoRL a collaboration to accelerate open-source robotics research with LeRobot, NVIDIA Isaac Lab, and NVIDIA Jetson for the developer community.
NVIDIA robotics research released over 23 papers timed to coincide with CoRL, covering new insights into integrating vision language models (VLMs) for improved environmental understanding and task execution, temporal robot navigation, long-horizon planning strategies for complex multi-step tasks, and using human demonstrations for skill acquisition.
Papers on humanoid robot control and synthetic data generation include SkillGen, a system based on synthetic data generation for training AI for robots with minimal human demonstrations, and HOVER, a robot foundation model for controlling humanoid robot locomotion and manipulation.
NVIDIA researchers will also be participating in nine workshops at the conference.