Stanford University researchers have improved on their static ALOHA robot by building a fully mobile version that can be trained to perform household tasks.
For a robot to be useful across a range of general tasks, it needs to be able to move around and have a full range of precise motion in its arms. We’ve seen some impressive demonstrations of this from robots like Tesla’s Optimus, but those machines are often expensive or simply unavailable.
Last year, Tony Zhao led a team that developed ALOHA, A Low-cost Open-source HArdware system for controlling a bimanual, or two-armed, robot. The first demos of ALOHA’s capabilities were impressive, but the robot was static, only able to manipulate items placed in front of it on a desktop.
With Mobile ALOHA, the team led by Zhao and Zipeng Fu created a robot that can navigate a complex environment like a home, opening up a range of new applications.
The robot was able to cook food, wipe a wine spill off a counter, neatly arrange chairs, and call an elevator.
Some of these may seem trivial, but getting a robot to do something like call an elevator isn’t easy. It needs to navigate to the elevator from potentially different starting points, accurately locate a 2 cm × 2 cm button, press it with just the right amount of force, and then enter the elevator.
Imitation learning
The key to the robot learning new skills is imitation learning from human demonstrations. Often this is done using videos or datasets like Google’s RT-X. With Mobile ALOHA, the researchers used those datasets but also added a more hands-on approach: the robot is fitted with an interface that tethers an operator to it, so the operator can directly control the robot while completing a task.
After a task has been demonstrated 50 times, the tether interface can be removed and Mobile ALOHA will successfully complete the task up to 90% of the time.
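At its core, this kind of imitation learning is supervised: the policy is trained to reproduce the operator’s actions given what the robot observed. The sketch below is a minimal, illustrative version of that idea, not the Mobile ALOHA training code; the network shape, the dimensions, and the `demos` format are all assumptions.

```python
# Minimal behavioral-cloning sketch (illustrative; not the Mobile ALOHA code).
# `demos` is assumed to hold teleoperated steps as (observation, action) tensors.
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Maps a flattened observation (cameras + joint states) to robot actions."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def behavior_clone(policy: Policy, demos, epochs: int = 100, lr: float = 1e-4):
    """Train the policy to imitate the human operator's recorded actions."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in demos:
            opt.zero_grad()
            loss = loss_fn(policy(obs), act)  # penalize deviation from the demo
            loss.backward()
            opt.step()
```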
Imitation learning is very helpful in teaching robots new skills, but it has its own set of challenges, especially in domains requiring high precision. Mobile ALOHA uses an algorithm named Action Chunking with Transformers (ACT), which Zhao’s team developed last year.
The ACT algorithm improves efficiency by predicting actions in chunks rather than one step at a time, which shortens a task’s effective horizon and limits how quickly small errors can compound.
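In simplified terms, a chunking policy outputs a whole block of future actions from a single observation. The toy sketch below shows only that idea; the real ACT model is a transformer-based architecture trained on camera images, and the dimensions, chunk length, and `env` interface here are assumptions.

```python
# Simplified illustration of action chunking (the real ACT model is a
# transformer trained on camera images; this toy version just shows the idea).
import torch
import torch.nn as nn

CHUNK_SIZE = 100  # illustrative; the ACT paper uses chunks of around 100 steps

class ChunkPolicy(nn.Module):
    """Predicts a chunk of k future actions from a single observation."""
    def __init__(self, obs_dim: int, act_dim: int, k: int = CHUNK_SIZE):
        super().__init__()
        self.k, self.act_dim = k, act_dim
        self.backbone = nn.Sequential(nn.Linear(obs_dim, 512), nn.ReLU())
        self.head = nn.Linear(512, k * act_dim)  # k actions per forward pass

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        out = self.head(self.backbone(obs))
        return out.view(-1, self.k, self.act_dim)  # (batch, k, act_dim)

def rollout(policy: ChunkPolicy, env, horizon: int = 1000):
    """Query the policy once per chunk instead of once per control step."""
    obs = env.reset()  # `env` is a hypothetical robot/simulator interface
    for _ in range(0, horizon, CHUNK_SIZE):
        chunk = policy(obs.unsqueeze(0))[0]  # (k, act_dim)
        for action in chunk:
            obs = env.step(action)  # execute the whole predicted chunk
```

Because the policy is queried once per chunk rather than once per control step, there are far fewer opportunities for the robot to drift off the demonstrated trajectory. (The full ACT system also smooths execution by averaging overlapping chunk predictions, a refinement omitted here.)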
With Mobile ALOHA, the researchers said they were “the first to find that co-training with static manipulation datasets improves the performance and data efficiency of mobile manipulation policies.”
This means the abundance of existing datasets collected with static robots could be very useful for training mobile ones, too.
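In practice, co-training can be as simple as drawing each training batch from both data sources. The sketch below is a hypothetical illustration of that mixing step; the mixing ratio, batch size, and data format are assumptions, not values from the paper.

```python
# Hypothetical sketch of co-training: each batch mixes demonstrations from the
# new mobile task with a large pre-existing static-manipulation dataset.
import random

def cotrain_batches(mobile_demos, static_demos, batch_size=64,
                    mobile_fraction=0.5, n_batches=1000):
    """Yield mixed batches of (observation, action) pairs.

    `mobile_fraction` is an illustrative knob, not a value from the paper:
    it sets how much of each batch comes from mobile-task demonstrations
    versus static-robot data.
    """
    n_mobile = int(batch_size * mobile_fraction)
    for _ in range(n_batches):
        batch = (random.sample(mobile_demos, n_mobile) +
                 random.sample(static_demos, batch_size - n_mobile))
        random.shuffle(batch)
        yield batch
```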
“What did I tell you a few days ago? 2024 is the year of robotics. Mobile-ALOHA is an open-source robot hardware that can do dexterous, bimanual tasks like cooking a meal (with human teleoperation). Very soon, hardware will no longer bottleneck us on the quest for human-level,…”
— Jim Fan (@DrJimFan) January 4, 2024
Accessible and affordable
As impressive as the demos are, the off-the-shelf hardware and low cost of the solution make Mobile ALOHA especially interesting.
The robot is controlled by a consumer laptop with an Nvidia RTX 3070 Ti GPU (8 GB VRAM) and an Intel i7-12800H processor. The laptop receives video streams from three Logitech C922x RGB webcams, each operating at 480×640 resolution.
The robot is powered by a 1.26 kWh battery that also serves as a 14 kg counterweight to keep the robot from tipping over.
The total bill for Mobile ALOHA came to $32,000, which isn’t bad for a prototype. If it went into production, Mobile ALOHA could become considerably cheaper, and because the design is open source, multiple hardware developments for the platform could soon drive costs down further.
Elon Musk has predicted that Tesla’s Optimus robot will eventually retail at around $20,000. There’s still no “add to cart” button on Tesla’s website, though, regardless of how much you’d be willing to pay for one.
With Mobile ALOHA, we now have a strong software and hardware solution that hints we could have robot housekeepers a lot sooner than we thought.