Smart Nation and Digital Economy

CLAWdia: Voice-Controlled Claw Machine

Robotic foundation models are an emerging trend in robotics research, making it possible to build intelligent, embodied systems with minimal specialist expertise. To showcase their potential, researchers from the A*STAR Institute of High Performance Computing (A*STAR IHPC) developed CLAWdia, a voice-controlled claw machine that demonstrates how specialised foundation models can be integrated into real-world robotic applications.

CLAWdia combines speech recognition, GPT-based instruction processing, object recognition, and low-level control to deliver an intuitive, hands-free gaming experience. Speech recognition first converts the player's verbal command into text, which GPT then breaks down into subtasks such as object detection and motion planning. This allows players to control the claw mechanism using natural language.
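The flow described above can be sketched in Python. This is a minimal illustration, not CLAWdia's implementation: the `Subtask` record, `run_voice_command` and the skill names `detect` and `move_to` are hypothetical, and the speech recogniser, GPT and robot skills are replaced by stub functions. For simplicity the sketch represents the decomposition as a list of named subtasks, whereas the article describes GPT generating Python code for each subtask.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Subtask:
    """One step of a decomposed instruction (hypothetical representation)."""
    name: str                                   # e.g. "detect" or "move_to"
    args: Dict[str, Any] = field(default_factory=dict)


def run_voice_command(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    decompose: Callable[[str], List[Subtask]],
    skills: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Turn a spoken command into executed subtasks.

    Every skill receives a shared context dict, so a later subtask
    (motion planning) can use the output of an earlier one (detection).
    """
    text = speech_to_text(audio)                # speech recognition stage
    context: Dict[str, Any] = {"instruction": text}
    for task in decompose(text):                # language-model stage
        if task.name not in skills:
            raise ValueError(f"unknown subtask: {task.name}")
        context[task.name] = skills[task.name](context, **task.args)
    return context


# Stubs standing in for a real speech recogniser, GPT and robot skills.
def fake_speech_to_text(audio: bytes) -> str:
    return "grab the purple ball"


def fake_decompose(text: str) -> List[Subtask]:
    return [Subtask("detect", {"query": "purple ball"}), Subtask("move_to")]


stub_skills = {
    "detect": lambda ctx, query: (0.10, -0.05, 0.30),   # pretend 3D target position
    "move_to": lambda ctx: f"moving to {ctx['detect']}",
}
result = run_voice_command(b"", fake_speech_to_text, fake_decompose, stub_skills)
print(result["move_to"])  # moving to (0.1, -0.05, 0.3)
```

Passing a shared context between skills is just one simple way to let motion planning consume the detection result; a real system would also need to handle failed detections and unreachable targets.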

Features
  • Voice-controlled, hands-free claw machine
  • GPT-based instruction decomposition
  • Open-vocabulary object detection
  • Multi-round dialogue for object confirmation

 



The Science Behind


CLAWdia (Fig 1) builds on several key technologies, chiefly GPT-based subtask decomposition and open-vocabulary object recognition. Using Pythonic APIs embedded in its prompts, GPT breaks down each instruction into distinct subtasks and generates executable Python code for each one.
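One way to embed Pythonic APIs in a prompt and run the code GPT returns is sketched below. It assumes hypothetical robot functions `detect`, `move_to` and `grasp` (the article does not name CLAWdia's actual APIs). The prompt lists each function's signature and docstring, and the generated code is executed with only those functions in scope. The model call itself is omitted; `generated` stands in for a model reply.

```python
import inspect
from typing import Callable, Dict, List


# Hypothetical robot APIs (stubs); not CLAWdia's published interface.
def detect(query: str) -> List[float]:
    """Return the 3D position [x, y, z] of the object described by `query`."""
    return [0.1, 0.0, 0.3]  # fixed stub value for illustration


def move_to(position: List[float]) -> None:
    """Move the claw above `position`."""
    print(f"move_to({position})")


def grasp() -> None:
    """Close the claw to grab the object."""
    print("grasp()")


APIS: Dict[str, Callable] = {"detect": detect, "move_to": move_to, "grasp": grasp}


def build_prompt(apis: Dict[str, Callable], instruction: str) -> str:
    """Embed each API's signature and docstring so the model knows how to call it."""
    lines = ["You control a claw machine with these Python functions:"]
    for name, fn in apis.items():
        lines.append(f"def {name}{inspect.signature(fn)}:")
        lines.append(f'    """{inspect.getdoc(fn)}"""')
    lines.append("Write Python code that uses only these functions to:")
    lines.append(instruction)
    return "\n".join(lines)


def run_generated_code(code: str, apis: Dict[str, Callable]) -> None:
    """Execute model-written code with only the robot APIs (and no builtins) in scope."""
    namespace = {"__builtins__": {}, **apis}
    exec(code, namespace)


prompt = build_prompt(APIS, "pick up the uppermost purple ball")
# Stand-in for the code a model might return for this prompt.
generated = "target = detect('uppermost purple ball')\nmove_to(target)\ngrasp()"
run_generated_code(generated, APIS)
```

Exposing only the listed APIs keeps generated code within the robot's skill set, but an empty `__builtins__` is not a security sandbox, so model output should still be treated as untrusted.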

Fig 1. CLAWdia demo

A key subtask is object detection, which uses state-of-the-art open-vocabulary models such as GroundingDINO and OWL-ViT. These models are combined with the Segment Anything Model (SAM) to determine the pixel location of the target object (Fig 2). The robotic arm's RGB-D wrist camera then supplies the depth needed to map this pixel location to a 3D position, enabling precise object manipulation.
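The pixel-to-3D step can be illustrated with the standard pinhole camera model. In this sketch the target pixel is taken as the centroid of the SAM mask and the depth as the median depth over the mask. Both choices, the intrinsics (`fx`, `fy`, `cx`, `cy`) and the camera-to-base transform `T_base_cam` are illustrative assumptions rather than CLAWdia's published parameters. For a wrist-mounted camera, `T_base_cam` would typically combine the arm's forward kinematics with a hand-eye calibration.

```python
import numpy as np


def mask_centroid(mask: np.ndarray) -> tuple:
    """Return the (u, v) pixel centroid of a boolean segmentation mask."""
    vs, us = np.nonzero(mask)                 # row indices are v, column indices are u
    if us.size == 0:
        raise ValueError("empty mask: nothing was segmented")
    return float(us.mean()), float(vs.mean())


def deproject(u: float, v: float, depth_m: float,
              fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project pixel (u, v) at depth `depth_m` into the camera frame (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


def camera_to_base(p_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Map a camera-frame point into the robot base frame with a 4x4 homogeneous transform."""
    return (T_base_cam @ np.append(p_cam, 1.0))[:3]


# Illustrative inputs: a fake 21x21-pixel mask and a flat 0.5 m depth image.
mask = np.zeros((480, 640), dtype=bool)
mask[230:251, 330:351] = True
depth = np.full((480, 640), 0.5)

u, v = mask_centroid(mask)                    # (340.0, 240.0)
z = float(np.median(depth[mask]))             # median tolerates a few outlying depth readings
p_cam = deproject(u, v, z, fx=600.0, fy=600.0, cx=320.0, cy=240.0)

T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.2, 0.0, 0.4]           # assumed pure translation, for illustration only
p_base = camera_to_base(p_cam, T_base_cam)
print(p_cam, p_base)
```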

Fig 2. The detection results for the instruction: ‘uppermost purple ball’
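A spatial qualifier such as ‘uppermost’ could be resolved after detection and 3D mapping. The sketch below assumes each detection already carries a confidence score and a 3D position in a robot base frame with z pointing up, and interprets ‘uppermost’ as the largest z. The article does not say whether CLAWdia resolves this qualifier in image space or in 3D, and `Detection`, `select_uppermost` and the 0.3 score threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    label: str                                  # open-vocabulary phrase that was matched
    score: float                                # detector confidence in [0, 1]
    position: Tuple[float, float, float]        # (x, y, z) in metres, base frame, z up


def select_uppermost(detections: List[Detection], label: str,
                     min_score: float = 0.3) -> Detection:
    """Return the confident `label` detection with the largest height z."""
    candidates = [d for d in detections
                  if d.label == label and d.score >= min_score]
    if not candidates:
        raise LookupError(f"no confident detection for {label!r}")
    return max(candidates, key=lambda d: d.position[2])


detections = [
    Detection("purple ball", 0.82, (0.10, 0.05, 0.04)),
    Detection("purple ball", 0.76, (0.12, -0.02, 0.09)),   # resting on top of the pile
    Detection("purple ball", 0.20, (0.00, 0.00, 0.15)),    # below the score threshold
    Detection("green ball", 0.91, (0.05, 0.00, 0.12)),     # wrong label
]
target = select_uppermost(detections, "purple ball")
print(target.position)  # (0.12, -0.02, 0.09)
```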

Fig 3. Overall diagram of CLAWdia, with blue lines indicating the different pipelines followed depending on user feedback.
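The feedback-driven pipelines in Fig 3 can be approximated by a confirmation loop such as the one below. It is a hypothetical sketch of multi-round dialogue: the reply vocabulary ("yes", "no", or a refined description), `confirm_target`, and the `ask_user` and `redetect` callbacks are assumptions, with `ask_user` standing in for speech output plus speech recognition and `redetect` for the open-vocabulary detector.

```python
from typing import Callable, List, Optional


def confirm_target(
    candidates: List[str],
    ask_user: Callable[[str], str],
    redetect: Callable[[str], List[str]],
    max_rounds: int = 5,
) -> Optional[str]:
    """Propose candidates one at a time until the user accepts one.

    "yes" accepts the current candidate, "no" moves on to the next one,
    and any other reply is treated as a refined description that
    triggers a fresh detection.
    """
    queue = list(candidates)
    for _ in range(max_rounds):
        if not queue:
            return None                         # nothing left to propose
        candidate = queue[0]
        reply = ask_user(f"Do you mean the {candidate}?").strip().lower()
        if reply == "yes":
            return candidate                    # proceed to grasping
        if reply == "no":
            queue.pop(0)                        # try the next candidate
        else:
            queue = redetect(reply)             # user refined the request
    return None                                 # give up after max_rounds


# Scripted replies stand in for a real spoken dialogue.
replies = iter(["no", "the purple one on top", "yes"])
chosen = confirm_target(
    ["red ball", "blue cube"],
    ask_user=lambda question: next(replies),
    redetect=lambda text: ["purple ball (top)"],
)
print(chosen)  # purple ball (top)
```

Capping the number of rounds keeps a confused exchange from looping forever; returning `None` lets the caller fall back to asking the player to restate the command.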


Industry Applications

CLAWdia’s voice-controlled interface demonstrates potential for robotic solutions that require flexibility, collaboration, and adaptability, including: 
  • Voice-controlled home assistants and service robots
  • Voice-controlled, safety-aware robotic assistants on assembly lines
  • Visual sorting systems
  • Visual fault detection systems