The Robot Learning Group (ROLE) at the Institute for Intelligent Process Automation and Robotics of the Karlsruhe Institute of Technology (KIT) focuses on various aspects of machine learning. They are investigating how robots can learn to solve tasks such as grasping objects in a typical bin picking scenario by trying them out independently. An Ensenso N10 3D camera from IDS, mounted directly on the “head” of the robot, provides the required image data.

Gripping randomly lying objects is a key requirement for bin picking, but current solutions are often inflexible and specific to the workpiece to be gripped. ROLE are investigating robots that learn independently to pick up previously unknown objects from a container, beginning with random gripping attempts, much as a human would. A neural network then links the 3D images with the outcome of each gripping attempt (successful or not), as determined by a force sensor in the gripper. The artificial intelligence (AI) uses the stored data to identify meaningful gripping points for the objects and thus “trains” itself. Large amounts of data and many gripping attempts are essential for this. However, the researchers at KIT were able to significantly reduce the number of attempts, and with it the learning time.
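
In outline, such a self-supervised data collection loop could look like the following sketch. It is purely illustrative: the helper functions `capture_depth_image`, `sample_random_grasp`, `execute_grasp`, and `grasp_succeeded` are hypothetical stand-ins for the camera, robot, and force-sensor interfaces, which the article does not detail.

```python
def collect_grasp_dataset(num_attempts, capture_depth_image,
                          sample_random_grasp, execute_grasp,
                          grasp_succeeded):
    """Self-supervised data collection: try grasps, label each one with
    the force-sensor reading, and store (image, grasp, label) triples.
    All callables are hypothetical robot/camera interfaces."""
    dataset = []
    for _ in range(num_attempts):
        depth = capture_depth_image()          # 3D camera view of the bin
        grasp = sample_random_grasp(depth)     # e.g. (x, y, rotation) in the image
        execute_grasp(grasp)
        label = 1 if grasp_succeeded() else 0  # force sensor decides success
        dataset.append((depth, grasp, label))  # one training example
    return dataset
```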

The right grip reduces training time

Unlike analytical (or model-based) gripping methods, the ROLE robot does not need the recognition features of an object to be described in advance. The choice of grips that the robot tries out is critical for faster learning. With the help of a neural network, gripping outcomes can be predicted from existing knowledge. A well-functioning system requires about 20,000 gripping attempts, corresponding to roughly 80 hours of training time on the robot. The amount of data available is the limiting factor for the system’s capabilities.
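
One common way to formalize this choice of grips, shown here only as an illustration, is an exploration/exploitation rule: score candidate grasps with the current network and usually take the most promising one, but occasionally try a random grasp to gather new information. The `predict_success` function is a hypothetical stand-in for the trained network; the article does not specify the selection strategy actually used at KIT.

```python
import random

def select_grasp(candidates, predict_success, epsilon=0.1):
    """Epsilon-greedy grasp selection: with probability `epsilon`,
    explore a random candidate grasp; otherwise exploit the grasp the
    network currently rates as most likely to succeed."""
    if random.random() < epsilon:
        return random.choice(candidates)         # explore: gather new data
    return max(candidates, key=predict_success)  # exploit: best predicted grasp

# Tiny usage demo with dummy grasps (x, y, rotation) and a dummy scorer.
grasps = [(0.1, 0.2, 0.0), (0.4, 0.3, 1.57), (0.7, 0.6, 0.78)]
print(select_grasp(grasps, predict_success=lambda g: g[0] + g[1]))
```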

Learning without a given model

“Transfer learning” is being used at KIT to reduce the number of necessary gripping attempts: by identifying which grips should be tried in order to gain as much information as possible, the training time can be shortened. The knowledge from an already trained neural network is reused for the recognition of previously unknown objects. The ROLE group uses the Ensenso SDK to capture depth images and processes them with the OpenCV vision library and the TensorFlow machine learning platform. The larger the number and variety of training objects, the better the system can generalize to unknown objects. This approach could permanently eliminate the need for object-specific training in applications. In principle, there are no restrictions on the form and nature of the objects: neither the 3D shape of an object nor a mathematical model of the gripping process is needed, and knowledge of material and surface properties is learned implicitly. The approach could also be extended to other automation applications, from intralogistics to service robotics. In addition, the robot could learn to move objects independently so that they can be grasped more easily in the next step.
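
As a rough illustration of the transfer-learning step in TensorFlow, the sketch below reuses the convolutional layers of a network trained on earlier grasp attempts and fine-tunes only the decision layers on data from new objects. The layer sizes, the 110 × 110 input (matching the roughly 12,000-pixel depth image mentioned below), and the dummy training data are assumptions for the example, not the ROLE group’s actual architecture.

```python
import numpy as np
import tensorflow as tf

# Small CNN over the rescaled depth image; the shape is illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 5, activation="relu",
                           input_shape=(110, 110, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(grasp succeeds)
])

# Transfer learning: assume the convolutional layers were already trained
# on earlier objects (e.g. loaded from disk) and freeze them, so only the
# dense decision layers are fine-tuned on the new objects.
for layer in model.layers[:-2]:
    layer.trainable = False

model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Fine-tune on a (much smaller) set of grasps on previously unseen
# objects: depth images plus force-sensor labels (dummy data here).
x_new = np.random.rand(64, 110, 110, 1).astype("float32")
y_new = np.random.randint(0, 2, size=(64, 1))
model.fit(x_new, y_new, epochs=2, batch_size=16)
```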

3D image data

The visual data for the robot is provided by an Ensenso 3D stereo vision camera. It is equipped with two monochrome CMOS sensors (global shutter, 752 × 480 pixels) and a pattern projector operating at 850 nm, and views from above a container filled randomly with objects of one or more types. The system projects a high-contrast texture onto the contents of the box and generates a 3D point cloud of the visible surfaces, from which a greyscale depth image is calculated using the NxLib of the Ensenso SDK. The depth image is then scaled down to a resolution of only 12,000 pixels and used as input for the AI algorithms. The neural network takes care of the image analysis and the logical steps for the next grip into the box. The camera is mounted directly on the “head” of the robot so that different experiments can be realized flexibly. Pre-calibrated and supplied with an MVTec HALCON interface and an object-oriented API (C++, C#/.NET), the 3D camera offers focal lengths from 3.6 to 16 mm, is suitable for working distances from 300 mm up to 2,000 mm, and can even be used for 3D detection of moving objects.
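
The preprocessing described here (point cloud, greyscale depth image, drastically downscaled network input) can be reproduced in outline with OpenCV and NumPy, as in the sketch below. The 110 × 110 target size is an assumption chosen to approximate the stated 12,000-pixel resolution, and the handling of missing depth values is simplified for illustration.

```python
import cv2
import numpy as np

def preprocess_depth(depth_image, target_size=(110, 110)):
    """Turn a raw greyscale depth image (e.g. computed from the Ensenso
    point cloud) into a small, normalized network input. The 110 x 110
    target (~12,000 pixels) is an assumption based on the article."""
    depth = depth_image.astype(np.float32)

    # Stereo matching leaves holes (NaNs) where no depth was found;
    # fill them with the maximum depth so they read as "background".
    depth = np.nan_to_num(depth, nan=float(np.nanmax(depth)))

    # Downscale drastically: coarse geometry suffices to rate grasps.
    depth = cv2.resize(depth, target_size, interpolation=cv2.INTER_AREA)

    # Normalize to [0, 1] so the network input range stays stable.
    dmin, dmax = depth.min(), depth.max()
    return (depth - dmin) / (dmax - dmin + 1e-6)
```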

Outlook

The methods developed at KIT work reliably with simple objects such as screws, but some research is still required to reach product maturity, especially for gripping more complex, unknown objects. Further research will focus on how the basic learning methods can be improved and accelerated, and on which new applications would benefit from learning robot systems. Examples include the handling of textiles (gripping and folding towels and clothing), the dismantling of industrial parts such as electric motors for recycling, the painting of unknown objects based on camera data, and the handling of liquids or granular media.