It is convenient for users to teach novel objects to a domestic service robot with a simple procedure. In this paper, we propose a method for learning the images and names of these objects shown by the users. The object images are segmented out from cluttered scenes by using motion attention. Phoneme recognition and voice conversion are used for the speech recognition and synthesis of the object names that are out of vocabulary. In the experiments conducted with 120 everyday objects, we have obtained an accuracy of 91% for object recognition and an accuracy of 82% for word recognition. Furthermore, we have implemented the proposed method on a physical robot, DiGORO, and evaluated its performance by using RoboCup@Home's "Supermarket" task. The results have shown that DiGORO has outperformed the highest score obtained in the RoboCup@Home 2009 competition.
We propose a method for learning novel objects from audio-visual input. The proposed method is based on two techniques: out-of-vocabulary (OOV) word segmentation and foreground object detection in complex environments. A voice conversion technique is also included so that the robot can pronounce the acquired OOV words intelligibly. Using the proposed method, we also implemented a robotic system that carries out interactive mobile manipulation tasks, which we call "extended mobile manipulation". To evaluate the robot as a whole, we conducted the "Supermarket" task, adopted from the RoboCup@Home league, as a standard task for real-world applications. The results reveal that our integrated system works well in real-world applications.
In a human-robot spoken dialogue, a robot may misunderstand an ambiguous command from a user, such as "Place the cup down (on the table)," potentially resulting in an accident. Although asking a confirmation question before every motion execution decreases the risk of such failures, users find it more convenient if confirmation questions are not asked in trivial situations. This paper proposes a method for estimating ambiguity in commands by introducing an active learning scheme with Bayesian logistic regression to human-robot spoken dialogue. We conducted physical experiments in which a user and a manipulator-based robot communicated using spoken language to manipulate objects.
This paper addresses a user model for user simulation in spoken decision-making dialogue systems. When selecting from a set of alternatives, users apply various criteria to make their decision. Users often do not have a definite goal or criteria for selection, and thus during the dialogue they may discover not only what kind of information the system can provide but also their own preferences or the factors that they should emphasize. In this paper, we present a user model and dialogue state expression that consider the user's knowledge and preferences in spoken decision-making dialogue. To estimate the parameters of the user model, we implemented a trial sightseeing guidance system and collected dialogue data. We then model the dialogue as a partially observable Markov decision process (POMDP) and optimize its dialogue strategy so that users can make a better choice.
This paper presents a novel method for learning object manipulation such as rotating an object or placing one object on another. In this method, motions are learned using reference-point-dependent probabilistic models, which can be used for the generation and recognition of motions. The method estimates (1) the reference point, (2) the intrinsic coordinate system type, which is the type of coordinate system intrinsic to a motion, and (3) the probabilistic model parameters of the motion that is considered in the intrinsic coordinate system. Motion trajectories are modeled by a hidden Markov model (HMM), and an HMM-based method using static and dynamic features is used for trajectory generation. The method was evaluated in physical experiments in terms of motion generation and recognition. In the experiments, users demonstrated the manipulation of puppets and toys so that the motions could be learned. A recognition accuracy of 90% was obtained for a test set of motions performed by three subjects. Furthermore, the results showed that appropriate motions were generated even if the object placement was changed.
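The idea of a reference-point-dependent description can be illustrated with a small sketch. It assumes 2D trajectories, a purely translational coordinate system centred on the reference point, and a squared mismatch as a crude stand-in for a model likelihood; the coordinates, object names, and helper functions are invented for illustration and are not taken from the paper.

```python
def to_reference_frame(trajectory, reference_point):
    """Express a 2D trajectory relative to a reference point (translation only)."""
    rx, ry = reference_point
    return [(x - rx, y - ry) for (x, y) in trajectory]


def mismatch(traj_a, traj_b):
    """Sum of squared point-wise distances between two equal-length trajectories."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(traj_a, traj_b))


def best_reference(demos, candidates_per_demo):
    """Pick the candidate object whose frame makes two demonstrations agree best."""
    (traj_a, traj_b) = demos
    (cands_a, cands_b) = candidates_per_demo
    scores = {}
    for name in cands_a:
        rel_a = to_reference_frame(traj_a, cands_a[name])
        rel_b = to_reference_frame(traj_b, cands_b[name])
        scores[name] = mismatch(rel_a, rel_b)
    return min(scores, key=scores.get)


# Two invented demonstrations of the same "place on the box" motion.
# The objects sit at different world positions in each demonstration.
demo_a = [(0, 0), (1, 2), (2, 3)]
demo_b = [(5, 1), (6, 3), (7, 4)]
objects_a = {"box": (2, 2), "cup": (0, 5)}
objects_b = {"box": (7, 3), "cup": (9, 9)}

reference = best_reference((demo_a, demo_b), (objects_a, objects_b))
```

In the box-centred frame the two demonstrations coincide, whereas in the cup-centred frame they do not, which is why the box is selected as the reference point in this toy case.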
In this paper, we propose a novel method for a robot to detect robot-directed speech: to distinguish speech that users speak to a robot from speech that users speak to other people or to themselves. The originality of this work is the introduction of a multimodal semantic confidence (MSC) measure, which is used for domain classification of input speech based on the decision on whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, object, and motion confidence with weightings that are optimized by logistic regression. Then we integrate this measure with gaze tracking and conduct experiments under conditions of natural human-robot interactions. Experimental results show that the proposed method achieves a high performance of 94% and 96% in average recall and precision rates, respectively, for robot-directed speech detection.
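In this paper, we propose a method for generating motions and utterances in an object manipulation dialogue task. The user's utterances are understood using a confidence function that integrates speech, images, motions, and other information within a statistical learning framework. When the user makes an utterance with little ambiguity, the method uses a hidden Markov model to generate the motion trajectory most appropriate to the situation. For highly ambiguous utterances, it generates a natural confirmation utterance and asks the user for confirmation, which makes it possible to cancel inappropriate motions before they are executed.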
This paper proposes a method for automatic design of the sensory morphology of a mobile robot. The proposed method employs two types of adaptations, ontogenetic and phylogenetic, to optimize the sensory morphology of the robot. In ontogenetic adaptation, reinforcement learning searches for the optimal policy, which is highly dependent on the sensory morphology. In phylogenetic adaptation, a genetic algorithm is used to select morphologies with which the robot can learn tasks faster. Our proposed method was applied to the design of the sensory morphology of a line-following robot. We performed simulation experiments to compare the design solution with a hand-coded robot. The results of the experiments revealed that our robot outperformed the hand-coded robot in terms of line-following accuracy and learning speed, although our robot had fewer sensors than the hand-coded one. We also built a physical robot using the design solution. The experimental results revealed that this physical robot used its morphology effectively and outperformed the hand-coded robot.
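The two nested adaptations can be sketched as an outer genetic algorithm whose fitness is the outcome of an inner learning run. The sketch below is only a toy under stated assumptions: a morphology is a tuple of on/off sensor slots, and `learning_score` is a hypothetical stand-in for the reinforcement-learning run (it simply rewards useful sensors and penalizes sensor count). None of the names or parameter values come from the paper.

```python
import random

# Hypothetical: which sensor slots actually help the task (invented).
USEFUL = (1, 0, 1, 0, 1, 0)


def learning_score(morphology):
    """Toy stand-in for ontogenetic adaptation.

    A real system would run reinforcement learning with this morphology
    and measure how quickly the task is learned.
    """
    helpful = sum(m * u for m, u in zip(morphology, USEFUL))
    return helpful - 0.5 * sum(morphology)


def mutate(morphology, rng, rate=0.2):
    """Flip each sensor slot independently with probability `rate`."""
    return tuple(1 - g if rng.random() < rate else g for g in morphology)


def evolve(pop_size=8, generations=20, seed=0):
    """Phylogenetic adaptation: truncation selection with elitism plus mutation."""
    rng = random.Random(seed)
    n = len(USEFUL)
    population = [tuple(rng.randint(0, 1) for _ in range(n))
                  for _ in range(pop_size)]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        population.sort(key=learning_score, reverse=True)
        history.append(learning_score(population[0]))
        survivors = population[: pop_size // 2]  # the best half is kept unchanged
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        population = survivors + children
    best = max(population, key=learning_score)
    return best, history


best_morphology, best_per_generation = evolve()
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next; whether the optimum is actually reached depends on the seed and the number of generations.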
This paper presents a novel method for robot learning through imitation that automatically acquires a user's key motions. The learning architecture mainly consists of three learning modules: a switching autoregressive model (SARM), a keyword extractor without a dictionary, and a keyword selection filter that refers to the tutor's reactions.
This paper proposes a method that automatically designs the sensory morphology of a mobile robot. The proposed method employs two types of adaptations - ontogenetic and phylogenetic - to optimize the sensory morphology of the robot. In ontogenetic adaptation, reinforcement learning searches for the optimal policy, which is highly dependent on the sensory morphology.
The paper describes the evolutionary development of embodied agents that evolve the parameters of their controllers and sensors. The experimental results show that the physical characteristics of the agents and the task environment affect the temporal resolution of the sensors.
This paper presents a method for learning novel objects from audio-visual input. Objects are learned using out-of-vocabulary word segmentation and object extraction. The latter half of this paper is devoted to evaluations. We propose the use of a task adopted from the RoboCup@Home league as a standard evaluation for real-world applications. We have implemented the proposed method on a real humanoid robot and evaluated it through a task called "Supermarket". The results reveal that our integrated system works well in a real application. In fact, our robot outperformed the maximum score obtained in the RoboCup@Home 2009 competition.
In this paper, we propose a novel method to detect robot-directed (RD) speech that adopts the Multimodal Semantic Confidence (MSC) measure. The MSC measure is used to decide whether the speech can be interpreted as a feasible action under the current physical situation in an object manipulation task. This measure is calculated by integrating speech, image, and motion confidence measures with weightings that are optimized by logistic regression. Experimental results show that, compared with a baseline method that uses speech confidence only, MSC achieved an absolute increase of 5% for clean speech and 12% for noisy speech in terms of average maximum F-measure.
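The integration step can be pictured as a logistic function over a weighted sum of the three confidence values, with the weights fitted by logistic regression. The following is a minimal sketch under stated assumptions: the toy samples, labels, learning rate, and function names are invented for illustration, and plain batch gradient ascent is used as one simple way to fit a logistic regression, not necessarily the optimizer used in the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def msc_score(confidences, weights, bias):
    """Logistic combination of (speech, image, motion) confidence values."""
    z = bias + sum(w * c for w, c in zip(weights, confidences))
    return sigmoid(z)


def fit_weights(samples, labels, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient ascent on the log-likelihood."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = y - msc_score(x, weights, bias)  # gradient of the log-likelihood
            grad_b += err
            for k, xk in enumerate(x):
                grad_w[k] += err * xk
        weights = [w + lr * g for w, g in zip(weights, grad_w)]
        bias += lr * grad_b
    return weights, bias


# Toy data, invented for illustration: 1 = robot-directed, 0 = not.
SAMPLES = [(0.9, 0.8, 0.7), (0.8, 0.9, 0.9), (0.2, 0.1, 0.3), (0.1, 0.3, 0.2)]
LABELS = [1, 1, 0, 0]

weights, bias = fit_weights(SAMPLES, LABELS)
# Utterances scoring above 0.5 would be treated as robot-directed.
decisions = [msc_score(x, weights, bias) > 0.5 for x in SAMPLES]
```

The 0.5 decision threshold is also an assumption; in practice it would be tuned to trade off recall against precision.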
This paper proposes a method that generates motions and utterances in an object manipulation dialogue task. The proposed method integrates belief modules for speech, vision, and motions into a probabilistic framework so that a user's utterances can be understood based on multimodal information. Responses to the utterances are optimized based on an integrated confidence measure function for the integrated belief modules. Bayesian logistic regression is used for the learning of the confidence measure function. The experimental results revealed that the proposed method reduced the failure rate from 12% down to 2.6% while the rejection rate was less than 24%.
This paper presents a method to recognize and generate sequential motions for object manipulation such as placing one object on another or rotating it. Motions are learned using reference-point-dependent probabilistic models, which are then transformed to the same coordinate system and combined for motion recognition/generation. We conducted physical experiments in which a user demonstrated the manipulation of puppets and toys, and obtained a recognition accuracy of 63% for the sequential motions. Furthermore, the results of motion generation experiments performed with a robot arm are presented.
This paper proposes a machine learning method for mapping object-manipulation verbs with sensory inputs and motor outputs that are grounded in the real world. The method learns motion concepts demonstrated by a user and generates a sequence of motions, using reference-point-dependent probability models. Here, the motion concepts are learned by using hidden Markov models (HMMs). In the motion generation phase, our method transforms and combines HMMs to generate trajectories.
In this paper, we propose a system that automatically designs the sensory morphology of an adaptive robot. This system designs the sensory morphology in simulation with two kinds of adaptation, ontogenetic adaptation and phylogenetic adaptation, to optimize the learning ability of the robot.
This paper proposes a method that incrementally develops the "body schema" of a robot. The method has three features: 1) estimation of light-sensor positions based on the Time Difference of Arrival (TDOA) of signals and multidimensional scaling (MDS); 2) incremental update of the estimation; and 3) no additional equipment.
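The MDS step can be illustrated in isolation. The sketch below assumes that pairwise distances between sensors have already been estimated (in the paper these would come from the TDOA measurements; how that is done is not shown here) and recovers 2D positions with classical MDS. Power iteration with deflation is used only to keep the example dependency-free; the function name, iteration count, and start vector are assumptions.

```python
import math


def classical_mds(dist, dims=2, iters=500):
    """Recover coordinates (up to rotation/reflection) from a distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (the matrix is symmetric).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the current leading eigenvector of B.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        v_norm = math.sqrt(sum(x * x for x in v))
        v = [x / v_norm for x in v]
        bv = matvec(b, v)
        eigenvalue = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        # Deflation: remove the component just found before the next dimension.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords
```

The recovered layout matches the true one only up to rotation, reflection, and translation, so it is best checked by comparing pairwise distances rather than raw coordinates.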
This paper proposes a system that automatically designs the sensory morphology of a line-following robot. The designed robot outperforms hand-coded designs in learning speed and accuracy.
In this paper, we investigate the evolutionary development of embodied agents that are allowed to evolve not only control mechanisms but also the sensitivity and temporal resolution of their sensors. The experimental results indicate that the sensors and controller co-evolve in an agent through interaction with the environment.
We have studied a string rewriting system to improve the basic design of an artificial life system named String-based Tierra. The instruction set used in String-based Tierra is converted into a set of rewriting rules using regular expressions.
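A string rewriting system driven by regular expressions can be sketched as a list of (pattern, replacement) rules applied until no rule changes the string. The two rules below are hypothetical and invented purely for illustration; they are not the instruction set of String-based Tierra.

```python
import re

# Hypothetical rules (invented): upper-case letters act as instructions,
# lower-case letters as data symbols.
RULES = [
    (r"C([a-z])", r"\1\1"),  # "copy": C followed by a symbol duplicates the symbol
    (r"D[a-z]", ""),         # "delete": D followed by a symbol removes both
]


def rewrite(string, rules, max_steps=100):
    """Apply the first rule that changes the string, one match at a time,
    until a fixed point is reached or max_steps rewrites have been made."""
    for _ in range(max_steps):
        for pattern, replacement in rules:
            new_string = re.sub(pattern, replacement, string, count=1)
            if new_string != string:
                string = new_string
                break
        else:
            return string  # no rule changed the string: fixed point
    return string


result = rewrite("CaDb", RULES)
```

The `max_steps` bound is a safeguard, since a rule set of this kind is not guaranteed to terminate in general.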