HUMAN-ROBOT INTERACTION: LANGUAGE ACQUISITION WITH NEURAL NETWORK

This paper gives an overview of two language-processing methods for human-robot interaction. Echo State Networks and stochastic learning grammars are explored in order to understand how human natural language can be generated, and whether the two methods can be integrated to make communication between robots, or between a robot and a human, more natural in a dialogic syntactic language game. Integrating the methods could bring several benefits, such as improving communicative efficiency and producing more natural sentences.


I. INTRODUCTION
Natural language processing has recently become an interesting research topic for artificial intelligence engineers and researchers. It is a significant topic for the development of advanced, intelligent robot systems, given the need for intelligent agents that make a task or a business easier and more efficient to carry out. Moreover, by understanding and generating humans' natural language, it might become feasible in the future to address computers the way we address humans.
Interest in making robots able to communicate with humans has intensified, since several robots developed in this decade are designed to work closely with people. Popular examples are ASIMO, iCub, and Nao. They can communicate with a human to carry out simple tasks, such as pointing at an object or moving an object from one place to another.
From this consideration, it is interesting to explore how a robot can communicate with, and teach a sentence to, a human or even another robot in a more natural way. To this end, recent papers that share this purpose but use different techniques were analysed and reviewed, and two of them were selected to be compared and evaluated. The first is "Exploring the acquisition and production of grammatical constructions through human-robot interaction with echo state networks" by Hinaut et al., published in May 2014 [1]. The second is "Alignment in vision-based syntactic language games for teams of robots using stochastic regular grammars and reinforcement learning: The fully autonomous case and the human supervised case" by Maravall et al. [2].
Maravall's paper shows that robots can communicate in a dialogic syntactic language game by learning a grammar with a simple structure, and Hinaut's paper shows that an echo state network can construct more natural sentences using semantic role labeling. For both papers, the focus here is on the methods used in the implementation. Echo state networks and stochastic grammars are the methods selected for evaluation as language-learning systems for human-robot interaction. There is a possibility of integrating both approaches in order to make the communication in the dialogic syntactic language game more natural.
This paper is structured as follows. First, it introduces natural language processing, as well as echo state networks and stochastic grammars. Then it presents the combination of echo state networks and stochastic grammars. After that, it discusses the results, and finally it gives the conclusion.

Natural Language Processing
Natural language processing (NLP) is a field of computer science that studies the interaction between computers and human languages, here with a more biologically inspired approach. It aims to analyze, understand, and generate the languages spoken by humans. NLP has contributed to computer science research areas such as information retrieval, speech recognition, and machine learning. There are two fundamental approaches to NLP tasks: syntactic and semantic.
In the syntactic approach, the focus is on information about word and sentence structure, using methods such as part-of-speech (POS) tagging, chunking, and syntactic parsing (PSG). POS tagging assigns each word in the structure its syntactic role (e.g. noun, verb, adjective, preposition). Chunking identifies and classifies segments of a sentence that form the same syntactic component, such as phrases (verb phrases, adverbial phrases, and so on). PSG assigns a structure to a sentence and represents its grammatical analysis as a parse tree.
In the semantic approach, the intention of the language can be recognized with methods such as named entity recognition (NER), semantic role labeling (SRL), and word-sense disambiguation (WSD). NER classifies the nouns of noun phrases into categories such as institution, gender, or location. SRL defines the relationships between the words in a sentence, such as subject, object, or predicate. WSD specifies the meaning of a word in a particular context; for example, the word "home" can be interpreted as a building or as a home country.

The model of a neuron has three basic elements:
1. A set of synapses, each of which has a weight.
2. An adder that sums the weighted input signals. This operation follows the linear combiner rule.
3. An activation function that limits the output amplitude of the neuron.
In Figure 1, a neuron receives several input signals $x_1, x_2, \ldots, x_p$. A neuron can have multiple inputs but only one output, which can in turn be an input to another neuron. Each input signal is multiplied by a synaptic weight $w_{k1}, w_{k2}, \ldots, w_{kp}$, and the weighted inputs are then summed. The result of the summation is the linear combiner output $u_k$, given by

$$u_k = \sum_{j=1}^{p} w_{kj} x_j \qquad (1)$$

and

$$y_k = f(u_k - \theta_k)$$

where $x_1, x_2, \ldots, x_p$ are the input signals; $w_{k1}, w_{k2}, \ldots, w_{kp}$ are the synaptic weights of neuron $k$; $u_k$ is the linear combiner output; $\theta_k$ is the threshold; $f(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron. The threshold applies an affine transformation to the linear combiner output $u_k$.
The activation function, denoted $f(\cdot)$, defines the output value of a neuron at a given activation level based on the linear combiner output $u_k$. Several kinds of activation function are commonly used in neural networks.
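As an illustration of the neuron model described above, the following minimal Python sketch computes the linear combiner output and the neuron output for one neuron. The logistic sigmoid is only one possible choice of activation function, and the function names, weights, inputs, and threshold are hypothetical values chosen for the example.

```python
import math

def linear_combiner(weights, inputs):
    """u_k: sum of the input signals, each multiplied by its synaptic weight."""
    return sum(w * x for w, x in zip(weights, inputs))

def sigmoid(v):
    """Logistic activation; limits the output amplitude to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron_output(weights, inputs, threshold, activation=sigmoid):
    """y_k = f(u_k - theta_k)."""
    return activation(linear_combiner(weights, inputs) - threshold)

# Hypothetical example values (not taken from the paper).
x = [1.0, 0.5, -1.0]   # input signals
w = [0.2, 0.4, 0.1]    # synaptic weights of the neuron
u = linear_combiner(w, x)                # approximately 0.3
y = neuron_output(w, x, threshold=0.3)   # approximately sigmoid(0) = 0.5
```

Passing a different function as `activation` changes the output range without touching the linear combiner.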

Neural Network Architectures
The architecture of a neural network is the pattern in which its neurons are structured, and it is closely linked to the learning algorithm used to train the network. Common architectures include the single-layer feedforward network, the multilayer feedforward network, and the recurrent neural network.
A single-layer feedforward network is the simplest form of neural network, with an input layer that projects directly onto an output layer. This model is illustrated in Figure 2.

The second class of architecture is the multilayer feedforward network, which has one or more hidden layers of hidden neurons (hidden units). Figure 3 illustrates a multilayer feedforward network with one hidden layer. Such a network is referred to as a 4-5-1 network, meaning that it has 4 input nodes, 5 hidden neurons, and 1 output neuron.

A recurrent neural network is a neural network with at least one feedback loop. For example, a recurrent network may consist of a single layer in which each neuron feeds its output back as an input to the other neurons. This model is illustrated in Figure 4.

An echo state network (ESN) provides an architecture and a supervised learning principle for randomly generated recurrent neural networks [3]. According to Jaeger, the ESN rests on two main ideas: first, to drive a recurrent neural network with the input signal, and second, to obtain the desired output signal as a trainable linear combination of the resulting network signals [4].
An ESN has input units (u), hidden units (x), and output units (y), connected by weights. The interesting part of the ESN is its hidden units, which form a reservoir with a complex, randomly generated connection graph, so that the input is processed recurrently over time.
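The two ideas behind the ESN can be sketched in a few lines of standard-library Python. This is a toy sketch under stated assumptions, not the model used in [1]: the reservoir size, the weight scaling, and the toy task (recalling the previous input) are hypothetical, and the readout is trained here with plain batch gradient descent to avoid external libraries, whereas ESN readouts are normally fitted by linear regression.

```python
import math
import random

def make_reservoir(n_in, n_res, scale=0.5, seed=0):
    """Create fixed random input and recurrent weights (these are never trained)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_res)]
    # Each recurrent row has absolute sum <= scale < 1: a conservative choice
    # that keeps the state update contractive, so old inputs fade out.
    w_res = [[rng.uniform(-1, 1) * scale / n_res for _ in range(n_res)]
             for _ in range(n_res)]
    return w_in, w_res

def run_reservoir(w_in, w_res, inputs):
    """Idea 1: drive the reservoir with the input signal; return all states x(t)."""
    x = [0.0] * len(w_res)
    states = []
    for u in inputs:
        x = [math.tanh(sum(a * b for a, b in zip(w_in[i], u)) +
                       sum(a * b for a, b in zip(w_res[i], x)))
             for i in range(len(w_res))]
        states.append(x)
    return states

def readout(w_out, x):
    """Idea 2: the output is a linear combination of the reservoir state."""
    return sum(w * s for w, s in zip(w_out, x))

def mse(w_out, states, targets):
    return sum((readout(w_out, x) - y) ** 2
               for x, y in zip(states, targets)) / len(targets)

def train_readout(states, targets, epochs=200):
    """Fit only the output weights by batch gradient descent on the squared error."""
    n = len(states[0])
    lr = 1.0 / n  # small enough for a monotone decrease, since |x_i| < 1
    w_out = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(states, targets):
            err = readout(w_out, x) - y
            for i in range(n):
                grad[i] += err * x[i] / len(states)
        w_out = [w - lr * g for w, g in zip(w_out, grad)]
    return w_out

rng = random.Random(1)
signal = [rng.uniform(-1, 1) for _ in range(200)]
inputs = [[u] for u in signal]
targets = [0.0] + signal[:-1]   # toy task: recall the previous input
w_in, w_res = make_reservoir(n_in=1, n_res=20)
states = run_reservoir(w_in, w_res, inputs)
w_out = train_readout(states, targets)
error_before = mse([0.0] * 20, states, targets)   # untrained readout
error_after = mse(w_out, states, targets)         # trained readout
```

Only `w_out` changes during training; `w_in` and `w_res` stay as they were generated, which is what makes ESN training cheap compared with training a full recurrent network.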

Stochastic Learning Grammars
A stochastic grammar is a grammar framework with a probabilistic notion of grammaticality; related formalisms and techniques include stochastic context-free grammars, statistical parsing, data-oriented parsing, hidden Markov models, and estimation theory [7]. Natural language processing uses stochastic, probabilistic, and statistical methods to reduce the ambiguity that arises with long sentences, because processing a long sentence with a realistic grammar can yield millions of possible analyses [8]. In this report only stochastic context-free grammars are discussed.
A context-free grammar (CFG) is defined as G = (V, Σ, R, S), where V is the set of non-terminal symbols, Σ is the set of terminal symbols, R is the set of production rules, and S is the start symbol. A stochastic context-free grammar (SCFG) is an extension of CFG that applies probabilistic modeling: each production is assigned a probability. An SCFG is defined as G = (M, T, R, S, P), where:
- M is the set of non-terminal symbols,
- T is the set of terminal symbols,
- R is the set of production rules,
- S is the start symbol,
- P is the set of probabilities on the production rules.
P is the main concern in a stochastic context-free grammar, because it is the set of probabilities attached to the stochastic rules.
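To make the SCFG definition concrete, the sketch below encodes a small grammar as a Python dictionary and samples sentences from it. The grammar itself (symbols, words, and probabilities) is a hypothetical toy example, not the grammar of [2]: the dictionary keys play the role of M, the words that never appear as keys are T, the listed right-hand sides are R, `START` is S, and the attached numbers are P.

```python
import random

# Hypothetical toy grammar: non-terminal -> list of (right-hand side, probability).
RULES = {
    "S": [(["A", "R", "B"], 0.5), (["R", "A", "B"], 0.3), (["A", "B", "R"], 0.2)],
    "A": [(["book"], 0.5), (["pencil"], 0.5)],
    "B": [(["ball"], 1.0)],
    "R": [(["right"], 0.5), (["left"], 0.5)],
}
START = "S"

def is_proper(rules, tol=1e-9):
    """Check that the probabilities of each non-terminal's rules sum to 1."""
    return all(abs(sum(p for _, p in alts) - 1.0) < tol
               for alts in rules.values())

def generate(symbol, rules, rng):
    """Expand a symbol by sampling one production according to its probability."""
    if symbol not in rules:          # terminal symbol: emit it unchanged
        return [symbol]
    alternatives = rules[symbol]
    rhs = rng.choices([r for r, _ in alternatives],
                      weights=[p for _, p in alternatives])[0]
    words = []
    for s in rhs:
        words.extend(generate(s, rules, rng))
    return words

def derivation_probability(rule_probabilities):
    """Probability of a derivation: the product of the probabilities of its rules."""
    prob = 1.0
    for p in rule_probabilities:
        prob *= p
    return prob

sentence = generate(START, RULES, random.Random(0))
# "book right ball" uses S -> A R B (0.5), A -> book (0.5), R -> right (0.5), B -> ball (1.0)
p_book_right_ball = derivation_probability([0.5, 0.5, 0.5, 1.0])  # 0.125
```

Changing the numbers attached to the rules changes which word orders the grammar prefers, which is the property a learning stochastic grammar exploits.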

Dialogic Syntactic Language Game
The dialogs between robots, each of which has its own learning stochastic grammar, are performed sequentially: the scenes of the training data set ($I_1, I_2, \ldots, I_p$) are presented to the robots, which compose their corresponding sentences. A communication success occurs when the sentences composed by the robots coincide [2].

III. A PROPOSED MODEL FOR MORE NATURAL DIALOGIC SYNTACTIC LANGUAGE GAMES
To combine stochastic learning grammars and echo state networks into a proposed model of dialogic syntactic language games, both papers are analysed with respect to grammatical construction and the communication process. In Maravall's paper, grammatical construction is established in sentences like "object such is on the spatial relation of object such" [2]. For a real sentence such as "the book is on the right of the ball", the sentence can be formalized as a string aRb, with two object names, a and b, and one spatial relationship, R. According to the production rules, there are two alternative strings, in P2 and P3, which are Rab and abR. To let the robots learn the language through an interactive process, the paper uses stochastic grammars with learning capability for visual scene description.
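For reference, the standard linear reward-inaction scheme from the learning-automata literature is written out below as a sketch. It is an assumption that the update applied to the production-rule probabilities in [2] takes exactly this form; the notation ($p_i$ for the probability of production rule $i$, $\lambda$ for the learning rate, $\beta$ for the reinforcement signal, $k$ for the dialog step) is chosen here for illustration. If rule $i$ was used at step $k$:

```latex
\begin{align}
p_i(k+1) &= p_i(k) + \lambda\,\beta(k)\,\bigl(1 - p_i(k)\bigr), \\
p_j(k+1) &= p_j(k) - \lambda\,\beta(k)\,p_j(k), \qquad j \neq i,
\end{align}
% with 0 < \lambda < 1 and \beta(k) \in \{0, 1\}.
```

With $\beta(k) = 1$ (success) the rule that was used gains probability at the expense of the others, and with $\beta(k) = 0$ (failure) nothing changes, which is the "inaction" part of the name. The probabilities still sum to 1 after the update. As a worked example, with probabilities (0.4, 0.3, 0.3), $\lambda = 0.1$, and a success using the first rule, the new probabilities are (0.46, 0.27, 0.27).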
Maravall implements a linear reward-inaction algorithm to update the probabilities associated with the production rules. In this algorithm, λ is the learning rate and β is the reinforcement signal received from the environment: a value of 1 represents a success and a value of 0 represents a failure. The learning pseudocode in Algorithm 1 is used to converge to a syntactic consensus by means of reinforcement learning, where each robot has its own learning stochastic grammar [2].

Grammatical construction in Hinaut's paper works as follows: when a person speaks to the robot, the robot splits the words of the spoken sentence into two classes, semantic words and closed-class words. In the action-performing task, for example, there is the sentence "on the left put the toy". The words that carry substantial meaning (left, put, toy) are categorized as semantic words, and the remaining function words, such as the preposition, are categorized as closed-class words. Semantic words (SW), which are stacked in memory, have fixed connections to the desired meaning output coding, whereas closed-class words (CCW) pass through the reservoir to be processed, producing learnable connections, both inactive and active. The meaning constructed from the semantic words with fixed connections is then expressed in a predicate form such as predicate(agent, object). The selected semantic words are stored in a working memory that acts as a first-in first-out (FIFO) stack [1]. The hidden units process the input, and the learnable connections are trained to produce the desired output. In the scene-description task, the process runs in the opposite direction: the model must produce a sentence from a given meaning [1].
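The split into semantic words and closed-class words described above can be sketched as follows. This is a simplified illustration rather than the implementation in [1]: the closed-class word list and the function name are hypothetical, and the reservoir that would process the resulting word sequence is left out.

```python
from collections import deque

# Hypothetical closed-class word list; a real system would use a fuller one.
CLOSED_CLASS_WORDS = {"on", "the", "to", "of", "is", "a", "and", "it"}

def split_sentence(sentence):
    """Replace each semantic word by the marker 'SW' and queue it in memory."""
    template = []       # what would be fed to the reservoir, word by word
    memory = deque()    # FIFO working memory of semantic words
    for word in sentence.lower().split():
        if word in CLOSED_CLASS_WORDS:
            template.append(word)
        else:
            template.append("SW")
            memory.append(word)
    return template, memory

template, memory = split_sentence("on the left put the toy")
# template -> ['on', 'the', 'SW', 'SW', 'the', 'SW']
# memory   -> left, put, toy (in order of arrival)
```

Because the memory is first-in first-out, `memory.popleft()` returns the semantic words in the order in which they were spoken, which is what allows them to be bound to roles in a predicate form afterwards.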
From these considerations, the probabilistic production rules in Maravall's paper could be combined with the scene description of neural production in Hinaut's paper. As shown in Figure 6, the common lexicon for the spatial relationships (right, left, front, behind) and the object lexicon (book, ball, pencil, glasses) could be extracted into semantic words and memory stacks within the echo state network, so that the output sentence is produced in a more natural form. For instance, with the stochastic learning grammar alone the robot produces the sentence "right, book, ball", but when the production rules are processed through scene description, the result would be something like "Book on the right of the ball".

Figure 6. The probabilistic production rules with scene description of neural production.

Hinaut's paper also describes how the modules communicate. Once the supervisor is in contact with the human, it offers a choice of which mode to run. Train mode is run to teach the robot new pairings, and test mode is run to use the pairing corpus that the robot has already learned. Action Performing and Scene Description are the main tasks processed in the execution mode. In train mode, the human's speech is converted into text with a speech-to-text tool, and the meaning of the sentence is then given by the robotic platform. In test mode, the supervisor offers canonical or non-canonical sentences. The robot draws on the previous training data in order to execute the action in the Action Performing task or to produce a sentence in the Scene Description task [1].
The communication in Hinaut's paper proceeds iteratively, with the supervisor as the starting point. Through the supervisor, the human chooses between train mode and test mode. The data are then processed by the neural model before the final action, which is either a description (SD) or a command (AP). Once the final action is done, the process returns to the supervisor.

In Maravall's paper, the language games are repeated until the team reaches the optimal communication system, in which all the robots use the same optimal stochastic grammar. Communication is established between two agents, either robot with robot or robot with human. In this process, the scene data set is presented sequentially and iteratively, and both agents use it to communicate with their own private grammars. The success or failure of the communication between the two robots determines whether the probabilities associated with the production rules are incremented or decremented: a reward or penalty signal is sent to the learning algorithm to update the production-rule probabilities [2].
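The language-game loop described above, together with the stopping rule of Algorithm 1, can be sketched as a small simulation. Several simplifying assumptions are made here that are not taken from [2]: each robot's private grammar is reduced to a probability vector over the three symbol orders, every pair of robots holds one dialog per round about the same scene, a success rewards both robots with a linear reward-inaction step, and the team size, learning rate, and function names are hypothetical.

```python
import itertools
import random

ORDERS = ["aRb", "Rab", "abR"]   # candidate symbol orders, one production rule each

def choose_rule(probs, rng):
    """Sample the index of a production rule from a robot's private grammar."""
    return rng.choices(range(len(probs)), weights=probs)[0]

def reward(probs, chosen, rate):
    """Linear reward-inaction step for a success (reinforcement signal = 1)."""
    return [p + rate * (1.0 - p) if i == chosen else p - rate * p
            for i, p in enumerate(probs)]

def play_round(team, rate, rng):
    """Every pair of robots describes one scene; returns the round's CE in percent."""
    dialogs = successes = 0
    for r1, r2 in itertools.combinations(range(len(team)), 2):
        c1, c2 = choose_rule(team[r1], rng), choose_rule(team[r2], rng)
        dialogs += 1
        if c1 == c2:                     # the sentences coincide: success
            successes += 1
            team[r1] = reward(team[r1], c1, rate)
            team[r2] = reward(team[r2], c2, rate)
        # On failure (signal = 0) reward-inaction leaves the probabilities unchanged.
    return 100.0 * successes / dialogs

def run_game(n_robots=4, rate=0.1, max_rounds=500, seed=0):
    rng = random.Random(seed)
    team = [[1.0 / len(ORDERS)] * len(ORDERS) for _ in range(n_robots)]
    history = []
    for _ in range(max_rounds):
        history.append(play_round(team, rate, rng))
        # Stopping rule of Algorithm 1: CE = 100% in three consecutive rounds.
        if history[-3:] == [100.0, 100.0, 100.0]:
            break
    return team, history

team, history = run_game()
```

After the run, `team` holds each robot's rule probabilities and `history` the communicative efficiency per round. Whether and how fast the team reaches a consensus depends on the random seed, the learning rate, and the team size.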
From these considerations, a proposed model for more natural dialogic syntactic language games, shown in Figure 7, is developed to enhance the communicative efficiency between the robots and to produce more natural sentences. From the scene data set, which contains a common lexicon for the spatial relationships and an object lexicon, the agents, either robot or human, start to construct sentences (S1, S2) using scene description for neural production. X is where the sentences are uttered and the communicative efficiency (CE) is determined [2]:

$$CE = \frac{NSD}{ND} \times 100\%$$

where ND stands for the number of dialogs and NSD stands for the number of successful dialogs.
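One dialog in the proposed hybrid model can be sketched as below. This is a sketch under strong assumptions: a fixed sentence template stands in for the trained echo state network, the communicative efficiency is taken to be the percentage of successful dialogs, and all function names are hypothetical.

```python
def to_roles(symbols, order):
    """Label the symbols of a grammar string, e.g. order 'Rab' gives roles R, a, b."""
    return dict(zip(order, symbols))

def realise(roles):
    """Template stand-in for the scene-description model (an assumption)."""
    return f"the {roles['a']} is on the {roles['R']} of the {roles['b']}"

def communicative_efficiency(nsd, nd):
    """Percentage of successful dialogs, assuming CE = 100 * NSD / ND."""
    return 100.0 * nsd / nd

# Two agents describe the same scene with different symbol orders.
s1 = realise(to_roles(["right", "book", "ball"], "Rab"))
s2 = realise(to_roles(["book", "right", "ball"], "aRb"))
success = s1 == s2
ce = communicative_efficiency(nsd=3, nd=4)   # 75.0
```

Under this simplifying assumption, agents that use different symbol orders still utter the same surface sentence, which is the mechanism by which the hybrid model could raise the communicative efficiency. Whether a trained echo state network behaves this way is exactly what further experiments would have to show.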

IV. DISCUSSION
With regard to natural language processing, stochastic grammars with learning ability and echo state networks both play a significant role in constructing the grammar of a sentence in order to establish communication between human and robot. These methods can serve as a grammatical basis for artificial intelligence.
This is supported by the results of Hinaut's paper with echo state networks: the iCub robot learns to perform complex actions and to describe complex scenes in real time; the read-out activity in the action-performing task can be regarded as an estimated prediction, since the activity changes and updates whenever a new word is presented at the input; the learnability error is 4.2%, which means that most of the corpus is learnable; simple sentences generalize better than elaborate sentences; and as the number of neurons increases, the negative influence of ill-formed sentences tends to decrease.
The results of Maravall's paper with stochastic learning grammars are: the robots mostly choose the rule with the greater probability; a robot chooses a random rule once in a while; a success is the case in which the robots share the same sentence to describe a spatial situation; the best communicative efficiency comes from the team of 5 robots; the worst communicative efficiency comes from the team of 60 robots; and the experiment that involves a human agent in the team shows that the best result comes in the size of [...]

So, there is a possibility of combining both experiments into a hybrid method for human-robot interaction. The learnability can be taken from the echo state network experiment, and the setting with multiple robots and a human from the stochastic learning grammar experiment. There are several advantages and challenges in developing a model for more natural dialogic syntactic language games.

Advantages
The advantages of the proposed model for more natural dialogic syntactic language games are:
1. Communicative efficiency improves, since the chance that the sentences coincide could be increased by the echo state network.
2. The sentences produced by the robot will be more natural and easier for humans to understand.
3. The syntactic structure on which consensus occurs will no longer be just abR, aRb, or Rab, but a complex sentence.
4. The experiment that involves a human agent in the team could give the best result, since the robot produces natural sentences.

Challenges
The challenges of integrating both methods are as follows:
1. The concept needs further evaluation to show that both methods work well together.
2. Further experiments are needed to show that the communicative efficiency between robot and robot, and between human and robot, can be improved significantly using echo state networks.

V. CONCLUSION
Neural network approaches are key concepts for language acquisition in human-robot interaction due to their low computational complexity, high performance, and fast learning speed. Moreover, they learn syntactic and semantic information about a language from unlabeled data instead of handcrafted input features.
On the other hand, stochastic learning grammars also have applications in diverse areas of language acquisition in human-robot interaction, especially in natural language processing, and they are also used to study RNA molecules and to design programming languages. Grammar ambiguity can be resolved by designing an efficient stochastic learning grammar.
This paper concludes that stochastic learning grammars and echo state networks can potentially be integrated in dialogic syntactic language games. There are fundamental commonalities that support the integration. However, challenges remain in integrating the two concepts, and further experiments are needed to show that the integration works well.
As natural language processing methods, echo state networks and stochastic grammars are both able to perform grammatical construction in order to enable communication between human and robot, even though in the two papers the methods serve different implementation purposes. In the proposed model, there are clear expected advantages when dialogic syntactic language games are combined with echo state networks: the sentences produced will be more natural, and the communicative efficiency will improve.
All in all, this idea needs further evaluation and experiments. Hopefully this report can contribute to further implementations of stochastic learning grammars and echo state networks.
A neural network is an artificial model inspired by the neural pathways of the brain that performs certain functions depending on its inputs. Training a neural network is an optimization process in which the parameters are adapted to a specific condition [10]. There are several training methods for neural networks, such as error backpropagation and Hessian-free optimization. Backpropagation has been a popular method in the current decade for training feedforward neural networks, and Hessian-free optimization has been recognized as a training method for recurrent neural networks. Since training recurrent neural networks takes considerable effort, echo state networks (ESNs) have been proposed as an alternative way of understanding, training, and using RNNs [4].

Figure 3. Fully connected feedforward network with one hidden layer and one output layer [23].

Algorithm 1. RL-based syntactical coordination
for k = 1, 2, ..., Max rounds do
    Execute all the possible communication acts
    Compute the Communicative Efficiency of the robot team
    if CE(k) = 100% in three consecutive rounds then
        Break
    end if
end for

Alvin R: Human-Robot Interaction.... 75-84 p-ISSN 1979-9160 | e-ISSN 2549-7901

Figure 7. A proposed hybrid model for more natural dialogic syntactic language games.

[...] robots and 55-60 robots, compared with the result without a human agent; the occurrence of consensus for the syntactic structure abR is 408, for aRb is 418, and for Rab is 374.