Artificial Intelligence

Revision as of 11:25, 18 April 2020

Introduction

This is an academic peer reviewed journal on a single wiki page. It accepts manuscripts from the domain of robotics and Artificial Intelligence. The journal needs neither git branches nor directories; everything is stored in a single file. The file is tracked with the wiki's version history and can be edited by everyone.

Two sections manage the content workflow:

  1. unstable upstream, which stores the raw manuscripts of authors
  2. stable downstream, which stores the released issues of the journal

Each section consists of subsections. The unstable section is divided into submissions for incoming text, and the stable section contains the individual issues, numbered #01, #02, #03 and so on.

Authors upload their raw manuscripts into the upstream section. They can update the content if they find spelling mistakes or would like to add something. The journal itself is created in the downstream section: a copy of the incoming manuscript is made and checked for correctness of content. The upstream and downstream sections are deliberately out of sync. Readers are invited to comment on the content in the downstream section.

Unstable upstream

Paper 1

  • title: Plan monitoring with options
  • author: Manuel Rodriguez
  • date: 2020-01-02

The first step is to invent a plan notation. Technically it can be realized as an addition to a physics engine. New high-level commands (better known as options) are implemented. For example, the action “forward10steps” moves the robot 10 steps forward. The action is grounded, which means that after executing the action name, the robot's new position is written into the physics engine.

The next step is to write a plan into a table, for example “forward10steps, forward2steps, left, stop”. Each of the planned actions can be executed in the physics engine. If the complete plan is executed, a visible notation is drawn to the screen.

Now a human operator can move the robot by hand. During the execution, the plan is monitored. Does the human operator fulfill the plan? Does he execute the actions in the correct sequence? All these questions can be answered by the plan recognition module.
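
A minimal sketch of such a plan monitor in Python, under the assumption that the options are already grounded and arrive as a stream of recognized names (the detector itself is left out):

 plan = ["forward10steps", "forward2steps", "left", "stop"]

 def monitor(plan, detected_options):
     """Return True if the detected options fulfill the plan in order."""
     step = 0
     for option in detected_options:
         if step >= len(plan):
             print("extra action after plan end:", option)
         elif option == plan[step]:
             print("step", step, "fulfilled:", option)
             step += 1
         else:
             print("deviation: expected", plan[step], "got", option)
     return step == len(plan)

 print(monitor(plan, ["forward10steps", "forward2steps", "left", "stop"]))  # True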

So in general, plan recognition is playing a derivative game. On top of the normal game, for example a racing game, a new kind of game is constructed which is called “follow the plan”. The subgoals of the game are formalized in the plan notation and the human operator has to fulfill the task. The interesting aspect is that the plan and the plan notation are created according to the needs of the operator. That means, if the operator likes to learn how to park a car, then the plan notation consists of options which have to do with parking maneuvers.

Last but not least, a well-working plan monitoring module has a second application. If all the parts work well, which includes the plan notation, the definitions of the options in the game engine and the plan monitoring, it is time to construct a solver which can follow the plan autonomously, which means without human intervention. This is possible because the plan language consists of a set of actions which can be planned in a graph. This allows the solver to reduce the state space drastically. It only tests out possible actions until the next step in the given plan. It isn't playing the complete game until it wins a reward; instead, the AI system optimizes its behavior according to the plan. The computational effort in doing so is low.

  1. Macro-actions 1x1

Instead of defining options and macro-actions from a mathematical standpoint, the idea is to describe the workflow from the perspective of realizing such systems. Suppose a teleoperated robot arm is available. The human controls the arm with a joystick and the task is to recognize, with software, what operation the human is doing. Before we can do so, we need a plan notation. That is equal to a grammar which describes movement primitives like pushobject, moveto and ungrasp. These skills are equal to options within the reinforcement learning paradigm.

The next step is to ground the actions in reality, which means to detect when the human has executed such an action. This is realized either in source code or with a neural network. Now we can start the activity parser and it will discover the previously implemented options within the mocap recording. The human operator pushes an object with the robot arm, and on the screen it is shown: “option pushobject was activated”.

All the macro-actions have in common that they are realized over a longer period of time. For example, the “pushobject” skill takes around 2 seconds, which is equal to 60 frames, and during that period the joystick is moved in a certain direction. This is called temporal abstraction, because the skill refers to a period of 2 seconds, which is long.
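
A sketch of how such an activity parser could detect a single option from a joystick recording; the "forward" direction label and the 30 fps frame rate are assumptions for illustration:

 def parse_options(joystick_frames, min_frames=60):
     """Detect the hypothetical 'pushobject' option: the joystick is held
     in the same direction for at least 60 frames (2 seconds at 30 fps)."""
     options, run_dir, run_len = [], None, 0
     for direction in joystick_frames:
         run_len = run_len + 1 if direction == run_dir else 1
         run_dir = direction
         if run_len == min_frames and direction == "forward":
             options.append("option pushobject was activated")
     return options

 frames = ["forward"] * 70 + ["left"] * 10
 print(parse_options(frames))  # ['option pushobject was activated']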


Paper 2

  • title: Creating the STRIPS domain knowledge file
  • author: Manuel Rodriguez
  • date: 2020-01-02

In classical AI planning the major bottleneck is the absence of a STRIPS file which contains the domain knowledge. The planner itself works great, but if the game isn't formalized with abstract actions, including preconditions and effects, it makes no sense to start the solver. A first step to overcome the issue is to create the STRIPS domain file not manually but with machine learning. The idea is to acquire the domain knowledge with a stochastic algorithm from user demonstrations. Unfortunately, this results in a black-box system, which means it's unclear whether the domain was mapped correctly to the machine-readable representation.

The better idea is to create the STRIPS file by hand. For doing so, the human operator needs the ability to bugfix the existing domain description. This is possible with interaction in a telerobotics environment, which is called “plan recognition”. Plan recognition is first and foremost a setup which allows the programmer to create and modify an existing STRIPS file. To understand why interaction is important, we have to describe a non-interactive STRIPS creation workflow.

In classical AI planning, the planner is at the center of attention. The planner is able to search the state space of a STRIPS file for a goal node. The planner gets started, it searches for a node, and after a while the system has produced the steps towards the goal. If the planner fails in doing so, the programmer can only fix the planner. That means he can think about a faster way of traversing the STRIPS game tree. The problem is that the planner doesn't have a bug. Writing source code which searches a tree for a goal node isn't that hard. All the planners for STRIPS domain files work great and without error, and creating a new solver from scratch can be done in under 20 lines of Python code.

The more likely reason why the planner fails to solve a game is that the input STRIPS file is wrong, or missing altogether. The required pipeline for fixing the STRIPS file is the opposite, which means that the STRIPS file is used for annotating a game log. Let us go into the details. The human operator plays a game, for example the pong game. He performs some actions on the screen and the software in the background has to identify the action names. There are two cases: either the plan recognition engine is able to detect that the ball touches the paddle and in which direction the ball is flying, or it is not powerful enough and fails to recognize the gameplay. In the second case, the system can be fixed. That means the programmer can edit the source code of the STRIPS file, add the needed features and run the game again. Either he has fixed the problem or not. It's an interactive process which results in a working STRIPS domain description. And yes, it can be tracked with a version control system like git.

I want to give a concrete example for the sokoban domain of how an interactive improvement of the plan library works. In the first iteration, the plan library consists of a single action: the system is able to detect whether the player is above the box or not. The programmer is able to play the game and test the plan recognition system. He will notice that each time he is above the box, the correct task name is shown on the screen. In the second step, the programmer decides to improve the plan library. He adds a function to detect whether the box was moved to the goal position. After implementing the new feature, the programmer starts the game again and plays it manually. He will notice that the system is able to detect many more events.
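
Such a plan library can be pictured as a set of predicate functions over the game state. A sketch of the two iterations, with the coordinates borrowed from the source code below:

 # Hypothetical plan library for sokoban, grown over two iterations.
 def player_above_box(player, box):
     # iteration 1: the player stands directly above the box
     return player == (box[0], box[1] - 40)

 def box_at_goal(box, goal=(340, 260)):
     # iteration 2: the box was moved to the goal position
     return box == goal

 def annotate(player, box):
     events = []
     if player_above_box(player, box):
         events.append("movetobox")
     if box_at_goal(box):
         events.append("pushboxtowaypoint")
     return events

 print(annotate(player=(300, 220), box=(300, 260)))  # ['movetobox']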

The procedure can be repeated many times, until the STRIPS-like domain description fits the needs of the programmer. It's a software engineering problem which can be handled similarly to normal programming. That means the programmer first creates a new issue in the issue tracker, then improves the STRIPS file, and then commits the changes into the repository with the git version control system. Each step includes manual testing of whether the added feature works or not. Testing means that the programmer plays the game manually and the software annotates the game log.

  1. sourcecode

 # version 1
 def task(self, name):
     if name == "movetobox":
         # move the player to the cell directly above the box
         self.player = (self.box[0], self.box[1] - 40)

 # version 2
 def task(self, name):
     if name == "movetobox":
         self.player = (self.box[0], self.box[1] - 40)
     elif name == "pushboxtowaypoint":
         # place the box at the goal position
         self.box = (340, 260)


Paper 3

  • title: Computer animation as testbed for Artificial Intelligence
  • author: Manuel Rodriguez
  • date: 2020-03-22

Introduction Artificial Intelligence can be researched much better if a concrete problem is given. In the field of computer animation such challenges are widespread. A typical example is to animate a human hand, but biped locomotion and complex swarm animations are also well-known examples. Since the 1980s these problems have been addressed by the computer animation community which was built around the SIGGRAPH conference. Major contributions like behavior-based animation, motion graphs and motion capture learning with neural networks were invented within this community first.

What these concepts have in common is that they use Artificial Intelligence as a tool. Similar to the definition of Weak AI, the hope is that with techniques like planning it is possible to generate an animated character. The first aim is to animate something on the screen, and AI is only the tool for realizing it. In contrast to other Weak AI problems within the robotics domain, computer animation is much easier to achieve: no hardware in the loop and no sensor measurements are needed to animate a virtual avatar. The control problem is reduced to its core.

The computer animation subject can be divided into two subgroups: graphics drawing and vector animation. Graphics and image blitting are well known because they have to do with graphics programming itself. The OpenGL standard is widely used to display 2d and 3d images on the screen with the help of the graphics card. The more interesting part of computer animation has to do with drawing the vector lines and calculating the position of a sprite. Before the sprite can be drawn by the graphics card, the algorithm has to determine whether it's at position p1 or at p2. One option is to write an algorithm; other options are motion capture recording and all sorts of weak Artificial Intelligence techniques. The most important feature of animation-driven AI is that the programmer gets feedback. After the program is started, the character moves on the screen, and if the AI algorithm isn't working perfectly, the animation looks unnatural. Computer animation can be seen as a perfect testbed for realizing all sorts of AI algorithms. The generated animation provides direct feedback on whether a program makes sense or not.

History [carlson2017computer]

Keyframes Computer animation works a bit differently from normal robotics control, because computer animation has a longer history which is grounded in keyframe animation. Before the first computers were invented, animated movies were created by hand. The idea is that a sequence of motions consists of stop-motion frames which are played back at high speed.

Later, the concept of keyframe animation was transferred into the digital age. Most software works on the same principle: the artist defines a static keyframe every 3 seconds and the animation software creates the transitions. What makes this workflow relevant for AI research is that a keyframe animation is basically a planning problem. The first question is which poses the keyframes need, and the second problem is to animate the in-between clips.[huang1996planning]
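
The transition part can be illustrated in a few lines of Python. A minimal sketch of linear in-betweening between two hypothetical poses; production software would use splines and more elaborate blending:

 def inbetween(key_a, key_b, t):
     """Linearly blend two keyframe poses (tuples of joint values), t in [0, 1]."""
     return tuple(a + t * (b - a) for a, b in zip(key_a, key_b))

 pose_start = (0.0, 10.0)   # keyframe at second 0
 pose_end = (90.0, 40.0)    # keyframe at second 3
 for frame in range(4):
     print(inbetween(pose_start, pose_end, frame / 3))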

From finite state machines to action learning The basic form of controlling an AI character is a finite state machine (FSM). A FSM is a simple form of a computer program. The next improvement over a FSM is a hierarchical FSM, which looks similar to Behavior Trees and the Scripting AI principle. The idea is that an AI character consists of subfunctions which are specified by single steps. This allows the implementation of more complex behavior.

Behavior Trees and hierarchical finite state machines have the disadvantage that they are difficult to realize and too static for complex situations. The more elaborate way to specify the behavior is goal-oriented action planning (GOAP), which is a model-in-the-loop planning technique. First, a model is created, which is some kind of abstract game engine, and then the solver searches the model for a desired state. The disadvantage of GOAP is easy to identify: the programmer has to create the model, which might be difficult to realize. The more advanced technique beyond GOAP is action learning.[giovannangeli2008autonomous] Here the GOAP model isn't created manually but is induced with machine learning techniques. If the model is available, it can be utilized by the solver to find the correct sequence of actions, which is equal to the AI character's plan.

The advantage of GOAP over a normal finite state machine is that GOAP is model-based. Model-based means that the source code can't be executed directly; an external solver has to send a request to the model first, and then the model provides the correct action sequence back. The separation into a model and a solver makes maintenance easier, which is especially needed for complex domains.
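
The separation can be made concrete with a toy sketch: the model is a table of actions with preconditions and effects (an invented three-action domain), and the solver is a plain breadth-first search over it:

 from collections import deque

 # Minimal GOAP sketch: the model lists preconditions and effects,
 # the solver searches the model for a sequence reaching the goal.
 ACTIONS = {
     "pickupgun": (set(), {"hasgun"}),
     "loadgun": ({"hasgun"}, {"gunloaded"}),
     "shoot": ({"gunloaded"}, {"targetdown"}),
 }

 def plan(start, goal):
     queue = deque([(frozenset(start), [])])
     visited = {frozenset(start)}
     while queue:
         state, actions = queue.popleft()
         if goal <= state:
             return actions
         for name, (pre, effect) in ACTIONS.items():
             if pre <= state:
                 nxt = frozenset(state | effect)
                 if nxt not in visited:
                     visited.add(nxt)
                     queue.append((nxt, actions + [name]))
     return None

 print(plan(set(), {"targetdown"}))  # ['pickupgun', 'loadgun', 'shoot']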

GOAP was first introduced in the gaming industry for creating believable characters. A famous title was F.E.A.R. (2005):

“The F.E.A.R. agent architecture employs a modified version of the cognitive architecture as described by the Synthetic Characters Group or C4 from MIT”[long2007enhanced]

Finite state machines work with the same technique as the first industrial robots, which were programmed by scripts. The idea is to provide the sequence of actions as a computer program which is executed from top to bottom. If the robot's movement should be adapted, the step sequence has to be changed. A finite state machine is the concrete realization of this in computer software. The transition to the more powerful GOAP architecture is a milestone in AI history. GOAP is equal to AI planning. It was first invented during the Shakey robot project (1972) on top of the STRIPS language and was later adapted by the gaming industry. The model in the F.E.A.R. game is basically a large PDDL file which provides a game engine. This model is used by the planner to determine the actions of the non-player characters.

Often, GOAP is seen as too complex because the syntax of the STRIPS-like modeling language is hard to grasp. The programmer has to implement preconditions, effects and hierarchical methods, and it is difficult to adapt this technique to a certain domain. It is important to know that GOAP can be realized independently of STRIPS. STRIPS is only one language for programming a game engine. What GOAP needs is an underlying abstract game engine, which can be programmed in any language. It can be a neural network, a STRIPS model, a game description language engine, or it can be written in Python. The only thing which is important is that the game engine reproduces the reality.

References

  • [carlson2017computer] Carlson, Wayne E. Computer Graphics and Computer Animation: A Retrospective Overview. 2017.
  • [huang1996planning] Huang, Pedro S., and Michiel van de Panne. "A planning algorithm for dynamic motions." Computer Animation and Simulation’96. Springer, Vienna, 1996. 169-182.
  • [giovannangeli2008autonomous] Giovannangeli, Christophe, and Philippe Gaussier. "Autonomous vision-based navigation: Goal-oriented action planning by transient states prediction, cognitive map building, and sensory-motor learning." 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2008.
  • [long2007enhanced] Long, Edmund. "Enhanced NPC behaviour using goal oriented action planning." Master's Thesis, School of Computing and Advanced Technologies, University of Abertay Dundee, Dundee, UK (2007).

Paper 4

Peer review A pre-publication peer review was made by an anonymous peer reviewer. In general, he was satisfied with the paper but added some key points:

  • It's important to see AGI and machine learning together
  • allusion describes relations between facts
  • language can be seen as actions

paper itself There are many forms and applications of AI. Many are in mainstream use today. These are mainly considered weak AI, in that they are only applications of AI techniques, not computers that are taking over the world. Strong AI is the quest for artificial intelligence that matches or exceeds human intelligence. The basic idea is that when we have achieved strong AI, computers will take over the world in one form or another. Perhaps we will make great pets. Artificial General Intelligence (AGI) has the same meaning.

Is Strong AI possible? It should be possible. Even a naive simulation of the whole human brain would cost just $40 per hour:

  • The human brain has ~10^7 cortical columns.
  • One 1 GHz CPU could simulate 10,000 cortical columns, so about 1 THz of total computing power is needed.
  • The 1 THz is for rent on Amazon EC2 (1000 * 1 GHz = 50 Extra Large instances = 50 * $0.8 = $40 per hour).

Silicon transistors are much faster than brain neurons (1 GHz vs. ~200 Hz). For more, see chapter 7.1, Are super intelligent machines possible?, of Shane Legg's thesis.[Legg2008]
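
The arithmetic behind the estimate, as a quick sanity check; the 20 x 1 GHz per Extra Large instance and the $0.80 hourly price are the figures assumed in the list above:

 # Back-of-envelope check of the brain-simulation cost estimate.
 columns = 10**7                   # cortical columns in the brain
 columns_per_cpu = 10_000          # simulated per 1 GHz CPU (assumption)
 cpus = columns / columns_per_cpu  # 1000 CPUs = 1 THz in total
 instances = cpus / 20             # assuming 20 x 1 GHz per Extra Large instance
 print(cpus, instances, instances * 0.80)  # 1000.0 50.0 40.0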

How to Test Strong AI? There is currently no concrete test for strong AI. Defining what it means to be intelligent is a difficult task. There have been some tests proposed, such as the Turing test, but they are poorly defined. So, let's make our own test on this wiki; share your ideas on the Strong AI Test page.

How to do it? The following approaches are well described in chapter 7.2, How could intelligent machines be developed?, of Shane Legg's thesis.[Legg2008]

Theoretical approaches Reinforcement Learning provides a nice theoretical framework to describe the problem of intelligence. It is possible to describe the optimal intelligence by a mathematical equation; it just needs to be able to do universal sequence prediction. Informally, the universal AI predicts the outcomes of possible actions and selects the best one.

A scaled-down version could use some existing algorithms for sequence prediction, for example machine learning, prediction by partial matching (PPM), probabilistic context-free grammars (PCFG), ... There is a need for good generalization: the AI should recognize when two states lead to the same future. Forgetting unimportant details is an advantage.
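
For flavor, a minimal order-1 frequency predictor; real PPM blends several context lengths, so this is only a sketch of the idea:

 from collections import Counter, defaultdict

 # Order-1 sequence predictor: predict the next symbol from the
 # counts of symbols seen after the current one.
 def train(sequence):
     model = defaultdict(Counter)
     for context, nxt in zip(sequence, sequence[1:]):
         model[context][nxt] += 1
     return model

 def predict(model, context):
     counts = model.get(context)
     return counts.most_common(1)[0][0] if counts else None

 model = train("abcabcabd")
 print(predict(model, "b"))  # 'c' (seen twice after 'b', 'd' only once)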

Brain simulation The brain is not magic. Thinking, recognition, planning and other high-level actions are done by the neocortex, the grey matter. The white matter is just cables connecting the neurons. The neocortex structure is uniform; different regions are specialized only because of their wiring to the sensors. There could be a single algorithm applied all over the brain. For more, read the On Intelligence pop-sci book, or watch Jeff Hawkins' videos on YouTube.

Evolution When provided with a good measure of intelligence, genetic programming algorithms could try to evolve it.

Strong AI is equal to a captcha A human user is utilized as an oracle who provides the answer to a question posed by a computer. For example, a captcha asks what can be seen in a picture, and the human clicks the button for “5” because this number is visible in the captcha box. A captcha and a Turing test are the same idea: they are utilized for measuring human intelligence. They make it possible to detect whether humans have basic skills in natural language, image recognition and audio listening. Such tests are equivalent to a Strong AI test: an experiment in which a software has to decide whether the human fulfills certain criteria.[Yampolskiy2013]

Result of polls When Will Strong AI Emerge? When do you think strong AI will evolve beyond human intelligence?

  • Next 5 years: User:Nicholasjh1
  • Next 10 years: User:Pygmalion pb
  • Next 30 years: User:Fidlej
  • Next 50 years: User:Bobwrit

What Will AIs Do With Humanity? When strong AIs evolve beyond human intelligence, what will they do with humanity? Do we have a Terminator future awaiting, or something more like Buck Rogers?

Our Future                      Users
Ignore us                       Do we ignore stars? User:Fidlej
Make us pets                    User:Pygmalion pb
Put us in a zoo
We will become cyborgs          User:Nicholasjh1
Exterminate us
Some combination of the above   User:Bobwrit

Play the game Endgame: Singularity to see one possible ending.

Links

  • Machine Learning — Past and Future
  • AGI 2008 Conference videos

Strong AI Researchers:

  • Jürgen Schmidhuber
  • Satinder Singh
  • Shane Legg

References

  • [Legg2008] Legg, Shane. Machine super intelligence. Diss. Università della Svizzera italiana, 2008.
  • [Yampolskiy2013] Yampolskiy, Roman V. "Turing test as a defining feature of AI-completeness." Artificial intelligence, evolutionary computing and metaheuristics. Springer, Berlin, Heidelberg, 2013. 3-17.


Paper 5

  • title: Motion capture for teleoperated robots
  • author: Manuel Rodriguez
  • date: 2020-03-26

After programming a robot control software, it's certain that the system won't work. After starting the robot program, the machine will behave like a random generator and fail to fulfill simple tasks. Let us make an example: somebody has created a two-wheeled robot with Lego Mindstorms, and the task is that the robot avoids obstacles and reaches a point on the map. The funny thing is that it's not clear what the program should look like. A self-created program in Java will fail to fulfill the goal: either the robot collides with an obstacle, it can't locate itself on the map, or something else is wrong with the navigation system. Somebody may argue that these are only detail problems which can be fixed with a better programming language and with advanced algorithms, but they can't be. It has to do with robot programming in general. That means the task of programming a wheeled robot is too complicated for today's computer scientists.

To overcome the issue and build robots which work better, the first thing to do is to describe the problem more precisely. In the first example, the assumption was that the problem is equal to robot programming: there is a physical robot given which can execute a Java program, and the programmer has to write the code. This problem description works well for creating normal software which runs on a desktop PC, but it fails for controlling robots. A more elaborate problem description isn't available for a robot and has to be developed first. A possible attempt in this direction is “Learning from demonstration”, which will be explained next.

Before the problem can be described, it makes sense to analyze which technology is working well. Suppose the human operator is allowed to use a joystick to move the robot towards the goal. This technique works well: it's not very hard to program software which transmits the joystick signal to the robot, and the human operator has no problem pressing the correct buttons at the right time to navigate the robot. Now it's time to modify the teleoperation control a bit to make it more autonomous.

The next step after controlling a robot with a joystick is to create an “action recognition” software. That is a piece of code which monitors the teleoperation. As in normal teleoperation control, the human operator presses buttons on the joystick to move the robot to the goal. The new thing is that all the events are recognized: if the robot collides with an obstacle, a message is drawn to the screen, and if the robot reaches the goal, another message is shown.
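
A sketch of such an event monitor, assuming the robot pose, the obstacles and the goal are available as 2d coordinates:

 from math import dist

 # Hypothetical action recognition for a teleoperated robot: while the
 # human drives, the monitor reports collision and goal events.
 def recognize_events(robot, obstacles, goal, radius=1.0):
     events = []
     for obstacle in obstacles:
         if dist(robot, obstacle) < radius:
             events.append("obstacle collision")
     if dist(robot, goal) < radius:
         events.append("goal reached")
     return events

 print(recognize_events((5.0, 5.0), obstacles=[(5.2, 5.1)], goal=(9.0, 9.0)))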

A task recognition system can be used in two directions: it monitors existing actions, but it's also a model for producing actions. A task model is some kind of game in which actions can take place. In the given example the task model consists of two possible events: obstacle collision and goal reached. These events form a formal rule system. That means, from a software perspective, the robot's world consists only of these sorts of actions and nothing more.

In the next step, the question can be answered of how to control the robot autonomously in this newly created game. This task is equal to writing a narrow AI which can play a simple game like pong or tetris. The precondition for a narrow AI is that the game rules are known and formalized in a computer program. Then a heuristic-based solver can be created on top which brings the system into a goal state.

The bottleneck for robot control is that no formal game is available. A game is a formalized problem space which defines actions. This problem space has to be created first, and the way to do so is to program an action recognition system which monitors a teleoperated robot.

Now it's possible to explain why normal robot programming doesn't work very well. Suppose a newbie has built a physical robot and likes to program the system in Java. He will struggle because it's not clear what the problem is. That means the task is not reduced to a simple-to-solve programming problem; the task is not defined at all. It's not possible to write Java code for a problem which is unspecified. The consequence is that the programmer will struggle. Let us go a step backward and describe some well-defined programming problems.

A problem which can be solved by writing code has a very limited scope. For example, one well-defined problem is “print out the prime numbers from 2 to 100”. Creating Java code which can solve such a problem is not very hard. Another well-defined problem is to search an array for a certain number. This task can also be mastered easily with a programming language.

Robotics control has one important bottleneck: the problem specification is too broad. Even if the rule book of a robotics competition explains in detail what the robot has to do, for example to move to a goal position, the specification is not detailed enough. Instead of figuring out how to solve unspecified problems, the better idea is to ask a human operator to solve the issue, and to take the human demonstration as input for creating a “task monitoring system”. That means the first step is not to create the robot control software; in the first step, the problem space is explored.

Motion database A motion database is a key element of an action recognition system. It consists of predefined movements which can be detected in reality. A motion database is used to monitor the movements of a teleoperated robot: the human operator moves the robot with a joystick and the motion database is used to analyze the movement.

Postpone the robot controller The main idea behind Learning from demonstration is to avoid the creation of a robot control software at all. It's some kind of robot software without a robot. Instead, the question which is answered is which kind of preparation steps are needed before a robot control system can be realized. The hope is that after answering the pre-steps, it will become easier to create the robot controller.

Pre-steps before the robot control system can be built are:

  1. demonstration of a trajectory by a human operator
  2. multiple demonstrations are stored in a database
  3. determine a reward function
  4. judge a human demonstration with a score

These steps have in common that the human is doing the task manually, using a joystick to control the robot. In none of the cases is an autonomous robot needed, so this reduces the effort of programming such a robot to zero. Instead, the programming goal is quite different: it has to do with learning from demonstration and all the steps needed towards this goal.
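
A minimal sketch of the four pre-steps under assumed data structures: a demonstration is a list of 2d positions, the database is a plain list, and the path-length reward is a placeholder:

 from math import dist

 demonstrations = []                    # step 2: database of demonstrations

 def record_demonstration(trajectory):  # step 1: trajectory from the operator
     demonstrations.append(trajectory)

 def reward(trajectory):                # step 3: a hypothetical reward function
     return -sum(dist(a, b) for a, b in zip(trajectory, trajectory[1:]))

 def judge(trajectory):                 # step 4: score a human demonstration
     return reward(trajectory)

 record_demonstration([(0, 0), (1, 1), (2, 2)])
 print(judge(demonstrations[0]))        # -2.83 (negated path length)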


Stable downstream

Issue #01

AI and robotics journal, Issue 01, 2020-03-21

Preface

This is the first issue of the AI and robotics journal. It's an experimental issue which should test whether the concept makes sense. The submission pipeline of the journal consists of an upstream and a downstream folder. The journal itself is stored in the downstream folder because it's addressed to the normal reader.

For the first issue, two papers were taken as submissions. They were both created by myself and have never been published before. They describe a learning from demonstration pipeline for building powerful robots. The disadvantage is that the papers are very short.

Manuel Rodriguez: Plan monitoring with options

Macro-actions 1x1

Instead of defining options and macro-actions from a mathematical standpoint, the idea is to describe the workflow from the perspective of realizing such systems. Suppose a teleoperated robot arm is available. The human controls the arm with a joystick and the task is to recognize, with software, what operation the human is doing. Before we can do so, we need a plan notation. That is equal to a grammar which describes movement primitives like pushobject, moveto and ungrasp. These skills are equal to options within the reinforcement learning paradigm.

The next step is to ground the actions in reality, which means to detect when the human has executed such an action. This is realized either in source code or with a neural network. Now we can start the activity parser and it will discover the previously implemented options within the mocap recording. The human operator pushes an object with the robot arm, and on the screen it is shown: “option pushobject was activated”.

All the macro-actions have in common that they are realized over a longer period of time. For example, the “pushobject” skill takes around 2 seconds, which is equal to 60 frames, and during that period the joystick is moved in a certain direction. This is called temporal abstraction, because the skill refers to a period of 2 seconds, which is long.

Plan notation

The first step is to invent a plan notation. Technically, it can be realized as an addition to a physics engine. New high-level commands (better known as options) are implemented. For example, the action “forward10steps” moves the robot 10 steps forward. The action is grounded, which means that after executing the action name, the robot's new position is written into the physics engine.

The next step is to write a plan into a table, for example “forward10steps, forward2steps, left, stop”. Each of the planned actions can be executed in the physics engine. If the complete plan is executed, a visible notation is drawn to the screen.

Now a human operator can move the robot by hand. During the execution, the plan is monitored. Does the human operator fulfill the plan? Does he execute the actions in the correct sequence? All these questions can be answered by the plan recognition module.

So in general, plan recognition is playing a derivative game. On top of the normal game, for example a racing game, a new kind of game is constructed which is called “follow the plan”. The subgoals of the game are formalized in the plan notation and the human operator has to fulfill the task. The interesting aspect is that the plan and the plan notation are created according to the needs of the operator. That means, if the operator likes to learn how to park a car, then the plan notation consists of options which have to do with parking maneuvers.

If all the parts work well, which includes the plan notation, the definitions of the options in the game engine and the plan monitoring, it's time to construct a solver which can follow the plan autonomously, which means without human intervention. This is possible because the plan language consists of a set of actions which can be planned in a graph. This allows the solver to reduce the state space drastically. It only tests out possible actions until the next step in the given plan. It isn't playing the complete game until it wins a reward; instead, the AI system optimizes its behavior according to the plan. The computational effort in doing so is low.

Manuel Rodriguez: Creating the STRIPS domain knowledge file

In classical AI planning the major bottleneck is the absence of a STRIPS file which contains the domain knowledge. The planner itself works great, but if the game isn't formalized with abstract actions, including preconditions and effects, it makes no sense to start the solver. A first step to overcome the issue is to create the STRIPS domain file not manually but with machine learning. The idea is to acquire the domain knowledge with a stochastic algorithm from user demonstrations. Unfortunately, this results in a black-box system, which means it's unclear how the domain was mapped to the machine-readable representation.

The better idea is to create the STRIPS file by hand. For doing so, the human operator needs the ability to bugfix the existing domain description. This is possible with interaction in a telerobotics environment, called “plan recognition”. Plan recognition is first and foremost a setup which allows the programmer to create and modify an existing STRIPS file. To understand why interaction is important, we have to describe a non-interactive STRIPS creation workflow.

In classical AI planning, the planner is at the center of attention. The planner is able to search the state space of a STRIPS file for a goal node. The planner gets started, it searches for a node, and after some time the system has produced the steps towards the goal. If the planner fails in doing so, the programmer can only fix the planner. That means he can think about a faster way of traversing the STRIPS game tree. The problem is that the planner doesn't have a bug. Writing source code which searches a tree for a goal node isn't that hard. All the planners for STRIPS domain files work great and without error, and creating a new solver from scratch can be done in under 20 lines of Python code.

The more likely reason why the planner fails to solve a game is that the input STRIPS file is wrong, or missing altogether. The required pipeline for fixing the STRIPS file is the opposite, which means that the STRIPS file is used for annotating a game log. Let us go into the details. The human operator plays a game, for example the pong game. He performs some actions on the screen and the software in the background has to identify the action names. There are two cases: either the plan recognition engine is able to detect that the ball touches the player's paddle and in which direction the ball is flying, or it is not powerful enough and fails to recognize the gameplay. In the second case, the system can be fixed. That means the programmer can edit the source code of the STRIPS file, add the needed features and run the game again. Either he has fixed the problem or not. It's an interactive process which results in a working STRIPS domain description. And yes, it can be tracked with a version control system like git.

I want to give a concrete example for the sokoban domain of how an interactive improvement of the plan library works. In the first iteration, the plan library consists of a single action: the system is able to detect whether the player is above the box or not. The programmer is able to play the game and test the plan recognition system. He will notice that each time he is above the box, the correct task name is shown on the screen. In the second step, the programmer decides to improve the plan library. He adds a function to detect whether the box was moved to the goal position. After implementing the new feature, the programmer starts the game again and plays it manually. He will notice that the system is able to detect many more events.

The procedure can be repeated many times, until the STRIPS-like domain description fits the needs of the programmer. It's a software engineering problem which can be handled similarly to normal programming. That means the programmer first creates a new issue in the issue tracker, then improves the STRIPS file, and then commits the changes into the repository with the git version control system. Each step includes manual testing of whether the added feature works or not. Testing means that the programmer plays the game manually and the software annotates the game log.

sourcecode

 # version 1
 def task(self, name):
     if name == "movetobox":
         # move the player to the cell directly above the box
         self.player = (self.box[0], self.box[1] - 40)

 # version 2
 def task(self, name):
     if name == "movetobox":
         self.player = (self.box[0], self.box[1] - 40)
     elif name == "pushboxtowaypoint":
         # place the box at the goal position
         self.box = (340, 260)


Issue #02

  • AI and robotics journal, Issue 02
  • paper3 and paper4 are prepared for insertion in issue #02 of the journal. Feedback from a different user would help in taking the decision.
  • paper4 was peer-reviewed; the comments were added into the upstream section
  • from my perspective, paper3 and paper4 can be published in issue #02 of the journal, but I will wait for further comments