My publications:
Thomas Cederborg, Kaushik Subramanian, Himanshu Sahni, Ishaan Grover, Charles L. Isbell, and Andrea Thomaz. Evaluating and Extending the Policy Shaping Algorithm. 2018: Artificial Intelligence Journal. In preparation.
This paper covers the experiments in the IJCAI 2015 and AAMAS 2016 papers described below, and adds two extensions to the algorithm. The first extension generalises Policy Shaping so that, in addition to critique, it can also learn from demonstrations and from Explicit Action Advice (in EAA, a teacher is shown a state and can either recommend an action or warn against one). The second extension is a parameter-free version of the algorithm that learns the meaning of teacher behavior autonomously. An additional user study was performed, and several experiments show that the extended algorithm works.
Thomas Cederborg. Artificial Learners Adopting Normative Conventions from Human Teachers. 2017: Paladyn Journal of Behavioral Robotics.
This survey covers a wide range of research relevant to the creation of artificial learners trying to figure out what a human teacher would like them to do. This includes implemented artificial learners, social signal processing, theory, research relevant to understanding human teachers, and research about certain types of biological learners that can serve as inspiration for artificial learners. I make a distinction between learners that are trying to do what a human would like them to do on the one hand, and learners that (implicitly or explicitly) are trying to extract something from a human on the other hand (for example rewards, or information about how the world works). This distinction is important for both existing implemented artificial systems, and for biological learners that might be used as inspiration for implemented systems.

File: cederborgssurveyarticle2017.pdf (350 kb)
Himanshu Sahni, Brent Harrison, Kaushik Subramanian, Thomas Cederborg, Charles L. Isbell, and Andrea Thomaz. 2016: Policy Shaping in Domains with Multiple Optimal Policies. International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS).
This paper explores the effect of different teacher strategies on a Policy Shaping agent. The basic issue is that, for a given state, there might be multiple actions that are all optimal. The question is then how the teacher should evaluate actions: for example, give positive evaluations to all optimal actions, or only to actions that conform to one specific optimal policy. We needed a set of domains where it was possible to map out all optimal policies and to know the exact number of optimal actions in any given state, so we created a series of simple gridworlds designed with this in mind. It can also be difficult to know what strategy a noisy human teacher is actually using when giving feedback, so we created several types of simulated teachers, each following one of these well-defined strategies.
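The two teacher strategies mentioned above can be illustrated with a minimal sketch. This is not the code used in the paper: the state names, the action names, the tables OPTIMAL_ACTIONS and TARGET_POLICY, and both function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set

# Hypothetical toy domain: for each state, the set of optimal actions.
OPTIMAL_ACTIONS: Dict[str, Set[str]] = {
    "s0": {"up", "right"},  # two actions are equally good here
    "s1": {"right"},
}

# Hypothetical: the one optimal policy a "single policy" teacher has in mind.
TARGET_POLICY: Dict[str, str] = {"s0": "up", "s1": "right"}


def all_optimal_teacher(state: str, action: str) -> int:
    """Return +1 for any optimal action, -1 otherwise."""
    return 1 if action in OPTIMAL_ACTIONS[state] else -1


def single_policy_teacher(state: str, action: str) -> int:
    """Return +1 only for the action chosen by one specific optimal policy."""
    return 1 if TARGET_POLICY[state] == action else -1


if __name__ == "__main__":
    # "right" is optimal in s0, but the two teachers disagree about it.
    print(all_optimal_teacher("s0", "right"))
    print(single_policy_teacher("s0", "right"))
```

In this toy setup, the two teachers only disagree in states with more than one optimal action, which is the situation the gridworlds were designed to expose.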

File: multipleoptimalpolicies.pdf (1019 kb)
Thomas Cederborg, Ishaan Grover, Charles L. Isbell, and Andrea Thomaz. 2015: Policy Shaping With Human Teachers. International Joint Conference on Artificial Intelligence (IJCAI).
Besides being the first evaluation of the Policy Shaping algorithm on human teachers, this paper also explored the effects of different verbal instruction conditions, and tried to learn from silence (a teacher observed a learner playing Pac-Man, the game kept running during evaluation, and the teacher was always free to not press anything for a given state-action pair). Silence was treated as an evaluation, and we checked what happened when we interpreted silence as positive or as negative critique. The human teachers provided more useful data than the particular type of noise-free simulated teacher that we used, despite the flawed critique generated by the human teachers. The hypothesis is that there are often multiple correct actions in Pac-Man, and that humans were better at dealing with this than the specific type of simulated teacher that we used (the simulated teacher had a single optimal policy, and simply gave positive or negative feedback based on whether or not the agent conformed to that policy). This issue was explored further in the AAMAS 2016 paper described above.
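To make the role of silence concrete, here is a small sketch of how the interpretation of silence changes the learner's estimate. It assumes the critique model from the original Policy Shaping formulation as I understand it, where the probability that an action is optimal is C^d / (C^d + (1 - C)^d), with d the number of positive minus negative critiques and C an assumed teacher consistency. The function names, the silence_as parameter, and the value C = 0.8 are hypothetical, and this is not the code used in the study.

```python
from typing import List, Optional


def feedback_delta(feedback: List[Optional[int]], silence_as: int = 0) -> int:
    """Sum critiques (+1 / -1); None means silence, counted as `silence_as`."""
    return sum(silence_as if f is None else f for f in feedback)


def prob_optimal(delta: int, consistency: float = 0.8) -> float:
    """Estimated probability that the action is optimal, given net critique."""
    c = consistency
    return c ** delta / (c ** delta + (1 - c) ** delta)


if __name__ == "__main__":
    # Hypothetical feedback for one state-action pair: two silent steps.
    observed = [1, None, None, -1]
    for silence_as in (1, 0, -1):
        d = feedback_delta(observed, silence_as)
        print(silence_as, d, round(prob_optimal(d), 3))
```

With the same button presses, reading silence as positive pushes the estimate towards "optimal", reading it as negative pushes it the other way, and ignoring silence leaves the estimate at 0.5 in this example.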

File: policyshapingwithhumanteachers.pdf (574 kb)
Thomas Cederborg and Pierre-Yves Oudeyer. 2014: A Social Learning Formalism for Learners Trying to Figure Out What a Teacher Wants Them to Do. Paladyn Journal of Behavioral Robotics.
This article presents a theoretical foundation for approaching the problem of how a learner can infer what a teacher wants it to do through strongly ambiguous interaction or observation. The article groups the interpretation of a broad range of information sources under the same theoretical framework. Demonstrations, eye gaze, facial expressions, evaluative buttons, speech comments, EEG readings, etc., are all treated as specific instances of the same general class of information sources. These sources all provide partial and ambiguous information about what the teacher wants the learner to do, and all need to be interpreted concurrently. Learning setups are introduced, and algorithm outlines are presented, to illustrate some of the practical problems that must be overcome. There is a shift in how interpretation is viewed: from a static interpretation of a teacher's behavior to a parameterized hypothesis space that is updated based on observations.

File: cederborgoudeyer2014.pdf (574 kb)
Thomas Cederborg and Pierre-Yves Oudeyer. 2013: Learning words by imitating. In L. J. Gogate and G. Hollich (Eds.) Theoretical and Computational Models of Word Learning: Trends in Psychology and Artificial Intelligence.
Thomas Cederborg and Pierre-Yves Oudeyer. 2013: From Language to Motor Gavagai: Unified Imitation Learning of Linguistic and Sensorimotor Skills, IEEE Transactions on Autonomous Mental Development (TAMD).
This journal article and book chapter both explore a setup where a learner/imitator watches two humans interact: one teacher/demonstrator and one interactant. In one of the experiments, the interactant would sometimes perform a communicative act, either a speech utterance or a hand sign, and at other times there was no communication. The teacher performed a demonstration that was sometimes a response to the interactant, and sometimes a response to some other part of the context. To solve this problem, the learner treats the interactant as any other part of the context. The learner learns how to respond to speech and hand signs, as well as how to respond to the position of an object. Learning linguistic skills is cast as a special case of learning other sensorimotor skills, and a single system is able to learn both linguistic and non-linguistic skills through observation (without needing to be told which of the skills are linguistic). Another experiment focused on imitating operations on internal cognitive structures (see below).

File: cederborgoudeyertamd2013.pdf (6374 kb)
Thomas Cederborg and Pierre-Yves Oudeyer. 2011: Imitating Operations on Internal Cognitive Structures for Language Acquisition, International Conference on Humanoid Robots.
This paper describes another experiment also covered in the TAMD 2013 article mentioned above, and focuses on imitation of internal cognitive operations (in this case a "focus on object" operation that cannot be observed directly, but must instead be inferred). An artificial learner observes one teacher/demonstrator, as well as one interactant. The interactant performs two hand signs, which are treated as part of the context. The teacher responds to one of them by focusing on the object indicated by that sign (there are three objects, with one hand sign for each), and responds to the other hand sign by performing a movement in the reference frame of the object that the teacher is now focusing on (there are three movements, and three movement request signs). The learner/imitator must infer how many different types of hand signs it has observed (since the inputs are continuous), as well as which hand sign triggered the "focus on object" operation and which hand sign triggered the movement type.

File: humanoids2011.pdf (948 kb)
Manuel Lopes, Thomas Cederborg and Pierre-Yves Oudeyer. 2011: Simultaneous Acquisition of Task and Feedback Models. International Conference on Development and Learning (ICDL).
Here, a learner/imitator has initially only a partial model of the feedback protocol, and the task is completely unknown. Perfect knowledge of the feedback protocol would make learning the task easier, and a perfect understanding of the task would make learning the feedback protocol easier. The solution presented was to concurrently update the flawed models of both the feedback protocol and the task. This is an initial step towards learning interpretation hypotheses, and is well suited for exemplifying the type of research that the formalism, eventually published in the 2014 PJBR article mentioned above, is meant to describe.

File: lopescederborgoudeyericdl2011.pdf (385 kb)
Thomas Cederborg, Li Ming, Adrien Baranes and Pierre-Yves Oudeyer. 2010: Incremental Local Online Gaussian Mixture Regression for Imitation Learning of Multiple Tasks, International Conference on Intelligent Robots and Systems (IROS).
A teacher/demonstrator performs a number of demonstrations, but they are not labeled. This means that the learner/imitator does not know the number of tasks, and does not know which demonstrations are instances of the same task. The paper introduces ILO-GMR, a regression technique that does not need to be told how many different tasks are being demonstrated.

File: cederborgetaliros10.pdf (1400 kb)
Thomas Cederborg. 2009: Combining different interaction strategies reduces uncertainty when bootstrapping a lexicon. European Conference on Advances in Artificial Life (ECAL).
A population of initially non-linguistic agents negotiated meanings for words of the type "shape" and "colour", concurrently with words of the type "blue", "round", "red", and "square", by playing language games. If one agent uses the description "gavagai" to refer to an object, then there is referential uncertainty, as it is not clear what aspect of the object has been described. Referential uncertainty can be reduced if two agents observe an object and one asks "what colour is that?" and the other answers "gavagai". Knowing words such as "red" and "blue" makes it easier to learn words such as "colour", and knowing words such as "colour" makes it easier to learn words such as "red" and "blue". Agents concurrently negotiated meanings for both types of words.

File: cederborgecal2009.pdf (161 kb)
My CV:

File: cederborgscv.pdf (93 kb)