Main research question:
In some cases, it is not possible to specify what an artificial learner should accomplish at design time. This would for example be the case when a system is shipped to many different users, each with a unique set of preferences, or when designers cannot foresee all situations that the learner will encounter, or all tasks that it will be expected to perform. Specifying what should be done, for example by defining a reward function, can also be difficult, even for a known task in a known environment. Without a pre-specified goal, learners will need to infer what should be done. My main research interest is the theory, implementation, and experimental evaluation of artificial learners that deal with this issue by figuring out, through interaction, what a human teacher would like the learner to do.
A given algorithm that modifies a policy based on interactions with a teacher can be viewed as built on top of an implicit or explicit interpretation of teacher behavior. An algorithm that, for example, maximises the number of times a teacher presses a positive evaluation button, minus the number of times the teacher presses a negative evaluation button, is making a set of assumptions regarding how to interpret what the teacher means by pressing these buttons. We can compare the accuracy of different interpretations since we know the purpose of the system: to do what the human teacher would like the learner to do. If the learner were trying to do something else, this accuracy comparison might not be valid. But given this known goal, the interpretations that an algorithm is built on top of can be bad, in a well-defined way. Consider an interaction where a learner receives one negative evaluation for creating a problem, and then multiple positive evaluations for a series of actions that resolve the problem and return the agent to the original state. If the evaluations are interpreted as something that should be maximised, then the implication is that it would be a good idea to go through this cycle as often as possible. The human behavior is (explicitly or implicitly) interpreted as urging the agent to take the action that creates the problem. Alternative interpretations are possible, for example that the human is evaluating the action choice of the learner. The interpretation of the exact same human behavior would then instead be that the learner should not take the action that creates the problem (as this action choice received a negative evaluation). It also implies that the learner should solve the problem, but only if the problem has already been created. Alternative interpretations can lead to new algorithms, such as the Policy Shaping algorithm, which is built on top of the action evaluation interpretation (described in the past research section; see also the 2015 IJCAI paper and the 2016 AAMAS paper in the publications section). Similar interpretation questions can be asked for other interaction modes, for example: "what assumptions would we be making if we were maximising smiles minus frowns?". I think these foundational interpretation issues must be dealt with if one wants to build an artificial learner that is able to figure out what a human teacher would like it to do. Working on these issues in the context of social signals would be a possible topic for my next postdoc (see the future direction section for more details).
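The contrast between the two interpretations can be made concrete with a minimal Python sketch of the toy interaction above. The per-action probability formula is the one commonly used in the Policy Shaping literature; the consistency value C and all action names are illustrative assumptions, not part of any specific implementation.

    # Toy interaction from the text: one negative evaluation for creating a
    # problem, then three positive evaluations for the actions that fix it.
    feedback = [("create_problem", -1),
                ("fix_step_1", +1), ("fix_step_2", +1), ("fix_step_3", +1)]

    # Interpretation 1: evaluations are a reward signal to be maximised.
    # One full create-and-fix cycle nets +2, so repeating the cycle forever
    # looks like the optimal policy.
    print("return of one cycle:", sum(e for _, e in feedback))  # +2

    # Interpretation 2: each evaluation labels the action choice itself
    # (the reading behind Policy Shaping). With an assumed teacher
    # consistency C, the estimated probability that an action is the one
    # the teacher wants, given delta = (#positive - #negative) labels, is
    #   P(desired) = C**delta / (C**delta + (1 - C)**delta)
    C = 0.9  # illustrative assumption: the teacher is right 90% of the time

    def p_desired(delta, consistency=C):
        return consistency**delta / (consistency**delta + (1 - consistency)**delta)

    print("P(create_problem desired):", round(p_desired(-1), 3))  # ~0.1: avoid it
    print("P(fix_step desired):", round(p_desired(+1), 3))        # ~0.9: do it,
    # but only in states where the problem already exists; the incentive
    # to go through the cycle disappears.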
While a perfect interpretation is not a realistic goal, this does not mean that all interpretations are equal. If we put a robot in an unstructured environment, such as a forest or a city, then a fully accurate and complete model of the world will not be representable in any implementable parameter space. Some models are still better than others. The same principle holds for interpretations of a human teacher. Better models of an unstructured environment can make an agent better (but not perfect) at figuring out how to do things. In the same way, better interpretations of a human teacher can make an agent better (but not perfect) at figuring out what should be done. Finding better interpretations is relevant to designers of artificial learners, because new interpretations imply new algorithms. Imagine a group of drones, operating autonomously for months at a time in a forest or a city, filming birds for a nature documentary. Encoding a success criterion for this task would be difficult. Even implementing simple constraints such as "footage is unusable if the birds are modifying their behavior in response to the drone" would be tricky, let alone trying to describe what constitutes interesting footage. Demonstrations might be tricky to interpret, especially if the demonstrator is not perfect. Evaluations might also be tricky to interpret, for example if the teacher has trouble determining whether or not the birds have actually modified their behavior in response to the drone. Complete and fully accurate models of things such as what constitutes interesting footage, how to interpret the teacher, and how birds move will not be representable in any implementable parameter space. But one could still improve performance by finding better models.
I'm also interested in theoretical problems related to learning from human teachers that are flawed, limited, or mistaken in important ways. Consider a teacher that consistently gives the same type of bad demonstrations due to consistently failing to notice some side effect. Statistical averaging approaches would not be very useful, since the mistake is consistent. The learner could reproduce the actions and observe teacher evaluations, but this will not help much if the teacher still does not notice the side effect. It seems like we need theoretical foundations along the lines of "what would the teacher have said or done if the side effect of these actions had been noticed?". A similar issue arises if the teacher is acting in a way that is good in the context of some specific limitation. Consider a teacher that writes down all passwords on post-its. If we know that simply remembering all passwords is not an option, then we can understand this behavior better. Unless we view the actions of the teacher in the context of this memory limitation, we might conclude that writing down passwords is a mistake, or that the goal is to make the passwords public knowledge (helpfully informing others is a fairly common human behavior, and writing down information is a fairly common way of doing this). Humans instinctively interpret the actions of others within the context of limitations. As with other things that humans do automatically or unconsciously, there is a danger of overlooking this aspect when implementing artificial systems. Another limitation would be a teacher who is unable to make a three-point shot while playing basketball. In this case it might be a good idea to move closer to the hoop, enabling the teacher to try a two-point shot. If the learner is able to make the three-point shot, then the teacher might prefer that the learner take the three-point shot, instead of taking the action that moves the learner closer to the hoop. In this case, the teacher would prefer that the learner not follow the demonstration, despite the demonstration being a success (the action of moving closer to the hoop was good, but only in the context of the limitations of the teacher). Other types of limitations might make it reasonable to buy a monthly gym card instead of the yearly card, because going to the gym regularly for a year might not be something that can be reliably done due to motivation issues.
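The basketball example boils down to a small structural point: whether a demonstrated action is good depends on which action set it is evaluated against. Here is a toy sketch of that point; all action names and expected-point values are illustrative assumptions.

    # Hypothetical toy model: judge an action not against the learner's full
    # action set, but against the set of actions the teacher could execute.
    expected_points = {
        "three_point_shot": 2.1,       # e.g. 70% success rate at 3 points
        "move_closer_then_two": 1.6,   # e.g. 80% success rate at 2 points
        "wild_heave": 0.2,
    }

    teacher_feasible = {"move_closer_then_two", "wild_heave"}  # no 3-pt shot
    learner_feasible = set(expected_points)                    # all actions

    def best(feasible):
        return max(feasible, key=expected_points.get)

    demo = best(teacher_feasible)                  # what gets demonstrated
    print("demonstrated:", demo)                   # move_closer_then_two
    print("optimal for teacher:", demo == best(teacher_feasible))  # True
    print("optimal for learner:", demo == best(learner_feasible))  # False:
    # the successful demonstration should still not be imitated, because it
    # was only good in the context of the teacher's limitations.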
Determining whether a given action is a mistake, or whether it is good within the context of some set of limitations, seems like a non-trivial problem, especially before the learner has figured out what the teacher is trying to do. It is in general not straightforward to define what success means for a learner when teachers are flawed, limited, or mistaken. One can approach these issues with the aim of defining success in new and more complex situations. One can also approach them from the point of view of experimental setup design, deliberately avoiding ambiguities that result in theoretical questions that one is not yet able to answer (for example, avoiding setups that lead to demonstrations that are only good in the context of certain limitations). See the paper "A Social Learning Formalism for Learners Trying to Figure Out What a Teacher Wants Them to Do" in the publications section for an attempt to address some of these issues.
In principle, it would also be possible for a learner to proactively avoid ambiguities that it is not able to deal with. Consider a teacher that inspects an apartment and evaluates a cleaning robot based on how clean the apartment looks, and a cleaning robot that is trying to figure out whether sweeping dust under the rug is acceptable. A positive evaluation from a teacher that has only observed the end result would be difficult to deal with, since a clean-looking apartment reveals nothing about whether the teacher approves of the hidden dust. A simple coping mechanism would be to sweep the dust under the rug only when the teacher happens to be watching. To get informative evaluations, possible mistakes should be highlighted instead of hidden. Even if it is not clear how to deal with all the tricky theoretical issues, one can still use clever coping strategies that would be hard to find if we simply defined away all the complexity (for example by defining the problem as trying to get positive evaluations, as opposed to trying to do what the teacher would like the learner to do).
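As a rough illustration of this "highlight, don't hide" coping strategy, here is a hypothetical decision rule for the cleaning robot: act on confident beliefs, and perform genuinely uncertain actions only when the teacher can observe the process rather than just the end result. The threshold values and names are assumptions made for the sketch, not a fixed algorithm.

    # Hypothetical sketch: a learner that highlights possible mistakes
    # instead of hiding them. Thresholds and names are illustrative.
    def decide(action, p_acceptable, teacher_watching):
        """p_acceptable: the learner's belief that the teacher approves of the action."""
        if p_acceptable >= 0.95:
            return action                   # confident it is fine: just do it
        if p_acceptable <= 0.05:
            return "skip " + action         # confident it is a mistake: avoid it
        # Uncertain case: an end-result-only evaluation would be
        # uninformative (a clean-looking room hides the dust under the rug),
        # so perform the questionable action only while the teacher can see
        # the process, and actively draw attention to it.
        if teacher_watching:
            return action + " + point_it_out_to_teacher"
        return "postpone " + action + " until teacher is watching"

    print(decide("sweep_dust_under_rug", p_acceptable=0.5, teacher_watching=True))
    print(decide("sweep_dust_under_rug", p_acceptable=0.5, teacher_watching=False))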