Designing Context-Sensitive Norm Inverse Reinforcement Learning Framework for Norm-Compliant Autonomous Agents

Title: Designing Context-Sensitive Norm Inverse Reinforcement Learning Framework for Norm-Compliant Autonomous Agents
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Guo, Y., Wang, B., Hughes, D., Lewis, M., Sycara, K.
Conference Name: 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN)
Date Published: Sept. 2020
ISBN Number: 978-1-7281-6075-7
Keywords: autonomous agents interact, CNIRL, context spaces, context-sensitive norm inverse reinforcement learning framework, context-sensitive norm IRL, context-sensitive RL, data privacy, ethical social norms, expert demonstrations, expert systems, human behaviors, human factors, IRL framework, learning (artificial intelligence), Markov processes, multi-agent systems, norm-compliant autonomous agents, privacy, privacy norm, pubcrawl, Scalability, social norm-compliant behavior

Human behaviors are often prohibited or permitted by social norms. Therefore, if autonomous agents interact with humans, they also need to reason about legal rules as well as social and ethical norms, so that they will be trusted and accepted by humans. Inverse Reinforcement Learning (IRL) can be used by autonomous agents to learn social norm-compliant behavior from expert demonstrations. However, norms are context-sensitive, i.e., different norms are activated in different contexts. For example, the privacy norm is activated for a domestic robot entering a bathroom where a person may be present, whereas it is not activated for the robot entering the kitchen. Representing the various contexts in the state space of the robot, as well as obtaining expert demonstrations under all possible tasks and contexts, is extremely challenging. Inspired by recent work on Modularized Normative MDP (MNMDP) and early work on context-sensitive RL, we propose a new IRL framework, Context-Sensitive Norm IRL (CNIRL). CNIRL treats states and contexts separately, and assumes that the expert determines the priority of every possible norm in the environment, where each norm is associated with a distinct reward function. The agent chooses actions to maximize its cumulative reward. We present the CNIRL model and show that its computational complexity is scalable in the number of norms. We also show via two experimental scenarios that CNIRL can handle problems with changing context spaces.
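
The idea of keeping contexts separate from states, with one reward function per norm and a context-dependent priority on each norm, can be illustrated with a small sketch. The abstract does not say how CNIRL combines the per-norm rewards, so the weighted sum below is an assumption, as are all names, numbers, and the two-action bathroom/kitchen setup. The sketch also picks a one-step greedy action, which is a simplification of maximizing cumulative reward, and it does not perform any learning from demonstrations.

```python
from typing import Callable, Dict, Sequence

# Hypothetical type aliases; the paper's actual state/context encoding is not given here.
State = str
Action = str
Context = str
RewardFn = Callable[[State, Action], float]


def privacy_reward(state: State, action: Action) -> float:
    """Hypothetical privacy-norm reward: penalize entering through the bathroom door."""
    return -1.0 if (state == "bathroom_door" and action == "enter") else 0.0


def task_reward(state: State, action: Action) -> float:
    """Hypothetical task reward: entering a room makes progress on the task."""
    return 1.0 if action == "enter" else 0.0


# One distinct reward function per norm (the task is treated like a norm here).
NORM_REWARDS: Dict[str, RewardFn] = {
    "privacy": privacy_reward,
    "task": task_reward,
}

# Assumed context-dependent priorities. In an IRL setting these would be
# inferred from expert demonstrations; here they are hard-coded for illustration.
PRIORITIES: Dict[Context, Dict[str, float]] = {
    "person_may_be_present": {"privacy": 5.0, "task": 1.0},  # privacy norm active
    "no_privacy_concern": {"privacy": 0.0, "task": 1.0},     # privacy norm inactive
}


def combined_reward(state: State, action: Action, context: Context) -> float:
    """Priority-weighted sum of per-norm rewards (an assumed combination rule)."""
    weights = PRIORITIES[context]
    return sum(weights[name] * fn(state, action) for name, fn in NORM_REWARDS.items())


def greedy_action(state: State, context: Context,
                  actions: Sequence[Action] = ("enter", "wait")) -> Action:
    """Pick the action with the highest one-step combined reward."""
    return max(actions, key=lambda a: combined_reward(state, a, context))


if __name__ == "__main__":
    print(greedy_action("bathroom_door", "person_may_be_present"))
    print(greedy_action("kitchen_door", "no_privacy_concern"))
```

Because the context only selects a set of priorities, adding a new context in this sketch means adding one entry to `PRIORITIES` rather than enlarging the state space, which is the kind of separation the abstract describes.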

Citation Key: guo_designing_2020