CPS: Small: Informed Contextual Bandits to Support Decision-Making for Intelligent CPS
Lead PI:
Daniel Krutz
Co-PI:
Abstract

This NSF project aims to develop a novel computational framework for informed contextual multi-armed bandits (iCMABs) that can operate robustly in complex, time-varying environments. The project will bring transformative change to the way that intelligent decision-making agents are designed for CPS, specifically those that utilize variants of multi-armed bandits. The intellectual merits of the project include: I) designing novel informed contextual bandits that maintain a generative model of their external world/environment, II) designing neural architecture search processes based on neuro-evolution to automatically design these generative models in an online manner, III) providing mechanisms to identify and address corrupt contextual and reward information, and IV) facilitating a process that enables the agent to generate predictions over longer-term horizons by querying its internal generative model. The broader impacts of the project include: I) advancing intelligent CPS through the iCMAB framework, II) providing decision-making modules and processes that readily integrate with many intelligent CPS/operations, and III) making important contributions to the field of machine learning and nature-inspired computing, specifically the automated design of intelligent agents based on artificial neural networks (ANNs).

Intelligent CPS, such as those used for sensor validation and activity decisions, currently face two main challenges: operating robustly in the face of noise, which arises from corrupted, fragmented, and uncertain reward values and context/state information, and adapting and making predictions continually and in real time. Our proposed iCMAB framework will enable CPS to tackle these problems by: I) jointly evolving, in an online fashion, a reward forecasting model and a generative world model of contexts, II) providing a measure of confidence in predictions of both reward and context signals, and III) utilizing evolving recurrent neural networks (eRNNs) and brain-inspired neural systems/mechanisms to predict both contextual and reward information in the streaming data setting while mitigating catastrophic forgetting. This updated information will serve as input to CPS that are driven by contextual bandits, enabling them to take more informed actions.
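
For readers unfamiliar with contextual bandits, the following is a minimal sketch of a standard baseline (a LinUCB-style agent), not the iCMAB framework itself. It illustrates only two of the ideas named above: a per-action reward forecasting model and a confidence measure attached to each prediction. The class and function names, the linear reward model, the exploration weight alpha, and the synthetic two-action environment are all assumptions made for illustration; the project's evolved recurrent networks and generative world model are not represented.

```python
import numpy as np


class LinUCBArm:
    """Ridge-regression reward forecaster for one action, with a confidence width."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha        # exploration weight (illustrative default)
        self.A = np.eye(dim)      # regularized Gram matrix of observed contexts
        self.b = np.zeros(dim)    # reward-weighted sum of observed contexts

    def predict(self, x):
        """Return (predicted reward, confidence width) for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        mean = float(theta @ x)
        width = float(self.alpha * np.sqrt(x @ A_inv @ x))
        return mean, width

    def update(self, x, reward):
        """Fold one observed (context, reward) pair into the model."""
        self.A += np.outer(x, x)
        self.b += reward * x


def choose_arm(arms, x):
    """Pick the action with the highest optimistic score (mean + width)."""
    scores = [sum(arm.predict(x)) for arm in arms]
    return int(np.argmax(scores))


def run_demo(steps=2000, seed=0):
    """Run the agent on a synthetic two-action problem with noisy rewards.

    Returns the fraction of steps on which the agent chose the action
    with the highest expected reward.
    """
    rng = np.random.default_rng(seed)
    true_theta = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical environment
    arms = [LinUCBArm(dim=2) for _ in range(2)]
    best_picks = 0
    for _ in range(steps):
        x = rng.uniform(0.0, 1.0, size=2)             # observed context
        a = choose_arm(arms, x)
        reward = float(true_theta[a] @ x) + rng.normal(0.0, 0.1)
        arms[a].update(x, reward)
        best_picks += int(a == int(np.argmax(true_theta @ x)))
    return best_picks / steps
```

In this baseline the confidence width shrinks as an action accumulates observations in similar contexts, so the agent explores less over time. It has no mechanism for detecting corrupted contexts or rewards and no model of how contexts evolve, which are the gaps the abstract describes the iCMAB framework as targeting.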

Performance Period: 09/15/2022 - 08/31/2025
Institution: Rochester Institute of Tech
Award Number: 2225354