Efficient Control Synthesis and Learning in Distributed Cyber-Physical Systems


Abstract:

Scientific challenges: How can multiple cooperative cyber-physical systems communicate and coordinate to accomplish complex high-level tasks within unknown, dynamic and adversarial environments?

Overview: This collaborative project between the University of Delaware (UD) and Boston University (BU) aims to develop a framework that unifies temporal logic planning and adaptive reactive synthesis for cyber-physical systems operating in dynamic and unknown environments. Concrete hybrid dynamical systems admit discrete abstractions that can be identified and analyzed using tools from automata theory, formal language theory, grammatical inference, and model checking. The approach considers multiple systems that collectively learn the dynamics of an unknown environment and then refine their plans based on models of the environmental dynamics inferred during execution.

Results: The research activity culminated in a framework for learning and automated control synthesis that multi-agent systems can use to adaptively strategize about how to optimally satisfy temporal logic specifications while interacting with an initially unknown and possibly adversarial dynamic environment. One key aspect of this novel integration of machine learning and symbolic control synthesis is that control is decoupled from learning. Learning takes the form of incremental system identification; control synthesis uses the learned model to predict environment behavior and decide how to satisfy system specifications. As the world model is refined through the incremental learning process, the control methodology becomes increasingly effective.

We addressed two different forms of uncertainty over environment behavior. In the first case, the environment dynamics are captured by a Markov decision process (MDP), and our learning algorithms are tasked with identifying the transition probabilities of this MDP based on continuing observations of environment behavior. In the second case, the environment dynamics are deterministic but unknown and, in addition, adversarial. Here our learning algorithms identify the actual structure of the environment model, in other words, the transition graph. In both cases, the prior knowledge for the learner is the class of formal languages that capture the environment behavior, which, for computational efficiency, are assumed to be specific subclasses of the regular languages.

Our control methodologies likewise branch in two main directions. One direction serves cases where the learned environment models are probabilistic, and involves new algorithms for computing control policies in MDPs. The other direction serves cases where the incrementally constructed environment models are deterministic, and draws on game-theoretic techniques for two-player zero-sum games, model checking, and discrete receding horizon control. In both directions, the control strategies are informed by the models that our learners build incrementally.

We have validated the architecture both in simulation in a multi-agent environment and in experimental studies carried out on robotic hardware running the same code that was tested numerically. We demonstrate that, with the consistency, conservativeness, and convergence guarantees that grammatical inference theory provides for the learning algorithm, the controllable agents perform as well as possible toward meeting their task specifications given the available information. It can be shown that, in the limit, the system recovers the performance it would have achieved had it had complete information about the dynamics of its adversary from the outset.
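
As an illustration of the first learning setting described above, in which MDP transition probabilities are identified from continuing observations, the following Python sketch keeps running counts of observed transitions and returns a frequency estimate. It is a minimal sketch under stated assumptions, not the project's algorithm: the class name `TransitionEstimator`, the `smoothing` parameter (an add-k prior), and the state and action labels are all hypothetical.

```python
from collections import defaultdict


class TransitionEstimator:
    """Incrementally estimates P(next_state | state, action) from observations.

    Hypothetical illustration: a count-based (maximum-likelihood) estimate
    with optional add-k smoothing, not the project's learning algorithm.
    """

    def __init__(self, states, smoothing=0.0):
        self.states = list(states)
        self.smoothing = smoothing  # assumed add-k prior; 0.0 gives plain frequencies
        self.counts = defaultdict(lambda: defaultdict(float))

    def observe(self, state, action, next_state):
        """Record one observed environment transition."""
        self.counts[(state, action)][next_state] += 1.0

    def probability(self, state, action, next_state):
        """Return the current estimate of P(next_state | state, action)."""
        row = self.counts.get((state, action), {})
        total = sum(row.values()) + self.smoothing * len(self.states)
        if total == 0.0:
            # Nothing observed and no prior: fall back to a uniform guess.
            return 1.0 / len(self.states)
        return (row.get(next_state, 0.0) + self.smoothing) / total


# Usage: the estimate is refined as more transitions are observed.
estimator = TransitionEstimator(states=["open", "blocked"])
for nxt in ["blocked", "blocked", "blocked", "open"]:
    estimator.observe("open", "wait", nxt)
p_blocked = estimator.probability("open", "wait", "blocked")
```

A control synthesis step would consume these estimates whenever it recomputes a policy, which mirrors the decoupling of learning from control described above.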

Broader impacts: Exploiting a common theoretical foundation provided by formal language theory, we establish new and reinforce existing intellectual links between disparate scientific communities, and educate a new generation of scholars versed in topics of arts, sciences, and engineering. Outreach initiatives supported by outcomes of this research activity include collaboration with the BU Academy, a Boy Scout Robotics Merit Badge Program, the Delaware Annual Robotics Day, and Newark’s Green Fest, as well as involvement in the educational activities of Wilmington non-profit organizations that support minority middle-school students of low socioeconomic status.

Tags:

Boston University

game theory

grammatical inference

learning

LTL control synthesis

University of Delaware