Biblio

Filters: Keyword is reinforcement learning
Ma, Yaodong, Liu, Kai, Luo, Xiling.  2022.  Game Theory Based Multi-agent Cooperative Anti-jamming for Mobile Ad Hoc Networks. 2022 IEEE 8th International Conference on Computer and Communications (ICCC). :901–905.
Currently, mobile ad hoc networks (MANETs) are widely used due to their self-configuring feature. However, they are vulnerable to malicious jammers in practice. Traditional anti-jamming approaches, such as channel hopping based on deterministic sequences, may not be a reliable solution against intelligent jammers due to their fixed patterns. To address this problem, we propose a distributed game theory-based multi-agent anti-jamming (DMAA) algorithm in this paper. It enables each user to exploit all information from its neighboring users before the network attacks, and derive dynamic local policy knowledge to overcome intelligent jamming attacks efficiently as well as guide the users to cooperatively hop to the same channel with high probability. Simulation results demonstrate that the proposed algorithm can learn an optimal policy to guide the users to avoid malicious jamming more efficiently and rapidly than the random and independent Q-learning baseline algorithms.
Sinha, Arunesh.  2022.  AI and Security: A Game Perspective. 2022 14th International Conference on COMmunication Systems & NETworkS (COMSNETS). :393–396.
In this short paper, we survey some work at the intersection of Artificial Intelligence (AI) and security that is based on game theoretic considerations, and particularly focus on the author's (our) contribution in these areas. One half of this paper focuses on applications of game theoretic and learning reasoning for addressing security applications such as in public safety and wildlife conservation. In the second half, we present recent work that attacks the learning components of these works, leading to sub-optimal defense allocation. We finally end by pointing to issues and potential research problems that can arise due to data quality in the real world.
ISSN: 2155-2509
Hammar, Kim, Stadler, Rolf.  2022.  An Online Framework for Adapting Security Policies in Dynamic IT Environments. 2022 18th International Conference on Network and Service Management (CNSM). :359–363.

We present an online framework for learning and updating security policies in dynamic IT environments. It includes three components: a digital twin of the target system, which continuously collects data and evaluates learned policies; a system identification process, which periodically estimates system models based on the collected data; and a policy learning process that is based on reinforcement learning. To evaluate our framework, we apply it to an intrusion prevention use case that involves a dynamic IT infrastructure. Our results demonstrate that the framework automatically adapts security policies to changes in the IT infrastructure and that it outperforms a state-of-the-art method.

Gong, Taiyuan, Zhu, Li.  2022.  Edge Intelligence-based Obstacle Intrusion Detection in Railway Transportation. GLOBECOM 2022 - 2022 IEEE Global Communications Conference. :2981–2986.
Train operation is highly influenced by the rail track state and the surrounding environment. An abnormal obstacle on the rail track will pose a severe threat to the safe operation of urban rail transit. The existing general obstacle detection approaches do not consider the specific urban rail environment and requirements. In this paper, we propose an edge intelligence (EI)-based obstacle intrusion detection system to accurately detect obstacle intrusion in real time. A two-stage lightweight deep learning model is designed to detect obstacle intrusion and obtain the distance from the train to the obstacle. Edge computing (EC) and 5G are used to run the detection model and improve the real-time detection performance. A multi-agent reinforcement learning-based offloading and service migration model is formulated to optimize the edge computing resource. Experimental results show that the two-stage intrusion detection model with the reinforcement learning (RL)-based edge resource optimization model can achieve higher detection accuracy and real-time performance compared to traditional methods.
Yao, Zhiyuan, Shi, Tianyu, Li, Site, Xie, Yiting, Qin, Yuanyuan, Xie, Xiongjie, Lu, Huan, Zhang, Yan.  2022.  Towards Modern Card Games with Large-Scale Action Spaces Through Action Representation. 2022 IEEE Conference on Games (CoG). :576–579.
Axie Infinity is a complicated card game with a huge-scale action space. This makes it difficult to solve using generic Reinforcement Learning (RL) algorithms. We propose a hybrid RL framework to learn action representations and game strategies. To avoid evaluating every action in the large feasible action set, our method evaluates actions in a fixed-size set which is determined using action representations. We compare the performance of our method with two baseline methods in terms of their sample efficiency and the winning rates of the trained models. We empirically show that our method achieves an overall best winning rate and the best sample efficiency among the three methods.
ISSN: 2325-4289
Sun, Haoran, Zhu, Xiaolong, Zhou, Conghua.  2022.  Deep Reinforcement Learning for Video Summarization with Semantic Reward. 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C). :754–755.

Video summarization aims to improve the efficiency of large-scale video browsing by producing concise summaries. It has been popular in many scenarios such as video surveillance, video review and data annotation. Traditional video summarization techniques focus on filtration in the image features dimension or the image semantics dimension. However, such techniques can lose a large amount of possibly useful information, especially for videos with rich text semantics like interviews and teaching videos, in that only the information relevant to the image dimension will be retained. In order to solve the above problem, this paper considers video summarization as a continuous multi-dimensional decision-making process. Specifically, the summarization model predicts a probability for each frame and its corresponding text, and then we design reward methods for each of them. Finally, comprehensive summaries in two dimensions, i.e. images and semantics, are generated. This approach is not only unsupervised and does not rely on labels and user interaction, but also decouples the semantic and image summarization models to provide more usable interfaces for subsequent engineering use.

ISSN: 2693-9371

Rizwan, Kainat, Ahmad, Mudassar, Habib, Muhammad Asif.  2022.  Cyber Automated Network Resilience Defensive Approach against Malware Images. 2022 International Conference on Frontiers of Information Technology (FIT). :237–242.
Cyber threats have been a major issue in the cyber security domain. Every hacker follows a series of cyber-attack stages known as cyber kill chain stages. Each stage has its norms and limitations to be deployed. For a decade, researchers have focused on detecting these attacks. Merely watcher tools are not optimal solutions anymore. Everything is becoming autonomous in the computer science field. This leads to the idea of an Autonomous Cyber Resilience Defense algorithm design in this work. Resilience has two aspects: Response and Recovery. Response requires some actions to be performed to mitigate attacks. Recovery is patching the flawed code or back door vulnerability. Both aspects were performed by human assistance in the cybersecurity defense field. This work aims to develop an algorithm based on Reinforcement Learning (RL) with a Convolutional Neural Network (CNN), far nearer to the human learning process for malware images. RL learns through a reward mechanism against every performed attack. Every action has some kind of output that can be classified into positive or negative rewards. To enhance its thinking process Markov Decision Process (MDP) will be mitigated with this RL approach. RL impact and induction measures for malware images were measured and performed to get optimal results. Based on the Malimg image malware dataset, successful automation actions are received. The proposed work has shown 98% accuracy in the classification, detection, and autonomous resilience actions deployment.
Wang, Shuangbao Paul, Arafin, Md Tanvir, Osuagwu, Onyema, Wandji, Ketchiozo.  2022.  Cyber Threat Analysis and Trustworthy Artificial Intelligence. 2022 6th International Conference on Cryptography, Security and Privacy (CSP). :86–90.
Cyber threats can cause severe damage to computing infrastructure and systems as well as data breaches that make sensitive data vulnerable to attackers and adversaries. It is therefore imperative to discover those threats and stop them before bad actors penetrate the information systems. Threat hunting algorithms based on machine learning have shown great advantage over classical methods. Reinforcement learning models are getting more accurate for identifying not only signature-based but also behavior-based threats. Quantum mechanics brings a new dimension in improving classification speed with exponential advantage. The accuracy of the AI/ML algorithms could be affected by many factors, from algorithm, data, to prejudicial, or even intentional. As a result, AI/ML applications need to be non-biased and trustworthy. In this research, we developed a machine learning-based cyber threat detection and assessment tool. It uses a two-stage (both unsupervised and supervised learning) analyzing method on 822,226 log data recorded from a web server on AWS cloud. The results show the algorithm has the ability to identify the threats with high confidence.
Ogawa, Kanta, Sawada, Kenji, Sakata, Kosei.  2022.  Vulnerability Modeling and Protection Strategies via Supervisory Control Theory. 2022 IEEE 11th Global Conference on Consumer Electronics (GCCE). :559–560.
The paper aims to discover vulnerabilities by application of supervisory control theory and to design a defensive supervisor against vulnerability attacks. Supervisory control restricts the system behavior to satisfy the control specifications. The existence condition of the supervisor sometimes results in undesirable plant behavior, which can be regarded as a vulnerability of the control specifications. We aim to design a more robust supervisor against this vulnerability.
ISSN: 2378-8143
Zhang, Lei, Zhou, Jian, Ma, Yizhong, Shen, Lijuan.  2022.  Sequential Topology Attack of Supply Chain Networks Based on Reinforcement Learning. 2022 International Conference on Cyber-Physical Social Intelligence (ICCSI). :744–749.
The robustness of supply chain networks (SCNs) against sequential topology attacks is significant for maintaining firm relationships and activities. Although SCNs have experienced many emergencies demonstrating that mixed failures exacerbate the impact of cascading failures, existing studies of sequential attacks rarely consider the influence of mixed failure modes on cascading failures. In this paper, a reinforcement learning (RL)-based sequential attack strategy is applied to SCNs with cascading failures that consider mixed failure modes. To solve the large state space search problem in SCNs, a deep Q-network (DQN) optimization framework combining deep neural networks (DNNs) and RL is proposed to extract features of state space. Then, it is compared with the traditional random-based, degree-based, and load-based sequential attack strategies. Simulation results on Barabasi-Albert (BA), Erdos-Renyi (ER), and Watts-Strogatz (WS) networks show that the proposed RL-based sequential attack strategy outperforms three existing sequential attack strategies. It can trigger cascading failures with greater influence. This work provides insights for effectively reducing failure propagation and improving the robustness of SCNs.
Yang, Xiaoran, Guo, Zhen, Mai, Zetian.  2022.  Botnet Detection Based on Machine Learning. 2022 International Conference on Blockchain Technology and Information Security (ICBCTIS). :213–217.
A botnet is a new type of attack method developed and integrated on the basis of traditional malicious code such as network worms and backdoor tools, and it is extremely threatening. This course combines deep learning and neural network methods in machine learning methods to detect and classify the existence of botnets. This sample does not rely on any prior features; the final multi-class classification accuracy rate is higher than 98.7%, and the effect is significant.
Jiang, Linlang, Zhou, Jingbo, Xu, Tong, Li, Yanyan, Chen, Hao, Dou, Dejing.  2022.  Time-aware Neural Trip Planning Reinforced by Human Mobility. 2022 International Joint Conference on Neural Networks (IJCNN). :1–8.
Trip planning, which aims at planning a trip consisting of several ordered Points of Interest (POIs) under user-provided constraints, has long been treated as an important application for location-based services. The goal of trip planning is to maximize the chance that the users will follow the planned trip, while it is difficult to directly quantify and optimize the chance. Conventional methods either leverage statistical analysis to rank POIs to form a trip or generate trips following pre-defined objectives based on constraint programming to bypass such a problem. However, these methods may fail to reflect the complex latent patterns hidden in the human mobility data. On the other hand, though there are a few deep learning-based trip recommendation methods, these methods still cannot handle the time budget constraint so far. To this end, we propose a TIme-aware Neural Trip Planning (TINT) framework to tackle the above challenges. First of all, we devise a novel attention-based encoder-decoder trip generator that can learn the correlations among POIs and generate trips under given constraints. Then, we propose a specially-designed reinforcement learning (RL) paradigm to directly optimize the objective to obtain an optimal trip generator. For this purpose, we introduce a discriminator, which distinguishes the generated trips from real-life trips taken by users, to provide reward signals to optimize the generator. Subsequently, to ensure the feedback from the discriminator is always instructive, we integrate an adversarial learning strategy into the RL paradigm to update the trip generator and the discriminator alternately. Moreover, we devise a novel pre-training schema to speed up the convergence for an efficient training process. Extensive experiments on four real-world datasets validate the effectiveness and efficiency of our framework, which shows that TINT could remarkably outperform the state-of-the-art baselines within short response time.
ISSN: 2161-4407
Mainampati, Manasa, Chandrasekaran, Balasubramaniyan.  2021.  Implementation of Human in The Loop on the TurtleBot using Reinforced Learning methods and Robot Operating System (ROS). 2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). :0448–0452.
In this paper, an implementation of a human in the loop (HITL) technique for robot navigation in an indoor environment is described. The HITL technique is integrated into the reinforcement learning algorithms for mobile robot navigation. Reinforcement algorithms, specifically Q-learning and SARSA, are used combined with HITL since these algorithms are good at exploration and navigation. Turtlebot3 has been used as the robot for validating the algorithms by implementing the system using Robot Operating System and Gazebo. The robot assisted with human feedback was found to be better in navigation task execution when compared to standard algorithms without using human in the loop. This is a work in progress and the next step of this research is exploring other reinforced learning methods and implementing them on a physical robot.
ISSN: 2644-3163
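The Mainampati and Chandrasekaran abstract above names Q-learning and SARSA without stating their update rules. As a minimal sketch of the standard tabular forms (not the authors' implementation), the two differ only in how the next state is valued. The function names, the dictionary-based Q-table, and the default `alpha`, `gamma`, and `epsilon` values are illustrative assumptions.

```python
import random


def q_learning_update(q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * best_next - old)


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    next_value = q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r + gamma * next_value - old)


def epsilon_greedy(q, s, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda b: q.get((s, b), 0.0))
```

Human feedback of the kind the abstract describes could plausibly enter such a loop as an extra reward term or an action override; the abstract does not say which, so the sketch leaves that out.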
Deng, Weiyang, Sargent, Barbara, Bradley, Nina S., Klein, Lauren, Rosales, Marcelo, Pulido, José Carlos, Matarić, Maja J, Smith, Beth A..  2021.  Using Socially Assistive Robot Feedback to Reinforce Infant Leg Movement Acceleration. 2021 30th IEEE International Conference on Robot & Human Interactive Communication (RO-MAN). :749–756.
Learning movement control is a fundamental process integral to infant development. However, it is still unclear how infants learn to control leg movement. This work explores the potential of using socially assistive robots to provide real-time adaptive reinforcement learning for infants. Ten 6- to 8-month-old typically-developing infants participated in a study where a robot provided reinforcement when the infant’s right leg acceleration fell within the range of 9 to 20 m/s². If infants increased the proportion of leg accelerations in this band, they were categorized as "performers". Six of the ten participating infants were categorized as performers; the performer subgroup increased the magnitude of acceleration, proportion of target acceleration for right leg, and ratio of right/left leg acceleration peaks within the target acceleration band and their right legs increased movement intensity from the baseline to the contingency session. The results showed infants specifically adjusted their right leg acceleration in response to a robot-provided reward. Further study is needed to understand how to improve human-robot interaction policies for personalized interventions for young infants.
ISSN: 1944-9437
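The performer criterion in the Deng et al. abstract above is a simple proportion comparison, which a short sketch can make concrete. This is an illustration only: the function names are hypothetical, the band is assumed to be inclusive at both ends, and the study's actual peak detection and statistical procedure are not described in the abstract.

```python
def proportion_in_band(accelerations, low=9.0, high=20.0):
    """Fraction of acceleration samples (m/s^2) inside the reward band.

    Assumes an inclusive band [low, high]; returns 0.0 for an empty list.
    """
    if not accelerations:
        return 0.0
    hits = sum(1 for a in accelerations if low <= a <= high)
    return hits / len(accelerations)


def is_performer(baseline, contingency, low=9.0, high=20.0):
    """True if the in-band proportion rose from baseline to contingency."""
    return (proportion_in_band(contingency, low, high)
            > proportion_in_band(baseline, low, high))
```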
Jo, Hyeonjun, Kim, Kyungbaek.  2022.  Security Service-aware Reinforcement Learning for Efficient Network Service Provisioning. 2022 23rd Asia-Pacific Network Operations and Management Symposium (APNOMS). :1–4.
In case of deploying additional network security equipment in a new location, network service providers face difficulties such as precise management of a large number of network security devices and expensive network operation costs. Accordingly, there is a need for a method for security-aware network service provisioning using the existing network security equipment. In order to solve this problem, there is an existing reinforcement learning-based routing decision method fixed for each node. This method runs repeatedly until a routing decision satisfying end-to-end security constraints is achieved. This has the disadvantage of a longer network service provisioning time. In this paper, we propose a security constraints reinforcement learning based routing (SCRR) algorithm that generates routing decisions which satisfy end-to-end security constraints by giving conditional reward values according to the agent state-action pairs when performing reinforcement learning.
ISSN: 2576-8565
Heseding, Hauke, Zitterbart, Martina.  2022.  ReCEIF: Reinforcement Learning-Controlled Effective Ingress Filtering. 2022 IEEE 47th Conference on Local Computer Networks (LCN). :106–113.
Volumetric Distributed Denial of Service attacks forcefully disrupt the availability of online services by congesting network links with arbitrary high-volume traffic. This brute force approach has collateral impact on the upstream network infrastructure, making early attack traffic removal a key objective. To reduce infrastructure load and maintain service availability, we introduce ReCEIF, a topology-independent mitigation strategy for early, rule-based ingress filtering leveraging deep reinforcement learning. ReCEIF utilizes hierarchical heavy hitters to monitor traffic distribution and detect subnets that are sending high-volume traffic. Deep reinforcement learning subsequently serves to refine hierarchical heavy hitters into effective filter rules that can be propagated upstream to discard traffic originating from attacking systems. Evaluating all filter rules requires only a single clock cycle when utilizing fast ternary content-addressable memory, which is commonly available in software defined networks. To outline the effectiveness of our approach, we conduct a comparative evaluation to reinforcement learning-based router throttling.
Oakley, Lisa, Oprea, Alina, Tripakis, Stavros.  2022.  Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems. 2022 IEEE 35th Computer Security Foundations Symposium (CSF). :380–395.

Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems including randomized protocols and reinforcement learning models. However, these methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of some system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an ε ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal δ synthesis, and worst case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution results in fast computation for small parameter spaces. In the case of less restrictive (stronger) adversaries, the number of parameters increases, and directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies.
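To make the ε-ball threat model in the Oakley et al. abstract above concrete, the following sketch computes a reachability probability in a small DTMC and checks whether a perturbed transition matrix stays within ε of the original. It is not the authors' framework: the function names are hypothetical, the ε ball is assumed to use the entrywise max-norm, and reachability is approximated by plain fixed-point iteration rather than a model checker.

```python
def reach_probability(P, start, target, iterations=1000):
    """Approximate the probability of eventually reaching `target`.

    P is a row-stochastic matrix given as a list of lists. Iterating from
    zero converges to the least fixed point, i.e. the reachability value.
    """
    n = len(P)
    x = [1.0 if i == target else 0.0 for i in range(n)]
    for _ in range(iterations):
        x = [1.0 if i == target
             else sum(P[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x[start]


def within_epsilon_ball(P, P_adv, eps, tol=1e-9):
    """Check P_adv is row-stochastic and within eps of P (max-norm)."""
    for row, row_adv in zip(P, P_adv):
        if abs(sum(row_adv) - 1.0) > tol or any(p < -tol for p in row_adv):
            return False
        if any(abs(a - b) > eps + tol for a, b in zip(row, row_adv)):
            return False
    return True


# States: 0 = start, 1 = goal (absorbing), 2 = failure (absorbing).
P = [[0.5, 0.4, 0.1], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# An adversary shifts 0.05 of probability mass from goal to failure.
P_adv = [[0.5, 0.35, 0.15], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In this toy chain the goal probability from state 0 is 0.4 / (0.4 + 0.1) = 0.8, and the perturbed matrix lowers it to 0.35 / 0.5 = 0.7 while staying inside an ε = 0.05 ball.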

Djeachandrane, Abhishek, Hoceini, Said, Delmas, Serge, Duquerrois, Jean-Michel, Mellouk, Abdelhamid.  2022.  QoE-based Situational Awareness-Centric Decision Support for Network Video Surveillance. ICC 2022 - IEEE International Conference on Communications. :335–340.

Control room video surveillance is an important source of information for ensuring public safety. To facilitate the process, a Decision-Support System (DSS) designed for the security task force is vital and necessary to take decisions rapidly using a sea of information. In case of mission critical operation, Situational Awareness (SA), which consists of knowing what is going on around you at any given time, plays a crucial role across a variety of industries and should be placed at the center of our DSS. In our approach, the SA system will take advantage of the human factor thanks to the reinforcement signal, whereas previous work in this field focuses on improving the knowledge level of the DSS first and then uses the human factor only for decision-making. In this paper, we propose a situational awareness-centric decision-support system framework for mission-critical operations driven by Quality of Experience (QoE). Our idea is inspired by the reinforcement learning feedback process which updates the environment understanding of our DSS. The feedback is injected by a QoE built on user perception. Our approach will allow our DSS to evolve according to the context with an up-to-date SA.

Hammar, Kim, Stadler, Rolf.  2022.  A System for Interactive Examination of Learned Security Policies. NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. :1–3.
We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure’s state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system where policies are incrementally learned and an emulation system that produces statistics that drive the simulation runs.
Silva, Ryan, Hickert, Cameron, Sarfaraz, Nicolas, Brush, Jeff, Silbermann, Josh, Sookoor, Tamim.  2022.  AlphaSOC: Reinforcement Learning-based Cybersecurity Automation for Cyber-Physical Systems. 2022 ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS). :290–291.
Achieving agile and resilient autonomous capabilities for cyber defense requires moving past indicators and situational awareness into automated response and recovery capabilities. The objective of the AlphaSOC project is to use state of the art sequential decision-making methods to automatically investigate and mitigate attacks on cyber physical systems (CPS). To demonstrate this, we developed a simulation environment that models the distributed navigation control system and physics of a large ship with two rudders and thrusters for propulsion. Defending this control network requires processing large volumes of cyber and physical signals to coordinate defensive actions over many devices with minimal disruption to nominal operation. We are developing a Reinforcement Learning (RL)-based approach to solve the resulting sequential decision-making problem that has large observation and action spaces.
Sewak, Mohit, Sahay, Sanjay K., Rathore, Hemant.  2022.  X-Swarm: Adversarial DRL for Metamorphic Malware Swarm Generation. 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). :169–174.
Advanced metamorphic malware and ransomware use techniques like obfuscation to alter their internal structure with every attack. Therefore, any signature extracted from such an attack, and used to bolster endpoint defense, cannot avert subsequent attacks. Therefore, if even a single such malware intrudes even a single device of an IoT network, it will continue to infect the entire network. Scenarios where an entire network is targeted by a coordinated swarm of such malware are not beyond imagination. Therefore, the IoT era also requires Industry-4.0 grade AI-based solutions against such advanced attacks. But AI-based solutions need a large repository of data extracted from similar attacks to learn robust representations. However, developing a metamorphic malware is a very complex task and requires extreme human ingenuity. Hence, there does not exist abundant metamorphic malware to train AI-based defensive solutions. Also, there is currently no system that could generate enough functionality-preserving metamorphic variants of multiple malware to train AI-based defensive systems. Therefore, to this end, we design and develop a novel system, named X-Swarm. X-Swarm uses deep policy-based adversarial reinforcement learning to generate a swarm of metamorphic instances of any malware by obfuscating them at the opcode level and ensuring that they could evade even capable, adversarial-attack immune endpoint defense systems.
Hashmi, Saad Sajid, Dam, Hoa Khanh, Smet, Peter, Chhetri, Mohan Baruwal.  2022.  Towards Antifragility in Contested Environments: Using Adversarial Search to Learn, Predict, and Counter Open-Ended Threats. 2022 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). :141–146.
Resilience and antifragility under duress present significant challenges for autonomic and self-adaptive systems operating in contested environments. In such settings, the system has to continually plan ahead, accounting for either an adversary or an environment that may negate its actions or degrade its capabilities. This will involve projecting future states, as well as assessing recovery options, counter-measures, and progress towards system goals. For antifragile systems to be effective, we envision three self-* properties to be of key importance: self-exploration, self-learning and self-training. Systems should be able to efficiently self-explore – using adversarial search – the potential impact of the adversary’s attacks and compute the most resilient responses. The exploration can be assisted by prior knowledge of the adversary’s capabilities and attack strategies, which can be self-learned – using opponent modelling – from previous attacks and interactions. The system can self-train – using reinforcement learning – such that it evolves and improves itself as a result of being attacked. This paper discusses those visions and outlines their realisation in AWaRE, a cyber-resilient and self-adaptive multi-agent system.
Shubham, Kumar, Venkatesh, Gopalakrishnan, Sachdev, Reijul, Akshi, Jayagopi, Dinesh Babu, Srinivasaraghavan, G..  2021.  Learning a Deep Reinforcement Learning Policy Over the Latent Space of a Pre-trained GAN for Semantic Age Manipulation. 2021 International Joint Conference on Neural Networks (IJCNN). :1–8.
Learning a disentangled representation of the latent space has become one of the most fundamental problems studied in computer vision. Recently, many Generative Adversarial Networks (GANs) have shown promising results in generating high fidelity images. However, studies to understand the semantic layout of the latent space of pre-trained models are still limited. Several works train conditional GANs to generate faces with required semantic attributes. Unfortunately, in these attempts, the generated output is often not as photo-realistic as the unconditional state-of-the-art models. Besides, they also require large computational resources and specific datasets to generate high fidelity images. In our work, we have formulated a Markov Decision Process (MDP) over the latent space of a pre-trained GAN model to learn a conditional policy for semantic manipulation along specific attributes under defined identity bounds. Further, we have defined a semantic age manipulation scheme using a locally linear approximation over the latent space. Results show that our learned policy samples high fidelity images with required age alterations, while preserving the identity of the person.
Li, Jian, Rong, Fei, Tang, Yu.  2020.  A Novel Q-Learning Algorithm Based on the Stochastic Environment Path Planning Problem. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1977–1982.
In this paper, we propose a path planning algorithm based on a Q-learning model to simulate an environment model, which is suitable for complex environments. A virtual simulation platform has been built to complete the experiments. The experimental results show that the algorithm proposed in this paper can be effectively applied to the solution of vehicle routing problems in complex environments.
Sun, Yang, Li, Na, Tao, Xiaofeng.  2021.  Privacy Preserved Secure Offloading in the Multi-access Edge Computing Network. 2021 IEEE Wireless Communications and Networking Conference Workshops (WCNCW). :1–6.
Mobile edge computing (MEC) has emerged recently to help process the computation-intensive and delay-sensitive applications of resource-limited mobile devices with the support of MEC servers. Due to the wireless offloading, MEC faces many security challenges, like eavesdropping and privacy leakage. Anti-eavesdropping offloading and privacy-preserving offloading have been studied in existing research. However, both eavesdropping and privacy leakage may happen at the same time in practice. In this paper, we propose a privacy preserved secure offloading scheme aiming to minimize the energy consumption, where the location privacy, usage pattern privacy and secure transmission against the eavesdropper are jointly considered. We formulate this problem as a constrained Markov decision process (CMDP) with the constraints of secure offloading rate and pre-specified privacy level, and solve it with reinforcement learning (RL). It can be concluded from the simulation that this scheme can reduce the energy consumption as well as improve the privacy level and security of the mobile device compared with the benchmark scheme.