Biblio

Filters: Author is Chen, H.
2021-01-15
Ebrahimi, M., Samtani, S., Chai, Y., Chen, H..  2020.  Detecting Cyber Threats in Non-English Hacker Forums: An Adversarial Cross-Lingual Knowledge Transfer Approach. 2020 IEEE Security and Privacy Workshops (SPW). :20–26.

The regularity of devastating cyber-attacks has made cybersecurity a grand societal challenge. Many cybersecurity professionals are closely examining the international Dark Web to proactively pinpoint potential cyber threats. Despite its potential, the Dark Web contains hundreds of thousands of non-English posts. While machine translation (MT) is the prevailing approach to process non-English text, applying MT on hacker forum text results in mistranslations. In this study, we draw upon Long Short-Term Memory (LSTM), Cross-Lingual Knowledge Transfer (CLKT), and Generative Adversarial Networks (GANs) principles to design a novel Adversarial CLKT (A-CLKT) approach. A-CLKT operates on untranslated text to retain the original semantics of the language and leverages the collective knowledge about cyber threats across languages to create a language invariant representation without any manual feature engineering or external resources. Three experiments demonstrate how A-CLKT outperforms state-of-the-art machine learning, deep learning, and CLKT algorithms in identifying cyber-threats in French and Russian forums.

Zhang, N., Ebrahimi, M., Li, W., Chen, H..  2020.  A Generative Adversarial Learning Framework for Breaking Text-Based CAPTCHA in the Dark Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.

Cyber threat intelligence (CTI) necessitates automated monitoring of dark web platforms (e.g., Dark Net Markets and carding shops) on a large scale. While there are existing methods for collecting data from the surface web, large-scale dark web data collection is commonly hindered by anti-crawling measures. Text-based CAPTCHA serves as the most prohibitive type of these measures. Text-based CAPTCHA requires the user to recognize a combination of hard-to-read characters. Dark web CAPTCHA patterns are intentionally designed to have additional background noise and variable character length to prevent automated CAPTCHA breaking. Existing CAPTCHA breaking methods cannot remedy these challenges and are therefore not applicable to the dark web. In this study, we propose a novel framework for breaking text-based CAPTCHA in the dark web. The proposed framework utilizes Generative Adversarial Network (GAN) to counteract dark web-specific background noise and leverages an enhanced character segmentation algorithm. Our proposed method was evaluated on both benchmark and dark web CAPTCHA testbeds. The proposed method significantly outperformed the state-of-the-art baseline methods on all datasets, achieving over 92.08% success rate on dark web testbeds. Our research enables the CTI community to develop advanced capabilities of large-scale dark web monitoring.

Liu, Y., Lin, F. Y., Ahmad-Post, Z., Ebrahimi, M., Zhang, N., Hu, J. L., Xin, J., Li, W., Chen, H..  2020.  Identifying, Collecting, and Monitoring Personally Identifiable Information: From the Dark Web to the Surface Web. 2020 IEEE International Conference on Intelligence and Security Informatics (ISI). :1–6.

Personally identifiable information (PII) has become a major target of cyber-attacks, causing severe losses to data breach victims. To protect data breach victims, researchers focus on collecting exposed PII to assess privacy risk and identify at-risk individuals. However, existing studies mostly rely on exposed PII collected from either the dark web or the surface web. Due to the wide exposure of PII on both the dark web and surface web, collecting from only the dark web or the surface web could result in an underestimation of privacy risk. Despite its research and practical value, jointly collecting PII from both sources is a non-trivial task. In this paper, we summarize our effort to systematically identify, collect, and monitor a total of 1,212,004,819 exposed PII records across both the dark web and surface web. Our effort resulted in 5.8 million stolen SSNs, 845,000 stolen credit/debit cards, and 1.2 billion stolen account credentials. From the surface web, we identified and collected over 1.3 million PII records of the victims whose PII is exposed on the dark web. To the best of our knowledge, this is the largest academic collection of exposed PII, which, if properly anonymized, enables various privacy research inquiries, including assessing privacy risk and identifying at-risk populations.

2021-01-11
Zhang, H., Zhang, D., Chen, H., Xu, J..  2020.  Improving Efficiency of Pseudonym Revocation in VANET Using Cuckoo Filter. 2020 IEEE 20th International Conference on Communication Technology (ICCT). :763–769.
In VANETs, pseudonyms are often used to replace the identity of vehicles in communication. When vehicles drive out of the network or misbehave, their pseudonym certificates need to be revoked by the certificate authority (CA). The certificate revocation lists (CRLs) are usually used to store the revoked certificates before their expiration. However, using CRLs would incur additional storage, communication and computation overhead. Some existing schemes have proposed to use Bloom Filter to compress the original CRLs, but they are unable to delete the expired certificates and introduce the false positive problem. In this paper, we propose an improved pseudonym certificates revocation scheme, using Cuckoo Filter for compression to reduce the impact of these problems. In order to optimize deletion efficiency, we propose the concept of Certificate Expiration List (CEL) which can be implemented with priority queue. The experimental results show that our scheme can effectively reduce the storage and communication overhead of pseudonym certificates revocation, while retaining moderately low false positive rates. In addition, our scheme can also greatly improve the lookup performance on CRLs, and reduce the revocation operation costs by allowing deletion.
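
The Certificate Expiration List idea in the abstract above can be illustrated with a minimal sketch. It assumes a min-heap keyed by expiry time as the priority queue, and it uses a plain Python `set` as a stand-in for the Cuckoo Filter (a set offers the same insert, lookup, and delete operations, but none of the compression). The class name, method names, and certificate identifiers are hypothetical and are not taken from the paper.

```python
import heapq


class RevocationStore:
    """Revoked pseudonym certificates with expiry-driven deletion (sketch)."""

    def __init__(self):
        # Stand-in for the Cuckoo Filter: supports insert, lookup and delete,
        # but does not compress anything.
        self._revoked = set()
        # Certificate Expiration List: min-heap of (expiry_time, cert_id).
        self._cel = []

    def revoke(self, cert_id, expiry_time):
        self._revoked.add(cert_id)
        heapq.heappush(self._cel, (expiry_time, cert_id))

    def is_revoked(self, cert_id):
        return cert_id in self._revoked

    def purge_expired(self, now):
        """Delete every certificate whose expiry time is <= now."""
        removed = []
        while self._cel and self._cel[0][0] <= now:
            _, cert_id = heapq.heappop(self._cel)
            self._revoked.discard(cert_id)
            removed.append(cert_id)
        return removed


store = RevocationStore()
store.revoke("pseudonym-A", expiry_time=100)
store.revoke("pseudonym-B", expiry_time=50)
store.revoke("pseudonym-C", expiry_time=200)
expired = store.purge_expired(now=120)  # earliest expiry is removed first
```

Because the heap always exposes the earliest expiry, a purge touches only the entries that have actually expired instead of scanning the whole list.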
2020-12-17
Lee, J., Chen, H., Young, J., Kim, H..  2020.  RISC-V FPGA Platform Toward ROS-Based Robotics Application. 2020 30th International Conference on Field-Programmable Logic and Applications (FPL). :370–370.

RISC-V is a free and open standard instruction set architecture following reduced instruction set computer principles. Because of its openness and scalability, RISC-V has been adopted not only for embedded CPUs, such as those in the mobile and IoT markets, but also for heavy-workload CPUs, such as those in data centers and supercomputing. In addition, robotics is a good application of RISC-V because security and reliability have become crucial issues for robotics systems. These problems could be solved by enthusiastic open source community members, as they have shown with open source operating systems. However, running RISC-V on a local FPGA has become harder than before because the RISC-V Foundation is now focusing on cloud-based FPGA environments. We have found that recently released OSes and toolchains for RISC-V do not work well on the previous CPU image for local FPGAs. In this paper, we design a local FPGA platform for a RISC-V processor and run a robotics application on the mainstream Robot Operating System on top of the RISC-V processor. This platform allows us to explore the architecture space of RISC-V CPUs for robotics applications and gain insight into RISC-V CPU architecture for optimal performance and a secure system.

2020-11-30
Xu, Y., Chen, H., Zhao, Y., Zhang, W., Shen, Q., Zhang, X., Ma, Z..  2019.  Neural Adaptive Transport Framework for Internet-scale Interactive Media Streaming Services. 2019 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). :1–6.
Network dynamics, such as bandwidth fluctuation and unexpected latency, hurt users' quality of experience (QoE) greatly for media services over the Internet. In this work, we propose a neural adaptive transport (NAT) framework to tackle the network dynamics for Internet-scale interactive media services. The entire NAT system has three major components: a learning based cloud overlay routing (COR) scheme for the best delivery path to bypass the network bottlenecks while offering the minimal end-to-end latency simultaneously; a residual neural network based collaborative video processing (CVP) system to trade the computational capability at client-end for QoE improvement via learned resolution scaling; and a deep reinforcement learning (DRL) based adaptive real-time streaming (ARS) strategy to select the appropriate video bitrate for maximal QoE. We have demonstrated that COR could improve the user satisfaction from 5% to 43%, CVP could reduce the bandwidth consumption more than 30% at the same quality, and DRL-based ARS can maintain the smooth streaming with < 50% QoE improvement, respectively.
2020-11-20
Zhu, S., Chen, H., Xi, W., Chen, M., Fan, L., Feng, D..  2019.  A Worst-Case Entropy Estimation of Oscillator-Based Entropy Sources: When the Adversaries Have Access to the History Outputs. 2019 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/13th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :152–159.
Entropy sources are designed to provide unpredictable random numbers for cryptographic systems. As an assessment of the sources, Shannon entropy is usually adopted to quantitatively measure the unpredictability of the outputs. In several related works about the entropy evaluation of ring oscillator-based (RO-based) entropy sources, authors evaluated the unpredictability with the average conditional Shannon entropy (ACE) of the source, and moreover provided a lower bound of the ACE (LBoACE). However, in this paper, we have demonstrated that when the adversaries have access to the history outputs of the entropy source, for example, by some intrusive attacks, the LBoACE may overestimate the actual unpredictability of the next output for the adversaries. In this situation, we suggest adopting the specific conditional Shannon entropy (SCE), which exactly measures the unpredictability of the future output with the knowledge of previous output sequences and so is more consistent with reality than the ACE. In particular, to be conservative, we propose to take the lower bound of the SCE (LBoSCE) as an estimation of the worst-case entropy of the sources. We put forward a detailed method to estimate this worst-case entropy of RO-based entropy sources, which we have also verified by experiment on an FPGA device. We recommend adopting this method to provide a conservative assessment of the unpredictability when the entropy source works in a vulnerable environment and the adversaries might obtain the previous outputs.
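
The gap between an average and a worst-case conditional entropy can be seen on a toy example. The sketch below is not the paper's estimation method for RO-based sources: it assumes a hypothetical binary source whose next bit depends only on the previous bit, with transition probabilities chosen purely for illustration. It computes the entropy conditioned on each specific history, the average over histories (weighted by the stationary distribution), and the minimum over histories.

```python
from math import log2


def binary_entropy(p):
    """Shannon entropy in bits of a bit that equals 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


# Hypothetical source: P(next bit = 1 | previous bit). Illustrative values only.
p_one_given_prev = {0: 0.5, 1: 0.9}

# Stationary probability that the previous bit is 1 for this two-state chain.
a = p_one_given_prev[0]
b = p_one_given_prev[1]
pi_one = a / (1 - b + a)
pi = {0: 1 - pi_one, 1: pi_one}

# Entropy of the next bit given each specific history (one previous bit here).
specific = {h: binary_entropy(p) for h, p in p_one_given_prev.items()}

# Average conditional entropy: histories weighted by how often they occur.
average_conditional = sum(pi[h] * specific[h] for h in specific)

# Worst case for the defender: the history that leaves the least uncertainty.
worst_case = min(specific.values())

print(f"average: {average_conditional:.3f} bits, worst case: {worst_case:.3f} bits")
```

An average can never fall below its smallest term, so an adversary who observes the unfavourable history faces less uncertainty than the average suggests.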
2019-10-02
McMahon, E., Patton, M., Samtani, S., Chen, H..  2018.  Benchmarking Vulnerability Assessment Tools for Enhanced Cyber-Physical System (CPS) Resiliency. 2018 IEEE International Conference on Intelligence and Security Informatics (ISI). :100–105.

Cyber-Physical Systems (CPSs) are engineered systems seamlessly integrating computational algorithms and physical components. CPS advances offer numerous benefits to domains such as health, transportation, smart homes and manufacturing. Despite these advances, the overall cybersecurity posture of CPS devices remains unclear. In this paper, we provide knowledge on how to improve CPS resiliency by evaluating and comparing the accuracy and scalability of two popular vulnerability assessment tools, Nessus and OpenVAS. Accuracy and suitability are evaluated with a diverse sample of pre-defined vulnerabilities in Industrial Control Systems (ICS), smart cars, smart home devices, and a smart water system. Scalability is evaluated using a large-scale vulnerability assessment of 1,000 Internet accessible CPS devices found on Shodan, the search engine for the Internet of Things (IoT). Assessment results indicate several CPS devices from major vendors suffer from critical vulnerabilities such as unsupported operating systems, OpenSSH vulnerabilities allowing unauthorized information disclosure, and PHP vulnerabilities susceptible to denial of service attacks.

2019-01-21
Xu, A., Dai, T., Chen, H., Ming, Z., Li, W..  2018.  Vulnerability Detection for Source Code Using Contextual LSTM. 2018 5th International Conference on Systems and Informatics (ICSAI). :1225–1230.

With the development of Internet technology, software vulnerabilities have become a major threat to current computer security. In this work, we propose vulnerability detection for source code using a Contextual LSTM (CLSTM). We evaluated the CLSTM against CNN and LSTM on 23185 programs collected from SARD. We extracted the features through program slicing. Based on the features, we used natural language processing to analyze programs with source code. The experimental results demonstrate that CLSTM has the best performance for vulnerability detection, reaching an accuracy of 96.711% and an F1 score of 0.96984.

2018-09-28
Tsou, Y., Chen, H., Chen, J., Huang, Y., Wang, P..  2017.  Differential privacy-based data de-identification protection and risk evaluation system. 2017 International Conference on Information and Communication Technology Convergence (ICTC). :416–421.

As more and more technologies to store and analyze massive amount of data become available, it is extremely important to make privacy-sensitive data de-identified so that further analysis can be conducted by different parties. For example, data needs to go through data de-identification process before being transferred to institutes for further value added analysis. As such, privacy protection issues associated with the release of data and data mining have become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is not practical for most applications due to its performance of synthetic dataset generation for data query. Moreover, the definition of data protection by randomized noise in native differential privacy is abstract to users. Therefore, we design a pragmatic DP-based data de-identification protection and risk of data disclosure estimation system, in which a DP-based noise addition mechanism is applied to generate synthetic datasets. Furthermore, the risk of data disclosure to these synthetic datasets can be evaluated before releasing to buyers/consumers.
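
As a rough illustration of what DP-based noise addition means, the sketch below applies the standard Laplace mechanism to a single count query. This is an assumption made for illustration: the abstract does not state which noise mechanism the system uses, and the system itself generates synthetic datasets rather than answering one query. The function names, the example records, and the value of epsilon are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(records, predicate, epsilon, rng):
    """Return a noisy count that satisfies epsilon-differential privacy."""
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical records: ages of individuals in a dataset.
ages = [23, 35, 41, 29, 52, 67, 38, 45]
rng = random.Random(0)  # fixed seed so the sketch is repeatable
noisy = dp_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller epsilon gives a larger noise scale, which is the trade-off between privacy and utility that a risk evaluation step would have to quantify.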

2018-06-11
Zeng, J., Dong, L., Wu, Y., Chen, H., Li, C., Wang, S..  2017.  Privacy-Preserving and Multi-Dimensional Range Query in Two-Tiered Wireless Sensor Networks. GLOBECOM 2017 - 2017 IEEE Global Communications Conference. :1–7.

With the advancement of sensor electronic devices, wireless sensor networks have attracted more and more attention. Range query has become a significant part of sensor networks due to its availability and convenience. However, it is challenging to process range query while still protecting sensitive data from disclosure. Existing work mainly focuses on privacy-preserving range query, but neglects the damage of collusion attacks, probability attacks and differential attacks. In this paper, we propose a privacy-preserving, energy-efficient and multi-dimensional range query protocol called PERQ, which not only achieves data privacy, but also considers collusion attacks, probability attacks and differential attacks. Generalized distance-based and modular arithmetic range query mechanisms are used. In addition, a novel cyclic modular verification scheme is proposed to verify the data integrity. Extensive theoretical analysis and experimental results confirm the high performance of PERQ in terms of energy efficiency, security and accountability requirements.

2018-05-30
Chang, S. H., William, T., Wu, W. Z., Cheng, B. C., Chen, H., Hsu, P. H..  2017.  Design of an Authentication and Key Management System for a Smart Meter Gateway in AMI. 2017 IEEE 6th Global Conference on Consumer Electronics (GCCE). :1–2.

By applying power usage statistics from smart meters, users are able to save energy in their homes or control smart appliances via home automation systems. However, owing to security and privacy concerns, it is recommended that smart meters (SM) should not have direct communication with smart appliances. In this paper, we propose a design for a smart meter gateway (SMGW) associated with a two-phase authentication mechanism and key management scheme to link a smart grid with smart appliances. With placement of the SMGW, we can reduce the design complexity of SMs as well as enhance the strength of security.

2018-04-02
Lin, W., Wang, K., Zhang, Z., Chen, H..  2017.  Revisiting Security Risks of Asymmetric Scalar Product Preserving Encryption and Its Variants. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). :1116–1125.

Cloud computing has emerged as a compelling vision for managing data and delivering query answering capability over the internet. This new way of computing also poses a real risk of disclosing confidential information to the cloud. Searchable encryption addresses this issue by allowing the cloud to compute the answer to a query based on the cipher texts of data and queries. Thanks to its inner product preservation property, the asymmetric scalar-product-preserving encryption (ASPE) has been adopted and enhanced in a growing number of works to perform a variety of queries and tasks in the cloud computing setting. However, the security property of ASPE and its enhanced schemes has not been studied carefully. In this paper, we show a complete disclosure of ASPE and several previously unknown security risks of its enhanced schemes. Meanwhile, efficient algorithms are proposed to learn the plaintext of data and queries encrypted by these schemes with little or no knowledge beyond the ciphertexts. We demonstrate these risks on real data sets.
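
The inner product preservation property mentioned above can be checked on a small example. The sketch assumes the basic construction in which a secret invertible matrix M encrypts a data vector p as M^T p and a query vector q as M^-1 q, so that the product of the two ciphertexts equals the product of the plaintexts. It omits the dimension extension and randomization used in full schemes, and it does not reproduce any of the paper's attacks. The matrix and vectors are hypothetical values.

```python
from fractions import Fraction


def transpose(m):
    return [list(row) for row in zip(*m)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("key matrix must be invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical secret key (an invertible matrix) and plaintext vectors.
M = [[2, 1], [5, 3]]
p = [3, 4]  # data vector
q = [1, 2]  # query vector

enc_p = mat_vec(transpose(M), p)    # data is encrypted as M^T p
enc_q = mat_vec(inverse_2x2(M), q)  # queries are encrypted as M^-1 q

# (M^T p) . (M^-1 q) = p^T M M^-1 q = p . q
assert dot(enc_p, enc_q) == dot(p, q)
```

The asymmetry comes from data and queries being transformed differently, which is why the server can compare a ciphertext against a query without being able to compare two data ciphertexts in the same way.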

2018-03-05
Zimba, A., Wang, Z., Chen, H..  2017.  Reasoning Crypto Ransomware Infection Vectors with Bayesian Networks. 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :149–151.

Ransomware techniques have evolved over time with the most resilient attacks making data recovery practically impossible. This has driven countermeasures to shift towards recovery against prevention but in this paper, we model ransomware attacks from an infection vector point of view. We follow the basic infection chain of crypto ransomware and use Bayesian network statistics to infer some of the most common ransomware infection vectors. We also employ the use of attack and sensor nodes to capture uncertainty in the Bayesian network.

2018-02-15
Ni, J., Cheng, W., Zhang, K., Song, D., Yan, T., Chen, H., Zhang, X..  2017.  Ranking Causal Anomalies by Modeling Local Propagations on Networked Systems. 2017 IEEE International Conference on Data Mining (ICDM). :1003–1008.

Complex systems are prevalent in many fields such as finance, security and industry. A fundamental problem in system management is to perform diagnosis in case of system failure such that the causal anomalies, i.e., root causes, can be identified for system debugging and repair. Recently, invariant network has proven a powerful tool in characterizing complex system behaviors. In an invariant network, a node represents a system component, and an edge indicates a stable interaction between two components. Recent approaches have shown that by modeling fault propagation in the invariant network, causal anomalies can be effectively discovered. Despite their success, the existing methods have a major limitation: they typically assume there is only a single and global fault propagation in the entire network. However, in real-world large-scale complex systems, it's more common for multiple fault propagations to grow simultaneously and locally within different node clusters and jointly define the system failure status. Inspired by this key observation, we propose a two-phase framework to identify and rank causal anomalies. In the first phase, a probabilistic clustering is performed to uncover impaired node clusters in the invariant network. Then, in the second phase, a low-rank network diffusion model is designed to backtrack causal anomalies in different impaired clusters. Extensive experimental results on real-life datasets demonstrate the effectiveness of our method.

2018-02-02
Qi, C., Wu, J., Chen, H., Yu, H., Hu, H., Cheng, G..  2017.  Game-Theoretic Analysis for Security of Various Software-Defined Networking (SDN) Architectures. 2017 IEEE 85th Vehicular Technology Conference (VTC Spring). :1–5.

Security evaluation of diverse SDN frameworks is of significant importance to design resilient systems and deal with attacks. Focused on SDN scenarios, a game-theoretic model is proposed to analyze their security performance in existing SDN architectures. The model can describe specific traits in different structures, represent several types of information of players (attacker and defender) and quantitatively calculate systems' reliability. Simulation results illustrate dynamic SDN structures have distinct security improvement over static ones. Besides, effective dynamic scheduling mechanisms adopted in dynamic systems can enhance their security further.

2017-04-20
Rohrmann, R., Patton, M. W., Chen, H..  2016.  Anonymous port scanning: Performing network reconnaissance through Tor. 2016 IEEE Conference on Intelligence and Security Informatics (ISI). :217–217.

The anonymizing network Tor is examined as one method of anonymizing port scanning tools and avoiding identification and retaliation. Performing anonymized port scans through Tor is possible using Nmap, but parallelization of the scanning processes is required to accelerate the scan rate.

2017-03-07
Benjamin, V., Li, W., Holt, T., Chen, H..  2015.  Exploring threats and vulnerabilities in hacker web: Forums, IRC and carding shops. 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). :85–90.

Cybersecurity is a problem of growing relevance that impacts all facets of society. As a result, many researchers have become interested in studying cybercriminals and online hacker communities in order to develop more effective cyber defenses. In particular, analysis of hacker community contents may reveal existing and emerging threats that pose great risk to individuals, businesses, and government. Thus, we are interested in developing an automated methodology for identifying tangible and verifiable evidence of potential threats within hacker forums, IRC channels, and carding shops. To identify threats, we couple machine learning methodology with information retrieval techniques. Our approach allows us to distill potential threats from the entirety of collected hacker contents. We present several examples of identified threats found through our analysis techniques. Results suggest that hacker communities can be analyzed to aid in cyber threat detection, thus providing promising direction for future work.