Biblio

Found 288 results

Filters: Keyword is data mining  [Clear All Filters]
2021-02-10
Tizio, G. Di, Ngo, C. Nam.  2020.  Are You a Favorite Target For Cryptojacking? A Case-Control Study On The Cryptojacking Ecosystem 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS PW). :515—520.
Illicitly hijacking visitors' computational resources for mining cryptocurrency via compromised websites is a consolidated activity.Previous works mainly focused on large-scale analysis of the cryptojacking ecosystem, technical means to detect browser-based mining as well as economic incentives of cryptojacking. So far, no one has studied if certain technical characteristics of a website can increase (decrease) the likelihood of being compromised for cryptojacking campaigns.In this paper, we propose to address this unanswered question by conducting a case-control study with cryptojacking websites obtained crawling the web using Minesweeper. Our preliminary analysis shows some association for certain website characteristics, however, the results obtained are not statistically significant. Thus, more data must be collected and further analysis must be conducted to obtain a better insight into the impact of these relations.
Tanana, D..  2020.  Behavior-Based Detection of Cryptojacking Malware. 2020 Ural Symposium on Biomedical Engineering, Radioelectronics and Information Technology (USBEREIT). :0543—0545.
With rise of cryptocurrency popularity and value, more and more cybercriminals seek to profit using that new technology. Most common ways to obtain illegitimate profit using cryptocurrencies are ransomware and cryptojacking also known as malicious mining. And while ransomware is well-known and well-studied threat which is obvious by design, cryptojacking is often neglected because it's less harmful and much harder to detect. This article considers question of cryptojacking detection. Brief history and definition of cryptojacking are described as well as reasons for designing custom detection technique. We also propose complex detection technique based on CPU load by an application, which can be applied to both browser-based and executable-type cryptojacking samples. Prototype detection program based on our technique was designed using decision tree algorithm. The program was tested in a controlled virtual machine environment and achieved 82% success rate against selected number of cryptojacking samples. Finally, we'll discuss generalization of proposed technique for future work.
2021-03-01
Raj, C., Khular, L., Raj, G..  2020.  Clustering Based Incident Handling For Anomaly Detection in Cloud Infrastructures. 2020 10th International Conference on Cloud Computing, Data Science Engineering (Confluence). :611–616.
Incident Handling for Cloud Infrastructures focuses on how the clustering based and non-clustering based algorithms can be implemented. Our research focuses in identifying anomalies and suspicious activities that might happen inside a Cloud Infrastructure over available datasets. A brief study has been conducted, where a network statistics dataset the NSL-KDD, has been chosen as the model to be worked upon, such that it can mirror the Cloud Infrastructure and its components. An important aspect of cloud security is to implement anomaly detection mechanisms, in order to monitor the incidents that inhibit the development and the efficiency of the cloud. Several methods have been discovered which help in achieving our present goal, some of these are highlighted as the following; by applying algorithm such as the Local Outlier Factor to cancel the noise created by irrelevant data points, by applying the DBSCAN algorithm which can detect less denser areas in order to identify their cause of clustering, the K-Means algorithm to generate positive and negative clusters to identify the anomalous clusters and by applying the Isolation Forest algorithm in order to implement decision based approach to detect anomalies. The best algorithm would help in finding and fixing the anomalies efficiently and would help us in developing an Incident Handling model for the Cloud.
2021-04-27
Wang, Y., Guo, S., Wu, J., Wang, H. H..  2020.  Construction of Audit Internal Control System Based on Online Big Data Mining and Decentralized Model. 2020 Fourth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :623–626.
Construction of the audit internal control system based on the online big data mining and decentralized model is done in this paper. How to integrate the novel technologies to internal control is the attracting task. IT audit is built on the information system and is independent of the information system itself. Application of the IT audit in enterprises can provide a guarantee for the security of the information system that can give an objective evaluation of the investment. This paper integrates the online big data mining and decentralized model to construct an efficient system. Association discovery is also called a data link. It uses similarity functions, such as the Euclidean distance, edit distance, cosine distance, Jeckard function, etc., to establish association relationships between data entities. These parameters are considered for comprehensive analysis.
2021-09-07
Zhang, Xing, Cui, Xiaotong, Cheng, Kefei, Zhang, Liang.  2020.  A Convolutional Encoder Network for Intrusion Detection in Controller Area Networks. 2020 16th International Conference on Computational Intelligence and Security (CIS). :366–369.
Integrated with various electronic control units (ECUs), vehicles are becoming more intelligent with the assistance of essential connections. However, the interaction with the outside world raises great concerns on cyber-attacks. As a main standard for in-vehicle network, Controller Area Network (CAN) does not have any built-in security mechanisms to guarantee a secure communication. This increases risks of denial of service, remote control attacks by an attacker, posing serious threats to underlying vehicles, property and human lives. As a result, it is urgent to develop an effective in-vehicle network intrusion detection system (IDS) for better security. In this paper, we propose a Feature-based Sliding Window (FSW) to extract the feature of CAN Data Field and CAN IDs. Then we construct a convolutional encoder network (CEN) to detect network intrusion of CAN networks. The proposed FSW-CEN method is evaluated on real-world datasets. The experimental results show that compared to traditional data processing methods and convolutional neural networks, our method is able to detect attacks with a higher accuracy in terms of detection accuracy and false negative rate.
2021-02-10
Gomes, G., Dias, L., Correia, M..  2020.  CryingJackpot: Network Flows and Performance Counters against Cryptojacking. 2020 IEEE 19th International Symposium on Network Computing and Applications (NCA). :1—10.
Cryptojacking, the appropriation of users' computational resources without their knowledge or consent to obtain cryp-tocurrencies, is a widespread attack, relatively easy to implement and hard to detect. Either browser-based or binary, cryptojacking lacks robust and reliable detection solutions. This paper presents a hybrid approach to detect cryptojacking where no previous knowledge about the attacks or training data is needed. Our Cryp-tojacking Intrusion Detection Approach, Cryingjackpot, extracts and combines flow and performance counter-based features, aggregating hosts with similar behavior by using unsupervised machine learning algorithms. We evaluate Cryingjackpot experimentally with both an artificial and a hybrid dataset, achieving F1-scores up to 97%.
Varlioglu, S., Gonen, B., Ozer, M., Bastug, M..  2020.  Is Cryptojacking Dead After Coinhive Shutdown? 2020 3rd International Conference on Information and Computer Technologies (ICICT). :385—389.
Cryptojacking is the exploitation of victims' computer resources to mine for cryptocurrency using malicious scripts. It had become popular after 2017 when attackers started to exploit legal mining scripts, especially Coinhive scripts. Coinhive was actually a legal mining service that provided scripts and servers for in-browser mining activities. Nevertheless, over 10 million web users had been victims every month before the Coinhive shutdown that happened in Mar 2019. This paper explores the new era of the cryptojacking world after Coinhive discontinued its service. We aimed to see whether and how attackers continue cryptojacking, generate new malicious scripts, and developed new methods. We used a capable cryptojacking detector named CMTracker that proposed by Hong et al. in 2018. We automatically and manually examined 2770 websites that had been detected by CMTracker before the Coinhive shutdown. The results revealed that 99% of sites no longer continue cryptojacking. 1% of websites still run 8 unique mining scripts. By tracking these mining scripts, we detected 632 unique cryptojacking websites. Moreover, open-source investigations (OSINT) demonstrated that attackers still use the same methods. Therefore, we listed the typical patterns of cryptojacking. We concluded that cryptojacking is not dead after the Coinhive shutdown. It is still alive, but not as attractive as it used to be.
2021-01-22
Mani, G., Pasumarti, V., Bhargava, B., Vora, F. T., MacDonald, J., King, J., Kobes, J..  2020.  DeCrypto Pro: Deep Learning Based Cryptomining Malware Detection Using Performance Counters. 2020 IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). :109—118.
Autonomy in cybersystems depends on their ability to be self-aware by understanding the intent of services and applications that are running on those systems. In case of mission-critical cybersystems that are deployed in dynamic and unpredictable environments, the newly integrated unknown applications or services can either be benign and essential for the mission or they can be cyberattacks. In some cases, these cyberattacks are evasive Advanced Persistent Threats (APTs) where the attackers remain undetected for reconnaissance in order to ascertain system features for an attack e.g. Trojan Laziok. In other cases, the attackers can use the system only for computing e.g. cryptomining malware. APTs such as cryptomining malware neither disrupt normal system functionalities nor trigger any warning signs because they simply perform bitwise and cryptographic operations as any other benign compression or encoding application. Thus, it is difficult for defense mechanisms such as antivirus applications to detect these attacks. In this paper, we propose an Operating Context profiling system based on deep neural networks-Long Short-Term Memory (LSTM) networks-using Windows Performance Counters data for detecting these evasive cryptomining applications. In addition, we propose Deep Cryptomining Profiler (DeCrypto Pro), a detection system with a novel model selection framework containing a utility function that can select a classification model for behavior profiling from both the light-weight machine learning models (Random Forest and k-Nearest Neighbors) and a deep learning model (LSTM), depending on available computing resources. Given data from performance counters, we show that individual models perform with high accuracy and can be trained with limited training data. We also show that the DeCrypto Profiler framework reduces the use of computational resources and accurately detects cryptomining applications by selecting an appropriate model, given the constraints such as data sample size and system configuration.
2020-12-28
Cuzzocrea, A., Maio, V. De, Fadda, E..  2020.  Experimenting and Assessing a Distributed Privacy-Preserving OLAP over Big Data Framework: Principles, Practice, and Experiences. 2020 IEEE 44th Annual Computers, Software, and Applications Conference (COMPSAC). :1344—1350.
OLAP is an authoritative analytical tool in the emerging big data analytics context, with particular regards to the target distributed environments (e.g., Clouds). Here, privacy-preserving OLAP-based big data analytics is a critical topic, with several amenities in the context of innovative big data application scenarios like smart cities, social networks, bio-informatics, and so forth. The goal is that of providing privacy preservation during OLAP analysis tasks, with particular emphasis on the privacy of OLAP aggregates. Following this line of research, in this paper we provide a deep contribution on experimenting and assessing a state-of-the-art distributed privacy-preserving OLAP framework, named as SPPOLAP, whose main benefit is that of introducing a completely-novel privacy notion for OLAP data cubes.
2021-02-22
Bashyam, K. G. Renga, Vadhiyar, S..  2020.  Fast Scalable Approximate Nearest Neighbor Search for High-dimensional Data. 2020 IEEE International Conference on Cluster Computing (CLUSTER). :294–302.
K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms for billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end where we use vantage point trees for partitioning the dataset across multiple processes and exploit an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) for searching locally within a process. Our hybrid MPI-OpenMP solution employs techniques including exploiting MPI one-sided communication for reducing communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries in the order of seconds using our approach on 8000 cores on a dataset with billion points in an 128-dimensional space. We also show 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating better suitability of our solution for high dimensional datasets. Our solution shows almost linear strong scaling.
2021-03-22
Li, Y., Zhou, W., Wang, H..  2020.  F-DPC: Fuzzy Neighborhood-Based Density Peak Algorithm. IEEE Access. 8:165963–165972.
Clustering is a concept in data mining, which divides a data set into different classes or clusters according to a specific standard, making the similarity of data objects in the same cluster as large as possible. Clustering by fast search and find of density peaks (DPC) is a novel clustering algorithm based on density. It is simple and novel, only requiring fewer parameters to achieve better clustering effect, without the requirement for iterative solution. And it has expandability and can detect the clustering of any shape. However, DPC algorithm still has some defects, such as it employs the clear neighborhood relations to calculate local density, so it cannot identify the neighborhood membership of different values of points from the distance of points and It is impossible to accurately cluster the data of the multi-density peak. The fuzzy neighborhood density peak clustering algorithm is proposed for this shortcoming (F-DPC): novel local density is defined by the fuzzy neighborhood relationship. The fuzzy set theory can be used to make the fuzzy neighborhood function of local density more sensitive, so that the clustering for data set of various shapes and densities is more robust. Experiments show that the algorithm has high accuracy and robustness.
2021-02-16
Wu, J. M.-T., Srivastava, G., Pirouz, M., Lin, J. C.-W..  2020.  A GA-based Data Sanitization for Hiding Sensitive Information with Multi-Thresholds Constraint. 2020 International Conference on Pervasive Artificial Intelligence (ICPAI). :29—34.
In this work, we propose a new concept of multiple support thresholds to sanitize the database for specific sensitive itemsets. The proposed method assigns a stricter threshold to the sensitive itemset for data sanitization. Furthermore, a genetic-algorithm (GA)-based model is involved in the designed algorithm to minimize side effects. In our experimental results, the GA-based PPDM approach is compared with traditional compact GA-based model and results clearly showed that our proposed method can obtain better performance with less computational cost.
2021-03-29
Mar, Z., Oo, K. K..  2020.  An Improvement of Apriori Mining Algorithm using Linked List Based Hash Table. 2020 International Conference on Advanced Information Technologies (ICAIT). :165–169.
Today, the huge amount of data was using in organizations around the world. This huge amount of data needs to process so that we can acquire useful information. Consequently, a number of industry enterprises discovered great information from shopper purchases found in any respect times. In data mining, the most important algorithms for find frequent item sets from large database is Apriori algorithm and discover the knowledge using the association rule. Apriori algorithm was wasted times for scanning the whole database and searching the frequent item sets and inefficient of memory requirement when large numbers of transactions are in consideration. The improved Apriori algorithm is adding and calculating third threshold may increase the overhead. So, in the aims of proposed research, Improved Apriori algorithm with LinkedList and hash tabled is used to mine frequent item sets from the transaction large amount of database. This method includes database is scanning with Improved Apriori algorithm and frequent 1-item sets counts with using the hash table. Then, in the linked list saved the next frequent item sets and scanning the database. The hash table used to produce the frequent 2-item sets Therefore, the database scans the only two times and necessary less processing time and memory space.
2021-02-23
Liu, J., Xiao, K., Luo, L., Li, Y., Chen, L..  2020.  An intrusion detection system integrating network-level intrusion detection and host-level intrusion detection. 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). :122—129.
With the rapid development of Internet, the issue of cyber security has increasingly gained more attention. An intrusion Detection System (IDS) is an effective technique to defend cyber-attacks and reduce security losses. However, the challenge of IDS lies in the diversity of cyber-attackers and the frequently-changing data requiring a flexible and efficient solution. To address this problem, machine learning approaches are being applied in the IDS field. In this paper, we propose an efficient scalable neural-network-based hybrid IDS framework with the combination of Host-level IDS (HIDS) and Network-level IDS (NIDS). We applied the autoencoders (AE) to NIDS and designed HIDS using word embedding and convolutional neural network. To evaluate the IDS, many experiments are performed on the public datasets NSL-KDD and ADFA. It can detect many attacks and reduce the security risk with high efficiency and excellent scalability.
2021-03-01
Kerim, A., Genc, B..  2020.  Mobile Games Success and Failure: Mining the Hidden Factors. 2020 7th International Conference on Soft Computing Machine Intelligence (ISCMI). :167–171.
Predicting the success of a mobile game is a prime issue in game industry. Thousands of games are being released each day. However, a few of them succeed while the majority fail. Towards the goal of investigating the potential correlation between the success of a mobile game and its specific attributes, this work was conducted. More than 17 thousands games were considered for that reason. We show that specific game attributes, such as number of IAPs (In-App Purchases), belonging to the puzzle genre, supporting different languages and being produced by a mature developer highly and positively affect the success of the game in the future. Moreover, we show that releasing the game in July and not including any IAPs seems to be highly associated with the game’s failure. Our second main contribution, is the proposal of a novel success score metric that reflects multiple objectives, in contrast to evaluating only revenue, average rating or rating count. We also employ different machine learning models, namely, SVM (Support Vector Machine), RF (Random Forest) and Deep Learning (DL) to predict this success score metric of a mobile game given its attributes. The trained models were able to predict this score, as well as the rating average and rating count of a mobile game with more than 70% accuracy. This prediction can help developers before releasing their game to the market to avoid any potential disappointments.
2020-12-28
Chaves, A., Moura, Í, Bernardino, J., Pedrosa, I..  2020.  The privacy paradigm : An overview of privacy in Business Analytics and Big Data. 2020 15th Iberian Conference on Information Systems and Technologies (CISTI). :1—6.
In this New Age where information has an indispensable value for companies and data mining technologies are growing in the area of Information Technology, privacy remains a sensitive issue in the approach to the exploitation of the large volume of data generated and processed by companies. The way data is collected, handled and destined is not yet clearly defined and has been the subject of constant debate by several areas of activity. This literature review gives an overview of privacy in the era of Business Analytics and Big Data in different timelines, the opportunities and challenges faced, aiming to broaden discussions on a subject that deserves extreme attention and aims to show that, despite measures for data protection have been created, there is still a need to discuss the subject among the different parties involved in the process to achieve a positive ideal for both users and companies.
2021-03-29
Ye, F..  2020.  Research and Application of Improved APRIORI Algorithm Based on Hash Technology. 2020 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). :64–67.
Apriori Algorithm is the most Classic Association Rule Mining Algorithm, which has unique advantages, but it also has some disadvantages such as high overhead. This paper first describes Apriori Algorithm, points out its shortcomings, introduces related concepts, and then proposes a method based on Hash technology and compressed combination item set technology to improve APRIORI algorithm. This paper introduces the basic idea and the concrete process of the improvement in detail, analyzes the efficiency of the improved algorithm by the experiment, and advances the application of the improved algorithm in the library personalized service.
2021-02-22
Si, Y., Zhou, W., Gai, J..  2020.  Research and Implementation of Data Extraction Method Based on NLP. 2020 IEEE 14th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :11–15.
In order to accurately extract the data from unstructured Chinese text, this paper proposes a rule-based method based on natural language processing and regular expression. This method makes use of the language expression rules of the data in the text and other related knowledge to form the feature word lists and rule template to match the text. Experimental results show that the accuracy of the designed algorithm is 94.09%.
2021-05-13
Shu, Fei, Chen, Shuting, Li, Feng, Zhang, JianYe, Chen, Jia.  2020.  Research and implementation of network attack and defense countermeasure technology based on artificial intelligence technology. 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC). :475—478.
Using artificial intelligence technology to help network security has become a major trend. At present, major countries in the world have successively invested R & D force in the attack and defense of automatic network based on artificial intelligence. The U.S. Navy, the U.S. air force, and the DOD strategic capabilities office have invested heavily in the development of artificial intelligence network defense systems. DARPA launched the network security challenge (CGC) to promote the development of automatic attack system based on artificial intelligence. In the 2016 Defcon final, mayhem (the champion of CGC in 2014), an automatic attack team, participated in the competition with 14 human teams and once defeated two human teams, indicating that the automatic attack method generated by artificial intelligence system can scan system defects and find loopholes faster and more effectively than human beings. Japan's defense ministry also announced recently that in order to strengthen the ability to respond to network attacks, it will introduce artificial intelligence technology into the information communication network defense system of Japan's self defense force. It can be predicted that the deepening application of artificial intelligence in the field of network attack and defense may bring about revolutionary changes and increase the imbalance of the strategic strength of cyberspace in various countries. Therefore, it is necessary to systematically investigate the current situation of network attack and defense based on artificial intelligence at home and abroad, comprehensively analyze the development trend of relevant technologies at home and abroad, deeply analyze the development outline and specification of artificial intelligence attack and defense around the world, and refine the application status and future prospects of artificial intelligence attack and defense, so as to promote the development of artificial intelligence attack and Defense Technology in China and protect the core interests of cyberspace, of great significance
2020-12-21
Figueiredo, N. M., Rodríguez, M. C..  2020.  Trustworthiness in Sensor Networks A Reputation-Based Method for Weather Stations. 2020 International Conference on Omni-layer Intelligent Systems (COINS). :1–6.
Trustworthiness is a soft-security feature that evaluates the correct behavior of nodes in a network. More specifically, this feature tries to answer the following question: how much should we trust in a certain node? To determine the trustworthiness of a node, our approach focuses on two reputation indicators: the self-data trust, which evaluates the data generated by the node itself taking into account its historical data; and the peer-data trust, which utilizes the nearest nodes' data. In this paper, we show how these two indicators can be calculated using the Gaussian Overlap and Pearson correlation. This paper includes a validation of our trustworthiness approach using real data from unofficial and official weather stations in Portugal. This is a representative scenario of the current situation in many other areas, with different entities providing different kinds of data using autonomous sensors in a continuous way over the networks.
2021-06-01
Wang, Qi, Zhao, Weiliang, Yang, Jian, Wu, Jia, Zhou, Chuan, Xing, Qianli.  2020.  AtNE-Trust: Attributed Trust Network Embedding for Trust Prediction in Online Social Networks. 2020 IEEE International Conference on Data Mining (ICDM). :601–610.
Trust relationship prediction among people provides valuable supports for decision making, information dissemination, and product promotion in online social networks. Network embedding has achieved promising performance for link prediction by learning node representations that encode intrinsic network structures. However, most of the existing network embedding solutions cannot effectively capture the properties of a trust network that has directed edges and nodes with in/out links. Furthermore, there usually exist rich user attributes in trust networks, such as ratings, reviews, and the rated/reviewed items, which may exert significant impacts on the formation of trust relationships. It is still lacking a network embedding-based method that can adequately integrate these properties for trust prediction. In this work, we develop an AtNE-Trust model to address these issues. We firstly capture user embedding from both the trust network structures and user attributes. Then we design a deep multi-view representation learning module to further mine and fuse the obtained user embedding. Finally, a trust evaluation module is developed to predict the trust relationships between users. Representation learning and trust evaluation are optimized together to capture high-quality user embedding and make accurate predictions simultaneously. A set of experiments against the real-world datasets demonstrates the effectiveness of the proposed approach.
2021-02-10
Aktepe, S., Varol, C., Shashidhar, N..  2020.  MiNo: The Chrome Web Browser Add-on Application to Block the Hidden Cryptocurrency Mining Activities. 2020 8th International Symposium on Digital Forensics and Security (ISDFS). :1—5.

Cryptocurrencies are the digital currencies designed to replace the regular cash money while taking place in our daily lives especially for the last couple of years. Mining cryptocurrencies are one of the popular ways to have them and make a profit due to unstable values in the market. This attracts attackers to utilize malware on internet users' computer resources, also known as cryptojacking, to mine cryptocurrencies. Cryptojacking started to be a major issue in the internet world. In this case, we developed MiNo, a web browser add-on application to detect these malicious mining activities running without the user's permission or knowledge. This add-on provides security and efficiency for the computer resources of the internet users. MiNo designed and developed with double-layer protection which makes it ahead of its competitors in the market.

2021-04-08
Westland, T., Niu, N., Jha, R., Kapp, D., Kebede, T..  2020.  Relating the Empirical Foundations of Attack Generation and Vulnerability Discovery. 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI). :37–44.
Automatically generating exploits for attacks receives much attention in security testing and auditing. However, little is known about the continuous effect of automatic attack generation and detection. In this paper, we develop an analytic model to understand the cost-benefit tradeoffs in light of the process of vulnerability discovery. We develop a three-phased model, suggesting that the cumulative malware detection has a productive period before the rate of gain flattens. As the detection mechanisms co-evolve, the gain will likely increase. We evaluate our analytic model by using an anti-virus tool to detect the thousands of Trojans automatically created. The anti-virus scanning results over five months show the validity of the model and point out future research directions.
2021-05-18
Iorga, Denis, Corlătescu, Dragos, Grigorescu, Octavian, Săndescu, Cristian, Dascălu, Mihai, Rughiniş, Razvan.  2020.  Early Detection of Vulnerabilities from News Websites using Machine Learning Models. 2020 19th RoEduNet Conference: Networking in Education and Research (RoEduNet). :1–6.
The drawbacks of traditional methods of cybernetic vulnerability detection relate to the required time to identify new threats, to register them in the Common Vulnerabilities and Exposures (CVE) records, and to score them with the Common Vulnerabilities Scoring System (CVSS). These problems can be mitigated by early vulnerability detection systems relying on social media and open-source data. This paper presents a model that aims to identify emerging cybernetic vulnerabilities in cybersecurity news articles, as part of a system for automatic detection of early cybernetic threats using Open Source Intelligence (OSINT). Three machine learning models were trained on a novel dataset of 1000 labeled news articles to create a strong baseline for classifying cybersecurity articles as relevant (i.e., introducing new security threats), or irrelevant: Support Vector Machines, a Multinomial Naïve Bayes classifier, and a finetuned BERT model. The BERT model obtained the best performance with a mean accuracy of 88.45% on the test dataset. Our experiments support the conclusion that Natural Language Processing (NLP) models are an appropriate choice for early vulnerability detection systems in order to extract relevant information from cybersecurity news articles.
2021-04-27
Fu, Y., Tong, S., Guo, X., Cheng, L., Zhang, Y., Feng, D..  2020.  Improving the Effectiveness of Grey-box Fuzzing By Extracting Program Information. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :434–441.
Fuzzing has been widely adopted as an effective techniques to detect vulnerabilities in softwares. However, existing fuzzers suffer from the problems of generating excessive test inputs that either cannot pass input validation or are ineffective in exploring unvisited regions in the program under test (PUT). To tackle these problems, we propose a greybox fuzzer called MuFuzzer based on AFL, which incorporates two heuristics that optimize seed selection and automatically extract input formatting information from the PUT to increase the chance of generating valid test inputs, respectively. In particular, the first heuristic collects the branch coverage and execution information during a fuzz session, and utilizes such information to guide fuzzing tools in selecting seeds that are fast to execute, small in size, and more importantly, more likely to explore new behaviors of the PUT for subsequent fuzzing activities. The second heuristic automatically identifies string comparison operations that the PUT uses for input validation, and establishes a dictionary with string constants from these operations to help fuzzers generate test inputs that have higher chances to pass input validation. We have evaluated the performance of MuFuzzer, in terms of code coverage and bug detection, using a set of realistic programs and the LAVA-M test bench. Experiment results demonstrate that MuFuzzer is able to achieve higher code coverage and better or comparative bug detection performance than state-of-the-art fuzzers.