Visible to the public Biblio

Filters: Keyword is Random Forest  [Clear All Filters]
2020-07-09
Nisha, D, Sivaraman, E, Honnavalli, Prasad B.  2019.  Predicting and Preventing Malware in Machine Learning Model. 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT). :1—7.

Machine learning is a major area in artificial intelligence, which enables computer to learn itself explicitly without programming. As machine learning is widely used in making decision automatically, attackers have strong intention to manipulate the prediction generated my machine learning model. In this paper we study about the different types of attacks and its countermeasures on machine learning model. By research we found that there are many security threats in various algorithms such as K-nearest-neighbors (KNN) classifier, random forest, AdaBoost, support vector machine (SVM), decision tree, we revisit existing security threads and check what are the possible countermeasures during the training and prediction phase of machine learning model. In machine learning model there are 2 types of attacks that is causative attack which occurs during the training phase and exploratory attack which occurs during the prediction phase, we will also discuss about the countermeasures on machine learning model, the countermeasures are data sanitization, algorithm robustness enhancement, and privacy preserving techniques.

2020-06-26
Jaiswal, Prajwal Kumar, Das, Sayari, Panigrahi, Bijaya Ketan.  2019.  PMU Based Data Driven Approach For Online Dynamic Security Assessment in Power Systems. 2019 20th International Conference on Intelligent System Application to Power Systems (ISAP). :1—7.
This paper presents a methodology for utilizing Phasor Measurement units (PMUs) for procuring real time synchronized measurements for assessing the security of the power system dynamically. The concept of wide-area dynamic security assessment considers transient instability in the proposed methodology. Intelligent framework based approach for online dynamic security assessment has been suggested wherein the database consisting of critical features associated with the system is generated for a wide range of contingencies, which is utilized to build the data mining model. This data mining model along with the synchronized phasor measurements is expected to assist the system operator in assessing the security of the system pertaining to a particular contingency, thereby also creating possibility of incorporating control and preventive measures in order to avoid any unforeseen instability in the system. The proposed technique has been implemented on IEEE 39 bus system for accurately indicating the security of the system and is found to be quite robust in the case of noise in the measurement data obtained from the PMUs.
2020-06-22
Lv, Chaoxian, Li, Qianmu, Long, Huaqiu, Ren, Yumei, Ling, Fei.  2019.  A Differential Privacy Random Forest Method of Privacy Protection in Cloud. 2019 IEEE International Conference on Computational Science and Engineering (CSE) and IEEE International Conference on Embedded and Ubiquitous Computing (EUC). :470–475.
This paper proposes a new random forest classification algorithm based on differential privacy protection. In order to reduce the impact of differential privacy protection on the accuracy of random forest classification, a hybrid decision tree algorithm is proposed in this paper. The hybrid decision tree algorithm is applied to the construction of random forest, which balances the privacy and classification accuracy of the random forest algorithm based on differential privacy. Experiment results show that the random forest algorithm based on differential privacy can provide high privacy protection while ensuring high classification performance, achieving a balance between privacy and classification accuracy, and has practical application value.
2020-04-10
Newaz, AKM Iqtidar, Sikder, Amit Kumar, Rahman, Mohammad Ashiqur, Uluagac, A. Selcuk.  2019.  HealthGuard: A Machine Learning-Based Security Framework for Smart Healthcare Systems. 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS). :389—396.
The integration of Internet-of-Things and pervasive computing in medical devices have made the modern healthcare system “smart.” Today, the function of the healthcare system is not limited to treat the patients only. With the help of implantable medical devices and wearables, Smart Healthcare System (SHS) can continuously monitor different vital signs of a patient and automatically detect and prevent critical medical conditions. However, these increasing functionalities of SHS raise several security concerns and attackers can exploit the SHS in numerous ways: they can impede normal function of the SHS, inject false data to change vital signs, and tamper a medical device to change the outcome of a medical emergency. In this paper, we propose HealthGuard, a novel machine learning-based security framework to detect malicious activities in a SHS. HealthGuard observes the vital signs of different connected devices of a SHS and correlates the vitals to understand the changes in body functions of the patient to distinguish benign and malicious activities. HealthGuard utilizes four different machine learning-based detection techniques (Artificial Neural Network, Decision Tree, Random Forest, k-Nearest Neighbor) to detect malicious activities in a SHS. We trained HealthGuard with data collected for eight different smart medical devices for twelve benign events including seven normal user activities and five disease-affected events. Furthermore, we evaluated the performance of HealthGuard against three different malicious threats. Our extensive evaluation shows that HealthGuard is an effective security framework for SHS with an accuracy of 91 % and an F1 score of 90 %.
2020-04-03
Saridou, Betty, Shiaeles, Stavros, Papadopoulos, Basil.  2019.  DDoS Attack Mitigation through Root-DNS Server: A Case Study. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:60—65.

Load balancing and IP anycast are traffic routing algorithms used to speed up delivery of the Domain Name System. In case of a DDoS attack or an overload condition, the value of these protocols is critical, as they can provide intrinsic DDoS mitigation with the failover alternatives. In this paper, we present a methodology for predicting the next DNS response in the light of a potential redirection to less busy servers, in order to mitigate the size of the attack. Our experiments were conducted using data from the Nov. 2015 attack of the Root DNS servers and Logistic Regression, k-Nearest Neighbors, Support Vector Machines and Random Forest as our primary classifiers. The models were able to successfully predict up to 83% of responses for Root Letters that operated on a small number of sites and consequently suffered the most during the attacks. On the other hand, regarding DNS requests coming from more distributed Root servers, the models demonstrated lower accuracy. Our analysis showed a correlation between the True Positive Rate metric and the number of sites, as well as a clear need for intelligent management of traffic in load balancing practices.

2020-03-23
Noorbehbahani, Fakhroddin, Rasouli, Farzaneh, Saberi, Mohammad.  2019.  Analysis of Machine Learning Techniques for Ransomware Detection. 2019 16th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC). :128–133.

In parallel with the increasing growth of the Internet and computer networks, the number of malwares has been increasing every day. Today, one of the newest attacks and the biggest threats in cybersecurity is ransomware. The effectiveness of applying machine learning techniques for malware detection has been explored in much scientific research, however, there is few studies focused on machine learning-based ransomware detection. In this paper, the effectiveness of ransomware detection using machine learning methods applied to CICAndMal2017 dataset is examined in two experiments. First, the classifiers are trained on a single dataset containing different types of ransomware. Second, different classifiers are trained on datasets of 10 ransomware families distinctly. Our findings imply that in both experiments random forest outperforms other tested classifiers and the performance of the classifiers are not changed significantly when they are trained on each family distinctly. Therefore, the random forest classification method is very effective in ransomware detection.

Bahrani, Ala, Bidgly, Amir Jalaly.  2019.  Ransomware detection using process mining and classification algorithms. 2019 16th International ISC (Iranian Society of Cryptology) Conference on Information Security and Cryptology (ISCISC). :73–77.

The fast growing of ransomware attacks has become a serious threat for companies, governments and internet users, in recent years. The increasing of computing power, memory and etc. and the advance in cryptography has caused the complicating the ransomware attacks. Therefore, effective methods are required to deal with ransomwares. Although, there are many methods proposed for ransomware detection, but these methods are inefficient in detection ransomwares, and more researches are still required in this field. In this paper, we have proposed a novel method for identify ransomware from benign software using process mining methods. The proposed method uses process mining to discover the process model from the events logs, and then extracts features from this process model and using these features and classification algorithms to classify ransomwares. This paper shows that the use of classification algorithms along with the process mining can be suitable to identify ransomware. The accuracy and performance of our proposed method is evaluated using a study of 21 ransomware families and some benign samples. The results show j48 and random forest algorithms have the best accuracy in our method and can achieve to 95% accuracy in detecting ransomwares.

2020-02-26
Rahman, Obaid, Quraishi, Mohammad Ali Gauhar, Lung, Chung-Horng.  2019.  DDoS Attacks Detection and Mitigation in SDN Using Machine Learning. 2019 IEEE World Congress on Services (SERVICES). 2642-939X:184–189.

Software Defined Networking (SDN) is very popular due to the benefits it provides such as scalability, flexibility, monitoring, and ease of innovation. However, it needs to be properly protected from security threats. One major attack that plagues the SDN network is the distributed denial-of-service (DDoS) attack. There are several approaches to prevent the DDoS attack in an SDN network. We have evaluated a few machine learning techniques, i.e., J48, Random Forest (RF), Support Vector Machine (SVM), and K-Nearest Neighbors (K-NN), to detect and block the DDoS attack in an SDN network. The evaluation process involved training and selecting the best model for the proposed network and applying it in a mitigation and prevention script to detect and mitigate attacks. The results showed that J48 performs better than the other evaluated algorithms, especially in terms of training and testing time.

Inaba, Koutaro, Yoneda, Tomohiro, Kanamoto, Toshiki, Kurokawa, Atsushi, Imai, Masashi.  2019.  Hardware Trojan Insertion and Detection in Asynchronous Circuits. 2019 25th IEEE International Symposium on Asynchronous Circuits and Systems (ASYNC). :134–143.

Hardware Trojan threats caused by malicious designers and untrusted manufacturers have become one of serious issues in modern VLSI systems. In this paper, we show some experimental results to insert hardware Trojans into asynchronous circuits. As a result, the overhead of hardware Trojan insertion in asynchronous circuits may be small for malicious designers who have enough knowledge about the asynchronous circuits. In addition, we also show several Trojan detection methods using deep learning schemes which have been proposed to detect synchronous hardware Trojan in the netlist level. We apply them to asynchronous hardware Trojan circuits and show their results. They have a great potential to detect a hardware Trojan in asynchronous circuits.

2019-10-15
Panagiotakis, C., Papadakis, H., Fragopoulou, P..  2018.  Detection of Hurriedly Created Abnormal Profiles in Recommender Systems. 2018 International Conference on Intelligent Systems (IS). :499–506.

Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method automatically spotting the spurious profiles. For the cases where labeling sample data is available, a random forest classifier is trained to show how supervised methods outperforms unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.

2019-07-01
Clemente, C. J., Jaafar, F., Malik, Y..  2018.  Is Predicting Software Security Bugs Using Deep Learning Better Than the Traditional Machine Learning Algorithms? 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS). :95–102.

Software insecurity is being identified as one of the leading causes of security breaches. In this paper, we revisited one of the strategies in solving software insecurity, which is the use of software quality metrics. We utilized a multilayer deep feedforward network in examining whether there is a combination of metrics that can predict the appearance of security-related bugs. We also applied the traditional machine learning algorithms such as decision tree, random forest, naïve bayes, and support vector machines and compared the results with that of the Deep Learning technique. The results have successfully demonstrated that it was possible to develop an effective predictive model to forecast software insecurity based on the software metrics and using Deep Learning. All the models generated have shown an accuracy of more than sixty percent with Deep Learning leading the list. This finding proved that utilizing Deep Learning methods and a combination of software metrics can be tapped to create a better forecasting model thereby aiding software developers in predicting security bugs.

Perez, R. Lopez, Adamsky, F., Soua, R., Engel, T..  2018.  Machine Learning for Reliable Network Attack Detection in SCADA Systems. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :633–638.

Critical Infrastructures (CIs) use Supervisory Control And Data Acquisition (SCADA) systems for remote control and monitoring. Sophisticated security measures are needed to address malicious intrusions, which are steadily increasing in number and variety due to the massive spread of connectivity and standardisation of open SCADA protocols. Traditional Intrusion Detection Systems (IDSs) cannot detect attacks that are not already present in their databases. Therefore, in this paper, we assess Machine Learning (ML) for intrusion detection in SCADA systems using a real data set collected from a gas pipeline system and provided by the Mississippi State University (MSU). The contribution of this paper is two-fold: 1) The evaluation of four techniques for missing data estimation and two techniques for data normalization, 2) The performances of Support Vector Machine (SVM), and Random Forest (RF) are assessed in terms of accuracy, precision, recall and F1score for intrusion detection. Two cases are differentiated: binary and categorical classifications. Our experiments reveal that RF detect intrusions effectively, with an F1score of respectively \textbackslashtextgreater 99%.

2019-06-10
Eziama, E., Jaimes, L. M. S., James, A., Nwizege, K. S., Balador, A., Tepe, K..  2018.  Machine Learning-Based Recommendation Trust Model for Machine-to-Machine Communication. 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). :1-6.

The Machine Type Communication Devices (MTCDs) are usually based on Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices are quite heterogeneous in nature, which can lead to security issues, such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associate with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), KNearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performances of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information were modified to be malicious. The malicious nodes were varied in the ranges of 10%, 20%, 30%, 40%, and the results were carefully analyzed.

2019-02-13
Feng, Y., Akiyama, H., Lu, L., Sakurai, K..  2018.  Feature Selection for Machine Learning-Based Early Detection of Distributed Cyber Attacks. 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech). :173–180.

It is well known that distributed cyber attacks simultaneously launched from many hosts have caused the most serious problems in recent years including problems of privacy leakage and denial of services. Thus, how to detect those attacks at early stage has become an important and urgent topic in the cyber security community. For this purpose, recognizing C&C (Command & Control) communication between compromised bots and the C&C server becomes a crucially important issue, because C&C communication is in the preparation phase of distributed attacks. Although attack detection based on signature has been practically applied since long ago, it is well-known that it cannot efficiently deal with new kinds of attacks. In recent years, ML(Machine learning)-based detection methods have been studied widely. In those methods, feature selection is obviously very important to the detection performance. We once utilized up to 55 features to pick out C&C traffic in order to accomplish early detection of DDoS attacks. In this work, we try to answer the question that "Are all of those features really necessary?" We mainly investigate how the detection performance moves as the features are removed from those having lowest importance and we try to make it clear that what features should be payed attention for early detection of distributed attacks. We use honeypot data collected during the period from 2008 to 2013. SVM(Support Vector Machine) and PCA(Principal Component Analysis) are utilized for feature selection and SVM and RF(Random Forest) are for building the classifier. We find that the detection performance is generally getting better if more features are utilized. However, after the number of features has reached around 40, the detection performance will not change much even more features are used. It is also verified that, in some specific cases, more features do not always means a better detection performance. We also discuss 10 important features which have the biggest influence on classification.

2018-10-26
Al-Janabi, Mohammed, Quincey, Ed de, Andras, Peter.  2017.  Using Supervised Machine Learning Algorithms to Detect Suspicious URLs in Online Social Networks. Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. :1104–1111.

The increasing volume of malicious content in social networks requires automated methods to detect and eliminate such content. This paper describes a supervised machine learning classification model that has been built to detect the distribution of malicious content in online social networks (ONSs). Multisource features have been used to detect social network posts that contain malicious Uniform Resource Locators (URLs). These URLs could direct users to websites that contain malicious content, drive-by download attacks, phishing, spam, and scams. For the data collection stage, the Twitter streaming application programming interface (API) was used and VirusTotal was used for labelling the dataset. A random forest classification model was used with a combination of features derived from a range of sources. The random forest model without any tuning and feature selection produced a recall value of 0.89. After further investigation and applying parameter tuning and feature selection methods, however, we were able to improve the classifier performance to 0.92 in recall.

2018-09-28
Li, Z., Li, S..  2017.  Random forest algorithm under differential privacy. 2017 IEEE 17th International Conference on Communication Technology (ICCT). :1901–1905.

Trying to solve the risk of data privacy disclosure in classification process, a Random Forest algorithm under differential privacy named DPRF-gini is proposed in the paper. In the process of building decision tree, the algorithm first disturbed the process of feature selection and attribute partition by using exponential mechanism, and then meet the requirement of differential privacy by adding Laplace noise to the leaf node. Compared with the original algorithm, Empirical results show that protection of data privacy is further enhanced while the accuracy of the algorithm is slightly reduced.

2018-05-09
Zeng, Y. G..  2017.  Identifying Email Threats Using Predictive Analysis. 2017 International Conference on Cyber Security And Protection Of Digital Services (Cyber Security). :1–2.

Malicious emails pose substantial threats to businesses. Whether it is a malware attachment or a URL leading to malware, exploitation or phishing, attackers have been employing emails as an effective way to gain a foothold inside organizations of all kinds. To combat email threats, especially targeted attacks, traditional signature- and rule-based email filtering as well as advanced sandboxing technology both have their own weaknesses. In this paper, we propose a predictive analysis approach that learns the differences between legit and malicious emails through static analysis, creates a machine learning model and makes detection and prediction on unseen emails effectively and efficiently. By comparing three different machine learning algorithms, our preliminary evaluation reveals that a Random Forests model performs the best.

2018-04-11
Hasegawa, K., Yanagisawa, M., Togawa, N..  2017.  Trojan-Feature Extraction at Gate-Level Netlists and Its Application to Hardware-Trojan Detection Using Random Forest Classifier. 2017 IEEE International Symposium on Circuits and Systems (ISCAS). :1–4.

Recently, due to the increase of outsourcing in IC design, it has been reported that malicious third-party vendors often insert hardware Trojans into their ICs. How to detect them is a strong concern in IC design process. The features of hardware-Trojan infected nets (or Trojan nets) in ICs often differ from those of normal nets. To classify all the nets in netlists designed by third-party vendors into Trojan ones and normal ones, we have to extract effective Trojan features from Trojan nets. In this paper, we first propose 51 Trojan features which describe Trojan nets from netlists. Based on the importance values obtained from the random forest classifier, we extract the best set of 11 Trojan features out of the 51 features which can effectively detect Trojan nets, maximizing the F-measures. By using the 11 Trojan features extracted, the machine-learning based hardware Trojan classifier has achieved at most 100% true positive rate as well as 100% true negative rate in several TrustHUB benchmarks and obtained the average F-measure of 74.6%, which realizes the best values among existing machine-learning-based hardware-Trojan detection methods.

2018-01-10
Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F..  2017.  Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts). 2017 IEEE International Conference on Intelligence and Security Informatics (ISI). :188–190.

In this paper we present results of a research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. According to the Russian legislation we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic and psycholinguistic) to classification quality. The results of experiments show that psycholinguistic and semantic features are promising for extremist text detection.

2017-08-02
John, Adebayo Kolawole, Di Caro, Luigi, Boella, Guido.  2016.  A Supervised KeyPhrase Extraction System. Proceedings of the 12th International Conference on Semantic Systems. :57–62.

In this paper, we present a multi-featured supervised automatic keyword extraction system. We extracted salient semantic features which are descriptive of candidate keyphrases, a Random Forest classifier was used for training. The system achieved an accuracy of 58.3 % precision and has shown to outperform two top performing systems when benchmarked on a crowdsourced dataset. Furthermore, our approach achieved a personal best Precision and F-measure score of 32.7 and 25.5 respectively on the Semeval Keyphrase extraction challenge dataset. The paper describes the approaches used as well as the result obtained.

2017-04-24
Bulakh, Vlad, Gupta, Minaxi.  2016.  Countering Phishing from Brands' Vantage Point. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :17–24.

Most anti-phishing solutions that exist today require scanning a large portion of the web, which is vast and equivalent to finding a needle in a haystack. In addition, such solutions are not very efficient. We propose a different approach. Our solution does not rely on the scanning of the entire Internet or a large portion of it and only needs access to the brand's traffic in order to be able to detect phishing attempts against that brand. By analyzing a sample of phishing websites, we find features that can be used to distinguish phishing websites from the legitimate ones. We then use these features to train a machine learning classifier capable of helping brands detect phishing attempts against them. Our approach can detect up to 86% of phishing attacks against the brands and is best used as a complementary tool to the existing anti-phishing solutions.

2017-03-07
Kim, Kunho, Giles, C. Lee.  2016.  Financial Entity Record Linkage with Random Forests. Proceedings of the Second International Workshop on Data Science for Macro-Modeling. :13:1–13:2.

Record linkage refers to the task of finding same entity across different databases. We propose a machine learning based record linkage algorithm for financial entity databases. Record linkage on financial databases are essential for information integration on certain financial entity, since those databases do not have common unified identifier. Our algorithm works in two steps to determine if a pair of record is same entity or not. First we check with proposed rules if the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce and scale the computation, this process is done only for candidate pairs within a proposed heuristic. Initial evaluation for precision, recall and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge show promising results.

2017-02-23
G. Kejela, C. Rong.  2015.  "Cross-Device Consumer Identification". 2015 IEEE International Conference on Data Mining Workshop (ICDMW). :1687-1689.

Nowadays, a typical household owns multiple digital devices that can be connected to the Internet. Advertising companies always want to seamlessly reach consumers behind devices instead of the device itself. However, the identity of consumers becomes fragmented as they switch from one device to another. A naive attempt is to use deterministic features such as user name, telephone number and email address. However consumers might refrain from giving away their personal information because of privacy and security reasons. The challenge in ICDM2015 contest is to develop an accurate probabilistic model for predicting cross-device consumer identity without using the deterministic user information. In this paper we present an accurate and scalable cross-device solution using an ensemble of Gradient Boosting Decision Trees (GBDT) and Random Forest. Our final solution ranks 9th both on the public and private LB with F0.5 score of 0.855.