# Biblio

Found 41 results

Filters: Keyword is Decision trees
2020-02-17
.  2019.  2019 Wireless Telecommunications Symposium (WTS). :1–5.
Current procedures for anomaly detection in self-organizing mobile communication networks use network-centric approaches to identify dysfunctional serving nodes. In this paper, a user-centric approach and a novel methodology for anomaly detection is proposed, where the Quality of Experience (QoE) metric is used to evaluate the end-user experience. The system model demonstrates how dysfunctional serving eNodeBs are successfully detected by implementing a parametric QoE model using machine learning for prediction of user QoE in a network scenario created by the ns-3 network simulator. This approach can play a vital role in the future ultra-dense and green mobile communication networks that are expected to be both self-organizing and self-healing.
2020-02-10
.  2019.  ICC 2019 - 2019 IEEE International Conference on Communications (ICC). :1–6.

Comment spam is one of the great challenges faced by forum administrators. Detecting and blocking comment spam can relieve the load on servers, improve user experience and purify the network conditions. This paper focuses on the detection of comment spam. The behaviors of spammer and the content of spam were analyzed. According to analysis results, two types of effective features are extracted which can make a better description of spammer characteristics. Additionally, a gradient boosting tree algorithm was used to construct the comment spam detector based on the extracted features. Our proposed method is examined on a blog spam dataset which was published by previous research, and the result illustrates that our method performs better than the previous method on detection accuracy. Moreover, the CPU time is recorded to demonstrate that the time spent on both training and testing maintains a small value.

2020-01-28
.  2019.  2019 International Joint Conference on Neural Networks (IJCNN). :1–8.

Current authentication systems based on passwords and PIN codes are not enough to guard against attacks from malicious users. For this reason, in recent years several studies have been proposed with the aim of identifying users based on their typing dynamics. In this paper, we propose a deep neural network architecture aimed at discriminating between different users using a set of keystroke features. The idea behind the proposed method is to identify users silently and continuously during their typing on a monitored system. To perform such user identification effectively, we propose a feature model able to capture the typing style that is specific to each given user. The proposed approach is evaluated on a large dataset derived by integrating two real-world datasets from existing studies. The merged dataset contains a total of 1530 different users, each writing a set of different typing samples. Several deep neural networks, with an increasing number of hidden layers and two different sets of features, are tested with the aim of finding the best configuration. The final best classifier scores a precision equal to 0.997, a recall equal to 0.99 and an accuracy equal to 99% using an MLP deep neural network with 9 hidden layers. Finally, the performance obtained by using the deep learning approach is also compared with the performance of a traditional decision-tree machine learning algorithm, attesting the effectiveness of the deep learning-based classifiers in the domain of keystroke analysis.

2020-01-21
.  2019.  2019 42nd International Conference on Telecommunications and Signal Processing (TSP). :67–70.
Being the revolutionary future networking architecture, information-centric networking (ICN) conducts network distribution based on content, which is ideally suitable for Internet of things (IoT). With the rapid growth of network traffic, compared to the conventional IoT, information-centric Internet of things (IC-IoT) is expected to provide users with the better satisfaction of the network quality of service (QoS). However, due to IC-IoT requirements of low latency, large data volume, marginalization, and intelligent processing, it urgently needs an efficient content distribution system. In this paper, we propose an edge learning based green content distribution scheme for IC-IoT. We implement intelligent path selection based on decision tree and edge calculation. Moreover, we apply distributed coding based content transmission to enhance the speed and recovery capability of content. Meanwhile, we have verified the effectiveness and performance of this scheme based on a large number of simulation experiments. The work of this paper is of great significance to improve the efficiency and flexibility of content distribution in IC-IoT.
2020-01-20
.  2019.  2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN). :1–5.

Computer networks are used by billions of people worldwide for a variety of purposes, which has made network security increasingly important. It is essential to use Intrusion Detection Systems (IDS) and devices whose main function is to detect anomalies in networks. Most intrusion detection approaches focus on boosting techniques, since results are inaccurate and the detection process is lengthy. The major pitfall in network-based intrusion detection is the large volume of data gathered from the network. In this paper, we put forward a hybrid anomaly-based intrusion detection system which uses classification and boosting techniques. The paper compares the performance of three different classifiers along with boosting. The boosting process maximizes classification accuracy. Results of the proposed scheme are analyzed over different datasets such as the Intrusion Detection Kaggle Dataset and NSL KDD. The analysis finds that Random tree provides the best average accuracy rate of around 99.98%, a detection rate of 98.79% and a minimum false alarm rate.

2019-12-30
.  2019.  2019 Fourth International Conference on Fog and Mobile Edge Computing (FMEC). :52–59.

Although the term “Fog Computing” was coined by Cisco Systems in 2012, security and privacy issues of this promising paradigm are still open challenges. Among various security challenges, Access Control is a crucial concern for all cloud computing-like systems (e.g. Fog computing, Mobile edge computing) in the IoT era. Assigning the precise level of access in such an inherently scalable, heterogeneous and dynamic environment is not easy to perform. This work defines the uncertainty challenge for the authentication phase of access control in fog computing because, on one hand, fog has a number of characteristics that amplify uncertainty in authentication and, on the other hand, applying traditional access control models does not result in a flexible and resilient solution. Therefore, we have proposed a novel prediction model based on an extension of the Attribute Based Access Control (ABAC) model. Our data-driven model is able to handle uncertainty in authentication. It is also able to consider the mobility of mobile edge devices in order to handle authentication. In doing so, we have built our model using and comparing four supervised classification algorithms, namely Decision Tree, Naïve Bayes, Logistic Regression and Support Vector Machine. Our model can achieve authentication performance with 88.14% accuracy using Logistic Regression.

2019-09-26
.  2018.  SoutheastCon 2018. :1-5.

With so much of our daily lives relying on digital devices like personal computers and cell phones, there is a growing demand for code that not only functions properly, but is secure and keeps user data safe. However, ensuring this is not such an easy task, and many developers do not have the required skills or resources to ensure their code is secure. Many code analysis tools have been written to find vulnerabilities in newly developed code, but this technology tends to produce many false positives, and is still not able to identify all of the problems. Other methods of finding software vulnerabilities automatically are required. This proof-of-concept study applied natural language processing on Java byte code to locate SQL injection vulnerabilities in a Java program. Preliminary findings show that, due to the high number of terms in the dataset, using singular decision trees will not produce a suitable model for locating SQL injection vulnerabilities, while random forest structures proved more promising. Still, further work is needed to determine the best classification tool.

2019-09-04
.  2018.  2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). :198–203.
Recent research has proposed middleware to enable efficient distributed apps over mobile-cloud platforms. This paper presents a Context-Aware File Discovery Service (CAFDS) that allows distributed mobile-cloud applications to find and access files of interest shared by collaborating users. CAFDS enables programmers to search for files defined by context and content features, such as location, creation time, or the presence of certain object types within an image file. CAFDS provides low-latency through a cloud-based metadata server, which uses a decision tree to locate the nearest files that satisfy the context and content features requested by applications. We implemented CAFDS in Android and Linux. Experimental results show CAFDS achieves substantially lower latency than peer-to-peer solutions that cannot leverage context information.
2019-07-01
.  2018.  2018 IEEE International Conference on Software Quality, Reliability and Security (QRS). :95–102.

Software insecurity is being identified as one of the leading causes of security breaches. In this paper, we revisited one of the strategies in solving software insecurity, which is the use of software quality metrics. We utilized a multilayer deep feedforward network in examining whether there is a combination of metrics that can predict the appearance of security-related bugs. We also applied the traditional machine learning algorithms such as decision tree, random forest, naïve bayes, and support vector machines and compared the results with that of the Deep Learning technique. The results have successfully demonstrated that it was possible to develop an effective predictive model to forecast software insecurity based on the software metrics and using Deep Learning. All the models generated have shown an accuracy of more than sixty percent with Deep Learning leading the list. This finding proved that utilizing Deep Learning methods and a combination of software metrics can be tapped to create a better forecasting model thereby aiding software developers in predicting security bugs.

.  2018.  2018 16th Annual Conference on Privacy, Security and Trust (PST). :1-10.

Revealing private and sensitive information on Social Network Sites (SNSs) like Facebook is a common practice which sometimes results in unwanted incidents for the users. One approach for helping users to avoid regrettable scenarios is through awareness mechanisms which inform a priori about the potential privacy risks of a self-disclosure act. Privacy heuristics are instruments which describe recurrent regrettable scenarios and can support the generation of privacy awareness. One important component of a heuristic is the group of people who should not access specific private information under a certain privacy risk. However, specifying an exhaustive list of unwanted recipients for a given regrettable scenario can be a tedious task which necessarily demands the user's intervention. In this paper, we introduce an approach based on decision trees to instantiate the audience component of privacy heuristics with minor intervention from the users. We introduce Disclosure-Acceptance Trees, a data structure representative of the audience component of a heuristic and describe a method for their generation out of user-centred privacy preferences.

2019-06-10
.  2018.  2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI). :330-336.

With the increase in the popularity of computerized online applications, the analysis, and detection of a growing number of newly discovered stealthy malware poses a significant challenge to the security community. Signature-based and behavior-based detection techniques are becoming inefficient in detecting new unknown malware. Machine learning solutions are employed to counter such intelligent malware and allow performing more comprehensive malware detection. This capability leads to an automatic analysis of malware behavior. The proposed oblique random forest ensemble learning technique is efficient for malware classification. The effectiveness of the proposed method is demonstrated with three malware classification datasets from various sources. The results are compared with other variants of decision tree learning models. The proposed system performs better than the existing system in terms of classification accuracy and false positive rate.

.  2018.  2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). :1-9.

Lately, we have been facing a malware crisis due to the various types of malware, malicious programs and scripts available in the huge virtual world of the Internet. But what is malware? Malware can be malicious software, a program or a script which can be harmful to the user's computer. These malicious programs can perform a variety of functions, including stealing, encrypting or deleting sensitive data, altering or hijacking core computing functions and monitoring users' computer activity without their permission. There are various entry points for these programs and scripts in the user environment, but the only way to remove them is to find them and kick them out of the system, which isn't an easy job as these small pieces of script or code can be anywhere in the user system. This paper involves the understanding of different types of malware and how we will use Machine Learning to detect this malware.

.  2018.  2018 Global Smart Industry Conference (GloSIC). :1-6.

Modern industrial control systems (ICS) have become victims of cyber attacks more often in recent years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the work of the ICS and its technological equipment. The presence of mutual interference and noise in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which are used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. The Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as an example of an industrial process. In the course of the research, correlations between the signals of the sensors were detected and preliminary data processing was carried out. Algorithms from the most common techniques of machine learning (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for the industrial process anomaly detection task. It is shown that linear algorithms are the least demanding on computational resources, but they do not achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided acceptable accuracy, but the amount of RAM required for their operation grows polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.

.  2018.  2018 UKSim-AMSS 20th International Conference on Computer Modelling and Simulation (UKSim). :32-37.

With the exponential hike in cyber threats, organizations are now striving for better data mining techniques in order to analyze security logs received from their IT infrastructures to ensure effective and automated cyber threat detection. Machine Learning (ML) based analytics for security machine data is the next emerging trend in cyber security, aimed at mining security data to uncover advanced targeted cyber threats actors and minimizing the operational overheads of maintaining static correlation rules. However, selection of optimal machine learning algorithm for security log analytics still remains an impeding factor against the success of data science in cyber security due to the risk of large number of false-positive detections, especially in the case of large-scale or global Security Operations Center (SOC) environments. This fact brings a dire need for an efficient machine learning based cyber threat detection model, capable of minimizing the false detection rates. In this paper, we are proposing optimal machine learning algorithms with their implementation framework based on analytical and empirical evaluations of gathered results, while using various prediction, classification and forecasting algorithms.

.  2018.  2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT). :1-6.

The Machine Type Communication Devices (MTCDs) are usually based on Internet Protocol (IP), which can cause billions of connected objects to be part of the Internet. The enormous amount of data coming from these devices is quite heterogeneous in nature, which can lead to security issues, such as injection attacks, ballot stuffing, and bad mouthing. Consequently, this work considers machine learning trust evaluation as an effective and accurate option for solving the issues associated with security threats. In this paper, a comparative analysis is carried out with five different machine learning approaches: Naive Bayes (NB), Decision Tree (DT), Linear and Radial Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF). As a critical element of the research, the recommendations consider different Machine-to-Machine (M2M) communication nodes with regard to their ability to identify malicious and honest information. To validate the performances of these models, two trust computation measures were used: Receiver Operating Characteristics (ROCs), Precision and Recall. The malicious data was formulated in Matlab. A scenario was created where 50% of the information was modified to be malicious. The malicious nodes were varied in the ranges of 10%, 20%, 30%, 40%, and the results were carefully analyzed.

2019-05-01
.  2018.  2018 IEEE Power Energy Society General Meeting (PESGM). :1–1.

This paper presents a computational platform for dynamic security assessment (DSA) of large electricity grids, developed as part of the iTesla project. It leverages High Performance Computing (HPC) to analyze large power systems, with many scenarios and possible contingencies, thus paving the way for pan-European operational stability analysis. The results of the DSA are summarized by decision trees of 11 stability indicators. The platform's workflow and parallel implementation architecture is described in detail, including the way commercial tools are integrated into a plug-in architecture. A case study of the French grid is presented, with over 8000 scenarios and 1980 contingencies. Performance data of the case study (using 10,000 parallel cores) is analyzed, including task timings and data flows. Finally, the generated decision trees are compared with test data to quantify the functional performance of the DSA platform.

2019-03-22
.  2018.  2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE). :1-5.

Malicious traffic has garnered more attention in recent years, owing to the rapid growth of information technology in today's world. In 2007 alone, malware attacks caused an estimated loss of 13 billion dollars. Malware data in today's context is massive. To understand such information using primitive methods would be a tedious task. In this publication we demonstrate some of the most advanced deep learning techniques available, multilayer perceptron (MLP) and J48 (also known as C4.5 or ID3), on our selected dataset, Advanced Security Network Metrics & Non-Payload-Based Obfuscations (ASNM-NPBO), to show that the answer to managing cyber security threats lies in the aforementioned methodologies.

.  2018.  2018 IEEE 12th International Conference on Compatibility, Power Electronics and Power Engineering (CPE-POWERENG 2018). :1-6.

Technological developments in the energy sector, while offering new business insights, also produce complex data. In this study, the relationship between smart grid and big data approaches has been investigated. After analyzing which big data techniques and technologies are used in which areas of smart grid systems, the study focuses on the big data technologies used to detect attacks on smart grids. Big data analytics produces efficient solutions, but the choice of algorithm and metric is more critical. For this reason, an application prototype has been proposed using big data approaches to detect attacks on smart grids. The algorithms with the highest accuracy were Random Forest with 92% and Decision Tree with 87%.

2019-03-06
.  2018.  2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA). :1-7.

Cybersecurity plays a critical role in protecting sensitive information and the structural integrity of networked systems. As networked systems continue to expand in numbers as well as in complexity, so does the threat of malicious activity and the necessity for advanced cybersecurity solutions. Furthermore, both the quantity and quality of available data on malicious content as well as the fact that malicious activity continuously evolves makes automated protection systems for this type of environment particularly challenging. Not only is the data quality a concern, but the volume of the data can be quite small for some of the classes. This creates a class imbalance in the data used to train a classifier; however, many classifiers are not well equipped to deal with class imbalance. One such example is detecting malicious HTML files from static features. Unfortunately, collecting malicious HTML files is extremely difficult and can be quite noisy from HTML files being mislabeled. This paper evaluates a specific application that is afflicted by these modern cybersecurity challenges: detection of malicious HTML files. Previous work presented a general framework for malicious HTML file classification that we modify in this work to use a $\chi^2$ feature selection technique and synthetic minority oversampling technique (SMOTE). We experiment with different classifiers (i.e., AdaBoost, Gentle-Boost, RobustBoost, RusBoost, and Random Forest) and a pure detection model (i.e., Isolation Forest). We benchmark the different classifiers using SMOTE on a real dataset that contains a limited number of malicious files (40) with respect to the normal files (7,263). It was found that the modified framework performed better than the previous framework's results. However, additional evidence was found to imply that algorithms which train on both the normal and malicious samples are likely overtraining to the malicious distribution. We demonstrate the likely overtraining by determining that a subset of the malicious files, while suspicious, did not come from a malicious source.
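
For readers unfamiliar with SMOTE, the following is a minimal sketch of its core interpolation step, not the framework evaluated in the paper. The function name `smote_sample`, the neighbour count `k`, and the toy data are illustrative assumptions; a production workflow would normally rely on an existing library implementation.

```python
import math
import random

def smote_sample(minority, k, rng):
    """Create one synthetic point on the segment between a random minority
    sample and one of its k nearest minority-class neighbours."""
    base = rng.choice(minority)
    # Candidate neighbours are all other minority samples (hypothetical toy setup).
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    lam = rng.random()  # interpolation weight in [0, 1)
    return [b + lam * (n - b) for b, n in zip(base, neighbour)]

# Toy minority class with three 2-D feature vectors (illustrative values only).
minority = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
synthetic = smote_sample(minority, k=2, rng=random.Random(1))
```

Because each synthetic point is a convex combination of two real minority samples, it always lies inside the region those samples span, which is also why oversampling a mislabeled minority set can reinforce the noise the abstract warns about.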

2018-12-10
.  2018.  2018 21st International Conference on Information Fusion (FUSION). :2298–2305.

In military operations, Commander's Intent describes the desired end state and purpose of the operation, expressed in a concise and clear manner. Command by intent is a paradigm that empowers subordinate units to exercise measured initiative to meet mission goals and accept prudent risk within commander's intent. It improves agility of military operations by allowing exploitation of local opportunities without an explicit directive from the commander to do so. This paper discusses what the paradigm entails in terms of architectural decisions for data fusion systems tasked with real-time information collection to satisfy operational mission goals. In our system, information needs of decisions are expressed at a high level, and shared among relevant nodes. The selected nodes, then, jointly operate to meet mission information needs by forwarding and caching relevant data without explicit directives regarding the objects to fetch and sources to contact. A preliminary evaluation of the system is presented using a target tracking application, set in the context of a NATO-based mission scenario, called Anglova. Evaluation results show that delegating some decision authority to the data fusion system (in terms of objects to fetch and sources to contact) allows it to save more network resources, while also increasing mission success rate. The system is therefore particularly well-suited to operation in partially denied or contested environments, where resource bottlenecks caused by adversarial activity impair one's ability to collect real-time information for mission-critical decision making.

2018-09-28
.  2017.  2017 IEEE 17th International Conference on Communication Technology (ICCT). :1901–1905.

To address the risk of data privacy disclosure in the classification process, a Random Forest algorithm under differential privacy named DPRF-gini is proposed in the paper. In the process of building each decision tree, the algorithm first perturbs feature selection and attribute partitioning by using the exponential mechanism, and then meets the requirement of differential privacy by adding Laplace noise to the leaf nodes. Empirical results show that, compared with the original algorithm, protection of data privacy is further enhanced while the accuracy of the algorithm is slightly reduced.
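
As a rough illustration of the leaf-node step only, the sketch below adds Laplace noise to per-class counts before choosing a leaf label. It is not the DPRF-gini algorithm itself: the function names, the per-leaf budget `epsilon`, and the sensitivity of 1 for a count query are assumptions made for the example, and the exponential-mechanism split selection is omitted.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def noisy_leaf_label(class_counts, epsilon, rng):
    """Pick the majority class after perturbing each count with Laplace noise.

    Assumes each count has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    noisy = {label: count + laplace_noise(scale, rng)
             for label, count in class_counts.items()}
    return max(noisy, key=noisy.get)

# Hypothetical leaf holding 10 "benign" and 3 "malicious" training samples.
label = noisy_leaf_label({"benign": 10, "malicious": 3},
                         epsilon=1.0, rng=random.Random(0))
```

A smaller `epsilon` means larger noise and stronger privacy, at the cost of more frequent label flips in sparsely populated leaves, which matches the privacy and accuracy trade-off the abstract reports.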

2018-07-18
.  2017.  2017 International Conference on Soft Computing, Intelligent System and Information Technology (ICSIIT). :32–38.

The detection of cyber-attacks has become a crucial task for highly sophisticated systems like industrial control systems (ICS). These systems are an essential part of critical information infrastructure. Therefore, we can highlight their vital role in contemporary society. Effective and reliable ICS cyber defense is a significant challenge for the cyber security community. Thus, intrusion detection is one of the demanding tasks for cyber security researchers. In this article, we examine the classification problem. The proposed detection system is based on supervised anomaly detection techniques. Moreover, we utilized classifier algorithms in order to increase intrusion detection capabilities. The fusion of the classifiers is the way to achieve the predefined goal.

2018-06-20
.  2017.  2017 IEEE Symposium Series on Computational Intelligence (SSCI). :1–7.

Anti-virus vendors receive hundreds of thousands of malware to be analysed each day. Some are new malware while others are variations or evolutions of existing malware. Because analyzing each malware sample by hand is impossible, automated techniques to analyse and categorize incoming samples are needed. In this work, we explore various machine learning features extracted from malware samples through static analysis for classification of malware binaries into already known malware families. We present a new feature based on control statement shingling that has a comparable accuracy to ordinary opcode n-gram based features while requiring smaller dimensions. This, in turn, results in a shorter training time.

2018-06-07
.  2017.  2017 IEEE 28th International Symposium on Software Reliability Engineering (ISSRE). :339–350.

Testing and fixing Web Application Firewalls (WAFs) are two relevant and complementary challenges for security analysts. Automated testing helps to cost-effectively detect vulnerabilities in a WAF by generating effective test cases, i.e., attacks. Once vulnerabilities have been identified, the WAF needs to be fixed by augmenting its rule set to filter attacks without blocking legitimate requests. However, existing research suggests that rule sets are very difficult to understand and too complex to be manually fixed. In this paper, we formalise the problem of fixing vulnerable WAFs as a combinatorial optimisation problem. To solve it, we propose an automated approach that combines machine learning with multi-objective genetic algorithms. Given a set of legitimate requests and bypassing SQL injection attacks, our approach automatically infers regular expressions that, when added to the WAF's rule set, prevent many attacks while letting legitimate requests go through. Our empirical evaluation based on both open-source and proprietary WAFs shows that the generated filter rules are effective at blocking previously identified and successful SQL injection attacks (recall between 54.6% and 98.3%), while triggering in most cases no or few false positives (false positive rate between 0% and 2%).

2018-05-30
.  2017.  2017 IEEE 86th Vehicular Technology Conference (VTC-Fall). :1–5.

With the rapid development of the smart grid, smart meters are deployed at energy consumers' premises to collect real-time usage data. Although such a communication model can help the control center of the energy producer to improve the efficiency and reliability of electricity delivery, it also leads to some security issues. For example, this real-time data involves the customers' privacy. Attackers may violate this privacy for housebreaking, or they may tamper with the transmitted data for their own benefit. For this reason, many data aggregation schemes have been proposed for privacy preservation. However, few of them address both data aggregation and fine-grained access control to improve data utility. In this paper, we propose a data aggregation scheme based on an attribute decision tree. Security analysis illustrates that our scheme can achieve data integrity, data privacy preservation and fine-grained data access control. Experimental results show that our scheme is more efficient than existing schemes.