Visible to the public Biblio

Filters: Keyword is machine learning algorithms  [Clear All Filters]
2020-02-17
Hao, Lina, Ng, Bryan.  2019.  Self-Healing Solutions for Wi-Fi Networks to Provide Seamless Handover. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :639–642.
The dynamic nature of the wireless channel poses a challenge to services requiring seamless and uniform network quality of service (QoS). Self-healing, a promising approach under the self-organizing networks (SON) paradigm, and has been shown to deal with unexpected network faults in cellular networks. In this paper, we use simple machine learning (ML) algorithms inspired by SON developments in cellular networks. Evaluation results show that the proposed approach identifies the faulty APs. Our proposed approach improves throughput by 63.6% and reduces packet loss rate by 16.6% compared with standard 802.11.
Pandelea, Alexandru-Ionut, Chiroiu, Mihai-Daniel.  2019.  Password Guessing Using Machine Learning on Wearables. 2019 22nd International Conference on Control Systems and Computer Science (CSCS). :304–311.
Wearables are now ubiquitous items equipped with a multitude of sensors such as GPS, accelerometer, or Bluetooth. The raw data from this sensors are typically used in a health context. However, we can also use it for security purposes. In this paper, we present a solution that aims at using data from the sensors of a wearable device to identify the password a user is typing on a keyboard by using machine learning algorithms. Hence, the purpose is to determine whether a malicious third party application could extract sensitive data through the raw data that it has access to.
2020-02-10
Gao, Hongcan, Zhu, Jingwen, Liu, Lei, Xu, Jing, Wu, Yanfeng, Liu, Ao.  2019.  Detecting SQL Injection Attacks Using Grammar Pattern Recognition and Access Behavior Mining. 2019 IEEE International Conference on Energy Internet (ICEI). :493–498.
SQL injection attacks are a kind of the greatest security risks on Web applications. Much research has been done to detect SQL injection attacks by rule matching and syntax tree. However, due to the complexity and variety of SQL injection vulnerabilities, these approaches fail to detect unknown and variable SQL injection attacks. In this paper, we propose a model, ATTAR, to detect SQL injection attacks using grammar pattern recognition and access behavior mining. The most important idea of our model is to extract and analyze features of SQL injection attacks in Web access logs. To achieve this goal, we first extract and customize Web access log fields from Web applications. Then we design a grammar pattern recognizer and an access behavior miner to obtain the grammatical and behavioral features of SQL injection attacks, respectively. Finally, based on two feature sets, machine learning algorithms, e.g., Naive Bayesian, SVM, ID3, Random Forest, and K-means, are used to train and detect our model. We evaluated our model on these two feature sets, and the results show that the proposed model can effectively detect SQL injection attacks with lower false negative rate and false positive rate. In addition, comparing the accuracy of our model based on different algorithms, ID3 and Random Forest have a better ability to detect various kinds of SQL injection attacks.
2020-01-27
Taher, Kazi Abu, Mohammed Yasin Jisan, Billal, Rahman, Md. Mahbubur.  2019.  Network Intrusion Detection using Supervised Machine Learning Technique with Feature Selection. 2019 International Conference on Robotics,Electrical and Signal Processing Techniques (ICREST). :643–646.
A novel supervised machine learning system is developed to classify network traffic whether it is malicious or benign. To find the best model considering detection success rate, combination of supervised learning algorithm and feature selection method have been used. Through this study, it is found that Artificial Neural Network (ANN) based machine learning with wrapper feature selection outperform support vector machine (SVM) technique while classifying network traffic. To evaluate the performance, NSL-KDD dataset is used to classify network traffic using SVM and ANN supervised machine learning techniques. Comparative study shows that the proposed model is efficient than other existing models with respect to intrusion detection success rate.
Hsu, Hsiao-Tzu, Jong, Gwo-Jia, Chen, Jhih-Hao, Jhe, Ciou-Guo.  2019.  Improve Iot Security System Of Smart-Home By Using Support Vector Machine. 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS). :674–677.
The traditional smart-home is designed to integrate the concept of the Internet of Things(IoT) into our home environment, and to improve the comfort of home. It connects electrical products and household goods to the network, and then monitors and controls them. However, this paper takes home safety as the main axis of research. It combines the past concept of smart-home and technology of machine learning to improve the whole system of smart-home. Through systematic self-learning, it automatically figure out whether it is normal or abnormal, and reports to remind building occupants safety. At the same time, it saves the cost of human resources preservation. This paper make a set of rules table as the basic criteria first, and then classify a part of data which collected by traditional Internet of Things of smart-home by manual way, which includes the opening and closing of doors and windows, the starting and stopping of motors, the connection and interruption of the system, and the time of sending each data to label, then use Support Vector Machine(SVM) algorithm to classify and build models, and then train it. The executed model is applied to our smart-home system. Finally, we verify the Accuracy of anomaly reporting in our system.
2020-01-21
Le, Duc C., Nur Zincir-Heywood, A..  2019.  Machine Learning Based Insider Threat Modelling and Detection. 2019 IFIP/IEEE Symposium on Integrated Network and Service Management (IM). :1–6.

Recently, malicious insider attacks represent one of the most damaging threats to companies and government agencies. This paper proposes a new framework in constructing a user-centered machine learning based insider threat detection system on multiple data granularity levels. System evaluations and analysis are performed not only on individual data instances but also on normal and malicious insiders, where insider scenario specific results and delay in detection are reported and discussed. Our results show that the machine learning based detection system can learn from limited ground truth and detect new malicious insiders with a high accuracy.

Aldairi, Maryam, Karimi, Leila, Joshi, James.  2019.  A Trust Aware Unsupervised Learning Approach for Insider Threat Detection. 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). :89–98.

With the rapidly increasing connectivity in cyberspace, Insider Threat is becoming a huge concern. Insider threat detection from system logs poses a tremendous challenge for human analysts. Analyzing log files of an organization is a key component of an insider threat detection and mitigation program. Emerging machine learning approaches show tremendous potential for performing complex and challenging data analysis tasks that would benefit the next generation of insider threat detection systems. However, with huge sets of heterogeneous data to analyze, applying machine learning techniques effectively and efficiently to such a complex problem is not straightforward. In this paper, we extract a concise set of features from the system logs while trying to prevent loss of meaningful information and providing accurate and actionable intelligence. We investigate two unsupervised anomaly detection algorithms for insider threat detection and draw a comparison between different structures of the system logs including daily dataset and periodically aggregated one. We use the generated anomaly score from the previous cycle as the trust score of each user fed to the next period's model and show its importance and impact in detecting insiders. Furthermore, we consider the psychometric score of users in our model and check its effectiveness in predicting insiders. As far as we know, our model is the first one to take the psychometric score of users into consideration for insider threat detection. Finally, we evaluate our proposed approach on CERT insider threat dataset (v4.2) and show how it outperforms previous approaches.

2019-12-16
Karve, Shreya, Nagmal, Arati, Papalkar, Sahil, Deshpande, S. A..  2018.  Context Sensitive Conversational Agent Using DNN. 2018 Second International Conference on Electronics, Communication and Aerospace Technology (ICECA). :475–478.
We investigate a method of building a closed domain intelligent conversational agent using deep neural networks. A conversational agent is a dialog system intended to converse with a human, with a coherent structure. Our conversational agent uses a retrieval based model that identifies the intent of the input user query and maps it to a knowledge base to return appropriate results. Human conversations are based on context, but existing conversational agents are context insensitive. To overcome this limitation, our system uses a simple stack based context identification and storage system. The conversational agent generates responses according to the current context of conversation. allowing more human-like conversations.
2019-11-12
Ferenc, Rudolf, Heged\H us, Péter, Gyimesi, Péter, Antal, Gábor, Bán, Dénes, Gyimóthy, Tibor.  2019.  Challenging Machine Learning Algorithms in Predicting Vulnerable JavaScript Functions. 2019 IEEE/ACM 7th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE). :8-14.

The rapid rise of cyber-crime activities and the growing number of devices threatened by them place software security issues in the spotlight. As around 90% of all attacks exploit known types of security issues, finding vulnerable components and applying existing mitigation techniques is a viable practical approach for fighting against cyber-crime. In this paper, we investigate how the state-of-the-art machine learning techniques, including a popular deep learning algorithm, perform in predicting functions with possible security vulnerabilities in JavaScript programs. We applied 8 machine learning algorithms to build prediction models using a new dataset constructed for this research from the vulnerability information in public databases of the Node Security Project and the Snyk platform, and code fixing patches from GitHub. We used static source code metrics as predictors and an extensive grid-search algorithm to find the best performing models. We also examined the effect of various re-sampling strategies to handle the imbalanced nature of the dataset. The best performing algorithm was KNN, which created a model for the prediction of vulnerable functions with an F-measure of 0.76 (0.91 precision and 0.66 recall). Moreover, deep learning, tree and forest based classifiers, and SVM were competitive with F-measures over 0.70. Although the F-measures did not vary significantly with the re-sampling strategies, the distribution of precision and recall did change. No re-sampling seemed to produce models preferring high precision, while re-sampling strategies balanced the IR measures.

2019-11-04
Altay, Osman, Ulas, Mustafa.  2018.  Location Determination by Processing Signal Strength of Wi-Fi Routers in the Indoor Environment with Linear Discriminant Classifier. 2018 6th International Symposium on Digital Forensic and Security (ISDFS). :1-4.

Location determination in the indoor areas as well as in open areas is important for many applications. But location determination in the indoor areas is a very difficult process compared to open areas. The Global Positioning System (GPS) signals used for position detection is not effective in the indoor areas. Wi-Fi signals are a widely used method for localization detection in the indoor area. In the indoor areas, localization can be used for many different purposes, such as intelligent home systems, locations of people, locations of products in the depot. In this study, it was tried to determine localization for with the classification method for 4 different areas by using Wi-Fi signal values obtained from different routers for indoor location determination. Linear discriminant analysis (LDA) classification was used for classification. In the test using 10k fold cross-validation, 97.2% accuracy value was calculated.

2019-09-05
Sun, Y., Zhang, L., Zhao, C..  2018.  A Study of Network Covert Channel Detection Based on Deep Learning. 2018 2nd IEEE Advanced Information Management,Communicates,Electronic and Automation Control Conference (IMCEC). :637-641.

Information security has become a growing concern. Computer covert channel which is regarded as an important area of information security research gets more attention. In order to detect these covert channels, a variety of detection algorithms are proposed in the course of the research. The algorithms of machine learning type show better results in these detection algorithms. However, the common machine learning algorithms have many problems in the testing process and have great limitations. Based on the deep learning algorithm, this paper proposes a new idea of network covert channel detection and forms a new detection model. On the one hand, this algorithmic model can detect more complex covert channels and, on the other hand, greatly improve the accuracy of detection due to the use of a new deep learning model. By optimizing this test model, we can get better results on the evaluation index.

2019-09-04
Vanjari, M. S. P., Balsaraf, M. K. P..  2018.  Efficient Exploration of Algorithm in Scholarly Big Data Document. 2018 International Conference on Information , Communication, Engineering and Technology (ICICET). :1–5.
Algorithms are used to develop, analyzing, and applying in the computer field and used for developing new application. It is used for finding solutions to any problems in different condition. It transforms the problems into algorithmic ones on which standard algorithms are applied. Day by day Scholarly Digital documents are increasing. AlgorithmSeer is a search engine used for searching algorithms. The main aim of it provides a large algorithm database. It is used to automatically encountering and take these algorithms in this big collection of documents that enable algorithm indexing, searching, discovery, and analysis. An original set to identify and pull out algorithm representations in a big collection of scholarly documents is proposed, of scale able techniques used by AlgorithmSeer. Along with this, particularly important and relevant textual content can be accessed the platform and highlight portions by anyone with different levels of knowledge. In support of lectures and self-learning, the highlighted documents can be shared with others. But different levels of learners cannot use the highlighted part of text at same understanding level. The problem of guessing new highlights of partially highlighted documents can be solved by us.
2019-07-01
Amjad, N., Afzal, H., Amjad, M. F., Khan, F. A..  2018.  A Multi-Classifier Framework for Open Source Malware Forensics. 2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). :106-111.

Traditional anti-virus technologies have failed to keep pace with proliferation of malware due to slow process of their signatures and heuristics updates. Similarly, there are limitations of time and resources in order to perform manual analysis on each malware. There is a need to learn from this vast quantity of data, containing cyber attack pattern, in an automated manner to proactively adapt to ever-evolving threats. Machine learning offers unique advantages to learn from past cyber attacks to handle future cyber threats. The purpose of this research is to propose a framework for multi-classification of malware into well-known categories by applying different machine learning models over corpus of malware analysis reports. These reports are generated through an open source malware sandbox in an automated manner. We applied extensive pre-modeling techniques for data cleaning, features exploration and features engineering to prepare training and test datasets. Best possible hyper-parameters are selected to build machine learning models. These prepared datasets are then used to train the machine learning classifiers and to compare their prediction accuracy. Finally, these results are validated through a comprehensive 10-fold cross-validation methodology. The best results are achieved through Gaussian Naive Bayes classifier with random accuracy of 96% and 10-Fold Cross Validation accuracy of 91.2%. The said framework can be deployed in an operational environment to learn from malware attacks for proactively adapting matching counter measures.

2019-06-10
Kargaard, J., Drange, T., Kor, A., Twafik, H., Butterfield, E..  2018.  Defending IT Systems against Intelligent Malware. 2018 IEEE 9th International Conference on Dependable Systems, Services and Technologies (DESSERT). :411-417.

The increasing amount of malware variants seen in the wild is causing problems for Antivirus Software vendors, unable to keep up by creating signatures for each. The methods used to develop a signature, static and dynamic analysis, have various limitations. Machine learning has been used by Antivirus vendors to detect malware based on the information gathered from the analysis process. However, adversarial examples can cause machine learning algorithms to miss-classify new data. In this paper we describe a method for malware analysis by converting malware binaries to images and then preparing those images for training within a Generative Adversarial Network. These unsupervised deep neural networks are not susceptible to adversarial examples. The conversion to images from malware binaries should be faster than using dynamic analysis and it would still be possible to link malware families together. Using the Generative Adversarial Network, malware detection could be much more effective and reliable.

Udayakumar, N., Saglani, V. J., Cupta, A. V., Subbulakshmi, T..  2018.  Malware Classification Using Machine Learning Algorithms. 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI). :1-9.

Lately, we are facing the Malware crisis due to various types of malware or malicious programs or scripts available in the huge virtual world - the Internet. But, what is malware? Malware can be a malicious software or a program or a script which can be harmful to the user's computer. These malicious programs can perform a variety of functions, including stealing, encrypting or deleting sensitive data, altering or hijacking core computing functions and monitoring users' computer activity without their permission. There are various entry points for these programs and scripts in the user environment, but only one way to remove them is to find them and kick them out of the system which isn't an easy job as these small piece of script or code can be anywhere in the user system. This paper involves the understanding of different types of malware and how we will use Machine Learning to detect these malwares.

Sokolov, A. N., Pyatnitsky, I. A., Alabugin, S. K..  2018.  Research of Classical Machine Learning Methods and Deep Learning Models Effectiveness in Detecting Anomalies of Industrial Control System. 2018 Global Smart Industry Conference (GloSIC). :1-6.

Modern industrial control systems (ICS) act as victims of cyber attacks more often in last years. These attacks are hard to detect and their consequences can be catastrophic. Cyber attacks can cause anomalies in the work of the ICS and its technological equipment. The presence of mutual interference and noises in this equipment significantly complicates anomaly detection. Moreover, the traditional means of protection, which used in corporate solutions, require updating with each change in the structure of the industrial process. An approach based on the machine learning for anomaly detection was used to overcome these problems. It complements traditional methods and allows one to detect signal correlations and use them for anomaly detection. Additional Tennessee Eastman Process Simulation Data for Anomaly Detection Evaluation dataset was analyzed as example of industrial process. In the course of the research, correlations between the signals of the sensors were detected and preliminary data processing was carried out. Algorithms from the most common techniques of machine learning (decision trees, linear algorithms, support vector machines) and deep learning models (neural networks) were investigated for industrial process anomaly detection task. It's shown that linear algorithms are least demanding on computational resources, but they don't achieve an acceptable result and allow a significant number of errors. Decision tree-based algorithms provided an acceptable accuracy, but the amount of RAM, required for their operations, relates polynomially with the training sample volume. The deep neural networks provided the greatest accuracy, but they require considerable computing power for internal calculations.

Farooq, H. M., Otaibi, N. M..  2018.  Optimal Machine Learning Algorithms for Cyber Threat Detection. 2018 UKSim-AMSS 20th International Conference on Computer Modelling and Simulation (UKSim). :32-37.

With the exponential hike in cyber threats, organizations are now striving for better data mining techniques in order to analyze security logs received from their IT infrastructures to ensure effective and automated cyber threat detection. Machine Learning (ML) based analytics for security machine data is the next emerging trend in cyber security, aimed at mining security data to uncover advanced targeted cyber threats actors and minimizing the operational overheads of maintaining static correlation rules. However, selection of optimal machine learning algorithm for security log analytics still remains an impeding factor against the success of data science in cyber security due to the risk of large number of false-positive detections, especially in the case of large-scale or global Security Operations Center (SOC) environments. This fact brings a dire need for an efficient machine learning based cyber threat detection model, capable of minimizing the false detection rates. In this paper, we are proposing optimal machine learning algorithms with their implementation framework based on analytical and empirical evaluations of gathered results, while using various prediction, classification and forecasting algorithms.

Liu, D., Li, Y., Tang, Y., Wang, B., Xie, W..  2018.  VMPBL: Identifying Vulnerable Functions Based on Machine Learning Combining Patched Information and Binary Comparison Technique by LCS. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :800-807.

Nowadays, most vendors apply the same open source code to their products, which is dangerous. In addition, when manufacturers release patches, they generally hide the exact location of the vulnerabilities. So, identifying vulnerabilities in binaries is crucial. However, just searching source program has a lower identifying accuracy of vulnerability, which requires operators further to differentiate searched results. Under this context, we propose VMPBL to enhance identifying the accuracy of vulnerability with the help of patch files. VMPBL, compared with other proposed schemes, uses patched functions according to its vulnerable functions in patch file to further distinguish results. We establish a prototype of VMPBL, which can effectively identify vulnerable function types and get rid of safe functions from results. Firstly, we get the potential vulnerable-patched functions by binary comparison technique based on K-Trace algorithm. Then we combine the functions with vulnerability and patch knowledge database to classify these function pairs and identify the possible vulnerable functions and the vulnerability types. Finally, we test some programs containing real-world CWE vulnerabilities, and one of the experimental results about CWE415 shows that the results returned from only searching source program are about twice as much as the results from VMPBL. We can see that using VMPBL can significantly reduce the false positive rate of discovering vulnerabilities compared with analyzing source files alone.

2019-05-01
Li, P., Liu, Q., Zhao, W., Wang, D., Wang, S..  2018.  Chronic Poisoning against Machine Learning Based IDSs Using Edge Pattern Detection. 2018 IEEE International Conference on Communications (ICC). :1-7.

In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). Poisoning attack, which is one of the most recognized security threats towards machine learning- based IDSs, injects some adversarial samples into the training phase, inducing data drifting of training data and a significant performance decrease of target IDSs over testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attack against several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate the points that are near to abnormal data but considered to be normal ones by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called chronic poisoning attack. Extensive experiments on synthetic and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.

2019-03-25
Mamdouh, M., Elrukhsi, M. A. I., Khattab, A..  2018.  Securing the Internet of Things and Wireless Sensor Networks via Machine Learning: A Survey. 2018 International Conference on Computer and Applications (ICCA). :215–218.

The Internet of Things (IoT) is the network where physical devices, sensors, appliances and other different objects can communicate with each other without the need for human intervention. Wireless Sensor Networks (WSNs) are main building blocks of the IoT. Both the IoT and WSNs have many critical and non-critical applications that touch almost every aspect of our modern life. Unfortunately, these networks are prone to various types of security threats. Therefore, the security of IoT and WSNs became crucial. Furthermore, the resource limitations of the devices used in these networks complicate the problem. One of the most recent and effective approaches to address such challenges is machine learning. Machine learning inspires many solutions to secure the IoT and WSNs. In this paper, we survey the different threats that can attack both IoT and WSNs and the machine learning techniques developed to counter them.

2019-03-15
Kim, D., Shin, D., Shin, D..  2018.  Unauthorized Access Point Detection Using Machine Learning Algorithms for Information Protection. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1876-1878.

With the frequent use of Wi-Fi and hotspots that provide a wireless Internet environment, awareness and threats to wireless AP (Access Point) security are steadily increasing. Especially when using unauthorized APs in company, government and military facilities, there is a high possibility of being subjected to various viruses and hacking attacks. It is necessary to detect unauthorized Aps for protection of information. In this paper, we use RTT (Round Trip Time) value data set to detect authorized and unauthorized APs in wired / wireless integrated environment, analyze them using machine learning algorithms including SVM (Support Vector Machine), C4.5, KNN (K Nearest Neighbors) and MLP (Multilayer Perceptron). Overall, KNN shows the highest accuracy.

Deliu, I., Leichter, C., Franke, K..  2018.  Collecting Cyber Threat Intelligence from Hacker Forums via a Two-Stage, Hybrid Process Using Support Vector Machines and Latent Dirichlet Allocation. 2018 IEEE International Conference on Big Data (Big Data). :5008-5013.

Traditional security controls, such as firewalls, anti-virus and IDS, are ill-equipped to help IT security and response teams keep pace with the rapid evolution of the cyber threat landscape. Cyber Threat Intelligence (CTI) can help remediate this problem by exploiting non-traditional information sources, such as hacker forums and "dark-web" social platforms. Security and response teams can use the collected intelligence to identify emerging threats. Unfortunately, when manual analysis is used to extract CTI from non-traditional sources, it is a time consuming, error-prone and resource intensive process. We address these issues by using a hybrid Machine Learning model that automatically searches through hacker forum posts, identifies the posts that are most relevant to cyber security and then clusters the relevant posts into estimations of the topics that the hackers are discussing. The first (identification) stage uses Support Vector Machines and the second (clustering) stage uses Latent Dirichlet Allocation. We tested our model, using data from an actual hacker forum, to automatically extract information about various threats such as leaked credentials, malicious proxy servers, malware that evades AV detection, etc. The results demonstrate our method is an effective means for quickly extracting relevant and actionable intelligence that can be integrated with traditional security controls to increase their effectiveness.

2019-02-25
Peng, W., Huang, L., Jia, J., Ingram, E..  2018.  Enhancing the Naive Bayes Spam Filter Through Intelligent Text Modification Detection. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :849–854.

Spam emails have been a chronic issue in computer security. They are very costly economically and extremely dangerous for computers and networks. Despite of the emergence of social networks and other Internet based information exchange venues, dependence on email communication has increased over the years and this dependence has resulted in an urgent need to improve spam filters. Although many spam filters have been created to help prevent these spam emails from entering a user's inbox, there is a lack or research focusing on text modifications. Currently, Naive Bayes is one of the most popular methods of spam classification because of its simplicity and efficiency. Naive Bayes is also very accurate; however, it is unable to correctly classify emails when they contain leetspeak or diacritics. Thus, in this proposes, we implemented a novel algorithm for enhancing the accuracy of the Naive Bayes Spam Filter so that it can detect text modifications and correctly classify the email as spam or ham. Our Python algorithm combines semantic based, keyword based, and machine learning algorithms to increase the accuracy of Naive Bayes compared to Spamassassin by over two hundred percent. Additionally, we have discovered a relationship between the length of the email and the spam score, indicating that Bayesian Poisoning, a controversial topic, is actually a real phenomenon and utilized by spammers.

Ali, S. S., Maqsood, J..  2018.  .Net library for SMS spam detection using machine learning: A cross platform solution. 2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :470–476.

Short Message Service is now-days the most used way of communication in the electronic world. While many researches exist on the email spam detection, we haven't had the insight knowledge about the spam done within the SMS's. This might be because the frequency of spam in these short messages is quite low than the emails. This paper presents different ways of analyzing spam for SMS and a new pre-processing way to get the actual dataset of spam messages. This dataset was then used on different algorithm techniques to find the best working algorithm in terms of both accuracy and recall. Random Forest algorithm was then implemented in a real world application library written in C\# for cross platform .Net development. This library is capable of using a prebuild model for classifying a new dataset for spam and ham.

Vishagini, V., Rajan, A. K..  2018.  An Improved Spam Detection Method with Weighted Support Vector Machine. 2018 International Conference on Data Science and Engineering (ICDSE). :1–5.
Email is the most admired method of exchanging messages using the Internet. One of the intimidations to email users is to detect the spam they receive. This can be addressed using different detection and filtering techniques. Machine learning algorithms, especially Support Vector Machine (SVM), can play vital role in spam detection. We propose the use of weighted SVM for spam filtering using weight variables obtained by KFCM algorithm. The weight variables reflect the importance of different classes. The misclassification of emails is reduced by the growth of weight value. We evaluate the impact of spam detection using SVM, WSVM with KPCM and WSVM with KFCM.UCI Repository SMS Spam base dataset is used for our experimentation.