Identifying Malicious Botnet Traffic Using Logistic Regression

Title: Identifying Malicious Botnet Traffic Using Logistic Regression
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Bapat, R., Mandya, A., Liu, X., Abraham, B., Brown, D. E., Kang, H., Veeraraghavan, M.
Conference Name: 2018 Systems and Information Engineering Design Symposium (SIEDS)
ISBN Number: 978-1-5386-6343-1
Keywords: aggregate statistics, Botnet, botnet activity, Botnet detection, botnet infestation, botnet malware, botnets, compositionality, cyber security, cyber-attacks, feature extraction, Hidden Markov models, identifying malicious botnet traffic, Internet, intrusion detection system, invasive software, learning (artificial intelligence), lightweight logistic regression model, logistic regression, Logistics, machine learning, Malware, Metrics, Network security, network traffic, Payloads, popular network monitoring framework, pubcrawl, regression analysis, resilience, Resiliency, security of data, significant economic harm, social harm, statistical learning method, telecommunication traffic, vulnerable devices

Malware is an important source of cyber-attacks, and it proliferates in different forms, including botnets. Botnet malware typically scans the Internet for vulnerable devices rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible and uses their resources for automated tasks that may cause significant economic and social harm while remaining hidden from the user and the device, which makes such activity very difficult to detect. A considerable amount of research has been conducted on detecting and preventing botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system that uses a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features for detecting botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework that provides aggregate statistics about the packets exchanged between a source and a destination over a certain time interval. These statistics serve as features for a logistic regression model that classifies traffic as malicious or benign. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families, both separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under the Curve (AUC).
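The pipeline described in the abstract (per-connection aggregate statistics fed to a logistic regression classifier, then scored with accuracy, F1 and AUC) can be illustrated with a minimal sketch. This is not the authors' code or data: the three features (duration, bytes sent, bytes received), the synthetic distributions, the hyperparameters, and every function name below are assumptions chosen only for illustration, and the model is a plain gradient-descent logistic regression written with the Python standard library.

```python
import math
import random


def make_flows(n, seed):
    """Synthetic stand-ins for per-connection statistics (NOT real Bro output).

    Assumed feature columns: duration (s), bytes sent (KB), bytes received (KB).
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        bot = rng.random() < 0.5
        if bot:   # assumption: botnet flows are short and small
            row = [rng.gauss(1.0, 0.5), rng.gauss(0.5, 0.3), rng.gauss(0.3, 0.2)]
        else:     # assumption: benign flows are longer with larger responses
            row = [rng.gauss(3.0, 1.0), rng.gauss(2.0, 0.8), rng.gauss(4.0, 1.5)]
        X.append(row)
        y.append(1 if bot else 0)
    return X, y


def fit_scaler(X):
    """Per-column mean and standard deviation, computed on training data only."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z):
    # Numerically stable form for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def accuracy_and_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return correct / len(y_true), f1


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def run_demo():
    X_train, y_train = make_flows(400, seed=1)
    X_test, y_test = make_flows(200, seed=2)
    means, stds = fit_scaler(X_train)
    w, b = fit_logistic(scale(X_train, means, stds), y_train)
    probs = predict_proba(scale(X_test, means, stds), w, b)
    preds = [1 if p >= 0.5 else 0 for p in probs]
    acc, f1 = accuracy_and_f1(y_test, preds)
    return {"accuracy": acc, "f1": f1, "auc": auc(y_test, probs), "weights": w}


if __name__ == "__main__":
    print(run_demo())
```

Because the features are standardized before fitting, the learned weights are directly comparable: the sign and relative magnitude of each weight show how strongly that statistic pushes a flow toward the malicious class, which is one sense in which a logistic regression model is simple to interpret. With real data, the synthetic generator would be replaced by statistics parsed from the monitoring framework's connection logs.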

Citation Key: bapat_identifying_2018