DNS graph mining for malicious domain detection

Title: DNS graph mining for malicious domain detection
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Tran, H., Nguyen, A., Vo, P., Vu, T.
Conference Name: 2017 IEEE International Conference on Big Data (Big Data)
Keywords: attackers, belief propagation, belief propagation algorithm, computer network security, cyber security, data mining, DNS data servers, DNS graph mining, Domain Resolution, Graph Interface, graph mining, graph mining task, graph node, graph theory, graph-based interface, Human Behavior, Inference algorithms, Internet, invasive software, IP networks, IP resolutions, labeled domains, malicious domain detection technique, malicious probabilities, malware analysis, malware detection, Markov processes, message passing, Metrics, multiple malicious domains, Nickel, potential malicious domains, privacy, probability, pubcrawl, resilience, Resiliency, variety cyber attacks

Malicious domains are a vital component of a variety of cyber attacks, so their detection has become a hot topic in cyber security. Several recent techniques identify malicious domains by analyzing DNS data, because DNS data contains much global information that attackers cannot affect. Attackers routinely recycle resources: they frequently change domain-to-IP resolutions and create new domains to avoid detection. As a result, multiple malicious domains are hosted on the same IPs, and multiple IPs host the same malicious domains simultaneously, which creates an intrinsic association among them. Labeled domains, which can be traced back through the query history of all domains, can therefore be used to verify and uncover these associations. Graphs are a natural candidate for representing this relationship, and many high-performance graph algorithms exist. A graph-based interface can be developed and transformed into the graph mining task of inferring graph nodes' reputation scores using improvements of the belief propagation algorithm. The higher a node's reputation score, the higher its inferred probability of being malicious. For demonstration, this paper proposes a malicious domain detection technique and evaluates it on a real-world dataset collected from DNS data servers, which is used to build a DNS graph. The proposed technique achieves an accuracy of over 98.3%, with precision and recall of 99.1% and 98.6%, respectively. Notably, with a small set of labeled domains (legitimate and malicious), the technique can discover a large set of potentially malicious domains. The results indicate that the method is highly effective in detecting malicious domains.
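
The pipeline described in the abstract (build a domain-IP graph from DNS resolutions, seed it with a few labeled domains, and propagate reputation scores) can be sketched roughly as follows. This is a minimal illustration using standard loopy sum-product belief propagation, not the authors' improved variant, whose details the abstract does not give. The function names, the edge-potential parameter `eps`, the 0.5 default prior for unlabeled nodes, and the sample domains and IPs are all assumptions made for illustration.

```python
from collections import defaultdict


def build_graph(resolutions):
    """Build an undirected domain-IP graph from (domain, ip) pairs."""
    adj = defaultdict(set)
    for domain, ip in resolutions:
        adj[domain].add(ip)
        adj[ip].add(domain)
    return adj


def belief_propagation(adj, priors, eps=0.1, iters=20):
    """Standard loopy BP with two states: 0 = benign, 1 = malicious.

    priors: node -> prior P(malicious) for labeled nodes (assumed input);
            unlabeled nodes default to 0.5.
    eps:    assumed probability that two neighbors differ in state.
    Returns node -> estimated P(malicious).
    """
    # Edge potential: neighbors tend to share the same state.
    psi = [[1 - eps, eps], [eps, 1 - eps]]
    # Node potentials from the priors.
    phi = {n: (1 - priors.get(n, 0.5), priors.get(n, 0.5)) for n in adj}
    # One message per directed edge, initialized to uniform.
    msg = {(u, v): (0.5, 0.5) for u in adj for v in adj[u]}

    for _ in range(iters):
        new = {}
        for (u, v) in msg:
            out = []
            for xv in (0, 1):
                total = 0.0
                for xu in (0, 1):
                    prod = phi[u][xu] * psi[xu][xv]
                    # Combine messages into u from all neighbors except v.
                    for w in adj[u]:
                        if w != v:
                            prod *= msg[(w, u)][xu]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            new[(u, v)] = (out[0] / z, out[1] / z)
        msg = new

    scores = {}
    for n in adj:
        b = [phi[n][0], phi[n][1]]
        for w in adj[n]:
            b[0] *= msg[(w, n)][0]
            b[1] *= msg[(w, n)][1]
        scores[n] = b[1] / (b[0] + b[1])
    return scores


# Hypothetical sample data: two labeled domains, two unlabeled ones.
resolutions = [
    ("bad1.example", "203.0.113.5"),
    ("unknown.example", "203.0.113.5"),
    ("good.example", "198.51.100.7"),
    ("other.example", "198.51.100.7"),
]
priors = {"bad1.example": 0.99, "good.example": 0.01}
scores = belief_propagation(build_graph(resolutions), priors)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```

In this toy graph, the unlabeled domain that shares an IP with the labeled malicious domain ends up with a score above 0.5, and the one that shares an IP with the labeled legitimate domain ends up below 0.5. On a real DNS graph with high-degree nodes, the message products would likely need to be computed in log space to avoid numerical underflow.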

Citation Key: tran_dns_2017