Security Scalability and Big Data, 2014

Scalability is one of the hard problems in the Science of Security. When security systems must handle Big Data, the difficulty of scaling them is compounded. The works cited here address this problem and were presented in 2014.

Eberle, W.; Holder, L., "A Partitioning Approach to Scaling Anomaly Detection in Graph Streams," Big Data (Big Data), 2014 IEEE International Conference on, vol., no., pp. 17, 24, 27-30 Oct. 2014. doi:10.1109/BigData.2014.7004367
Abstract: Due to potentially complex relationships among heterogeneous data sets, recent research efforts have involved the representation of this type of complex data as a graph. For instance, in the case of computer network traffic, a graph representation of the traffic might consist of nodes representing computers and edges representing communications between the corresponding computers. However, computer network traffic is typically voluminous, or acquired in real-time as a stream of information. In previous work on static graphs, we have used a compression-based measure to find normative patterns, and then analyzed the close matches to the normative patterns to indicate potential anomalies. However, while our approach has demonstrated its effectiveness in a variety of domains, the issue of scalability has limited this approach when dealing with domains containing millions of nodes and edges. To address this issue, we propose a novel approach called Pattern Learning and Anomaly Detection on Streams, or PLADS, that is not only scalable to real-world data that is streaming, but also maintains reasonable levels of effectiveness in detecting anomalies. In this paper we present a partitioning and windowing approach that partitions the graph as it streams in over time and maintains a set of normative patterns and anomalies. We then empirically evaluate our approach using publicly available network data as well as a dataset that represents e-commerce traffic.
Keywords: data mining; data structures; graph theory; learning (artificial intelligence); pattern classification; security of data; PLADS approach; anomaly detection scaling; computer network traffic; data representation; e-commerce traffic representation; electronic commerce; graph stream; heterogeneous data set; information stream; normative pattern; partitioning approach; pattern learning and anomaly detection on streams; windowing approach; Big data; Computers; Image edge detection; Internet; Scalability; Telecommunication traffic; Graph-based; anomaly detection; knowledge discovery; streaming data (ID#: 15-5786)
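The partitioning and windowing idea in the abstract above can be illustrated with a minimal sketch. This is not the PLADS algorithm: the function names, the fixed partition size, and the use of simple edge frequency counts as a stand-in for normative patterns are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import islice


def partition_stream(edges, partition_size):
    """Yield consecutive lists of at most partition_size edges from an iterable."""
    iterator = iter(edges)
    while True:
        partition = list(islice(iterator, partition_size))
        if not partition:
            return
        yield partition


def windowed_edge_counts(edges, partition_size, window_size):
    """Yield a Counter of edges over the most recent window_size partitions."""
    # A deque with maxlen drops the oldest partition automatically.
    window = deque(maxlen=window_size)
    for partition in partition_stream(edges, partition_size):
        window.append(partition)
        # Assumption: edge frequency stands in for a learned normative pattern.
        yield Counter(edge for part in window for edge in part)


# Hypothetical stream of (source, destination) communications.
stream = [("a", "b"), ("a", "b"), ("b", "c"), ("a", "b"), ("c", "d"), ("a", "b")]
snapshots = list(windowed_edge_counts(stream, partition_size=2, window_size=2))
```

Each snapshot covers only the partitions still inside the window, so memory use depends on the window size rather than on the length of the stream.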


Sokolov, V.; Alekseev, I.; Mazilov, D.; Nikitinskiy, M., "A Network Analytics System in the SDN," Science and Technology Conference (Modern Networking Technologies) (MoNeTeC), 2014 First International, vol., no., pp. 1, 3, 28-29 Oct. 2014. doi:10.1109/MoNeTeC.2014.6995603
Abstract: The emergence of virtualization and security problems of the network services, their lack of scalability and flexibility force network operators to look for “smarter” tools for network design and management. With the continuous growth of the number of subscribers, the volume of traffic and competition at the telecommunication market, there is a stable interest in finding new ways to identify weak points of the existing architecture, preventing the collapse of the network as well as evaluating and predicting the risks of problems in the network. To solve the problems of increasing the fail-safety and the efficiency of the network infrastructure, we offer to use the analytical software in the SDN context.
Keywords: computer network management; computer network security; network analysers; software defined networking; virtualisation; SDN context; analytical software; fail-safety; force network operators; network analytics system; network design; network infrastructure; network management; network services; security problems; smarter tools; software-defined network; telecommunication market; virtualization; Bandwidth; Data models; Monitoring; Network topology; Ports (Computers); Protocols; Software systems; analysis; analytics; application programming interface; big data; dynamic network model; fail-safety; flow; flow table; heuristic; load balancing; monitoring; network statistics; network topology; openflow; protocol; sdn; smart tool; software system; software-defined networking; weighted graph (ID#: 15-5787)


Chenhui Li; Baciu, G., "VALID: A Web Framework for Visual Analytics of Large Streaming Data," Trust, Security and Privacy in Computing and Communications (TrustCom), 2014 IEEE 13th International Conference on, vol., no., pp. 686, 692, 24-26 Sept. 2014. doi:10.1109/TrustCom.2014.89
Abstract: Visual analytics of increasingly large data sets has become a challenge for traditional in-memory and off-line algorithms as well as in the cognitive process of understanding features at various scales of resolution. In this paper, we attempt a new web-based framework for the dynamic visualization of large data. The framework is based on the idea that no physical device can ever catch up to the analytical demand and the physical requirements of large data. Thus, we adopt a data streaming generator model that serializes the original data into multiple streams of data that can be contained on current hardware. Thus, the scalability of the visual analytics of large data is inherent in the streaming architecture supported by our platform. The platform is based on the traditional server-client model. However, the platform is enhanced by effective analytical methods that operate on data streams, such as binned points and bundling lines that reduce and enhance large streams of data for effective interactive visualization. We demonstrate the effectiveness of our framework on different types of large datasets.
Keywords: Big Data; Internet; client-server systems; data analysis; data visualisation; interactive systems; media streaming; Big Data visualization; VALID; Web framework; data streaming generator model; dynamic data visualization; interactive visualization; server-client model; streaming architecture; Conferences; Privacy; Security; big data; dynamic visualization; streaming data; visual analytics (ID#: 15-5788)
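The "binned points" reduction mentioned in the abstract above can be sketched as incremental grid binning. This is an illustrative assumption about what such a step might look like, not code from the VALID framework; the function name and the square-cell grid are hypothetical.

```python
from collections import Counter


def bin_points(points, cell_size, counts=None):
    """Add (x, y) points to a grid of square cells and return the cell counts."""
    counts = Counter() if counts is None else counts
    for x, y in points:
        # Floor division maps each coordinate to the index of its grid cell.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts


# Two hypothetical chunks arriving from a stream, folded into one running grid.
counts = bin_points([(0.5, 0.5), (1.2, 0.1), (0.9, 0.9)], cell_size=1.0)
counts = bin_points([(1.7, 0.4), (3.0, 2.5)], cell_size=1.0, counts=counts)
```

A client would then draw one mark per occupied cell instead of one per point, which bounds the rendering cost by the grid resolution.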


Haltas, F.; Uzun, E.; Siseci, N.; Posul, A.; Emre, B., "An Automated Bot Detection System through Honeypots for Large-Scale," Cyber Conflict (CyCon 2014), 2014 6th International Conference on, vol., no., pp. 255, 270, 3-6 June 2014. doi:10.1109/CYCON.2014.6916407
Abstract: One of the purposes of active cyber defense systems is identifying infected machines in enterprise networks that are presumably root cause and main agent of various cyber-attacks. To achieve this, researchers have suggested many detection systems that rely on host-monitoring techniques and require deep packet inspection or which are trained by malware samples by applying machine learning and clustering techniques. To our knowledge, most approaches are either lack of being deployed easily to real enterprise networks, because of practicability of their training system which is supposed to be trained by malware samples or dependent to host-based or deep packet inspection analysis which requires a big amount of storage capacity for an enterprise. Beside this, honeypot systems are mostly used to collect malware samples for analysis purposes and identify coming attacks. Rather than keeping experimental results of bot detection techniques as theory and using honeypots for only analysis purposes, in this paper, we present a novel automated bot-infected machine detection system BFH (BotFinder through Honeypots), based on BotFinder, that identifies infected hosts in a real enterprise network by learning approach. Our solution, relies on NetFlow data, is capable of detecting bots which are infected by most-recent malwares whose samples are caught via 97 different honeypot systems. We train BFH by created models, according to malware samples, provided and updated by 97 honeypot systems. BFH system automatically sends caught malwares to classification unit to construct family groups. Later, samples are automatically given to training unit for modeling and perform detection over NetFlow data. Results are double checked by using full packet capture of a month and through tools that identify rogue domains. Our results show that BFH is able to detect infected hosts with very few false-positive rates and successful on handling most-recent malware families since it is fed by 97 Honeypot and it supports large networks with scalability of Hadoop infrastructure, as deployed in a large-scale enterprise network in Turkey.
Keywords: invasive software; learning (artificial intelligence); parallel processing; pattern clustering; BFH; Hadoop infrastructure; NetFlow data; active cyber defense systems; automated bot detection system; bot detection techniques; bot-infected machine detection system; botfinder through honeypots; clustering technique; cyber-attacks; deep packet inspection; enterprise networks; honeypot systems; host-monitoring techniques; learning approach; machine learning technique; malware; Data models; Feature extraction; Malware; Monitoring; Scalability; Training; Botnet; NetFlow analysis; honeypots; machine learning (ID#: 15-5789)


Irudayasamy, A.; Lawrence, A., "Enhanced Anonymization Algorithm to Preserve Confidentiality of Data in Public Cloud," Information Society (i-Society), 2014 International Conference on, vol., no., pp. 86, 91, 10-12 Nov. 2014. doi:10.1109/i-Society.2014.7009017
Abstract: Cloud computing offers immense computation control and storing volume which permit users to organize applications. Many confidential and sensitive presentations like health facilities are constructed on cloud for monetary and working expediency. Generally, information in these requests is masked to safeguard the owner's confidential information, but such information can be possibly despoiled when new information is added to it. Preserving the confidentiality over distributed data sets is still a big challenge in the cloud environment because most of this information are huge and ranges through many storage nodes. Prevailing methods undergo reduced scalability and incompetence since information is assimilated and accesses all data repeatedly when apprises is done. In this paper, an efficient hash centered quasi-identifier anonymization method is introduced to confirm the confidentiality of the sensitive information and attain great value over spread data sets on cloud. Quasi-identifiers, which signify the groups of anonymized data, are hashed for adeptness. Consequently, a procedure is framed to fulfill this methodology. By this method, when deployed, results validate the effectiveness of confidential conservation on huge data sets that can be amended considerably over existing methods.
Keywords: cloud computing; cryptography; computation control; data confidentiality preservation; distributed data sets; enhanced anonymization algorithm; hash centered quasi-identifier anonymization method; prevailing methods; public cloud computing; sensitive information confidentiality; storage nodes; storing volume; Cloud computing; Computer science; Conferences; Distributed databases; Privacy; Societies; Taxonomy; Cloud Computing; Data anonymization; encryption; privacy; security (ID#: 15-5790)
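As a rough illustration of hashing quasi-identifiers, the sketch below groups records by a digest of their quasi-identifier values and checks whether any group falls below a threshold k. It is not the algorithm from the paper above: the record fields, the function names, and the choice of SHA-256 are assumptions.

```python
import hashlib
from collections import defaultdict


def qi_hash(record, quasi_identifiers):
    """Return a SHA-256 hex digest of the record's quasi-identifier values."""
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    key = "\x1f".join(str(record[name]) for name in quasi_identifiers)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def group_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier digest."""
    sizes = defaultdict(int)
    for record in records:
        sizes[qi_hash(record, quasi_identifiers)] += 1
    return dict(sizes)


def violates_k_anonymity(records, quasi_identifiers, k):
    """Return True if any quasi-identifier group has fewer than k records."""
    return any(size < k for size in group_sizes(records, quasi_identifiers).values())


# Hypothetical, already generalized records.
records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "100**", "diagnosis": "flu"},
]
needs_more_generalization = violates_k_anonymity(records, ["age", "zip"], k=2)
```

Because group membership is keyed by a fixed-length digest, a newly added record can be assigned to its group with one hash computation rather than a scan of the existing data.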


Hassan, S.; Abbas Kamboh, A.; Azam, F., "Analysis of Cloud Computing Performance, Scalability, Availability, & Security," Information Science and Applications (ICISA), 2014 International Conference on, vol., no., pp. 1, 5, 6-9 May 2014. doi:10.1109/ICISA.2014.6847363
Abstract: Cloud Computing means that a relationship of many number of computers through a contact channel like internet. Through cloud computing we send, receive and store data on internet. Cloud Computing gives us an opportunity of parallel computing by using a large number of Virtual Machines. Now a days, Performance, scalability, availability and security may represent the big risks in cloud computing. In this paper we highlights the issues of security, availability and scalability issues and we will also identify that how we make our cloud computing based infrastructure more secure and more available. And we also highlight the elastic behavior of cloud computing. And some of characteristics which involved for gaining the high performance of cloud computing will also be discussed.
Keywords: cloud computing; parallel processing; security of data; virtual machines; Internet; parallel computing; scalability; security; virtual machine; Availability; Cloud computing; Computer hacking; Scalability (ID#: 15-5791)


Grolinger, K.; Hayes, M.; Higashino, W.A.; L'Heureux, A.; Allison, D.S.; Capretz, M.A.M., "Challenges for MapReduce in Big Data," Services (SERVICES), 2014 IEEE World Congress on, vol., no., pp. 182, 189, June 27 2014-July 2 2014. doi:10.1109/SERVICES.2014.41
Abstract: In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets. The reason for this is the high scalability of the MapReduce paradigm which allows for massively parallel and distributed execution over a large number of computing nodes. This paper identifies MapReduce issues and challenges in handling Big Data with the objective of providing an overview of the field, facilitating better planning and management of Big Data projects, and identifying opportunities for future research in this field. The identified challenges are grouped into four main categories corresponding to Big Data tasks types: data storage (relational databases and NoSQL stores), Big Data analytics (machine learning and interactive analytics), online processing, and security and privacy. Moreover, current efforts aimed at improving and extending MapReduce to address identified challenges are presented. Consequently, by identifying issues and challenges MapReduce faces when handling Big Data, this study encourages future Big Data research.
Keywords: Big Data; SQL; data analysis; data privacy; learning (artificial intelligence); parallel programming; relational databases; security of data; storage management; Big Data analytics; Big Data community; Big Data project management; Big Data project planning; MapReduce paradigm; NoSQL stores; data security; data storage; interactive analytics; machine learning; massive data sets; massively distributed execution; massively parallel execution; online processing; relational databases; Algorithm design and analysis; Big data; Data models; Data visualization; Memory; Scalability; Security; Big Data Analytics; Interactive Analytics; Machine Learning; MapReduce; NoSQL; Online Processing; Privacy; Security (ID#: 15-5792)
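For readers unfamiliar with the MapReduce paradigm discussed in the abstract above, the following single-process sketch shows its three phases on a word count. It uses no Hadoop or other framework API; all function names are illustrative, and a real deployment would distribute each phase across many nodes.

```python
from collections import defaultdict


def map_phase(documents, mapper):
    """Apply the mapper to each input and yield its (key, value) pairs."""
    for doc in documents:
        yield from mapper(doc)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.lower().split():
        yield word, 1


def sum_reducer(key, values):
    """Sum the counts emitted for one word."""
    return sum(values)


lines = ["big data security", "big data scalability"]
counts = reduce_phase(shuffle(map_phase(lines, word_mapper)), sum_reducer)
```

The mapper and reducer share no state, which is the property that lets the paradigm scale out over a large number of computing nodes.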


Balusamy, M.; Muthusundari, S., "Data Anonymization through Generalization Using Map Reduce on Cloud," Computer Communication and Systems (CCCS), 2014 International Conference on, pp. 039, 042, 20-21 Feb. 2014. doi:10.1109/ICCCS.2014.7068164
Abstract: Now a day's cloud computing provides lot of computation power and storage capacity to the users can be share their private data. To providing the security to the users sensitive data is challenging and difficult one in a cloud environment. K-anonymity approach as far as used for providing privacy to users sensitive data, but cloud can be greatly increases in a big data manner. In the existing, top-town specialization approach to make the privacy of users sensitive data. When the scalability of users data increase means top-town specialization technique is difficult to preserve the sensitive data and provide security to users data. Here we propose the specialization approach through generalization to preserve the sensitive data and provide the security against scalability in an efficient way with the help of map-reduce. Our approach is founding better solution than existing approach in a scalable and efficient way to provide security to users data.
Keywords: cloud computing; data privacy; parallel processing; MapReduce; cloud environment; computation power; data anonymization; generalization; k-anonymity approach; private data sharing; security; storage capacity; top-town specialization approach; user sensitive data privacy; users data scalability; Cloud computing; Computers; Conferences; Data privacy; Privacy; Scalability; Security; Generalization; K-anonymity; Map-Reduce; Specialization; big data (ID#: 15-5793)
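Generalization for k-anonymity, as named in the abstract above, can be sketched by coarsening a numeric attribute until every group holds at least k records. This is a single-attribute, single-process illustration and not the paper's method; the candidate widths, the function names, and the suppression fallback are assumptions. In a MapReduce setting, labeling each record would be the map step and counting group sizes the reduce step.

```python
from collections import Counter


def generalize_age(age, width):
    """Map an integer age to a range label such as '30-39' for width 10."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def smallest_group(ages, width):
    """Return the size of the smallest group after generalizing with this width."""
    return min(Counter(generalize_age(a, width) for a in ages).values())


def generalize_until_k(ages, k, widths=(5, 10, 20, 40)):
    """Try widths from fine to coarse; return the first one giving groups >= k."""
    for width in widths:
        if smallest_group(ages, width) >= k:
            return width, [generalize_age(a, width) for a in ages]
    # Assumed fallback: suppress the attribute entirely if no width suffices.
    return None, ["*"] * len(ages)


# Hypothetical ages.
ages = [31, 33, 36, 42, 45, 47]
width, labels = generalize_until_k(ages, k=3)
```

Trying the finest width first keeps as much detail as the k threshold allows.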


Choi, A.J., "Internet of Things: Evolution towards a Hyper-Connected Society," Solid-State Circuits Conference (A-SSCC), 2014 IEEE Asian, vol., no., pp. 5, 8, 10-12 Nov. 2014. doi:10.1109/ASSCC.2014.7008846
Abstract: Internet of Things is expected to encompass every aspect of our lives and to generate a paradigm shift towards a hyper-connected society. As more things are connected to the Internet, larger amount of data are generated and processed into useful actions that can make our lives safer and easier. Since IoT generate heavy traffics, it induces several challenges to next generation network. Therefore, IoT infrastructure should be designed in terms of flexibility and scalability. In addition, cloud computing and big data analytics are being integrated. They allow network to change itself much faster to service requirements with better operational efficiency and intelligence. IoT should also be vertically optimized from device to application in order to provide ultra-low power operation, cost-effectiveness, and service reliability with ensuring full security across the entire signal path. In this paper we address IoT challenges and technological requirements from the service provider perspective.
Keywords: Big Data; Internet; Internet of Things; cloud computing; computer network security; data analysis; data integration; next generation networks; reliability; IoT infrastructure; big data analytics; cost-effectiveness; hyper-connected society; next generation network; service reliability; ultra-low power operation; Business; Cloud computing; Intelligent sensors; Long Term Evolution; Security; IoT; flexibility; scalability; security (ID#: 15-5794)


Ge Ma; Zhen Chen; Junwei Cao; Zhenhua Guo; Yixin Jiang; Xiaobin Guo, "A Tentative Comparison on CDN and NDN," Systems, Man and Cybernetics (SMC), 2014 IEEE International Conference on, vol., no., pp. 2893, 2898, 5-8 Oct. 2014. doi:10.1109/SMC.2014.6974369
Abstract: With the pretty prompt growth in Internet content, future Internet is emerging as the main usage shifting from traditional host-to-host model to content dissemination model, e.g. video makes up more than half of Internet traffic. ISPs, content providers and other third parties have widely deployed content delivery networks (CDNs) to support digital content distribution. Though CDN is an ad-hoc solution to the content dissemination problem, there are still big challenges, such as complicated control plane. By contrast, as a wholly new designed network architecture, named data networking (NDN) incorporates content delivery function in its network layer, its stateful routing and forwarding plane can effectively detect and adapt to the dynamic and ever-changing Internet. In this paper, we try to explore the similarities and differences between CDN and NDN. Hence, we evaluate the distribution efficiency, network security and protocol overhead between CDN and NDN. Especially in the implementation phase, we conduct their testbeds separately with the same topology to derive their performance of content delivery. Finally, summarizing our main results, we gather that: 1) NDN has its own advantage on lots of aspects, including security, scalability and quality of service (QoS); 2) NDN make full use of surrounding resources and is more adaptive to the dynamic and ever-changing Internet; 3) though CDN is a commercial and mature architecture, in some scenarios, NDN can perform better than CDN under the same topology and caching storage. In a word, NDN is practical to play an even greater role in the evolution of the Internet based on the massive distribution and retrieval in the future.
Keywords: Internet; quality of service; routing protocols; telecommunication traffic; CDN; ISP; Internet content; Internet traffic; NDN; QoS; complicated control plane; content delivery function; content delivery network; content dissemination model; content dissemination problem; content provider; digital content distribution; distribution efficiency; future Internet; host-to-host model; named data networking; network architecture; network security; pretty prompt growth; protocol overhead; quality of service; stateful routing and forwarding plane; usage shifting; Conferences; Cybernetics; Architecture; comparison; evaluation; named data networking (ID#: 15-5795)


Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.