SoS Newsletter- Advanced Book Block

Big Data

Big data security is a growing area of interest for researchers. The work presented here ranges from cyber-threat detection in critical infrastructures to privacy protection. The works cited were presented or published in 2014.

  • Abawajy, J.; Kelarev, A; Chowdhury, M., "Large Iterative Multitier Ensemble Classifiers for Security of Big Data," Emerging Topics in Computing, IEEE Transactions on, vol. PP, no.99, pp.1,1, April 2014. doi: 10.1109/TETC.2014.2316510 This article introduces and investigates Large Iterative Multitier Ensemble (LIME) classifiers specifically tailored for Big Data. These classifiers are very large, but are quite easy to generate and use. They can be so large that it makes sense to use them only for Big Data. They are generated automatically as a result of several iterations in applying ensemble meta classifiers. They incorporate diverse ensemble meta classifiers into several tiers simultaneously and combine them into one automatically generated iterative system so that many ensemble meta classifiers function as integral parts of other ensemble meta classifiers at higher tiers. In this paper, we carry out a comprehensive investigation of the performance of LIME classifiers for a problem concerning security of big data. Our experiments compare LIME classifiers with various base classifiers and standard ordinary ensemble meta classifiers. The results obtained demonstrate that LIME classifiers can significantly increase the accuracy of classifications. LIME classifiers performed better than the base classifiers and standard ensemble meta classifiers.
    Keywords: Big data; Data handling; Data mining; Data storage systems; Information management; Iterative methods; Malware (ID#:14-2639)
  • Hurst, W.; Merabti, M.; Fergus, P., "Big Data Analysis Techniques for Cyber-threat Detection in Critical Infrastructures," Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on, pp.916, 921, 13-16 May 2014. doi: 10.1109/WAINA.2014.141 The research presented in this paper offers a way of supporting the security currently in place in critical infrastructures by using behavioral observation and big data analysis techniques to add to the Defense in Depth (DiD). As this work demonstrates, applying behavioral observation to critical infrastructure protection has effective results. Our design for Behavioral Observation for Critical Infrastructure Security Support (BOCISS) processes simulated critical infrastructure data to detect anomalies which constitute threats to the system. This is achieved using feature extraction and data classification. The data is provided by the development of a nuclear power plant simulation using Siemens Tecnomatix Plant Simulator and the programming language SimTalk. Using this simulation, extensive realistic data sets are constructed and collected, when the system is functioning as normal and during a cyber-attack scenario. The big data analysis techniques, classification results and an assessment of the outcomes is presented.
    Keywords: Big Data; critical infrastructures; feature extraction; pattern classification; programming languages; security of data; BOCISS process; DiD; Siemens Tecnomatix Plant Simulator; anomaly detection; behavioral observation; big data analysis techniques; critical infrastructure protection; critical infrastructure security support process; cyber-attack scenario; cyber-threat detection; data classification; defence in depth; feature extraction; nuclear power plant simulation; programming language SimTalk; realistic data set; simulated critical infrastructure data; Big data; Data models; Feature extraction; Inductors; Security; Support vector machine classification; Water resources; Behavioral Observation; Big Data; Critical Infrastructure; Data Classification; Simulation (ID#:14-2640)
  • Demchenko, Y.; de Laat, C.; Membrey, P., "Defining Architecture Components of the Big Data Ecosystem," Collaboration Technologies and Systems (CTS), 2014 International Conference on, pp.104,112, 19-23 May 2014. doi: 10.1109/CTS.2014.6867550 Big Data are becoming a new technology focus both in science and in industry and motivate technology shift to data centric architecture and operational models. There is a vital need to define the basic information/semantic models, architecture components and operational models that together comprise a so-called Big Data Ecosystem. This paper discusses a nature of Big Data that may originate from different scientific, industry and social activity domains and proposes improved Big Data definition that includes the following parts: Big Data properties (also called Big Data 5V: Volume, Velocity, Variety, Value and Veracity), data models and structures, data analytics, infrastructure and security. The paper discusses paradigm change from traditional host or service based to data centric architecture and operational models in Big Data. The Big Data Architecture Framework (BDAF) is proposed to address all aspects of the Big Data Ecosystem and includes the following components: Big Data Infrastructure, Big Data Analytics, Data structures and models, Big Data Lifecycle Management, Big Data Security. The paper analyses requirements to and provides suggestions how the mentioned above components can address the main Big Data challenges. The presented work intends to provide a consolidated view of the Big Data phenomena and related challenges to modern technologies, and initiate wide discussion.
    Keywords: Big Data; data analysis; security of data; BDAF; Big Data analytics; Big Data architecture framework; Big Data ecosystem; Big Data infrastructure; Big Data lifecycle management; Big Data properties; Big Data security; data analytics; data centric architecture; data infrastructure; data models; data operational models; data security; data structures; information-semantic models; value property; variety property; velocity property; veracity property; volume property; Big data; Biological system modeling; Computer architecture; Data models; Ecosystems; Industries; Security; Big Data Architecture Framework (BDAF); Big Data Ecosystem; Big Data Infrastructure (BDI); Big Data Lifecycle Management (BDLM); Big Data Technology; Cloud based Big Data Infrastructure Services (ID#:14-2641)
  • Rongxing Lu; Hui Zhu; Ximeng Liu; Liu, J.K.; Jun Shao, "Toward Efficient And Privacy-Preserving Computing In Big Data Era," Network, IEEE , vol.28, no.4, pp.46,50, July-August 2014. doi: 10.1109/MNET.2014.6863131 Big data, because it can mine new knowledge for economic growth and technical innovation, has recently received considerable attention, and many research efforts have been directed to big data processing due to its high volume, velocity, and variety (referred to as "3V") challenges. However, in addition to the 3V challenges, the flourishing of big data also hinges on fully understanding and managing newly arising security and privacy challenges. If data are not authentic, new mined knowledge will be unconvincing; while if privacy is not well addressed, people may be reluctant to share their data. Because security has been investigated as a new dimension, "veracity," in big data, in this article, we aim to exploit new challenges of big data in terms of privacy, and devote our attention toward efficient and privacy-preserving computing in the big data era. Specifically, we first formalize the general architecture of big data analytics, identify the corresponding privacy requirements, and introduce an efficient and privacy-preserving cosine similarity computing protocol as an example in response to data mining's efficiency and privacy requirements in the big data era.
    Keywords: Big Data; data analysis; data mining; data privacy; security of data; big data analytics; big data era; big data processing; data mining efficiency; privacy requirements; privacy-preserving cosine similarity computing protocol; security; Authentication; Big data; Cryptography; Data privacy; Economics; Information analysis; Privacy (ID#:14-2642)
  • Kan Yang; Xiaohua Jia; Kui Ren; Ruitao Xie; Liusheng Huang, "Enabling Efficient Access Control With Dynamic Policy Updating For Big Data In The Cloud," INFOCOM, 2014 Proceedings IEEE, pp.2013,2021, April 27 2014-May 2 2014. doi: 10.1109/INFOCOM.2014.6848142 Due to the high volume and velocity of big data, it is an effective option to store big data in the cloud, because the cloud has capabilities of storing big data and processing high volume of user access requests. Attribute-Based Encryption (ABE) is a promising technique to ensure the end-to-end security of big data in the cloud. However, the policy updating has always been a challenging issue when ABE is used to construct access control schemes. A trivial implementation is to let data owners retrieve the data and re-encrypt it under the new access policy, and then send it back to the cloud. This method incurs a high communication overhead and heavy computation burden on data owners. In this paper, we propose a novel scheme that enabling efficient access control with dynamic policy updating for big data in the cloud. We focus on developing an outsourced policy updating method for ABE systems. Our method can avoid the transmission of encrypted data and minimize the computation work of data owners, by making use of the previously encrypted data with old access policies. Moreover, we also design policy updating algorithms for different types of access policies. The analysis shows that our scheme is correct, complete, secure and efficient.
    Keywords: Big Data; authorisation; cloud computing; cryptography; ABE; Big Data; access control; access policy; attribute-based encryption; cloud; dynamic policy updating; end-to-end security; outsourced policy updating method; Access control; Big data; Encryption; Public key; Servers; ABE; Access Control; Big Data; Cloud; Policy Updating (ID#:14-2643)
  • Xindong Wu; Xingquan Zhu; Gong-Qing Wu; Wei Ding, "Data Mining With Big Data," Knowledge and Data Engineering, IEEE Transactions on, vol.26, no.1, pp.97,107, Jan. 2014. doi: 10.1109/TKDE.2013.109 Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
    Keywords: data mining; user modelling; Big Data processing model; Big Data revolution; HACE theorem; data collection capacity; data driven model; data mining; data storage; demand driven aggregation; growing data sets; information sources; networking; user interest modeling; Data handling; Data models; Data privacy; Data storage systems; Distributed databases; Information management; Big Data; autonomous sources; complex and evolving associations; data mining; heterogeneity (ID#:14-2644)
  • Sandryhaila, A; Moura, J., "Big Data Analysis with Signal Processing on Graphs: Representation and processing of massive data sets with irregular structure," Signal Processing Magazine, IEEE, vol.31, no.5, pp.80, 90, Sept. 2014. doi: 10.1109/MSP.2014.2329213 Analysis and processing of very large data sets, or big data, poses a significant challenge. Massive data sets are collected and studied in numerous domains, from engineering sciences to social networks, biomolecular research, commerce, and security. Extracting valuable information from big data requires innovative approaches that efficiently process large amounts of data as well as handle and, moreover, utilize their structure. This article discusses a paradigm for large-scale data analysis based on the discrete signal processing (DSP) on graphs (DSPG). DSPG extends signal processing concepts and methodologies from the classical signal processing theory to data indexed by general graphs. Big data analysis presents several challenges to DSPG, in particular, in filtering and frequency analysis of very large data sets. We review fundamental concepts of DSPG, including graph signals and graph filters, graph Fourier transform, graph frequency, and spectrum ordering, and compare them with their counterparts from the classical signal processing theory. We then consider product graphs as a graph model that helps extend the application of DSPG methods to large data sets through efficient implementation based on parallelization and vectorization. We relate the presented framework to existing methods for large-scale data processing and illustrate it with an application to data compression.
    Keywords: Big data; Data storage; Digital signal processing; Fourier transforms; Graph theory; Information analysis; Information processing; Time series analysis (ID#:14-2645)
  • Peng Li; Song Guo, "Load Balancing For Privacy-Preserving Access To Big Data In Cloud," Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, pp.524,528, April 27 2014-May 2 2014. doi: 10.1109/INFCOMW.2014.6849286 In the era of big data, many users and companies start to move their data to cloud storage to simplify data management and reduce data maintenance cost. However, security and privacy issues become major concerns because third-party cloud service providers are not always trusty. Although data contents can be protected by encryption, the access patterns that contain important information are still exposed to clouds or malicious attackers. In this paper, we apply the ORAM algorithm to enable privacy-preserving access to big data that are deployed in distributed file systems built upon hundreds or thousands of servers in a single or multiple geo-distributed cloud sites. Since the ORAM algorithm would lead to serious access load unbalance among storage servers, we study a data placement problem to achieve a load balanced storage system with improved availability and responsiveness. Due to the NP-hardness of this problem, we propose a low-complexity algorithm that can deal with large-scale problem size with respect to big data. Extensive simulations are conducted to show that our proposed algorithm finds results close to the optimal solution, and significantly outperforms a random data placement algorithm.
    Keywords: Big Data; cloud computing; computational complexity; data protection; distributed databases; file servers; information retrieval; random processes; resource allocation; storage management; Big Data; NP-hardness; ORAM algorithm; cloud storage; data availability; data content protection; data maintenance cost reduction; data management; data placement problem; data security; distributed file system; encryption; file server; geo-distributed cloud site; load balanced storage system; low-complexity algorithm; privacy preserving access; random data placement algorithm; responsiveness; storage server; Big data; Cloud computing; Conferences; Data privacy; Random access memory; Security; Servers (ID#:14-2646)
  • Du, Nan; Manjunath, Niveditha; Shuai, Yao; Burger, Danilo; Skorupa, Ilona; Schuffny, Rene; Mayr, Christian; Basov, Dimitri N.; Di Ventra, Massimiliano; Schmidt, Oliver G.; Schmidt, Heidemarie, "Novel Implementation Of Memristive Systems For Data Encryption And Obfuscation," Journal of Applied Physics, vol. 115, no.12, pp.124501,124501-7, Mar 2014. doi: 10.1063/1.4869262 With the rise of big data handling, new solutions are required to drive cryptographic algorithms for maintaining data security. Here, we exploit the nonvolatile, nonlinear resistance change in BiFeO3 memristors [Shuai et al., J. Appl. Phys. 109, 124117 (2011)] by applying a voltage for the generation of second and higher harmonics and develop a new memristor-based encoding system from it to encrypt and obfuscate data. It is found that a BiFeO3 memristor in high and low resistance state can be used to generate two clearly distinguishable sets of second and higher harmonics as recently predicted theoretically [Cohen et al., Appl. Phys. Lett. 100, 133109 (2012)]. The computed autocorrelation of encrypted data using higher harmonics generated by a BiFeO3 memristor shows that the encoded data distribute randomly.
    Keywords: (not provided) (ID#:14-2647)
  • Kaushik, A; Satvika; Gupta, K.; Kumar, A, "Digital Image Chaotic Encryption (DICE - A Partial-Symmetric Key Cipher For Digital Images)," Optimization, Reliability, and Information Technology (ICROIT), 2014 International Conference on, pp.314,317, 6-8 Feb. 2014. doi: 10.1109/ICROIT.2014.6798345 The swift growth of communication facilities and ever decreasing cost of computer hardware has brought tremendous possibilities of expansion for commercial and academic rationales. With widely incremented communique like Internet, not only the good guys, but also bad guys have advantage. The hackers or crackers can take advantage of network vulnerabilities and pose a big threat to network security personnel. The information can be transferred by means of textual data, digital images, videos, animations, etc and thus requires better defense. Especially, the images are more visual and descriptive than textual data; hence they act as a momentous way of communication in the modern world. Protection of the digital images during transmission becomes more serious concern when they are confidential war plans, top-secret weapon photographs, stealthy military data and surreptitious architectural designs of financial buildings, etc. Several mechanisms like cryptography, steganography, hash functions, digital signatures have been designed to provide the ultimate safety for secret data. When the data is in form of digital images; certain features of images like high redundancy, strong correlation between neighboring pixels and abundance in information expression need some extra fortification while transmission. This paper proposes a new cryptographic cipher named Digital Image Chaotic Encryption (DICE) to convene the special requisites of secure image transfer. The strength of DICE lies in its partial-symmetric key nature i.e. even discovery of encryption key by hacker will not guarantee decoding of the original message.
    Keywords: computer network security; cryptography; image processing; DICE; Internet; digital image chaotic encryption; digital images protection; digital signatures; hash functions; network security personnel; partial-symmetric key cipher; steganography; Algorithm design and analysis; Biomedical imaging; Encryption; Standards; Block cipher; DICE Partial-Symmetric key algorithm; Digital watermarking (ID#:14-2648)
  • Haoliang Lou; Yunlong Ma; Feng Zhang; Min Liu; Weiming Shen, "Data Mining For Privacy Preserving Association Rules Based On Improved MASK Algorithm," Computer Supported Cooperative Work in Design (CSCWD), Proceedings of the 2014 IEEE 18th International Conference on, pp.265,270, 21-23 May 2014. doi: 10.1109/CSCWD.2014.6846853 With the arrival of the big data era, information privacy and security issues become even more crucial. The Mining Associations with Secrecy Konstraints (MASK) algorithm and its improved versions were proposed as data mining approaches for privacy preserving association rules. The MASK algorithm only adopts a data perturbation strategy, which leads to a low privacy-preserving degree. Moreover, it is difficult to apply the MASK algorithm into practices because of its long execution time. This paper proposes a new algorithm based on data perturbation and query restriction (DPQR) to improve the privacy-preserving degree by multi-parameters perturbation. In order to improve the time-efficiency, the calculation to obtain an inverse matrix is simplified by dividing the matrix into blocks; meanwhile, a further optimization is provided to reduce the number of scanning database by set theory. Both theoretical analyses and experiment results prove that the proposed DPQR algorithm has better performance.
    Keywords: data mining; data privacy; matrix algebra; query processing; DPQR algorithm; data mining; data perturbation and query restriction; data perturbation strategy; improved MASK algorithm; information privacy; inverse matrix; mining associations with secrecy constraints; privacy preserving association rules; scanning database; security issues; Algorithm design and analysis; Association rules; Data privacy; Itemsets; Time complexity; Data mining; association rules; multi-parameters perturbation; privacy preservation (ID#:14-2649)
  • Beigh, B.M., "One-stop: A novel hybrid model for intrusion detection system," Computing for Sustainable Global Development (INDIACom), 2014 International Conference on, pp.798,805, 5-7 March 2014. doi: 10.1109/IndiaCom.2014.6828072 Organizations are paying huge amount only for the sake of securing their confidential data from attackers or intruders. But the hackers are Big Bosses and are very sharp enough to crack the security of the organization. Therefore before they made security breach, let us hunt down them and make the alert for organization, so that they can save their confidential data. For the above mentioned purpose, Intrusion detection system came into existence. But the current systems are not capable enough to detect all the attacks coming towards them. In order to fix the problem of detecting novel attacks and reducing number of false alarm, here in this paper, we have proposed a hybrid model for intrusion detection system, which have enhanced quality of detecting the unknown attack via anomaly based detection and also have module which will try to reduce the number of false alarm generated by the system.
    Keywords: security of data; anomaly based detection; confidential data; false alarm reduction; intrusion detection system; one-stop model; organization security; security breach; Databases; Decoding; Engines; Hybrid power systems; Intrusion detection; Organizations; Intrusion; attack; availability; confidentiality; detection; information; integrity; mitigate (ID#:14-2650)
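
The Lou et al. entry above (ID#:14-2649) refers to the data perturbation strategy of the MASK algorithm and to an inverse-matrix calculation. As a rough illustration only, the following Python sketch shows the basic single-item form of that idea as it is commonly described: each 0/1 value is kept with probability p and flipped otherwise, and the true count of 1s is then estimated by inverting the 2x2 distortion matrix. It is not the DPQR algorithm proposed in the paper, and the function names, the value p = 0.9, and the synthetic column are assumptions made for this example.

```python
import random


def perturb(bits, p, rng):
    """Keep each 0/1 value with probability p, flip it with probability 1 - p."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_support(perturbed_bits, p):
    """Estimate the number of true 1s from a perturbed 0/1 column.

    In expectation the observed counts satisfy
        ones_obs  = p * ones_true + (1 - p) * zeros_true
        zeros_obs = (1 - p) * ones_true + p * zeros_true
    Inverting this 2x2 system gives the closed form used below.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the distortion matrix singular")
    n = len(perturbed_bits)
    ones_obs = sum(perturbed_bits)
    zeros_obs = n - ones_obs
    return (p * ones_obs - (1 - p) * zeros_obs) / (2 * p - 1)


# Hypothetical data: 10,000 transactions, 3,000 of which contain the item.
rng = random.Random(0)
true_column = [1] * 3000 + [0] * 7000
noisy_column = perturb(true_column, p=0.9, rng=rng)
estimate = estimate_support(noisy_column, p=0.9)
print(f"true support: 3000, estimated from perturbed data: {estimate:.0f}")
```

The miner only ever sees the perturbed column, so individual values stay uncertain while aggregate supports remain recoverable. For itemsets of k items the distortion matrix grows to 2^k by 2^k, which is presumably why the cost of inverting it matters in the entry above.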

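The Sandryhaila and Moura entry above (ID#:14-2645) compares graph signal processing concepts with their classical counterparts. As a small, hedged illustration of that correspondence, the Python sketch below treats a finite periodic time series as a signal on a directed cycle graph, where the graph shift (multiplication by the adjacency matrix) is a cyclic delay, and checks numerically that the classical DFT diagonalizes this shift. The function names and the sample signal are assumptions for this example and do not come from the article.

```python
import cmath


def cycle_shift(signal):
    """Graph shift on a directed cycle: output[n] = signal[n - 1] (cyclic)."""
    size = len(signal)
    return [signal[(n - 1) % size] for n in range(size)]


def dft(signal):
    """Classical discrete Fourier transform, computed directly in O(N^2)."""
    size = len(signal)
    return [
        sum(signal[n] * cmath.exp(-2j * cmath.pi * k * n / size)
            for n in range(size))
        for k in range(size)
    ]


# Hypothetical graph signal on a 5-node directed cycle.
signal = [1.0, 2.0, 0.5, -1.0, 3.0]
size = len(signal)

spectrum = dft(signal)
shifted_spectrum = dft(cycle_shift(signal))

# Shifting in the vertex domain multiplies coefficient k by exp(-2*pi*i*k/N),
# i.e. the DFT basis vectors are eigenvectors of the cycle's adjacency matrix.
predicted = [cmath.exp(-2j * cmath.pi * k / size) * spectrum[k]
             for k in range(size)]
max_error = max(abs(a - b) for a, b in zip(shifted_spectrum, predicted))
print(f"largest deviation from the eigenvalue relation: {max_error:.2e}")
```

For a general graph the same role is played by the eigenvectors of its adjacency matrix, which no longer have this closed form and must be computed numerically.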

Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to SoS.Project (at) for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.