Big Data Security Metrics, 2014

SoS Newsletter- Advanced Book Block



Big Data Security Metrics


Measurement is a hard problem in the Science of Security. When applied to Big Data, the problems of measurement in security systems are compounded. The works cited here address these problems and were presented in 2014. 

Kotenko, I.; Novikova, E., "Visualization of Security Metrics for Cyber Situation Awareness," Availability, Reliability and Security (ARES), 2014 Ninth International Conference on, vol., no., pp. 506, 513, 8-12 Sept. 2014. doi:10.1109/ARES.2014.75
Abstract: One of the important directions of research in situational awareness is the implementation of visual analytics techniques, which can be efficiently applied when working with big security data in critical operational domains. The paper considers a visual analytics technique for displaying a set of security metrics used to assess overall network security status and evaluate the efficiency of protection mechanisms. The technique can assist in solving security tasks that are important for security information and event management (SIEM) systems. The suggested approach is suitable for displaying security metrics of large networks and supports historical analysis of the data. To demonstrate and evaluate the usefulness of the proposed technique, we implemented a use case corresponding to the Olympic Games scenario.
Keywords: Big Data; computer network security; data analysis; data visualisation; Olympic Games scenario; SIEM systems; big data security; cyber situation awareness; network security status; security information and event management systems; security metric visualization; visual analytics technique; Abstracts; Availability; Layout; Measurement; Security; Visualization; cyber situation awareness; high level metrics visualization; network security level assessment; security information visualization (ID#: 15-5776)


Vaarandi, R.; Pihelgas, M., "Using Security Logs for Collecting and Reporting Technical Security Metrics," Military Communications Conference (MILCOM), 2014 IEEE, vol., no., pp. 294, 299, 6-8 Oct. 2014. doi:10.1109/MILCOM.2014.53
Abstract: During recent years, establishing proper metrics for measuring system security has received increasing attention. Security logs contain vast amounts of information that is essential for creating many security metrics. Unfortunately, security logs are known to be very large, making their analysis a difficult task. Furthermore, recent security metrics research has focused on generic concepts, and the issue of collecting security metrics with log analysis methods has not been well studied. In this paper, we first focus on using log analysis techniques for collecting technical security metrics from security logs of common types (e.g., network IDS alarm logs, workstation logs, and NetFlow data sets). We also describe a production framework for collecting and reporting technical security metrics which is based on novel open-source technologies for big data.
Keywords: Big Data; computer network security; big data; log analysis methods; log analysis techniques; open source technology; security logs; technical security metric collection; technical security metric reporting; Correlation; Internet; Measurement; Monitoring; Peer-to-peer computing; Security; Workstations; security log analysis; security metrics (ID#: 15-5777)
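The log-to-metric pipeline the authors describe can be sketched in miniature: parse raw log lines and aggregate them into one technical metric. The log format, field names, and the failed-login metric below are illustrative assumptions, not the paper's actual framework.

```python
import re
from collections import Counter

# Hypothetical log lines in a simplified syslog-like format; the paper's
# framework consumes real IDS alarm, workstation, and NetFlow logs.
LOG_LINES = [
    "Oct 06 10:01:02 host1 sshd: Failed password for root from 10.0.0.5",
    "Oct 06 10:01:07 host1 sshd: Failed password for root from 10.0.0.5",
    "Oct 06 10:02:13 host2 sshd: Accepted password for alice from 10.0.0.9",
    "Oct 06 10:03:44 host1 sshd: Failed password for admin from 10.0.0.7",
]

FAILED = re.compile(r"Failed password for \S+ from (\S+)")

def failed_login_metric(lines):
    """One simple technical metric: failed SSH logins per source IP."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

metric = failed_login_metric(LOG_LINES)
```

At production scale, the same extract-and-count step would run inside the stream-processing layer rather than over an in-memory list.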


Jiang, F.; Luo, D., "A New Coupled Metric Learning for Real-time Anomalies Detection with High-Frequency Field Programmable Gate Arrays," Data Mining Workshop (ICDMW), 2014 IEEE International Conference on, vol., no., pp. 1254, 1261, 14 Dec. 2014. doi:10.1109/ICDMW.2014.203
Abstract: Billions of Internet end-users and device-to-device connections have contributed to significant data growth in recent years; large-scale, unstructured, heterogeneous data and the corresponding complexity present challenges to conventional real-time online fraud detection system security. With the advent of the big data era, data analytic techniques are expected to be much faster and more efficient than ever before. Moreover, one of the challenges with many modern algorithms is that they run too slowly in software to have any practical value. This paper proposes a Field Programmable Gate Array (FPGA)-based intrusion detection system (IDS), driven by a new coupled metric learning scheme that discovers inter- and intra-coupling relationships, to cope with the growth of data volumes and item relationships and provide a new approach to efficient anomaly detection. This work is experimented on our previously published NetFlow-based IDS dataset, which is further processed into categorical data for coupled metric learning. The overall performance of the new hardware system is compared against conventional Bayesian and Support Vector Machine classifiers. The experimental results show very promising performance for the coupled metric learning scheme in the FPGA implementation: the false alarm rate is successfully reduced to 5% while a high detection rate (99.9%) is maintained.
Keywords: Internet; data analysis; field programmable gate arrays; security of data; support vector machines; Bayesian classifier; FPGA-based intrusion detection system; Internet end-users; NetFlow-based IDS dataset; data analytic techniques; device to device connections; false alarm rate; high-frequency field programmable gate arrays; metric learning; real-time anomalies detection; real-time online fraud detection system security; support vector machines classifier; Field programmable gate arrays; Intrusion detection; Measurement; Neural networks; Real-time systems; Software; Vectors; Metric Learning; Field Programmable Gate Arrays; Netflow; Intrusion Detection Systems (ID#: 15-5778)
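The abstract does not give the coupling functions themselves. As a rough illustration, one published intra-attribute coupled similarity for categorical values weights a pair of values by their occurrence frequencies; the paper's actual coupled metric learning, and its FPGA mapping, may well differ.

```python
from collections import Counter

def intra_coupled_similarity(column, v1, v2):
    """Frequency-based similarity of two categorical values within one attribute.

    Follows one published intra-coupled formulation (more frequent value pairs
    score higher); an assumption here, not the paper's exact coupling function.
    """
    freq = Counter(column)
    f1, f2 = freq[v1], freq[v2]
    return (f1 * f2) / (f1 + f2 + f1 * f2)

# Toy protocol column from flow records (illustrative, not the IDS dataset).
protocols = ["tcp", "tcp", "udp", "icmp"]
```

A full coupled metric would combine such intra-attribute terms with inter-attribute co-occurrence terms before feeding distances to the detector.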


Okuno, S.; Asai, H.; Yamana, H., "A Challenge of Authorship Identification for Ten-Thousand-Scale Microblog Users," Big Data (Big Data), 2014 IEEE International Conference on, vol., no., pp. 52, 54, 27-30 Oct. 2014. doi:10.1109/BigData.2014.7004491
Abstract: Internet security issues require authorship identification for all kinds of Internet content; however, authorship identification for microblog users is much harder than for other documents because microblog texts are too short. Moreover, when the number of candidates becomes large, i.e., big data, identification takes a long time. Our proposed method solves these problems. The experimental results show that our method successfully identifies authorship with 53.2% precision out of 10,000 microblog users, in almost half the execution time of the previous method.
Keywords: Big Data; security of data; social networking (online); Internet security issues; authorship identification; big data; microblog texts; ten-thousand-scale microblog users; Big data; Blogs; Computers; Distance measurement; Internet; Security; Training; Twitter; authorship attribution; authorship detection; authorship identification; microblog (ID#: 15-5779)
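The short-text authorship problem can be illustrated with a character n-gram profile comparison, a standard baseline for authorship attribution. The paper's own (faster) method is not specified in the abstract, so this is a generic sketch with made-up candidate texts, not a reimplementation.

```python
from collections import Counter

def ngram_profile(text, n=3):
    """Character n-gram frequency profile of a (short) text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(p, q):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(p[g] * q[g] for g in p.keys() & q.keys())
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(p) * norm(q)) if p and q else 0.0

def identify_author(unknown_text, candidates):
    """Attribute the unknown text to the candidate with the closest profile."""
    target = ngram_profile(unknown_text)
    return max(candidates, key=lambda a: cosine(target, ngram_profile(candidates[a])))

# Toy two-author example; the paper scales this kind of decision to 10,000 users.
candidates = {
    "alice": "i love cats and cats love me",
    "bob": "stock markets fell sharply today",
}
```

With 10,000 candidates, the naive all-pairs comparison above is exactly what becomes too slow, which motivates the paper's speed-up.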


Yu Liu; Jianwei Niu; Lianjun Yang; Lei Shu, "eBPlatform: An IoT-based System for NCD Patients Homecare in China," Global Communications Conference (GLOBECOM), 2014 IEEE, vol., no., pp. 2448, 2453, 8-12 Dec. 2014. doi:10.1109/GLOCOM.2014.7037175
Abstract: The number of non-communicable disease (NCD) patients in China is growing rapidly, far beyond the capacity of the national health and social security system. Community health stations do not have enough doctors to take care of their patients in traditional ways. In order to establish a bridge between doctors and patients, we propose eBPlatform, an information system based on Internet of Things (IoT) technology for homecare of NCD patients. The eBox is a sensor device that can be deployed in the patient's home for blood pressure measurement, blood sugar measurement, and ECG signal collection. Services running on a remote server receive the samples and filter and analyze the ECG signals. The uploaded data are pushed to a web portal, with which doctors provide treatment online. The system requirements, design, and implementation of the hardware and software are discussed. Finally, we investigate a half-year case study with 50 NCD patients in Beijing. The results show that eBPlatform can increase doctors' efficiency and makes significant progress toward eliminating the numerical imbalance between community medical practitioners and NCD patients.
Keywords: Internet of Things; blood pressure measurement; diseases; electrocardiography; filtering theory; health care; medical information systems; medical signal processing; portals; signal sampling; Beijing; China; ECG signal analysis; ECG signal collection; ECG signal filtering; IoT-based system; NCD patient homecare; Web portal; blood pressure measurement; blood sugar measurement; community health stations; community medical practitioners; data upload; eBPlatform; eBox; hardware design; hardware implementation; information system; national health; noncommunicable disease patients; numerical imbalance elimination; online treatment; patient care; patient home; remote server; social security system; software design; software implementation; system requirements; Biomedical monitoring; Biosensors; Blood pressure; Electrocardiography; Medical services; Pressure measurement; Servers; IoT application; eHealth; patients homecare; sensor network (ID#: 15-5780)
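As a sketch of the server-side filtering the abstract mentions, a sample record is checked against reference ranges and out-of-range readings are escalated to the doctor's portal. The record type, field names, and thresholds below are hypothetical; the real eBPlatform schema is not given in the abstract.

```python
from dataclasses import dataclass

@dataclass
class VitalSample:
    """One home measurement uploaded by an eBox-style sensor (hypothetical schema)."""
    patient_id: str
    kind: str        # e.g. "blood_glucose", "systolic_bp"
    value: float

def flag_for_doctor(sample, limits):
    """Server-side filter: out-of-range samples are pushed to the doctor's portal."""
    low, high = limits[sample.kind]
    return not (low <= sample.value <= high)

# Illustrative reference ranges, not clinical guidance.
LIMITS = {"blood_glucose": (3.9, 7.8), "systolic_bp": (90.0, 140.0)}
```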


Gao Hui; Niu Haibo; Luo Wei, "Internet Information Source Discovery Based on Multi-Seeds Cocitation," Security, Pattern Analysis, and Cybernetics (SPAC), 2014 International Conference on, vol., no., pp. 368, 371, 18-19 Oct. 2014. doi:10.1109/SPAC.2014.6982717
Abstract: The technology of Internet information source discovery on a specific topic is the groundwork of information acquisition in the current big data era. This paper presents a multi-seeds cocitation algorithm to find new Internet information sources. The proposed algorithm is based on cocitation, but differs from traditional algorithms in that it uses multiple websites on a specific topic as input seeds. We then introduce the Combined Cocitation Degree (CCD) to measure the relevancy of newly found websites: websites with a higher CCD are more topic-related. Finally, the collection of websites with the biggest CCD is taken as the new Internet information sources on the specific topic. The experiments show that the proposed method outperforms traditional algorithms in the scenarios we tested.
Keywords: Big Data; Internet; Web sites; citation analysis; data mining; CCD; Internet information source discovery; Web sites; combined cocitation degree; information acquisition; multiseeds cocitation; relevancy measurement; Algorithm design and analysis; Big data; Charge coupled devices; Google; Internet; Noise; Web pages; big data; cocitation; information source; related website (ID#: 15-5781)
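The abstract does not spell out the CCD formula. One plausible reading, sketched below, sums cocitations of a candidate site with every seed over a set of referring pages; the link data and the exact scoring are illustrative assumptions, not the paper's definition.

```python
def combined_cocitation_degree(candidate, seeds, links):
    """Sum cocitations of `candidate` with every seed site.

    `links` maps each referring page to the set of sites it links to; a page
    cocites `candidate` with a seed when it links to both. This is one
    plausible reading of CCD, not necessarily the paper's exact formula.
    """
    return sum(
        1
        for outlinks in links.values()
        if candidate in outlinks
        for seed in seeds
        if seed in outlinks
    )

# Toy link graph: three referring pages, two topical seed sites.
LINKS = {
    "page1": {"seedA", "seedB", "newX"},
    "page2": {"seedA", "newX"},
    "page3": {"seedB", "newY"},
}
```

Under this reading, a site cocited with several seeds at once outscores one cocited with a single seed, which matches the abstract's claim that higher CCD means more topic-related.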


Si-Yuan Jing; Jin Yang; Kun She, "A Parallel Method for Rough Entropy Computation Using MapReduce," Computational Intelligence and Security (CIS), 2014 Tenth International Conference on, vol., no., pp. 707, 710, 15-16 Nov. 2014. doi:10.1109/CIS.2014.41
Abstract: Rough set theory has been proven to be a successful computational intelligence tool. Rough entropy is a basic concept in rough set theory that is usually used to measure the roughness of an information set. Existing algorithms can only deal with small data sets. Therefore, this paper proposes a method for parallel computation of rough entropy using MapReduce, a hot topic in big data mining. A corresponding algorithm is also put forward to handle big data sets. Experimental results show that the proposed parallel method is effective.
Keywords: Big Data; data mining; entropy; mathematics computing; parallel programming; rough set theory; MapReduce; big data mining; big data set handling; computational intelligence tool; information set roughness measurement; parallel computation method; rough entropy computation; rough set theory; Big data; Clustering algorithms; Computers; Data mining; Entropy; Information entropy; Set theory; Data Mining; Entropy; Hadoop; MapReduce; Rough set theory (ID#: 15-5782)
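Rough entropy has several formulations in the literature. As a minimal sketch of the map/reduce decomposition (not the paper's exact measure), the code below computes the Shannon entropy of the equivalence-class partition: the map step emits partial class counts per data chunk, and the reduce step merges them before the final entropy sum.

```python
import math
from collections import Counter
from functools import reduce

def map_chunk(chunk):
    """Map step: partial equivalence-class counts for one chunk of records."""
    return Counter(tuple(record) for record in chunk)

def merge_counts(a, b):
    """Reduce step: merge two partial counts."""
    return a + b

def partition_entropy(chunks):
    """Shannon entropy of the equivalence-class partition over all chunks."""
    counts = reduce(merge_counts, (map_chunk(c) for c in chunks))
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Because the per-chunk counts are small compared to the raw records, only the merged class-size table crosses the network, which is what makes the MapReduce split pay off on big data sets.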


Agrawal, R.; Imran, A.; Seay, C.; Walker, J., "A Layer Based Architecture for Provenance in Big Data," Big Data (Big Data), 2014 IEEE International Conference on, vol., no., pp. 1, 7, 27-30 Oct. 2014. doi:10.1109/BigData.2014.7004468
Abstract: Big data is a new technology wave that makes the world awash in data. Various organizations accumulate data that are difficult to exploit. Government databases, social media, and healthcare databases are examples of big data. Big data covers absorbing and analyzing huge amounts of data that may have originated or been processed outside the organization. Data provenance can be defined as the origin and processing history of data. It carries significant information about a system and can be useful for debugging, auditing, measuring performance, and establishing trust in data. Data provenance in big data is a relatively unexplored topic. It is necessary to appropriately track the creation and collection process of the data to provide context and reproducibility. In this paper, we propose an intuitive layer-based architecture for data provenance and visualization. In addition, we show a complete workflow for tracking provenance information in big data.
Keywords: Big Data; data visualisation; software architecture; auditing; data analysis; data origin; data processing; data provenance; data trust; data visualization; debugging; government databases; healthcare databases; layer based architecture; performance measurement; social media; system information; Big data; Computer architecture; Data models; Data visualization; Databases; Educational institutions; Security; Big data; Provenance; Query; Visualization (ID#: 15-5783)
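A provenance layer of the kind proposed can be reduced to an append-only record of each dataset's origin and processing step, which upper layers then query or visualize. The schema below is a hypothetical minimal example, not the paper's architecture.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    """Origin and process of one step in a dataset's life (hypothetical fields)."""
    dataset: str
    operation: str      # e.g. "ingest", "clean", "join"
    source: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def track(history, record):
    """Append-only provenance log; upper layers can query or visualize it."""
    history.append(record)
    return history

history = track([], ProvenanceRecord("health_claims", "ingest", "hospital_feed"))
history = track(history, ProvenanceRecord("health_claims", "clean", "health_claims"))
```

Replaying `history` in order reconstructs how the dataset was produced, which is the reproducibility property the paper argues provenance must provide.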


Kiss, I.; Genge, B.; Haller, P.; Sebestyen, G., "Data Clustering-Based Anomaly Detection in Industrial Control Systems," Intelligent Computer Communication and Processing (ICCP), 2014 IEEE International Conference on, vol., no., pp. 275, 281, 4-6 Sept. 2014. doi:10.1109/ICCP.2014.6937009
Abstract: Modern Networked Critical Infrastructures (NCI), involving cyber and physical systems, are exposed to intelligent cyber attacks targeting the stable operation of these systems. In order to ensure anomaly awareness, the observed data can be used in accordance with data mining techniques to develop Intrusion Detection Systems (IDS) or Anomaly Detection Systems (ADS). As the volume of data generated by both cyber and physical sensors increases, there is a need to apply Big Data technologies for real-time analysis of large data sets. In this paper, we propose a clustering-based approach for detecting cyber attacks that cause anomalies in NCI. Various clustering techniques are explored to choose the most suitable one for clustering the time-series data features, thus classifying the system states and potential cyber attacks on the physical system. The Hadoop implementation of the MapReduce paradigm is used to provide a suitable processing environment for large datasets. A case study on an NCI consisting of multiple gas compressor stations is presented.
Keywords: Big Data; control engineering computing; critical infrastructures; data mining; industrial control; pattern clustering; real-time systems; security of data; ADS; Big Data technology; Hadoop implementation; IDS; MapReduce paradigm; NCI; anomaly awareness; anomaly detection systems; clustering techniques; cyber and physical systems; cyber attack detection; cyber sensor; data clustering-based anomaly detection; data mining techniques; industrial control systems; intelligent cyber attacks; intrusion detection systems; large data sets; modern networked critical infrastructures; multiple gas compressor stations; physical sensor; real-time analysis; sensor data; time-series data feature; Big data; Clustering algorithms; Data mining; Density measurement; Security; Temperature measurement; Vectors; anomaly detection; big data; clustering; cyber-physical security; intrusion detection (ID#: 15-5784)
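The clustering step can be sketched with a plain k-means learned on "normal" feature vectors, flagging any vector far from every centroid as anomalous. The paper runs this kind of analysis on Hadoop over much larger time-series feature sets, and its chosen clustering technique may differ; the data below is a toy example.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(cluster):
    """Componentwise mean of a non-empty cluster."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; the paper runs comparable clustering at scale on Hadoop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids

def anomalies(points, centroids, threshold):
    """Flag feature vectors far from every learned 'normal' cluster."""
    return [p for p in points if min(dist(p, c) for c in centroids) > threshold]

# Toy 2-D time-series features: two normal operating regimes.
normal = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.2, 9.8)]
centroids = kmeans(normal, k=2)
```

A vector near neither regime, such as one induced by an attack on the physical process, falls outside the distance threshold and is reported.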


Singhal, Rekha; Nambiar, Manoj; Sukhwani, Harish; Trivedi, Kishor, "Performability Comparison of Lustre and HDFS for MR Applications," Software Reliability Engineering Workshops (ISSREW), 2014 IEEE International Symposium on, vol., no., pp. 51, 51, 3-6 Nov. 2014. doi:10.1109/ISSREW.2014.115
Abstract: With its simple principles for achieving parallelism and fault tolerance, the map-reduce framework has captured wide attention, from traditional high performance computing to marketing organizations. The most popular open-source implementation of this framework is Hadoop. Today, the Hadoop stack comprises various software components, including the Hadoop Distributed File System (HDFS) as the distributed storage layer, amongst others such as GPFS and WASB. Traditional high performance computing has always been at the forefront of developing and deploying cutting-edge technology and solutions, such as Lustre, a parallel I/O file system, to meet its ever-growing needs. To support new and upcoming use cases, there is a focus on tighter integration of Hadoop with existing HPC stacks. In this paper, we share our work on one such integration by analyzing an FSI workload built using the map-reduce framework and evaluating the performance and reliability of the application on an integrated stack with Hadoop and Lustre, through Hadoop extensions such as the Hadoop Adapter for Lustre (HAL) and the HPC Adapter for MapReduce (HAM) developed by Intel, while comparing the performance against HDFS. We also carried out a performability analysis of both systems: HDFS ensures reliability using a replication factor, while Lustre does not replicate any data but ensures reliability by having multiple OSSs connecting to multiple OSTs. The environment used for this evaluation is a 16-node HDDP cluster hosted in the Intel Big Data Lab in Swindon (UK). The cluster was divided into two clusters: one 8-node cluster was set up with CDH 5.0.2 and HDFS, and another 8-node cluster was set up with CDH 5.0.2 connected to Lustre through Intel HAL. We used Intel Enterprise Edition for Lustre 2.0 for the experiment, based on Lustre 2.5.
The Lustre setup includes 1 Meta Data Server (MDS) with 1 Meta Data Target (MDT) and 1 Management Target (MGT), and 4 Object Storage Servers (OSSs) with 6 Object Storage Targets (OSTs). Both systems were evaluated on the performance metric 'average query response time' for the FSI workload. The data are generated based on the FSI application schema, while MR jobs are written for a few functionalities/queries of the FSI application used for the evaluation exercise. Apart from single-query execution, both systems were also evaluated under concurrent workload. Tests were run for application data volumes varying from 100 GB to 7 TB. From our experiments, with appropriate tuning of the Lustre file system, we observe that MR applications on the Lustre platform perform at least twice as well as on HDFS. We conducted a performability analysis of both systems using a Markov Reward Model. We propose linear extrapolation for estimating average query execution time in states where some nodes have failed, and calculated the performability with the reward for working states set to the average query execution time. We assume that the times to failure, failure detection, and repair of both compute nodes and data nodes are exponentially distributed, and took reasonable parameter values for the same. From our analysis, the expected query execution time for MR applications on the Lustre platform is at least half that of the applications on the HDFS platform.
Keywords: Artificial neural networks; Disk drives; File systems; Measurement; Random access memory; Security; Switches; HDFS; LUSTRE; MR applications; Performability; Performance; Query Execution Time (ID#: 15-5785)
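The performability calculation can be illustrated with a two-state Markov reward model: steady-state probabilities weight a per-state reward, here the average query time in that state. The rates and query times below are made-up numbers, not the paper's measured values.

```python
def steady_state_two_state(failure_rate, repair_rate):
    """Steady-state probabilities (up, degraded) of a two-state availability model."""
    up = repair_rate / (failure_rate + repair_rate)
    return up, 1.0 - up

def expected_query_time(state_probs, state_query_times):
    """Markov reward model: the reward of each state is its average query time."""
    return sum(p * t for p, t in zip(state_probs, state_query_times))

# Made-up rates and per-state query times, not the paper's measurements:
probs = steady_state_two_state(failure_rate=0.01, repair_rate=0.99)
performability = expected_query_time(probs, (10.0, 30.0))  # seconds
```

The paper's model has more states (covering partial node failures estimated by linear extrapolation), but the reward-weighted sum over state probabilities is the same computation.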


Articles listed on these pages have been found on publicly available Internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests for removal of links or modifications to specific citations via Email, and please include the ID# of the specific citation in your correspondence.