Biblio

Filters: Keyword is metadata
2021-05-18
Wei, Hanlin, Bai, Guangdong, Luo, Zongwei.  2020.  Foggy: A New Anonymous Communication Architecture Based on Microservices. 2020 25th International Conference on Engineering of Complex Computer Systems (ICECCS). :135–144.
This paper presents Foggy, an anonymous communication system focusing on providing users with anonymous web browsing. Foggy provides a microservice-based proxy for web browsing and other low-latency network activities without exposing users' metadata and browsed content to adversaries. It is designed with decentralized information management, web caching, and configurable service selection. Although Foggy appears more centralized than Tor, it gains an advantage in manageability while retaining anonymity. Foggy can be deployed by several agencies to become more decentralized. We prototype Foggy and test its performance. Our experiments show Foggy's low latency and deployability, demonstrating its potential to be a commercial solution for real-world deployment.
Niloy, Nishat Tasnim, Islam, Md. Shariful.  2020.  IntellCache: An Intelligent Web Caching Scheme for Multimedia Contents. 2020 Joint 9th International Conference on Informatics, Electronics Vision (ICIEV) and 2020 4th International Conference on Imaging, Vision Pattern Recognition (icIVPR). :1–6.
The traditional reactive web caching system is becoming less popular day by day due to its inefficiency in handling overwhelming requests for multimedia content. An intelligent web caching system intends to make optimal cache decisions by predicting future popular contents (FPC) proactively. In recent years, a few approaches have proposed intelligent caching systems concerned with proactive caching. Those works emphasized the importance of FPC prediction using prediction models. However, FPC prediction alone may not yield the optimal solution in every scenario. In this paper, a technique named IntellCache has been proposed that increases caching efficiency by taking a cache decision, i.e., a content-storing decision, before storing the predicted FPC. Different deep learning models, such as the multilayer perceptron (MLP), the Long Short-Term Memory (LSTM) variant of the Recurrent Neural Network (RNN), and ConvLSTM, a combination of LSTM and the Convolutional Neural Network (CNN), are compared to identify the most efficient model for FPC prediction. Eighteen years of content information from the MovieLens data repository has been mined to evaluate the proposed approach. Results show that the proposed scheme outperforms previous solutions by achieving a higher cache hit ratio and lower average delay, thus ensuring users' satisfaction.
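As a hedged illustration of the content-storing decision that follows FPC prediction, the sketch below uses a trivial exponential moving average as a stand-in for the paper's MLP/LSTM/ConvLSTM predictors and greedily fills a capacity budget with the highest predicted value per byte; all names and numbers are illustrative.

```python
# Toy proactive cache decision: predict popularity, then decide what to store.
def predict_popularity(history, alpha=0.5):
    """Exponentially weighted request counts per content id (toy FPC stand-in)."""
    scores = {}
    for epoch in history:                      # oldest to newest request batches
        for cid, count in epoch.items():
            scores[cid] = alpha * count + (1 - alpha) * scores.get(cid, 0.0)
    return scores

def cache_decision(scores, sizes, capacity):
    """Greedy content-storing decision: best predicted score per byte first."""
    chosen, used = [], 0
    for cid in sorted(scores, key=lambda c: scores[c] / sizes[c], reverse=True):
        if used + sizes[cid] <= capacity:
            chosen.append(cid)
            used += sizes[cid]
    return chosen

history = [{"a": 10, "b": 3}, {"a": 6, "b": 9, "c": 4}]
sizes = {"a": 700, "b": 300, "c": 200}
print(cache_decision(predict_popularity(history), sizes, capacity=1000))
```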
2021-05-13
Jaafar, Fehmi, Avellaneda, Florent, Alikacem, El-Hackemi.  2020.  Demystifying the Cyber Attribution: An Exploratory Study. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :35–40.
Current cyber attribution approaches propose using a variety of datasets and analytical techniques to distill information useful for identifying cyber attackers. In practice, however, practitioners and researchers in cyber attribution face several technical and regulatory challenges. In this paper, we describe the main challenges of cyber attribution and present a state of the art of the approaches used to address them. We then present an exploratory study that performs cyber attack attribution based on pattern recognition from real data. In our study, we use attack pattern discovery and identification based on real data collection and analysis.
2021-05-05
Tabiban, Azadeh, Jarraya, Yosr, Zhang, Mengyuan, Pourzandi, Makan, Wang, Lingyu, Debbabi, Mourad.  2020.  Catching Falling Dominoes: Cloud Management-Level Provenance Analysis with Application to OpenStack. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

The dynamicity and complexity of clouds highlight the importance of automated root cause analysis solutions for explaining what might have caused a security incident. Most existing works focus on either locating malfunctioning cloud components, e.g., switches, or tracing changes at lower abstraction levels, e.g., system calls. On the other hand, a management-level solution can provide a big picture about the root cause in a more scalable manner. In this paper, we propose DOMINOCATCHER, a novel provenance-based solution for explaining the root cause of security incidents in terms of management operations in clouds. Specifically, we first define our provenance model to capture the interdependencies between cloud management operations, virtual resources, and inputs. Based on this model, we design a framework to intercept cloud management operations and to extract and prune provenance metadata. We implement DOMINOCATCHER on the OpenStack platform as an attached middleware and validate its effectiveness using security incidents based on real-world attacks. We also evaluate the performance through experiments on our testbed, and the results demonstrate that DOMINOCATCHER incurs insignificant overhead and is scalable for clouds.

2021-03-22
OGISO, S., Mohri, M., Shiraishi, Y..  2020.  Transparent Provable Data Possession Scheme for Cloud Storage. 2020 International Symposium on Networks, Computers and Communications (ISNCC). :1–5.
Provable Data Possession (PDP) is a data security technique for making sure that data stored in cloud storage actually exists. In PDP, the integrity of the data stored in the cloud storage is probabilistically verified by the user or a third-party auditor. In conventional PDP, the user creates the metadata used for auditing. From the viewpoint of user convenience, it is desirable to be able to audit without any operations other than uploading. In other words, the challenge is to provide a transparent PDP that verifies the integrity of files according to the general cloud storage system model without imposing additional operations on users. We propose a scheme in which the cloud generates the metadata used during verification and the user only uploads files. It is shown that the proposed scheme is resistant to forgery of the cloud's proof and to acquisition of the data by the third-party auditor.
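The paper's transparent construction is not given in the abstract; as a generic illustration of the probabilistic spot-checking idea behind PDP, here is a minimal hash-based challenge-response sketch, assuming the auditor keeps per-block digests (real PDP schemes use homomorphic tags instead of returning raw blocks).

```python
# Generic PDP-style spot check: challenge random blocks with a fresh nonce.
import hashlib, os, random

BLOCK = 4096

def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def make_metadata(blocks):                     # per-block digests kept by the auditor
    return [hashlib.sha256(b).digest() for b in blocks]

def prove(blocks, indices, nonce):             # computed by the storage server
    h = hashlib.sha256(nonce)
    for i in indices:
        h.update(blocks[i])
    return h.hexdigest()

def verify(metadata, returned_blocks, indices, nonce, proof):
    # Check each returned block against its stored digest, then the combined proof.
    h = hashlib.sha256(nonce)
    for i, b in zip(indices, returned_blocks):
        if hashlib.sha256(b).digest() != metadata[i]:
            return False
        h.update(b)
    return h.hexdigest() == proof

data = os.urandom(5 * BLOCK)
blocks = split_blocks(data)
meta = make_metadata(blocks)
idx = random.sample(range(len(blocks)), 2)     # probabilistic spot check
nonce = os.urandom(16)
print(verify(meta, [blocks[i] for i in idx], idx, nonce, prove(blocks, idx, nonce)))
```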
2021-03-15
Perkins, J., Eikenberry, J., Coglio, A., Rinard, M..  2020.  Comprehensive Java Metadata Tracking for Attack Detection and Repair. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :39–51.

We present ClearTrack, a system that tracks metadata for each primitive value in Java programs to detect and nullify a range of vulnerabilities such as integer overflow/underflow and SQL/command injection vulnerabilities. Contributions include new techniques for eliminating false positives associated with benign integer overflows and underflows, new metadata-aware techniques for detecting and nullifying SQL/command injection attacks, and results from an independent evaluation team. These results show that 1) ClearTrack operates successfully on Java programs comprising hundreds of thousands of lines of code (including instrumented jar files and Java system libraries, the majority of the applications comprise over 3 million lines of code), 2) because of computations such as cryptography and hash table calculations, these applications perform millions of benign integer overflows and underflows, and 3) ClearTrack successfully detects and nullifies all tested integer overflow and underflow and SQL/command injection vulnerabilities in the benchmark applications.
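ClearTrack instruments Java bytecode; purely as an illustration of per-value metadata tracking, the sketch below attaches taint metadata to a wrapped integer and flags 32-bit overflow on arithmetic. The class and its policy are assumptions for the example, not the paper's implementation.

```python
# Toy per-value metadata tracking with 32-bit overflow detection.
INT_MIN, INT_MAX = -2**31, 2**31 - 1

class TrackedInt:
    def __init__(self, value, trusted=True):
        self.value, self.trusted = value, trusted   # taint metadata travels with the value

    def __add__(self, other):
        v = self.value + (other.value if isinstance(other, TrackedInt) else other)
        trusted = self.trusted and getattr(other, "trusted", True)
        if not INT_MIN <= v <= INT_MAX:
            # A real system would first decide whether the overflow is benign
            # (e.g., hashing or crypto arithmetic) before raising an alarm.
            raise OverflowError(f"32-bit overflow producing {v}")
        return TrackedInt(v, trusted)

user_len = TrackedInt(2**31 - 10, trusted=False)    # e.g., attacker-influenced length
try:
    total = user_len + TrackedInt(100)
except OverflowError as e:
    print("nullified:", e)
```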

2021-02-23
Patil, A., Jha, A., Mulla, M. M., Narayan, D. G., Kengond, S..  2020.  Data Provenance Assurance for Cloud Storage Using Blockchain. 2020 International Conference on Advances in Computing, Communication Materials (ICACCM). :443–448.

Cloud forensics investigates crimes committed over cloud infrastructures, such as SLA violations and storage privacy breaches. Cloud storage forensics is the process of recording the history of the creation of, and operations performed on, a cloud data object and investigating it. Secure data provenance in the Cloud is crucial for data accountability, forensics, and privacy. Towards this, we present a Cloud-based data provenance framework using Blockchain, which traces data record operations and generates provenance data. Initially, we design a Dropbox-like application using AWS S3 storage. The application creates a cloud storage application for the students and faculty of the university, thereby making the storage and sharing of work and resources efficient. Later, we design a data provenance mechanism for confidential files of users using the Ethereum blockchain. We also evaluate the proposed system using performance parameters such as query and transaction latency by varying the load and the number of nodes in the blockchain network.
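As a rough, non-Ethereum illustration of tamper-evident provenance records, the sketch below hash-chains file-operation entries so that any later modification breaks verification; the field names are made up for the example, and the paper anchors such records on a blockchain rather than a local log.

```python
# Toy hash-chained provenance log for file operations.
import hashlib, json, time

class ProvenanceLog:
    def __init__(self):
        self.entries = []

    def record(self, user, op, obj):
        prev = self.entries[-1]["digest"] if self.entries else "0" * 64
        body = {"user": user, "op": op, "object": obj,
                "time": time.time(), "prev": prev}
        body["digest"] = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.entries.append(body)

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "digest"}
            if e["prev"] != prev or e["digest"] != hashlib.sha256(
                    json.dumps(body, sort_keys=True).encode()).hexdigest():
                return False
            prev = e["digest"]
        return True

log = ProvenanceLog()
log.record("alice", "upload", "report.pdf")
log.record("bob", "read", "report.pdf")
print(log.verify())   # True; altering any recorded field breaks the chain
```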

2021-01-11
Papadogiannaki, E., Deyannis, D., Ioannidis, S..  2020.  Head(er)Hunter: Fast Intrusion Detection using Packet Metadata Signatures. 2020 IEEE 25th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1–6.
More than 75% of Internet traffic is now encrypted, and this percentage is constantly increasing. The majority of communications are secured using common encryption protocols such as SSL/TLS and IPsec to ensure security and protect the privacy of Internet users. Yet, encryption can be exploited to hide malicious activities. Traditionally, network traffic inspection is based on techniques like deep packet inspection (DPI). Common applications for DPI include but are not limited to firewalls, intrusion detection and prevention systems, L7 filtering, and packet forwarding. The core functionality of such DPI implementations is based on pattern matching that enables searching for specific strings or regular expressions inside the packet contents. With the widespread adoption of network encryption, though, DPI tools that rely on packet payload content are becoming less effective, demanding the development of more sophisticated techniques in order to adapt to current network encryption trends. In this work, we present HeaderHunter, a fast signature-based intrusion detection system that works even on encrypted network traffic. We generate signatures using only network packet metadata extracted from packet headers. Also, to cope with ever-increasing network speeds, we accelerate the inner computations of our proposed system using off-the-shelf GPUs.
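A minimal sketch of the metadata-signature idea, assuming a signature is a sequence of (direction, size-range) constraints over packet headers; the concrete signature format used by Head(er)Hunter is not given in the abstract.

```python
# Match a metadata signature against an observed (direction, size) packet stream.
SIG_EXFIL = [("out", 40, 60), ("in", 40, 60), ("out", 1200, 1500),
             ("out", 1200, 1500), ("out", 1200, 1500)]

def matches(signature, packets):
    """Slide the signature window over the stream; return match offset or -1."""
    n = len(signature)
    for start in range(len(packets) - n + 1):
        window = packets[start:start + n]
        if all(d == sd and lo <= s <= hi
               for (d, s), (sd, lo, hi) in zip(window, signature)):
            return start
    return -1

stream = [("out", 52), ("in", 44), ("out", 1380), ("out", 1420), ("out", 1400)]
print(matches(SIG_EXFIL, stream))   # 0: no payload inspection was needed
```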
2020-12-11
Sabek, I., Chandramouli, B., Minhas, U. F..  2019.  CRA: Enabling Data-Intensive Applications in Containerized Environments. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1762–1765.
Today, a modern data center hosts a wide variety of applications comprising batch, interactive, machine learning, and streaming applications. In this paper, we factor out the commonalities in a large majority of these applications into a generic dataflow layer called Common Runtime for Applications (CRA). In parallel, containerization technologies (e.g., Docker) have taken a serious hold on cloud-scale data centers, with direct implications on building the next generation of data center applications. Container orchestrators (e.g., Kubernetes) have made deployment a lot easier, and they solve many infrastructure-level problems, e.g., service discovery, auto-restart, and replication. For best-in-class performance, there is a need to marry the next generation of applications with containerization technologies. To that end, CRA leverages and builds upon the containerization and resource orchestration capabilities of Kubernetes/Docker, and makes it easy to build a wide range of cloud-edge applications on top. To the best of our knowledge, we are the first to present a cloud-native runtime for building data center applications. We show the efficiency of CRA through various micro-benchmarking experiments.
Kumar, S., Vasthimal, D. K..  2019.  Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200–205.
Real-time discovery of all the different types of unique attributes within unstructured data is a challenging problem when dealing with multiple petabytes of unstructured data volume every day. Popular discovery solutions, such as creating offline jobs to uniquely identify attributes or running aggregation queries on raw data sets, limit real-time discovery use cases and often result in poor resource utilization. Discovery must be treated as a problem parallel to storing raw data sets efficiently on back-end big data systems. Solving the discovery problem by creating a parallel discovery data store infrastructure has multiple benefits, as it allows search queries to be channeled against the raw data set in a much more funneled manner instead of being spread across the entire data set. Such focused search queries and data separation are far more performant and require less compute and a smaller memory footprint.
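A toy sketch of such a parallel discovery store, maintained on the write path so cardinality questions never scan the raw data set; at petabyte scale the exact sets below would be replaced by cardinality sketches such as HyperLogLog.

```python
# Side index of distinct values per attribute, updated while raw records stream in.
from collections import defaultdict

class DiscoveryStore:
    def __init__(self):
        self.distinct = defaultdict(set)       # attribute -> distinct values seen

    def ingest(self, record):                  # called on the write path
        for attr, value in record.items():
            self.distinct[attr].add(value)

    def cardinality(self, attr):               # real-time answer, no raw-data scan
        return len(self.distinct[attr])

store = DiscoveryStore()
for rec in [{"host": "a", "status": 200}, {"host": "b", "status": 200},
            {"host": "a", "status": 500}]:
    store.ingest(rec)
print(store.cardinality("host"), store.cardinality("status"))   # 2 2
```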
Correia, A., Fonseca, B., Paredes, H., Schneider, D., Jameel, S..  2019.  Development of a Crowd-Powered System Architecture for Knowledge Discovery in Scientific Domains. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :1372–1377.
A substantial amount of work is often overlooked due to the exponential rate of growth in global scientific output across all disciplines. Current approaches for addressing this issue are usually limited in scope and often restrict the possibility of obtaining multidisciplinary views in practice. To tackle this problem, researchers can now leverage an ecosystem of citizens, volunteers, and crowd workers to perform complex tasks that are difficult for either humans or machines to solve alone. Motivated by the idea that human crowds and computer algorithms have complementary strengths, we present an approach where the machine learns from crowd behavior in an iterative way. This approach is embodied in the architecture of SciCrowd, a crowd-powered human-machine hybrid system designed to improve the analysis and processing of large amounts of publication records. To validate the proposal's feasibility, a prototype was developed and an initial evaluation was conducted to measure its robustness and reliability. We conclude this paper with a set of implications for design.
Xie, J., Zhang, M., Ma, Y..  2019.  Using Format Migration and Preservation Metadata to Support Digital Preservation of Scientific Data. 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). :1–6.

With the development of e-Science and data-intensive scientific discovery, it is necessary to ensure that scientific data remain available for the long term, with the goal that valuable scientific data can be discovered and re-used for downstream investigations, either alone or in combination with newly generated data. As such, the preservation of scientific data ensures not only that experiments are reproducible and verifiable, but also that new questions can be raised by other scientists to promote research and innovation. In this paper, we focus on two main problems of digital preservation: format migration and preservation metadata. Format migration includes both format verification and object transformation. The system architecture for format migration and preservation metadata is presented; mapping rules for object transformation are analyzed; data fixity, integrity, authenticity, digital signatures, and so on are discussed; and an example is shown in detail.
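To make the fixity discussion concrete, here is a minimal sketch of checksum-based fixity verification wrapped around a format-migration step; the metadata fields are illustrative, not a standard preservation schema.

```python
# Fixity metadata recorded at ingest and re-verified before/after migration.
import hashlib

def fixity(data):
    return hashlib.sha256(data).hexdigest()

def ingest(data):
    return {"checksum": fixity(data), "checksum_algo": "SHA-256",
            "format": "text/plain", "events": ["ingest"]}

def migrate(data, metadata, transform, new_format):
    if fixity(data) != metadata["checksum"]:        # verify before transforming
        raise ValueError("fixity check failed: object corrupted in storage")
    new_data = transform(data)
    metadata.update(checksum=fixity(new_data), format=new_format)
    metadata["events"].append(f"migration -> {new_format}")
    return new_data, metadata

obj = b"scientific observation record"
meta = ingest(obj)
obj, meta = migrate(obj, meta, lambda d: d.upper(), "text/plain;variant=upper")
print(meta["events"])
```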

Zhang, W., Byna, S., Niu, C., Chen, Y..  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83–92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, sparse arrays, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, the efficient determination of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation into metadata, metadata queries, and the corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings. These findings provide guidelines and offer insights for developing metadata indexing methodologies for scientific applications.
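The finding is easy to reproduce in miniature: a trie answers prefix queries directly, which a hash table cannot do without scanning every key. A minimal sketch (the file names are invented):

```python
# Minimal trie supporting the prefix queries that motivate the recommendation.
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, key, value):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node["$"] = value                       # terminal marker holds the value

    def prefix(self, pre):
        node = self.root
        for ch in pre:
            if ch not in node:
                return []
            node = node[ch]
        out, stack = [], [node]
        while stack:                            # collect every value below the prefix node
            n = stack.pop()
            if "$" in n:
                out.append(n["$"])
            stack.extend(v for k, v in n.items() if k != "$")
        return out

t = Trie()
for name in ["ngc1275.fits", "ngc4993.fits", "m31.fits"]:
    t.insert(name, name)
print(t.prefix("ngc"))    # both NGC files; a dict lookup cannot answer this directly
```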

2020-11-30
Georgakopoulos, D..  2019.  A Global IoT Device Discovery and Integration Vision. 2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC). :214–221.
This paper presents the vision of establishing a global service for Global IoT Device Discovery and Integration (GIDDI). The establishment of a GIDDI will: (1) make IoT application development more efficient and cost-effective by enabling sharing and reuse of existing IoT devices owned and maintained by different providers, and (2) promote deployment of new IoT devices supported by a revenue generation scheme for their providers. More specifically, this paper proposes a distributed IoT blockchain ledger that is specifically designed for managing the metadata needed to describe IoT devices and the data they produce. This GIDDI Blockchain is Internet-owned (i.e., it is not controlled by any individual or organization) and Internet-scaled (i.e., it can support the discovery and reuse of billions of IoT devices). The paper also proposes a GIDDI Marketplace that provides the functionality needed for IoT device registration, query, integration, payment, and security via the proposed GIDDI Blockchain. We outline the GIDDI Blockchain and Marketplace implementation. We also discuss ongoing research on automatically mining the IoT device metadata needed for IoT device query and integration from the data the devices produce. This significantly reduces the need for IoT device providers to supply metadata descriptions of the devices and the data they produce during the registration of IoT devices in the GIDDI Blockchain.
2020-09-14
Wu, Pengfei, Deng, Robert, Shen, Qingni, Liu, Ximeng, Li, Qi, Wu, Zhonghai.  2019.  ObliComm: Towards Building an Efficient Oblivious Communication System. IEEE Transactions on Dependable and Secure Computing. :1–1.
Anonymous Communication (AC) hides traffic patterns and protects message metadata from being leaked during message transmission. Many practical AC systems have been proposed aiming to reduce communication latency and support a large number of users. However, how to design AC systems that possess strong security properties and at the same time achieve optimal performance (i.e., the lowest latency or highest horizontal scalability) has been a challenging problem. In this paper, we propose the ObliComm framework, which consists of six modular AC subroutines. We also present a strong security definition for AC, named oblivious communication, encompassing confidentiality, unobservability, and a new requirement, sending-and-receiving operation hiding. The AC subroutines in ObliComm allow for modular construction of oblivious communication systems in different network topologies. All constructed systems satisfy the oblivious communication definition and can be proven secure in the universal composability (UC) framework. Additionally, we model the relationship between the network topology and communication measurements by queuing theory, which enables the system's efficiency to be optimized and estimated through quantitative analysis and calculation. Through theoretical analyses and empirical experiments, we demonstrate the efficiency of our scheme and the soundness of the queuing model.
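As a hedged illustration of the queuing-theoretic modeling (the paper's actual model is richer), even a basic M/M/1 approximation of per-relay sojourn time, W = 1/(mu - lambda), lets one compare a serial chain of relays against a wider, parallel layout:

```python
# M/M/1 sojourn-time comparison of two toy relay topologies.
def mm1_sojourn(arrival_rate, service_rate):
    """Mean time in system, W = 1 / (mu - lambda); requires mu > lambda."""
    assert service_rate > arrival_rate, "queue is unstable"
    return 1.0 / (service_rate - arrival_rate)

lam, mu = 800.0, 1000.0                   # msgs/s offered to a relay, relay capacity
chain3 = 3 * mm1_sojourn(lam, mu)         # three relays traversed in series
spread3 = mm1_sojourn(lam / 3, mu)        # one hop, load split over three relays
print(f"serial chain: {chain3 * 1000:.1f} ms, parallel spread: {spread3 * 1000:.2f} ms")
```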
2020-07-10
Ra, Gyeong-Jin, Lee, Im-Yeong.  2019.  A Study on Hybrid Blockchain-based XGS (XOR Global State) Injection Technology for Efficient Contents Modification and Deletion. 2019 Sixth International Conference on Software Defined Systems (SDS). :300–305.

Blockchain is a database technology that provides integrity and trust by being an append-only distributed ledger in which arbitrary modifications and deletions cannot be made. That is, the blockchain supports not modification or deletion but a CRAB (Create-Retrieve-Append-Burn) method in which data can be read and written according to a legitimate user's access rights (for example, the owner's private key). However, data once created cannot be deleted, which causes problems such as privacy breaches. In this paper, we propose an on-off block-chained Hybrid Blockchain system to separate the data and save the connection history to the blockchain. In addition, the state is kept in a distributed database separately from the ledger record and is changed by generating an arbitrary injection in XOR form, so that the modification/deletion history of the off-blockchain data can be efficiently retrieved.
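One plausible reading of the XOR-injection idea, sketched below purely as an assumption: off-chain content is stored masked with a per-object XOR pad, the ledger keeps only a hash, and discarding the pad "burns" the content without rewriting any append-only record.

```python
# XOR-masked off-chain storage with an immutable on-chain commitment.
import os, hashlib

def xor(data, pad):
    return bytes(a ^ b for a, b in zip(data, pad))

record = b"patient file v1"
pad = os.urandom(len(record))               # the per-object XOR state
masked = xor(record, pad)                   # what the off-chain store holds
ledger_entry = hashlib.sha256(record).hexdigest()   # append-only commitment

print(xor(masked, pad) == record)           # True while the pad exists
pad = None                                  # "burn": discard the pad to delete
# masked alone is now a one-time-pad ciphertext; the ledger entry remains as
# history, but the content itself is effectively deleted.
```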

2020-06-01
Nikolaidis, Fotios, Kossifidis, Nick, Leibovici, Thomas, Zertal, Soraya.  2018.  Towards a TRansparent I/O Solution. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). :1221–1228.
The benefits of data distribution to multiple storage platforms with different characteristics have been widely acknowledged. Such systems are more tolerant to outages and bottlenecks and allow for more flexible policies regarding cost reduction, security, and workload diversity. To leverage several platforms simultaneously, additional orchestration steps are needed. Existing approaches either implement such steps in the application's source code, resulting in minimal reusability across applications, or handle them at the infrastructure level. The latter usually involves over-engineering to handle different application behaviors and binds the system to a specific infrastructure. In this paper we present a middleware that decouples the I/O path from the application's source code and performs in-transit processing before data lands on the storage platforms. Abstracting the I/O process as a graph of reusable components allows developers to easily implement complex storage solutions without the burden of writing custom code. Similarly, administrators can create their own graph that reflects the infrastructure setup and append it to the preceding graph, so that various policies and infrastructure-related changes can be performed transparently to the application. Users can also extend the graph chain to enhance the application's functionality by using plug-ins. Our approach eliminates the need for custom I/O management code and allows applications to evolve independently of the storage back-end. To evaluate our system we employed a secure web service scenario that was seamlessly adapted to changes in its storage back-end.
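A minimal sketch of the graph-of-components idea, reduced here to a linear chain: the application only calls write(), while in-transit stages (compression and a stand-in cipher in this example) are composed outside its source code and can be swapped without touching it.

```python
# I/O path as a chain of reusable in-transit stages, decoupled from the app.
import zlib

def compress(data, ctx):
    return zlib.compress(data)

def pseudo_encrypt(data, ctx):
    key = ctx["key"]
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))  # toy cipher

class IOPipeline:
    def __init__(self, stages, sink):
        self.stages, self.sink = stages, sink

    def write(self, data, ctx):
        for stage in self.stages:              # in-transit processing, app-agnostic
            data = stage(data, ctx)
        self.sink(data)

stored = []
pipe = IOPipeline([compress, pseudo_encrypt], stored.append)
pipe.write(b"simulation checkpoint " * 100, {"key": b"secret"})
print(len(stored[0]), "bytes reached the storage back-end")
```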
2020-04-03
Singi, Kapil, Kaulgud, Vikrant, Bose, R.P. Jagadeesh Chandra, Podder, Sanjay.  2019.  CAG: Compliance Adherence and Governance in Software Delivery Using Blockchain. 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). :32–39.

The software development life cycle (SDLC) starts with business and functional specifications signed with a client. In addition, the specifications also capture policy / procedure / contractual / regulatory / legislation / standard compliances with respect to a given client industry. The SDLC must adhere to service level agreements (SLAs) while being compliant with development activities, processes, tools, frameworks, and reuse of open-source software components. In today's world, global software development happens across geographically distributed (autonomous) teams consuming extraordinary amounts of open-source components drawn from a variety of disparate sources. Although this is helping organizations deal with technical and economic challenges, it is also increasing unintended risks; e.g., use of software under a non-compliant license might lead to copyright issues and litigation, use of a library with vulnerabilities poses security risks, etc. Mitigation of such risks and remedial measures is a challenge due to the lack of visibility and transparency of activities across these distributed teams, as they mostly operate in silos. We believe a unified model that non-invasively monitors and analyzes the activities of distributed teams will go a long way in building software that adheres to various compliances. In this paper, we propose a decentralized CAG - Compliance Adherence and Governance framework using blockchain technologies. Our framework (i) enables the capturing of required data points based on compliance specifications, (ii) analyzes events for non-conformant behavior through smart contracts, (iii) provides real-time alerts, and (iv) records and maintains an immutable audit trail of various activities.

2020-03-30
Mao, Huajian, Chi, Chenyang, Yu, Jinghui, Yang, Peixiang, Qian, Cheng, Zhao, Dongsheng.  2019.  QRStream: A Secure and Convenient Method for Text Healthcare Data Transferring. 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). :3458–3462.
With increasing health awareness, users are becoming more and more interested in their daily health information and the results of healthcare activities from healthcare organizations, and they try to collect these data together for better usage. Traditionally, healthcare data is delivered in paper format by healthcare organizations, which is neither easy nor convenient for data usage and management. Users would have to translate these paper records into a digital version, which would probably introduce mistakes into the data. A secure and convenient method for electronic health data transfer between users and healthcare organizations is therefore needed. However, due to security and privacy problems, almost no healthcare organization provides a stable and full service for health data delivery. In this paper, we propose a secure and convenient method, QRStream, which splits original health data and loads it onto a QR code frame stream for data transfer. The results show that QRStream can transfer text health data smoothly with acceptable performance, for example, transferring 10K of data in 10 seconds.
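The abstract implies a split-and-frame design; the sketch below shows an assumed chunk framing with sequence numbers and a checksum so frames can be reordered and verified on the receiving side, leaving the actual QR encoding (e.g., via a QR library) out of scope.

```python
# Chunk framing for a QRStream-like transfer; each string maps to one QR frame.
import hashlib, json

def make_frames(text, size=64):
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    digest = hashlib.sha256(text.encode()).hexdigest()
    return [json.dumps({"i": i, "n": len(chunks), "sha": digest, "data": c})
            for i, c in enumerate(chunks)]

def reassemble(frames):
    parts = sorted((json.loads(f) for f in frames), key=lambda p: p["i"])
    assert len(parts) == parts[0]["n"], "missing frames"
    text = "".join(p["data"] for p in parts)
    assert hashlib.sha256(text.encode()).hexdigest() == parts[0]["sha"]
    return text

payload = "glucose=5.4 mmol/L; bp=118/76; hr=62" * 5
frames = make_frames(payload)
print(reassemble(reversed(frames)) == payload)   # survives out-of-order scanning
```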
Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs, can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
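As a hedged miniature of metadata-based provenance graph construction, the sketch below orders assets by timestamp and links consecutive ones sharing a camera ID; both the heuristic and the field names are illustrative assumptions, not the paper's pipeline.

```python
# Cheap provenance edges from metadata alone: no pixel comparison needed.
assets = [
    {"id": "img1", "ts": 100, "camera": "camA"},
    {"id": "img2", "ts": 160, "camera": "camA"},
    {"id": "img3", "ts": 210, "camera": "camB"},
    {"id": "img4", "ts": 300, "camera": "camA"},
]

def metadata_edges(assets):
    edges, last_by_camera = [], {}
    for a in sorted(assets, key=lambda x: x["ts"]):   # chronological order
        cam = a["camera"]
        if cam in last_by_camera:
            edges.append((last_by_camera[cam]["id"], a["id"]))
        last_by_camera[cam] = a
    return edges

print(metadata_edges(assets))   # [('img1', 'img2'), ('img2', 'img4')]
```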
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly, modern data science platforms have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for users to compose queries and utilize this valuable information using just the standard graph query model. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight into the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than the state of the art. Second, we propose a graph summarization operator that combines similar segments together to query the prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
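A much-simplified take on the segmentation operator (ignoring the paper's flexible boundary criteria and grammar-based semantics): return the vertices lying on any path between a source set and a destination set of a provenance DAG.

```python
# Vertices between sources and destinations = forward-reachable AND backward-reachable.
from collections import defaultdict

def segment(edges, sources, dests):
    fwd, bwd = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)

    def reach(starts, adj):
        seen, stack = set(starts), list(starts)
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return reach(sources, fwd) & reach(dests, bwd)

edges = [("raw.csv", "clean.py"), ("clean.py", "clean.csv"),
         ("clean.csv", "train.py"), ("train.py", "model.pkl"),
         ("readme.md", "train.py")]
print(segment(edges, {"raw.csv"}, {"model.pkl"}))   # readme.md is excluded
```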
2020-02-18
Hasslinger, Gerhard, Ntougias, Konstantinos, Hasslinger, Frank, Hohlfeld, Oliver.  2019.  Fast and Efficient Web Caching Methods Regarding the Size and Performance Measures per Data Object. 2019 IEEE 24th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD). :1–7.

Caching methods have been developed over the past 50 years for paging in CPU and database systems, and over the past 25 years for web caching as a main application area, among others. Pages of uniform size are usual in CPU caches, whereas web caches store data chunks of widely varying size. We study the impact of different object sizes on the performance and overhead of web caching. This entails different caching goals, ranging from the byte and object hit ratio to a generalized value hit ratio for optimized costs and benefits of caching with regard to traffic engineering (TE), reduced delays, and other QoS measures. The selection of cache contents turns out to be crucial for web cache efficiency, with awareness of the size and other properties captured in a score for each object. We introduce a new class of rank exchange caching methods and show how their performance compares to other strategies, with extensions needed to include sizes and scores for QoS and TE caching goals. Finally, we derive bounds on the object, byte, and value hit ratio for the independent request model (IRM) based on optimum knapsack solutions of the cache content.
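As an illustration of score-based content selection (a greedy stand-in for the knapsack view, not the paper's rank exchange methods), the sketch below admits a new object only if evicting the residents with the lowest value per byte actually improves the cached total value:

```python
# Size- and score-aware cache admission with value-per-byte eviction.
def admit(cache, sizes, scores, capacity, new_obj):
    used = sum(sizes[o] for o in cache)
    victims = sorted(cache, key=lambda o: scores[o] / sizes[o])  # worst value/byte first
    freed, evict = 0, []
    for v in victims:
        if used - freed + sizes[new_obj] <= capacity:
            break
        freed += sizes[v]
        evict.append(v)
    if used - freed + sizes[new_obj] > capacity:
        return cache                                   # cannot fit at all
    if sum(scores[v] for v in evict) >= scores[new_obj]:
        return cache                                   # evictions would cost more value
    return (cache - set(evict)) | {new_obj}

sizes = {"a": 50, "b": 30, "c": 40, "d": 35}
scores = {"a": 10.0, "b": 9.0, "c": 2.0, "d": 8.0}
cache = {"a", "b", "c"}                                # 120 bytes used
print(admit(cache, sizes, scores, capacity=120, new_obj="d"))   # evicts "c"
```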

2020-02-17
Ezick, James, Henretty, Tom, Baskaran, Muthu, Lethin, Richard, Feo, John, Tuan, Tai-Ching, Coley, Christopher, Leonard, Leslie, Agrawal, Rajeev, Parsons, Ben et al..  2019.  Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale. 2019 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.

This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.

2020-02-10
Prout, Andrew, Arcand, William, Bestor, David, Bergeron, Bill, Byun, Chansup, Gadepally, Vijay, Houle, Michael, Hubbell, Matthew, Jones, Michael, Klein, Anna et al..  2019.  Securing HPC using Federated Authentication. 2019 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.
Federated authentication can drastically reduce the overhead of basic account maintenance while simultaneously improving overall system security. Integrating with the user's more frequently used account at their primary organization both provides a better experience to the end user and makes account compromise or changes in affiliation more likely to be noticed and acted upon. Additionally, with many organizations transitioning to multi-factor authentication for all account access, the ability to leverage external federated identity management systems provides the benefit of their efforts without the additional overhead of separately implementing a distinct multi-factor authentication process. This paper describes our experiences and the lessons we learned by enabling federated authentication with the U.S. Government PKI and the InCommon Federation, scaling it up to the user base of a production HPC system, and the motivations behind those choices. We have received only positive feedback from our users.
2019-12-02
Burow, Nathan, Zhang, Xinping, Payer, Mathias.  2019.  SoK: Shining Light on Shadow Stacks. 2019 IEEE Symposium on Security and Privacy (SP). :985–999.

Control-Flow Hijacking attacks are the dominant attack vector against C/C++ programs. Control-Flow Integrity (CFI) solutions mitigate these attacks on the forward edge, i.e., indirect calls through function pointers and virtual calls. Protecting the backward edge is left to stack canaries, which are easily bypassed through information leaks. Shadow Stacks are a fully precise mechanism for protecting backwards edges, and should be deployed with CFI mitigations. We present a comprehensive analysis of all possible shadow stack mechanisms along three axes: performance, compatibility, and security. For performance comparisons we use SPEC CPU2006, while security and compatibility are qualitatively analyzed. Based on our study, we renew calls for a shadow stack design that leverages a dedicated register, resulting in low performance overhead, and minimal memory overhead, but sacrifices compatibility. We present case studies of our implementation of such a design, Shadesmar, on Phoronix and Apache to demonstrate the feasibility of dedicating a general purpose register to a security monitor on modern architectures, and Shadesmar's deployability. Our comprehensive analysis, including detailed case studies for our novel design, allows compiler designers and practitioners to select the correct shadow stack design for different usage scenarios. Shadow stacks belong to the class of defense mechanisms that require metadata about the program's state to enforce their defense policies. Protecting this metadata for deployed mitigations requires in-process isolation of a segment of the virtual address space. Prior work on defenses in this class has relied on information hiding to protect metadata. We show that stronger guarantees are possible by repurposing two new Intel x86 extensions for memory protection (MPX), and page table control (MPK). Building on our isolation efforts with MPX and MPK, we present the design requirements for a dedicated hardware mechanism to support intra-process memory isolation, and discuss how such a mechanism can empower the next wave of highly precise software security mitigations that rely on partially isolated information in a process.
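As a conceptual illustration only (real shadow stacks live in compiler-instrumented prologues and epilogues, not in Python), the sketch below simulates saving each genuine return target to a separate stack and checking it on return, which is how a corrupted backward edge is caught:

```python
# Simulated shadow stack: push on call, compare on return.
shadow_stack = []

def call(fn, ret_addr, corrupt=None):
    shadow_stack.append(ret_addr)              # prologue: save to shadow stack
    fn()
    actual = corrupt if corrupt is not None else ret_addr   # main-stack copy
    expected = shadow_stack.pop()              # epilogue: compare before return
    if actual != expected:
        raise RuntimeError(f"backward-edge hijack: {actual:#x} != {expected:#x}")
    return actual

call(lambda: None, 0x4006f0)                   # benign call/return passes silently
try:
    call(lambda: None, 0x4006f0, corrupt=0xdeadbeef)   # simulated overwrite
except RuntimeError as e:
    print(e)
```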