Biblio

Filters: Keyword is Internet-scale Computing Security
Kalyanaraman, A., Halappanavar, M.  2018.  Guest Editorial: Advances in Parallel Graph Processing: Algorithms, Architectures, and Application Frameworks. IEEE Transactions on Multi-Scale Computing Systems. 4:188–189.

The papers in this special section explore recent advancements in parallel graph processing. In modern data science and data-driven applications, graph algorithms have come to play a pivotal role in advancing scientific discovery and knowledge. Nearly three centuries of ideas have made graph theory and its applications a mature area of the computational sciences. Yet today we find ourselves at a crossroads between theory and application. Spurred by the digital revolution, data from a diverse range of high-throughput channels and devices, and from internet-scale applications, are starting to mark a new era in data-driven computing and discovery. Building robust graph models and implementing scalable graph application frameworks for this new era are proving to be significant challenges. Concomitant with the digital revolution, we have also experienced an explosion in computing architectures, with a broad range of multicores, manycores, heterogeneous platforms, and hardware accelerators (CPUs, GPUs) being actively developed and deployed within servers and multinode clusters. Recent advances have started to show that these two fields, graph theory and architectures, can benefit from, and in fact spur new research directions in, one another in more than one way. This special section aims to introduce some of the new avenues of cutting-edge research at the intersection of graph algorithm design and implementation on advanced parallel architectures.

Xu, W., Peng, Y.  2018.  SharaBLE: A Software Framework for Shared Usage of BLE Devices over the Internet. 2018 IEEE 29th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). :381–385.

With the development of the Internet of Things, numerous IoT devices have been brought into our daily lives. Bluetooth Low Energy (BLE), due to its low energy consumption and generic service stack, has become one of the most popular wireless communication technologies for IoT. However, because of its short communication range and exclusive connection pattern, a BLE-equipped device can only be used by a single user near the device. To fully exploit the benefits of BLE and make BLE-equipped devices truly accessible over the Internet as IoT devices, in this paper we propose a cloud-based software framework that enables multiple users to interact with various BLE IoT devices over the Internet. This framework includes an agent program, a suite of services hosted in the cloud, and a set of RESTful APIs exposed to Internet users. With this framework, access to BLE devices can be extended from local to Internet scale without any software or hardware changes to the BLE devices and, more importantly, remote BLE devices can be shared over the Internet.

Li, W., Guo, D., Li, K., Qi, H., Zhang, J.  2018.  iDaaS: Inter-Datacenter Network as a Service. IEEE Transactions on Parallel and Distributed Systems. 29:1515–1529.

An increasing number of Internet-scale applications, such as video streaming, incur huge amounts of wide-area traffic. Such traffic, carried over the unreliable Internet without bandwidth guarantees, suffers from unpredictable network performance, which is unappealing to application providers. Fortunately, Internet giants like Google and Microsoft are increasingly deploying private wide area networks (WANs) to connect their global datacenters. Such high-speed private WANs are reliable and can provide predictable network performance. In this paper, we propose a new type of service, inter-datacenter network as a service (iDaaS), in which traditional application providers can reserve bandwidth from those Internet giants to guarantee their wide-area traffic. Specifically, we design a bandwidth trading market among multiple iDaaS providers and application providers, and concentrate on the essential bandwidth pricing problem. The key challenge is that the bandwidth price of each iDaaS provider is influenced not only by other iDaaS providers but also by the application providers. To address this issue, we characterize the interaction between iDaaS providers and application providers using a Stackelberg game model, and analyze the existence and uniqueness of the equilibrium. We further present an efficient bandwidth pricing algorithm that blends the advantages of a geometrical Nash bargaining solution and the demand segmentation method. For comparison, we present two bandwidth reservation algorithms, in which each iDaaS provider's bandwidth is reserved in a weighted fair manner and a max-min fair manner, respectively. Finally, we conduct comprehensive trace-driven experiments. The evaluation results show that our proposed algorithms not only ensure the revenue of iDaaS providers but also provide bandwidth guarantees for application providers at a lower bandwidth price per unit.
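
One of the comparison baselines reserves an iDaaS provider's bandwidth in a max-min fair manner. As a rough illustration of what that means (not the paper's algorithm), the classic water-filling procedure below splits one provider's capacity among application providers' demands. The function name and the plain demand dictionary are assumptions made for this sketch.

```python
def max_min_fair(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Water-filling: split `capacity` so that no demand can get more
    without taking bandwidth from a demand that already has less."""
    alloc = {name: 0.0 for name in demands}
    remaining = capacity
    unsatisfied = dict(demands)
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)  # equal share of what is left
        small = {n: d for n, d in unsatisfied.items() if d <= share}
        if not small:
            # Every remaining demand exceeds the equal share: split evenly.
            for n in unsatisfied:
                alloc[n] += share
            break
        # Fully satisfy demands that fit; their unused share returns to the pool.
        for n, d in small.items():
            alloc[n] = d
            remaining -= d
            del unsatisfied[n]
    return alloc
```

For example, `max_min_fair(10, {"A": 2, "B": 4, "C": 10})` grants A and B their full demands and gives C the remaining 4 units. A weighted fair variant would scale each equal share by a per-provider weight instead of splitting it evenly.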

Yang, R., Ouyang, X., Chen, Y., Townend, P., Xu, J.  2018.  Intelligent Resource Scheduling at Scale: A Machine Learning Perspective. 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE). :132–141.

Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems adds further complexity and new challenges to the problem. Compared with existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to further improve the efficiency of resource management in large-scale systems. In this paper, we describe and discuss how ML could be used to automatically understand both workloads and environments, and to help cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing applications' QoS, and mitigating tail stragglers. We introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an ML-based method will help to achieve architectural optimization and efficiency improvement.
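
Straggler mitigation is one of the scheduling challenges listed above. The paper's approach is ML-based; the sketch below shows only the kind of ad-hoc heuristic it contrasts with, flagging a task as a straggler when its progress rate falls below a fixed fraction of the median rate. The 0.5 cutoff, the function name, and the input format are assumptions for illustration.

```python
from statistics import median

def flag_stragglers(progress: dict[str, float], elapsed: dict[str, float],
                    ratio: float = 0.5) -> list[str]:
    """Flag tasks whose progress rate (fraction done per second) is below
    `ratio` times the median rate of all tasks with nonzero elapsed time."""
    rates = {t: progress[t] / elapsed[t] for t in progress if elapsed[t] > 0}
    if not rates:
        return []
    cutoff = ratio * median(rates.values())
    return sorted(t for t, r in rates.items() if r < cutoff)
```

An ML-based scheduler would, roughly speaking, replace the hand-picked cutoff with a classifier trained on node and task features, which is the kind of improvement the abstract argues for.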

Garbo, A., Quer, S.  2018.  A Fast MPEG’s CDVS Implementation for GPU Featured in Mobile Devices. IEEE Access. 6:52027–52046.

The Moving Picture Experts Group's Compact Descriptors for Visual Search (MPEG's CDVS) standard aims to standardize technologies that enable an interoperable, efficient, and cross-platform solution for internet-scale visual search applications and services. Key technologies within CDVS include the format of visual descriptors, the descriptor extraction process, and the algorithms for indexing and matching. Unfortunately, these steps require precision and computational accuracy. Moreover, they are very time-consuming, with running times on the order of seconds when implemented on the central processing unit (CPU) of modern mobile devices. In this paper, to reduce computation times while maintaining precision and accuracy, we redesign all the main phases of the MPEG CDVS local descriptor extraction pipeline for many-core embedded graphics processing units (GPUs). To reach this goal, we introduce new techniques to adapt the standard algorithm to parallel processing. Furthermore, to reduce memory accesses and efficiently distribute the kernel workload, we use new approaches to store and retrieve CDVS information in suitable GPU data structures. We present a complete experimental analysis on a large, standard test set. Our experiments show that our GPU-based approach is remarkably faster than the CPU-based reference implementation of the standard, while maintaining comparable precision in terms of true and false positive rates.

Zhang, Y., Deng, L., Chen, M., Wang, P.  2018.  Joint Bidding and Geographical Load Balancing for Datacenters: Is Uncertainty a Blessing or a Curse? IEEE/ACM Transactions on Networking. 26:1049–1062.

We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear program and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem becomes a non-convex, infinite-dimensional one that is challenging to solve. One of our main contributions is an algorithm that is proven to solve this challenging problem optimally by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the CSP's cost by up to 20% compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than in the setting without price uncertainty.
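
To see why the ideal, known-price case is easy, consider a deliberately simplified model that drops bandwidth cost and bidding: minimize the sum of price times load over datacenters, subject to total load equal to demand and per-site capacity limits. For this single-constraint linear program, filling the cheapest sites first is optimal. The sketch below assumes that simplified model; the site names, prices, and function name are hypothetical and it does not cover the paper's stochastic setting.

```python
def route_workload(demand: float, prices: dict[str, float],
                   capacity: dict[str, float]) -> dict[str, float]:
    """Greedy GLB under known prices: fill datacenters in order of increasing
    electricity price. Optimal for min sum(price_i * x_i)
    s.t. sum(x_i) == demand and 0 <= x_i <= capacity_i."""
    if demand > sum(capacity.values()):
        raise ValueError("demand exceeds total capacity")
    plan = {dc: 0.0 for dc in prices}
    left = demand
    for dc in sorted(prices, key=prices.get):
        take = min(left, capacity[dc])
        plan[dc] = take
        left -= take
        if left <= 0:
            break
    return plan
```

With uncertain prices and workloads, the cheapest site is no longer known in advance, which is where the paper's strategic bidding analysis comes in.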

Kathiravelu, P., Chiesa, M., Marcos, P., Canini, M., Veiga, L.  2018.  Moving Bits with a Fleet of Shared Virtual Routers. 2018 IFIP Networking Conference (IFIP Networking) and Workshops. :1–9.

The steady decline of IP transit prices over the past two decades has helped fuel the growth of traffic demands in the Internet ecosystem. Despite declining unit prices, bandwidth costs remain significant due to the ever-increasing scale and reach of the Internet, combined with the price disparity between the Internet's core hubs and remote regions. Meanwhile, cloud providers have been auctioning underutilized computing resources in their marketplaces as spot instances, at a much lower price than their on-demand instances. This state of affairs has led the networking community to devote extensive efforts to cloud-assisted networks: the idea of offloading network functionality to cloud platforms, ultimately leading to more flexible and highly composable network service chains. We initiate a critical discussion of the economic and technological aspects of leveraging cloud-assisted networks for Internet-scale interconnections and data transfers. Namely, we investigate the prospect of constructing a large-scale virtualized network provider that does not own any fixed or dedicated resources and runs atop several spot instances. We construct a cloud-assisted overlay as a virtual network provider by leveraging third-party cloud spot instances. We identify three use-case scenarios where such an approach will not only be economically and technologically viable but also provide performance benefits compared with current commercial offerings from connectivity and transit providers.

Sunny, S. M. N. A., Liu, X., Shahriar, M. R.  2018.  Remote Monitoring and Online Testing of Machine Tools for Fault Diagnosis and Maintenance Using MTComm in a Cyber-Physical Manufacturing Cloud. 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). :532–539.

Existing systems allow manufacturers to acquire factory floor data and perform analysis with cloud applications for machine health monitoring, product quality prediction, fault diagnosis and prognosis, etc. However, they do not provide the capability to test machine tools and associated components remotely, which is often crucial for identifying the causes of failure. This paper presents a fault diagnosis system in a cyber-physical manufacturing cloud (CPMC) that allows manufacturers to perform diagnosis and maintenance of manufacturing machine tools through remote monitoring and online testing using Machine Tool Communication (MTComm). MTComm is an Internet-scale communication method that enables both monitoring and operation of heterogeneous machine tools through RESTful web services over the Internet. It allows manufacturers to perform testing operations from cloud applications at both the machine and component levels for regular maintenance and fault diagnosis. This paper describes the different components of the system and their functionalities in the CPMC, as well as the techniques used for anomaly detection and remote online testing using MTComm. It also presents the development of a prototype of the proposed system in a CPMC testbed. Experiments were conducted to evaluate its performance in diagnosing faults and testing machine tools remotely during various manufacturing scenarios. The results demonstrated excellent feasibility for detecting anomalies during manufacturing operations and performing testing operations remotely from cloud applications using MTComm.
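
The abstract does not say which anomaly-detection technique the system uses. As a generic illustration only, a rolling z-score detector over a stream of machine-tool sensor readings (for instance spindle load values retrieved over MTComm, a hypothetical input here) could look like the following; the window size and threshold are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, pstdev

def zscore_anomalies(readings: list[float], window: int = 5,
                     threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the mean of the preceding
    `window` readings by more than `threshold` standard deviations."""
    history: deque[float] = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(readings):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(x - mu) / sigma > threshold:
                flagged.append(i)
        # Every reading, flagged or not, enters the sliding window.
        history.append(x)
    return flagged
```

Because flagged readings stay in the window, a single spike temporarily widens the baseline; a production detector might exclude flagged values or use a robust statistic such as the median absolute deviation.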

Shahriar, M. R., Sunny, S. M. N. A., Liu, X., Leu, M. C., Hu, L., Nguyen, N.  2018.  MTComm Based Virtualization and Integration of Physical Machine Operations with Digital-Twins in Cyber-Physical Manufacturing Cloud. 2018 5th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2018 4th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom). :46–51.

Digital-Twins simulate physical-world objects by creating 'as-is' virtual images in cyberspace. To create a well-synchronized digital-twin simulator in manufacturing, the information and activities of a physical machine need to be virtualized. Many existing digital-twins stream read-only data from machine sensors and do not incorporate operations of manufacturing machines over the Internet. In this paper, a new virtualization method is proposed that integrates machining data and operations into digital-twins using an Internet-scale machine tool communication method (MTComm). A fully functional digital-twin is implemented in a CPMC testbed using MTComm, and several manufacturing application scenarios are developed to evaluate the proposed method and system. Performance analysis shows that it is capable of providing data-driven visual monitoring of a manufacturing process and of performing manufacturing operations through digital twins over the Internet. The experimental results also show that the MTComm-based digital twins have excellent efficiency.

Shaikh, F., Bou-Harb, E., Neshenko, N., Wright, A. P., Ghani, N.  2018.  Internet of Malicious Things: Correlating Active and Passive Measurements for Inferring and Characterizing Internet-Scale Unsolicited IoT Devices. IEEE Communications Magazine. 56:170–177.

Advancements in computing, communication, and sensing technologies are making it possible to embed, control, and gather vital information from tiny devices that are being deployed and utilized in practically every aspect of our modernized society. From smart home appliances to municipal water and electric industrial facilities to our everyday work environments, the next Internet frontier, dubbed IoT, promises to revolutionize our lives and tackle some of our nations' most pressing challenges. While the seamless interconnection of IoT devices with the physical realm is envisioned to bring a plethora of critical improvements across diverse domains, it will undoubtedly pave the way for attackers who target and exploit such devices, threatening the integrity of their data and the reliability of critical infrastructure. Further, such compromised devices will undeniably be leveraged as the next generation of botnets, given their increased processing capabilities and abundant bandwidth. While several demonstrations in the literature describe procedures for exploiting a number of IoT devices, the up-to-date inference, characterization, and analysis of unsolicited IoT devices currently deployed "in the wild" is still in its infancy. In this article, we address this imperative task by leveraging active and passive measurements to report on unsolicited Internet-scale IoT devices. This work describes a first step toward combining passive measurements with the results of active measurements to shed light on the Internet-scale insecurities of the IoT paradigm. By correlating the results of Internet-wide scanning with Internet background radiation traffic, we disclose close to 14,000 compromised IoT devices in diverse sectors, including critical infrastructure and smart home appliances. To this end, we also analyze their generated traffic to create effective mitigation signatures that could be deployed in local IoT realms. To support large-scale empirical data analytics in the context of IoT, we make the inferred and extracted malicious IoT raw data available through an authenticated front-end service. The outcomes of this work confirm the existence of such compromised devices at Internet scale, while the generated inferences and insights are postulated to be useful for inferring other similarly compromised IoT devices, in addition to contributing to IoT cyber security situational awareness.
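
The core correlation step, matching hosts that Internet-wide scanning identifies as IoT devices against source addresses observed in Internet background radiation (darknet) traffic, can be pictured as a join on IP address. The record layout, the field names, and the rule that any darknet packet marks a host as compromised are simplifying assumptions, not the article's methodology. The addresses in the test use documentation ranges.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScanRecord:
    ip: str
    device_type: str  # label inferred during active scanning (e.g. from a banner)
    sector: str

def correlate(scan: list[ScanRecord],
              darknet_sources: list[str]) -> dict[str, list[ScanRecord]]:
    """Return scanned IoT hosts that also appear as sources of darknet
    traffic, grouped by sector."""
    seen = set(darknet_sources)  # unsolicited traffic to unused address space
    by_sector: dict[str, list[ScanRecord]] = {}
    for rec in scan:
        if rec.ip in seen:
            by_sector.setdefault(rec.sector, []).append(rec)
    return by_sector
```

Building a set from the darknet sources keeps each lookup constant time on average, which matters when both inputs contain millions of addresses.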

Blake, M. Brian, Helal, A., Mei, H.  2019.  Guest Editor's Introduction: Special Section on Services and Software Engineering Towards Internetware. IEEE Transactions on Services Computing. 12:4–5.

The six papers in this special section focus on services and software computing. Services computing provides a foundation for building software systems and applications over the Internet, as well as over the emerging hybrid networked platforms it has motivated. Due to the open, dynamic, and evolving nature of the Internet, these Internet-scale, service-based software systems exhibit new features. Such systems should be situation-aware, adaptable, and able to evolve to deal effectively with rapid changes in user requirements and runtime contexts. These emerging software systems enable and require novel methods for software requirements, design, deployment, operation, and maintenance beyond existing services computing technologies. New programming and lifecycle paradigms accommodating such Internet-scale, service-based software systems, referred to as Internetware, are inevitable. The goal of this special section is to present innovative solutions and challenging technical issues, so as to explore various potential pathways toward Internet-scale, service-based software systems.

Hilt, V., Sparks, K.  2019.  Future edge clouds. Bell Labs Technical Journal. 24:1–17.

Widespread deployment of centralized clouds has changed the way internet services are developed, deployed, and operated. Centralized clouds have substantially extended the market opportunities for online services, enabled new entities to create and operate internet-scale services, and changed the way traditional companies run their operations. However, some types of services are unsuitable for today's centralized clouds, such as highly interactive virtual and augmented reality (VR/AR) applications, high-resolution gaming, virtualized RAN, mass IoT data processing, and industrial robot control. They can be broadly categorized as latency-sensitive network functions, latency-sensitive applications, high-bandwidth services, or a combination of these. What these functions have in common is the need for a more distributed cloud infrastructure, which we call edge clouds. In this paper, we examine the evolution of clouds, and of edge clouds in particular, look at the developing market for edge clouds, and identify the developments required in networking, hardware, and software to support them.

Pan, T., Xu, C., Lv, J., Shi, Q., Li, Q., Jia, C., Huang, T., Lin, X.  2019.  LD-ICN: Towards Latency Deterministic Information-Centric Networking. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :973–980.

Deterministic latency is the key challenge that must be addressed in numerous 5G applications such as AR/VR. However, it is difficult to make customized end-to-end resource reservations across multiple ISPs using IP-based QoS mechanisms. Information-Centric Networking (ICN) provides scalable and efficient content distribution at Internet scale thanks to its in-network caching and native multicast capabilities, and deterministic latency can promisingly be guaranteed by caching the relevant content objects in appropriate locations. Existing proposals formulate the ICN cache placement problem as numerous theoretical models. However, the underlying mechanisms to support such cache coordination are not discussed in detail: in particular, how to efficiently make cache reservations, how to avoid route oscillation when the content cache is updated, and how to conduct real-time latency measurement. In this work, we propose Latency Deterministic Information-Centric Networking (LD-ICN). LD-ICN relies on source-routing-based latency telemetry and leverages an on-path caching technique to avoid frequent route oscillation while still achieving optimal cache placement under the SDN architecture. Extensive evaluation shows that under LD-ICN, 90.04% of content requests are satisfied within the hard latency requirements.

Ray, K., Banerjee, A., Mohalik, S. K.  2019.  Web Service Selection with Correlations: A Feature-Based Abstraction Refinement Approach. 2019 IEEE 12th Conference on Service-Oriented Computing and Applications (SOCA). :33–40.

In this paper, we address the web service selection problem for linear workflows. Given a linear workflow specifying a set of ordered tasks, and a set of candidate services providing different features for each task, the selection problem is to choose the most eligible service for each task, respecting the specified ordering. A number of approaches to the selection problem have been proposed in the literature. With web services growing at an incredible pace, service selection at Internet scale has resurfaced as a problem of recent research interest. In this work, we present an approach to the selection problem that uses an abstraction refinement technique to address the scalability limitations of contemporary approaches. Experiments on web service benchmarks show that our approach can yield substantial performance benefits in terms of space compared with an approach without our optimization.
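
To make the selection problem concrete, the sketch below models "correlations" in a simplified, hypothetical way: each candidate service has a utility score, and the services chosen for consecutive tasks can earn a pairwise bonus or penalty. Under that assumption, dynamic programming over the linear workflow finds the best selection exactly. This is not the paper's feature-based abstraction refinement method, and the scores and service names are invented.

```python
def select_services(candidates: list[dict[str, float]],
                    pair_bonus: dict[tuple[str, str], float]) -> tuple[float, list[str]]:
    """candidates[i] maps service name -> utility for task i (must be non-empty).
    pair_bonus[(a, b)] adjusts the score when a (task i) precedes b (task i+1).
    Returns (best total score, chosen service per task)."""
    # best[s] = (score, path) of the best selection for tasks so far ending in s
    best = {s: (u, [s]) for s, u in candidates[0].items()}
    for task in candidates[1:]:
        nxt = {}
        for s, u in task.items():
            score, path = max(
                ((prev_score + pair_bonus.get((p, s), 0.0), prev_path)
                 for p, (prev_score, prev_path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[s] = (score + u, path + [s])
        best = nxt
    return max(best.values(), key=lambda t: t[0])
```

The DP costs O(n·k²) for n tasks with k candidates each, versus k^n for brute force. Picking the best service per task independently can be wrong once correlations exist: in the test below, the greedy choice A1, B2 scores only 7.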

Stokes, J. W., Agrawal, R., McDonald, G., Hausknecht, M.  2019.  ScriptNet: Neural Static Analysis for Malicious JavaScript Detection. MILCOM 2019 - 2019 IEEE Military Communications Conference (MILCOM). :1–8.

Malicious scripts are an important infection threat vector for computer users. For internet-scale processing, static analysis offers substantial computational efficiency. We propose the ScriptNet system for neural malicious JavaScript detection, which is based on static analysis. We also propose a novel deep learning model, Pre-Informant Learning (PIL), which processes JavaScript files as byte sequences. Lower layers capture the sequential nature of these byte sequences, while higher layers classify the resulting embedding as malicious or benign. Unlike previously proposed solutions, our model variants are trained in an end-to-end fashion, allowing discriminative training even for the sequential processing layers. Evaluating this model on a large corpus of 212,408 JavaScript files indicates that the best-performing PIL model offers a 98.10% true positive rate (TPR) for the first 60K byte subsequences and 81.66% for the full-length files, at a false positive rate (FPR) of 0.50%. Both models significantly outperform several baseline models. The best-performing PIL model can successfully detect 92.02% of unknown malware samples in a hindsight experiment where the true labels of the malicious JavaScript files were not known when the model was trained.
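
The results above are true positive rates reported at a fixed false positive rate of 0.50%. A common way to compute such an operating point from classifier scores is sketched below (a generic evaluation recipe, not code from the paper): choose the threshold so that at most the allowed fraction of benign samples score strictly above it, then measure the share of malicious samples above that threshold. The function name and toy scores are illustrative.

```python
import math

def tpr_at_fpr(scores: list[float], labels: list[int], max_fpr: float) -> float:
    """labels: 1 = malicious, 0 = benign. A sample is flagged when its
    score is strictly greater than the chosen threshold."""
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    # How many benign samples may be flagged without exceeding max_fpr.
    allowed = math.floor(max_fpr * len(benign))
    # Threshold at the (allowed+1)-th highest benign score, so at most
    # `allowed` benign samples lie strictly above it.
    threshold = benign[allowed] if allowed < len(benign) else -math.inf
    return sum(s > threshold for s in malicious) / len(malicious)
```

At an FPR of 0.50% the threshold is set by the top 0.5% of benign scores, which is why large benign corpora are needed for the reported rate to be statistically meaningful.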

Zhou, K., Sun, S., Wang, H., Huang, P., He, X., Lan, R., Li, W., Liu, W., Yang, T.  2019.  Improving Cache Performance for Large-Scale Photo Stores via Heuristic Prefetching Scheme. IEEE Transactions on Parallel and Distributed Systems. 30:2033–2045.

Photo service providers face the critical challenge of storing huge numbers of photos, typically on the order of billions, while ensuring satisfactory user experiences nationwide or worldwide. Distributed photo caching architectures are widely deployed to meet high performance expectations, and efficient yet still poorly understood caching policies play an essential role in them. In this work, we present a comprehensive study of internet-scale photo caching algorithms for QQPhoto from Tencent Inc., the largest social network service company in China. We show that even advanced cache algorithms perform only at a level similar to simple baseline algorithms, and that a large performance gap remains between these cache algorithms and the theoretically optimal algorithm due to the complicated access behaviors in such a large multi-tenant environment. We then explain the reasons behind this phenomenon by extensively investigating the characteristics of QQPhoto workloads. Finally, to realistically further improve QQPhoto cache efficiency, we propose incorporating a prefetcher into the cache stack, based on the observed immediacy feature that is unique to the QQPhoto workload. The prefetcher proactively prefetches selected photos into the cache before they are requested for the first time, eliminating compulsory misses and raising hit ratios. Our extensive evaluation results show that with appropriate prefetching we improve the cache hit ratio by up to 7.4 percent, while reducing the average access latency by 6.9 percent at a marginal cost of 4.14 percent additional backend network traffic, compared with the original system that performs no prefetching.
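
The immediacy observation (newly uploaded photos tend to be requested soon after upload) suggests a simple experiment: replay a trace through an LRU cache, optionally inserting each photo at upload time, and compare hit ratios. The trace format and the choice of plain LRU are assumptions for illustration. The paper's prefetcher is selective about which photos it prefetches, whereas this sketch prefetches every upload.

```python
from collections import OrderedDict

def hit_ratio(trace: list[tuple[str, str]], capacity: int, prefetch: bool) -> float:
    """trace: ("upload" | "get", photo_id) events in time order.
    Returns the fraction of "get" events served from an LRU cache."""
    cache: OrderedDict[str, None] = OrderedDict()

    def insert(pid: str) -> None:
        cache[pid] = None
        cache.move_to_end(pid)
        if len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used photo

    hits = gets = 0
    for op, pid in trace:
        if op == "upload":
            if prefetch:
                insert(pid)  # proactive fill avoids the compulsory miss
        else:
            gets += 1
            if pid in cache:
                hits += 1
                cache.move_to_end(pid)
            else:
                insert(pid)  # miss: fetch from the backend, then cache
    return hits / gets if gets else 0.0
```

Prefetching is not free: every prefetched photo costs backend traffic and may evict a photo that would have been hit later, which is the trade-off behind the traffic overhead reported in the abstract.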

Chai, W. K., Pavlou, G., Kamel, G., Katsaros, K. V., Wang, N.  2019.  A Distributed Interdomain Control System for Information-Centric Content Delivery. IEEE Systems Journal. 13:1568–1579.

The Internet, the de facto platform for large-scale content distribution, suffers from two issues that limit its manageability, efficiency, and evolution. First, the IP-based Internet is host-centric and agnostic to the content being delivered; second, the tight coupling of the control and data planes restricts its manageability and, consequently, the possibility of creating dynamic alternative paths for efficient content delivery. Here, we present the CURLING system, which leverages the emerging Information-Centric Networking paradigm to enable cost-efficient Internet-scale content delivery by exploiting multicasting and in-network caching. Following the software-defined networking concept of decoupling the control and data planes, CURLING adopts an interdomain hop-by-hop content resolution mechanism that allows network operators to dynamically enforce or change their network policies for locating content sources and optimizing content delivery paths. Content publishers and consumers may also control content access according to their preferences. Based on both analytical modeling and simulations using real domain-level Internet subtopologies, we demonstrate how CURLING supports efficient Internet-scale content delivery without the need for radical changes to the current Internet.

Cheng, D., Zhou, X., Ding, Z., Wang, Y., Ji, M.  2019.  Heterogeneity Aware Workload Management in Distributed Sustainable Datacenters. IEEE Transactions on Parallel and Distributed Systems. 30:375–387.

The tremendous growth of cloud computing and large-scale data analytics highlights the importance of reducing datacenter power consumption and the environmental impact of brown energy. While many Internet service operators have at least partially powered their datacenters with green energy, it is challenging to utilize green energy effectively due to the intermittency of renewable sources such as solar and wind. We find that the geographical diversity of internet-scale services can be exploited through careful scheduling to improve the efficiency of green energy use in datacenters. In this paper, we propose sCloud, a holistic heterogeneity-aware cloud workload management approach that aims to maximize system goodput in distributed self-sustainable datacenters. sCloud adaptively places the transactional workload across distributed datacenters, allocates the available resources to heterogeneous workloads in each datacenter, and migrates batch jobs across datacenters, while taking into account green power availability and QoS requirements. We formulate transactional workload placement as a constrained optimization problem that can be solved by nonlinear programming. We then propose a batch job migration algorithm to further improve system goodput when the green power supply varies widely across locations. Finally, we extend sCloud by integrating a flexible batch job manager that dynamically controls job execution progress without violating deadlines. We have implemented sCloud in a university cloud testbed with real-world weather conditions and workload traces. Experimental results demonstrate that sCloud can achieve near-optimal system performance while being resilient to dynamic power availability. sCloud with the flexible batch job management approach outperforms a heterogeneity-oblivious approach by 37 percent in system goodput and reduces QoS violations by 33 percent.

Xu, Y., Chen, H., Zhao, Y., Zhang, W., Shen, Q., Zhang, X., Ma, Z.  2019.  Neural Adaptive Transport Framework for Internet-scale Interactive Media Streaming Services. 2019 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). :1–6.

Network dynamics, such as bandwidth fluctuation and unexpected latency, greatly hurt users' quality of experience (QoE) for media services over the Internet. In this work, we propose a neural adaptive transport (NAT) framework to tackle network dynamics in Internet-scale interactive media services. The NAT system has three major components: a learning-based cloud overlay routing (COR) scheme that finds the best delivery path, bypassing network bottlenecks while simultaneously offering minimal end-to-end latency; a residual-neural-network-based collaborative video processing (CVP) system that trades client-side computational capability for QoE improvement via learned resolution scaling; and a deep reinforcement learning (DRL) based adaptive real-time streaming (ARS) strategy that selects the appropriate video bitrate for maximal QoE. We have demonstrated that COR can improve user satisfaction from 5% to 43%, CVP can reduce bandwidth consumption by more than 30% at the same quality, and DRL-based ARS can maintain smooth streaming with <50% QoE improvement.
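
ARS learns its bitrate policy with deep reinforcement learning. For contrast, a common non-learning baseline is rate-based selection: estimate throughput as the harmonic mean of recent segment downloads and pick the highest bitrate that fits within a safety fraction of that estimate. The bitrate ladder, the 0.8 safety factor, and the function name are illustrative assumptions, not part of NAT.

```python
from statistics import harmonic_mean

def pick_bitrate(ladder_kbps: list[int], recent_tput_kbps: list[float],
                 safety: float = 0.8) -> int:
    """Return the highest ladder rung not exceeding safety * harmonic-mean
    throughput; fall back to the lowest rung when none fits or no history exists."""
    if not recent_tput_kbps:
        return min(ladder_kbps)
    budget = safety * harmonic_mean(recent_tput_kbps)
    fitting = [b for b in ladder_kbps if b <= budget]
    return max(fitting) if fitting else min(ladder_kbps)
```

The harmonic mean is pulled toward the slower samples, so one unusually fast download does not push the player to a bitrate it cannot sustain. A learned policy such as ARS can additionally weigh buffer level and future QoE, which a fixed rule like this ignores.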

Georgakopoulos, D.  2019.  A Global IoT Device Discovery and Integration Vision. 2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC). :214–221.

This paper presents the vision of establishing a global service for Global IoT Device Discovery and Integration (GIDDI). Establishing GIDDI will (1) make IoT application development more efficient and cost-effective by enabling the sharing and reuse of existing IoT devices owned and maintained by different providers, and (2) promote the deployment of new IoT devices, supported by a revenue generation scheme for their providers. More specifically, this paper proposes a distributed IoT blockchain ledger specifically designed for managing the metadata needed to describe IoT devices and the data they produce. This GIDDI Blockchain is Internet-owned (i.e., it is not controlled by any individual or organization) and Internet-scaled (i.e., it can support the discovery and reuse of billions of IoT devices). The paper also proposes a GIDDI Marketplace that provides the functionality needed for IoT device registration, query, integration, payment, and security via the proposed GIDDI Blockchain. We outline the GIDDI Blockchain and Marketplace implementation. We also discuss ongoing research on automatically mining, from the data IoT devices produce, the metadata needed for IoT device query and integration. This significantly reduces the need for IoT device providers to supply metadata describing the devices and the data they produce when registering IoT devices in the GIDDI Blockchain.
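
The GIDDI Blockchain stores metadata describing IoT devices. As a minimal illustration of the hash linking that makes such a ledger tamper-evident (omitting consensus, ownership, and payments entirely), each block below commits to one device's metadata and to the hash of the previous block. The block layout and metadata fields are hypothetical, not the GIDDI design.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Canonical JSON (sorted keys) so identical content always hashes identically.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list[dict], metadata: dict) -> None:
    """Add a block that commits to `metadata` and to the previous block's hash."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev": prev, "metadata": metadata})

def verify(chain: list[dict]) -> bool:
    """True if every block still points at the hash of its predecessor."""
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))
```

Editing an earlier block's metadata breaks the link stored in its successor, so `verify` detects it; a change to the newest block only becomes detectable once a later block commits to it.
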
Ghosh, Shalini, Das, Ariyam, Porras, Phil, Yegneswaran, Vinod, Gehani, Ashish.  2017.  Automated Categorization of Onion Sites for Analyzing the Darkweb Ecosystem. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :1793–1802.

Onion sites on the darkweb operate using the Tor Hidden Service (HS) protocol to shield their locations on the Internet, which (among other features) enables these sites to host malicious and illegal content while being resistant to legal action and seizure. Identifying and monitoring such illicit sites in the darkweb is of high relevance to the Computer Security and Law Enforcement communities. We have developed an automated infrastructure that crawls and indexes content from onion sites into a large-scale data repository, called LIGHTS, with over 100M pages. In this paper we describe Automated Tool for Onion Labeling (ATOL), a novel scalable analysis service developed to conduct a thematic assessment of the content of onion sites in the LIGHTS repository. ATOL has three core components: (a) a novel keyword discovery mechanism (ATOLKeyword) that extends analyst-provided keywords for different categories by suggesting new descriptive and discriminative keywords relevant to those categories; (b) a classification framework (ATOLClassify) that uses the discovered keywords to map onion site content to a set of categories when sufficient labeled data is available; and (c) a clustering framework (ATOLCluster) that can leverage information from multiple external heterogeneous knowledge sources, ranging from domain expertise to Bitcoin transaction data, to categorize onion content in the absence of sufficient supervised data. The paper presents empirical results of ATOL on onion datasets derived from the LIGHTS repository, and additionally benchmarks ATOL's algorithms on the publicly available 20 Newsgroups dataset to demonstrate the reproducibility of its results. On the LIGHTS dataset, ATOLClassify gives a 12% performance gain over an analyst-provided baseline, while ATOLCluster gives a 7% improvement over state-of-the-art semi-supervised clustering algorithms. We also discuss how ATOL has been deployed and externally evaluated as part of the LIGHTS system.

Lyu, Minzhao, Sherratt, Dainel, Sivanathan, Arunan, Gharakheili, Hassan Habibi, Radford, Adam, Sivaraman, Vijay.  2017.  Quantifying the Reflective DDoS Attack Capability of Household IoT Devices. Proceedings of the 10th ACM Conference on Security and Privacy in Wireless and Mobile Networks. :46–51.

Distributed Denial-of-Service (DDoS) attacks are increasing in frequency and volume on the Internet, and there is evidence that cyber-criminals are turning to Internet-of-Things (IoT) devices such as cameras and vending machines as easy launchpads for large-scale attacks. This paper quantifies the capability of consumer IoT devices to participate in reflective DDoS attacks. We first show that household devices can be exposed to Internet reflection even if they are secured behind home gateways. We then evaluate eight household devices available on the market today, including lightbulbs, webcams, and printers, and experimentally profile their reflective capability, amplification factor, duration, and intensity rate for TCP-, SNMP-, and SSDP-based attacks. Lastly, we demonstrate reflection attacks in a real-world setting involving three IoT-equipped smart homes, emphasising the imminent need to address this problem before it becomes widespread.
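One common way to express the "amplification factor" profiled above is the ratio of bytes a reflector sends toward the (spoofed) victim to bytes the attacker sends to trigger them. The sketch below assumes that byte-ratio definition; the paper's exact measure (for example, whether packet headers are counted) may differ, and the function names and sample sizes are hypothetical.

```python
from typing import Sequence


def amplification_factor(request_bytes: int, response_bytes: Sequence[int]) -> float:
    """Bytes reflected toward the victim per byte the attacker sends.

    Assumed definition: total size of all responses elicited by one
    spoofed request, divided by the size of that request.
    """
    if request_bytes <= 0:
        raise ValueError("request size must be positive")
    return sum(response_bytes) / request_bytes


def reflected_rate(attacker_rate_bps: float, factor: float) -> float:
    """Traffic rate arriving at the victim for a given attacker send rate."""
    return attacker_rate_bps * factor


# Hypothetical example: a 100-byte request that elicits three 300-byte replies.
factor = amplification_factor(100, [300, 300, 300])
print(factor)  # 9.0
print(reflected_rate(1_000_000, factor))  # 9000000.0
```

A device that answers one small request with several large replies (as multi-response discovery protocols can) scores higher under this measure than one that answers with a single reply of similar size.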

Quach, Alan, Wang, Zhongjie, Qian, Zhiyun.  2017.  Investigation of the 2016 Linux TCP Stack Vulnerability at Scale. Proceedings of the 2017 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems. :8–8.

To combat blind in-window attacks against TCP, changes proposed in RFC 5961 have been implemented by Linux since late 2012. While successfully eliminating the old vulnerabilities, the new TCP implementation was reported in August 2016 to have introduced a subtle yet serious security flaw. Assigned CVE-2016-5696, the flaw lies in the challenge ACK rate-limiting feature, which could allow an off-path attacker to infer the presence or absence of a TCP connection between two arbitrary hosts, terminate such a connection, and even inject a malicious payload. In this work, we perform a comprehensive measurement of the impact of the new vulnerability. This includes (1) tracking vulnerable Internet servers, (2) monitoring patching behavior over time, and (3) characterizing the overall security status of TCP stacks at scale. Toward this goal, we design a scalable measurement methodology to scan the Alexa top 1 million websites for almost 6 months. We also present how notifications affect patching behavior, and compare the results with those for the Heartbleed and Debian PRNG vulnerabilities. The measurement represents a valuable data point in understanding how Internet servers react to serious security flaws in the operating system kernel.
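The side channel behind CVE-2016-5696 comes from a challenge-ACK budget that is shared globally across all connections rather than tracked per connection. The toy simulation below shows only the connection-inference step under simplifying assumptions: one fixed budget per one-second window, reportedly 100 by default in the pre-patch kernels, and probe packets that elicit a challenge ACK only when their spoofed 4-tuple matches an established connection. `ToyServer` and `infer_connection` are hypothetical names. The sketch ignores window synchronization, packet loss, and the exact probe packet types used in practice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Toy model of a host with ONE challenge-ACK budget shared by all connections."""
    limit: int = 100  # assumed per-window global budget (hypothetical value)
    established: set = field(default_factory=set)  # 4-tuples of live connections
    sent_this_window: int = 0

    def new_window(self) -> None:
        # Start of a new one-second window: the shared budget resets.
        self.sent_this_window = 0

    def receive_probe(self, four_tuple) -> bool:
        """Return True if this probe causes a challenge ACK to be sent."""
        if four_tuple not in self.established:
            return False  # no matching connection, so no challenge ACK
        if self.sent_this_window >= self.limit:
            return False  # global budget exhausted, so the ACK is suppressed
        self.sent_this_window += 1
        return True


def infer_connection(server: ToyServer, victim_tuple, attacker_tuple, n_spoofed: int) -> bool:
    """Off-path guess: is victim_tuple an established connection on `server`?"""
    server.new_window()
    # Step 1: spoofed probes. Any challenge ACKs go to the victim, so the attacker
    # never sees them, but they still drain the shared budget.
    for _ in range(n_spoofed):
        server.receive_probe(victim_tuple)
    # Step 2: the attacker probes its own connection `limit` times and counts replies.
    observed = sum(server.receive_probe(attacker_tuple) for _ in range(server.limit))
    # Fewer replies than the budget means the spoofed probes consumed part of it.
    return observed < server.limit


victim = ("10.0.0.1", 80, "10.0.0.2", 5555)
attacker = ("10.0.0.1", 80, "192.0.2.7", 4444)
server = ToyServer(established={victim, attacker})
print(infer_connection(server, victim, attacker, n_spoofed=1))  # True
print(infer_connection(server, ("10.0.0.1", 80, "10.0.0.3", 1234), attacker, n_spoofed=1))  # False
```

The inference works only because the attacker's own connection and the victim's connection draw on the same counter. A per-connection or randomized limit removes the clean "budget minus observed" signal that this toy model depends on.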

Llewellynn, Tim, Fernández-Carrobles, M. Milagro, Deniz, Oscar, Fricker, Samuel, Storkey, Amos, Pazos, Nuria, Velikic, Gordana, Leufgen, Kirsten, Dahyot, Rozenn, Koller, Sebastian et al..  2017.  BONSEYES: Platform for Open Development of Systems of Artificial Intelligence: Invited Paper. Proceedings of the Computing Frontiers Conference. :299–304.

The Bonseyes EU H2020 collaborative project aims to develop a platform consisting of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms for organizations wanting to adopt Artificial Intelligence. The project will focus on using artificial intelligence in low-power Internet of Things (IoT) devices ("edge computing"), embedded computing systems, and data center servers ("cloud computing"). It will bring about orders-of-magnitude improvements in efficiency, performance, reliability, security, and productivity in the design and programming of systems of artificial intelligence that incorporate Smart Cyber-Physical Systems (CPS). In addition, it will solve a causality problem for organizations that lack access to data and models. Its open software architecture will facilitate adoption of the whole concept on a wider scale. To evaluate the effectiveness and technical feasibility of the platform, and to quantify the real-world improvements in efficiency, security, performance, effort, and cost of adding AI to products and services using the Bonseyes platform, four complementary demonstrators will be built. The Bonseyes platform's capabilities are intended to align with the European FI-PPP activities and to take advantage of its flagship project, FIWARE. This paper describes the project's motivation, goals, and preliminary work.

Guarnizo, Juan David, Tambe, Amit, Bhunia, Suman Sankar, Ochoa, Martin, Tippenhauer, Nils Ole, Shabtai, Asaf, Elovici, Yuval.  2017.  SIPHON: Towards Scalable High-Interaction Physical Honeypots. Proceedings of the 3rd ACM Workshop on Cyber-Physical System Security. :57–68.

In recent years, the emerging Internet-of-Things (IoT) has led to rising concerns about the security of networked embedded devices. In this work, we propose the SIPHON architecture: a Scalable high-Interaction Honeypot platform for IoT devices. Our architecture leverages IoT devices that are physically at one location and are connected to the Internet through so-called wormholes distributed around the world. The resulting architecture allows a few physical devices to be exposed over a large number of geographically distributed IP addresses. We demonstrate the proposed architecture in a large-scale experiment with 39 wormhole instances in 16 cities in 9 countries. Based on this setup, five physical IP cameras, one NVR, and one IP printer are presented as 85 real IoT devices on the Internet, attracting a daily traffic of 700 MB for a period of two months. A preliminary analysis of the collected traffic indicates that devices in some cities attracted significantly more traffic than others (ranging from 600 000 incoming TCP connections for the most popular destination to fewer than 50 000 for the least popular). We recorded over 400 brute-force login attempts to the web interface of our devices using a total of 1826 distinct credentials, of which 11 attempts were successful. Moreover, we noted login attempts to Telnet and SSH ports, some of which used credentials found in the recently disclosed Mirai malware.