Biblio
The papers in this special section explore recent advancements in parallel graph processing. In the sphere of modern data science and data-driven applications, graph algorithms have achieved a pivotal place in advancing the state of scientific discovery and knowledge. Nearly three centuries of ideas have made graph theory and its applications a mature area in computational sciences. Yet, today we find ourselves at a crossroads between theory and application. Spurred by the digital revolution, data from a diverse range of high throughput channels and devices, from across internet-scale applications, are starting to mark a new era in data-driven computing and discovery. Building robust graph models and implementing scalable graph application frameworks in the context of this new era are proving to be significant challenges. Concomitant to the digital revolution, we have also experienced an explosion in computing architectures, with a broad range of multicores, manycores, heterogeneous platforms, and hardware accelerators (CPUs, GPUs) being actively developed and deployed within servers and multinode clusters. Recent advances have started to show that in more than one way, these two fields—graph theory and architectures–are capable of benefiting and in fact spurring new research directions in one another. This special section is aimed at introducing some of the new avenues of cutting-edge research happening at the intersection of graph algorithm design and their implementation on advanced parallel architectures.
With the development of Internet of Things, numerous IoT devices have been brought into our daily lives. Bluetooth Low Energy (BLE), due to the low energy consumption and generic service stack, has become one of the most popular wireless communication technologies for IoT. However, because of the short communication range and exclusive connection pattern, a BLE-equipped device can only be used by a single user near the device. To fully explore the benefits of BLE and make BLE-equipped devices truly accessible over the Internet as IoT devices, in this paper, we propose a cloud-based software framework that can enable multiple users to interact with various BLE IoT devices over the Internet. This framework includes an agent program, a suite of services hosting in cloud, and a set of RESTful APIs exposed to Internet users. Given the availability of this framework, the access to BLE devices can be extended from local to the Internet scale without any software or hardware changes to BLE devices, and more importantly, shared usage of remote BLE devices over the Internet is also made available.
Increasing number of Internet-scale applications, such as video streaming, incur huge amount of wide area traffic. Such traffic over the unreliable Internet without bandwidth guarantee suffers unpredictable network performance. This result, however, is unappealing to the application providers. Fortunately, Internet giants like Google and Microsoft are increasingly deploying their private wide area networks (WANs) to connect their global datacenters. Such high-speed private WANs are reliable, and can provide predictable network performance. In this paper, we propose a new type of service-inter-datacenter network as a service (iDaaS), where traditional application providers can reserve bandwidth from those Internet giants to guarantee their wide area traffic. Specifically, we design a bandwidth trading market among multiple iDaaS providers and application providers, and concentrate on the essential bandwidth pricing problem. The involved challenging issue is that the bandwidth price of each iDaaS provider is not only influenced by other iDaaS providers, but also affected by the application providers. To address this issue, we characterize the interaction between iDaaS providers and application providers using a Stackelberg game model, and analyze the existence and uniqueness of the equilibrium. We further present an efficient bandwidth pricing algorithm by blending the advantage of a geometrical Nash bargaining solution and the demand segmentation method. For comparison, we present two bandwidth reservation algorithms, where each iDaaS provider's bandwidth is reserved in a weighted fair manner and a max-min fair manner, respectively. Finally, we conduct comprehensive trace-driven experiments. The evaluation results show that our proposed algorithms not only ensure the revenue of iDaaS providers, but also provide bandwidth guarantee for application providers with lower bandwidth price per unit.
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear programming and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem unfolds into a non-convex infinite-dimensional one and is challenging to solve. One of our main contributions is to develop an algorithm that is proven to solve the challenging problem optimally, by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the cost for the CSP by up to 20% as compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than the setting without price uncertainty.
The steady decline of IP transit prices in the past two decades has helped fuel the growth of traffic demands in the Internet ecosystem. Despite the declining unit pricing, bandwidth costs remain significant due to ever-increasing scale and reach of the Internet, combined with the price disparity between the Internet's core hubs versus remote regions. In the meantime, cloud providers have been auctioning underutilized computing resources in their marketplace as spot instances for a much lower price, compared to their on-demand instances. This state of affairs has led the networking community to devote extensive efforts to cloud-assisted networks - the idea of offloading network functionality to cloud platforms, ultimately leading to more flexible and highly composable network service chains.We initiate a critical discussion on the economic and technological aspects of leveraging cloud-assisted networks for Internet-scale interconnections and data transfers. Namely, we investigate the prospect of constructing a large-scale virtualized network provider that does not own any fixed or dedicated resources and runs atop several spot instances. We construct a cloud-assisted overlay as a virtual network provider, by leveraging third-party cloud spot instances. We identify three use case scenarios where such approach will not only be economically and technologically viable but also provide performance benefits compared to current commercial offerings of connectivity and transit providers.
Existing systems allow manufacturers to acquire factory floor data and perform analysis with cloud applications for machine health monitoring, product quality prediction, fault diagnosis and prognosis etc. However, they do not provide capabilities to perform testing of machine tools and associated components remotely, which is often crucial to identify causes of failure. This paper presents a fault diagnosis system in a cyber-physical manufacturing cloud (CPMC) that allows manufacturers to perform diagnosis and maintenance of manufacturing machine tools through remote monitoring and online testing using Machine Tool Communication (MTComm). MTComm is an Internet scale communication method that enables both monitoring and operation of heterogeneous machine tools through RESTful web services over the Internet. It allows manufacturers to perform testing operations from cloud applications at both machine and component level for regular maintenance and fault diagnosis. This paper describes different components of the system and their functionalities in CPMC and techniques used for anomaly detection and remote online testing using MTComm. It also presents the development of a prototype of the proposed system in a CPMC testbed. Experiments were conducted to evaluate its performance to diagnose faults and test machine tools remotely during various manufacturing scenarios. The results demonstrated excellent feasibility to detect anomaly during manufacturing operations and perform testing operations remotely from cloud applications using MTComm.
Digital-Twins simulate physical world objects by creating 'as-is' virtual images in a cyberspace. In order to create a well synchronized digital-twin simulator in manufacturing, information and activities of a physical machine need to be virtualized. Many existing digital-twins stream read-only data of machine sensors and do not incorporate operations of manufacturing machines through Internet. In this paper, a new method of virtualization is proposed to integrate machining data and operations into the digital-twins using Internet scale machine tool communication method. A fully functional digital-twin is implemented in CPMC testbed using MTComm and several manufacturing application scenarios are developed to evaluate the proposed method and system. Performance analysis shows that it is capable of providing data-driven visual monitoring of a manufacturing process and performing manufacturing operations through digital twins over the Internet. Results of the experiments also shows that the MTComm based digital twins have an excellent efficiency.
Advancements in computing, communication, and sensing technologies are making it possible to embed, control, and gather vital information from tiny devices that are being deployed and utilized in practically every aspect of our modernized society. From smart home appliances to municipal water and electric industrial facilities to our everyday work environments, the next Internet frontier, dubbed IoT, is promising to revolutionize our lives and tackle some of our nations' most pressing challenges. While the seamless interconnection of IoT devices with the physical realm is envisioned to bring a plethora of critical improvements in many aspects and diverse domains, it will undoubtedly pave the way for attackers that will target and exploit such devices, threatening the integrity of their data and the reliability of critical infrastructure. Further, such compromised devices will undeniably be leveraged as the next generation of botnets, given their increased processing capabilities and abundant bandwidth. While several demonstrations exist in the literature describing the exploitation procedures of a number of IoT devices, the up-to-date inference, characterization, and analysis of unsolicited IoT devices that are currently deployed "in the wild" is still in its infancy. In this article, we address this imperative task by leveraging active and passive measurements to report on unsolicited Internet-scale IoT devices. This work describes a first step toward exploring the utilization of passive measurements in combination with the results of active measurements to shed light on the Internet-scale insecurities of the IoT paradigm. By correlating results of Internet-wide scanning with Internet background radiation traffic, we disclose close to 14,000 compromised IoT devices in diverse sectors, including critical infrastructure and smart home appliances. To this end, we also analyze their generated traffic to create effective mitigation signatures that could be deployed in local IoT realms. To support largescale empirical data analytics in the context of IoT, we make available the inferred and extracted IoT malicious raw data through an authenticated front-end service. The outcomes of this work confirm the existence of such compromised devices on an Internet scale, while the generated inferences and insights are postulated to be employed for inferring other similarly compromised IoT devices, in addition to contributing to IoT cyber security situational awareness.
Onion sites on the darkweb operate using the Tor Hidden Service (HS) protocol to shield their locations on the Internet, which (among other features) enables these sites to host malicious and illegal content while being resistant to legal action and seizure. Identifying and monitoring such illicit sites in the darkweb is of high relevance to the Computer Security and Law Enforcement communities. We have developed an automated infrastructure that crawls and indexes content from onion sites into a large-scale data repository, called LIGHTS, with over 100M pages. In this paper we describe Automated Tool for Onion Labeling (ATOL), a novel scalable analysis service developed to conduct a thematic assessment of the content of onion sites in the LIGHTS repository. ATOL has three core components – (a) a novel keyword discovery mechanism (ATOLKeyword) which extends analyst-provided keywords for different categories by suggesting new descriptive and discriminative keywords that are relevant for the categories; (b) a classification framework (ATOLClassify) that uses the discovered keywords to map onion site content to a set of categories when sufficient labeled data is available; (c) a clustering framework (ATOLCluster) that can leverage information from multiple external heterogeneous knowledge sources, ranging from domain expertise to Bitcoin transaction data, to categorize onion content in the absence of sufficient supervised data. The paper presents empirical results of ATOL on onion datasets derived from the LIGHTS repository, and additionally benchmarks ATOL's algorithms on the publicly available 20 Newsgroups dataset to demonstrate the reproducibility of its results. On the LIGHTS dataset, ATOLClassify gives a 12% performance gain over an analyst-provided baseline, while ATOLCluster gives a 7% improvement over state-of-the-art semi-supervised clustering algorithms. We also discuss how ATOL has been deployed and externally evaluated, as part of the LIGHTS system.
Distributed Denial-of-Service (DDoS) attacks are increasing in frequency and volume on the Internet, and there is evidence that cyber-criminals are turning to Internet-of-Things (IoT) devices such as cameras and vending machines as easy launchpads for large-scale attacks. This paper quantifies the capability of consumer IoT devices to participate in reflective DDoS attacks. We first show that household devices can be exposed to Internet reflection even if they are secured behind home gateways. We then evaluate eight household devices available on the market today, including lightbulbs, webcams, and printers, and experimentally profile their reflective capability, amplification factor, duration, and intensity rate for TCP, SNMP, and SSDP based attacks. Lastly, we demonstrate reflection attacks in a real-world setting involving three IoT-equipped smart-homes, emphasising the imminent need to address this problem before it becomes widespread.
To combat blind in-window attacks against TCP, changes proposed in RFC 5961 have been implemented by Linux since late 2012. While successfully eliminating the old vulnerabilities, the new TCP implementation was reported in August 2016 to have introduced a subtle yet serious security flaw. Assigned CVE-2016-5696, the flaw exploits the challenge ACK rate limiting feature that could allow an off-path attacker to infer the presence/absence of a TCP connection between two arbitrary hosts, terminate such a connection, and even inject malicious payload. In this work, we perform a comprehensive measurement of the impact of the new vulnerability. This includes (1) tracking the vulnerable Internet servers, (2) monitoring the patch behavior over time, (3) picturing the overall security status of TCP stacks at scale. Towards this goal, we design a scalable measurement methodology to scan the Alexa top 1 million websites for almost 6 months. We also present how notifications impact the patching behavior, and compare the result with the Heartbleed and the Debian PRNG vulnerability. The measurement represents a valuable data point in understanding how Internet servers react to serious security flaws in the operating system kernel.
The Bonseyes EU H2020 collaborative project aims to develop a platform consisting of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms for organizations wanting to adopt Artificial Intelligence. The project will be focused on using artificial intelligence in low power Internet of Things (IoT) devices ("edge computing"), embedded computing systems, and data center servers ("cloud computing"). It will bring about orders of magnitude improvements in efficiency, performance, reliability, security, and productivity in the design and programming of systems of artificial intelligence that incorporate Smart Cyber-Physical Systems (CPS). In addition, it will solve a causality problem for organizations who lack access to Data and Models. Its open software architecture will facilitate adoption of the whole concept on a wider scale. To evaluate the effectiveness, technical feasibility, and to quantify the real-world improvements in efficiency, security, performance, effort and cost of adding AI to products and services using the Bonseyes platform, four complementary demonstrators will be built. Bonseyes platform capabilities are aimed at being aligned with the European FI-PPP activities and take advantage of its flagship project FIWARE. This paper provides a description of the project motivation, goals and preliminary work.
In recent years, the emerging Internet-of-Things (IoT) has led to rising concerns about the security of networked embedded devices. In this work, we propose the SIPHON architecture–-a Scalable high-Interaction Honeypot platform for IoT devices. Our architecture leverages IoT devices that are physically at one location and are connected to the Internet through so-called $\backslash$emph\wormholes\ distributed around the world. The resulting architecture allows exposing few physical devices over a large number of geographically distributed IP addresses. We demonstrate the proposed architecture in a large scale experiment with 39 wormhole instances in 16 cities in 9 countries. Based on this setup, five physical IP cameras, one NVR and one IP printer are presented as 85 real IoT devices on the Internet, attracting a daily traffic of 700MB for a period of two months. A preliminary analysis of the collected traffic indicates that devices in some cities attracted significantly more traffic than others (ranging from 600 000 incoming TCP connections for the most popular destination to less than 50 000 for the least popular). We recorded over 400 brute-force login attempts to the web-interface of our devices using a total of 1826 distinct credentials, from which 11 attempts were successful. Moreover, we noted login attempts to Telnet and SSH ports some of which used credentials found in the recently disclosed Mirai malware.