Visible to the public Biblio

Filters: Keyword is data centers  [Clear All Filters]
K, Devaki, L, Leena Jenifer.  2022.  Re-Encryption Model for Multi-Block Data Updates in Network Security. 2022 International Conference on Applied Artificial Intelligence and Computing (ICAAIC). :1331–1336.
Nowadays, online cloud storage networks can be accessed by third parties. Businesses that host large data centers buy or rent storage space from individuals who need to store their data. According to customer needs, data hub operators visualise the data and expose the cloud storage for storing data. Tangibly, the resources may wander around numerous servers. Data resilience is a prior need for all storage methods. For routines in a distributed data center, distributed removable code is appropriate. A safe cloud cache solution, AES-UCODR, is proposed to decrease I/O overheads for multi-block updates in proxy re-encryption systems. Its competence is evaluated using the real-world finance sector.
Yu, Xiao, Wang, Dong, Sun, Xiaojuan, Zheng, Bingbing, Du, Yankai.  2022.  Design and Implementation of a Software Disaster Recovery Service for Cloud Computing-Based Aerospace Ground Systems. 2022 11th International Conference on Communications, Circuits and Systems (ICCCAS). :220—225.
The data centers of cloud computing-based aerospace ground systems and the businesses running on them are extremely vulnerable to man-made disasters, emergencies, and other disasters, which means security is seriously threatened. Thus, cloud centers need to provide effective disaster recovery services for software and data. However, the disaster recovery methods for current cloud centers of aerospace ground systems have long been in arrears, and the disaster tolerance and anti-destruction capability are weak. Aiming at the above problems, in this paper we design a disaster recovery service for aerospace ground systems based on cloud computing. On account of the software warehouse, this service adopts the main standby mode to achieve the backup, local disaster recovery, and remote disaster recovery of software and data. As a result, this service can timely response to the disasters, ensure the continuous running of businesses, and improve the disaster tolerance and anti-destruction capability of aerospace ground systems. Extensive simulation experiments validate the effectiveness of the disaster recovery service proposed in this paper.
Hariyanto, Budi, Ramli, Kalamullah, Suryanto, Yohan.  2021.  Risk Management System for Operational Services in Data Center : DC Papa Oscar Cikeas Case study. 2021 International Conference on Artificial Intelligence and Computer Science Technology (ICAICST). :118—123.
The presence of the Information Technology System (ITS) has become one of the components for basic needs that must be met in navigating through the ages. Organizational programs in responding to the industrial era 4.0 make the use of ITS is a must in order to facilitate all processes related to quality service in carrying out the main task of protecting and serving the community. The implementation of ITS is actually not easy forthe threat of challenges and disturbances in the form of risks haunts ITS's operations. These conditions must be able to be identified and analyzed and then action can be executed to reduce the negative impact, so the risks are acceptable. This research will study about ITS risk management using the the guideline of Information Technology Infrastructure Library (ITIL) to formulate an operational strategy in order ensure that STI services at the Papa Oscar Cikeas Data Center (DC) can run well in the form of recommendations. Based on a survey on the implementing elements of IT function, 82.18% of respondents considered that the IT services provided by DC were very important, 86.49% of respondents knew the importance of having an emergency plan to ensure their products and services were always available, and 67.17% of respondents believes that DC is well managed. The results of the study concludes that it is necessary to immediately form a structural DC organization to prepare a good path for the establishment of a professional data center in supporting public service information technology systems.
Cheng, Jie, Zhang, Kun, Tu, Bibo.  2021.  Remote Attestation of Large-scale Virtual Machines in the Cloud Data Center. 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :180—187.
With the development of cloud computing, remote attestation of virtual machines has received extensive attention. However, the current schemes mainly concentrate on the single prover, and the attestation of a large-scale virtualization environment will cause TPM bottleneck and network congestion, resulting in low efficiency of attestation. This paper proposes CloudTA, an extensible remote attestation architecture. CloudTA groups all virtual machines on each cloud server and introduces an integrity measurement group (IMG) to measure virtual machines and generate trusted evidence by a group. Subsequently, the cloud server reports the physical platform and VM group's trusted evidence for group verification, reducing latency and improving efficiency. Besides, CloudTA designs a hybrid high concurrency communication framework for supporting remote attestation of large-scale virtual machines by combining active requests and periodic reports. The evaluation results suggest that CloudTA has good efficiency and scalability and can support remote attestation of ten thousand virtual machines.
Nougnanke, Kokouvi Benoit, Labit, Yann, Bruyere, Marc, Ferlin, Simone, Aïvodji, Ulrich.  2021.  Learning-based Incast Performance Inference in Software-Defined Data Centers. 2021 24th Conference on Innovation in Clouds, Internet and Networks and Workshops (ICIN). :118–125.
Incast traffic is a many-to-one communication pattern used in many applications, including distributed storage, web-search with partition/aggregation design pattern, and MapReduce, commonly in data centers. It is generally composed of short-lived flows that may be queued behind large flows' packets in congested switches where performance degradation is observed. Smart buffering at the switch level is sensed to mitigate this issue by automatically and dynamically adapting to traffic conditions changes in the highly dynamic data center environment. But for this dynamic and smart buffer management to become effectively beneficial for all the traffic, and especially for incast the most critical one, incast performance models that provide insights on how various factors affect it are needed. The literature lacks these types of models. The existing ones are analytical models, which are either tightly coupled with a particular protocol version or specific to certain empirical data. Motivated by this observation, we propose a machine-learning-based incast performance inference. With this prediction capability, smart buffering scheme or other QoS optimization algorithms could anticipate and efficiently optimize system parameters adjustment to achieve optimal performance. Since applying machine learning to networks managed in a distributed fashion is hard, the prediction mechanism will be deployed on an SDN control plane. We could then take advantage of SDN's centralized global view, its telemetry capabilities, and its management flexibility.
Khan, Maher, Babay, Amy.  2021.  Toward Intrusion Tolerance as a Service: Confidentiality in Partially Cloud-Based BFT Systems. 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :14–25.
Recent work on intrusion-tolerance has shown that resilience to sophisticated network attacks requires system replicas to be deployed across at least three geographically distributed sites. While commodity data centers offer an attractive solution for hosting these sites due to low cost and management overhead, their use raises significant confidentiality concerns: system operators may not want private data or proprietary algorithms exposed to servers outside their direct control. We present a new model for Byzantine Fault Tolerant replicated systems that moves toward “intrusion tolerance as a service”. Under this model, application logic and data are only exposed to servers hosted on the system operator's premises. Additional offsite servers hosted in data centers can support the needed resilience without executing application logic or accessing unencrypted state. We have implemented this approach in the open-source Spire system, and our evaluation shows that the performance overhead of providing confidentiality can be less than 4% in terms of latency.
Aldawood, Mansour, Jhumka, Arshad.  2021.  Secure Allocation for Graph-Based Virtual Machines in Cloud Environments. 2021 18th International Conference on Privacy, Security and Trust (PST). :1–7.

Cloud computing systems (CCSs) enable the sharing of physical computing resources through virtualisation, where a group of virtual machines (VMs) can share the same physical resources of a given machine. However, this sharing can lead to a so-called side-channel attack (SCA), widely recognised as a potential threat to CCSs. Specifically, malicious VMs can capture information from (target) VMs, i.e., those with sensitive information, by merely co-located with them on the same physical machine. As such, a VM allocation algorithm needs to be cognizant of this issue and attempts to allocate the malicious and target VMs onto different machines, i.e., the allocation algorithm needs to be security-aware. This paper investigates the allocation patterns of VM allocation algorithms that are more likely to lead to a secure allocation. A driving objective is to reduce the number of VM migrations during allocation. We also propose a graph-based secure VMs allocation algorithm (GbSRS) to minimise SCA threats. Our results show that algorithms following a stacking-based behaviour are more likely to produce secure VMs allocation than those following spreading or random behaviours.

Halabi, Talal.  2021.  Adaptive Security Risk Mitigation in Edge Computing: Randomized Defense Meets Prospect Theory. 2021 IEEE/ACM Symposium on Edge Computing (SEC). :432–437.

Edge computing supports the deployment of ubiquitous, smart services by providing computing and storage closer to terminal devices. However, ensuring the full security and privacy of computations performed at the edge is challenging due to resource limitation. This paper responds to this challenge and proposes an adaptive approach to defense randomization among the edge data centers via a stochastic game, whose solution corresponds to the optimal security deployment at the network's edge. Moreover, security risk is evaluated subjectively based on Prospect Theory to reflect realistic scenarios where the attacker and the edge system do not similarly perceive the status of the infrastructure. The results show that a non-deterministic defense policy yields better security compared to a static defense strategy.

Liang, Haolan, Ye, Chunxiao, Zhou, Yuangao, Yang, Hongzhao.  2021.  Anomaly Detection Based on Edge Computing Framework for AMI. 2021 IEEE International Conference on Electrical Engineering and Mechatronics Technology (ICEEMT). :385—390.
Aiming at the cyber security problem of the advanced metering infrastructure(AMI), an anomaly detection method based on edge computing framework for the AMI is proposed. Due to the characteristics of the edge node of data concentrator, the data concentrator has the capability of computing a large amount of data. In this paper, distributing the intrusion detection model on the edge node data concentrator of the AMI instead of the metering center, meanwhile, two-way communication of distributed local model parameters replaces a large amount of data transmission. The proposed method avoids the risk of privacy leakage during the communication of data in AMI, and it greatly reduces communication delay and computational time. In this paper, KDDCUP99 datasets is used to verify the effectiveness of the method. The results show that compared with Deep Convolutional Neural Network (DCNN), the detection accuracy of the proposed method reach 99.05%, and false detection rate only gets 0.74%, and the results indicts the proposed method ensures a high detection performance with less communication rounds, it also reduces computational consumption.
Zhang, Cuicui, Sun, Jiali, Lu, Ruixuan, Wang, Peng.  2021.  Anomaly Detection Model of Power Grid Data Based on STL Decomposition. 2021 IEEE 5th Information Technology,Networking,Electronic and Automation Control Conference (ITNEC). 5:1262—1265.
This paper designs a data anomaly detection method for power grid data centers. The method uses cloud computing architecture to realize the storage and calculation of large amounts of data from power grid data centers. After that, the STL decomposition method is used to decompose the grid data, and then the decomposed residual data is used for anomaly analysis to complete the detection of abnormal data in the grid data. Finally, the feasibility of the method is verified through experiments.
Giechaskiel, Ilias, Tian, Shanquan, Szefer, Jakub.  2021.  Cross-VM Information Leaks in FPGA-Accelerated Cloud Environments. 2021 IEEE International Symposium on Hardware Oriented Security and Trust (HOST). :91–101.
The availability of FPGAs in cloud data centers offers rapid, on-demand access to hardware compute resources that users can configure to their own needs. However, the low-level access to the hardware FPGA and associated resources such as PCIe, SSD, or DRAM also opens up threats of malicious attackers uploading designs that are able to infer information about other users or about the cloud infrastructure itself. In particular, this work presents a new, fast PCIe-contention-based channel that is able to transmit data between different FPGA-accelerated virtual machines with bandwidths reaching 2 kbps with 97% accuracy. This paper further demonstrates that the PCIe receiver circuits are able to not just receive covert transmissions, but can also perform fine-grained monitoring of the PCIe bus or detect different types of activities from other users' FPGA-accelerated virtual machines based on their PCIe traffic signatures. Beyond leaking information across different virtual machines, the ability to monitor the PCIe bandwidth over hours or days can be used to estimate the data center utilization and map the behavior of the other users. The paper also introduces further novel threats in FPGA-accelerated instances, including contention due to shared NVMe SSDs as well as thermal monitoring to identify FPGA co-location using the DRAM modules attached to the FPGA boards. This is the first work to demonstrate that it is possible to break the separation of privilege in FPGA-accelerated cloud environments, and highlights that defenses for public clouds using FPGAs need to consider PCIe, SSD, and DRAM resources as part of the attack surface that should be protected.
Li, Kun, Wang, Rui, Li, Haiwei, Hao, Yan.  2021.  A Network Attack Blocking Scheme Based on Threat Intelligence. 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP). :976–980.
In the current network security situation, the types of network threats are complex and changeable. With the development of the Internet and the application of information technology, the general trend is opener. Important data and important business applications will face more serious security threats. However, with the development of cloud computing technology, the trend of large-scale deployment of important business applications in cloud centers has greatly increased. The development and use of software-defined networks in cloud data centers have greatly reduced the effect of traditional network security boundary protection. How to find an effective way to protect important applications in open multi-step large-scale cloud data centers is a problem we need to solve. Threat intelligence has become an important means to solve complex network attacks, realize real-time threat early warning and attack tracking because of its ability to analyze the threat intelligence data of various network attacks. Based on the research of threat intelligence, machine learning, cloud central network, SDN and other technologies, this paper proposes an active defense method of network security based on threat intelligence for super-large cloud data centers.
Nath, Shubha Brata, Addya, Sourav Kanti, Chakraborty, Sandip, Ghosh, Soumya K.  2021.  Container-based Service State Management in Cloud Computing. 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM). :487—493.
In a cloud data center, the client requests are catered by placing the services in its servers. Such services are deployed through a sandboxing platform to ensure proper isolation among services from different users. Due to the lightweight nature, containers have become increasingly popular to support such sandboxing. However, for supporting effective and efficient data center resource usage with minimum resource footprints, improving the containers' consolidation ratio is significant for the cloud service providers. Towards this end, in this paper, we propose an exciting direction to significantly boost up the consolidation ratio of a data-center environment by effectively managing the containers' states. We observe that many cloud-based application services are event-triggered, so they remain inactive unless some external service request comes. We exploit the fact that the containers remain in an idle state when the underlying service is not active, and thus such idle containers can be checkpointed unless an external service request comes. However, the challenge here is to design an efficient mechanism such that an idle container can be resumed quickly to prevent the loss of the application's quality of service (QoS). We have implemented the system, and the evaluation is performed in Amazon Elastic Compute Cloud. The experimental results have shown that the proposed algorithm can manage the containers' states, ensuring the increase of consolidation ratio.
Kh., Djuraev R., R., Botirov S., O., Juraev F..  2021.  A simulation model of a cloud data center based on traditional networks and Software-defined network. 2021 International Conference on Information Science and Communications Technologies (ICISCT). :1–4.
In this article we have developed a simulation model in the Mininet environment for analyzing the operation of a software-defined network (SDN) in cloud data centers. The results of the simulation model of the operation of the SDN network on the Mininet emulator and the results of the simulation of the traditional network in the Graphical Network Simulator 3 emulator are presented.
Badran, Sultan, Arman, Nabil, Farajallah, Mousa.  2020.  Towards a Hybrid Data Partitioning Technique for Secure Data Outsourcing. 2020 21st International Arab Conference on Information Technology (ACIT). :1–9.
In light of the progress achieved by the technology sector in the areas of internet speed and cloud services development, and in addition to other advantages provided by the cloud such as reliability and easy access from anywhere and anytime, most data owners find an opportunity to take advantage of the cloud to store data. However, data owners find a challenge that was and is still facing them in the field of outsourcing, which is protecting sensitive data from leakage. Researchers found that partitioning data into partitions, based on data sensitivity, can be used to protect data from leakage and to increase performance by storing the partition, which contains sensitive data in an encrypted form. In this paper, we review the methods used in designing partitions and dividing data approaches. A hybrid data partitioning approach is proposed to improve these techniques. We consider the frequency attack types used to guess the sensitive data and the most important properties that must be available in order for the encryption to be strong against frequency attacks.
Calvo, Miguel, Beltrán, Marta.  2021.  Remote Attestation as a Service for Edge-Enabled IoT. 2021 IEEE International Conference on Services Computing (SCC). :329–339.
The Internet of Things integrates multiple hardware appliances from large cloud data centres to constrained devices embedded within the physical reality, from multiple vendors and providers, under the same infrastructure. These appliances are subject to different restrictions, have different available resources and show different risk profiles and vulnerabilities. In these scenarios, remote attestation mechanisms are essential, enabling the verification of a distant appliance’s internal state before allowing it to access sensitive data or execute critical workloads. This work proposes a new attestation approach based on a Trusted Platform Module (TPM), devoted to performing Remote Attestation as a Service (RAaaS) while guaranteeing essential properties such as flexibility, generality, domain separation and authorized initiation. The proposed solution can prove both edge devices and IoT devices reliability to services running on cloud data centres. Furthermore, the first prototype of this service has been validated and evaluated via a real use case.
Feng, Na, Yin, Qiangguo.  2020.  Research on Computer Software Engineering Database Programming Technology Based on Virtualization Cloud Platform. 2020 IEEE 3rd International Conference of Safe Production and Informatization (IICSPI). :696—699.
The most important advantage of database is that it can form an intensive management system and serve a large number of information users, which shows the importance of information security in network development. However, there are many problems in the current computer software engineering industry, which seriously hinder the development of computer software engineering, among which the most remarkable and prominent one is that the database programming technology is difficult to be effectively utilized. In this paper, virtualization technology is used to manage the underlying resources of data center with the application background of big data technology, and realize the virtualization of network resources, storage resources and computing resources. It can play a constructive role in the construction of data center, integrate traditional and old resources, realize the computing data center system through virtualization, distributed storage and resource scheduling, and realize the clustering and load balancing of non-relational databases.
Flores, Hugo, Tran, Vincent, Tang, Bin.  2020.  PAM PAL: Policy-Aware Virtual Machine Migration and Placement in Dynamic Cloud Data Centers. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. :2549—2558.
We focus on policy-aware data centers (PADCs), wherein virtual machine (VM) traffic traverses a sequence of middleboxes (MBs) for security and performance purposes, and propose two new VM placement and migration problems. We first study PAL: policy-aware virtual machine placement. Given a PADC with a data center policy that communicating VM pairs must satisfy, the goal of PAL is to place the VMs into the PADC to minimize their total communication cost. Due to dynamic traffic loads in PADCs, however, above VM placement may no longer be optimal after some time. We thus study PAM: policy-aware virtual machine migration. Given an existing VM placement in the PADC and dynamic traffic rates among communicating VMs, PAM migrates VMs in order to minimize the total cost of migration and communication of the VM pairs. We design optimal, approximation, and heuristic policyaware VM placement and migration algorithms. Our experiments show that i) VM migration is an effective technique, reducing total communication cost of VM pairs by 25%, ii) our PAL algorithms outperform state-of-the-art VM placement algorithm that is oblivious to data center policies by 40-50%, and iii) our PAM algorithms outperform the only existing policy-aware VM migration scheme by 30%.
Long, Saiqin, Li, Zhetao, Xing, Yun, Tian, Shujuan, Li, Dongsheng, Yu, Rong.  2020.  A Reinforcement Learning-Based Virtual Machine Placement Strategy in Cloud Data Centers. :223—230.
{With the widespread use of cloud computing, energy consumption of cloud data centers is increasing which mainly comes from IT equipment and cooling equipment. This paper argues that once the number of virtual machines on the physical machines reaches a certain level, resource competition occurs, resulting in a performance loss of the virtual machines. Unlike most papers, we do not impose placement constraints on virtual machines by giving a CPU cap to achieve the purpose of energy savings in cloud data centers. Instead, we use the measure of performance loss to weigh. We propose a reinforcement learning-based virtual machine placement strategy(RLVMP) for energy savings in cloud data centers. The strategy considers the weight of virtual machine performance loss and energy consumption, which is finally solved with the greedy strategy. Simulation experiments show that our strategy has a certain improvement in energy savings compared with the other algorithms.
Xu, Lei, Gao, Zhimin, Fan, Xinxin, Chen, Lin, Kim, Hanyee, Suh, Taeweon, Shi, Weidong.  2020.  Blockchain Based End-to-End Tracking System for Distributed IoT Intelligence Application Security Enhancement. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1028–1035.
IoT devices provide a rich data source that is not available in the past, which is valuable for a wide range of intelligence applications, especially deep neural network (DNN) applications that are data-thirsty. An established DNN model provides useful analysis results that can improve the operation of IoT systems in turn. The progress in distributed/federated DNN training further unleashes the potential of integration of IoT and intelligence applications. When a large number of IoT devices are deployed in different physical locations, distributed training allows training modules to be deployed to multiple edge data centers that are close to the IoT devices to reduce the latency and movement of large amounts of data. In practice, these IoT devices and edge data centers are usually owned and managed by different parties, who do not fully trust each other or have conflicting interests. It is hard to coordinate them to provide end-to-end integrity protection of the DNN construction and application with classical security enhancement tools. For example, one party may share an incomplete data set with others, or contribute a modified sub DNN model to manipulate the aggregated model and affect the decision-making process. To mitigate this risk, we propose a novel blockchain based end-to-end integrity protection scheme for DNN applications integrated with an IoT system in the edge computing environment. The protection system leverages a set of cryptography primitives to build a blockchain adapted for edge computing that is scalable to handle a large number of IoT devices. The customized blockchain is integrated with a distributed/federated DNN to offer integrity and authenticity protection services.
Mujib, M., Sari, R. F..  2020.  Performance Evaluation of Data Center Network with Network Micro-segmentation. 2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE). :27—32.

Research on the design of data center infrastructure is increasing, both from academia and industry, due to the rapid development of cloud-based applications such as search engines, social networks, and large-scale computing. On a large scale, data centers can consist of hundreds to thousands of servers that require systems with high-performance requirements and low downtime. To meet the network's needs in a dynamic data center, infrastructure of applications and services are growing. It takes a process of designing a network topology so that it can guarantee availability and security. One way to surmount this is by implementing the zero trust security model based on micro-segmentation. Zero trust is a security idea based on the principle of "never trust, always verify" in which no concepts of trust and untrust in network traffic. The zero trust security model implemented network traffic in the form of untrust. Micro-segmentation is a way to achieve zero trust by dividing a network into smaller logical segments to restrict the traffic. In this research, data center network performance based on software-defined networking with zero trust security model using micro-segmentation has been evaluated using a testbed simulation of Cisco Application Centric Infrastructure by measuring the round trip time, jitter, and packet loss during experiments. Performance evaluation results show that micro-segmentation adds an average round trip time of 4 μs and jitter of 11 μs without packet loss so that the security can be improved without significantly affecting network performance on the data center.

Sabek, I., Chandramouli, B., Minhas, U. F..  2019.  CRA: Enabling Data-Intensive Applications in Containerized Environments. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1762—1765.
Today, a modern data center hosts a wide variety of applications comprising batch, interactive, machine learning, and streaming applications. In this paper, we factor out the commonalities in a large majority of these applications, into a generic dataflow layer called Common Runtime for Applications (CRA). In parallel, another trend, with containerization technologies (e.g., Docker), has taken a serious hold on cloud-scale data centers, with direct implications on building next generation of data center applications. Container orchestrators (e.g., Kubernetes) have made deployment a lot easy, and they solve many infrastructure level problems, e.g., service discovery, auto-restart, and replication. For best in class performance, there is a need to marry the next generation applications with containerization technologies. To that end, CRA leverages and builds upon the containerization and resource orchestration capabilities of Kubernetes/Docker, and makes it easy to build a wide range of cloud-edge applications on top. To the best of our knowledge, we are the first to present a cloud native runtime for building data center applications. We show the efficiency of CRA through various micro-benchmarking experiments.
Yang, R., Ouyang, X., Chen, Y., Townend, P., Xu, J..  2018.  Intelligent Resource Scheduling at Scale: A Machine Learning Perspective. 2018 IEEE Symposium on Service-Oriented System Engineering (SOSE). :132—141.

Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.

Zhang, Y., Deng, L., Chen, M., Wang, P..  2018.  Joint Bidding and Geographical Load Balancing for Datacenters: Is Uncertainty a Blessing or a Curse? IEEE/ACM Transactions on Networking. 26:1049—1062.

We consider the scenario where a cloud service provider (CSP) operates multiple geo-distributed datacenters to provide Internet-scale service. Our objective is to minimize the total electricity and bandwidth cost by jointly optimizing electricity procurement from wholesale markets and geographical load balancing (GLB), i.e., dynamically routing workloads to locations with cheaper electricity. Under the ideal setting where exact values of market prices and workloads are given, this problem reduces to a simple linear programming and is easy to solve. However, under the realistic setting where only distributions of these variables are available, the problem unfolds into a non-convex infinite-dimensional one and is challenging to solve. One of our main contributions is to develop an algorithm that is proven to solve the challenging problem optimally, by exploring the full design space of strategic bidding. Trace-driven evaluations corroborate our theoretical results, demonstrate fast convergence of our algorithm, and show that it can reduce the cost for the CSP by up to 20% as compared with baseline alternatives. This paper highlights the intriguing role of uncertainty in workloads and market prices, measured by their variances. While uncertainty in workloads deteriorates the cost-saving performance of joint electricity procurement and GLB, counter-intuitively, uncertainty in market prices can be exploited to achieve a cost reduction even larger than the setting without price uncertainty.

Zhou, K., Sun, S., Wang, H., Huang, P., He, X., Lan, R., Li, W., Liu, W., Yang, T..  2019.  Improving Cache Performance for Large-Scale Photo Stores via Heuristic Prefetching Scheme. IEEE Transactions on Parallel and Distributed Systems. 30:2033–2045.
Photo service providers are facing critical challenges of dealing with the huge amount of photo storage, typically in a magnitude of billions of photos, while ensuring national-wide or world-wide satisfactory user experiences. Distributed photo caching architecture is widely deployed to meet high performance expectations, where efficient still mysterious caching policies play essential roles. In this work, we present a comprehensive study on internet-scale photo caching algorithms in the case of QQPhoto from Tencent Inc., the largest social network service company in China. We unveil that even advanced cache algorithms can only perform at a similar level as simple baseline algorithms and there still exists a large performance gap between these cache algorithms and the theoretically optimal algorithm due to the complicated access behaviors in such a large multi-tenant environment. We then expound the reasons behind this phenomenon via extensively investigating the characteristics of QQPhoto workloads. Finally, in order to realistically further improve QQPhoto cache efficiency, we propose to incorporate a prefetcher in the cache stack based on the observed immediacy feature that is unique to the QQPhoto workload. The prefetcher proactively prefetches selected photos into cache before they are requested for the first time to eliminate compulsory misses and promote hit ratios. Our extensive evaluation results show that with appropriate prefetching we improve the cache hit ratio by up to 7.4 percent, while reducing the average access latency by 6.9 percent at a marginal cost of 4.14 percent backend network traffic compared to the original system that performs no prefetching.