Improving Cache Performance for Large-Scale Photo Stores via Heuristic Prefetching Scheme

Title: Improving Cache Performance for Large-Scale Photo Stores via Heuristic Prefetching Scheme
Publication Type: Journal Article
Year of Publication: 2019
Authors: Zhou, K., Sun, S., Wang, H., Huang, P., He, X., Lan, R., Li, W., Liu, W., Yang, T.
Journal: IEEE Transactions on Parallel and Distributed Systems
Keywords: access latency, advanced cache algorithms, backend network traffic, cache hit ratio, cache performance, cache stack, cache storage, Caching Algorithm, caching policies, China, cloud computing, composability, data centers, distributed photo caching architecture, distributed storage, heuristic prefetching scheme, high performance expectations, Human Behavior, image retrieval, Internet, Internet-scale Computing Security, Internet-scale photo caching algorithms, large-scale photo stores, largest social network service company, Metrics, multitenant environment, performance gap, photo service providers, photo storage, policy governance, Prediction algorithms, prefetcher, Prefetching, pubcrawl, QQPhoto cache efficiency, QQPhoto workload, Resiliency, Servers, simple baseline algorithms, social networking (online), Sun, user experiences
Abstract: Photo service providers face the critical challenge of managing huge volumes of photo storage, typically on the order of billions of photos, while ensuring a satisfactory user experience nationwide or worldwide. Distributed photo caching architectures are widely deployed to meet high performance expectations, and efficient yet still poorly understood caching policies play an essential role in them. In this work, we present a comprehensive study of Internet-scale photo caching algorithms in the case of QQPhoto from Tencent Inc., the largest social network service company in China. We show that even advanced cache algorithms perform only at a level similar to simple baseline algorithms, and that a large performance gap remains between these cache algorithms and the theoretically optimal algorithm, owing to the complicated access behaviors in such a large multi-tenant environment. We then explain the reasons behind this phenomenon by extensively investigating the characteristics of QQPhoto workloads. Finally, to further improve QQPhoto cache efficiency in practice, we propose incorporating a prefetcher into the cache stack, based on the observed immediacy feature that is unique to the QQPhoto workload. The prefetcher proactively loads selected photos into the cache before they are requested for the first time, eliminating compulsory misses and raising hit ratios. Our extensive evaluation results show that, with appropriate prefetching, we improve the cache hit ratio by up to 7.4 percent and reduce the average access latency by 6.9 percent, at a marginal cost of 4.14 percent additional backend network traffic compared to the original system, which performs no prefetching.
Citation Key: zhou_improving_2019
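
Illustrative sketch (not from the paper): the abstract describes a prefetcher that loads selected photos into the cache before their first request so that compulsory misses disappear. The Python below is a minimal toy model of that idea, not the authors' algorithm. It assumes, hypothetically, that the "immediacy feature" means a photo tends to be read soon after it is uploaded, so the prefetch is triggered at upload time. The class `LRUCache`, the function `replay`, the `TRACE` events, and the choice of LRU as the underlying policy are all invented for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity LRU cache that counts read hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> True, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def _insert(self, key):
        self.items[key] = True
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key

    def prefetch(self, key):
        """Load a photo ahead of any request; not counted as a hit or a miss."""
        self._insert(key)

    def get(self, key):
        """Serve a read; on a miss, fetch the photo and cache it."""
        if key in self.items:
            self.hits += 1
            self.items.move_to_end(key)
            return True
        self.misses += 1
        self._insert(key)
        return False


def replay(events, capacity, prefetch_on_upload):
    """Replay (kind, photo_id) events and return the read hit ratio."""
    cache = LRUCache(capacity)
    for kind, photo in events:
        if kind == "upload":
            if prefetch_on_upload:  # assumed trigger: prefetch at upload time
                cache.prefetch(photo)
        else:  # kind == "read"
            cache.get(photo)
    total = cache.hits + cache.misses
    return cache.hits / total if total else 0.0


# Hypothetical toy trace: each photo is read shortly after it is uploaded.
TRACE = [
    ("upload", "a"), ("read", "a"), ("read", "a"),
    ("upload", "b"), ("read", "b"), ("read", "a"),
]

if __name__ == "__main__":
    print("no prefetch:  ", replay(TRACE, capacity=2, prefetch_on_upload=False))
    print("with prefetch:", replay(TRACE, capacity=2, prefetch_on_upload=True))
```

In this toy trace the only misses without prefetching are the first reads of each photo, which is exactly the compulsory-miss case the abstract targets. The resulting ratios say nothing about the gains reported for QQPhoto, and the sketch ignores the backend traffic cost that the abstract notes prefetching incurs.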