Web Caching

SoS Newsletter- Advanced Book Block


Web caches offer potential for mischief. As cloud and mobile communications expand the demand for caching capability, the need for more and better security has grown with it. The articles cited here address cache security issues, including geo-inference attacks, scriptless timing attacks, and a proposed incognito-tab mechanism; other research on caching in general is also cited. These articles appeared in 2014.

  • Jia, Y.; Dong, X.; Liang, Z.; Saxena, P., "I Know Where You've Been: Geo-Inference Attacks via the Browser Cache," Internet Computing, IEEE, vol. PP, no.99, pp.1, 1, August 2014. doi: 10.1109/MIC.2014.103 Many websites customize their services according to different geo-locations of users, to provide more relevant content and better responsiveness, including Google, Craigslist, etc. Recently, mobile devices further allow web applications to directly read users' geo-location information from GPS sensors. However, if geo-oriented websites leave location-sensitive content in the browser cache, other sites can sniff users' geo-locations by utilizing timing side-channels. In this paper, we demonstrate that such geo-location leakage channels are widely open in popular web applications today, including 62 percent of Alexa Top 100 websites. With geo-inference attacks that measure the timing of browser cache queries, we can locate users' countries, cities and neighborhoods in our case studies. We also discuss existing defenses and propose a more balanced solution to defeat such attacks with minor performance overhead.
    Keywords: (not provided) (ID#:14-3050)
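The geo-inference technique above reduces to a timing side-channel: a resource that loads much faster than a network round trip was probably served from the browser cache. A minimal sketch of that inference step (all URLs, timings, and the threshold are hypothetical; this is not the authors' code):

```python
# Sketch of cache-timing inference, assuming a fixed hit/miss threshold.

def likely_cached(load_ms, threshold_ms=50.0):
    """A load far faster than a network round trip suggests a cache hit."""
    return load_ms < threshold_ms

def infer_visited(timings):
    """Flag region-specific resources whose fast loads imply a prior visit."""
    return [url for url, ms in timings.items() if likely_cached(ms)]

# Hypothetical measurements: one cache hit (~10 ms) vs. cold fetches (>100 ms).
timings = {
    "cdn.example/de/banner.png": 9.5,    # fast -> probably cached
    "cdn.example/fr/banner.png": 142.0,  # slow -> probably not cached
    "cdn.example/us/banner.png": 180.3,
}
print(infer_visited(timings))  # -> ['cdn.example/de/banner.png']
```

In the attack, the timings would come from probing cross-origin resources; the defenses the paper discusses aim at making this timing difference unobservable.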
  • Bin Liang; Wei You; Liangkun Liu; Wenchang Shi; Heiderich, M., "Scriptless Timing Attacks on Web Browser Privacy," Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on, pp.112,123, 23-26 June 2014. doi: 10.1109/DSN.2014.93 The existing Web timing attack methods are heavily dependent on executing client-side scripts to measure the time. However, many techniques have been proposed to block the executions of suspicious scripts recently. This paper presents a novel timing attack method to sniff users' browsing histories without executing any scripts. Our method is based on the fact that when a resource is loaded from the local cache, its rendering process should begin earlier than when it is loaded from a remote website. We leverage some Cascading Style Sheets (CSS) features to indirectly monitor the rendering of the target resource. Three practical attack vectors are developed for different attack scenarios and applied to six popular desktop and mobile browsers. The evaluation shows that our method can effectively sniff users' browsing histories with very high precision. We believe that modern browsers protected by script-blocking techniques are still likely to suffer serious privacy leakage threats.
    Keywords: data privacy; online front-ends; CSS features; Web browser privacy; Web timing attack methods; cascading style sheets; client-side scripts; desktop browser; mobile browser; privacy leakage threats; rendering process; script-blocking techniques; scriptless timing attacks; user browsing history; Animation; Browsers; Cascading style sheets; History; Rendering (computer graphics); Timing; Web privacy; browsing history; scriptless attack; timing attack (ID#:14-3051)
  • Qingsong Wei; Cheng Chen; Jun Yang, "CBM: A Cooperative Buffer Management for SSD," Mass Storage Systems and Technologies (MSST), 2014 30th Symposium on, pp.1, 12, 2-6 June 2014. doi: 10.1109/MSST.2014.6855545 Random writes significantly limit the application of Solid State Drive (SSD) in the I/O intensive applications such as scientific computing, Web services, and database. While several buffer management algorithms are proposed to reduce random writes, their ability to deal with workloads mixed with sequential and random accesses is limited. In this paper, we propose a cooperative buffer management scheme referred to as CBM, which coordinates write buffer and read cache to fully exploit temporal and spatial localities among I/O intensive workload. To improve both buffer hit rate and destage sequentiality, CBM divides write buffer space into Page Region and Block Region. Randomly written data is put in the Page Region at page granularity, while sequentially written data is stored in the Block Region at block granularity. CBM leverages threshold-based migration to dynamically classify random write from sequential writes. When a block is evicted from write buffer, CBM merges the dirty pages in write buffer and the clean pages in read cache belonging to the evicted block to maximize the possibility of forming full block write. CBM has been extensively evaluated with simulation and real implementation on OpenSSD. Our testing results conclusively demonstrate that CBM can achieve up to 84% performance improvement and 85% garbage collection overhead reduction compared to existing buffer management schemes.
    Keywords: cache storage; flash memories; input-output programs; CBM; I/O intensive workload; OpenSSD; block granularity; block region; buffer hit rate; buffer management algorithms; cooperative buffer management; flash memory; garbage collection overhead reduction; page region; performance improvement; random write reduction; solid state drive; write sequentiality; Algorithm design and analysis; Buffer storage; Flash memories; Nonvolatile memory; Power line communications; Radiation detectors; Random access memory; buffer hit ratio; cooperative buffer management; flash memory; write sequentiality (ID#:14-3052)
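The page/block split in CBM rests on one decision: classify each write as random or sequential against a threshold. A simplified sketch of that classification only (block size and threshold are invented; the real CBM migrates data dynamically and also manages destaging and eviction, all omitted here):

```python
# Threshold-based write classification in the spirit of CBM's two regions.

PAGES_PER_BLOCK = 4
THRESHOLD = 3  # hypothetical migration threshold

def classify_writes(lpns, pages_per_block=PAGES_PER_BLOCK, threshold=THRESHOLD):
    """Group logical page writes by block; blocks written `threshold`+ times
    count as sequential (Block Region), the rest as random (Page Region)."""
    per_block = {}
    for lpn in lpns:
        per_block.setdefault(lpn // pages_per_block, []).append(lpn)
    block_region = {b: p for b, p in per_block.items() if len(p) >= threshold}
    page_region = {b: p for b, p in per_block.items() if len(p) < threshold}
    return block_region, page_region

# Block 0 is written sequentially (4 pages); page 9 is an isolated random write.
blocks, pages = classify_writes([0, 1, 2, 3, 9])
print(blocks, pages)  # -> {0: [0, 1, 2, 3]} {2: [9]}
```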
  • Gomaa, H.; Messier, G.G.; Davies, R., "Hierarchical Cache Performance Analysis Under TTL-Based Consistency," Networking, IEEE/ACM Transactions on, vol. PP, no. 99, pp. 1, 1, May 2014. doi: 10.1109/TNET.2014.2320723 This paper introduces an analytical model for characterizing the instantaneous hit ratio and instantaneous average hit distance of a traditional least recently used (LRU) cache hierarchy. The analysis accounts for the use of two variants of the Time-to-Live (TTL) weak consistency mechanism. The first is the typical TTL scheme (TTL-T) used in the HTTP/1.1 protocol where expired objects are refreshed using conditional GET requests. The second is TTL immediate ejection (TTL-IE) where objects are ejected as soon as they expire. The analysis also accounts for two sharing protocols: Leave Copy Everywhere (LCE) and Promote Cached Objects (PCO). PCO is a new sharing protocol introduced in this paper that decreases the user's perceived latency and is robust under nonstationary access patterns.
    Keywords: Analytical models; IEEE transactions; Markov processes; Measurement; Probability; Protocols; Servers; Analysis; Markov chain; Web; cache consistency; content-centric network (CCN); hierarchical cache; least recently used (LRU); time-to-live (TTL) (ID#:14-3053)
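The two consistency variants compared above differ only in what happens when a cached object expires. A toy LRU cache illustrating TTL-IE versus TTL-T (a sketch under the simplifying assumption that revalidation always succeeds; this is not the paper's analytical model):

```python
import time
from collections import OrderedDict

class LRUTTLCache:
    """LRU cache with two TTL variants: TTL-IE ejects an expired object
    outright; TTL-T keeps it and revalidates (simulated by re-timestamping)."""

    def __init__(self, capacity, ttl, immediate_eject=False):
        self.capacity, self.ttl = capacity, ttl
        self.immediate_eject = immediate_eject
        self.store = OrderedDict()  # key -> (value, stored_at)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (value, now)
        self.store.move_to_end(key)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the LRU victim

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        if key not in self.store:
            return None
        value, stored_at = self.store[key]
        if now - stored_at > self.ttl:
            if self.immediate_eject:        # TTL-IE: expired -> gone
                del self.store[key]
                return None
            self.store[key] = (value, now)  # TTL-T: conditional GET refreshes
        self.store.move_to_end(key)
        return value

cache = LRUTTLCache(capacity=2, ttl=10, immediate_eject=True)
cache.put("a", 1, now=0)
print(cache.get("a", now=20))  # -> None (expired and ejected under TTL-IE)
```

Under TTL-T the same lookup would return the object after revalidation, which is exactly the behavioral difference the paper's hit-ratio analysis captures.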
  • Kumar, K.; Bose, J., "User Data Management By Tabs During A Browsing Session," Digital Information and Communication Technology and its Applications (DICTAP), 2014 Fourth International Conference on, pp.258,263, 6-8 May 2014. doi: 10.1109/DICTAP.2014.6821692 Nowadays, most browsers are multi-tab, with user activity segregated into parallel sessions, one per tab. However, the user data generated while browsing, including history, cookies, and cache, is not similarly segregated and is accessible only in aggregate. This makes it difficult for users to access their data separately by tab. In this paper, we seek to solve the problem by organizing tab-specific browser data in different tabs. We implement the system, present alternate ways to visualize the tab-specific data, and show that it does not lead to appreciable slowdown in browser performance. We also propose a method to convert an incognito tab, where data is not stored while browsing, into a normal tab and vice versa. Such methods of tabbed data management will enable users to better organize and view tab-specific data.
    Keywords: data handling; online front-ends; Web browser; incognito tab; multitab browsing session; parallel sessions; tab specific browser data; user data management; Browsers; Clustering algorithms; Context; Databases; History; Organizing; Switches; android; incognito mode; tabbed browsing; user data; web browser (ID#:14-3054)
  • Kazi, A.W.; Badr, H., "Some Observations On The Performance of CCN-Flooding," Computing, Networking and Communications (ICNC), 2014 International Conference on, pp.334, 340, 3-6 Feb. 2014. doi: 10.1109/ICCNC.2014.6785356 We focus on one of the earliest forwarding strategies proposed for Content-Centric Networks (CCN), namely the CCN-Flooding approach to populate the Forwarding Information Bases (FIB) and forward packets. Pure CCN-Flooding in its own right is a potentially viable, though highly deprecated, option to forward packets. But CCN-Flooding is also proposed as an integral component of alternative forwarding strategies. Thus, it cannot entirely be dismissed, and its behavior merits study. We examine the CCN-Flooding approach using a combination of several topologies and workload sets with differing characteristics. In addition to topological effects, we identify various issues that arise, such as: the difficulty of calibrating Pending Interest Table (PIT) timeouts; a PIT-induced isolation effect that negatively impacts bandwidth consumption and system response time; and the effects of adopting or not adopting FIB routes based on volatile in-network cache entries. In conclusion, we briefly compare CCN-Flooding vs. CCN-Publication when the overhead bandwidth costs of pre-populating FIBs in the latter are also taken into account.
    Keywords: computer networks; packet radio networks; telecommunication network topology; CCN-flooding; CCN-publication; FIB; PIT timeouts; PIT-induced isolation effect; bandwidth consumption; behavior merits; content-centric networks; forward packets; forwarding information bases; integral component; pending interest table timeouts; several topology; system response time; topological effects; volatile in-network cache entries; workload sets; Bandwidth; Floods; IP networks; Measurement; Network topology; Topology; Web and internet services; CCN performance evaluation; bandwidth consumption; caching; forwarding (ID#:14-3055)
  • Lei Wang; Jianfeng Zhan; Chunjie Luo; Yuqing Zhu; Qiang Yang; Yongqiang He; Wanling Gao; Zhen Jia; Yingjie Shi; Shujie Zhang; Chen Zheng; Gang Lu; Zhan, K.; Xiaona Li; Bizhu Qiu, "BigDataBench: A Big Data Benchmark Suite From Internet Services," High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on, pp.488,499, 15-19 Feb. 2014. doi: 10.1109/HPCA.2014.6835958 As architecture, systems, and data management communities pay greater attention to innovative big data systems and architecture, the pressure of benchmarking and evaluating these systems rises. However, the complexity, diversity, frequently changing workloads, and rapid evolution of big data systems raise great challenges in big data benchmarking. Considering the broad use of big data systems, for the sake of fairness, big data benchmarks must include diversity of data and workloads, which is the prerequisite for evaluating big data systems and architecture. Most state-of-the-art big data benchmarking efforts target evaluating specific types of applications or system software stacks, and hence are not qualified for serving the purposes mentioned above. This paper presents our joint research efforts on this issue with several industrial partners. Our big data benchmark suite, BigDataBench, not only covers broad application scenarios, but also includes diverse and representative data sets. Currently, we choose 19 big data benchmarks from dimensions of application scenarios, operations/algorithms, data types, data sources, software stacks, and application types, and they are comprehensive for fairly measuring and evaluating big data systems and architecture. BigDataBench is publicly available from the project home page. Also, we comprehensively characterize the 19 big data workloads included in BigDataBench with varying data inputs.
On a typical state-of-practice processor, the Intel Xeon E5645, we have the following observations: First, in comparison with traditional benchmarks, including PARSEC, HPCC, and SPECCPU, big data applications have very low operation intensity, which measures the ratio of the total number of instructions divided by the total byte number of memory accesses; Second, the volume of data input has a non-negligible impact on micro-architecture characteristics, which may impose challenges for simulation-based big data architecture research; Last but not least, corroborating the observations in CloudSuite and DCBench (which use smaller data inputs), we find that the numbers of L1 instruction cache (L1I) misses per 1000 instructions (in short, MPKI) of the big data applications are higher than in the traditional benchmarks; also, we find that L3 caches are effective for the big data applications, corroborating the observation in DCBench.
    Keywords: Big Data; Web services; cache storage; memory architecture; Big Data benchmark suite; Big Data systems; BigDataBench; CloudSuite; DCBench; HPCC; Intel Xeon E5645; Internet services; L1 instruction cache misses; MPKI; PARSEC; SPECCPU; big data benchmark suite; big data benchmarking; data management community; data sources; data types; memory access; micro-architecture characteristics; simulation-based big data architecture research; software stacks; system software stack; Benchmark testing; Computer architecture; Search engines; Social network services; System software (ID#:14-3056)
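The two metrics this comparison hinges on are simple ratios; a worked example with made-up counts (the E5645 figures themselves are not reproduced here):

```python
# Operation intensity and MPKI as defined in the abstract above.

def operation_intensity(instructions, memory_bytes):
    """Total instructions divided by total bytes of memory accesses."""
    return instructions / memory_bytes

def mpki(misses, instructions):
    """Cache misses per 1000 instructions."""
    return misses * 1000 / instructions

# Hypothetical workload: 2e9 instructions, 4e9 bytes of memory traffic,
# 6e6 L1I misses.
print(operation_intensity(2_000_000_000, 4_000_000_000))  # -> 0.5
print(mpki(6_000_000, 2_000_000_000))                     # -> 3.0
```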
  • Imtiaz, Al; Hossain, Md.Jayed, "Distributed Cache Management Architecture: To Reduce The Internet Traffic By Integrating Browser And Proxy Caches," Electrical Engineering and Information & Communication Technology (ICEEICT), 2014 International Conference on, pp.1,4, 10-12 April 2014. doi: 10.1109/ICEEICT.2014.6919088 The World Wide Web is one of the most popular Internet applications, and its traffic volume is increasing and evolving due to the popularity of social networking, file hosting, and video streaming sites. A wide range of research has been done in this field, and a number of architectures exist for caching web content, each with its own advantages and limitations. A browser cache serves a single user by storing web content on the user's computer, whereas a proxy cache can serve thousands of users by fetching, providing, and optimizing that content. However, the World Wide Web (WWW) suffers from scaling and reliability problems due to overloaded and congested proxy servers. Distributed and hierarchical architectures could be integrated into a hybrid architecture for better performance and efficiency. Based on secondary information from a literature review, this paper proposes feasible strategies to improve the cache management architecture by integrating the browser with the proxy cache server, where the browser cache acts as a proxy cache server by sharing its content through the hybrid architecture. The paper also examines the present architecture and the challenges of the current system that need to be resolved.
    Keywords: Browsers; Computer architecture; Computers; Internet; Protocols; Servers; Web pages; Browser cache; Cache management; Distributed cache; Web Traffic; Web cache (ID#:14-3057)
  • Einziger, G.; Friedman, R., "TinyLFU: A Highly Efficient Cache Admission Policy," Parallel, Distributed and Network-Based Processing (PDP), 2014 22nd Euromicro International Conference on, pp.146, 153, 12-14 Feb. 2014. doi: 10.1109/PDP.2014.34 This paper proposes to use a frequency based cache admission policy in order to boost the effectiveness of caches subject to skewed access distributions. Rather than deciding on which object to evict, TinyLFU decides, based on the recent access history, whether it is worth admitting an accessed object into the cache at the expense of the eviction candidate. Realizing this concept is enabled through a novel approximate LFU structure called TinyLFU, which maintains an approximate representation of the access frequency of recently accessed objects. TinyLFU is extremely compact and lightweight as it builds upon Bloom filter theory. The paper shows an analysis of the properties of TinyLFU including simulations of both synthetic workloads as well as YouTube and Wikipedia traces.
    Keywords: cache storage; data structures; Bloom filter theory; TinyLFU; Wikipedia; YouTube; access frequency; frequency based cache admission policy; novel approximate LFU structure; Approximation methods; Finite wordlength effects; Histograms; History; Memory management; Optimization; Radiation detectors; Cache; LFU; TinyLFU; approximate count; bloom filter; cloud cache; data cache; sketch; sliding window; web cache; zipf (ID#:14-3058)
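The admission decision at the heart of TinyLFU can be sketched with an ordinary count-min sketch standing in for the paper's compact approximate counters (aging and the space optimizations are omitted; all names here are illustrative):

```python
import hashlib

class FreqSketch:
    """A tiny count-min sketch: approximate frequency that never undercounts."""

    def __init__(self, width=1024, depth=4):
        self.rows = [[0] * width for _ in range(depth)]
        self.width = width

    def _indices(self, key):
        for i in range(len(self.rows)):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.width

    def add(self, key):
        for row, idx in zip(self.rows, self._indices(key)):
            row[idx] += 1

    def estimate(self, key):
        return min(row[idx] for row, idx in zip(self.rows, self._indices(key)))

def admit(sketch, candidate, victim):
    """The core TinyLFU decision: admit the accessed object only if it is
    seen more often than the eviction victim it would displace."""
    return sketch.estimate(candidate) > sketch.estimate(victim)

sketch = FreqSketch()
for _ in range(5):
    sketch.add("hot.html")
sketch.add("cold.html")
print(admit(sketch, "hot.html", "cold.html"))  # -> True
```

Putting the filter in front of the eviction policy, rather than inside it, is what lets the cache ignore one-hit wonders under skewed access distributions.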
  • Pal, M.B.; Jain, D.C., "Web Service Enhancement Using Web Pre-fetching by Applying Markov Model," Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on, pp.393,397, 7-9 April 2014. doi: 10.1109/CSNT.2014.84 The rapid growth of web applications has increased researchers' interest in this area. A web application, that is, an application accessed via a web browser over a network, is widely used for communication and data transfer. Web caching is a well-known strategy for improving the performance of web-based systems by keeping web objects that are likely to be used in the near future in locations closer to the user. Web caching mechanisms are implemented at three levels: client level, proxy level, and origin server level. Significantly, proxy servers play a key role between users and web sites in reducing the response time of user requests and saving network bandwidth. Therefore, to achieve better response times, an efficient caching approach should be built into the proxy server. This paper uses FP-growth, a weighted rule-mining concept, and a Markov model for fast and frequent web prefetching, in order to improve user response for web pages and expedite users' visiting speed.
    Keywords: Markov processes; Web services; Web sites; cache storage; data mining; file servers; Markov model; Web application; Web based system performance; Web browser; Web caching mechanism; Web objects; Web page; Web prefetching; Web service enhancement; Web sites; client level caching; communication; computer network; data transfer; network bandwidth saving; proxy level caching; proxy server; server level caching; user request; user response; weighted rule mining concept; Cleaning; Markov processes; Servers; Web mining; Web pages; Log file; Web Services; data cleaning; log preprocessing (ID#:14-3059)
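A first-order Markov predictor of the kind applied to prefetching above can be sketched in a few lines (a simplified stand-in; the paper combines it with FP-growth and weighted rule mining, neither shown here):

```python
from collections import Counter, defaultdict

def build_model(sessions):
    """Count page-to-page transitions across user sessions."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            transitions[cur][nxt] += 1
    return transitions

def predict_next(model, page):
    """Return the most frequent successor page, or None if unseen;
    the proxy would prefetch this page into its cache."""
    if page not in model:
        return None
    return model[page].most_common(1)[0][0]

# Hypothetical access logs after cleaning/preprocessing.
sessions = [
    ["/home", "/news", "/sports"],
    ["/home", "/news", "/weather"],
    ["/home", "/shop"],
]
model = build_model(sessions)
print(predict_next(model, "/home"))  # -> '/news'
```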
  • Johnson, T.; Seeling, P., "Desktop and Mobile Web Page Comparison: Characteristics, Trends, And Implications," Communications Magazine, IEEE, vol.52, no.9, pp.144, 151, September 2014. doi: 10.1109/MCOM.2014.6894465 The broad proliferation of mobile devices in recent years has drastically changed the means of accessing the World Wide Web. Describing a shift away from the desktop computer era for content consumption, predictions indicate that the main access of web-based content will come from mobile devices. Concurrently, the manner of content presentation has changed as well; web artifacts are allowing for richer media and higher levels of user interaction which is enabled by the increasing speeds of access networks. This article provides an overview of more than two years of high level web page characteristics by comparing the desktop and mobile client versions. Our study is the first long-term evaluation of differences as seen by desktop and mobile web browser clients. We showcase the main differentiating factors with respect to the number of web page object requests, their sizes, relationships, and web page object caching. We find that over time, initial page view sizes and number of objects increase faster for desktop versions. However, web page objects have similar sizes in both versions, though they exhibit a different composition by type of object in greater detail.
    Keywords: Web sites; microcomputers; mobile computing; online front-ends; subscriber loops; World Wide Web; access networks; broad proliferation; content consumption; desktop client versions; desktop computer; desktop web page; high level web page characteristics; mobile client versions; mobile devices; mobile web browser clients; mobile web page; web artifacts; web page object caching; web-based content; Cascading style sheets; Internet; Market research; Mobile communication; Mobile handsets; Web pages (ID#:14-3060)
  • Pourmir, A.; Ramanathan, P., "Distributed caching and coding in VoD," Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on, pp.233, 238, April 27 2014-May 2 2014. doi: 10.1109/INFCOMW.2014.6849237 Caching decreases content access time by keeping contents closer to the clients. In this paper we show that network coding chunks of different contents and storing them in cache, can be beneficial. Recent research considers caching network coded chunks of same content, but not different contents. This paper proposes three different methods, IP, layered-IP and Greedy algorithm, with different performance and complexity. Simulation results show that caching encoded chunks of different contents can significantly reduce the average data access time. Although we evaluate our ideas using Video on Demand (VoD) application on cable networks, they can be extended to broader contexts including content distribution in peer-to-peer networks and proxy web caches.
    Keywords: IP networks; Internet; cache storage; client-server systems; computational complexity; greedy algorithms; integer programming; network coding; peer-to-peer computing; video on demand; VoD application; average data access time; binary integer program; cable networks; caching network coded chunks; content distribution; greedy algorithm; layered-IP methods; peer-to-peer networks; proxy Web cache; video on demand application; Arrays; Conferences; Encoding; IP networks; Mathematical model; Probability; Servers; Binary Integer Program; caching; network coding (ID#:14-3061)
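The benefit of caching coded chunks can be seen with plain XOR coding (a toy illustration only; the paper's IP, layered-IP, and greedy methods decide which combinations to cache and are not shown):

```python
# Storing A XOR B serves either content to a client that already holds
# the other chunk, using half the cache space of storing both.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

chunk_a = b"frame-of-video-A"
chunk_b = b"frame-of-video-B"
cached = xor_bytes(chunk_a, chunk_b)   # single coded chunk in the cache

# A client holding chunk_b recovers chunk_a from the coded cache entry.
recovered = xor_bytes(cached, chunk_b)
print(recovered == chunk_a)  # -> True
```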
  • Fankhauser, T.; Qi Wang; Gerlicher, A.; Grecos, C.; Xinheng Wang, "Web Scaling Frameworks: A Novel Class Of Frameworks For Scalable Web Services In Cloud Environments," Communications (ICC), 2014 IEEE International Conference on, pp.1760, 1766, 10-14 June 2014. doi: 10.1109/ICC.2014.6883577 The social web and huge growth of mobile smart devices dramatically increases the performance requirements for web services. State-of-the-art Web Application Frameworks (WAFs) do not offer complete scaling concepts with automatic resource-provisioning, elastic caching or guaranteed maximum response times. These functionalities, however, are supported by cloud computing and needed to scale an application to its demands. Components like proxies, load-balancers, distributed caches, queuing and messaging systems have been around for a long time and in each field relevant research exists. Nevertheless, to create a scalable web service it is seldom enough to deploy only one component. In this work we propose to combine those complementary components to a predictable, composed system. The proposed solution introduces a novel class of web frameworks called Web Scaling Frameworks (WSFs) that take over the scaling. The proposed mathematical model allows a universally applicable prediction of performance in the single-machine- and multi-machine scope. A prototypical implementation is created to empirically validate the mathematical model and demonstrates both the feasibility and increase of performance of a WSF. The results show that the application of a WSF can triple the requests handling capability of a single machine and additionally reduce the number of total machines by 44%.
    Keywords: Web services; cache storage; cloud computing; WSFs; Web application frameworks; Web scaling frameworks; automatic resource-provisioning; cloud computing; cloud environments; distributed caches; elastic caching; guaranteed maximum response times; load-balancers; mathematical model; messaging systems; mobile smart devices; proxies; queuing; scalable Web services; social Web; Concurrent computing; Delays; Mathematical model; Multimedia communication; Radio frequency; Web services (ID#:14-3062)
  • Guedes, Erico A.C.; Silva, Luis E.T.; Maciel, Paulo R.M., "Performability Analysis Of I/O Bound Application On Container-Based Server Virtualization Cluster," Computers and Communication (ISCC), 2014 IEEE Symposium on, pp.1, 7, 23-26 June 2014. doi: 10.1109/ISCC.2014.6912556 Using server virtualization to provide applications introduces overhead that degrades the performance of the provided systems; container-based virtualization narrows this overhead. In this work, we go a step further and demonstrate how broadly tuning several performance factors concerning the web cache server (the I/O-bound application analysed), the file system, and the operating system leads to higher performance of the proposed cluster when it is executed in a container-based operating-system virtualization environment. Availability and performance similarity of the web cache service under non-virtualized and virtualized systems were evaluated when submitted to the proposed web workload. Results reveal that the web cache service provided under the virtual environment, without unresponsiveness failures due to overload, i.e., with high availability, presents a 6% higher hit ratio and a 21.4% lower response time than those observed in non-virtualized environments.
    Keywords: Availability; Operating systems; Protocols; Servers; Throughput; Time factors; Virtualization; Container-based Operating Systems; Performability; Server Virtualization (ID#:14-3063)
  • Akherfi, K.; Harroud, H.; Gerndt, M., "A Mobile Cloud Middleware To Support Mobility And Cloud Interoperability," Multimedia Computing and Systems (ICMCS), 2014 International Conference on, pp.1189, 1194, 14-16 April 2014. doi: 10.1109/ICMCS.2014.6911331 With the recent advances in cloud computing and the improvement in the capabilities of mobile devices in terms of speed, storage, and computing power, Mobile Cloud Computing (MCC) is emerging as one of the important branches of cloud computing. MCC is an extension of cloud computing with the support of mobility. In this paper, we first present the specific concerns and key challenges in mobile cloud computing, then discuss the different approaches introduced so far to tackle the main issues in MCC, and finally describe the proposed overall architecture of a middleware that will contribute to providing mobile users with data storage and processing services based on their mobile devices' capabilities, availability, and usage. A prototype of the middleware is developed, and three scenarios are described to demonstrate how the middleware performs in adapting the provision of cloud web services by transforming SOAP messages to REST and XML format to JSON, in optimizing the results by extracting relevant information, and in improving availability by caching. Initial analysis shows that the mobile cloud middleware improves the quality of service for mobile devices and provides lightweight responses for mobile cloud services.
    Keywords: cloud computing; middleware; mobile computing; object-oriented methods; JSON format; MCC; REST format; SOAP messages; XML format; cloud interoperability; data processing service; data storage service; mobile cloud computing; mobile devices; mobility support; Cloud computing; Mobile communication; Mobile computing; Mobile handsets; Simple object access protocol; XML (ID#:14-3064)
  • Yizheng Chen; Antonakakis, M.; Perdisci, R.; Nadji, Y.; Dagon, D.; Wenke Lee, "DNS Noise: Measuring the Pervasiveness of Disposable Domains in Modern DNS Traffic," Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on, pp.598,609, 23-26 June 2014. doi: 10.1109/DSN.2014.61 In this paper, we present an analysis of a new class of domain names: disposable domains. We observe that popular web applications, along with other Internet services, systematically use this new class of domain names. Disposable domains are likely generated automatically, characterized by a "one-time use" pattern, and appear to be used as a way of "signaling" via DNS queries. To shed light on the pervasiveness of disposable domains, we study 24 days of live DNS traffic spanning a year observed at a large Internet Service Provider. We find that disposable domains increased from 23.1% to 27.6% of all queried domains, and from 27.6% to 37.2% of all resolved domains observed daily. While this creative use of DNS may enable new applications, it may also have unanticipated negative consequences on the DNS caching infrastructure, DNSSEC validating resolvers, and passive DNS data collection systems.
    Keywords: Internet; query processing; telecommunication traffic; ubiquitous computing; DNS caching infrastructure; DNS noise; DNS queries; DNS traffic; DNSSEC; Internet service provider; Internet services; Web applications; disposable domain pervasiveness measurement; live DNS traffic spanning; one-time use pattern; passive DNS data collection systems; signaling; Data collection; Educational institutions; Google; Internet; Monitoring; Servers; Web and internet services; Disposable Domain Name; Internet Measurement (ID#:14-3065)
  • Rottenstreich, O.; Keslassy, I., "The Bloom Paradox: When Not to Use a Bloom Filter," Networking, IEEE/ACM Transactions on, vol. PP, no.99, pp.1, 1, Feb 2014. doi: 10.1109/TNET.2014.2306060 In this paper, we uncover the Bloom paradox in Bloom Filters: Sometimes, the Bloom Filter is harmful and should not be queried. We first analyze conditions under which the Bloom paradox occurs in a Bloom Filter and demonstrate that it depends on the a priori probability that a given element belongs to the represented set. We show that the Bloom paradox also applies to Counting Bloom Filters (CBFs) and depends on the product of the hashed counters of each element. In addition, we further suggest improved architectures that deal with the Bloom paradox in Bloom Filters, CBFs, and their variants. We further present an application of the presented theory in cache sharing among Web proxies. Lastly, using simulations, we verify our theoretical results and show that our improved schemes can lead to a large improvement in the performance of Bloom Filters and CBFs.
    Keywords: A priori membership probability; Bloom Filter; Counting Bloom Filter; the Bloom Filter paradox (ID#:14-3066)
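The paradox follows from Bayes' rule: when the a-priori membership probability is small relative to the false-positive rate, a positive answer carries almost no information. A back-of-the-envelope sketch (the paper's analysis, including the CBF case, is more general):

```python
# P(element in set | Bloom filter says yes), for a-priori membership
# probability p and false-positive rate fpr.

def posterior_membership(p, fpr):
    return p / (p + (1 - p) * fpr)

# If only 1 in 10,000 queried elements is truly in the set and fpr = 1%,
# a positive answer is almost always a false positive -- querying the
# filter can then do more harm than good (e.g. spurious cache lookups).
print(round(posterior_membership(1e-4, 0.01), 3))  # -> 0.01
```

For the web-proxy cache-sharing application the paper discusses, this is the condition under which it is cheaper to skip the filter and fetch directly.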


Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests via Email to SoS.Project (at) for removal of the links or modifications to specific citations. Please include the ID# of the specific citation in your correspondence.