Visible to the public Biblio

Filters: Keyword is search problems  [Clear All Filters]
2021-07-27
Zheng, Zhihao, Cao, Zhenfu, Shen, Jiachen.  2020.  Practical and Secure Circular Range Search on Private Spatial Data. 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :639–645.
With the location-based services (LBS) booming, the volume of spatial data inevitably explodes. In order to reduce local storage and computational overhead, users tend to outsource data and initiate queries to the cloud. However, sensitive data or queries may be compromised if cloud server has access to raw data and plaintext token. To cope with this problem, searchable encryption for geometric range is applied. Geometric range search has wide applications in many scenarios, especially the circular range search. In this paper, a practical and secure circular range search scheme (PSCS) is proposed to support searching for spatial data in a circular range. With our scheme, a semi-honest cloud server will return data for a given circular range correctly without uncovering index privacy or query privacy. We propose a polynomial split algorithm which can decompose the inner product calculation neatly. Then, we define the security of our PSCS formally and prove that it is secure under same-closeness-pattern chosen-plaintext attacks (CLS-CPA) in theory. In addition, we demonstrate the efficiency and accuracy through analysis and experiments compared with existing schemes.
Xu, Jiahui, Wang, Chen, Li, Tingting, Xiang, Fengtao.  2020.  Improved Adversarial Attack against Black-box Machine Learning Models. 2020 Chinese Automation Congress (CAC). :5907–5912.
The existence of adversarial samples makes the security of machine learning models in practical application questioned, especially the black-box adversarial attack, which is very close to the actual application scenario. Efficient search for black-box attack samples is helpful to train more robust models. We discuss the situation that the attacker can get nothing except the final predict label. As for this problem, the current state-of-the-art method is Boundary Attack(BA) and its variants, such as Biased Boundary Attack(BBA), however it still requires large number of queries and kills a lot of time. In this paper, we propose a novel method to solve these shortcomings. First, we improved the algorithm for generating initial adversarial samples with smaller L2 distance. Second, we innovatively combine a swarm intelligence algorithm - Particle Swarm Optimization(PSO) with Biased Boundary Attack and propose PSO-BBA method. Finally, we experiment on ImageNet dataset, and compared our algorithm with the baseline algorithm. The results show that:(1)our improved initial point selection algorithm effectively reduces the number of queries;(2)compared with the most advanced methods, our PSO-BBA method improves the convergence speed while ensuring the attack accuracy;(3)our method has a good effect on both targeted attack and untargeted attack.
2021-05-18
Liu, Xiaodong, Chen, Zezong, Wang, Yuhao, Zhou, Fuhui, Ma, Shuai, Hu, Rose Qingyang.  2020.  Secure Beamforming Designs in MISO Visible Light Communication Networks with SLIPT. GLOBECOM 2020 - 2020 IEEE Global Communications Conference. :1–6.
Visible light communication (VLC) is a promising technique in the fifth and beyond wireless communication networks. In this paper, a secure multiple-input single-output VLC network is studied, where simultaneous lightwave information and power transfer (SLIPT) is exploited to support energy-limited devices taking into account a practical non-linear energy harvesting model. Specifically, the optimal beamforming design problems for minimizing transmit power and maximizing the minimum secrecy rate are studied under the imperfect channel state information (CSI). S-Procedure and a bisection search is applied to tackle challenging non-convex problems and to obtain efficient resource allocation algorithm. It is proved that optimal beamforming schemes can be obtained. It is found that there is a non-trivial trade-off between the average harvested power and the minimum secrecy rate. Moreover, we show that the quality of CSI has a significant impact on achievable performance.
2021-04-27
Chen, B., Wu, L., Li, L., Choo, K. R., He, D..  2020.  A Parallel and Forward Private Searchable Public-Key Encryption for Cloud-Based Data Sharing. IEEE Access. 8:28009–28020.
Data sharing through the cloud is flourishing with the development of cloud computing technology. The new wave of technology will also give rise to new security challenges, particularly the data confidentiality in cloud-based sharing applications. Searchable encryption is considered as one of the most promising solutions for balancing data confidentiality and usability. However, most existing searchable encryption schemes cannot simultaneously satisfy requirements for both high search efficiency and strong security due to lack of some must-have properties, such as parallel search and forward security. To address this problem, we propose a variant searchable encryption with parallelism and forward privacy, namely the parallel and forward private searchable public-key encryption (PFP-SPE). PFP-SPE scheme achieves both the parallelism and forward privacy at the expense of slightly higher storage costs. PFP-SPE has similar search efficiency with that of some searchable symmetric encryption schemes but no key distribution problem. The security analysis and the performance evaluation on a real-world dataset demonstrate that the proposed scheme is suitable for practical application.
2021-03-30
Khan, W. Z., Arshad, Q.-u-A., Hakak, S., Khan, M. K., Saeed-Ur-Rehman.  2020.  Trust Management in Social Internet of Things: Architectures, Recent Advancements and Future Challenges. IEEE Internet of Things Journal. :1—1.

Social Internet of Things (SIoT) is an extension of Internet of Things (IoT) that converges with Social networking concepts to create Social networks of interconnected smart objects. This convergence allows the enrichment of the two paradigms, resulting into new ecosystems. While IoT follows two interaction paradigms, human-to-human (H2H) and thing-to-thing (T2T), SIoT adds on human-to-thing (H2T) interactions. SIoT enables smart “Social objects” that intelligently mimic the social behavior of human in the daily life. These social objects are equipped with social functionalities capable of discovering other social objects in the surroundings and establishing social relationships. They crawl through the social network of objects for the sake of searching for services and information of interest. The notion of trust and trustworthiness in social communities formed in SIoT is still new and in an early stage of investigation. In this paper, our contributions are threefold. First, we present the fundamentals of SIoT and trust concepts in SIoT, clarifying the similarities and differences between IoT and SIoT. Second, we categorize the trust management solutions proposed so far in the literature for SIoT over the last six years and provide a comprehensive review. We then perform a comparison of the state of the art trust management schemes devised for SIoT by performing comparative analysis in terms of trust management process. Third, we identify and discuss the challenges and requirements in the emerging new wave of SIoT, and also highlight the challenges in developing trust and evaluating trustworthiness among the interacting social objects.

2021-03-22
shree, S. R., Chelvan, A. Chilambu, Rajesh, M..  2020.  Optimization of Secret Key using cuckoo Search Algorithm for ensuring data integrity in TPA. 2020 International Conference on Computer Communication and Informatics (ICCCI). :1–5.
Optimization plays an important role in many problems that expect the accurate output. Security of the data stored in remote servers purely based on secret key which is used for encryption and decryption purpose. Many secret key generation algorithms such as RSA, AES are available to generate the key. The key generated by such algorithms are need to be optimized to provide more security to your data from unauthorized users as well as from the third party auditors(TPA) who is going to verify our data for integrity purpose. In this paper a method to optimize the secret key by using cuckoo search algorithm (CSA) is proposed.
2021-03-01
Xiao, R., Li, X., Pan, M., Zhao, N., Jiang, F., Wang, X..  2020.  Traffic Off-Loading over Uncertain Shared Spectrums with End-to-End Session Guarantee. 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall). :1–5.
As a promising solution of spectrum shortage, spectrum sharing has received tremendous interests recently. However, under different sharing policies of different licensees, the shared spectrum is heterogeneous both temporally and spatially, and is usually uncertain due to the unpredictable activities of incumbent users. In this paper, considering the spectrum uncertainty, we propose a spectrum sharing based delay-tolerant traffic off-loading (SDTO) scheme. To capture the available heterogeneous shared bands, we adopt a mesh cognitive radio network and employ the multi-hop transmission mode. To statistically guarantee the end-to-end (E2E) session request under the uncertain spectrum supply, we formulate the SDTO scheme into a stochastic optimization problem, which is transformed into a mixed integer nonlinear programming (MINLP) problem. Then, a coarse-fine search based iterative heuristic algorithm is proposed to solve the MINLP problem. Simulation results demonstrate that the proposed SDTO scheme can well schedule the network resource with an E2E session guarantee.
2021-02-22
Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I. W., Lin, X..  2020.  I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :289–300.
Approximate nearest neighbour search (ANNS) in high dimensional space is a fundamental problem in many applications, such as multimedia database, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective to this problem, yet few of them are designed for external storage scenarios and hence do not optimized for I/O efficiency during the query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially, the sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.
Bashyam, K. G. Renga, Vadhiyar, S..  2020.  Fast Scalable Approximate Nearest Neighbor Search for High-dimensional Data. 2020 IEEE International Conference on Cluster Computing (CLUSTER). :294–302.
K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms for billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end where we use vantage point trees for partitioning the dataset across multiple processes and exploit an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) for searching locally within a process. Our hybrid MPI-OpenMP solution employs techniques including exploiting MPI one-sided communication for reducing communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries in the order of seconds using our approach on 8000 cores on a dataset with billion points in an 128-dimensional space. We also show 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating better suitability of our solution for high dimensional datasets. Our solution shows almost linear strong scaling.
Han, Z., Wang, F., Li, Z..  2020.  Research on Nearest Neighbor Data Association Algorithm Based on Target “Dynamic” Monitoring Model. 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC). 1:665–668.
In order to solve the problem that the Nearest Neighbor Data Association (NNDA) algorithm cannot detect the “dynamic” change of the target, this paper proposes the nearest neighbor data association algorithm based on the Targets “Dynamic” Monitoring Model (TDMM). Firstly, the gate searching and updating of targets are completed based on TDMM, then the NNDA algorithm is utilized to achieve the data association of targets to realize track updating. Finally, the NNDA algorithm based on TDMM is realized by simulation. The experimental results show that the algorithm proposed can achieve “dynamic” monitoring in multi-target data association, and have more obvious advantages than Multiple Hypothesis Tracking (MHT) in timeliness and association performance.
Oliver, J., Ali, M., Hagen, J..  2020.  HAC-T and Fast Search for Similarity in Security. 2020 International Conference on Omni-layer Intelligent Systems (COINS). :1–7.
Similarity digests have gained popularity for many security applications like blacklisting/whitelisting, and finding similar variants of malware. TLSH has been shown to be particularly good at hunting similar malware, and is resistant to evasion as compared to other similarity digests like ssdeep and sdhash. Searching and clustering are fundamental tools which help the security analysts and security operations center (SOC) operators in hunting and analyzing malware. Current approaches which aim to cluster malware are not scalable enough to keep up with the vast amount of malware and goodware available in the wild. In this paper, we present techniques which allow for fast search and clustering of TLSH hash digests which can aid analysts to inspect large amounts of malware/goodware. Our approach builds on fast nearest neighbor search techniques to build a tree-based index which performs fast search based on TLSH hash digests. The tree-based index is used in our threshold based Hierarchical Agglomerative Clustering (HAC-T) algorithm which is able to cluster digests in a scalable manner. Our clustering technique can cluster digests in O (n logn) time on average. We performed an empirical evaluation by comparing our approach with many standard and recent clustering techniques. We demonstrate that our approach is much more scalable and still is able to produce good cluster quality. We measured cluster quality using purity on 10 million samples obtained from VirusTotal. We obtained a high purity score in the range from 0.97 to 0.98 using labels from five major anti-virus vendors (Kaspersky, Microsoft, Symantec, Sophos, and McAfee) which demonstrates the effectiveness of the proposed method.
2020-12-11
Zhang, W., Byna, S., Niu, C., Chen, Y..  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83—92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, trie, self-balancing search trees, sparse array, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, efficient determination of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation on metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovery of the metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on real-world dataset suggests that trie is a suitable data structure when prefix/suffix query is required, otherwise hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offers insights in developing metadata indexing methodologies for scientific applications.

2020-12-01
Wang, S., Mei, Y., Park, J., Zhang, M..  2019.  A Two-Stage Genetic Programming Hyper-Heuristic for Uncertain Capacitated Arc Routing Problem. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :1606—1613.

Genetic Programming Hyper-heuristic (GPHH) has been successfully applied to automatically evolve effective routing policies to solve the complex Uncertain Capacitated Arc Routing Problem (UCARP). However, GPHH typically ignores the interpretability of the evolved routing policies. As a result, GP-evolved routing policies are often very complex and hard to be understood and trusted by human users. In this paper, we aim to improve the interpretability of the GP-evolved routing policies. To this end, we propose a new Multi-Objective GP (MOGP) to optimise the performance and size simultaneously. A major issue here is that the size is much easier to be optimised than the performance, and the search tends to be biased to the small but poor routing policies. To address this issue, we propose a simple yet effective Two-Stage GPHH (TS-GPHH). In the first stage, only the performance is to be optimised. Then, in the second stage, both objectives are considered (using our new MOGP). The experimental results showed that TS-GPHH could obtain much smaller and more interpretable routing policies than the state-of-the-art single-objective GPHH, without deteriorating the performance. Compared with traditional MOGP, TS-GPHH can obtain a much better and more widespread Pareto front.

Nam, C., Li, H., Li, S., Lewis, M., Sycara, K..  2018.  Trust of Humans in Supervisory Control of Swarm Robots with Varied Levels of Autonomy. 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC). :825—830.

In this paper, we study trust-related human factors in supervisory control of swarm robots with varied levels of autonomy (LOA) in a target foraging task. We compare three LOAs: manual, mixed-initiative (MI), and fully autonomous LOA. In the manual LOA, the human operator chooses headings for a flocking swarm, issuing new headings as needed. In the fully autonomous LOA, the swarm is redirected automatically by changing headings using a search algorithm. In the mixed-initiative LOA, if performance declines, control is switched from human to swarm or swarm to human. The result of this work extends the current knowledge on human factors in swarm supervisory control. Specifically, the finding that the relationship between trust and performance improved for passively monitoring operators (i.e., improved situation awareness in higher LOAs) is particularly novel in its contradiction of earlier work. We also discover that operators switch the degree of autonomy when their trust in the swarm system is low. Last, our analysis shows that operator's preference for a lower LOA is confirmed for a new domain of swarm control.

2020-11-23
Ma, S..  2018.  Towards Effective Genetic Trust Evaluation in Open Network. 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). :563–569.
In open network environments, since there is no centralized authority to monitor misbehaving entities, malicious entities can easily cause the degradation of the service quality. Trust has become an important factor to ensure network security, which can help entities to distinguish good partners from bad ones. In this paper, trust in open network environment is regarded as a self-organizing system, using self-organization principle of human social trust propagation, a genetic trust evaluation method with self-optimization and family attributes is proposed. In this method, factors of trust evaluation include time, IP, behavior feedback and intuitive trust. Data structure of access record table and trust record table are designed to store the relationship between ancestor nodes and descendant nodes. A genetic trust search algorithm is designed by simulating the biological evolution process. Based on trust information of the current node's ancestors, heuristics generate randomly chromosome populations, whose structure includes time, IP address, behavior feedback and intuitive trust. Then crossover and mutation strategy is used to make the population evolutionary searching. According to the genetic searching termination condition, the optimal trust chromosome in the population is selected, and trust value of the chromosome is computed, which is the node's genetic trust evaluation result. The simulation result shows that the genetic trust evaluation method is effective, and trust evaluation process of the current node can be regarded as the process of searching for optimal trust results from the ancestor nodes' information. With increasing of ancestor nodes' genetic trust information, the trust evaluation result from genetic algorithm searching is more accurate, which can effectively solve the joint fraud problem.
2020-11-20
Paul, S., Padhy, N. P., Mishra, S. K., Srivastava, A. K..  2019.  UUCA: Utility-User Cooperative Algorithm for Flexible Load Scheduling in Distribution System. 2019 8th International Conference on Power Systems (ICPS). :1—6.
Demand response analysis in smart grid deployment substantiated itself as an important research area in recent few years. Two-way communication between utility and users makes peak load reduction feasible by delaying the operation of deferrable appliances. Flexible appliance rescheduling is preferred to the users compared to traditional load curtailment. Again, if users' preferences are accounted into appliance transferring process, then customers concede a little discomfort to help the utility in peak reduction. This paper presents a novel Utility-User Cooperative Algorithm (UUCA) to lower total electricity cost and gross peak demand while preserving users' privacy and preferences. Main driving force in UUCA to motivate the consumers is a new cost function for their flexible appliances. As a result, utility will experience low peak and due to electricity cost decrement, users will get reduced bill. However, to maintain privacy, the behaviors of one customer have not be revealed either to other customers or to the central utility. To justify the effectiveness, UUCA is executed separately on residential, commercial and industrial customers of a distribution grid. Harmony search optimization technique has proved itself superior compared to other heuristic search techniques to prove efficacy of UUCA.
2020-09-04
Jing, Huiyun, Meng, Chengrui, He, Xin, Wei, Wei.  2019.  Black Box Explanation Guided Decision-Based Adversarial Attacks. 2019 IEEE 5th International Conference on Computer and Communications (ICCC). :1592—1596.
Adversarial attacks have been the hot research field in artificial intelligence security. Decision-based black-box adversarial attacks are much more appropriate in the real-world scenarios, where only the final decisions of the targeted deep neural networks are accessible. However, since there is no available guidance for searching the imperceptive adversarial perturbation, boundary attack, one of the best performing decision-based black-box attacks, carries out computationally expensive search. For improving attack efficiency, we propose a novel black box explanation guided decision-based black-box adversarial attack. Firstly, the problem of decision-based adversarial attacks is modeled as a derivative-free and constraint optimization problem. To solve this optimization problem, the black box explanation guided constrained random search method is proposed to more quickly find the imperceptible adversarial example. The insights into the targeted deep neural networks explored by the black box explanation are fully used to accelerate the computationally expensive random search. Experimental results demonstrate that our proposed attack improves the attack efficiency by 64% compared with boundary attack.
2020-07-20
Komargodski, Ilan, Naor, Moni, Yogev, Eylon.  2017.  White-Box vs. Black-Box Complexity of Search Problems: Ramsey and Graph Property Testing. 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS). :622–632.
Ramsey theory assures us that in any graph there is a clique or independent set of a certain size, roughly logarithmic in the graph size. But how difficult is it to find the clique or independent set? If the graph is given explicitly, then it is possible to do so while examining a linear number of edges. If the graph is given by a black-box, where to figure out whether a certain edge exists the box should be queried, then a large number of queries must be issued. But what if one is given a program or circuit for computing the existence of an edge? This problem was raised by Buss and Goldberg and Papadimitriou in the context of TFNP, search problems with a guaranteed solution. We examine the relationship between black-box complexity and white-box complexity for search problems with guaranteed solution such as the above Ramsey problem. We show that under the assumption that collision resistant hash function exist (which follows from the hardness of problems such as factoring, discrete-log and learning with errors) the white-box Ramsey problem is hard and this is true even if one is looking for a much smaller clique or independent set than the theorem guarantees. In general, one cannot hope to translate all black-box hardness for TFNP into white-box hardness: we show this by adapting results concerning the random oracle methodology and the impossibility of instantiating it. Another model we consider is the succinct black-box, where there is a known upper bound on the size of the black-box (but no limit on the computation time). In this case we show that for all TFNP problems there is an upper bound on the number of queries proportional to the description size of the box times the solution size. On the other hand, for promise problems this is not the case. Finally, we consider the complexity of graph property testing in the white-box model. We show a property which is hard to test even when one is given the program for computing the graph. The hard property is whether the graph is a two-source extractor.
2020-06-12
Domniţa, Dan, Oprişa, Ciprian.  2018.  A genetic algorithm for obtaining memory constrained near-perfect hashing. 2018 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). :1—6.

The problem of fast items retrieval from a fixed collection is often encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on both minimizing the number of comparisons performed during the search and minimizing the total collection size. The standard open-addressing double-hashing approach is improved with a non-linear transformation that can be parametrized in order to ensure a uniform distribution of the data in the hash table. The optimal parameter is determined using a genetic algorithm. The paper results show that near-perfect hashing is faster than binary search, yet uses less memory than perfect hashing, being a good choice for memory-constrained applications where search time is also critical.

[Anonymous].  2018.  Discrete Locally-Linear Preserving Hashing. {2018 25th IEEE International Conference on Image Processing (ICIP). :490—494.

Recently, hashing has attracted considerable attention for nearest neighbor search due to its fast query speed and low storage cost. However, existing unsupervised hashing algorithms have two problems in common. Firstly, the widely utilized anchor graph construction algorithm has inherent limitations in local weight estimation. Secondly, the locally linear structure in the original feature space is seldom taken into account for binary encoding. Therefore, in this paper, we propose a novel unsupervised hashing method, dubbed “discrete locally-linear preserving hashing”, which effectively calculates the adjacent matrix while preserving the locally linear structure in the obtained hash space. Specifically, a novel local anchor embedding algorithm is adopted to construct the approximate adjacent matrix. After that, we directly minimize the reconstruction error with the discrete constrain to learn the binary codes. Experimental results on two typical image datasets indicate that the proposed method significantly outperforms the state-of-the-art unsupervised methods.

2020-05-22
Varricchio, Valerio, Frazzoli, Emilio.  2018.  Asymptotically Optimal Pruning for Nonholonomic Nearest-Neighbor Search. 2018 IEEE Conference on Decision and Control (CDC). :4459—4466.
Nearest-Neighbor Search (NNS) arises as a key component of sampling-based motion planning algorithms and it is known as their asymptotic computational bottleneck. Algorithms for exact Nearest-Neighbor Search rely on explicit distance comparisons to different extents. However, in motion planning, evaluating distances is generally a computationally demanding task, since the metric is induced by the minimum cost of steering a dynamical system between states. In the presence of driftless nonholonomic constraints, we propose efficient pruning techniques for the k-d tree algorithm that drastically reduce the number of distance evaluations performed during a query. These techniques exploit computationally convenient lower and upper bounds to the geodesic distance of the corresponding sub-Riemannian geometry. Based on asymptotic properties of the reachable sets, we show that the proposed pruning techniques are optimal, modulo a constant factor, and we provide experimental results with the Reeds-Shepp vehicle model.
Yan, Donghui, Wang, Yingjie, Wang, Jin, Wang, Honggang, Li, Zhenpeng.  2018.  K-nearest Neighbor Search by Random Projection Forests. 2018 IEEE International Conference on Big Data (Big Data). :4775—4781.
K-nearest neighbor (kNN) search has wide applications in many areas, including data mining, machine learning, statistics and many applied domains. Inspired by the success of ensemble methods and the flexibility of tree-based methodology, we propose random projection forests, rpForests, for kNN search. rpForests finds kNNs by aggregating results from an ensemble of random projection trees with each constructed recursively through a series of carefully chosen random projections. rpForests achieves a remarkable accuracy in terms of fast decay in the missing rate of kNNs and that of discrepancy in the kNN distances. rpForests has a very low computational complexity. The ensemble nature of rpForests makes it easily run in parallel on multicore or clustered computers; the running time is expected to be nearly inversely proportional to the number of cores or machines. We give theoretical insights by showing the exponential decay of the probability that neighboring points would be separated by ensemble random projection trees when the ensemble size increases. Our theory can be used to refine the choice of random projections in the growth of trees, and experiments show that the effect is remarkable.
Kang, Hyunjoong, Hong, Sanghyun, Lee, Kookjin, Park, Noseong, Kwon, Soonhyun.  2018.  On Integrating Knowledge Graph Embedding into SPARQL Query Processing. 2018 IEEE International Conference on Web Services (ICWS). :371—374.
SPARQL is a standard query language for knowledge graphs (KGs). However, it is hard to find correct answer if KGs are incomplete or incorrect. Knowledge graph embedding (KGE) enables answering queries on such KGs by inferring unknown knowledge and removing incorrect knowledge. Hence, our long-term goal in this line of research is to propose a new framework that integrates KGE and SPARQL, which opens various research problems to be addressed. In this paper, we solve one of the most critical problems, that is, optimizing the performance of nearest neighbor (NN) search. In our evaluations, we demonstrate that the search time of state-of-the-art NN search algorithms is improved by 40% without sacrificing answer accuracy.
Chen, Yalin, Li, Zhiyang, Shi, Jia, Liu, Zhaobin, Qu, Wenyu.  2018.  Stacked K-Means Hashing Quantization for Nearest Neighbor Search. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1—4.
Nowadays, with such a huge amount of information available online, one key challenge is how to retrieve target data efficiently. A recent state-of-art solution, k-means hashing (KMH), codes data via a string of binary code obtained by iterative k-means clustering and binary code optimizing. To deal with high dimensional data, KMH divides the space into low-dimensional subspaces, places a hypercube in each subspace and finds its proper location by the mentioned optimizing process. However, the complexity of the optimization increases rapidly when the dimension of the hypercube increases. To address this issue, we propose an improved hashing method stacked k-means hashing (SKMH). The main idea is to increase the approximation by a coarse-to-fine multi-layer lower-dimensional cubes. With these kinds of lower-dimensional cubes, SKMH can achieve a similar approximation ability via a less optimizing time, compared with KMH method using higher-dimensional cubes. Extensive experiments have been conducted on two public databases, demonstrating the performance of our method by some common metrics in fast nearest neighbor search.
Ito, Toshitaka, Itotani, Yuri, Wakabayashi, Shin'ichi, Nagayama, Shinobu, Inagi, Masato.  2018.  A Nearest Neighbor Search Engine Using Distance-Based Hashing. 2018 International Conference on Field-Programmable Technology (FPT). :150—157.
This paper proposes an FPGA-based nearest neighbor search engine for high-dimensional data, in which nearest neighbor search is performed based on distance-based hashing. The proposed hardware search engine implements a nearest neighbor search algorithm based on an extension of flexible distance-based hashing (FDH, for short), which finds an exact solution with high probability. The proposed engine is a parallel processing and pipelined circuit so that search results can be obtained in a short execution time. Experimental results show the effectiveness and efficiency of the proposed engine.