Robust Hashing With Local Models for Approximate Similarity Search

Title: Robust Hashing With Local Models for Approximate Similarity Search
Publication Type: Journal Article
Year of Publication: 2014
Authors: Jingkuan Song, Yi Yang, Xuelong Li, Zi Huang, Yang Yang
Journal: Cybernetics, IEEE Transactions on
Date Published: July
Keywords: approximate similarity search; binary hash codes; computational complexity; database point; Databases; dimensionality curse; feature dimensionality; file organisation; high-dimensional data; high-dimensional data point mapping; indexing; l2,1-norm minimization; Laplace equations; Linear programming; local hashing model; local structural information; loss function; Nickel; optimal hash code; query data point; query hash code; query processing; real-life datasets; RHLM; robust hash function learning; robust hashing; robust hashing-with-local models; Robustness; search efficiency; search quality; Training; Training data; training data points; training dataset

Similarity search plays an important role in many applications involving high-dimensional data. Because of the well-known curse of dimensionality, the performance of most existing indexing structures degrades quickly as the feature dimensionality increases. Hashing methods, such as locality-sensitive hashing (LSH) and its variants, have been widely used to achieve fast approximate similarity search by trading search quality for efficiency. However, most existing hashing methods use randomized algorithms to generate hash codes without considering the specific structural information in the data. In this paper, we propose a novel hashing method, robust hashing with local models (RHLM), which learns a set of robust hash functions that map high-dimensional data points into binary hash codes by effectively utilizing local structural information. In RHLM, a local hashing model is learned for each individual data point in the training dataset and used to predict the hash codes of its neighboring data points. The local models from all the data points are globally aligned so that an optimal hash code can be assigned to each data point. After obtaining the hash codes of all the training data points, we design a robust method that employs l2,1-norm minimization on the loss function to learn effective hash functions, which are then used to map each database point into its hash code. Given a query data point, the search process first maps it into the query hash code using the hash functions and then explores the buckets whose hash codes are similar to the query hash code. Extensive experiments on real-life datasets show that the proposed RHLM outperforms state-of-the-art methods in terms of search quality and efficiency.
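
The query procedure and the l2,1-norm mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the RHLM implementation: the abstract does not state the form of the learned hash functions, so the sketch assumes each one is a thresholded linear projection, and every name in it (`l21_norm`, `hash_code`, `build_index`, `codes_within`, `search`) and the toy data are hypothetical. Bucket exploration is approximated by probing all codes within a small Hamming radius of the query code.

```python
import math
from collections import defaultdict
from itertools import combinations


def l21_norm(matrix):
    """l2,1-norm: the sum of the Euclidean norms of the rows of `matrix`."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in matrix)


def hash_code(point, projections, biases):
    """Map a point to a binary code, one bit per hash function.

    Assumption: each hash function is sign(w . x + b); the paper's
    learned functions may take a different form.
    """
    bits = []
    for w, b in zip(projections, biases):
        score = sum(wi * xi for wi, xi in zip(w, point)) + b
        bits.append(1 if score > 0 else 0)
    return tuple(bits)


def build_index(points, projections, biases):
    """Group database point indices into buckets keyed by hash code."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[hash_code(point, projections, biases)].append(idx)
    return buckets


def codes_within(code, radius):
    """Yield every binary code within Hamming distance `radius` of `code`."""
    for r in range(radius + 1):
        for flipped in combinations(range(len(code)), r):
            candidate = list(code)
            for i in flipped:
                candidate[i] ^= 1
            yield tuple(candidate)


def search(query, buckets, projections, biases, radius=1):
    """Hash the query, then collect candidates from nearby buckets."""
    query_code = hash_code(query, projections, biases)
    candidates = []
    for code in codes_within(query_code, radius):
        candidates.extend(buckets.get(code, []))
    return candidates


# Toy data: two hypothetical hash functions over 2-D points.
projections = [(1.0, 0.0), (0.0, 1.0)]
biases = [0.0, 0.0]
database = [(1.0, 1.0), (2.0, 3.0), (-1.0, -1.0), (-2.0, 1.0)]
index = build_index(database, projections, biases)
exact_bucket = search((0.5, 2.0), index, projections, biases, radius=0)
nearby_buckets = search((0.5, 2.0), index, projections, biases, radius=1)
```

With this toy data, `radius=0` returns only the points that share the query's bucket, while `radius=1` also returns points whose codes differ from the query code in one bit. The candidates would normally be re-ranked by their true distance to the query, a step the sketch leaves out.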

Citation Key: 6714849