Hybrid-attention based Feature-reconstructive Adversarial Hashing Networks for Cross-modal Retrieval
Abstract
With the rapid growth of data across modalities, retrieval tasks increasingly rely on cross-modal rather than single-modal retrieval methods. Such methods must store data efficiently while supporting fast queries. Because hashing represents the original high-dimensional data as simple, compact binary codes, which greatly reduces storage cost and facilitates mutual retrieval across modalities, cross-modal hashing retrieval has become a popular research topic in recent years. However, bridging the semantic gap between modalities to further improve retrieval performance remains a challenging problem. To address this problem, we propose Hybrid-attention based Feature-reconstructive Adversarial Hashing (HFAH) networks for cross-modal retrieval. First, a label semantic guidance module is introduced to guide the extraction of text and image features by learning from labels, so that the semantic similarity between data of different modalities is fully preserved. Then, a hybrid-attention module is introduced so that the extracted features carry richer semantic information. Finally, a feature reconstruction network is used to ensure that similar cross-modal data pairs are more strongly correlated than dissimilar pairs. Experiments on two benchmark datasets confirm that HFAH outperforms several existing cross-modal retrieval methods.
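The abstract describes the hybrid-attention module only at a high level. The snippet below is a minimal sketch of one common way to realize hybrid attention (channel attention followed by spatial attention) in PyTorch; the class names, feature dimensions, and reduction ratio are illustrative assumptions and do not reproduce the exact HFAH architecture.

```python
# Minimal illustrative sketch of a hybrid-attention block, assuming a
# CBAM-style combination of channel attention and spatial attention over
# image feature maps. Names and hyperparameters are assumptions, not the
# authors' implementation.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Reweights feature channels using average- and max-pooled statistics."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # (B, C) from global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))       # (B, C) from global max pooling
        weights = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * weights


class SpatialAttention(nn.Module):
    """Highlights informative spatial locations with a single-channel mask."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)       # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)      # (B, 1, H, W)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask


class HybridAttention(nn.Module):
    """Applies channel attention followed by spatial attention."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.spatial(self.channel(x))


if __name__ == "__main__":
    feats = torch.randn(4, 512, 7, 7)           # e.g., CNN feature maps
    refined = HybridAttention(512)(feats)
    print(refined.shape)                        # torch.Size([4, 512, 7, 7])
```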