Improvement of Deep Cross-Modal Retrieval System through Semantic Preserving Binary Hash Code Generation

Nikita Bhatt, Dr. Amit Ganatra

Abstract

Data that is semantically consistent but has distinct statistical properties is called multi-modal data, and combining such data in one common space is a crucial task. Retrieving information of interest from multi-modal data according to need and demand is an open research challenge, which has given rise to cross-modal retrieval. This paper discusses various approaches to cross-modal retrieval in which hashing is used for faster retrieval. In these approaches, however, feature generation and hash code generation are independent of each other, which degrades the performance of the system. To make the retrieval process efficient, deep networks are used to generate both the features and the hash codes, and such deep cross-modal retrieval methods are also discussed. Existing deep cross-modal systems use the bag-of-words (BoW) model to map words into vectors, which are sparse and do not preserve the semantic similarity between words. To resolve this problem, a prediction-based model called word2vec is used. In addition, existing work assumes that the binary codes generated for each modality have equal length, which is not always possible. Cosine similarity, which normalizes the vectors, is therefore used. Experiments are performed with improved deep cross-modal retrieval (IDCMR) on the MIRFLICKR-25K, NUS-WIDE-10k and XMediaNet datasets, which contain image and text modalities. The results are compared with state-of-the-art methods and show an improvement in both image query → text database retrieval and text query → image database retrieval.
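As a minimal illustration of the similarity measure mentioned in the abstract, the Python sketch below ranks text hash codes against an image hash code by cosine similarity. It is not the IDCMR implementation: the function names, the sign-based binarization, the 8-bit code length and the feature values are all assumptions made for this example. For equal-length codes in {-1, +1}, cosine similarity is a linear function of Hamming distance, cos = 1 - 2d/k for code length k, which the last line checks.

```python
import numpy as np

def binarize(features):
    """Map real-valued features to {-1, +1} codes by sign (zeros map to +1)."""
    return np.where(np.asarray(features) >= 0, 1, -1).astype(np.int8)

def cosine_similarity(query_code, db_codes):
    """Cosine similarity between one code and each row of a code matrix."""
    q = query_code.astype(np.float64)
    d = db_codes.astype(np.float64)
    return (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))

def hamming_distance(query_code, db_codes):
    """Number of differing bits between the query and each database code."""
    return np.count_nonzero(db_codes != query_code, axis=1)

# Hypothetical network outputs: one image query and three text entries.
image_code = binarize([0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8])
text_codes = binarize([
    [0.8, -0.1, 0.5, -0.6, 0.2, 0.4, -0.3, 0.7],     # same signs as the query
    [0.8, -0.1, -0.5, 0.6, 0.2, 0.4, -0.3, 0.7],     # two bits flipped
    [-0.8, 0.1, -0.5, 0.6, -0.2, -0.4, 0.3, -0.7],   # all bits flipped
])

scores = cosine_similarity(image_code, text_codes)   # image query -> text database
ranking = np.argsort(-scores)                        # most similar entry first

# Relation to Hamming distance for equal-length {-1, +1} codes.
k = image_code.size
assert np.allclose(scores, 1 - 2 * hamming_distance(image_code, text_codes) / k)
```

Because the scores are normalized by the vector norms, they always lie in [-1, 1] regardless of the code length chosen.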