Comparative Analysis of Similarity Measures for Extraction of Parallel Data

  • Manpreet Singh Lehal et al.

Abstract

Similarity and distance measures reduce the comparison of two documents or sentences to a single numeric value that expresses their degree of parallelism or their distance from one another. Researchers have used a number of similarity measures, but their relative effectiveness is not well established. Selecting the right similarity measure is crucial to the performance of translation tasks and the extraction of parallel data. In this paper, we analyze and compare the performance of four similarity and distance measures. Specifically, we present an empirical analysis of Cosine Similarity, Jaccard Coefficient, Hamming Distance, and Euclidean Distance.
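
As a rough illustration of the four measures named above, the following Python sketch computes each one for a pair of tokenized sentences. It uses the standard textbook definitions over bag-of-words count vectors, which is an assumption: the abstract does not state the exact formulation, weighting, or preprocessing used in the paper. The helper names (`vectorize`, `cosine_similarity`, and so on) and the two toy sentences are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def vectorize(a_tokens, b_tokens):
    """Build aligned term-count vectors over the shared vocabulary (assumed representation)."""
    vocab = sorted(set(a_tokens) | set(b_tokens))
    count_a, count_b = Counter(a_tokens), Counter(b_tokens)
    return [count_a[w] for w in vocab], [count_b[w] for w in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard_coefficient(a_tokens, b_tokens):
    """Size of the intersection of the token sets divided by the size of their union."""
    a, b = set(a_tokens), set(b_tokens)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def hamming_distance(u, v):
    """Number of vocabulary positions where term presence differs (binary view of the vectors)."""
    return sum((x > 0) != (y > 0) for x, y in zip(u, v))


def euclidean_distance(u, v):
    """Straight-line distance between the two count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy sentences, made up for illustration only.
s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the rug".split()
u, v = vectorize(s1, s2)

print("cosine   :", round(cosine_similarity(u, v), 3))
print("jaccard  :", round(jaccard_coefficient(s1, s2), 3))
print("hamming  :", hamming_distance(u, v))
print("euclidean:", round(euclidean_distance(u, v), 3))
```

Note that the first two measures are similarities (higher means more alike), while the last two are distances (lower means more alike), so their raw values are not directly comparable without some normalization.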

Published
2019-12-12
How to Cite
Lehal, M. S., et al. (2019). Comparative Analysis of Similarity Measures for Extraction of Parallel Data. International Journal of Control and Automation, 12(6), 408-417. Retrieved from https://sersc.org/journals/index.php/IJCA/article/view/2947