Performance Evaluation and Comparative Analysis of Code-Clone-Detection Techniques and Tools


  • Harpreet Kaur, Raman Maini


Because code cloning is an active area of research in software engineering, it is important to have a good understanding of the available code-clone-detection techniques. Clones increase maintenance cost and lead to poor software quality. This paper combines two parts: a literature review of code-clone-detection techniques, and experimental work that evaluates techniques chosen from that literature. The paper first lists the various studies and then evaluates the performance of three chosen techniques (text-based, token-based and tree-based) by means of automated tools. The Netbeans-Javadoc, JBoss and Java-Quizz source codes have been examined to validate the results. From the analysis it has been observed that the token-based approach reports more false positives than the other techniques. The text-based and token-based approaches have higher precision values than the tree-based approach, but the tree-based approach has higher recall values. Token-based, tree-based and metric-based approaches are useful in combination with refactoring tools. In terms of speed, the text-based approach is suitable for small projects, whereas the token-based technique also scales to large projects. Tree-based and token-based techniques work effectively to detect near-miss clones and give more reliable results. The DuDe, ccFinder, solid-SDD and cloneDr tools have been used for validation. From the experimental work it has been observed that DuDe is suitable for small projects, whereas ccFinder scales from small to large projects. ccFinder reports false positives because of its token-based approach, while cloneDr reports fewer false positives than ccFinder. The aim of the paper is to identify the strengths and weaknesses of these techniques, which will help in selecting a clone detection technique for a particular purpose.
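
As a rough illustration of the difference between the text-based and token-based views compared above, the following minimal Python sketch fingerprints small Java-like fragments in both ways. It is not how DuDe or ccFinder work internally; the function names, the tiny keyword set and the sample fragments are assumptions made only for this example.

```python
import re

# Assumption: a tiny keyword set, enough for the sample fragments below.
KEYWORDS = {"int", "for", "if", "else", "while", "return"}

# Identifiers/keywords, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def text_fingerprint(code):
    """Text-based view: non-empty lines with whitespace collapsed, otherwise verbatim."""
    return [" ".join(line.split()) for line in code.splitlines() if line.strip()]

def token_fingerprint(code):
    """Token-based view: identifiers and numbers are replaced by placeholders."""
    out = []
    for tok in TOKEN_RE.findall(code):
        if tok in KEYWORDS:
            out.append(tok)          # keywords are kept as they are
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")         # any identifier
        elif tok.isdigit():
            out.append("NUM")        # any integer literal
        else:
            out.append(tok)          # punctuation and operators
    return out

# Hypothetical fragments: b renames the variables of a; c also changes the
# literals and adds a different variable inside the loop body.
a = "int total = 0;\nfor (int i = 0; i < n; i++) total += i;"
b = "int sum = 0;\nfor (int k = 0; k < len; k++)   sum += k;"
c = "int count = 1;\nfor (int j = 5; j < m; j++) count += x;"

print(text_fingerprint(a) == text_fingerprint(b))    # renamed copy, text view
print(token_fingerprint(a) == token_fingerprint(b))  # renamed copy, token view
print(token_fingerprint(a) == token_fingerprint(c))  # looser match, token view
```

In this sketch the renamed copy is missed by the line-by-line text comparison but matched after token normalization, which is the near-miss case. The same normalization also matches the third fragment, whose loop body does something different; matches of this kind are one plausible source of the additional false positives attributed to the token-based approach.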