I have a bunch of legacy JavaScript files that look really similar. I’d like to implement a copy/paste detection tool, but I have been unable to find a description of an algorithm…
I’m already using Sonar with the JavaScript plugin to detect this kind of code, but I’d like finer-grained control over the detection…
Is there a “standard” algorithm for this problem?
Is there a library that performs this analysis (Python or Java…)?
Thanks.
You could take a look at CloneDigger. It is designed to detect clones in Python or Java code rather than JavaScript, but the algorithm is described here.
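If you want to experiment before adopting a tool, here is a minimal sketch of one common line-based approach. To be clear, this is not CloneDigger's algorithm, just an illustration: it normalizes each line (identifiers, numbers and strings are replaced by placeholders) and then indexes sliding windows of consecutive lines, so windows that appear in more than one place are reported as clone candidates. The names `normalize`, `find_clones`, `KEYWORDS` and the `window` parameter are my own, the keyword list is deliberately incomplete, and comment handling is naive (only `//`, and it does not check whether `//` is inside a string).

```python
import re
from collections import defaultdict

# Small, incomplete keyword list (an assumption for this sketch).
KEYWORDS = {"function", "var", "let", "const", "return", "if", "else",
            "for", "while", "new", "this", "typeof", "null", "true", "false"}

# One pass over the line: string literals, numbers, then words.
TOKEN = re.compile(
    r"""(?P<s>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|(?P<n>\d+(?:\.\d+)?)|(?P<w>[A-Za-z_$][\w$]*)"""
)


def _placeholder(match):
    """Map a token to a placeholder so renamed copies look identical."""
    if match.lastgroup == "s":
        return "S"
    if match.lastgroup == "n":
        return "N"
    word = match.group("w")
    return word if word in KEYWORDS else "I"


def normalize(line):
    """Drop a trailing // comment (naively), abstract tokens, strip whitespace."""
    code = line.split("//", 1)[0]
    return re.sub(r"\s+", "", TOKEN.sub(_placeholder, code))


def find_clones(sources, window=3):
    """sources maps file name -> file text.

    Returns a list of groups; each group lists the (file, start_line)
    positions where the same normalized window of `window` lines occurs.
    """
    index = defaultdict(list)
    for name, text in sources.items():
        lines = [(no, normalize(raw))
                 for no, raw in enumerate(text.splitlines(), 1)]
        lines = [(no, norm) for no, norm in lines if norm]  # skip empty lines
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(norm for _, norm in lines[i:i + window])
            index[chunk].append((name, lines[i][0]))
    return [places for places in index.values() if len(places) > 1]
```

You would call `find_clones` with a dict of file contents and tune `window` (and the normalization rules) to get the finer-grained control you are after. Overlapping windows are reported separately, so a real tool would merge adjacent hits into one larger clone.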