This is a simple Python script that:
-
Finds all CSS files within a given directory (and its subdirectories).
-
Creates permutations of pairs.
Note: This means that File A will be compared to File B, but the script will be prevented from comparing File B to File A (i.e. doing the same work a second time).
-
Compares files contents for similarity.
-
Sorts the output from highest percentage of similarity to lowest.
-
Calculates the average percentage of similarity between all pairings.
-
Outputs everything to a
results.txtfile.
It was written to compare student assignment submissions for large projects and to help flag overly similar code. If two assignments are flagged as sharing a large amount of the same code, they should be manually reviewed for said similarities.
The path for the output is the users desktop. This should work across most operating systems (Linux, macOS, Windows).
-
The error handling is extremely minimal (a simple loop with a try/catch block).
-
If two files are empty (or nearly empty), there can be a false positive. Again, this doesn't indicate plagiarism, and is why everything should be manually checked.