
Plagiarism-detecting software: What's the magic number?

By Alejandra Arreola Triana | Dec. 1, 2021 | Research skills, Ethics

Recently, a student asked me to check his doctoral dissertation using plagiarism-detecting software. "The similarity report cannot be higher than 15%," he told me.

The use of plagiarism-detecting software such as Turnitin and iThenticate is becoming more widespread in academia. These programs scan the Internet and their proprietary databases for identical passages of text. On their websites and in their training materials, these companies state that they detect levels of similarity rather than plagiarism. But what is the difference? And is it useful to assign a number to decide whether a text contains plagiarism?

Plagiarism-detecting software identifies and flags sections of text that closely match previously published passages, but it cannot tell whether those sections are plagiarized. Similarity and plagiarism do not always coincide: paraphrased text may be considered plagiarized if it is not properly attributed, whereas common phrases with identical wording are not necessarily plagiarized.

I asked my Master's adviser, Dr. Barbara Gastel, whether it is practical to use a cutoff to determine whether a text contains plagiarism. She told me that "[o]ne really can't set a percentage overlap as the cutoff. For example, a 500-page book that contains one line of poetry without attributing it contains plagiarism. On the other hand, a scientific paper with many pieces of wording such as 'We randomly assigned patients to three groups,' 'The tubes were centrifuged,' 'Please see Table 1,' 'These differences were not statistically significant,' and 'More research is needed' might have a substantial percentage overlap with other papers but does not contain plagiarism."
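To make this point concrete, here is a toy sketch in Python. It is not how Turnitin or iThenticate work internally (their methods are proprietary). It simply assumes a naive "similarity percentage": the share of three-word sequences in a submission that also appear in earlier texts. The function names, the sample sentences, and the filler "book" are all made up for illustration.

```python
import re

def shingles(text, n=3):
    """Return the set of n-word sequences in text, lowercased, punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity_percent(submission, sources, n=3):
    """Percentage of the submission's n-word sequences found in any source.

    A deliberately naive measure (an assumption of this sketch), not a
    vendor's algorithm.
    """
    sub = shingles(submission, n)
    if not sub:
        return 0.0
    known = set()
    for source in sources:
        known |= shingles(source, n)
    return 100.0 * len(sub & known) / len(sub)

# Hypothetical "previously published" texts.
published = [
    "We randomly assigned patients to three groups.",
    "The tubes were centrifuged.",
    "These differences were not statistically significant.",
    "More research is needed.",
    "Shall I compare thee to a summer's day?",
]

# A methods-style passage built only from stock scientific phrases.
stock_passage = (
    "We randomly assigned patients to three groups. "
    "The tubes were centrifuged. "
    "These differences were not statistically significant. "
    "More research is needed."
)

# A long stand-in for original writing, plus one unattributed line of poetry.
original_words = " ".join(f"word{i}" for i in range(2000))
book = original_words + " Shall I compare thee to a summer's day?"

stock_score = similarity_percent(stock_passage, published)
book_score = similarity_percent(book, published)

print(f"Stock phrases: {stock_score:.1f}% similar")
print(f"Book with one copied line: {book_score:.1f}% similar")
```

In this sketch, the passage made of stock phrases scores far above a 15% cutoff even though nothing in it is plagiarized, while the "book" with the unattributed line of poetry scores under 1%. Any single threshold would get both cases wrong.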

In science, the key to avoiding plagiarism is to both paraphrase and cite your sources. Some things, however, are difficult to paraphrase. For example, definitions can only be paraphrased so much before they become something else entirely. The same goes for methodologies: there are only so many ways to describe a technique unambiguously. Plagiarism-detection software cannot make these distinctions, so it is impractical to use it to judge summarily whether a piece of writing is plagiarized. Every case must be analyzed by a human.
