I have already developed typing software in PHP and MySQL to capture the text typed by candidates at my institutes. I am now stuck on how to compare the text typed by each candidate with the standard paragraph I gave them to type (as a hard copy, though the same text is also stored in the MySQL database). My dilemma is whether to run the Levenshtein distance algorithm in PHP or directly in MySQL for the best performance. I am also concerned that an implementation in PHP might produce errors when evaluating the texts. It is worth mentioning that the texts will be compared in order to rank candidates on the basis of words typed per minute.
The simplest solution would be to use PHP’s built-in levenshtein() function to compare the two blocks of text. If you wanted to push the processing off to the MySQL database, you could implement the solution listed in the Stack Overflow question “Levenshtein: MySQL + PHP”. Another PHP option might be the similar_text() function.
The unfortunate drawback of the PHP levenshtein() function is that, in PHP versions before 8.0, it cannot handle strings longer than 255 characters; the PHP manual documents this limit.
So, if your paragraphs are longer than that, you may be forced to implement a MySQL solution, though I suppose you could break the paragraphs up into 255-character blocks for comparison (I can’t say definitively that this won’t “break” the Levenshtein algorithm).
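One way to sidestep the length limit without cutting the text into fixed-size blocks is to compute the edit distance over words rather than characters. The following is only a minimal sketch under stated assumptions, not a tested grading routine: wordLevenshtein() is a hypothetical helper name (it is not a PHP built-in), words are assumed to be separated by whitespace, and insertions, deletions and substitutions all cost 1.

```php
<?php
// Hypothetical helper (not part of PHP): Levenshtein distance counted in words.
function wordLevenshtein(string $expected, string $typed): int
{
    // Split on runs of whitespace and drop empty tokens.
    $a = preg_split('/\s+/', trim($expected), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($typed), -1, PREG_SPLIT_NO_EMPTY);
    $m = count($a);
    $n = count($b);

    // $prev[$j] holds the distance between the words of $a processed so far
    // and the first $j words of $b. Only two rows are kept in memory.
    $prev = range(0, $n);
    for ($i = 1; $i <= $m; $i++) {
        $curr = [$i];
        for ($j = 1; $j <= $n; $j++) {
            $cost = ($a[$i - 1] === $b[$j - 1]) ? 0 : 1;
            $curr[$j] = min(
                $prev[$j] + 1,         // word missing from the typed text
                $curr[$j - 1] + 1,     // extra word in the typed text
                $prev[$j - 1] + $cost  // same word, or one word replaced
            );
        }
        $prev = $curr;
    }
    return $prev[$n];
}
```

Because the comparison is case-sensitive and treats any difference inside a word as one wrong word, the result reads as a count of wrong, missing or extra words, which may suit a words-per-minute ranking better than a character count. Whether that is the right scoring rule is a judgment call for the exam, not something this sketch settles.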
I’m not an expert in linguistic parsing and processing, so I can’t speak to whether these are the best solutions (as you mention in your question). They are, however, very straightforward and simple to implement.
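To show how little code the two built-in functions need, here is a small sketch. The sample sentences and variable names are made up for illustration, and the strings are kept short so that the example also runs on PHP versions with the 255-character limit.

```php
<?php
// Illustrative data only: the standard paragraph and one candidate's attempt.
$standard = 'The quick brown fox jumps over the lazy dog.';
$typed    = 'The quick brwn fox jumps over the lazy dog.';

// Number of single-character insertions, deletions or replacements
// needed to turn one string into the other.
$distance = levenshtein($standard, $typed);

// similar_text() returns the number of matching characters and, through
// its optional third argument, writes a similarity percentage.
$matching = similar_text($standard, $typed, $percent);

printf("distance=%d, matching=%d, similarity=%.1f%%\n", $distance, $matching, $percent);
```

A lower distance or a higher percentage means the typed text is closer to the standard paragraph; how either number should feed into a ranking is left open here.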