I just had a quick check of the site you linked to. Their algorithm appears to boil down to “longer comment == higher quality”. Not exactly a sophisticated algorithm. For example, this
asklfklasf kajslkjf akjs flkajsfklajs fkjaskfj aklsjf kajsfk ajskfj alksjf aklsjfkl asfjaklsjf
was given their top quality rating…
Some ideas to make this better:
- Check spelling (misspelled words reduce quality).
- Check for swear words and other profanity.
- Length is probably important, but I wouldn’t put much weight on it.
- Grammar would be good to check, although difficult.
- Running a spam filter over it would be a good first step.
Those are just some ideas. For spelling and profanity, just check each word against a dictionary (or a list of banned words, for profanity). Grammar is harder, since it takes you into natural language processing, which is a very deep area of research.
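To make the dictionary idea concrete, here is a rough sketch in Python. The word lists, the weights and the `quality_score` name are all made up for illustration; a real version would load a full dictionary file and a maintained profanity list, and the weights would need tuning against real comments.

```python
import re

# Hypothetical, tiny word lists for illustration only.
# A real check would load a full dictionary and a proper profanity list.
KNOWN_WORDS = {"this", "is", "a", "great", "article", "thanks", "for", "the", "damn"}
PROFANITY = {"damn"}


def tokenize(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def quality_score(text, known_words=KNOWN_WORDS, profanity=PROFANITY):
    """Return a score in [0, 1]; the weights are arbitrary assumptions."""
    words = tokenize(text)
    if not words:
        return 0.0

    misspelled = sum(1 for w in words if w not in known_words)
    profane = sum(1 for w in words if w in profanity)

    # Fraction of words found in the dictionary.
    spelling_ratio = 1.0 - misspelled / len(words)
    # Fraction of words that are profane.
    profanity_ratio = profane / len(words)
    # Length counts for something, but only a little (capped at 50 words).
    length_bonus = min(len(words), 50) / 50 * 0.1

    score = spelling_ratio * 0.9 + length_bonus - profanity_ratio
    return max(0.0, min(1.0, score))


if __name__ == "__main__":
    print(quality_score("asklfklasf kajslkjf akjs flkajsfklajs"))
    print(quality_score("This is a great article, thanks!"))
```

With something like this, the keyboard-mash comment above scores near zero however long it is, because almost none of its words are in the dictionary. A spam filter would still be a sensible first pass before any of this runs.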