• merdaverse@lemmy.zip · 8 points · 6 hours ago

    What I would love from DDG is a search option that removes those crappy AI-generated websites that fill 90% of results nowadays. Ironically, they would have to use some kind of AI to do it (classifiers).

    • CCRhode@lemmy.ml · 2 points · 1 hour ago

      Boy howdy, do I have just the script for you!

      https://pypi.org/project/clanker_score/

      Full disclosure: It doesn’t work. But the idea is nice: … that you could — perhaps in real life — identify AI-generated content. … so I wrote a framework that purports to do that.

      Keyword density is not the only measure of gloss. Others have been developed that measure ratios between parts of speech. Unfortunately, none of these distinguish sharply between pages that naturally convey genuine information and pages that have been designed to convey fluff for ulterior purposes. It is unlikely that combining measures of gloss will result in a tool that discriminates much better than keyword density by itself.

      • Piskorski, Jakub, Marcin Sydow, and Dawid Weiss. “Exploring Linguistic Features for Web Spam Detection: A Preliminary Study.” AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web. Ed. Carlos Castillo, Kumar Chellapilla, and Dennis Fetterly. New York: ACM, Apr. 2008. 25-28. ISBN:9781605581590. DOI:10.1145/1451983. 09 Nov. 2025 https://users.pja.edu.pl/~msyd/lingFeat08draft.pdf.
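
      As a rough illustration of the keyword-density measure mentioned above, here is a minimal Python sketch. It is not taken from clanker_score; the tokenizer, the tiny stopword list, and the function names are hypothetical choices made for this example. Density is computed here as keyword occurrences divided by total word tokens.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "this", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (ASCII letters, digits, and straight apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of all word tokens in `text` that equal `keyword`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def top_keyword_density(text: str) -> tuple[str, float]:
    """Most frequent non-stopword and its density over all tokens."""
    words = tokenize(text)
    content = [w for w in words if w not in STOPWORDS]
    if not content:
        return ("", 0.0)
    # On ties, Counter.most_common keeps first-encountered order.
    word, count = Counter(content).most_common(1)[0]
    return (word, count / len(words))


if __name__ == "__main__":
    page = "Best blender reviews: the best blender for the best smoothies"
    print(keyword_density(page, "blender"))  # 0.2
    print(top_keyword_density(page))         # ('best', 0.3)
```

      A single term with unusually high density is the kind of signal described above. As the comment also notes, measures like this do not cleanly separate fluff from genuine pages, so any cutoff you picked for filtering would be a guess.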

    • Four_mile_circus@lemmy.ml · 4 points · 5 hours ago

      The best way I’ve found to avoid AI is, sadly, to use the before:date search option and limit my search to sites created before 2023. There are plugins that do it for you automatically.

      It obviously doesn’t help if you want current events or the dankest of fresh memes. (Freshest of dank memes?) But if the information you want hasn’t changed much in the last few years, the difference is night and day.
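
      For illustration, here is a minimal Python sketch of what such a plugin might do at its simplest: append a `before:` cutoff to the query and build a search URL. The `before:YYYY-MM-DD` syntax, the `q` query parameter, and the function name are assumptions. Engines differ in whether and how they honor date operators, so check your engine's documentation.

```python
from datetime import date
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, cutoff: date) -> str:
    """Return `base_url` with a `q` parameter holding `query` plus a before: cutoff.

    Assumes the engine reads the query from a `q` parameter and accepts
    `before:YYYY-MM-DD` inside it; both are assumptions, not guarantees.
    """
    full_query = f"{query} before:{cutoff.isoformat()}"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return f"{base_url}?{urlencode({'q': full_query})}"


if __name__ == "__main__":
    print(build_search_url("https://www.google.com/search",
                           "cast iron seasoning", date(2023, 1, 1)))
    # https://www.google.com/search?q=cast+iron+seasoning+before%3A2023-01-01
```

      Keeping the cutoff as a `date` object means a plugin could store it as one setting and apply it to every query, rather than relying on you to type the operator each time.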