I got into the self-hosting scene this year when I wanted to run my own website on an old recycled ThinkPad. I spent a lot of time learning about ufw, reverse proxies, security header hardening, and fail2ban.

Despite all that, I still had a problem with bots knocking on my ports and spamming my logs. I tried some hackery to get fail2ban to read Caddy's logs, but that didn't work for me. I nearly gave up and went with Cloudflare like half the internet does, but my stubbornness about open-source self-hosting, plus this year's Cloudflare outages, encouraged me to try alternatives.

Around the same time I kept running into this thing on sites I frequent, like Codeberg. This is Anubis, a challenge proxy that sits in front of your web server and forces the browser to complete a proof-of-work check (plus some other clever tricks) before letting requests through to stop bots from knocking. I got interested and started thinking about beefing up my security.

I'm here to tell you to try it if you have a public-facing site and want to break away from Cloudflare. It was VERY easy to install and configure with a Caddyfile on a Debian system with systemd. Within an hour it had filtered multiple bots, and so far the knocking seems to have slowed down.

https://anubis.techaro.lol/

My bot-spam woes have been seriously mitigated, if not completely eradicated. I'm very happy with tonight's little security upgrade project, which took no more than an hour of installing and reading through documentation. The current chain is: Caddy reverse proxy -> Anubis -> services.
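For reference, that chain can be sketched in a Caddyfile like this. The ports are illustrative, not from the post: assume Anubis listens on localhost:8923 and its own config points its target at the backend service on localhost:3000.

```caddyfile
example.com {
    # Caddy terminates TLS and hands every request to Anubis.
    # Anubis challenges the client, then proxies passing requests
    # on to the real service configured as its target.
    reverse_proxy localhost:8923
}
```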

A good place to start for installation is here:

https://anubis.techaro.lol/docs/admin/native-install/
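For a sense of what the proof-of-work challenge asks the browser to do, here is a minimal sketch of the same shape of puzzle: brute-forcing a nonce until a SHA-256 digest starts with enough zeros. The encoding and difficulty here are illustrative, not Anubis's exact protocol.

```python
import hashlib
import secrets
import time

def solve_pow(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce until sha256(challenge + nonce) starts with
    `difficulty` hex zeros -- the general shape of an Anubis-style puzzle."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

challenge = secrets.token_hex(16)
start = time.perf_counter()
nonce = solve_pow(challenge, 4)  # 4 hex zeros: ~65k hashes on average
elapsed = time.perf_counter() - start
digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
print(f"nonce {nonce} found in {elapsed:.3f}s -> {digest[:12]}...")
```

The server only has to do one hash to verify the nonce, while the client has to do thousands to find it; that asymmetry is what shifts load off the web server.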

  • rtxn@lemmy.world · 7 hours ago

    POW is a far higher cost on your actual users than the bots.

    That sentence tells me that you either don’t understand or consciously ignore the purpose of Anubis. It’s not to punish the scrapers, or to block access to the website’s content. It is to reduce the load on the web server when it is flooded by scraper requests. Bots running headless Chrome can easily solve the challenge, but every second a client is working on the challenge is a second that the web server doesn’t have to waste CPU cycles on serving clankers.

    POW is an inconvenience to users. The flood of scrapers is an existential threat to independent websites. And there is a simple fact that you conveniently ignored: it fucking works.

    • sudo@programming.dev · 5 hours ago

      It's like you didn't understand anything I said. Anubis does work. I said it works. But it works because most AI crawlers don't have a headless browser to solve the PoW. To operate efficiently at the high volume required, they use raw HTTP requests. The vast majority are probably using the basic Python requests module.

      You don't need PoW to throttle general access to your site, and that's not the fundamental assumption of PoW anyway. PoW assumes (incorrectly) that bots won't pay the extra flops to scrape the website. But bots are paid to scrape the website; users aren't. They'll just scale horizontally and open more parallel connections. They have the money.

      • poVoq@slrpnk.net · 5 hours ago

        You are arguing a strawman. Anubis works because most AI scrapers (currently) don’t want to spend extra on running headless Chromium, and because it slightly incentivises AI scrapers to correctly identify themselves as such.

        Most of the AI scraping is frankly just shoddy code written by careless people who don’t actually want to DDoS the independent web, but can’t be bothered to fix that on their side.

        • sudo@programming.dev · 5 hours ago

          You are arguing a strawman. Anubis works because most AI scrapers (currently) don’t want to spend extra on running headless chromium

          WTF, that's what I already said? That was my entire point from the start!? You don't need PoW to force headless usage. Any JavaScript challenge will suffice. I even said the Meta Refresh challenge Anubis provides is sufficient, and explicitly recommended it.

          • poVoq@slrpnk.net · 5 hours ago

            And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

            Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

            • sudo@programming.dev · 4 hours ago

              And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

              Accessing the browser's API in any way is far harder to spoof than some hashing. I already suggested checking whether the browser has graphics acceleration. That would filter out the vast majority of headless browsers too. PoW is just math and is easy to solve without running any JavaScript. You can even do it faster than real JavaScript users with something like Rust or C.

              Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

              What are you talking about? It just refreshes the page without doing any of the extra computation that PoW does. What extra burden does it put on users?

              • poVoq@slrpnk.net · 4 hours ago

                If you check for a GPU (not generally a bad idea), you will have the same people who currently complain about JS complaining about this breaking their anti-fingerprinting browser add-ons.

                But no, you can't spoof PoW, obviously; that's the entire point of it. Whether you do the calculation in JavaScript or not doesn't really matter for it to work.

                In the current shape Anubis has zero impact on usability for 99% of the site visitors, not so with meta refresh.

                • sudo@programming.dev · 2 hours ago

                  You will have people complaining about their anti-fingerprinting being blocked with every bot-management solution. Your ability to navigate the internet anonymously is directly correlated with a bot's ability to scrape. That has never been my complaint about Anubis.

                  My complaint is that the calculations Anubis forces you to do are an absolutely negligible burden for a bot to solve. The hardest part is just having a JavaScript interpreter available. Making the author of the scraper write custom code to deal with your website is the most effective way to prevent bots.

                  Think about how much computing power AI data centers have. Do you think they give a shit about hashing some values for Anubis? No. They burn more compute power than a thousand Anubis challenges generating a single llm answer. PoW is a backwards solution.

                  Please think. Captchas worked because they're supposed to be hard for a computer to solve but easy for a human. PoW is the opposite.

                  In the current shape Anubis has zero impact on usability for 99% of the site visitors, not so with meta refresh.

                  Again, I ask you: what extra burden does meta-refresh impose on users? How does setting a cookie and immediately refreshing the page burden the user more than making them wait longer while draining their battery before doing the exact same thing? It's strictly less intrusive.
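The meta-refresh mechanism under discussion can be sketched like this. This is not Anubis's actual implementation; the signing scheme and responses are illustrative, showing only the set-cookie-then-reload idea.

```python
import hashlib
import hmac
import secrets

SECRET = secrets.token_bytes(32)  # per-deployment signing key (illustrative)

def token_for(client_id):
    """Sign the client identity so the cookie can't be forged offline."""
    return hmac.new(SECRET, client_id.encode(), hashlib.sha256).hexdigest()

def handle(client_id, cookie):
    """First visit: hand back a signed cookie plus a meta-refresh tag so the
    browser immediately reloads. Valid cookie on the reload: serve content.
    No hashing loop on the client -- just one extra round trip."""
    if cookie == token_for(client_id):
        return "200 OK: real content"
    return (f'Set-Cookie: challenge={token_for(client_id)}; '
            '<meta http-equiv="refresh" content="0">')

# A raw-HTTP scraper that ignores cookies and the refresh never reaches the
# content; a real browser passes transparently after one reload.
```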

                  • Nate Cox@programming.dev · 2 hours ago

                    Heads up, you’re really invested in arguing with someone who does not appear to be arguing in good faith. Just block them and move on, you will be a happier person for it.

                  • poVoq@slrpnk.net · 2 hours ago

                    No one is disputing that in theory (!) Anubis offers very little protection against an adversary that specifically tries to circumvent it, but we are dealing with an elephant in the porcelain shop kind of situation. The AI companies simply don’t care if they kill off small independently hosted web-applications with their scraping and Anubis is the mouse that is currently sufficient to make them back off.

                    And no, forced site reloads are extremely disruptive for web applications and often cause a lot of extra load for re-authentication, etc. It is not as easy as you make it sound.