I got into the self-hosting scene this year when I wanted to start up my own website running on an old recycled ThinkPad. I spent a lot of time learning about ufw, reverse proxies, security header hardening, and fail2ban.

Despite all that, I still had a problem with bots knocking on my ports and spamming my logs. I tried some hackery to get fail2ban to read Caddy logs, but that didn’t work for me. I nearly gave up and went with Cloudflare like half the internet does. But my stubbornness about open source self-hosting and this year’s Cloudflare outages encouraged me to try alternatives.

At the same time, I’ve been seeing this thing more and more in the places I frequent, like Codeberg. This is Anubis, a proxy-style firewall that forces the browser client to complete a proof-of-work security check, along with some other clever tricks, to stop bots from knocking. I got interested and started thinking about beefing up security.

I’m here to tell you to try it if you have a public-facing site and want to break away from Cloudflare. It was VERY easy to install and configure with a Caddyfile on a Debian distro with systemctl. Within an hour it has filtered multiple bots, and so far it seems the knocks have slowed down.

https://anubis.techaro.lol/

My bot spam woes have seemingly been seriously mitigated, if not completely eradicated. I’m very happy with tonight’s little security upgrade project, which took no more than an hour of my time to install and read through the documentation. The current chain is: Caddy reverse proxy -> Anubis -> services.

A good place to start for the install is here:

https://anubis.techaro.lol/docs/admin/native-install/

  • sudo@programming.dev · 17 hours ago · 48 up, 7 down

    I’ve repeatedly stated this before: Proof of Work bot management is only Proof of JavaScript bot management. It is nothing for a headless browser to bypass. Proof of JavaScript does work and will stop the vast majority of bot traffic. That’s how Anubis actually works. You don’t need to punish actual users by abusing their CPU. PoW is a far higher cost on your actual users than the bots.

    Last I checked, Anubis has a JavaScript-less strategy called “Meta Refresh”. It first serves you a blank HTML page with a <meta> tag instructing the browser to refresh and load the real page. I highly advise using the Meta Refresh strategy. It should be the default.
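
To make the mechanism concrete, here is a minimal Python sketch of a meta-refresh style challenge. It is not Anubis's implementation: the cookie name, the one-hour lifetime, the HMAC-signed token, and the `handle_request` helper are all assumptions made up for illustration. The idea is only that a client which obeys the `<meta>` refresh and sends back the cookie gets the real page, with no JavaScript and no hashing on the client.

```python
import hashlib
import hmac
import html
import secrets
from typing import Dict, Optional, Tuple

# Hypothetical values for illustration; none of these come from Anubis.
SECRET_KEY = secrets.token_bytes(32)
COOKIE_NAME = "challenge-pass"
COOKIE_TTL_SECONDS = 3600


def sign_token(issued_at: int) -> str:
    """Return 'timestamp.mac' so the server can recognise tokens it issued."""
    mac = hmac.new(SECRET_KEY, str(issued_at).encode(), hashlib.sha256).hexdigest()
    return f"{issued_at}.{mac}"


def verify_token(token: str, now: int) -> bool:
    """Accept only unexpired tokens that carry a valid MAC."""
    issued_at_text, _, _ = token.partition(".")
    try:
        issued_at = int(issued_at_text)
    except ValueError:
        return False
    fresh = now - issued_at <= COOKIE_TTL_SECONDS
    expected = sign_token(issued_at)
    return fresh and hmac.compare_digest(expected.encode(), token.encode())


def handle_request(path: str, cookie: Optional[str], now: int) -> Tuple[Dict[str, str], str]:
    """Serve the real page to clients with a valid cookie, else the refresh page."""
    if cookie is not None and verify_token(cookie, now):
        return {}, "real page"
    headers = {"Set-Cookie": f"{COOKIE_NAME}={sign_token(now)}; Path=/; HttpOnly"}
    target = html.escape(path, quote=True)
    body = (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="1; url={target}">'
        "</head><body></body></html>"
    )
    return headers, body
```

A plain HTTP script that ignores the cookie and the refresh instruction keeps receiving the blank page, while a normal browser passes after a short delay.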

    I’m glad someone is finally making an open source and self hostable bot management solution. And I don’t give a shit about the cat-girls, nor should you. But Techaro admitted they had little idea what they were doing when they started and went for the “nuclear option”. Fuck Proof of Work. It was a Dead On Arrival idea decades ago. Techaro should strip it from Anubis.

    I haven’t caught up with what’s new with Anubis, but if they want to get stricter bot-management, they should check for actual graphics acceleration.

    • rtxn@lemmy.world · 7 hours ago (edited) · 10 up, 2 down

      > PoW is a far higher cost on your actual users than the bots.

      That sentence tells me that you either don’t understand or consciously ignore the purpose of Anubis. It’s not to punish the scrapers, or to block access to the website’s content. It is to reduce the load on the web server when it is flooded by scraper requests. Bots running headless Chrome can easily solve the challenge, but every second a client is working on the challenge is a second that the web server doesn’t have to waste CPU cycles on serving clankers.

      PoW is an inconvenience to users. The flood of scrapers is an existential threat to independent websites. And there is a simple fact that you conveniently ignored: it fucking works.

      • sudo@programming.dev · 5 hours ago · 2 up, 1 down

        It’s like you didn’t understand anything I said. Anubis does work. I said it works. But it works because most AI crawlers don’t have a headless browser to solve the PoW. To operate efficiently at the high volume required, they use raw HTTP requests. The vast majority are probably using the basic Python requests module.

        You don’t need PoW to throttle general access to your site, and that’s not the fundamental assumption of PoW. PoW assumes (incorrectly) that bots won’t pay the extra flops to scrape the website. But bots are paid to scrape the website; users aren’t. They’ll just scale horizontally and open more parallel connections. They have the money.

        • poVoq@slrpnk.net · 5 hours ago · 3 up, 1 down

          You are arguing a strawman. Anubis works because most AI scrapers (currently) don’t want to spend extra on running headless Chromium, and because it slightly incentivises AI scrapers to correctly identify themselves as such.

          Most of the AI scraping is frankly just shoddy code written by careless people who don’t want to DDoS the independent web, but can’t be bothered to actually fix that on their side.

          • sudo@programming.dev · 5 hours ago (edited) · 2 up

            > You are arguing a strawman. Anubis works because most AI scrapers (currently) don’t want to spend extra on running headless Chromium

            WTF, that’s what I already said? That was my entire point from the start!? You don’t need PoW to force headless usage. Any JavaScript challenge will suffice. I even said the Meta Refresh challenge Anubis provides is sufficient and explicitly recommended it.

            • poVoq@slrpnk.net · 5 hours ago · 1 up, 2 down

              And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

              Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

              • sudo@programming.dev · 4 hours ago · 1 up

                > And how do you actually check for working JS in a way that can’t be easily spoofed? Hint: PoW is a good way to do that.

                Accessing the browser’s API in any way is way harder to spoof than some hashing. I already suggested checking if the browser has graphics acceleration. That would filter out the vast majority of headless browsers too. PoW is just math and is easy to spoof without running any JavaScript. You can even do it faster than real JavaScript users with something like Rust or C.
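
The "just math" point can be illustrated with a short Python sketch. It assumes a common hashcash-style scheme (SHA-256 over the challenge string plus a nonce, accepted when the hex digest starts with a given number of zeros); whether that matches Anubis's exact wire format is an assumption, and the function names are made up for the example. Nothing here needs a browser or a JavaScript engine.

```python
import hashlib
from itertools import count


def digest_for(challenge: str, nonce: int) -> str:
    """SHA-256 hex digest of the challenge text followed by the nonce."""
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the smallest nonce whose digest has `difficulty` leading zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        if digest_for(challenge, nonce).startswith(prefix):
            return nonce
    raise AssertionError("unreachable: count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """The server side: a single hash checks the submitted nonce."""
    return digest_for(challenge, nonce).startswith("0" * difficulty)
```

Calling `solve("example-challenge", 3)` from any script produces a nonce that `verify` accepts, which is the sense in which the puzzle itself does not prove a real browser is present.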

                > Meta refresh is a downgrade in usability for everyone but a tiny minority that has disabled JS.

                What are you talking about? It just refreshes the page without doing any of the extra computation that PoW does. What extra burden does it put on users?

                • poVoq@slrpnk.net · 4 hours ago · 1 up

                  If you check for GPU (not generally a bad idea) you will have the same people that currently complain about JS, complain about this breaking with their anti-fingerprinting browser addons.

                  But no, you obviously can’t spoof PoW; that’s the entire point of it. Whether you do the calculation in JavaScript or not doesn’t really matter for it to work.

                  In its current shape, Anubis has zero impact on usability for 99% of the site visitors; not so with meta refresh.

                  • sudo@programming.dev · 2 hours ago · 2 up

                    You will have people complain about their anti-fingerprinting being blocked with every bot-management solution. Your ability to navigate the internet anonymously is directly correlated with a bot’s ability to scrape. That has never been my complaint about Anubis.

                    My complaint is that the calculations Anubis forces you to do are an absolutely negligible burden for a bot to solve. The hardest part is just having a JavaScript interpreter available. Making the author of the scraper write custom code to deal with your website is the most effective way to prevent bots.

                    Think about how much computing power AI data centers have. Do you think they give a shit about hashing some values for Anubis? No. They burn more compute power than a thousand Anubis challenges generating a single LLM answer. PoW is a backwards solution.

                    Please think. Captchas worked because they’re supposed to be hard for a computer to solve but easy for a human. PoW is the opposite.

                    > In its current shape, Anubis has zero impact on usability for 99% of the site visitors; not so with meta refresh.

                    Again, I ask you: What extra burden does meta-refresh impose on users? How does setting a cookie and immediately refreshing the page burden the user more than making them wait longer while draining their battery before doing the exact same thing? It’s strictly less intrusive.

    • SmokeyDope@piefed.social (OP) · 16 hours ago (edited) · 32 up, 1 down

      Something that hasn’t been mentioned much in discussions about Anubis is that it has a graded tier system for how sketchy a client is, changing the kind of challenge based on a weighted priority system.

      With the default bot policies it comes with, squeaky clean regular clients are passed through, then only slightly weighted clients/IPs get the meta refresh, and it’s only when you get to the moderate-suspicion level that the JavaScript Proof of Work kicks in. The bot policy and weight triggers for these levels, the challenge action, and the duration of a client’s validity are all configurable.
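
The tiering described above can be sketched in a few lines of Python. This is not Anubis's policy format: the rules, the weights, the thresholds, and the action names are all hypothetical, picked only to show how an accumulated weight can select between passing a client through, a meta refresh, and proof of work.

```python
from typing import Callable, Dict, List, Tuple

Client = Dict[str, str]
Rule = Tuple[str, int, Callable[[Client], bool]]

# Hypothetical rules and weights, for illustration only.
RULES: List[Rule] = [
    ("missing Accept-Language header", 5,
     lambda c: not c.get("accept_language")),
    ("User-Agent does not look like a browser", 10,
     lambda c: "Mozilla" not in c.get("user_agent", "")),
]


def total_weight(client: Client) -> int:
    """Sum the weight of every rule the client matches."""
    return sum(weight for _, weight, matches in RULES if matches(client))


def choose_action(weight: int) -> str:
    """Map an accumulated weight onto a challenge tier (thresholds are made up)."""
    if weight <= 0:
        return "allow"
    if weight < 10:
        return "metarefresh"
    return "proof-of-work"
```

With these made-up numbers, a browser that sends the usual headers is allowed straight through, one that omits `Accept-Language` gets the meta refresh, and a bare scripting client lands in the proof-of-work tier.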

      It seems to me that the sites that heavy-handedly apply proof of work to every client, with validity that only lasts 5 minutes, are the ones giving Anubis a bad rap. The default bot policy settings Anubis comes with don’t trigger PoW on the regular Firefox Android clients I’ve tried, including hardened IronFox. Meanwhile, other sites show the finger wag on every connection no matter what.

      It’s understandable why some choose strict policies, but they give the impression this is the only way it should be done, which is overkill. I’m glad there are config options to mitigate the impact on the normal user experience.

      • sudo@programming.dev · 5 hours ago · 2 up

        > Anubis is that it has a graded tier system for how sketchy a client is, changing the kind of challenge based on a weighted priority system.

        Last I checked that was just User-Agent regexes and IP lists. But that’s where Anubis should continue development, and hopefully they’ve improved since. Discerning real users from bots is how you do proper bot management. Not imposing a flat tax on all connections.
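
For reference, this is roughly what "User-Agent regexes and IP lists" amounts to, as a minimal Python sketch. The patterns and the blocked network (a reserved documentation range) are hypothetical examples, not rules taken from Anubis. It also shows the weakness: both inputs are cheap for a client to change.

```python
import ipaddress
import re

# Hypothetical patterns and networks, for illustration only.
BOT_UA_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"python-requests", r"\bcurl/", r"scrapy")
]
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range


def is_suspicious(user_agent: str, remote_ip: str) -> bool:
    """Flag a request when its User-Agent or source address matches a list entry."""
    if any(pattern.search(user_agent) for pattern in BOT_UA_PATTERNS):
        return True
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)
```

A scraper that sends a browser-like User-Agent from an unlisted address passes this check untouched, which is why the header alone is a weak signal.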

    • ___qwertz___@feddit.org · 11 hours ago · 6 up, 3 down

      Funnily enough, PoW was a hot topic in academia around the late 90s / early 2000s, and it’s somewhat clear that the author of Anubis has not read much about the discussion back then.

      There was a paper called “Proof of work does not work” (or similar, can’t be bothered to look it up) that argued that PoW cannot work for spam protection, because you have to support low-powered consumer devices while blocking spammers with heavy hardware. And that is a very valid concern. Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user (e.g. only sending one email) has a low difficulty, while a spammer (sending thousands of emails) has a high difficulty.
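
The scaling idea from that second paper can be sketched in Python as follows. This is one possible reading, not the paper's scheme and not anything Anubis does: the `DifficultyScaler` class, the 60-second window, the base difficulty, and the cap are all assumptions for illustration. Difficulty grows with the logarithm of how many requests a client made recently, so a one-off visitor stays at the base level while a high-volume client is pushed to the cap.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class DifficultyScaler:
    """Raise PoW difficulty for clients that send many requests in a time window."""

    def __init__(self, base: int = 1, max_difficulty: int = 6,
                 window_seconds: float = 60.0) -> None:
        self.base = base
        self.max_difficulty = max_difficulty
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, client_id: str, now: float) -> int:
        """Register one request and return the difficulty to demand from it."""
        hits = self._hits[client_id]
        hits.append(now)
        # Forget requests that have fallen out of the sliding window.
        while hits[0] <= now - self.window_seconds:
            hits.popleft()
        # floor(log2(n)) extra levels: 1 hit -> +0, 2-3 -> +1, 4-7 -> +2, ...
        extra = len(hits).bit_length() - 1
        return min(self.base + extra, self.max_difficulty)
```

The hard part, as with any per-client scheme, is choosing `client_id`: keying on the IP address is the obvious option, and a scraper that rotates addresses looks like many low-volume users.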

      The idea of blocking known bad actors is actually used in email quite a lot, in the form of DNS block lists (DNSBLs) such as Spamhaus (this has nothing to do with PoW, but such a distributed list could be used to determine PoW difficulty).

      Anubis, on the other hand, does nothing like that, and a bot developed to pass Anubis would do so trivially.

      Sorry for the long text.

      • sudo@programming.dev · 5 hours ago · 2 up, 1 down

        > Then there was a paper arguing that PoW can still work, as long as you scale the difficulty in such a way that a legit user

        Telling a legit user from a fake user is the entire game. If you can do that you just block the fake user. Professional bot blockers like Cloudflare or Akamai have machine learning systems to analyze trends in network traffic and serve JS challenges to suspicious clients. Last I checked, all Anubis uses is User-Agent filters, which is extremely behind the curve. Bots are able to get down to faking TLS fingerprints and matching them with User-Agents.

      • Flipper@feddit.org · 10 hours ago · 10 up, 1 down

        At least in the beginning the scrapers just used curl with a different user agent. Forcing them to use a headless client is already a 100x increase in resources for them. That in itself is already a small victory and so far it is working beautifully.

        • sudo@programming.dev · 5 hours ago · 3 up

          Well, in most cases it would be Python requests, not curl. But yes, forcing them to use a browser is the real cost. Not just in CPU time but in programmer labor. PoW is overkill for that though.