I prefer Waterfox; OpenAI can keep its "Chat Chippy Tea" browser.

  • brucethemoose@lemmy.world · 3 days ago

    Not anymore.

    I can run GLM 4.6 on a Ryzen desktop with a single RTX 3090 at 7 tokens/s, and it blows lesser API models away. For more utilitarian cases, I can run 14–49B models (or GLM Air), and they do just fine.

    And I can reach for free or dirt-cheap APIs, called from my local tooling, when needed.

    But again, it’s all “special interest tinkerer” tier. You can’t do that with `ollama run`; you have to mess with exotic libraries, tweaked setups, and RAG chains to squeeze out that kind of performance. But all of that getting simplified is inevitable.
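    To give a flavor of the tinkering: even a simple hybrid setup looks something like this llama-cpp-python sketch (a rough sketch, not my exact ik_llama.cpp stack; the model path, layer split, and thread count are placeholders you’d tune to your hardware):

    ```python
    # Minimal hybrid CPU+GPU inference sketch with llama-cpp-python
    # (pip install llama-cpp-python, built with CUDA enabled).
    # The model path and n_gpu_layers are placeholders: offload as many
    # layers as fit in VRAM; the rest run on the CPU from system RAM.
    from llama_cpp import Llama

    llm = Llama(
        model_path="GLM-4.6-IQ4.gguf",  # hypothetical local quant file
        n_gpu_layers=30,  # layers offloaded to the 3090; the rest stay on CPU
        n_ctx=8192,       # context window
        n_threads=16,     # CPU threads for the non-offloaded layers
    )

    out = llm("Explain mixture-of-experts in two sentences.", max_tokens=128)
    print(out["choices"][0]["text"])
    ```

    Most of the “tinkerer” work is tuning that layer split and picking a quant small enough that the CPU-resident layers still stream quickly.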

    • MagicShel@lemmy.zip · 3 days ago

      I’ll look into it. OAI’s 30B model is the most I can run on my MacBook, and it’s decent. I don’t think I can even run that on my desktop with a 3060 GPU. I have access to GLM 4.6 through a service, but that’s the ~350B-parameter model, and I’m pretty sure that’s not what you’re running at home.

      It’s pretty reasonable in capability. I want to play around with setting up RAG pipelines for specific domain knowledge, but I’m just getting started.
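      For reference, the core RAG loop I have in mind is just embed → retrieve → stuff into the prompt. A minimal sketch, assuming sentence-transformers for the embeddings (the model name and documents are placeholders):

      ```python
      # Bare-bones RAG sketch: embed docs once, retrieve by cosine
      # similarity, and prepend the hits to the prompt.
      # pip install sentence-transformers numpy
      import numpy as np
      from sentence_transformers import SentenceTransformer

      docs = [  # stand-ins for real domain documents
          "Our widget API rate-limits clients to 100 requests per minute.",
          "Widgets are provisioned through the /v2/provision endpoint.",
          "Legacy v1 widgets must be migrated before March.",
      ]

      embedder = SentenceTransformer("all-MiniLM-L6-v2")
      doc_vecs = embedder.encode(docs, normalize_embeddings=True)

      def retrieve(query: str, k: int = 2) -> list[str]:
          """Return the k documents most similar to the query."""
          q = embedder.encode([query], normalize_embeddings=True)[0]
          scores = doc_vecs @ q  # cosine similarity (vectors are normalized)
          return [docs[i] for i in np.argsort(scores)[::-1][:k]]

      query = "How do I create a widget?"
      context = "\n".join(retrieve(query))
      prompt = f"Answer from this context:\n{context}\n\nQuestion: {query}"
      # `prompt` then goes to whatever local model is being served.
      print(prompt)
      ```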

      • brucethemoose@lemmy.world · 3 days ago

        > I have access to GLM 4.6 through a service, but that’s the ~350B-parameter model, and I’m pretty sure that’s not what you’re running at home.

        It is. I’m running this model, with hybrid CPU+GPU inference, specifically: https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF

        You can likely run GLM Air on your 3060 desktop if you have 48GB+ of RAM, or a smaller MoE easily. Heck, I’ll make a quant just for you if you want.

        Depending on the use case, I’d recommend ERNIE 4.5 21B (or 28B for vision) on your MacBook, or a Qwen 30B variant. Look for DWQ MLX quants, specifically: https://huggingface.co/models?sort=modified&search=dwq
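        Once you’ve picked a DWQ quant, running it with the mlx-lm package is about this simple (a minimal sketch for Apple Silicon; the repo name is just an example placeholder from that search):

        ```python
        # Minimal MLX inference sketch (Apple Silicon only).
        # pip install mlx-lm
        # The model repo is a placeholder; substitute whichever DWQ quant
        # fits in the MacBook's unified memory.
        from mlx_lm import load, generate

        model, tokenizer = load("mlx-community/Qwen3-30B-A3B-4bit-DWQ")
        text = generate(
            model,
            tokenizer,
            prompt="Explain what a DWQ quant is in one paragraph.",
            max_tokens=200,
        )
        print(text)
        ```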

        • MagicShel@lemmy.zip · 3 days ago

          I’m going to upgrade my RAM shortly because I found a bad stick and I’m down to 16GB currently. I’ll see if I can swing that order this weekend.

        • MagicShel@lemmy.zip · 3 days ago

          I’ll have to check. It’s a Pro, not an Air, but I think it’s only 40GB total. I’m really new to Macs, so the memory situation is unclear to me. I requested it at work specifically for its ability to run local AI.