• RedWeasel@lemmy.world
    arrow-up 4 · 20 hours ago

    Honestly makes more sense than shoving everything into one model. Some of those main models seem to just use one to do everything, and I would break them into parts: have a model to filter the input, try to categorize it, and feed it into the proper model. Like having a “council of experts” and routing each question to the proper “expert”.

    • I am not an expert in AI models
    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      arrow-up 7 · 19 hours ago

      That’s part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

      Rather than using a single neural net that's, say, 512 wide, you split it into eight channels/experts of 64 each. If the network can pick the right channel for each inference, then you only have to run 1/8th of the neurons on every forward pass. Of course, once you have your 8 channels/experts in parallel, you need something to decide which expert to use for each token you want to process. That's the job of the router: it takes in an input and decides which expert/channel to send it to. The router itself is a tiny neural network, basically a matrix that maps the input vector to an expert choice, and its small set of trainable weights gets trained together with the rest of the MoE.
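      The routing idea above can be sketched in a few lines of plain NumPy. This is a toy top-1 MoE, not any real model's implementation: the sizes mirror the 8 experts of width 64 mentioned above, and all the names and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 8 experts of width 64, as in the example above.
d_model, n_experts, d_expert = 16, 8, 64

# The router is just a matrix: input vector in, one score per expert out.
W_router = rng.normal(size=(d_model, n_experts)) * 0.1

# Each expert is a tiny two-layer feed-forward net (toy weights).
experts = [
    (rng.normal(size=(d_model, d_expert)) * 0.1,
     rng.normal(size=(d_expert, d_model)) * 0.1)
    for _ in range(n_experts)
]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route a token vector x to the single best expert (top-1 routing)."""
    probs = softmax(x @ W_router)     # router: one probability per expert
    k = int(np.argmax(probs))         # pick the winning expert
    W1, W2 = experts[k]
    h = np.maximum(x @ W1, 0.0)       # only this expert's neurons run
    return probs[k] * (h @ W2), k     # scale output by router confidence

x = rng.normal(size=d_model)
y, chosen = moe_forward(x)
print(chosen, y.shape)
```

      Because only the chosen expert's weights are multiplied, each forward pass touches roughly 1/8th of the expert parameters; in training, gradients flow through both the chosen expert and the router's probability, which is how the router's weights learn alongside the experts.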