$500 GPU outperforms Claude Sonnet on coding benchmarks using open-source AI system

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 1 day ago

☆ Yσɠƚԋσʂ ☆@lemmy.ml · 20 hours ago

That’s part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

Rather than using a single neural net layer that's, say, 512 neurons wide, you split it into eight channels/experts of 64 neurons each. If the network can pick the right expert for each token, you only have to run 1/8th of the neurons on every forward pass. Of course, once you have your eight experts in parallel, you need to decide which one to use for each token you want to process. That's the job of the router, which takes in an input and decides which expert to send it to.

The router itself is a tiny neural network: essentially a matrix that maps each input vector to one score per expert, and the highest-scoring expert is picked. Its small set of weights is trainable and gets trained together with the rest of the MoE.
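A minimal sketch of that routing in plain Python (no ML framework), assuming top-1 routing, a 16-dimensional token embedding, and experts that are small two-layer MLPs. The names `Top1MoE` and `Expert` and the dimension constants are hypothetical, chosen for illustration rather than taken from any particular model. It shows the router as a single weight matrix producing one score per expert, and that only one 64-wide expert out of 512 total hidden units runs per token.

```python
import math
import random

D_MODEL = 16        # hypothetical token embedding size (assumption)
NUM_EXPERTS = 8     # eight experts, as in the comment
EXPERT_WIDTH = 64   # 8 x 64 = 512 hidden units in total

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def relu(vec):
    return [max(0.0, x) for x in vec]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

class Expert:
    """Hypothetical expert: a two-layer MLP, D_MODEL -> EXPERT_WIDTH -> D_MODEL."""
    def __init__(self, rng):
        self.w_in = random_matrix(EXPERT_WIDTH, D_MODEL, rng)
        self.w_out = random_matrix(D_MODEL, EXPERT_WIDTH, rng)

    def __call__(self, x):
        return matvec(self.w_out, relu(matvec(self.w_in, x)))

class Top1MoE:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.experts = [Expert(rng) for _ in range(NUM_EXPERTS)]
        # The router: one small trainable matrix mapping a token to one score per expert.
        self.router = random_matrix(NUM_EXPERTS, D_MODEL, rng)

    def forward(self, x):
        probs = softmax(matvec(self.router, x))
        choice = max(range(NUM_EXPERTS), key=lambda i: probs[i])
        # Only the chosen expert runs; the other seven are skipped for this token.
        y = self.experts[choice](x)
        # Scaling by the router probability keeps the router weights in the
        # computation, so they can receive gradients when trained jointly.
        return [probs[choice] * v for v in y], choice

if __name__ == "__main__":
    moe = Top1MoE(seed=0)
    rng = random.Random(1)
    for t in range(4):
        token = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]
        _, expert = moe.forward(token)
        print(f"token {t} -> expert {expert}")
    active = EXPERT_WIDTH / (NUM_EXPERTS * EXPERT_WIDTH)
    print(f"fraction of hidden units run per token: {active}")  # 0.125
```

Real MoE layers differ in the details: in transformer models each expert is usually a full feed-forward block rather than a slice of one, and some models send each token to the top two experts instead of one (Mixtral routes to 2 of 8, for example), which doubles the active compute but is still far less than running every expert.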

GitHub - itigges22/ATLAS: Adaptive Test-time Learning and Autonomous Specialization