• RedWeasel@lemmy.world
    arrow-up 4 · 20 hours ago

    Honestly makes more sense than shoving everything into one model. Some of those main models seem to just use one to do everything, and I would break them into parts: have a model to filter the input, try to categorize it, and feed it into the proper model. Like having a “council of experts” and routing each question to the proper “expert”.

    • I am not an expert in AI models
    • ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
      arrow-up 7 · 19 hours ago

      That’s part of the idea with the whole mixture of experts (MoE) approach in newer models actually.

      Rather than using a single neural net that's, say, 512 wide, you split it into eight channels/experts of 64 each. If the network can pick the right channel for each inference, then you only have to run 1/8th of the neurons on every forward pass. Of course, once you have your 8 channels/experts in parallel, you need something to decide which expert to use for each token you want to process. That's the job of the router: it takes in an input and decides which expert/channel to send it to. The router itself is a tiny neural network, basically a matrix that maps the input vector to an expert choice, and its small set of trainable weights gets trained together with the rest of the MoE.
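      The routing idea above can be sketched in a few lines of plain NumPy. This is a toy top-1 MoE, not any real model's implementation: the sizes mirror the 8 experts of width 64 mentioned above, and all the names and random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 8 experts of width 64, as in the example above.
d_model, n_experts, d_expert = 16, 8, 64

# The router is just a matrix: input vector in, one score per expert out.
W_router = rng.normal(size=(d_model, n_experts)) * 0.1

# Each expert is a tiny two-layer feed-forward net (toy weights).
experts = [
    (rng.normal(size=(d_model, d_expert)) * 0.1,
     rng.normal(size=(d_expert, d_model)) * 0.1)
    for _ in range(n_experts)
]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def moe_forward(x):
    """Route a token vector x to the single best expert (top-1 routing)."""
    probs = softmax(x @ W_router)     # router: one probability per expert
    k = int(np.argmax(probs))         # pick the winning expert
    W1, W2 = experts[k]
    h = np.maximum(x @ W1, 0.0)       # only this expert's neurons run
    return probs[k] * (h @ W2), k     # scale output by router confidence

x = rng.normal(size=d_model)
y, chosen = moe_forward(x)
print(chosen, y.shape)
```

      Because only the chosen expert's weights are multiplied, each forward pass touches roughly 1/8th of the expert parameters; in training, gradients flow through both the chosen expert and the router's probability, which is how the router's weights learn alongside the experts.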