Some of them will “do math” but not with the LLM predictor, they have a math engine and the predictor decides when to use it. What’s great is when it outputs results, it’s not clear if it engaged the math engine or just guessed.
when it outputs results, it’s not clear if it engaged the math engine or just guessed
That depends on the harness though. In the plain model output it will be clear if a tool call happened, and it depends on the application UI around it whether that’s directly shown to the user, or if you only see the LLM’s final response based on it.
Some of them will “do math” but not with the LLM predictor, they have a math engine and the predictor decides when to use it. What’s great is when it outputs results, it’s not clear if it engaged the math engine or just guessed.
That depends on the harness though. In the plain model output it will be clear if a tool call happened, and it depends on the application UI around it whether that’s directly shown to the user, or if you only see the LLM’s final response based on it.