TL;DR:
My iPhone 16 Pro Max produces garbage output when running MLX LLMs. An iPhone 15 Pro runs the same code perfectly. A MacBook Pro also runs the same code perfectly. The tensor outputs on the 16 show numerical values an order of magnitude wrong. I suspect it points …
Under the rock where reading comprehension exists, apparently.
When he prompted the LLMs with “What is 2+2?”, the accuracy of the answer was immaterial. At that step he was comparing two systems and simply needed a static question to give to both, so he could compare the internal processing and work out why they arrived at different outputs (or, in one case, what appeared to be a race condition/infinite loop) when the results should have been identical, regardless of how right or wrong the answer to the prompt was. The answer from the LLM could have been “ham sandwich” and it still would have served his purpose.
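To make that methodology concrete, here's a minimal sketch of the "same prompt, compare where the internals diverge" idea. It assumes each device has dumped its per-layer outputs for the fixed prompt to layer_<i>.npy files; the directory names and helper functions are hypothetical illustrations, not the author's actual code.

```python
# Hypothetical sketch: compare per-layer activation dumps captured on two
# devices for the *same* fixed prompt (e.g. "What is 2+2?").
# Assumes each device saved its layer outputs as layer_<i>.npy files.
import numpy as np
from pathlib import Path

def load_layers(dump_dir):
    """Load layer_<i>.npy dumps in layer order."""
    files = sorted(Path(dump_dir).glob("layer_*.npy"),
                   key=lambda p: int(p.stem.split("_")[1]))
    return [np.load(f) for f in files]

def first_divergence(dir_a, dir_b, rtol=1e-2):
    """Report the first layer whose outputs differ beyond tolerance."""
    for i, (a, b) in enumerate(zip(load_layers(dir_a), load_layers(dir_b))):
        if not np.allclose(a, b, rtol=rtol, atol=1e-3):
            ratio = (np.abs(a).mean() + 1e-9) / (np.abs(b).mean() + 1e-9)
            print(f"layer {i}: mean |activation| ratio {ratio:.2f} "
                  f"(an order-of-magnitude gap flags where things break)")
            return i
    print("no divergence found at this tolerance")
    return None

# The prompt's correctness is irrelevant; only the divergence point matters.
first_divergence("dumps_iphone15", "dumps_iphone16")
```

The point of a static prompt is exactly this kind of diff: with identical inputs, the per-layer numbers should match across devices, so the first layer where they don't is the interesting one, whatever the model ends up answering.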