ML Tweet Graveyard
Andrej Karpathy beautifully covers why llama.cpp works.
"How is LLaMa.cpp possible?"
— Andrej Karpathy (@karpathy) August 15, 2023
great post by @finbarrtimbers https://t.co/yF43inlY87
llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ ~16 tok/s on a MacBook. Wait don't you need supercomputers to work… pic.twitter.com/EIp9iPkZ6x
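The arithmetic behind that speed is worth sketching. At batch size 1, decoding is memory-bandwidth bound: every generated token has to stream the full set of weights through memory, so tokens/s is capped at bandwidth divided by model size in bytes. Here is a minimal back-of-the-envelope sketch in Python; the 4-bit quantization and the ~100 GB/s bandwidth figure are assumptions for illustration, not numbers from the tweet:

```python
# Back-of-the-envelope estimate of batch-1 decode speed for a 7B model.
# At batch size 1, generating each token streams all weights from memory,
# so memory bandwidth, not FLOPs, sets the ceiling.
# Assumed numbers (not from the tweet): 4-bit weights, ~100 GB/s bandwidth.

params = 7e9                 # 7B parameters
bytes_per_param = 0.5        # assumed 4-bit quantized weights
bandwidth_bytes_s = 100e9    # assumed MacBook unified-memory bandwidth

weight_bytes = params * bytes_per_param           # ~3.5 GB read per token
ceiling_tok_s = bandwidth_bytes_s / weight_bytes  # bandwidth-bound upper limit

print(f"weights streamed per token: {weight_bytes / 1e9:.1f} GB")
print(f"decode ceiling: {ceiling_tok_s:.0f} tok/s")
# -> ~29 tok/s; the ~16 tok/s Karpathy quotes sits plausibly below this
#    ceiling once kernel and KV-cache overheads are accounted for.
```

This is also why quantization matters so much here: halving the bytes per weight doubles the bandwidth-bound ceiling, which is a big part of how llama.cpp makes laptop-scale inference feel fast.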