ML Tweet Graveyard
Andrej Karpathy beautifully covers why llama.cpp works.
"How is LLaMa.cpp possible?"
— Andrej Karpathy (@karpathy) August 15, 2023
great post by @finbarrtimbers https://t.co/yF43inlY87
llama.cpp surprised many people (myself included) with how quickly you can run large LLMs on small computers, e.g. 7B runs @ ~16 tok/s on a MacBook. Wait don't you need supercomputers to work… pic.twitter.com/EIp9iPkZ6x
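The arithmetic behind that speed is worth sketching. At batch size 1, decoding is memory-bandwidth bound: every generated token has to stream the full set of weights through memory, so tokens/s is capped at bandwidth divided by model size in bytes. Here is a minimal back-of-the-envelope sketch in Python; the 4-bit quantization and the ~100 GB/s bandwidth figure are assumptions for illustration, not numbers from the tweet:

```python
# Back-of-the-envelope estimate of batch-1 decode speed for a 7B model.
# At batch size 1, generating each token streams all weights from memory,
# so memory bandwidth, not FLOPs, sets the ceiling.
# Assumed numbers (not from the tweet): 4-bit weights, ~100 GB/s bandwidth.

params = 7e9                 # 7B parameters
bytes_per_param = 0.5        # assumed 4-bit quantized weights
bandwidth_bytes_s = 100e9    # assumed MacBook unified-memory bandwidth

weight_bytes = params * bytes_per_param           # ~3.5 GB read per token
ceiling_tok_s = bandwidth_bytes_s / weight_bytes  # bandwidth-bound upper limit

print(f"weights streamed per token: {weight_bytes / 1e9:.1f} GB")
print(f"decode ceiling: {ceiling_tok_s:.0f} tok/s")
# -> ~29 tok/s; the ~16 tok/s Karpathy quotes sits plausibly below this
#    ceiling once kernel and KV-cache overheads are accounted for.
```

This is also why quantization matters so much here: halving the bytes per weight doubles the bandwidth-bound ceiling, which is a big part of how llama.cpp makes laptop-scale inference feel fast.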