Made a llama.cpp version

#39
by notune - opened

If anyone needs to use it with llama.cpp before the official implementation lands, I implemented it here: https://github.com/notune/llama.cpp/tree/model-glm-ocr
Testing against Ollama showed a slight speedup, but since I wrote this with AI assistance, it probably doesn't meet the quality standards of the llama.cpp repo.

It looks like there is an official implementation now: https://github.com/ggml-org/llama.cpp/pull/19677

notune changed discussion status to closed
