Made a llama.cpp version
#39
by notune · opened
If anyone needs to use it with llama.cpp before the official implementation lands, I implemented it here: https://github.com/notune/llama.cpp/tree/model-glm-ocr
Testing against Ollama showed a slight speedup, but since I wrote this with AI assistance, it probably doesn't meet the quality standards of the llama.cpp repo.
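For anyone who wants to try the fork, a minimal sketch of checking out the branch and building it with llama.cpp's standard CMake workflow (the clone URL and branch name come from the link above; the rest are the usual llama.cpp build steps and may need adjusting for your platform):

```shell
# Clone only the model-glm-ocr branch of the fork
git clone --branch model-glm-ocr --depth 1 https://github.com/notune/llama.cpp.git
cd llama.cpp

# Standard llama.cpp CMake build (add backend flags, e.g. -DGGML_CUDA=ON, as needed)
cmake -B build
cmake --build build --config Release
```

After building, the binaries land under `build/bin/` as in upstream llama.cpp.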
notune changed discussion status to closed