No text output when running inference with ollama?

#8
by frankslin - opened

I'm using the example image from https://cdn.bigmodel.cn/static/logo/introduction.png.

Apple M2, macOS 15.7.3.

````
$ ollama -v
ollama version is 0.15.5-rc1
$ sha256sum introduction.png
b01139689cb0682b3a1d6f3f3eda3f481101571654e3a23a1de5bf5520f3c6f5  introduction.png
$ ollama run glm-ocr Text Recognition: ./introduction.png
Added image './introduction.png'
```markdown

```markdown

```markdown

```text
````

When I tried with another file, I got this error:

```
Error: an error was encountered while running the model: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 ollama 0x0000000100e6d620 ggml_print_backtrace + 276
```


What am I missing?

I ran into the same problem. I hope the researchers will take this seriously and do the work properly.

I am having the same problem serving glm-ocr from ollama.

  • MacBook Air M2, macOS Tahoe 26.2
  • Ollama pre-release 0.15.5

Output for the first three pages (rendered as images) of a simple PDF (https://www.fiw.uni-bonn.de/de/forschung/demokratieforschung/team/prof-dr-rudolf-stichweh/papers/pdfs/81_stw_niklas-luhmann-blackwell-companion-to-major-social-theorists.pdf):

Page 1

"<table bordered, no table, but no table, but no table, but no table, no table, no table, no layout, but no layout, or any other content is present. No layout, or any other content is present. No text is present. No layout. No layout. No layout. No layout. No layout."


Page 2

"<table bordered, no table, but no table, but no table, but no table, no table, no layout, rows, columns, rows, columns, rows, or any other content within the image content is present. No text, or any other content is present. No text is present. No layout, just a single row, just a single row, just a single row, just a single row, just a single row, no layout. No layout. No layout. No layout. No layout. No layout. No layout. No layout."


Page 3

```
Error extracting page: an error was encountered while running the model: GGML_ASSERT(a->ne[2] * 4 == b->ne[0]) failed
WARNING: Using native backtrace. Set GGML_BACKTRACE_LLDB for more info.
WARNING: GGML_BACKTRACE_LLDB may cause native MacOS Terminal.app to crash.
See: https://github.com/ggml-org/llama.cpp/pull/17869
0 ollama 0x0000000101b050c0 ggml_print_backtrace + 276
1 ollama 0x0000000101b052ac ggml_abort + 156
2 ollama 0x0000000101b0d8e8 ggml_rope + 300
3 ollama 0x0000000101b0db44 ggml_rope_multi + 20
4 ollama 0x0000000101aaf2b0 _cgo_7ebcd35a9797_Cfunc_ggml_rope_multi + 64
5 ollama 0x0000000100e6549c ollama + 513180 (status code: 500)
```

Either increase the model's context window or reduce the image size. Depending on your available VRAM, Ollama defaults to a context of only 4096 tokens, which is not enough for a large image like introduction.png. You can check the context in effect after loading the model with `ollama ps`.
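If you drive the model through Ollama's REST API instead of the CLI, you can also raise the context per request with the `num_ctx` option. A minimal sketch, assuming Ollama is serving on its default port 11434 (the prompt and context value here are just examples):

```shell
# Base64-encode the image (tr strips line wraps that some base64 builds add),
# then request OCR with an enlarged context window for this call only.
IMG_B64=$(base64 < introduction.png | tr -d '\n')

curl http://localhost:11434/api/generate -d '{
  "model": "glm-ocr",
  "prompt": "Text Recognition:",
  "images": ["'"$IMG_B64"'"],
  "options": { "num_ctx": 10240 }
}'
```

In an interactive `ollama run` session, the equivalent is `/set parameter num_ctx 10240` before sending the image.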

I can confirm that increasing the context size works. I am using a context size of 10240 and have successfully extracted text from images up to 12 megapixels. Here is the Modelfile I am using:

```
# Modelfile generated by "ollama show"
FROM glm-ocr:latest

# Increase context size
PARAMETER num_ctx 10240

TEMPLATE {{ .Prompt }}
RENDERER glm-ocr
PARSER glm-ocr
PARAMETER temperature 0
```
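To apply it, build a derived model from the Modelfile and run that instead; a sketch, with `glm-ocr-10k` as a stand-in name:

```shell
# Create a new model that inherits glm-ocr but carries the larger context,
# then query it the same way as before.
ollama create glm-ocr-10k -f Modelfile
ollama run glm-ocr-10k "Text Recognition: ./introduction.png"
```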

Yeah, you're right, mate.

Thank you all for the "increase context window" solution.
I did the same and went from gibberish output to the clean, exact text expected.

