# vista3-30b-thinking

A LoRA adapter for Qwen3-VL-30B-A3B-Instruct, fine-tuned on OSV5M for visual geolocation.
## what it is

A thinking model: it emits `<think>` blocks to reason about visual cues before predicting GPS coordinates. Trained with GRPO (Group Relative Policy Optimization) on the OSV5M dataset (5M street-view images with GPS labels).
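GRPO scores groups of sampled completions against a scalar reward; for geolocation, a natural reward is based on the great-circle distance between predicted and ground-truth coordinates. Below is a minimal sketch of such a reward, assuming completions end with a plain `lat, lon` pair after the `</think>` tag. The actual reward used in training is not documented here, so treat the shape and scale as hypothetical:

```python
import math
import re

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")

def geolocation_reward(completion: str, target: tuple[float, float]) -> float:
    """Reward in (0, 1]: 1 at the exact location, decaying with distance.

    Hypothetical reward shape; the training reward is not documented.
    """
    # Only look at the answer after the reasoning block.
    match = COORD_RE.search(completion.split("</think>")[-1])
    if match is None:
        return 0.0  # unparseable output gets zero reward
    lat, lon = float(match.group(1)), float(match.group(2))
    dist = haversine_km(lat, lon, target[0], target[1])
    return math.exp(-dist / 1000.0)  # e.g. ~0.37 when 1000 km off
```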
## quickstart

```bash
pip install tinker transformers
export TINKER_API_KEY=your_key
```
```python
import tinker
from transformers import AutoTokenizer

# The tokenizer comes from the base model; the adapter shares its vocabulary.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", trust_remote_code=True)

# Point the sampling client at the trained adapter checkpoint.
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://78beeb37-8d57-569f-b583-7904e99f0e7f:train:0/weights/step_000075"
)

# Qwen chat template with an <image> placeholder for the street-view photo.
prompt = "<|im_start|>system\nYou are a geolocation expert.<|im_end|>\n<|im_start|>user\n<image>\nWhere is this image taken? Provide GPS coordinates.<|im_end|>\n<|im_start|>assistant\n"

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(tokenizer.encode(prompt)),
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.7),
).result()

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = len(tokenizer.encode(prompt))
print(tokenizer.decode(response.sequences[0].tokens[prompt_len:]))
```
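The completion contains a `<think>` block followed by the answer. Continuing from the quickstart, one way to pull out the coordinates, assuming the answer ends with a plain `lat, lon` pair (the exact output format may vary):

```python
import re

text = tokenizer.decode(response.sequences[0].tokens[prompt_len:])
answer = text.split("</think>")[-1]  # drop the reasoning block
m = re.search(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)", answer)
if m:
    lat, lon = float(m.group(1)), float(m.group(2))
    print(f"predicted location: {lat:.4f}, {lon:.4f}")
```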
## local inference

TBD. In the meantime, a possible route is sketched below.
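This is untested and assumes the LoRA weights are exported in PEFT format to this repo; the exact transformers model class for Qwen3-VL may also differ:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

base_id = "Qwen/Qwen3-VL-30B-A3B-Instruct"
adapter_id = "sdan/vista3-30b-thinking"  # assumes PEFT-format adapter weights are published here

processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
# Attach the geolocation adapter on top of the base weights.
model = PeftModel.from_pretrained(model, adapter_id)
```

From here, generation follows the usual transformers chat-template flow with an image passed through the processor.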
## citation

```bibtex
@misc{vista3-30b-thinking,
  author    = {Dantuluri, Surya},
  title     = {Vista: Visual Geolocation with Large Vision-Language Models},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/sdan/vista3-30b-thinking}
}
```