# vista3-30b-thinking

A LoRA adapter for Qwen3-VL-30B-A3B-Instruct, fine-tuned on OSV5M for visual geolocation.
## what it is

A thinking model: it emits `<think>` blocks to reason about visual cues before predicting GPS coordinates. Trained with GRPO (Group Relative Policy Optimization) on the OSV5M dataset (5M street-view images with GPS labels).
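GRPO scores groups of sampled completions against a scalar reward; for geolocation, a natural reward is based on the great-circle distance between predicted and ground-truth coordinates. Below is a minimal sketch of such a reward, assuming completions end with a plain `lat, lon` pair after the `</think>` tag. The actual reward used in training is not documented here, so treat the shape and scale as hypothetical:

```python
import math
import re

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

COORD_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")

def geolocation_reward(completion: str, target: tuple[float, float]) -> float:
    """Reward in (0, 1]: 1 at the exact location, decaying with distance.

    Hypothetical reward shape; the training reward is not documented.
    """
    # Only look at the answer after the reasoning block.
    match = COORD_RE.search(completion.split("</think>")[-1])
    if match is None:
        return 0.0  # unparseable output gets zero reward
    lat, lon = float(match.group(1)), float(match.group(2))
    dist = haversine_km(lat, lon, target[0], target[1])
    return math.exp(-dist / 1000.0)  # e.g. ~0.37 when 1000 km off
```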
## quickstart

```bash
pip install tinker transformers
export TINKER_API_KEY=your_key
```
```python
import tinker
from transformers import AutoTokenizer

# The tokenizer comes from the base model; the adapter shares its vocabulary.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", trust_remote_code=True)

# Point the sampling client at the trained adapter checkpoint.
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://78beeb37-8d57-569f-b583-7904e99f0e7f:train:0/weights/step_000075"
)

# Qwen chat template with an <image> placeholder for the street-view photo.
prompt = "<|im_start|>system\nYou are a geolocation expert.<|im_end|>\n<|im_start|>user\n<image>\nWhere is this image taken? Provide GPS coordinates.<|im_end|>\n<|im_start|>assistant\n"

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(tokenizer.encode(prompt)),
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.7),
).result()

# Decode only the newly generated tokens, skipping the prompt.
prompt_len = len(tokenizer.encode(prompt))
print(tokenizer.decode(response.sequences[0].tokens[prompt_len:]))
```
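The completion contains a `<think>` block followed by the answer. Continuing from the quickstart, one way to pull out the coordinates, assuming the answer ends with a plain `lat, lon` pair (the exact output format may vary):

```python
import re

text = tokenizer.decode(response.sequences[0].tokens[prompt_len:])
answer = text.split("</think>")[-1]  # drop the reasoning block
m = re.search(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)", answer)
if m:
    lat, lon = float(m.group(1)), float(m.group(2))
    print(f"predicted location: {lat:.4f}, {lon:.4f}")
```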
## local inference

TBD. In the meantime, a possible route is sketched below.
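This is untested and assumes the LoRA weights are exported in PEFT format to this repo; the exact transformers model class for Qwen3-VL may also differ:

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText
from peft import PeftModel

base_id = "Qwen/Qwen3-VL-30B-A3B-Instruct"
adapter_id = "sdan/vista3-30b-thinking"  # assumes PEFT-format adapter weights are published here

processor = AutoProcessor.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)
# Attach the geolocation adapter on top of the base weights.
model = PeftModel.from_pretrained(model, adapter_id)
```

From here, generation follows the usual transformers chat-template flow with an image passed through the processor.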
## citation

```bibtex
@misc{vista3-30b-thinking,
  author    = {Dantuluri, Surya},
  title     = {Vista: Visual Geolocation with Large Vision-Language Models},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/sdan/vista3-30b-thinking}
}
```