# vista3-30b-agent
LoRA adapter for Qwen3-VL-30B-A3B-Instruct, fine-tuned on OSV5M for visual geolocation.
## what it is
An agent model that combines `<think>` reasoning blocks with tool use (`zoom`, `pan`) to iteratively refine geolocation predictions. Trained with GRPO (Group Relative Policy Optimization) on the OSV5M dataset (5M street-view images with GPS labels).
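This card doesn't publish the training reward, but for GRPO on GPS labels a distance-based reward is the natural fit. Here is a minimal, hypothetical sketch: haversine distance between predicted and ground-truth coordinates, mapped to a bounded score. The function names and the `scale_km` decay constant are illustrative assumptions, not the actual training code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between two (lat, lon) points.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geolocation_reward(pred, target, scale_km=500.0):
    # Hypothetical GRPO reward: 1.0 at the exact location, decaying
    # exponentially with great-circle error (scale_km is an assumption).
    d = haversine_km(pred[0], pred[1], target[0], target[1])
    return math.exp(-d / scale_km)
```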
## quickstart
```bash
pip install tinker transformers
export TINKER_API_KEY=your_key
```
```python
import tinker
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", trust_remote_code=True)
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://2b5831a4-8338-5b8a-b8a9-099066383e61:train:0/weights/step_000300"
)

# NOTE: the <image> placeholder needs real image input attached to the
# ModelInput; see the tinker docs for multimodal prompts.
prompt = (
    "<|im_start|>system\nYou are a geolocation expert.<|im_end|>\n"
    "<|im_start|>user\n<image>\nWhere is this image taken? "
    "Provide GPS coordinates.<|im_end|>\n<|im_start|>assistant\n"
)
prompt_ids = tokenizer.encode(prompt)  # encode once, reuse for slicing below

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(prompt_ids),
    num_samples=1,
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.7),
).result()
print(tokenizer.decode(response.sequences[0].tokens[len(prompt_ids):]))
```
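The agent's replies interleave `<think>` blocks, tool calls, and a final answer. The exact output grammar isn't documented on this card, so here is a hypothetical parser that strips `<think>` spans and pulls the first `lat, lon` pair out of the remaining text; adjust the patterns to the model's actual format.

```python
import re

def parse_prediction(text: str):
    # Drop <think>...</think> reasoning spans (format assumed from the card).
    visible = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Grab the first "lat, lon" pair, e.g. "48.8566, 2.3522" (pattern is a guess).
    m = re.search(r"(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)", visible)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(parse_prediction("<think>Signage looks French...</think> 48.8566, 2.3522"))
# (48.8566, 2.3522)
```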
## local inference
tbd
## citation

```bibtex
@misc{vista3-30b-agent,
  author    = {Dantuluri, Surya},
  title     = {Vista: Visual Geolocation with Large Vision-Language Models},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/sdan/vista3-30b-agent}
}
```