# vista3-30b-agent
LoRA adapter for Qwen3-VL-30B-A3B-Instruct, fine-tuned on OSV5M for visual geolocation.
## what it is
An agent model that combines `<think>` reasoning blocks with tool use (`zoom`, `pan`) to iteratively refine geolocation predictions. Trained with GRPO (Group Relative Policy Optimization) on the OSV5M dataset (5M street-view images with GPS labels).
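This card doesn't publish the training reward, but for GRPO on GPS labels a distance-based reward is the natural fit. Here is a minimal, hypothetical sketch: haversine distance between predicted and ground-truth coordinates, mapped to a bounded score. The function names and the `scale_km` decay constant are illustrative assumptions, not the actual training code.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance in kilometers between two (lat, lon) points.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def geolocation_reward(pred, target, scale_km=500.0):
    # Hypothetical GRPO reward: 1.0 at the exact location, decaying
    # exponentially with great-circle error (scale_km is an assumption).
    d = haversine_km(pred[0], pred[1], target[0], target[1])
    return math.exp(-d / scale_km)
```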
## quickstart
```bash
pip install tinker transformers
export TINKER_API_KEY=your_key
```
```python
import tinker
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-30B-A3B-Instruct", trust_remote_code=True)
sampler = tinker.ServiceClient().create_sampling_client(
    model_path="tinker://2b5831a4-8338-5b8a-b8a9-099066383e61:train:0/weights/step_000300"
)

# NOTE: the <image> placeholder needs real image input attached to the
# ModelInput; see the tinker docs for multimodal prompts.
prompt = (
    "<|im_start|>system\nYou are a geolocation expert.<|im_end|>\n"
    "<|im_start|>user\n<image>\nWhere is this image taken? "
    "Provide GPS coordinates.<|im_end|>\n<|im_start|>assistant\n"
)
prompt_ids = tokenizer.encode(prompt)  # encode once, reuse for slicing below

response = sampler.sample(
    prompt=tinker.types.ModelInput.from_ints(prompt_ids),
    num_samples=1,
    sampling_params=tinker.types.SamplingParams(max_tokens=256, temperature=0.7),
).result()
print(tokenizer.decode(response.sequences[0].tokens[len(prompt_ids):]))
```
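The agent's replies interleave `<think>` blocks, tool calls, and a final answer. The exact output grammar isn't documented on this card, so here is a hypothetical parser that strips `<think>` spans and pulls the first `lat, lon` pair out of the remaining text; adjust the patterns to the model's actual format.

```python
import re

def parse_prediction(text: str):
    # Drop <think>...</think> reasoning spans (format assumed from the card).
    visible = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Grab the first "lat, lon" pair, e.g. "48.8566, 2.3522" (pattern is a guess).
    m = re.search(r"(-?\d{1,2}\.\d+)\s*,\s*(-?\d{1,3}\.\d+)", visible)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(parse_prediction("<think>Signage looks French...</think> 48.8566, 2.3522"))
# (48.8566, 2.3522)
```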
## local inference
tbd
## citation

```bibtex
@misc{vista3-30b-agent,
  author    = {Dantuluri, Surya},
  title     = {Vista: Visual Geolocation with Large Vision-Language Models},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/sdan/vista3-30b-agent}
}
```