Sample code?
#1 by seedmanc - opened
Please provide sample code for inference; I'm not sure how this is supposed to work. What I know is that CLIP produces text embeddings and image embeddings, and matching them via cosine similarity lets you pick the best-matching caption for an image from a set of candidates. But that requires text strings to embed and compare against; CLIP is not an LLM that can generate text on its own. Does your safetensors file produce text out of image embeddings? How do I feed those embeddings into it?
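
For reference, this is roughly the zero-shot matching workflow I described above, using the stock OpenAI CLIP checkpoint (the model name, candidate captions, and image path are just placeholders). What I can't figure out is where your safetensors file fits into this:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Stock CLIP checkpoint as an example, not this repo's weights
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# CLIP can only rank captions I supply, it cannot generate them
candidates = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image = Image.open("example.jpg")  # placeholder image path

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image are scaled cosine similarities between the image
# embedding and each candidate text embedding
probs = outputs.logits_per_image.softmax(dim=-1)
print(candidates[probs.argmax().item()])
```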