Generate captions for an image and answer questions about it
image captioning, visual question answering (VQA)
Engage in multimedia chat with LLMs and ML models
Convert PDF to text using OCR