Integrate multimodal vision capabilities using Llama-3.2-11B-Vision-Instruct, designed to process both text and images seamlessly. This model delivers cost-effective pricing at $0.37/1M input and $0.37/1M output tokens, native tool calling support, open weights architecture. Access Llama-3.2-11B-Vision-Instruct via the Azure API with up to 8K output tokens.
Tokens
Tokens
Tokens
Llama-3.2-11B-Vision-Instruct by Azure costs $0.37 per 1M input tokens and $0.37 per 1M output tokens.