Integrate multimodal vision capabilities using Phi 3.5 Vision Instruct, designed to process both text and images seamlessly. This model delivers available completely free of charge, native tool calling support, open weights architecture. Access Phi 3.5 Vision Instruct via the Nvidia API with up to 4K output tokens.
Tokens
Tokens
Tokens
Phi 3.5 Vision Instruct by Nvidia costs Free per 1M input tokens and Free per 1M output tokens.