Integrate multimodal vision capabilities using cosmos-predict1-5b, designed to process both text and images seamlessly. This model delivers available completely free of charge, open weights architecture. Access cosmos-predict1-5b via the Nvidia API with up to 4K output tokens.
Tokens
Tokens
Tokens
cosmos-predict1-5b by Nvidia costs Free per 1M input tokens and Free per 1M output tokens.