
Alibaba Qwen

Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese

qwenlm.github.io | Releases, Multimodal, Research | 1 min read

CLIP [1] is a phenomenal playmaker in vision and multimodal representation learning. It serves not only as a foundation model but also as a bridge between vision and language. It has triggered a series of research efforts in different fields, especially text-to-image generation. However, we find that applications, especially cross-modal retrieval, need a language-specific CLIP, and there is no open-source Chinese CLIP with good performance. We therefore launched this project to promote t
