xzbin
Multimodal
Category
2023
03-02
Image as a Foreign Language: BEIT Pretraining for All Vision and Vision-Language Tasks
03-01
ZERO and R2D2: A Large-scale Chinese Cross-modal Benchmark and a Vision-Language Framework
02-28
VLMO: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
02-24
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
02-24
UNITER: Learning Universal Image-Text Representations
02-07
ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
01-04
Learning Transferable Visual Models From Natural Language Supervision