Beijing Institute for General Artificial Intelligence

Tag: General Vision

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes

Projective Manifold Gradient Layer for Deep Rotation Regression

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

Weakly Supervised Video Moment Localization with Contrastive Negative Sample Mining

Weakly Supervised Temporal Sentence Grounding with Gaussian-based Contrastive Proposal Learning

Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds

VLGrammar: Grounded Grammar Induction of Vision and Language

YouRefIt: Embodied Reference Understanding with Language and Gesture

Synthesizing Diverse and Physically Stable Grasps with Arbitrary Hand Structures using Differentiable Force Closure Estimator
