3D Target Object Generation Based on Spatial Consistency
First published: 2026-03-18
Abstract: Text-to-3D generation via score distillation has achieved remarkable progress by leveraging powerful 2D diffusion priors. However, the Janus problem, in which generated 3D objects exhibit duplicated or geometrically inconsistent structures across viewpoints, remains a fundamental challenge. Recent methods such as RecDreamer introduce Uniform Score Distillation (USD) with an orientation classifier to encourage pose-uniform generation, but they rely on manually selected DINOv2 templates that are prompt-specific, labor-intensive to curate, and prone to front-view bias on NeRF-rendered images. In this work, we propose Sco3D, a framework that addresses the Janus problem through two complementary discriminators. First, we introduce an Orientation-Aware Discriminator built upon Orient-Anything, a zero-shot azimuth estimator, which replaces the template-based classifier with a fully automatic, prompt-agnostic 360° orientation predictor. Second, we propose an Instance-Consistent Discriminator that leverages DINOv2 CLS token features to enforce multi-view identity coherence, ensuring that renderings from different viewpoints correspond to the same underlying entity. Both modules integrate into the USD training pipeline as plug-and-play components. Extensive experiments demonstrate that Sco3D significantly reduces Janus artifacts while maintaining high-fidelity text-3D alignment, outperforming existing score distillation methods in both qualitative and quantitative evaluations.
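The abstract describes the two discriminators only at a high level. As a rough illustration, and not the authors' implementation, the sketch below computes two quantities of the kind such modules could target: (1) an instance-consistency score, taken here as the mean pairwise cosine distance between per-view CLS embeddings (0 when every view embeds identically), and (2) an orientation-uniformity score, taken here as the KL divergence between the histogram of predicted azimuths and a uniform distribution over 360° (0 when views cover all directions evenly, large when they collapse onto one side, e.g. a front-view bias). The function names, the 8-bin histogram, and the choice of KL-to-uniform are hypothetical; in practice the toy arrays would be replaced by real DINOv2 CLS features and Orient-Anything azimuth predictions.

```python
import numpy as np


def instance_consistency_loss(cls_feats: np.ndarray) -> float:
    """Mean pairwise cosine distance between per-view embeddings.

    cls_feats: shape (V, D), one embedding per rendered view (hypothetically,
    DINOv2 CLS tokens). Returns 0.0 when all views embed identically.
    """
    if cls_feats.ndim != 2 or cls_feats.shape[0] < 2:
        raise ValueError("need at least two views with shape (V, D)")
    # L2-normalise each row so that dot products become cosine similarities
    normed = cls_feats / np.linalg.norm(cls_feats, axis=1, keepdims=True)
    sim = normed @ normed.T                      # (V, V) cosine similarity matrix
    v = sim.shape[0]
    off_diag = sim[~np.eye(v, dtype=bool)]       # drop each view's self-similarity
    return float(np.mean(1.0 - off_diag))


def azimuth_uniformity_kl(azimuths_deg: np.ndarray, n_bins: int = 8) -> float:
    """KL(p || uniform) for the histogram of predicted azimuths over [0, 360)."""
    az = np.mod(np.asarray(azimuths_deg, dtype=float), 360.0)
    counts, _ = np.histogram(az, bins=n_bins, range=(0.0, 360.0))
    p = counts / counts.sum()
    q = 1.0 / n_bins                             # uniform target probability per bin
    nz = p > 0                                   # treat 0 * log(0) as 0
    return float(np.sum(p[nz] * np.log(p[nz] / q)))
```

For example, eight views at azimuths 0°, 45°, ..., 315° give a uniformity score of 0, while eight views all predicted as 0° (every rendering looks like the front) give log 8. The histogram step is not differentiable, so inside a score-distillation loop a quantity like this would more plausibly serve as a diagnostic or be swapped for a soft, differentiable variant; how Sco3D actually couples its discriminators to USD is not specified in the abstract.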
Keywords: Computer Vision and Application; 3D Generation; Spatial Consistency