Paper source: https://arxiv.org/abs/2405.07933
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
The authentic 3D hand avatar with every identifiable information, such as hand shapes and textures, is necessary for immersive experiences in AR/VR. In this paper, we present a universal hand model (UHM), which 1) can universally represent high-fidelity 3D
Speaker Introduction
- Fields of Expertise: 3D Computer Vision, Machine Learning, Computer Graphics, Artificial Intelligence
- Research Topics:
  - 3D Hand Pose Estimation
  - 3D Multi-Hand Pose Estimation
  - 3D Interactive Hands
  - 3D Human Body Shape
  - 3D Full-Body Estimation
- Current Research Focus:
  - Goal: Development of interactive 3D hand avatars and expressive full-body avatars using computer graphics and AI.
Research Necessity
Non-verbal Communication: By a commonly cited estimate, 55% of human communication is non-verbal, so the voice channel alone is not enough for effective communication.
Problem: Recent AI-generated images and videos often show unrealistic hand shapes and distorted bodies.
Research Direction
High-Resolution 3D Modeling
- Goal: Generate high-quality 3D hand avatars even from a short capture.
- Challenge: Limited data makes generalization difficult.
- Solution: Use a Universal Hand Model (UHM) that represents diverse hand shapes and poses naturally.
Natural Relighting
- Goal: Render the 3D hand model consistently in new environments.
- Methods:
  - Physically-Based Relighting: Produces high-quality renderings but is slow.
  - Neural Relighting: Faster, but struggles to generalize.
  - Neuro-Physical Relighting: Combines physically-based and neural rendering to improve both quality and generalization.
Research Achievements
- Authentic Hand Model: High-resolution hand models are generated from short captures, with the Universal Hand Model covering diverse hand shapes and poses.
- Shadow Removal: A data-driven approach removes shadows, yielding more natural-looking models.
- Neuro-Physical Relighting: Keeps model quality high under varied lighting conditions by combining physically-based and neural rendering.
Summary
Core Content: Methods for creating and animating 3D models with a system that combines a physically-based and a neural-network-based bidirectional reflectance distribution function (BRDF).
Applications: Modeling and animating human body parts, including hands, the full body, and faces.
- Physically-Based and Neural BRDF
  - Physically-Based BRDF: Uses the Disney BRDF to model diffuse and specular reflection when generating images.
  - Neural BRDF: Feeds the output of the physically-based BRDF into a neural network for rendering.
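The two-stage idea above can be sketched roughly as follows. This is a minimal illustration, not the talk's implementation: a Lambertian diffuse term plus a Blinn-Phong specular lobe stands in for the Disney BRDF, and a tiny MLP with fixed random weights stands in for the trained neural BRDF. All function names (`physical_shading`, `make_mlp`, `neuro_physical_render`) are hypothetical.

```python
import math
import random


def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length > 0.0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def physical_shading(normal, light_dir, view_dir, albedo, roughness):
    """Simplified stand-in for a physically-based BRDF (assumption: this is
    Lambert + Blinn-Phong, NOT the full Disney BRDF)."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    diffuse = [a / math.pi * n_dot_l for a in albedo]
    half = normalize([li + vi for li, vi in zip(l, v)])
    shininess = 2.0 / max(roughness ** 2, 1e-4)  # rougher surface -> broader lobe
    specular = max(dot(n, half), 0.0) ** shininess * n_dot_l
    return diffuse, specular


def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Random, untrained weights; a real system would learn these from data."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2


def mlp_forward(params, features):
    w1, w2 = params
    hidden = [max(dot(row, features), 0.0) for row in w1]  # ReLU
    return [1.0 / (1.0 + math.exp(-dot(row, hidden))) for row in w2]  # sigmoid


def neuro_physical_render(params, normal, light_dir, view_dir, albedo, roughness):
    """Physically-based terms are computed first, then refined by the network."""
    diffuse, specular = physical_shading(normal, light_dir, view_dir, albedo, roughness)
    return mlp_forward(params, diffuse + [specular])


params = make_mlp(in_dim=4, hidden_dim=8, out_dim=3)
rgb = neuro_physical_render(
    params, [0, 0, 1], [0.3, 0.2, 1.0], [0, 0, 1], [0.8, 0.6, 0.5], roughness=0.5
)
```

The point of the split is that the physical stage already encodes how light direction and surface orientation interact, so the network only has to learn a correction on top of it rather than the whole light transport.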
- Model Training and Results
  - Training Data: Phone scans and data from previous research.
  - Outcome: Improved textures and animatable 3D models.
- Full-Body Animation
  - Goal: Generate full-body animations from a single monocular video.
  - Challenge: Generalizing to new poses and expressions from a limited number of training frames.
  - Solution: A hybrid of 3D Gaussian splatting and surface meshes.
- Advantages of the Hybrid Model
  - Generalization: Better results on new poses and expressions.
  - Comparison: Fewer artifacts and better facial expressions than methods that use Gaussian splatting alone.
- Parametric Models and Personalization
  - Parametric Model Registration: Fits parametric models precisely to the body, hands, and face.
  - Offset Addition: Adds offsets on top of the parametric model to improve the accuracy of the hands and face.
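One way to picture the registration-plus-offset step is the sketch below. It assumes a linear shape model in the style of common parametric body and hand models (template plus a weighted sum of shape bases); the talk does not spell out the exact formulation, and `personalize_vertices` and its arguments are hypothetical names.

```python
def personalize_vertices(template, shape_basis, betas, offsets):
    """Return per-vertex positions: template + shape blend + personal offsets.

    template:    list of [x, y, z] vertices of the generic mesh
    shape_basis: list of K bases, each a list of [dx, dy, dz] per vertex
    betas:       K shape coefficients found by registration (assumed given)
    offsets:     per-vertex [dx, dy, dz] corrections beyond the shape space
    """
    personalized = []
    for i, vertex in enumerate(template):
        personalized.append([
            vertex[c]
            + sum(beta * basis[i][c] for beta, basis in zip(betas, shape_basis))
            + offsets[i][c]
            for c in range(3)
        ])
    return personalized


# Toy example: two vertices, one shape basis that moves both along +y.
template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
shape_basis = [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]]
mesh = personalize_vertices(template, shape_basis, [2.0], [[0.0, 0.0, 0.1], [0.0, 0.0, 0.0]])
```

The shape coefficients can only express what the basis spans, which is why a free per-vertex offset helps in detailed regions such as the hands and face.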
- Technical Details
  - Architecture: Built from standard components such as triplanes, MLPs, and 3D Gaussians animated with linear blend skinning (LBS).
  - Technical Improvements: More accurate geometry and higher texture quality.
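For readers unfamiliar with LBS, here is a minimal sketch of how it could move Gaussian centers. It skins only the centers and leaves out each Gaussian's covariance, which a full system would also have to transform; the bone setup and the function names are illustrative assumptions, not the talk's code.

```python
import math


def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_rigid(rotation, translation, point):
    return [
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    ]


def linear_blend_skinning(points, weights, bone_transforms):
    """Pose each point as the weighted sum of its per-bone rigid transforms.

    points:          list of [x, y, z] rest-pose positions (e.g. Gaussian centers)
    weights:         per-point list of skinning weights, one per bone, summing to 1
    bone_transforms: list of (rotation 3x3, translation 3) per bone
    """
    posed = []
    for point, point_weights in zip(points, weights):
        blended = [0.0, 0.0, 0.0]
        for w, (rotation, translation) in zip(point_weights, bone_transforms):
            moved = apply_rigid(rotation, translation, point)
            blended = [b + w * m for b, m in zip(blended, moved)]
        posed.append(blended)
    return posed


# Two bones: one stays fixed, the other rotates 90 degrees about the z axis.
bones = [
    (rotation_z(0.0), [0.0, 0.0, 0.0]),
    (rotation_z(math.pi / 2), [0.0, 0.0, 0.0]),
]
centers = [[1.0, 0.0, 0.0]] * 3
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
posed_centers = linear_blend_skinning(centers, weights, bones)
```

A center bound half to each bone lands midway between the two rigid results, which is what lets the surface bend smoothly around a joint.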
- Real-Time Video and Camera Calibration
  - Question: About localization accuracy when working with real-time video and camera calibration.
  - Answer: Depth and scale ambiguity are hard to resolve with a single camera; multiple cameras and additional modalities are necessary.
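The ambiguity in the answer follows directly from the pinhole camera model, as the short sketch below shows. The focal length and principal point are made-up values for illustration, and `project` is a hypothetical helper.

```python
def project(point, focal_length=1000.0, principal_point=(640.0, 360.0)):
    """Project a 3D point in camera coordinates to pixel coordinates
    using an ideal pinhole model (no lens distortion)."""
    x, y, z = point
    u = focal_length * x / z + principal_point[0]
    v = focal_length * y / z + principal_point[1]
    return u, v


# A small hand close to the camera and a hand twice as large, twice as far away.
near_point = (0.1, 0.05, 0.5)
far_point = (0.2, 0.10, 1.0)

pixel_near = project(near_point)
pixel_far = project(far_point)
```

Because projection divides by depth, scaling a point's position and its distance by the same factor leaves the pixel unchanged, so a single image cannot tell the two cases apart. A second camera or a depth measurement breaks the tie.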
- Additional Research Areas
  - Human-Object Interaction: Reconstructing interactions between humans and objects.
  - EMG-Based Systems: Using electromyography (EMG) for stable tracking when motion is fast or the hands cover only a few pixels.
  - Event Cameras: 3D pose estimation using event-based cameras.