Human-Understanding AI, Prof. Gyeong-Sik Moon: Authentic Hand Avatar from a Phone Scan via Universal Hand Model

by 우당탕탕 is me 2024. 9. 11.

Paper source ➡️ https://arxiv.org/abs/2405.07933

 



Speaker Introduction


Fields of Expertise:
3D Computer Vision, Machine Learning, Computer Graphics, Artificial Intelligence

  1. Research Topics:
    • 3D Hand Pose Estimation
    • 3D Multi-Hand Pose Estimation
    • 3D Interactive Hands
    • 3D Human Body Shape
    • 3D Full-Body Estimation
  2. Current Research Focus:
    • Goal: Develop interactive 3D hand avatars and expressive full-body avatars using computer graphics and AI.

 


Research Necessity


Non-verbal Communication:
The talk cited the figure that 55% of human communication is non-verbal, so voice alone is not enough to communicate effectively.
Problem: Recent AI-generated images and videos often show unrealistic hand shapes and distorted bodies.

 


Research Direction


High-Resolution 3D Modeling

  • Goal: Generate high-quality 3D hand avatars even from short captures.
  • Challenge: Generalization is difficult because training data is scarce.

Solution:

  • Build a Universal Hand Model (UHM) that represents diverse hand shapes and poses naturally.

Natural Relighting

  • Goal: Render the 3D hand model consistently in new environments.

 


Methods


  • Physically-Based Relighting: Produces high-quality renderings but is slow to compute.
  • Neural Relighting: Faster, but struggles to generalize.
  • Neuro-Physical Relighting: Combines the strengths of physically-based and neural rendering to improve both quality and generalization.
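
As a rough illustration of the neuro-physical idea above, the Python sketch below first computes simple physically-based shading terms and then passes them through a small neural network. It is only a sketch under stated assumptions: the Lambertian diffuse and Blinn-Phong specular terms stand in for a full physically-based BRDF, the MLP weights are random rather than trained, and the names `physical_shading` and `neural_refinement` are hypothetical and not taken from the paper.

```python
import numpy as np

def normalize(v):
    """Scale vectors along the last axis to unit length (safe for zero vectors)."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-8)

def physical_shading(normal, light_dir, view_dir, albedo, roughness):
    """Simplified physically-based shading (assumption: Lambert + Blinn-Phong).

    Returns a diffuse term of shape (N, 3) and a specular term of shape (N, 1).
    """
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    h = normalize(l + v)  # half vector between light and view directions
    n_dot_l = np.clip(np.sum(n * l, axis=-1, keepdims=True), 0.0, None)
    n_dot_h = np.clip(np.sum(n * h, axis=-1, keepdims=True), 0.0, None)
    diffuse = albedo * n_dot_l
    shininess = 2.0 / np.maximum(roughness, 1e-3) ** 2  # rougher -> wider highlight
    specular = n_dot_l * n_dot_h ** shininess
    return diffuse, specular

def neural_refinement(diffuse, specular, params):
    """Tiny MLP (hypothetical) that maps physical shading features to RGB."""
    x = np.concatenate([diffuse, specular], axis=-1)            # (N, 4)
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)   # ReLU layer
    logits = hidden @ params["w2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits))                        # sigmoid -> (0, 1)

# Untrained, randomly initialised weights, purely for illustration.
rng = np.random.default_rng(0)
params = {
    "w1": rng.normal(scale=0.1, size=(4, 16)), "b1": np.zeros(16),
    "w2": rng.normal(scale=0.1, size=(16, 3)), "b2": np.zeros(3),
}

# Two surface points facing +z: one lit head-on, one lit from the side.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
lights = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
views = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
albedo = np.array([[0.8, 0.6, 0.5], [0.8, 0.6, 0.5]])
roughness = np.array([[0.5], [0.5]])

diffuse, specular = physical_shading(normals, lights, views, albedo, roughness)
rgb = neural_refinement(diffuse, specular, params)
```

The physical terms give the network a lighting-aware input that already behaves sensibly under a new light direction, which is the intuition behind letting a learned component refine rather than replace the physically-based shading.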

 


Research Achievements


  • Authentic Hand Model: High-resolution hand models generated from short captures, with diverse hand shapes and poses represented through the Universal Hand Model.
  • Shadow Removal: A data-driven approach removes shadows, resulting in more natural models.
  • Neuro-Physical Relighting: Maintains high quality under various lighting conditions by combining physically-based and neural rendering.

 


Summary


Core Content: How to create and animate 3D models with a system that combines a physically-based Bidirectional Reflectance Distribution Function (BRDF) with a neural-network-based one.

Applications: Modeling and animation of various human body parts, including 3D hands, full body, and faces.

  1. Physically-Based and Neural BRDF
    • Physically-Based BRDF: Uses the Disney BRDF to model diffuse and specular reflection when generating images.
    • Neural BRDF: Feeds the output of the physically-based BRDF into a neural network that produces the final rendering.
  2. Model Training and Results
    • Training Data: Uses phone scans together with data from previous research.
    • Outcome: Improved textures and animatable 3D models.
  3. Full-Body Animation
    • Goal: Generate full-body animations from a single monocular video.
    • Challenge: Generalizing to new poses and expressions from a limited number of training frames.
    • Solution: A hybrid of 3D Gaussian splatting and surface meshes.
  4. Advantages of the Hybrid Model
    • Generalization: Better results on new poses and expressions.
    • Comparison: Shows fewer artifacts and represents facial expressions better than methods that use Gaussian splatting alone.
  5. Parametric Models and Personalization
    • Parametric Model Registration: Fits parametric models precisely to the body, hands, and face.
    • Offset Addition: Applies additional offsets to improve the accuracy of hand and face modeling.
  6. Technical Details
    • Architecture: Built from standard components such as triplanes, MLPs, and 3D Gaussians driven by linear blend skinning (LBS).
    • Technical Improvements: More accurate geometry and higher texture quality.
  7. Real-Time Video and Camera Calibration
    • Question: How accurate is position estimation when using real-time video and camera calibration?
    • Answer: It is hard to resolve depth and scale ambiguity with a single camera; multiple cameras and additional modalities are necessary.
  8. Additional Research Areas
    • Human-Object Interaction: Reconstructing interactions between humans and objects.
    • EMG-Based Systems: Using electromyography (EMG) to keep tracking stable when motion is fast or the subject covers only a few pixels.
    • Event Cameras: 3D pose estimation with event-based cameras.
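
To make the offset and LBS steps from items 5 and 6 more concrete, here is a minimal Python sketch of linear blend skinning applied to rest-pose points, which could be mesh vertices or 3D Gaussian centres. It is an illustration under assumptions, not the speaker's implementation: the function name `linear_blend_skinning`, the array layouts, and the toy two-joint skeleton are all hypothetical.

```python
import numpy as np

def linear_blend_skinning(points, offsets, skin_weights, joint_transforms):
    """Pose rest-pose points with linear blend skinning (LBS).

    points:           (N, 3) rest-pose positions (vertices or Gaussian centres)
    offsets:          (N, 3) per-point personalisation offsets, added before posing
    skin_weights:     (N, J) non-negative weights, each row summing to 1
    joint_transforms: (J, 4, 4) transform of each joint from rest to posed space
    """
    personalised = points + offsets
    ones = np.ones((len(personalised), 1))
    homogeneous = np.concatenate([personalised, ones], axis=1)          # (N, 4)
    # Blend the joint transforms per point, then apply the blended transform.
    blended = np.einsum("nj,jab->nab", skin_weights, joint_transforms)  # (N, 4, 4)
    posed = np.einsum("nab,nb->na", blended, homogeneous)               # (N, 4)
    return posed[:, :3]

def rotation_z(angle):
    """4x4 homogeneous rotation about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    t = np.eye(4)
    t[:2, :2] = [[c, -s], [s, c]]
    return t

# Toy skeleton: joint 0 stays fixed, joint 1 rotates 90 degrees about z.
joint_transforms = np.stack([np.eye(4), rotation_z(np.pi / 2)])

points = np.array([[1.0, 0.0, 0.0]] * 3)
offsets = np.zeros_like(points)   # no personalisation in this toy example
skin_weights = np.array([
    [1.0, 0.0],   # follows joint 0 only
    [0.0, 1.0],   # follows joint 1 only
    [0.5, 0.5],   # blends both joints equally
])

posed = linear_blend_skinning(points, offsets, skin_weights, joint_transforms)
```

The first point stays where it was, the second rotates fully with joint 1, and the third lands halfway between the two. Because the same skinning weights can drive both a surface mesh and a set of Gaussians, a shared deformation like this is one plausible way for a hybrid representation to carry over to unseen poses.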