Human-Understanding AI, Prof. Gyeong-Sik Moon: Authentic Hand Avatar from a Phone Scan via Universal Hand Model

Paper source: https://arxiv.org/abs/2405.07933


Speaker Introduction

Fields of Expertise:
3D Computer Vision, Machine Learning, Computer Graphics, Artificial Intelligence

Research Topics:
- 3D Hand Pose Estimation
- 3D Multi-Hand Pose Estimation
- 3D Interactive Hands
- 3D Human Body Shape
- 3D Full-Body Estimation

Current Research Focus:
- Goal: Development of interactive 3D hand avatars and expressive full-body avatars using computer graphics and AI.

Research Necessity

Non-verbal Communication:
The talk cited the figure that 55% of human communication is non-verbal, so the vocal channel alone is not enough for effective communication.
Problem: Recently, AI-generated images and videos have shown unrealistic hand shapes and body distortions.

Research Direction

High-Resolution 3D Modeling

Goal: To generate high-quality 3D hand avatars even with short capture times.
Challenge: Difficulty in generalization due to lack of data.

Solution:

Implementing a Universal Hand Model (UHM) to represent various hand shapes and poses naturally.

Natural Relighting

Goal: Consistent 3D hand model rendering in new environments.

Methods

Physically-Based Relighting: Provides high-quality rendering but is slow in processing.
Neural Relighting: Faster but struggles with generalization.
Neuro-Physical Relighting: Combines the strengths of physically-based and neural rendering to enhance both quality and generalization.

Research Achievements

Authentic Hand Model: High-resolution hand models generated from short captures, with various hand shapes and poses represented through the Universal Hand Model.
Shadow Removal: Shadow removal using a data-driven approach, resulting in more natural models.
Neuro-Physical Relighting: Maintains high-quality models under various lighting conditions by combining physically-based and neural rendering methods.

Summary

Core Content: Explains how to create and animate 3D models with a system that combines a physically-based BRDF (Bidirectional Reflectance Distribution Function) and a neural-network-based BRDF.

Applications: Modeling and animation of various human body parts, including 3D hands, full body, and faces.

Physically-Based and Neural BRDF
- Physically-Based BRDF: Uses the Disney BRDF to model diffuse and specular reflection when generating images.
- Neural BRDF: Feeds the output of the physically-based BRDF into a neural network that produces the final rendering.
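The two-stage idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: a Lambertian diffuse term plus a Blinn-Phong specular term stands in for the Disney BRDF (which is considerably more elaborate), and the small MLP uses hand-picked placeholder weights rather than trained ones. All function names and parameters are hypothetical and do not come from the talk or the paper.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return list(v) if n == 0.0 else [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def physical_shading(normal, light_dir, view_dir, albedo, roughness):
    """Lambertian diffuse plus Blinn-Phong specular (a stand-in for the Disney BRDF)."""
    n, l, v = normalize(normal), normalize(light_dir), normalize(view_dir)
    n_dot_l = max(dot(n, l), 0.0)
    half = normalize([a + b for a, b in zip(l, v)])
    # Assumed mapping: a rougher surface gets a lower exponent, i.e. a broader highlight.
    shininess = 2.0 / max(roughness ** 2, 1e-4)
    diffuse = albedo * n_dot_l
    # No highlight when the light is behind the surface.
    specular = (max(dot(n, half), 0.0) ** shininess) if n_dot_l > 0.0 else 0.0
    return diffuse, specular

def neural_refinement(features, w1, b1, w2, b2):
    """One-hidden-layer MLP with ReLU; the weights are placeholders, not trained values."""
    hidden = [max(dot(row, features) + b, 0.0) for row, b in zip(w1, b1)]
    return dot(w2, hidden) + b2

def shade(normal, light_dir, view_dir, albedo, roughness, mlp):
    """Physically-based terms first, then the network maps them to the final intensity."""
    diffuse, specular = physical_shading(normal, light_dir, view_dir, albedo, roughness)
    return neural_refinement([diffuse, specular], *mlp)

# Placeholder network: passes both inputs through and mixes them linearly.
placeholder_mlp = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [1.0, 0.5], 0.0)
intensity = shade([0, 0, 1], [0, 0, 1], [0, 0, 1], 0.8, 0.5, placeholder_mlp)
```

In a real system the network weights would be learned from captured data, and the physically-based input is what would let the network behave sensibly under lighting it has not seen.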
Model Training and Results
- Training Data: Utilizes phone scans and data from previous research.
- Outcome: Generation of improved textures and animatable 3D models.
Full-Body Animation
- Goal: Generate full-body animations from a single monocular video.
- Challenge: Generalizing to new poses and expressions from a limited number of training frames.
- Solution: Hybrid combination of 3D Gaussian splatting and surface meshes.
Advantages of the Hybrid Model
- Generalization: Better results on poses and expressions that were not seen during training.
- Comparison: Shows fewer artifacts and better facial expression representation than methods using only Gaussian splatting.
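One common way to couple Gaussians with a surface mesh is to anchor each Gaussian to a triangle, so that it follows the mesh whenever the mesh deforms. The summary does not say how the talk's method ties the two together, so the sketch below is an assumption for illustration only. The barycentric anchoring and all names are hypothetical.

```python
def barycentric_point(triangle, weights):
    """Weighted average of three 3D vertices; the weights are expected to sum to 1."""
    return [sum(w * v[axis] for w, v in zip(weights, triangle)) for axis in range(3)]

def gaussian_centers(vertices, faces, anchors):
    """anchors holds one (face_index, (w0, w1, w2)) pair per Gaussian."""
    centers = []
    for face_index, weights in anchors:
        triangle = [vertices[i] for i in faces[face_index]]
        centers.append(barycentric_point(triangle, weights))
    return centers

# One triangle, one Gaussian anchored at its centroid.
faces = [(0, 1, 2)]
anchors = [(0, (1 / 3, 1 / 3, 1 / 3))]
rest_vertices = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
moved_vertices = [[x, y, z + 2.0] for x, y, z in rest_vertices]

rest_centers = gaussian_centers(rest_vertices, faces, anchors)
moved_centers = gaussian_centers(moved_vertices, faces, anchors)
```

Because the anchors are fixed on the surface, a new pose only needs the deformed mesh; the Gaussians move with it, which is one plausible reason a hybrid would generalize better than free-floating Gaussians.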
Parametric Models and Personalization
- Parametric Model Registration: Parametric models of the body, hands, and face are fitted precisely to the captured subject.
- Offset Addition: Additional offsets applied to improve the accuracy of hand and face modeling.
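The registration-plus-offset idea can be sketched as a linear shape model with a per-vertex correction on top. This is a simplified illustration, not the speaker's implementation: the template, the shape basis, and the function name are hypothetical, and real parametric models also include pose-dependent terms that are left out here.

```python
def personalized_vertices(template, shape_basis, shape_params, offsets):
    """v_i = template_i + sum_k shape_params[k] * shape_basis[k][i] + offsets_i"""
    result = []
    for i, base in enumerate(template):
        vertex = []
        for axis in range(3):
            # Contribution of the low-dimensional shape parameters at this vertex.
            blend = sum(beta * basis[i][axis]
                        for beta, basis in zip(shape_params, shape_basis))
            # The offset captures person-specific detail the basis cannot express.
            vertex.append(base[axis] + blend + offsets[i][axis])
        result.append(vertex)
    return result

template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
shape_basis = [[[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]]  # one shape component
shape_params = [0.5]
offsets = [[0.0, 0.0, 0.0], [0.0, 0.1, 0.0]]

fitted = personalized_vertices(template, shape_basis, shape_params, offsets)
```

The shape parameters give a coarse fit shared across people, while the offsets let individual vertices move where the parametric model is too smooth, which matters most for hands and faces.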
Technical Details
- Architecture: Built from standard components such as a triplane, MLPs, and 3D Gaussians posed with linear blend skinning (LBS).
- Technical Improvements: Achieves more accurate geometry and texture quality.
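Linear blend skinning itself is a standard technique: each point is moved by every bone's rigid transform, and the results are averaged with per-point skinning weights. The sketch below assumes that the points are Gaussian centers and that each bone transform is given as a 3x4 matrix; both are assumptions for illustration, and the names are hypothetical.

```python
def apply_transform(transform, point):
    """transform is a 3x4 matrix [R | t] given as three rows of four numbers."""
    x, y, z = point
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in transform]

def linear_blend_skinning(points, skin_weights, bone_transforms):
    """posed_i = sum_j skin_weights[i][j] * (R_j @ point_i + t_j)"""
    posed = []
    for point, weights in zip(points, skin_weights):
        blended = [0.0, 0.0, 0.0]
        for w, transform in zip(weights, bone_transforms):
            moved = apply_transform(transform, point)
            blended = [b + w * m for b, m in zip(blended, moved)]
        posed.append(blended)
    return posed

identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
shift_up = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 1.0, 0.0]]

points = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
skin_weights = [[1.0, 0.0], [0.5, 0.5]]  # second point is shared by both bones
posed = linear_blend_skinning(points, skin_weights, [identity, shift_up])
```

The first point follows only the first bone and stays put, while the second point lands halfway between the two bones' results.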
Real-Time Video and Camera Calibration
- Question: How accurate is 3D localization from real-time video, and what role does camera calibration play?
- Answer: It is challenging to resolve depth and scale ambiguity with a single camera; multiple cameras and different modalities are necessary.
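The scale ambiguity in the answer follows directly from the pinhole camera model: scaling the whole scene about the camera center leaves every projected pixel unchanged. The short sketch below demonstrates this with made-up points and a made-up focal length.

```python
def project(point, focal_length):
    """Pinhole projection of a camera-space 3D point (z > 0) to image coordinates."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def scale_scene(points, factor):
    """Scale every point about the camera center, which sits at the origin."""
    return [[factor * c for c in p] for p in points]

focal_length = 500.0  # illustrative value, in pixels
small_near = [[0.1, 0.2, 1.0], [-0.3, 0.05, 2.0]]
large_far = scale_scene(small_near, 3.0)

pixels_small = [project(p, focal_length) for p in small_near]
pixels_large = [project(p, focal_length) for p in large_far]
# Up to floating-point rounding, both scenes produce the same image coordinates.
```

A small hand close to the camera and a large hand far away are therefore indistinguishable in a single image, which is why a second camera or another modality is needed to pin down absolute depth.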
Additional Research Areas
- Human-Object Interaction: Reconstructing interactions between objects and humans.
- EMG-Based Systems: Using electromyography (EMG) signals to keep estimates stable when the hand moves quickly or occupies only a few pixels in the image.
- Event Cameras: Research on 3D pose estimation using event-based cameras.
