Mobile Animatronics Telepresence System
The Previous System
Inhabitor Station
In the inhabitor station, the user's head pose (especially its orientation) must be tracked continuously to control the avatar head on the mobile avatar side. Currently, the user has to wear a helmet carrying optical trackers, which provide the inhabitor's position and orientation, and a camera that captures frontal face imagery, as shown in Fig. 1.
Figure 1: User in the inhabitor station wearing a helmet with optical trackers and a camera.
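For illustration, the control path from inhabitor to avatar can be thought of as a continuous stream of timestamped head poses sent to the avatar side. The minimal sketch below (Python) shows one way to pack and stream a tracked pose over UDP; the packet layout, address, and port are assumptions, not the system's actual protocol.

    import socket
    import struct
    import time

    # Hypothetical endpoint for the mobile-avatar side; address and port are assumptions.
    AVATAR_ADDR = ("192.168.1.50", 9000)

    def send_head_pose(sock, position, quaternion, timestamp):
        """Pack a tracked head pose (3D position in metres, orientation as a
        unit quaternion x, y, z, w) and stream it to the avatar over UDP."""
        payload = struct.pack("<d3f4f", timestamp, *position, *quaternion)
        sock.sendto(payload, AVATAR_ADDR)

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # In practice the pose values would come from the optical trackers on the helmet.
    send_head_pose(sock, (0.0, 1.6, 0.0), (0.0, 0.0, 0.0, 1.0), time.time())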
Mobile Avatar
Compared with the previous avatar, the current avatar adopts rear projection instead of front projection, and the projector is rigidly fixed relative to the face-shaped projection surface. Currently, the alignment of the projected image with the face-shaped surface (its size, position, and rotation) is accomplished manually, which is not user-friendly. Some misalignment still remains between the projected image and the face-shaped surface, distorting the appearance of the avatar, as shown in Fig. 2(a) and (b). When the inhabitor speaks or changes expression, this misalignment becomes more pronounced. Moreover, due to inter-reflection and specular reflection, the appearance of the projected avatar face is not homogeneous; Fig. 2(b) shows some of these errors, especially the bright specular spot in the eye region.
Figure 2: Some imperfections of the projected avatar face: (a) & (b) misalignment.
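The manual alignment described above (size, position, and rotation of the projected image) amounts to a 2D similarity transform applied to the rendered face before it is sent to the projector. A minimal sketch using OpenCV is shown below; the function name and the example parameter values are hypothetical, not the system's actual calibration.

    import cv2
    import numpy as np

    def align_projection(face_img, scale, angle_deg, tx, ty, out_size):
        """Apply manual alignment parameters (uniform scale, in-plane rotation,
        and translation in projector pixels) to the rendered face image."""
        h, w = face_img.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, scale)
        M[:, 2] += (tx, ty)  # shift after rotating/scaling about the image centre
        return cv2.warpAffine(face_img, M, out_size)

    # Example: parameters found by hand for one projector/surface setup (assumed values).
    aligned = align_projection(cv2.imread("face_render.png"), 0.98, 1.5, 12, -8, (1024, 768))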
Unencumbered Head Pose and Body Posture Estimation
The emergence of low-cost 3D sensors (e.g., the Kinect) makes it possible to achieve acceptable-quality 3D capture and pose estimation of a human head and body for many applications, without encumbering the user with sensors or markers. It would be useful to explore the use of such devices for real-time head pose estimation for local head control of the NTU prototype Physical-Virtual Avatar (PVA), without regard for the appearance (imagery) of the head. In addition, the same technology could be used for remote body control of the UNC RoboThespian RT-3; this will require exploration of both the body capture and the RT-3 control. Finally, real-time head pose information, combined with camera imagery, could be used for face modeling and deformation.
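As a rough illustration of marker-free head control, head orientation can be approximated from a few skeleton joints reported by a depth sensor. The sketch below (Python/NumPy) is a deliberately coarse simplification: the joint names, example coordinates, and orthonormalisation scheme are illustrative assumptions, and a dedicated face/head tracker would give a more accurate estimate.

    import numpy as np

    def head_orientation_from_joints(head, neck, shoulder_l, shoulder_r):
        """Rough head orientation (3x3 rotation matrix) from skeleton joint
        positions reported by a depth sensor such as the Kinect.
        All joints are 3D points in the sensor's coordinate frame."""
        up = head - neck
        up /= np.linalg.norm(up)
        right = shoulder_r - shoulder_l
        right /= np.linalg.norm(right)
        forward = np.cross(right, up)          # roughly the facing direction
        forward /= np.linalg.norm(forward)
        right = np.cross(up, forward)          # re-orthogonalise the frame
        return np.column_stack((right, up, forward))

    # Example joint positions in metres (assumed values for illustration).
    R = head_orientation_from_joints(np.array([0.0, 0.55, 2.0]),
                                     np.array([0.0, 0.40, 2.0]),
                                     np.array([-0.2, 0.30, 2.05]),
                                     np.array([0.2, 0.30, 2.05]))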
Dynamic Face Modeling and Expression Deformation
While modern depth sensors are noteworthy in many ways, obtaining an accurate real-time dynamic 3D face model remains a challenging problem. In general, the quality of a single frame is not sufficient to generate a reasonable 3D face model, and there is little or no temporal coherence (filtering or fusion). It would be useful to explore the use of the Kinect or other sensors to build up a parametric model of the human head, with evolving dynamic textures, that could be rendered onto the PVA using the head pose information. One important factor is temporal coherence: the geometry of the model (its parameters) should evolve in a way that simultaneously affords a stable base head model and yet allows for shape changes due to facial expressions. This might be accomplished by using repeated poses and depth information, accumulating and refining the model over time. A general model of the human head, perhaps a parametric model, may be employed as prior knowledge to simplify the problem.
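One simple way to obtain temporal coherence, assuming incoming depth frames have already been registered into a common view using the head pose, is per-pixel exponential fusion with a confidence weight that down-weights noisy or expression-driven changes. The sketch below is illustrative only; the blending factor and the definition of the confidence map are assumptions, not part of the proposed method.

    import numpy as np

    def fuse_depth(base_depth, new_depth, confidence, alpha=0.05):
        """Temporal fusion of registered depth frames into a stable base model.
        base_depth: accumulated per-pixel depth (metres); new_depth: the current
        frame registered into the same view; confidence: per-pixel weight in [0, 1]
        (e.g. low where the sensor reported no data or where the residual is
        large, as with transient expression changes)."""
        w = alpha * confidence
        valid = new_depth > 0                      # ignore pixels with no depth
        fused = base_depth.copy()
        fused[valid] = (1 - w[valid]) * base_depth[valid] + w[valid] * new_depth[valid]
        return fused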
Direct Face Mapping to Avatar Head |
The current approach used to dynamically map a real human face to the face of the PVA depends on a full 3D model of the real human head, a full 3D model of the PVA head, and very precise (in space and time) dynamic head tracking via a head-worn marker system. One of the dominant goals of the project is to un-encumber the user, while
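For reference, the mapping chain implied by this approach can be written as two transforms: from the tracker/world frame into the tracked human-head frame, and from the human head model into the PVA head frame. The sketch below is a simplified rigid version with hypothetical names; the actual mapping also involves the full 3D models of both heads rather than a single rigid registration.

    import numpy as np

    def map_to_avatar(p_world, T_head_world, T_avatar_head):
        """Map a 3D point measured in the world/tracker frame onto the PVA head
        frame: first into the tracked human-head frame, then through a fixed
        registration between the human head and the PVA head.
        T_head_world and T_avatar_head are 4x4 homogeneous transforms."""
        p = np.append(p_world, 1.0)
        p_head = np.linalg.inv(T_head_world) @ p   # world -> human head frame
        p_avatar = T_avatar_head @ p_head          # human head -> PVA head frame
        return p_avatar[:3]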
Photometric Issues
Errors in appearance (color and/or luminance) arise as a result of light being projected onto an opaque surface or through a translucent head material. The sources of error include inter-reflection, specular highlights, and interior (within the head material) diffusion and scattering of light. It should be possible to model and calibrate for some of these error sources, potentially using a camera, and then add a post-rendering correction that adapts the luminance and color throughout the image.
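A minimal form of such a correction is a per-pixel linear model fitted from camera captures of the surface under full-white and black projection, registered into projector space. The sketch below illustrates only that idea; it ignores inter-reflection and subsurface scattering, which would need a more elaborate model, and the function and variable names are hypothetical.

    import numpy as np

    def photometric_correction(target, captured_white, captured_black):
        """Per-pixel post-rendering correction: given camera captures of the
        surface under full-white and black projection (registered to projector
        space), compute the compensation image whose projection should make the
        observed result approximate the target. Values are floats in [0, 1]."""
        gain = np.clip(captured_white - captured_black, 1e-3, None)  # avoid divide-by-zero
        comp = (target - captured_black) / gain
        return np.clip(comp, 0.0, 1.0)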