Alumni
Camera pose estimation is a critical process that ensures the success of three-dimensional (3D) modelling. We present a convolutional neural network (CNN) based multi-model ensemble method for indoor and outdoor multi-view stereo reconstruction, capable of learning across multiple domains, including images from both indoor and outdoor environments. Each domain’s images have distinct properties and shooting viewpoints, which makes such large differences difficult to learn efficiently and demands a large amount of computational resources. To reduce the complexity of an end-to-end single model, the proposed model is divided into multiple learning agents: domain-specific agents and a domain-relationship agent. Each domain-specific agent is trained independently on its own set of unique image characteristics, for example, one for indoor datasets and another for outdoor datasets. The domain-relationship agent then ensembles and analyzes the multi-domain features and finalizes the estimation. In terms of average root mean square error, we compare the performance of the combined-domain single model with the suggested ensemble CNN model. The experimental results indicate that the proposed model outperforms the others, with rotation and translation prediction errors of 0.112012266.
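The agent structure described above can be sketched as follows; the linear agents, the embedding sizes, and the 7-D pose output (quaternion rotation plus translation) are illustrative assumptions, not the abstract's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def domain_agent(features, weights):
    """Hypothetical domain-specific agent: maps raw image features to a
    domain-conditioned embedding (a linear layer with tanh stands in for
    a trained CNN branch)."""
    return np.tanh(features @ weights)

def relationship_agent(embeddings, head):
    """Hypothetical domain-relationship agent: fuses the per-domain
    embeddings and regresses a 7-D pose (quaternion + translation)."""
    fused = np.concatenate(embeddings, axis=-1)
    return fused @ head

# Assumed shapes: 512-D image features, 64-D domain embeddings.
indoor_w, outdoor_w = rng.standard_normal((2, 512, 64))
head = rng.standard_normal((128, 7))

features = rng.standard_normal(512)          # features of one input image
pose = relationship_agent(
    [domain_agent(features, indoor_w), domain_agent(features, outdoor_w)],
    head,
)
print(pose.shape)  # (7,)
```

In a trained system each branch would be a CNN fitted to its own domain, and only the relationship agent would need to learn how to weigh the two.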
Estimating camera pose is a significant process that assures the success of 3D modeling. This research presents camera pose estimation using a convolutional neural network (CNN) with transfer learning from the pre-trained deep learning model VGG19, used to extract features from a single image across several datasets captured in indoor and outdoor environments with diverse perspectives and photographic styles. Due to the large dimensionality of the extracted features, latent semantic analysis (LSA) is introduced prior to the CNN input. The CNN is then trained to predict the camera rotations and translations. The prediction performance is measured in terms of average mean square error and compared to the reference techniques. As a result, the regression estimation of the proposed CNN model outperforms the others, with an average 0.24-degree rotation error and 0.26 m translation error.
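The LSA step named above is commonly realized as a truncated SVD projection of the feature matrix; the feature and target dimensions below are hypothetical, and this numpy sketch stands in for the pipeline's actual implementation:

```python
import numpy as np

def lsa_reduce(X, k):
    """LSA as truncated SVD: project high-dimensional features
    (e.g. a VGG19 fully-connected layer) onto the top-k singular
    directions. A minimal numpy sketch."""
    X = X - X.mean(axis=0)              # center the feature matrix
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T                 # n_samples x k reduced features

rng = np.random.default_rng(1)
features = rng.standard_normal((100, 4096))   # assumed 4096-D features
reduced = lsa_reduce(features, k=64)
print(reduced.shape)  # (100, 64)
```

The reduced 64-D vectors, rather than the raw 4096-D features, would then feed the CNN regressor, shrinking its input layer accordingly.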
The prediction of three-dimensional (3D) rotation and translation can be retrieved from two-dimensional (2D) images to build 3D models from large collections of images. In this paper, the process starts by extracting image features via a transfer-learning approach from the deep neural network model VGG19. Even though the features extracted from VGG19 are usually adopted in image recognition applications, in this research we apply these features to a prediction model to obtain rotation and translation parameters. Due to the large size of the feature dimensions, it is necessary to apply a dimensionality-reduction technique called latent semantic analysis (LSA) to decrease the feature dimensions and retain only the important ones. Then, a regression estimation technique based on the Support Vector Machine (SVM) is used to predict the rotation and translation parameters. The accuracy is estimated by comparing the prediction results with the corresponding ground truth. The average rotation and translation errors of the 3D prediction from 2D images are approximately 0.2419 degrees and 1.35 meters, respectively.
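A common way to score such predictions against ground truth is an angular error in degrees for rotation and a Euclidean error in meters for translation; the unit-quaternion convention below is an assumption, since the abstract does not state the rotation parameterization:

```python
import numpy as np

def rotation_error_deg(q_pred, q_true):
    """Angular difference in degrees between two unit quaternions,
    sign-invariant (q and -q encode the same rotation)."""
    dot = np.clip(np.abs(np.sum(q_pred * q_true, axis=-1)), 0.0, 1.0)
    return np.degrees(2.0 * np.arccos(dot))

def translation_error_m(t_pred, t_true):
    """Euclidean distance in meters between predicted and
    ground-truth camera positions."""
    return np.linalg.norm(t_pred - t_true, axis=-1)

q = np.array([[1.0, 0.0, 0.0, 0.0]])   # identity rotation
t = np.array([[1.0, 2.0, 3.0]])
print(rotation_error_deg(q, q))         # [0.]
print(translation_error_m(t, t + 1.0))  # sqrt(3) meters
```

Averaging these two quantities over a test set yields the kind of summary figures reported above.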
Snippet alignment is the process of arranging pieces of a torn document into their original positions according to the direction of the text lines. It is a prerequisite for effective reconstruction: the higher the performance of snippet alignment, the greater the opportunity for successful reconstruction. Therefore, this paper presents a plane alignment algorithm for torn document reconstruction. The proposed technique analyzes the contents inside each snippet, such as the direction of the character alignment, based on the histogram of the accumulated radii of the fitted ellipses. The direction result is then used to revert the snippet to its original position. A Hough transform based local descriptor is extracted as the shape feature. These parameters are helpful for accurate reconstruction. The proposed technique achieves approximately a 5.07 decrease in relative orientation error and thus a 24.11 percent increase in reverting precision, demonstrating a significant performance improvement of the proposed algorithm.
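As an illustration of direction-based reverting, a Hough-style voting scheme over point pairs can recover a snippet's dominant text-line angle; this numpy sketch is a simplified stand-in, not the ellipse-fitting histogram method the paper proposes:

```python
import numpy as np

def dominant_direction(points, n_bins=180):
    """Estimate a snippet's dominant text-line direction by voting:
    every pair of foreground points (e.g. character centroids) casts a
    vote for the angle of the line joining them; the histogram peak
    gives the orientation used to revert the snippet."""
    votes = np.zeros(n_bins)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dy, dx = points[j] - points[i]
            angle = np.degrees(np.arctan2(dy, dx)) % 180.0
            votes[int(round(angle)) % n_bins] += 1
    return np.argmax(votes)  # dominant angle in degrees

# Synthetic snippet: character centroids on a line tilted 30 degrees.
t = np.linspace(0, 50, 20)
pts = np.stack([t * np.sin(np.radians(30)),
                t * np.cos(np.radians(30))], axis=1)   # (y, x) rows
angle = dominant_direction(pts)
print(angle)  # 30
```

Rotating the snippet by the negative of the recovered angle restores a horizontal text baseline before the pieces are matched.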