The research, “Reflection Removal and Facial Detection of Individuals in Vehicles,” applies Single Image Reflection Removal (SIRR) and face detection to remove reflections and reduce glare caused by automotive glass and window film, enabling the capture of facial images of individuals inside vehicles. SIRR enhances image quality by removing reflections from glass surfaces that might obscure the objects behind them. In this research, we explore three models specialized in SIRR together with YOLOv7 for face detection. However, the pre-trained reflection-removal models failed to effectively remove reflections and reduce glare from window films. In this paper, we propose an approach that enhances reflection removal and glare reduction for automotive glass with film opacities of 40% and 60%, improving Peak Signal-to-Noise Ratio (PSNR) by approximately 42.96% and Structural Similarity Index (SSIM) by approximately 34.16% compared to the pre-trained models.
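The PSNR and SSIM gains reported above can be reproduced with standard image-quality metrics. As an illustrative sketch (not the paper's evaluation code), the following computes PSNR from the mean squared error and a simplified single-window SSIM; production evaluations typically use library implementations (e.g., scikit-image), which additionally apply a sliding Gaussian window for SSIM:

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between a reference and test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, test, max_val=255.0):
    """Simplified SSIM computed over the whole image (no sliding window)."""
    x = ref.astype(np.float64)
    y = test.astype(np.float64)
    c1 = (0.01 * max_val) ** 2          # stabilizing constants from the
    c2 = (0.03 * max_val) ** 2          # original SSIM formulation
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

A perfect reconstruction yields infinite PSNR and an SSIM of 1.0; reflection-removal quality is then reported as the improvement in these scores over the glared input.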
The Convolutional Neural Network (CNN) is a well-known deep learning model used extensively in computer vision. The structure of convolutional neural networks is quite complicated and requires substantial computation time and storage, which makes CNN models difficult to deploy on resource-constrained devices. Model pruning can help reduce both computation time and storage requirements. In this research, we propose a filter pruning technique based on the Localized Gradient Activation heatmaP (LGAP) for pruning CNNs. Analyzing a filter using a statistical criterion on a single neuron can lose the spatial relations within the filter's activation, its relationship to the target prediction, and the relationships among filters in the same layer. To minimize these limitations, we evaluate the significance of a filter through the spatial information of local gradient activation related to the target prediction, measured as the layer-wise loss incurred by the investigated filter: the effect of this loss indicates whether the filter is significant. Our pruning criterion ensures that significant filters are preserved while maintaining model accuracy. The performance of our pruning method was validated on VGG-16 and ResNet-50. At a 50% pruning ratio, VGG-16 loses 1.66% accuracy while achieving a 3.6× FLOP and 3.9× storage reduction. For ResNet-50 at the same pruning ratio, our technique outperforms all baseline techniques, with a top-1 accuracy drop of 3.56%, a top-5 accuracy drop of 1.89%, a 2.3× reduction in floating-point operations, and a 2.05× reduction in storage.
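The full LGAP criterion is defined in the paper itself; as a minimal sketch of the underlying idea, the following scores each filter by the mean magnitude of its gradient-weighted activation map (a Grad-CAM-style localized criterion) and builds a mask that drops the lowest-scoring filters. The function names and the exact scoring rule here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def filter_importance(activations, gradients):
    """Score each filter by the mean magnitude of its gradient-weighted
    activation map, so spatial structure contributes to the score.
    activations, gradients: arrays of shape (num_filters, H, W)."""
    return np.mean(np.abs(activations * gradients), axis=(1, 2))

def prune_mask(scores, ratio):
    """Boolean mask keeping the highest-scoring filters; `ratio` is the
    fraction of filters to remove."""
    k = int(len(scores) * ratio)          # number of filters to drop
    order = np.argsort(scores)            # ascending: weakest filters first
    mask = np.ones(len(scores), dtype=bool)
    mask[order[:k]] = False
    return mask
```

Applied layer by layer at a 50% ratio, such a mask halves the filter count while retaining the filters most tied to the target prediction.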
Convolutional neural networks (CNNs) are extensively utilized in computer vision; however, they pose challenges in terms of computation time and storage requirements. One well-known approach to this issue is filter pruning. However, fine-tuning pruned models demands substantial computing power and a large retraining dataset. To restore model performance after pruning each layer, we propose the Convolutional Approximation Small Model (CASM) framework. CASM trains a compact model on the remaining kernels and optimizes their weights to restore feature maps that resemble those of the original kernels. This method is less complex and requires fewer training samples than basic fine-tuning. We evaluate the performance of CASM on the CIFAR-10 and ImageNet datasets using VGG-16 and ResNet-50 models. The experimental results demonstrate that CASM surpasses the basic fine-tuning framework in training-time acceleration (3.3× faster), requires a smaller dataset for performance recovery after pruning, and achieves enhanced accuracy.
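The essence of restoring feature maps from the remaining kernels can be sketched as a per-layer reconstruction problem: fit a small map from the pruned layer's outputs to the original layer's outputs. As a hedged stand-in for CASM's compact model (whose exact architecture is defined in the paper), the example below uses a closed-form linear least-squares fit, which needs no gradient-based retraining at all:

```python
import numpy as np

def fit_reconstruction(pruned_feats, original_feats):
    """Fit a linear map W so that pruned_feats @ W approximates
    original_feats in the least-squares sense.
    pruned_feats: (n_samples, n_kept_channels)
    original_feats: (n_samples, n_original_channels)"""
    W, *_ = np.linalg.lstsq(pruned_feats, original_feats, rcond=None)
    return W
```

Because the fit is closed-form and only needs feature statistics, it illustrates why such a recovery step can use far fewer samples and far less compute than end-to-end fine-tuning.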
Human Activity Recognition (HAR) plays a significant role in Ambient Assisted Living (AAL) systems, which aim to provide sustainable healthcare for an aging population and those with special needs. HAR automatically categorizes people's activities while they wear wearable sensors. With an effective HAR system, we should be able to monitor individuals' behavior and activities and issue specific warnings as necessary. The goal of this paper is to propose a methodological framework for developing a HAR model based on a Long Short-Term Memory (LSTM) network. We investigated model selection and parameters based on Cross-Validation (CV) and learning-rate optimization across two well-known public HAR datasets, MobiAct and WISDM. Our analysis shows that CV variance has a considerable impact on the generalization of the model's learning capability, and the relationship between CV variance and accuracy can guide the selection of the fold number in k-fold CV. Our studies provide scientific evidence and technical guidance for solving the HAR problem, with improvements not only in the proposed model's accuracy and AUC (more than 99% on average) but also in its generalization performance, which could be useful for future related studies.
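The fold-selection idea can be sketched as follows: split the data into k folds, collect the per-fold scores, and compare the mean and variance of those scores across candidate values of k. This is an illustrative outline of the procedure (the paper's exact selection rule relating variance to accuracy is its own contribution), with hypothetical helper names:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle n sample indices and split them into k near-equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_stats(fold_scores):
    """Mean accuracy and variance across folds. Comparing these statistics
    across candidate k values guides the choice of fold number: a low
    variance at a comparable mean suggests a well-generalizing setup."""
    s = np.asarray(fold_scores, dtype=float)
    return s.mean(), s.var()
```

In practice one would train the LSTM once per fold, feed the resulting accuracies to `cv_stats`, and repeat for each candidate k before committing to a fold count.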
Camera pose estimation is a critical process that ensures the success of three-dimensional (3D) modelling. We present a Convolutional Neural Network (CNN) based multi-model ensemble method for indoor and outdoor multi-view stereo reconstruction capable of learning across multiple domains, including images from both indoor and outdoor environments. Each domain's images have distinct properties and shooting viewpoints; learning such large differences efficiently is difficult and requires a large amount of computational resources. To reduce the complexity of an end-to-end single model, the proposed model is divided into multiple learning agents: domain-specific agents and a domain relationship agent. Each domain-specific agent is trained independently on its own set of unique image characteristics, for example, one for indoor datasets and another for outdoor datasets. The domain relationship agent then ensembles and analyzes the multi-domain features and finalizes the estimation. We compare the performance of the combined-domain single model with the proposed ensemble CNN model in terms of average root mean square error. The experimental results indicate that the proposed model outperforms the others, with a rotation and translation prediction error of 0.112012266.
Identifying ship categories in waterways plays an important role in marine surveillance, especially when classification is performed on satellite images, made possible by advances in remote sensing technologies. In this paper, we present an approach for ship classification in optical remote sensing images. Our approach rests on two aspects: modifying models and applying additional techniques to improve classification accuracy. Two pretrained models, MobileNetV2 and DenseNet121, were modified in this work, and all techniques were implemented using the Fastai library. To illustrate the effectiveness of our approach, we compared the accuracy of the modified models to the originals. A public Dataset for Ship Classification in Remote sensing images (DSCR), containing six military ship types and one civilian ship type, was used for evaluation. The results show that our modified DenseNet121 achieved the best accuracy at 99.52%, outperforming the benchmark ResNet101 result reported with the original dataset.
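The core of modifying a pretrained model for a new label set is freezing the backbone and training a fresh classification head on its features. As a framework-free sketch of that step (the paper itself uses Fastai with MobileNetV2/DenseNet121; the function below is a hypothetical stand-alone head trainer), here is a softmax head trained by gradient descent on frozen backbone features:

```python
import numpy as np

def train_head(feats, labels, n_classes, lr=0.5, epochs=200):
    """Train a softmax classification head on frozen backbone features,
    mimicking the head-replacement step of transfer learning.
    feats: (n_samples, feat_dim) features from the frozen backbone."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        W -= lr * feats.T @ (p - onehot) / n          # cross-entropy gradient
    return W
```

Libraries such as Fastai wrap exactly this pattern (head replacement, then optional gradual unfreezing) behind a high-level API.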
Deep learning techniques are widely implemented in computer vision applications. The Convolutional Neural Network (CNN) is a deep learning class that is highly effective at categorizing the statistical characteristics of images. Classifying frequency-level regions in various low-resolution images is often challenging. In this research, we propose a CNN for classifying gradient profile priors, learned from several gradient characteristics such as horizontal gradient acceleration, vertical gradient acceleration, the relational gradient direction, and the edge sketch image. The technique uses multiple building blocks to learn features through backpropagation with automatic and adaptive spatial-hierarchy learning. The experimental results, evaluated against several predictive and conventional classification techniques, show improved performance in classifying frequency-level areas across various low-resolution image inputs.
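The gradient characteristics named above can be approximated with finite differences. The following is a minimal sketch, assuming "acceleration" means a second derivative, "relational gradient direction" means the local gradient angle, and the "edge sketch image" is a gradient-magnitude map; the paper's exact definitions may differ:

```python
import numpy as np

def gradient_features(img):
    """Finite-difference approximations of the gradient features:
    horizontal/vertical gradient acceleration (second derivatives),
    a gradient-direction map, and an edge-strength sketch image."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)           # first derivatives (rows, cols)
    ayy, _ = np.gradient(gy)            # vertical gradient acceleration
    _, axx = np.gradient(gx)            # horizontal gradient acceleration
    direction = np.arctan2(gy, gx)      # relational gradient direction (rad)
    edge = np.hypot(gx, gy)             # edge sketch (gradient magnitude)
    return axx, ayy, direction, edge
```

Stacking these maps per pixel or per patch yields the multi-channel input from which the CNN learns the gradient profile priors.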
Machine learning has become a prominent and important technique for discovering statistically significant patterns in available data. In this paper, we present a classification of gradient profile spectral characteristics on vertical and horizontal gradient acceleration data, the edge sketch image, and the relational gradient direction data in low-resolution image inputs. Various training datasets were learned by the CatBoost classifier to create gradient profile priors. CatBoost's boosting scheme helps reduce overfitting and improves model quality, and its symmetric tree structure provides fast inference and accelerates implementation. Several predictive and conventional classification techniques were chosen for performance comparison. The experimental results demonstrate improved performance in classifying the frequency-level area across various image characteristics.
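To make the boosting idea concrete without depending on the CatBoost library, here is a minimal AdaBoost loop over depth-1 decision stumps. This is an illustrative toy, far simpler than CatBoost's ordered boosting and symmetric trees, but it shows the core mechanism: each round fits a weak learner on reweighted data, emphasizing previously misclassified samples:

```python
import numpy as np

def stump_predict(X, feat, thr, sign):
    """Depth-1 stump: predict +sign where feature `feat` exceeds `thr`."""
    return sign * np.where(X[:, feat] > thr, 1.0, -1.0)

def fit_adaboost(X, y, rounds=10):
    """Minimal AdaBoost with decision stumps; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                       # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for feat in range(X.shape[1]):            # exhaustive stump search
            for thr in np.unique(X[:, feat]):
                for sign in (1.0, -1.0):
                    err = w[stump_predict(X, feat, thr, sign) != y].sum()
                    if best is None or err < best[0]:
                        best = (err, feat, thr, sign)
        err, feat, thr, sign = best
        err = max(err, 1e-12)                     # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # stump weight
        ensemble.append((alpha, feat, thr, sign))
        w *= np.exp(-alpha * y * stump_predict(X, feat, thr, sign))
        w /= w.sum()                              # reweight toward mistakes
    return ensemble

def predict_adaboost(ensemble, X):
    score = sum(a * stump_predict(X, f, t, s) for a, f, t, s in ensemble)
    return np.sign(score)
```

CatBoost replaces the stumps with oblivious (symmetric) trees, whose shared split per level is what enables its fast vectorized inference.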
Estimating camera pose is a significant process that assures the success of 3D modeling. This research presents camera pose estimation using a convolutional neural network (CNN) with transfer learning from the pre-trained VGG19 deep learning model to extract features from a single image, using several datasets captured in indoor and outdoor environments with diverse perspectives and photographic styles. Because the extracted features are high-dimensional, Latent Semantic Analysis (LSA) is introduced prior to the CNN input. The CNN is then trained to predict camera rotations and translations. Prediction performance is measured in terms of average mean square error and compared to reference techniques. As a result, the regression estimation of the proposed CNN model outperforms the others, with an average rotation error of 0.24 degrees and a translation error of 0.26 m.
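The LSA step amounts to a truncated singular value decomposition of the feature matrix. A minimal sketch, assuming features are stacked as one row per image (the target dimensionality `k` used in the paper is not stated here):

```python
import numpy as np

def lsa_reduce(features, k):
    """Truncated SVD, the core of Latent Semantic Analysis (LSA): project
    the high-dimensional feature matrix onto its top-k singular directions.
    features: (n_samples, feat_dim); returns the reduced codes and the
    component matrix that maps codes back to the original feature space."""
    U, S, Vt = np.linalg.svd(features, full_matrices=False)
    reduced = U[:, :k] * S[:k]      # (n_samples, k) low-dimensional codes
    return reduced, Vt[:k]
```

The reduced codes then feed the pose-regression CNN, shrinking its input layer while retaining the directions of greatest variance in the VGG19 features.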
This paper presents a study of neuro-fuzzy behavior in clustering gradient profile spectral characteristics. Various types of image scenes are chosen to evaluate neuro-fuzzy performance. Combinations of training data subsets are learned by an ANFIS model to generate gradient profile priors, which are used as optimum weight-selection criteria for image enhancement. The experimental results illustrate quantitative and perceptual improvement in recovering high-resolution details in various images.