Augmented reality (AR) is becoming part of our daily lives. At its core, AR means placing a virtual object in the real world and ensuring that it keeps its position and shape over time until it is removed from the scene. These scenarios require AR devices to determine their 6-DoF (six degrees of freedom) pose at all times in order to consistently overlay virtual content onto the real environment with pixel-level precision.
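To make the 6-DoF idea concrete: a pose is a rotation plus a translation, and overlaying a virtual object boils down to projecting its 3D position through that pose into the camera image. The minimal sketch below uses a standard pinhole camera model with made-up intrinsics; it is purely illustrative and not taken from LaMAR.

```python
import numpy as np

def project_point(R, t, K, p_world):
    """Project a 3D world point into the image using a 6-DoF camera pose.

    R (3x3 rotation) and t (3-vector translation) map world -> camera
    coordinates; K is the pinhole intrinsic matrix. All values here are
    illustrative, not from any real device calibration.
    """
    p_cam = R @ p_world + t          # world -> camera coordinates
    u, v, w = K @ p_cam              # camera -> homogeneous pixel coordinates
    return np.array([u / w, v / w])  # perspective division -> pixel (x, y)

# Assumed intrinsics for a 640x480 camera, and an identity (canonical) pose.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
R, t = np.eye(3), np.zeros(3)

# A virtual anchor 2 m straight ahead projects to the image center.
pixel = project_point(R, t, K, np.array([0.0, 0.0, 2.0]))
# pixel -> [320.0, 240.0]
```

Any error in the estimated pose shifts this projected pixel, which is why AR places such tight accuracy demands on localization.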
Visual localization and mapping have been studied extensively in computer vision. However, applying them to AR is far from straightforward and raises unique challenges.
One such challenge relates to the devices used to display AR content: mostly mobile phones or dedicated AR headsets such as Microsoft's HoloLens. These devices are equipped with multiple cameras and additional sensors, so localization and mapping methods designed for single-camera setups do not transfer to them directly.
In addition, users exhibit distinctive hand and head motion patterns when viewing AR content on these devices. Real-time tracking systems on the device provide sensor streams that are spatially posed, i.e., already related to one another in 3D space. However, in many AR scenarios the scene itself changes over time, and content must be tracked beyond a single local session. The AR tracking system must therefore be robust against temporal changes in appearance and structure.
Another challenge concerns the temporal nature of the sensor data. A device receives large volumes of heterogeneous data from its sensors and must make sense of it all in real time. This is crucial: if the device falls behind the incoming data, the virtual content lags or drifts and the user experience suffers.
Finally, as AR adoption grows, so do the opportunities to build large-scale maps from data crowdsourced across many devices. This will not be easy, however, as several obstacles must be overcome first, such as making the algorithms robust and preserving user privacy.
Despite all these challenges, current academic research is mostly driven by benchmarks that address none of them. That's where LaMAR comes in: a solid, realistic benchmark for localization and mapping studies focused on AR. LaMAR makes three main contributions.
The first contribution is a large-scale dataset captured with AR devices in diverse settings, including a historic building, a multi-story office building, and part of a city center. The dataset contains both indoor and outdoor scenes with lighting and semantic changes as well as dynamic objects. Data was captured over a one-year period using handheld devices such as the iPad and head-mounted devices such as the HoloLens.
The second contribution is a pipeline that automatically registers AR trajectories against large-scale 3D laser scans to produce accurate ground truth. The pipeline can handle crowdsourced data from heterogeneous devices, so the dataset can be extended with new captures and new device types.
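As a rough intuition for what registering trajectories against a reference scan involves, the simplest building block is a least-squares rigid alignment between two point sets. The sketch below uses the classic Kabsch/Procrustes solution on synthetic data; LaMAR's actual pipeline is far more involved (visual localization against the scan plus joint optimization), so treat this purely as an illustration of the rigid-registration idea.

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst.

    Classic Kabsch/Procrustes solution via SVD; a toy stand-in for the far
    more involved registration a real ground-truth pipeline performs.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection so R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic example: a trajectory rotated 90 degrees about z and shifted
# into the "scan" frame, then recovered by the alignment.
rng = np.random.default_rng(0)
traj = rng.normal(size=(50, 3))              # fake device trajectory
R_true = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
scan_frame = traj @ R_true.T + np.array([1.0, 2.0, 3.0])

R, t = rigid_align(traj, scan_frame)         # recovers R_true and the shift
```

Because the synthetic correspondences are noise-free, the recovered rotation and translation match the ones used to generate the data; with real crowdsourced trajectories, robust estimation and outlier handling become the hard part.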
Finally, the paper presents a detailed evaluation of localization and mapping techniques in the AR setting. These evaluations also point to promising directions for future research.
This was a brief summary of LaMAR, the new benchmark for AR localization and mapping. You can find more information in the links below.
This article was written as a research summary by Marktechpost staff based on the research paper 'LaMAR: Benchmarking Localization and Mapping for Augmented Reality'. All credit for this research goes to the researchers on this project. Check out the paper, code, and project.
Ekrem Çetinkaya obtained his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.