Simultaneous Localization and Mapping (SLAM) serves as a foundational technology for emerging applications such as robotics, autonomous driving, embodied intelligence, and augmented/virtual reality.
However, traditional image-based SLAM systems still struggle with reliable pose estimation and 3D reconstruction under challenging conditions involving high-speed motion and extreme illumination variations.
Event cameras, also known as dynamic vision sensors, have recently emerged as a promising alternative to standard cameras for visual perception.
Instead of capturing intensity images at a fixed frame rate, event cameras asynchronously measure per-pixel brightness changes, producing a stream of events, each encoding the timestamp, pixel location, and sign (polarity) of a brightness change.
They offer attractive advantages, including high temporal resolution (MHz-level), high dynamic range (HDR, ~140 dB), microsecond latency, no motion blur, and low power consumption.
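To make the event encoding above concrete, the following minimal Python sketch represents an event stream as a structured array of (t, x, y, polarity) tuples and accumulates it into a simple polarity image; the synthetic events and the assumed 346×260 sensor resolution are purely illustrative and not taken from the thesis.

```python
import numpy as np

# An event stream is just a sequence of tuples (t, x, y, p):
#   t: timestamp in microseconds, (x, y): pixel coordinates, p: polarity (+1 / -1).
# The handful of synthetic events below is purely illustrative.
events = np.array(
    [(10, 5, 7, 1), (12, 5, 8, -1), (31, 40, 22, 1)],
    dtype=[("t", np.int64), ("x", np.int32), ("y", np.int32), ("p", np.int8)],
)

def accumulate_events(events, width=346, height=260):
    """Sum event polarities per pixel into a 2D 'event frame' for visualization."""
    frame = np.zeros((height, width), dtype=np.int32)
    np.add.at(frame, (events["y"], events["x"]), events["p"])
    return frame

frame = accumulate_events(events)
print(frame[7, 5], frame.sum())  # per-pixel polarity and net polarity over the sensor
```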
However, integrating event cameras into SLAM systems is challenging: asynchronous event streams differ fundamentally from conventional intensity images, so existing image-based pipelines cannot be applied directly and new algorithmic paradigms are required.
This dissertation presents innovative solutions and advancements for event-based SLAM.
It begins with the development of Mono-EIO, a monocular event-inertial odometry framework that tightly integrates event-corner features with IMU preintegration.
These event-corner features are temporally and spatially associated using novel event representations built on spatio-temporal and exponential-decay kernels, and are subsequently incorporated into a keyframe-based sliding-window optimization framework.
Mono-EIO achieves high-accuracy, real-time 6-DoF ego-motion estimation even under aggressive motion and HDR conditions.
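The decay-kernel idea behind these event representations can be pictured with a generic time surface: every pixel remembers its latest event timestamp, and an exponential decay turns those timestamps into a smooth image on which corners can be detected and tracked. The sketch below is only an illustration under assumed parameters (decay constant tau, DAVIS346-sized sensor); it is not the exact representation used in Mono-EIO.

```python
import numpy as np

def time_surface(events, t_ref, tau=30e3, width=346, height=260):
    """Exponential-decay time surface: T(x, y) = exp(-(t_ref - t_last(x, y)) / tau).

    events : structured array with fields t (microseconds), x, y (pixels)
    t_ref  : reference timestamp (microseconds) at which the surface is evaluated
    tau    : decay constant (microseconds); smaller tau emphasizes the newest events
    """
    t_last = np.full((height, width), -np.inf)
    # Keep the most recent event timestamp at every pixel up to t_ref.
    for t, x, y in zip(events["t"], events["x"], events["y"]):
        if t <= t_ref:
            t_last[y, x] = max(t_last[y, x], t)
    surface = np.exp(-(t_ref - t_last) / tau)
    surface[np.isinf(t_last)] = 0.0  # pixels that never fired contribute nothing
    return surface
```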
Building upon this foundation, the thesis introduces PL-EVIO, an event-based visual-inertial odometry framework that combines event cameras with standard cameras to enhance robustness.
PL-EVIO utilizes line-based event features to provide additional structural constraints in human-made environments, while point-based event and image features are managed so that they complement each other.
This framework has been successfully applied to quadrotor onboard pose feedback control, enabling complex maneuvers such as flipping and operation in low-light conditions.
Additionally, the thesis includes ESVIO, the first stereo event-based visual-inertial odometry framework.
The thesis also presents DEIO, a learning-optimization-combined framework that tightly fuses learning-based event data association with IMU measurements within graph-based optimization.
To the best of our knowledge, DEIO is the first learning-based event-inertial odometry framework, outperforming more than 20 vision-based methods across 10 challenging real-world benchmarks.
Finally, the thesis proposes EVI-SAM, a full SLAM system that tackles both 6-DoF pose tracking and 3D dense mapping using a monocular event camera.
Its tracking module is the first hybrid approach that integrates both direct-based and feature-based methods within an event-based framework.
The mapping module, on the other hand, is the first to achieve event-based dense and textured 3D reconstruction without GPU acceleration by employing a non-learning approach.
This method not only recovers 3D scene structure under aggressive motion but also demonstrates superior performance compared with image-based NeRF methods and RGB-D camera-based approaches.
Through these contributions, this dissertation significantly advances event-based SLAM, offering robust solutions and paving the way for future research and applications of event cameras.
Demo for Monocular Event-inertial Odometry
Uniform Event-corner Feature Detection
Mono-EIO in Challenging Situations
Demo for Pose Feedback Control using our Event-based VIO
Quadrotor Flip Using Our PL-EVIO
Quadrotor Flight Using Our ESVIO
Demo for Event-based Hybrid Pose Tracking
Our Event-based Hybrid Pose Tracking in HDR and aggressive motion
Demo for Learning-based Event-inertial Odometry
Evaluating DEIO in Drone Flying
Evaluating DEIO in Aggressive Motion
Demo for Event-based Dense Mapping
Real-time Event-based Dense Mapping
Event-based Dense Mapping under Fast Motion
Publication List
Event-based Vision
SLAM
Robotics
Event Camera
3D Dense Mapping
6DoF Pose Tracking
Deep Learning
Event-Inertial Odometry
Event-Visual-Inertial Odometry
Visual-Inertial Odometry
Monocular
Stereo
Drone
Autonomous Driving
Feature-based Methods
Direct-based Methods
LiDAR
Image Sensor
3D Gaussian Splatting
Sensor Fusion
Perception
Computer Vision
Monocular Event Visual Inertial Odometry based on Event-corner using Sliding Windows Graph-based Optimization (2022)
Event cameras are biologically inspired vision sensors that capture pixel-level illumination changes instead of intensity images at a fixed frame rate. They offer many advantages over standard cameras, such as high dynamic range, high temporal resolution (low latency), and no motion blur. Developing state estimation algorithms based on event cameras therefore offers exciting opportunities for autonomous systems and robots. In this paper, we propose a monocular visual-inertial odometry for event cameras based on event-corner feature detection and matching with well-designed feature management. More specifically, two different kinds of event representations based on the time surface are designed to realize event-corner feature tracking (for front-end incremental estimation) and matching (for loop-closure detection). Furthermore, the proposed event representations are used to set a mask for detecting event-corner features on the raw event stream, which ensures that the event-corner features remain uniformly distributed and spatially consistent. Finally, a tightly coupled, graph-based optimization framework is designed to obtain highly accurate state estimation by fusing pre-integrated IMU measurements and event-corner observations. We quantitatively validate the performance of our system on event cameras of different resolutions: DAVIS240C (240×180, public datasets, state-of-the-art results), DAVIS346 (346×260, real-world tests), and DVXplorer (640×480, real-world tests). Furthermore, we qualitatively demonstrate the accuracy, robustness, loop-closure, and re-localization performance of our framework on different large-scale datasets, as well as an autonomous quadrotor flight using our Event Visual-inertial Odometry (EVIO) framework. Videos of all the evaluations are presented on the project website.
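The masking step described above can be pictured as follows: before new event corners are extracted from the event representation, the neighborhoods of already-tracked features are blocked out, so fresh detections fill the remaining image regions and the feature set stays uniformly distributed. The sketch below is a hedged illustration with an assumed suppression radius and sensor size, not the paper's implementation.

```python
import numpy as np

def build_detection_mask(tracked_pts, width=346, height=260, radius=15):
    """Boolean mask that is False inside a square window around every currently
    tracked feature, so that new event corners are only detected in empty regions."""
    mask = np.ones((height, width), dtype=bool)
    for x, y in tracked_pts:
        x, y = int(round(x)), int(round(y))
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        mask[y0:y1, x0:x1] = False
    return mask

# Usage: zero out a corner-response map (e.g. computed on a time surface) where the
# mask is False, then keep the strongest remaining responses as new features.
```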
PL-EVIO: Robust Monocular Event-based Visual Inertial Odometry with Point and Line Features (2023)
Robust state estimation in challenging situations remains an unsolved problem, especially for achieving onboard pose feedback control under aggressive motion. In this paper, we propose a robust, real-time event-based visual-inertial odometry (VIO) that incorporates event, image, and inertial measurements. Our approach utilizes line-based event features to provide additional structure and constraint information in human-made scenes, while point-based event and image features complement each other through well-designed feature management. To achieve reliable state estimation, we tightly couple the point-based and line-based visual residuals from the event camera, the point-based visual residuals from the standard camera, and the residuals from IMU pre-integration using a keyframe-based graph optimization framework. Experiments on public benchmark datasets show that our method achieves superior performance compared with state-of-the-art image-based and event-based VIO. Furthermore, we demonstrate the effectiveness of our pipeline through onboard closed-loop aggressive quadrotor flight and large-scale outdoor experiments. Videos of the evaluations can be found on our website: https://youtu.be/KnWZ4anBMK4.
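Conceptually, the tight coupling described above minimizes one joint cost over the sliding-window states in which event-point, event-line, image-point, and IMU pre-integration residuals are stacked with their information matrices. The snippet below only illustrates that structure on generic callables; the actual system solves the problem with an incremental nonlinear least-squares solver, and the interfaces shown here are assumptions.

```python
import numpy as np

def joint_cost(states, residual_blocks):
    """Evaluate the total sliding-window cost as a sum of squared, weighted residuals.

    residual_blocks: list of (residual_fn, sqrt_info) pairs, e.g. event-point
    reprojection, event-line, image-point, and IMU pre-integration factors, where
    residual_fn(states) returns a residual vector and sqrt_info is the square-root
    information (weighting) matrix of that factor.
    """
    total = 0.0
    for residual_fn, sqrt_info in residual_blocks:
        r = sqrt_info @ np.asarray(residual_fn(states))
        total += float(r @ r)
    return total
```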
ESVIO: Event-based Stereo Visual Inertial Odometry (2023)
Event cameras that asynchronously output low-latency event streams provide great opportunities for state estimation under challenging situations. Although event-based visual odometry has been extensively studied in recent years, most existing work is monocular, and stereo event vision remains little explored. In this letter, we present ESVIO, the first event-based stereo visual-inertial odometry, which leverages the complementary advantages of event streams, standard images, and inertial measurements. Our proposed pipeline includes ESIO (purely event-based) and ESVIO (event with image aid), which achieve spatial and temporal associations between consecutive stereo event streams. A well-designed, tightly coupled back-end fuses the multi-sensor measurements to obtain robust state estimation. We validate that both ESIO and ESVIO achieve superior performance compared with other image-based and event-based baseline methods on public and self-collected datasets. Furthermore, we use our pipeline to perform onboard quadrotor flights in low-light environments. Autonomous-driving data sequences and real-world large-scale experiments are also conducted to demonstrate long-term effectiveness. We highlight that this work is a real-time, accurate system aimed at robust state estimation under challenging environments.
ECMD: An Event-Centric Multisensory Driving Dataset for SLAM (2023)
Leveraging multiple sensors enhances complex environmental perception and increases resilience to varying luminance conditions and high-speed motion patterns, enabling precise localization and mapping. This paper presents ECMD, an event-centric multisensory dataset containing 81 sequences and covering over 200 km of various challenging driving scenarios, including high-speed motion, repetitive scenarios, dynamic objects, etc. ECMD provides data from two sets of stereo event cameras with different resolutions (640×480, 346×260), stereo industrial cameras, an infrared camera, a top-installed mechanical LiDAR with two slanted LiDARs, two consumer-level GNSS receivers, and an onboard IMU. Meanwhile, the ground truth of the vehicle is obtained using a centimeter-level high-accuracy GNSS-RTK/INS navigation system. All sensors are well calibrated, temporally synchronized at the hardware level, and record data simultaneously. We additionally evaluate several state-of-the-art SLAM algorithms to benchmark visual and LiDAR SLAM and identify their limitations.
EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping (2024)
Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu
Event cameras demonstrate substantial potential in handling challenging situations, such as motion blur and high dynamic range. Herein, event-visual-inertial state estimation and 3D dense mapping (EVI-SAM) are introduced to tackle the problem of pose tracking and 3D dense reconstruction using a monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment. Specifically, an event-based 2D-2D alignment is developed to construct the photometric constraint, which is tightly integrated with the event-based reprojection constraint. The mapping module recovers the dense and colorful depth of the scene through an image-guided event-based mapping method. Subsequently, the appearance, texture, and surface mesh of the 3D scene can be reconstructed by fusing the dense depth maps from multiple viewpoints using truncated signed distance function (TSDF) fusion. To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping. Numerical evaluations are performed on both publicly available and self-collected datasets, which qualitatively and quantitatively demonstrate the superior performance of our method. EVI-SAM effectively balances accuracy and robustness while maintaining computational efficiency, showcasing superior pose tracking and dense mapping performance in challenging scenarios.
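The TSDF fusion step mentioned above follows the classic weighted running average: each voxel stores a truncated signed distance D and a weight W, updated per observation as D <- (W*D + w*d)/(W + w), W <- W + w. The per-voxel sketch below uses assumed truncation and weighting values; it illustrates the standard formula, not EVI-SAM's mapping module.

```python
def tsdf_update(D, W, d_new, w_new=1.0, trunc=0.1):
    """Fuse one new signed-distance observation d_new (in metres) into a voxel's
    running truncated signed distance D and accumulated weight W."""
    d_new = max(-trunc, min(trunc, d_new))       # truncate the signed distance
    D = (W * D + w_new * d_new) / (W + w_new)    # weighted running average
    W = W + w_new
    return D, W

# Example: a voxel observed from three viewpoints near the surface.
D, W = 0.0, 0.0
for d in (0.04, 0.02, -0.01):
    D, W = tsdf_update(D, W, d)
print(round(D, 4), W)  # fused signed distance and accumulated weight
```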
LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting (2025)
3D Gaussian Splatting (3DGS) has shown its ability in rapid rendering and high-fidelity mapping. In this paper, we introduce LVI-GS, a tightly-coupled LiDAR-Visual-Inertial mapping framework with 3DGS, which leverages the complementary characteristics of LiDAR and image sensors to capture both geometric structures and visual details of 3D scenes. To this end, the 3D Gaussians are initialized from colourized LiDAR points and optimized using differentiable rendering. In order to achieve high-fidelity mapping, we introduce a pyramid-based training approach to effectively learn multilevel features and incorporate depth loss derived from LiDAR measurements to improve geometric feature perception. Through well-designed strategies for Gaussian-Map expansion, keyframe selection, thread management, and custom CUDA acceleration, our framework achieves real-time photo-realistic mapping. Numerical experiments are performed to evaluate the superior performance of our method compared to state-of-the-art 3D reconstruction systems. Videos of the evaluations can be found on our website: https://kwanwaipang.github.io/LVI-GS/.
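The LiDAR-derived depth loss mentioned above is conceptually an L1 penalty between the depth rendered from the 3D Gaussians and the projected LiDAR depth, added to the usual photometric loss. The sketch below shows only that term, with the rendered-depth map, LiDAR depth map, validity mask, and weight treated as assumed inputs; it is not the LVI-GS training code.

```python
import numpy as np

def depth_l1_loss(rendered_depth, lidar_depth, valid_mask, weight=0.5):
    """L1 depth term between the depth rendered from the 3D Gaussians and the LiDAR
    depth projected into the current view, averaged over pixels with a valid
    measurement. Intended to be added to the usual 3DGS photometric (RGB) loss."""
    diff = np.abs(rendered_depth - lidar_depth)
    n_valid = max(int(valid_mask.sum()), 1)
    return weight * float((diff * valid_mask).sum()) / n_valid
```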
DEIO: Deep Event Inertial Odometry (2025)
Event cameras show great potential for visual odometry (VO) in handling challenging situations, such as fast motion and high dynamic range.
Despite this promise, the sparse and motion-dependent characteristics of event data continue to limit the performance of feature-based or direct-based data association methods in practical applications.
To address these limitations, we propose Deep Event Inertial Odometry (DEIO), the first monocular learning-based event-inertial framework, which combines a learning-based method with traditional nonlinear graph-based optimization.
Specifically, an event-based recurrent network is adopted to provide accurate and sparse associations of event patches over time.
DEIO further integrates these associations with IMU measurements to recover metric-scale poses and provide robust state estimation.
The Hessian information derived from the learned differentiable bundle adjustment (DBA) is utilized to optimize the co-visibility factor graph, which tightly incorporates event patch correspondences and IMU pre-integration within a keyframe-based sliding window.
Comprehensive validations demonstrate that DEIO achieves superior performance on 10 challenging public benchmarks compared with more than 20 state-of-the-art methods.
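One way to picture the coupling described above: the learned DBA layer contributes a quadratic approximation of the visual cost (Hessian H_dba and gradient b_dba) over the window states, the IMU pre-integration residuals contribute their own Gauss-Newton terms, and a single damped linear solve updates all states jointly. The numpy sketch below illustrates that fusion under assumed array shapes; it is not the DEIO implementation.

```python
import numpy as np

def fused_gauss_newton_step(H_dba, b_dba, J_imu, r_imu, damping=1e-6):
    """One damped Gauss-Newton step over the sliding-window states.

    H_dba : (n, n) visual Hessian from the differentiable bundle adjustment layer
    b_dba : (n,)   corresponding visual gradient
    J_imu : (m, n) stacked IMU pre-integration residual Jacobian
    r_imu : (m,)   stacked IMU pre-integration residuals
    Returns the state increment dx (length n).
    """
    n = H_dba.shape[0]
    H = H_dba + J_imu.T @ J_imu + damping * np.eye(n)  # fused normal-equation matrix
    b = b_dba + J_imu.T @ r_imu                        # fused gradient
    return -np.linalg.solve(H, b)
```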
SuperEIO: Self-Supervised Event Feature Learning for Event Inertial Odometry (2025)
Peiyu Chen, Fuling Lin, Weipeng Guan, Yi Luo, Peng Lu
Event cameras asynchronously output low-latency event streams, which makes them promising for state estimation in complex conditions.
The motion-dependent nature of event cameras presents persistent challenges in achieving robust event feature detection and matching.
Recent learning-based approaches have demonstrated superior robustness over traditional handcrafted methods, particularly under aggressive motions and HDR scenarios.
This paper proposes SuperEIO, a novel framework that leverages a learning-based event-only detector and IMU measurements for event-inertial odometry.
Our event-only feature detector employs a convolutional neural network on continuous event streams, while a graph neural network achieves event descriptor matching for loop closure.
We accelerate network inference with TensorRT, ensuring low-latency, real-time operation on resource-constrained devices.
Extensive evaluations on multiple public benchmarks demonstrate its superior accuracy and robustness compared to advanced event-based methods.
Moreover, we conduct a large-scale real-world experiment on an edge handheld platform to demonstrate long-term effectiveness.
Our pipeline is open-sourced to facilitate research in the field: https://github.com/arclab-hku/SuperEIO.
BibTeX
@phdthesis{GuanPhDThesis,
  title={Event-based Vision for 6-DOF Pose Tracking and 3D Mapping},
  author={Guan, Weipeng},
  school={The University of Hong Kong},
  year={2025},
}