Paper notes: "Accurate, Low-Latency Visual Perception for Autonomous Racing"

(1) Abstract

Key components: YOLOv3-based object detection, pose estimation, and time synchronization on the dual stereo/mono camera setup
The key components of DUT18D include YOLOv3-based object detection, pose estimation and time synchronization on its dual stereovision/monovision camera setup

(Latency of the hardware and software stack)
Of critical importance in autonomous driving is the latency of the hardware and software stack

Visual perception accounts for up to 60% of the end-to-end latency
we find that perception occupies up to 60% of the end-to-end latency

Despite its importance, low-latency visual perception of environment landmarks remains riddled with practical challenges across the entire stack, from noisy image capture and data transmission to accurate positioning in an unmapped environment. To our knowledge, there remains no available prior work detailing the full-stack design of a high-accuracy, low-latency perception system for autonomous driving.

The visual perception system on DUT18D was designed to perceive and position landmarks on a map using multiple CNNs for object detection and depth estimation.

(2) Key Points

An open design and evaluation of a thoroughly-tested, low-latency vision stack for high-performance autonomous racing.

New techniques for domain adaptation of pre-trained CNN-based object detectors, useful loss-function modifications for landmark pose estimation, and microsecond time synchronization of multiple cameras.

Open-source C++ modules for mobile-GPU-accelerated ONNX-DNN inference, landmark pose estimation, and a complete plug-and-play visual perception system for Formula Student racecars.

A publicly available 10K+ pose-estimation/bounding-box dataset of traffic cones of multiple colors and sizes.

(3) Goal

The goal of the perception system is to accurately localize the environment landmarks (traffic cones) that demarcate the racetrack.

(4) Requirements

  1. Accurate mapping: precise localization of the landmarks
  2. Latency: the time from a landmark entering the field of view to its position being estimated
  3. Straight-line distance: the longest straight-line range over which accuracy is maintained
  4. Horizontal Field-of-View (FOV): a wide viewing angle

(5) Rationale for Using a Monocular Camera

The rationale for using the monocular camera for short-range rather than long-range detections is that for a reasonable mounting height, a landmark’s 3D location on a relatively flat surface is a much stronger function of pixel space location for short-range objects than long-range objects. This relieves some of the challenges for estimating landmark pose from a monocular camera.
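
To make this flat-ground intuition concrete, here is a minimal Python sketch (not from the paper; the intrinsics, 1 m mounting height, and zero camera tilt are assumed example values) that back-projects a pixel onto a flat ground plane and compares how far the recovered 3D point moves per pixel for a near versus a far landmark.

```python
# Minimal sketch: pixel -> ground-plane point for an untilted camera.
# K, cam_height, and the pixel rows below are made-up example values.
import numpy as np

K = np.array([[900.0, 0.0, 640.0],   # assumed focal lengths / principal point
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])
cam_height = 1.0                      # assumed mounting height above the ground (m)

def ground_point_from_pixel(u, v):
    """Intersect the viewing ray of pixel (u, v) with the ground plane,
    for a camera at height cam_height looking along +z (x right, y down)."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction in camera frame
    t = cam_height / ray[1]                         # scale so the ray reaches the ground
    return ray * t                                  # 3D point on the ground (camera frame)

# A 10-pixel change near the bottom of the image moves the 3D point only
# slightly; the same pixel change near the horizon moves it by tens of meters.
near = ground_point_from_pixel(640, 690)[2] - ground_point_from_pixel(640, 700)[2]
far = ground_point_from_pixel(640, 370)[2] - ground_point_from_pixel(640, 380)[2]
print(f"depth change per 10 px: near ~ {near:.2f} m, far ~ {far:.2f} m")
```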

(6) Software Pipeline

1. Data Acquisition: Synchronized image streams are captured, disparity-matched for the stereovision pipeline, and transferred to the Jetson Xavier. A critical component of this is time-synchronizing all devices.

Time-synchronization solution: a hardware-timestamped signal generated by the Nerian FPGA, synchronized to the Xavier's master clock using the IEEE PTP protocol.
Monocular camera

  1. Localization accuracy and latency are in tension.
    Solution: run detection at low resolution and depth estimation at high resolution.
  2. How is the monocular camera time-synchronized? (A timestamp-matching sketch follows after this list.)
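
To illustrate what the shared PTP time base enables, here is a minimal sketch (the `Frame` type, the microsecond timestamps, and the 500 µs pairing tolerance are assumptions, not the paper's code): once all devices stamp frames against the same PTP-disciplined clock, stereo and monocular frames can be associated simply by nearest hardware timestamp.

```python
# Minimal sketch: pair frames from two cameras by nearest PTP hardware timestamp.
from dataclasses import dataclass
from bisect import bisect_left

@dataclass
class Frame:
    stamp_us: int        # hardware timestamp in microseconds (shared PTP time base)
    data: object = None

def match_frames(stereo_frames, mono_frames, tol_us=500):
    """Pair each stereo frame with the mono frame closest in time, keeping only
    pairs whose timestamps differ by at most tol_us microseconds."""
    mono_stamps = [f.stamp_us for f in mono_frames]   # assumed sorted by time
    pairs = []
    for s in stereo_frames:
        i = bisect_left(mono_stamps, s.stamp_us)
        # candidates: the mono frames just before and just after the stereo stamp
        candidates = [j for j in (i - 1, i) if 0 <= j < len(mono_frames)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(mono_stamps[k] - s.stamp_us))
        if abs(mono_stamps[j] - s.stamp_us) <= tol_us:
            pairs.append((s, mono_frames[j]))
    return pairs
```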

2. 2D Space Localization (where landmarks lie in the image): Using a neural-network-based approach, landmarks are detected and outlined by bounding boxes in the images.

  1. Handling landmarks of different sizes (see the rescaling sketch after this list)
    A drawback of this process is that the distribution of landmark bounding-box (BB) sizes (in pixels) in the training set was no longer representative of what the network would see in the wild. To mitigate this, each set of training images from a specific sensor/lens/perspective combination was uniformly rescaled so that its landmark size distribution matched that of the camera system on the vehicle.
  2. "Tuning the hyperparameters in front of each of the terms in the loss function": the YOLO loss is a weighted sum of several terms (box-coordinate, objectness, and classification losses), so this refers to adjusting the scalar weight placed in front of each term.
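
A minimal sketch of the rescaling idea (the helper names and the use of the median box diagonal as the size statistic are assumptions, not the authors' tooling): compute one global scale factor per sensor/lens/perspective set so that its box-size distribution lines up with the on-vehicle camera's, then resize the images and boxes together.

```python
# Minimal sketch: match a training set's bounding-box size distribution to the
# on-vehicle camera by a single uniform rescale of that set.
import numpy as np
import cv2

def median_bb_size(boxes_per_image):
    """Median bounding-box diagonal length (px); boxes are (N, 4) [x1, y1, x2, y2]."""
    sizes = [np.hypot(b[:, 2] - b[:, 0], b[:, 3] - b[:, 1])
             for b in boxes_per_image if len(b)]
    return float(np.median(np.concatenate(sizes)))

def rescale_set(images, boxes_per_image, source_median_px, target_median_px):
    """Uniformly rescale one sensor/lens/perspective set so its median landmark
    size matches the target camera's. images: list of HxWx3 arrays."""
    scale = target_median_px / source_median_px
    out_imgs, out_boxes = [], []
    for img, boxes in zip(images, boxes_per_image):
        h, w = img.shape[:2]
        out_imgs.append(cv2.resize(img, (int(round(w * scale)), int(round(h * scale)))))
        out_boxes.append(boxes * scale)   # box coordinates scale with the image
    return out_imgs, out_boxes
```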

3. 3D Space Localization (where landmarks lie in 3D space): For the stereovision pipeline, the depth of each landmark is extracted by a clustering-based approach. A neural-network-based approach is used to compute depth from the monocular camera.

Monocular camera

  1. Algorithm: keypoints are regressed (RektNet) from each YOLO detection and used in a Perspective-n-Point (PnP) algorithm.
  2. Handling a single keypoint outlier (see the PnP sketch after this list)
    To make the algorithm robust to single-keypoint outliers, if the reprojection error of the PnP estimate using all keypoints is above a threshold, all subsets of the keypoints with one point removed are evaluated. The subset with the lowest error is used as the final estimate.
  3. Improvements to the keypoint network
    (1) Replace the fully connected output layer with a convolutional layer:
    the fully connected output layer was replaced with a convolutional layer
    (2) Add a geometric term to the loss function (exploiting the collinearity of the keypoints):
    an additional term in the loss function to leverage the geometric relationship between points
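
A minimal sketch of the leave-one-out PnP step (the 3D cone model points, the 3-pixel threshold, and OpenCV's generic `cv2.solvePnP` in place of the authors' solver are assumptions):

```python
# Minimal sketch: PnP pose from cone keypoints with leave-one-out outlier rejection.
import numpy as np
import cv2

def reprojection_error(obj_pts, img_pts, K, dist, rvec, tvec):
    """Mean pixel distance between detected and reprojected keypoints."""
    proj, _ = cv2.projectPoints(obj_pts, rvec, tvec, K, dist)
    return float(np.mean(np.linalg.norm(proj.reshape(-1, 2) - img_pts, axis=1)))

def robust_pnp(obj_pts, img_pts, K, dist=None, err_thresh_px=3.0):
    """obj_pts: (N, 3) cone model points; img_pts: (N, 2) detected keypoints.
    If the all-points estimate reprojects poorly, retry every subset with one
    keypoint removed and keep the lowest-error estimate."""
    dist = np.zeros(5) if dist is None else dist
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist)
    best = (reprojection_error(obj_pts, img_pts, K, dist, rvec, tvec), rvec, tvec)
    if best[0] > err_thresh_px:
        for drop in range(len(obj_pts)):
            keep = [i for i in range(len(obj_pts)) if i != drop]
            ok, r, t = cv2.solvePnP(obj_pts[keep], img_pts[keep], K, dist)
            if not ok:
                continue
            err = reprojection_error(obj_pts[keep], img_pts[keep], K, dist, r, t)
            if err < best[0]:
                best = (err, r, t)
    return best   # (reprojection error, rvec, tvec); tvec gives the cone position
```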

Stereo camera

  1. 3D points from detections are first passed through a hue and saturation filter, where all pixels with H or S values below 0.3 are discarded.
    H and S are the hue and saturation channels of the pixel's HSV color representation (normalized to [0, 1] here); low-saturation pixels are gray or washed out (e.g., road surface), so the filter keeps mostly the strongly colored cone pixels.
  2. To reduce the latency of each detection, the point cloud within the bounding box is downsampled if more than 200 points remain.
    That is, the number of 3D points handed to the clustering step is capped at roughly 200 per detection, which bounds its runtime (see the filter/downsampling sketch after this list).
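
A minimal sketch of the color filter and point-cloud cap (the 0.3 thresholds and the 200-point limit come from the notes above; the random subsampling strategy and the array layout are assumptions):

```python
# Minimal sketch: hue/saturation filtering and point-cloud downsampling inside a
# detection's bounding box before clustering.
import numpy as np
import cv2

MAX_POINTS = 200   # from the notes: downsample if more than 200 points remain

def filter_and_downsample(bgr_crop, points_3d, seed=0):
    """bgr_crop: HxWx3 uint8 image patch inside the bounding box.
    points_3d: HxWx3 array of reprojected stereo 3D points for the same patch."""
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV).astype(np.float32)
    h = hsv[..., 0] / 179.0            # OpenCV 8-bit hue range is [0, 179]
    s = hsv[..., 1] / 255.0            # saturation range is [0, 255]
    mask = (h >= 0.3) & (s >= 0.3)     # discard pixels with H or S below 0.3
    pts = points_3d[mask]
    if len(pts) > MAX_POINTS:          # random subsample to keep clustering cheap
        rng = np.random.default_rng(seed)
        pts = pts[rng.choice(len(pts), MAX_POINTS, replace=False)]
    return pts
```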

Latency breakdown

[Figure: latency measurements]
