Abstract
In this article, we propose a survey of the Simultaneous Localization And Mapping field when considering the recent evolution of autonomous driving. The growing interest regarding self-driving cars has given new directions to localization and mapping techniques. In this survey, we give an overview of the different branches of SLAM before going into the details of specific trends that are of interest when considered with autonomous applications in mind. We first present the limits of classical approaches for autonomous driving and discuss the criteria that are essential for this kind of application. We then review the methods where the identified challenges are tackled. We mostly focus on approaches building and reusing long-term maps in various conditions (weather, season, etc.). We also go through the emerging domain of multi-vehicle SLAM and its link with self-driving cars. We survey the different paradigms of that field (centralized and distributed) and the existing solutions. Finally, we conclude by giving an overview of the various largescale experiments that have been carried out until now and discuss the remaining challenges and future orientations.
在本文中,我们在考虑最近自主驾驶的发展时提出了一个同时定位和建图的调查。人们对自动驾驶汽车越来越感兴趣为定位和建图技术提供了新的方向。在本次调查中,我们先详细介绍SLAM的不同架构分支,然后详细介绍在考虑自动化应用时感兴趣的特定趋势。我们首先提出了经典自主驾驶方法的局限性,并讨论了这种应用所必需的标准。然后,我们回顾了如何处理公认的挑战。我们主要关注建立自动驾驶地图的方法,这种地图在各种条件(天气,季节等)下长期可重复使用。我们也遍历了新兴领域中的多车SLAM及其与无人车之间的关系。我们调查了该领域的不同范式(集中式和分布式)以及现有的解决方案。最后,我们总结一下人们迄今为止进行的各种大规模实验,并讨论剩余的挑战和未来的方向。
Introduction
Simultaneous Localization And Mapping (SLAM) has been a hugely popular topic among the mobile robotics community for more than 25 years now. The success of this field is tightly bound to the fact that “solving” the SLAM problem, that is localizing a robot thanks to a map of the surroundings built incrementally, has numerous applications ranging from spatial exploration to autonomous driving. The recent spotlight put on intelligent vehicles has pushed further the research effort with the contribution of car manufacturers. One could think of GNSS (Global Navigation Satellite System) as a solution to this localization problem but it has quickly been showed that it was not sufficient by itself. Even though accuracy limits of classical GNSS solutions are lifted when perfectly positioned base stations are used (Real-Time Kinematic GNSS), availability remains an issue. Satellite signals are affected by atmospheric conditions that are difficult to predict. Moreover, the infrastructure can block the direct reception of signals and generate multipath interference or non-line-of-sight reception which has disastrous consequences on the provided localization. This kind of signal degradation is difficult to detect and usually causes a loss of integrity from which recovering can be tricky. These problems are more common in dense urban areas where tall buildings can mask satellites. On open roads, GNSS’s usually perform better.
在移动机器人领域,同步定位和建图(SLAM)已经成为25年来非常流行的话题。该领域的成功与“解决”SLAM问题紧密相关,即通过增量式构建环境地图来定位机器人,它具有从空间探索到自动驾驶等众多应用。最近聚焦于智能汽车的焦点使汽车制造商的贡献进一步推动了研究工作。
人们可以将GNSS(全球导航卫星系统)视为解决这一个定位问题的解决方案,但很快就显露出它本身的不足。即使在使用完全定位的基站(实时动态GNSS)时,经典GNSS解决方案的精度被有效提升,但可用性仍然是一个问题。卫星信号受难以预测的大气条件的影响。而且基础设施可以阻止信号的直接接收并产生多路径干扰或非视距接收,这对GNSS提供的定位具有灾难性后果。这种信号的退化很难被检测到,并且通常会导致系统丧失完整性,并且较难恢复。这些问题在密集的城市地区更为常见,高层建筑可以掩盖卫星。在开阔的道路上,GNSS通常表现更好。
Another classic approach to localization is to take advantage of the road infrastructure (road markings or roadway detection) in order to guide a vehicle in a lane. Advanced Driver-Assistance Systems (ADAS) are already integrating lane marking detection in commercialized cars. While this kind of approach mostly constrains the lateral position of a vehicle, it is sufficient for environments where the roadway is easily identifiable such as highways for instance. On the other hand, more complex environments (urban mostly, with intersections, curved roads, etc.) do not always provide enough road information to localize a vehicle. Moreover, the position accuracy needed along the longitudinal axis is more important than in straight, highway-like environments. Anyhow, redundancy is needed in order to build a safe system and ensure a consistent behavior on the road. As such, different localization means should be considered.
In a general manner, localizing a vehicle, be it in a global or a local frame, is an essential functionality to perform any other perception or planification tasks. Predicting the evolution of other obstacles on the road and choosing which maneuver is the most appropriate require to know exactly where the ego-vehicle is located and how it will evolve in the coming seconds. The SLAM framework gives an answer to this problematic while still being general enough to allow the use of any sensor or estimation technique that suits the prerequisite of estimating both the localization of the vehicle and the map at the same time. The map is of prime interest when autonomous driving is considered as a whole as it offers a first level of perception that is needed in order to make appropriate decisions.
另一种经典的定位方法是利用道路基础设施(道路标记或道路检测)来引导车道中的车辆。先进的驾驶辅助系统(ADAS)已经被集成在商用车中进行车道标记检测。虽然这种方法主要限制车辆的横向位置,但对于道路易于识别的环境(例如高速公路)来说,这已足够。另一方面,更复杂的环境(主要是城市,交叉路口,弯道等)并不总是提供足够的道路信息来定位车辆,相对于比直线,高速公路类环境,这种环境下放入纵向轴线所需的位置精度更重要。无论如何,为了建立一个安全的系统并确保道路上的一致行为,需要冗余。因此,应该考虑不同的定位手段。
一般而言,无论是在全局还是局部系统中,车辆的定位是执行任何其他感知或规划任务的基本功能。预测道路上其他障碍物的演变和选择最合适的机动动作需要准确了解车辆自身的位置以及它在未来几秒内会如何演变。 SLAM框架为这个问题提供了答案,同时仍然具有足够的通用性,允许使用任何传感器或估计技术,以适应同时估算车辆定位和地图的先决条件。当自动驾驶被认为是一个整体时,地图是最重要的,因为它提供了第一级的知觉,以便作出适当的决定。
The SLAM problem is considered as one of the keys towards truly autonomous robots, and as such is an essential aspect of self-driving cars. However, many issues are still preventing the use of SLAM algorithms with vehicles that should be able to drive for hundreds of kilometers in very different conditions. This last statement encompasses the two main problems arising when dealing with SLAM for autonomous vehicles: localization tends to drift over time and maps are not necessarily viable in every driving condition. The former problem is well-known in the SLAM community. The local and incremental positioning estimation that is given by SLAM algorithms tends to diverge from the true trajectory as the traveled distance increases. Without prior knowledge or absolute information, it thus becomes almost impossible to ensure a proper localization for several kilometers. This leads to the second problem, namely having maps that are sufficient for the localization task no matter the conditions. The mapping aspect has gained a lot of attention lately with the objective of providing the necessary information to locate a vehicle in different seasons, weathers or traffic conditions. Many solutions have been envisaged to solve these two problems like building a map with a careful selection of distinctive information with the objective of reusing it later or taking advantage of new communication systems in order to share and enhance the maps built by other road users for instance.
SLAM问题被认为是真正的自动机器人的关键之一,因此他也是是自动驾驶汽车的一个重要方面。然而,许多问题仍然阻碍了SLAM算法在车辆中的应用,并且使其能够在非常不同的条件下行驶数百公里。上句话涵盖了处理自动驾驶汽车SLAM时出现的两个主要问题:定位倾向于随时间漂移,并且地图在每种驾驶条件下都不一定可行。前一个问题在SLAM社区中是众所周知的。由SLAM算法给出的局部和增量定位估计倾向于随着行进距离的增加而偏离真实的轨迹。如果没有先验知识或绝对信息,就几乎不可能确保几公里的适当定位。这导致了第二个问题,即无论条件如何,具有足以用于定位任务的地图。建图方面最近引起了很多关注,目的是为不同季节,天气或交通条件下的车辆定位提供必要的信息。人们已经设想了许多解决方案来解决这两个问题,例如通过仔细选择独特信息来制作地图,以便稍后重复使用或利用新的通信系统来共享和增强由其他道路用户构建的地图。
SLAM is a broad field and involves many topics ranging from sensor extraction to state estimation. In this article, we propose a survey focusing on the current trends in the SLAM community regarding the emergence of the autonomous vehicle market. To be clear, throughout this paper, we will refer to SLAM as approaches that are composed of at least an odometry and a mapping module in order to cover all the techniques that are of interest for autonomous driving. We will first give a general introduction to SLAM by reviewing the most commonly used estimation techniques, discuss the different existing benchmarks and data sets and point to the relevant surveys covering aspects that are not reviewed in this article. This will be the object of Section II. Then, Section III will concern the limits of classical methods for autonomous driving and more especially the impact of the drift. The problem will first be stated and we will then focus on the solutions proposed to avoid or correct this drift and on the general reliability of the exploited information. We will end this section by discussing the criteria that should be respected for a SLAM approach to be viable for autonomous driving. Section IV will concern the techniques that tackle some of the challenges of SLAM for autonomous cars, namely building and exploiting long-term maps. A survey of the methods that take advantage of previous knowledge, coming from the SLAM algorithm itself or from another resource (Geographic Information Systems for instance), will be proposed as it is a key aspect to achieve true autonomy for self-driving cars. Section V will give an overview of multi-vehicle SLAM systems. This field offers a solution to both the problems mentioned earlier: constraining drift and enhancing maps. The different design choices of such applications will be exposed and motivated with the related state of the art. Finally, Section VI will expose the large-scale experiments that have been carried out so far with relation to the localization means used. It will serve as a basis to discuss the future orientations and the remaining challenges in Section VII.
SLAM是一个广泛的领域,涉及许多主题,从传感器提取到状态估计。在本文中,我们提出一项调查,重点关注SLAM社区目前关于自动车市场出现的趋势。清楚的是,在本文中,我们将把SLAM看为由里程计和一个建图模块组成的方法,以涵盖所有对自动驾驶感兴趣的技术。我们首先通过回顾最常用的估算技术,对不同的现有基准和数据集进行讨论,并指出本文未涉及的相关方面的相关调查,从而对SLAM进行一般性介绍。这将成为第二部分的目标。然后,第三节将关注传统自动驾驶方法的局限性,尤其是漂移的影响。首先说明问题,然后我们将重点讨论为避免或纠正这种漂移以及所利用信息的总体可靠性而提出的解决方案。在本节末尾我们将通过讨论SLAM方法应该遵守的标准以使得自动驾驶变得可行。第四节将涉及解决自动驾驶汽车SLAM所面临的一些挑战,即构建和开发长期地图。我们对利用SLAM算法本身或其他资源(例如地理信息系统)利用先前知识的方法进行调查,因为这是实现真正的自动驾驶汽车的自主性的关键方面。第五部分将概述多车SLAM系统。该领域为前面提到的问题提供了解决方案:约束漂移和增强地图。这些应用程序的不同设计选择将会被现有技术的暴露和激发其动力。最后,第六部分将揭示迄今为止所进行的大规模实验与所使用的定位手段的关系。它将作为讨论第七部分未来方向和剩余挑战的基础。
The SLAM Problem
Initiated by Smith and Cheeseman [1] in 1986, the SLAM topic got popular during the 1990s with many structuring works like [2] or [3]. Through the years, new methods have appeared using different sensors (camera, laser, radar, etc.), creating new data representations and consequently new types of maps. Similarly, various estimation techniques have emerged inside the SLAM field. A quick panorama has been made 10 years ago in [4] and [5]. Readers looking for an initiation to the global SLAM problem can also refer to [6] and [7] for a comprehensive introduction to the topic and to [8] for an extensive up-to-date review of the current challengesin SLAM.
SLAM问题由Smith和Cheeseman于1986年提出[1],SLAM话题在20世纪90年代受到了许多结构化工作的欢迎,如[2]或[3]。多年来,使用不同传感器(照相机,激光,雷达等)出现了新的方法,创建新的数据表示,从而创建新类型的地图。同样,在SLAM领域内出现了各种估算技术。 10年前[4]和[5]已经做出了快速的全景图。寻找全局SLAM问题的读者也可以参考[6]和[7],他们全面介绍该主题,另外[8]对SLAM当前的挑战进行广泛的最新评论。
SLAM methods require a wide panel of algorithms to ensure the robustness of the provided localization. As such, sensor data extraction, primitive search [9], data association [10] or map storage and updates [11] are part of the topics which concern SLAM as well. However, in this literature review, we will first focus on the main estimation methods that exist before going through the current trends in the autonomous vehicle application field.
The SLAM problem is usually formalized in a probabilistic fashion. The whole objective is to be able to estimate at the same time the state of the vehicle and the map being built. The vehicle state can be defined differently depending on the application: 2D position and orientation, 6D pose, speed, acceleration, etc. We denote
xk
x
k
the vehicle pose estimation at the time
k
k
and the map of the environment. To estimate these variables, it is possible to take advantage of what we call control inputs u k and which represent an estimation of the motion between
k−1
k
−
1
and
k
k
. They usually come from wheel encoders or any sensor able to give a first idea of the displacement. The particularity of SLAM approaches is to take
into consideration measurements coming from sensor readings and denoted . They help to build and improve the map and indirectly, to estimate the vehicle pose.
SLAM方法需要大量的算法来确保所提供的定位的稳健性。因此,传感器数据提取,基元搜索[9],数据关联[10]或地图存储和更新[11]也是涉及SLAM的主题的一部分。然而,在这篇文献综述中,我们将在遍历无人车应用领域当前趋势之前首先关注现存则主要主流方法。
SLAM问题通常以概率方式表示。整体目标是能够同时估算车辆的状态并构建地图。车辆状态可以根据应用不同而定义:2D位置和方向,6D位姿,速度,加速度等。我们使用 xk x k 表示车辆在时间 k k 的位姿,表示的环境地图。为了估计这些变量,有可能利用我们称之为控制输入 uk u k ,他可以作为 k−1 k − 1 和 k k 之间的运动的粗略估计。它们通常来自轮子编码器或任何能够首先提供位移的传感器。SLAM方法的特殊性在于采取考虑来自传感器读数的测量结果并记为。他们帮助建立和改进地图,间接地估计车辆位姿。
The SLAM problem can be formulated in two ways. In the first one, the goal is to estimate the whole trajectory of the vehicle and the map given all the control inputs and all the measurements. A graphical representation can be seen in Figure 1. This problem, known as full SLAM, computes the joint posterior over all the poses and the map based on the entirety of sensor data:
SLAM问题可以用两种方式来表达。在第一个中,目标是估计车辆的整个轨迹以及在给定所有控制输入和所有测量值的情况下的地图。图1中可以看到图形表示。这个问题称为全SLAM,它根据传感器数据的整体计算所有位姿和地图的后验分布。
图1 Graphical representation of the full SLAM problem 全SLAM问题的的图表示
The full SLAM problem can be difficult to handle in real time as the complexity of the problem grows with respect to the number of variables considered. The idea of online SLAM is to estimate the current position of the vehicle, usually based on the last sensor information. A graphical representation is shown in Figure 2. The incremental nature of the problem can be obtained using Bayes’ rule:
The estimation techniques can be separated into two main categories: filter-based approaches and optimization-based methods. The former corresponds to iterative processes that are thus suited to online SLAM and the latter regroups methods performing batch treatments and, as such, are usually applied to solve the full SLAM problem even if this trend has changed during the last ten years.
完整的SLAM问题可能难以实时处理,因为问题的复杂性与所考虑变量的数量有关。在线SLAM的想法是估计车辆的当前位置,通常基于最新的传感器信息。图2显示了图表示。问题的增量性质可以使用贝叶斯规则获得:
估算技术可以分为两大类:基于滤波器的方法和基于优化的方法。前者对应于迭代过程,因此适用于在线SLAM,后者重新组合执行批量处理的方法,并且因此通常用于解决全SLAM问题,即使在过去的十年里该趋势已经改变。
图2 Graphical representation of the online SLAM problem at time k+2 k + 2 在线SLAM问题在时间 k+2 k + 2 的图表示
Filter-based SLAM
Filter-based methods derive from Bayesian filtering and work as two-step iterative processes. In the first step, a prediction of the vehicle and the map states is made using an evolution model and the control inputs uk u k . In the second step, the current observation, zk z k , coming from sensor data, is matched against the map in order to provide a correction of the previously predicted state. The model that put in relation the observation with the map is called an observation model. These two steps iterate and so incrementally integrate sensor data to estimate the vehicle pose and the map.
基于滤波器的方法来自贝叶斯滤波器,该方法工作过程分为两步迭代过程。在第一步中,使用演化模型和控制输入 uk u k 预测车辆和地图状态。在第二步中,将来自传感器数据的当前观测值 zk z k 与地图进行匹配以便纠正之前预测的状态。将观测与地图联系起来的模型称为观测模型。这两个步骤迭代并如此递增地集成传感器数据来估算车辆位姿和地图。
Extended Kalman Filter
The first branch of the filter-based methods concerns derivatives of the Kalman Filter (KF) [12]. KFs assume that data are affected by Gaussian noises which is not necessarily true in our case. At its basic form, KFs are designed to handle linear systems and while they have great convergence properties [13][14], they are rarely used for SLAM. On the other hand, the Extended Kalman Filter (EKF) [15] is a common tool in non-linear filtering and as such in SLAM. The EKF adds a linearization step for non-linear models. The linearization is performed around the current estimate by a first order Taylor expansion. The optimality of the EKF has been demonstrated as long as the linearization is made around the true value of the state vector. In practice, it is the value to estimate and is thus not available. This can cause consistency issues: the true value can be outside of the estimated uncertainty。
基于滤波器的方法的第一个分支涉及卡尔曼滤波器(KF)的推导[12]。 KF假定数据受到高斯噪声的影响,这在我们的例子中不一定是真实的。在其基本形式上,KF被设计用于处理线性系统,虽然它们具有很好的收敛性[13] [14],但它们很少用于SLAM。另一方面,扩展卡尔曼滤波器(EKF)[15]是SLAM中非线性滤波的常用工具。 EKF为非线性模型增加了一个线性化步骤。通过一阶泰勒展开在当前估计周围执行线性化。只要线性化是在状态向量的真值附近进行的,就已经证明了EKF的最优性。在实践中,这是估计的价值,因此不可用。这可能导致一致性问题:真实值可能超出估计的不确定性
However, estimates are most of the time sufficiently close to the truth to allow the use of the EKF. Sensors like laser scanners for instance, that provide a range information, are particularly adapted [18][19]. Sonars were among the first EKF SLAM approaches for underwater applications [20][21]. Both sensors have been combined in an EKF SLAM approach in [22]. A coupling of vision and laser has also been proposed in [23]. Monocular approaches have also largely been studied. In [24], the landmarks composing the map are inserted in the EKF only when sufficiently accurate. In [25], a specific landmark parametrization is proposed. In [26], the authors studied the impact of the Kalman gain on an update in order to constrain linearization errors.
然而,估计大部分时间都足够接近事实,以允许使用EKF。 像激光扫描仪这样的传感器可以提供范围信息,特别适合于[18] [19]。 声纳是首批用于水下应用的EKF SLAM方法[20] [21]。 [22]中两种传感器都采用了EKF SLAM方法。 [23]中也提出了视觉和激光的耦合。单目视觉方法也在很大程度上被研究。 在[24]中,只有在足够准确的情况下,组成地图的地标才会插入到EKF中。 在[25]中,提出了特定的路标点参数化。 在[26]中,作者研究了卡尔曼增益对更新的影响,以限制线性化误差。
The continuously growing map size makes the EKF unable to support large-scale SLAM as the update time depends, in a quadratic way, on the size of the state vector. To overcome this issue, the notion of submaps has been created. Each time a map becomes too large (various criteria can be used to decide so), a new blank map replaces the old one. A higher-level map keeps track of the links between the submaps not to lose information. Among the first submap-based approaches, we can cite [27] with the Constrained Relative Submap Filter where submaps are decorrelated from one another but where loop closure (drift correction based on the recognition of a previously visited place) is difficult to perform. The constant-time SLAM [28] and the Network Coupled Features Maps [29] work in a similar fashion except that landmarks common between submaps are kept to ease the change from one to another but these methods ignore correlated data. The Atlas framework [30] takes advantage of a graph structure where nodes are submaps and vertices the transformation between two submaps. Loop closures can only be applied offline. Estrada et al. solve this problem by maintaining two high-level maps [31] but still use landmarks in multiple submaps. Conditionally independent submaps were proposed in [32] as a solution to this issue. The idea is to marginalize the information that is not common to two submaps in order to make them independent given the common part. A different approach is chosen in [33]. A divide and conquer method is proposed to join the local maps created so as to recover an exact global map. New criteria to decide when to create submaps have also been proposed like the number of simultaneously observable landmarks in [34] or the correlation between landmarks in [35]. An alternative to submaps, the Compressed EKF SLAM, has been presented in [19]. In this work, the state vector is divided into an active (and updated) part and another one which is compressed into a light auxiliary coefficient matrix.
不断增长的地图大小使得EKF无法支持大规模SLAM,因为更新时间以二次方式取决于状态向量的大小。为了克服这个问题,创建了子图的概念。每次地图变得太大(可以使用各种标准来决定),新的空白地图将取代旧地图。更高级别的地图会跟踪子地图之间的链接而不会丢失信息。在第一种基于子图的方法中,我们可以引用[27]与相对约束子图滤波器,其中子图相互解相关,但是闭环(基于识别先前访问的地点的漂移校正)难以执行。恒定时间SLAM [28]和网络耦合特征图[29]以相似的方式工作,除了保持子图映射之间共同的界标以缓解从一个到另一个的变化,但这些方法忽略相关数据。Atlas框架[30]利用了图结构,其中子图和两个子图之间的转换作为节点。闭环检测只能离线应用。埃斯特拉达等人通过维护两个高级地图来解决这个问题[31],但仍然在多个子地图中使用地标。 [32]中提出了条件独立的子图,作为解决这个问题的方法。这个想法是边缘化两个子图中不常见的信息,以使它们在公共部分独立。 [33]中选择了不同的方法。提出一种分而治之的方法来加入所创建的局部地图,以恢复一个精确的全球地图。文章还提出了决定何时创建子地图的新标准,如[34]中实时地可观察的地标数量或[35]中地标间的相关性。压缩EKF SLAM的替代方案已在[19]中提出。在这项工作中,状态矢量被分成一个活动(和更新)的部分,另一个被压缩成一个轻的辅助系数矩阵。
Unscented Kalman Filter。
To compensate the weaknesses of the EKF with highly non-linear systems, Julier et al. introduced the Unscented Kalman Filter (UKF) [36] which avoids the computation of the Jacobians. The idea is to sample particles, called sigma points, which are pondered around the expected value thanks to a likelihood function. These sigma points are then passed to the non-linear function and the estimate is recomputed. The major drawback of this method is its computational time. Most of the works using the UKF took place at the beginning of the 2000s [37][38]. A real-time application to a monocular context has been demonstrated in [39].
为了补偿高度非线性系统的EKF的弱点,Julier等人 引入了Unscented卡尔曼滤波器(UKF)[36],避免了雅可比矩阵的计算。这个想法是对粒子(称为sigma点)进行采样,这些点是由于似然函数而围绕期望值进行权衡的。然后将这些sigma点传递给非线性函数,并重新计算估计值。 这种方法的主要缺点是计算时间。 大部分使用UKF的作品都是在2000年代初期[37] [38],[39]给出了对单目环境的实时应用。
Information Filter
Another variant of the Kalman Filter is the Information Filter (IF) [40] which is the inverse form of the Kalman Filter. Its particularity is to define the information matrix as the inverse of the covariance matrix. One main advantage is that the update step becomes additive and is not dependent on the order in which the observations are integrated [41]. It is also possible to make the information matrix sparser by breaking the weak links between data [42] which ensures a near constant-time update [43]. The IF is not as popular as the EKF in mono-vehicle SLAM despite some applications in [44][45][46] because it is necessary to convert every measure in its inverse form which can be costly. However, the IF has been more exploited for multi-vehicle SLAM (see Section V).
卡尔曼滤波器的另一个变体是信息滤波器(IF)[40],它是卡尔曼滤波器的逆向形式。 其特殊性是将信息矩阵定义为协方差矩阵的逆矩阵。一个主要的优点是更新步骤变成可加的,并且不依赖于观测被整合的顺序[41]。通过打破数据之间的弱连接[42],使信息矩阵变得更稀疏,从而确保近乎恒定的时间更新[43]。 虽然在[44] [45] [46]中有一些应用,但IF在单车SLAM中并不像EKF那样受欢迎,因为它必须转换每个测量到反算形式,这样成本较高。然而,IF更多地被用于多车SLAM(见第五部分)。
Particle Filter
The second major branch in filtering SLAM algorithms is based on Particle Filters (PF). Their principle is the following: the state is sampled with a set of particles according to its probability density. Then, as with every filter, a prediction of the displacement of each particle is accomplished and an update, depending on the observation, is performed. In the update phase, particles are weighted according to their likelihood regarding the measures. The most likely particles are kept, the others are eliminated and new ones are generated [3]. The direct application of this method to the SLAM is difficult to tract because it requires a set of particles per landmark. Variations of PFs have then appeared, like the Distributed Particle approaches DP-SLAM [47] and DP-SLAM 2.0 [48] which proposed to use a minimal ancestry tree as a data storage structure. It enables fast updates by guiding the PF while reducing the number of iterations of the latter. However, the most famous PF algorithm is FastSLAM [49] which has been greatly influenced by previous works [50][51] on the subject. Each landmark is estimated with an EKF and particles are only used for the trajectory. FastSLAM has been applied in real-time in [52]. The major advantage of PFs is that they do not require a Gaussian noise assumption and can accommodate with any distribution. Nevertheless, PFs also suffer from long-term inconsistency [53]. In [54], this problem was tackled by combining FastSLAM to an IF but the result is computationally heavy. In [55], FastSLAM was applied to laser data where the matching gives an odometry measure which is then used to weight particles in the resampling phase. Still regarding FastSLAM, its Bayesian foundations were extended to the Transferable Belief Model framework (TBM) in [56] thus allowing the representation of conflict in the employed grid map.
滤波SLAM算法的第二个主要分支是基于粒子滤波器(PF)的方法。它们的原理如下:根据概率密度,用一组粒子对状态进行采样。然后,如每个滤波器都会完成每个颗粒位移的预测,并根据观察结果执行更新。在更新阶段,粒子根据它们在度量上的可能性进行加权。最可能的粒子被保留下来,其他粒子被淘汰并产生新的粒子[3]。这种方法直接应用于SLAM很难实现,因为它需要每个地标的一组粒子。随后出现了PF的变化,如分布式粒子方法DP-SLAM [47]和DP-SLAM 2.0 [48],它们提出使用最小的祖先树作为数据存储结构。它通过引导PF来实现快速更新,同时减少后者的迭代次数。然而,最着名的PF算法是FastSLAM [49],它受到以前着作[50] [51]对该主题的极大影响。每个界标都用EKF估算,粒子仅用于轨迹。 FastSLAM已在[52]中实时应用。 PFs的主要优点是它们不需要高斯噪声假设,并且可以适应任何分布。尽管如此,PFs也存在长期的不一致性[53]。在[54]中,通过将FastSLAM与IF相结合解决了这个问题,但结果是计算量大。在[55]中,FastSLAM被应用于激光数据,其中匹配给出里程计测量,然后用该测量值在重采样阶段对粒子进行加权。仍然关于FastSLAM,其贝叶斯基础被扩展到[56]中的可转移信念模型框架(TBM),因此允许在所采用的网格图中表示冲突。
Filter-based approaches tend to now rely on 3D points when it comes to vision sensors [24] and 2D occupancy grids with laser data. The latter, introduced in [57] and [58], are particularly suited to SLAM as the discretization of the space due to the grid itself allows for a finite number of position candidates to test during the update step. Landmark uncertainties are represented by the occupancy probability of the cell which makes updates on map parts possible. During the update step, the classical approach is to maximize the similarity between the measurement and the map like in [59] in the Bayes formalism or in [60] in the TBM context. Not mentioned yet, the use of RADARs for SLAM has been demonstrated with filter-based approaches in [61][62][63]. However, their use remains limited in large-scale experiments due to the noisy nature of the signal. They are usually employed for obstacle detection.
当涉及视觉传感器[24]和2D激光数据占用网格时,基于滤波器的方法往往依赖于3D点。 后者在[57]和[58]中介绍,特别适用于SLAM作为离散化。由于网格本身的空间允许有限数量的候选位置,在更新步骤期间进行测试。地标不确定性由单元的占用概率表示,这使得地图部分上的更新成为可能。在更新步骤中,经典方法是最大化测量和地图之间的相似性,如贝叶斯形式中的[59]或TBM上下文中的[60]。另外,在[61][62][63]中展示基于滤波器的方法的SLAM应用。 然而,由于信号的噪声性质,它们的使用在大规模实验中仍然受到限制,它们通常用于障碍物检测。
Optimization-based SLAM
Optimization-oriented SLAM approaches generally consist of two subsystems, as in filter-based SLAM. The first one identifies the constraints of the problem based on sensor data by finding a correspondence between new observations and the map. The second subsystem computes or refines the vehicle pose (and past poses) and the map given the constraints so as to have a coherent whole. As for filters, we can divide these methods into two main branches: bundle adjustment and graph SLAM.
与基于过滤器的SLAM类似,面向优化的SLAM方法通常由两个子系统组成。第一个子系统通过寻找新观测数据与地图之间的对应关系来识别基于传感器数据的问题约束。 第二个子系统计算或改进车辆位姿(和过去的位姿)和给定约束的地图,以便具有连贯的整体。与过滤器方法类似,我们可以将这些方法分成两个主要分支:光束平差BA和图SLAM。
Bundle Adjustment
Bundle adjustment is a vision technique that jointly optimizes a 3D structure and the camera parameters (pose). Most of the early works focused on 3D reconstruction [64] but it has since then been applied to SLAM. The main idea is to optimize, usually using the Levenberg-Marquardt algorithm [65], an objective function. The latter minimizes the reprojection error (distance between observations in the image and reprojected past features) giving the best camera and landmark positions. However, the core bundle adjustment algorithm can be computationally heavy as it considers all the variables at once to optimize over. Since then, many approaches have been proposed to perform local optimizations. In [66][67], a hierarchical method working on smaller chunks is presented. The partial 3D models obtained are then merged in a hierarchical fashion. To reduce the complexity, two virtual key frames are chosen or created among all the frames to represent a given portion. This lessens the number of variables to optimize. A similar method has been applied to autonomous driving in [68] with accurate localization results but an offline map building.
BA是联合优化3D结构和相机参数(位姿)的视觉方法。早期的大部分工作都集中在三维重建上[64],但之后它已应用于SLAM。主要思想是通常使用Levenberg-Marquardt算法[65]优化目标函数。将重投影误差(图像中的观测值与重新投影的特征点之间的距离)最小化,以提供最佳的相机和地标位置。然而,核心BA算法在计算复杂度较高,因为它一次考虑所有变量来优化。从那时起,已经提出了许多方法来执行局部优化。在[66][67]中提出了一种处理较小块的分层方法。然后将获得的部分3D模型以分层方式合并。为了降低复杂度,在所有帧中选择或创建两个虚拟关键帧来表示给定部分。这减少了要优化的变量的数量。[68]中给出了类似的方法,已被应用于自动驾驶[68],具有准确的定位结果,但是离线构建地图。
In [69], an incremental method optimizing only over the new information is proposed. In the worst case, it is equivalent to a full bundle adjustment. In [70], the authors use a sliding window over key frames triplets to locally minimize the reprojection errors. Points common between two views are considered in the optimization phase. The principle of a triplet of images is common in the bundle adjustment community. As with every optimization technique, bundle adjustment works well when a good rough estimate is given. In that sense it is important to filter outliers. Nist ́er et al. [71] proposed a selection method based on a preemptive RANSAC [72]. Even though it allows for a real-time application (13 Hz), results are less accurate than in [68]. A local bundle adjustment has been proposed in [11]. The objective is to locally optimize the last n camera positions based on the 2D reprojections of the points in the last N frames (N ≥ n). These parameters can be tuned to obtain a real-time approach with good results. An integration of inertial measures has been proposed in [73]. A bi-objective function (vision and inertial objective functions) is weighed with coefficients set thanks to a machine learning approach. Another alternative is to optimize only a set of frames in order to solve loop closing problems. This is the idea presented in [74] where the authors keep skeleton frames and marginalize out most of the features and frames in the process. This method is very close to graph SLAM techniques.
在[69]中,提出了一种仅对新信息进行优化的递增方法。在最坏的情况下,它相当于一个完整的BA。在[70]中,作者在关键帧三元组上使用滑动窗口来局部最小化重投影误差。优化阶段会考虑两个视图之间共同的点。图像三元组的原理在BA社区中很常见。如同每种优化技术一样,当给出良好的粗略估计时,BA效果良好。从这个意义上说,过滤离群值是很重要的。 Nist er et al. [71]提出了一种基于抢先式RANSAC的选择方法[72]。即使它允许实时应用(13 Hz),结果也不如[68]中的准确。在[11]中提出了一种局部BA,目标是基于最后N个帧中的点的2D重投影(N≥n)局部优化最后n个摄像机位置。可以调整这些参数以获得具有良好结果的实时方法。惯性测量的积分已经在[73]中提出,由于采用了机器学习方法,系数设定了双目标函数(视觉和惯性目标函数)。另一种选择是仅优化一组帧以解决闭环问题。这是[74]中提出的思想,作者保留骨骼框架并边缘化该过程中的大部分特征和框架。这种方法非常接近图SLAM方法。
Graph SLAM
The graphical representation of Bayesian SLAM (see Figure 2) is particularly well adapted to a resolution via optimization methods. An example, coming from [75], is shown in Figure 3. Based on this graphical representation, a matrix describing the relationships between landmarks and vehicle poses can easily be built and used in an optimization framework.
Many localization problems can be modeled using a graph representation and solved by finding the minimum of a cost function that follows this form:
Fig. 3: Transition graph and associated constraints 图3 图转移和关联约束
贝叶斯SLAM(参见图2)的图表示特别适合通过优化方法求解。 图3显示了一个来自[75]的例子。基于这种图形表示,描述地标和车辆位姿之间关系的矩阵可以很容易地建立并用于优化框架。许多定位化问题可以使用图表示来建模,并通过查找以下形式的成本函数的最小值来解决:
where x x is the vector containing the different vehicle poses, m is the map, e ij is the error function which computes the distance between the prediction and the observation and Ω ij is the associated information matrix.
其中是包含不同车辆位姿的矢量, m m 是地图,是计算预测与观测之间的距离的误差函数, Ωij Ω i j 是关联的信息矩阵。
The optimal values
x∗
x
∗
and
m∗
m
∗
can be obtained through optimization:
The minimization of the, in general, non-linear function F(x,m) F ( x , m ) is usually simplified by a local approximation using popular methods such as Gauss-Newton, Levenberg- Marquardt, Gauss-Seidel relaxation or gradient descent. These methods can either work by optimizing the whole trajectory or with small displacement increments for a real-time use. Sim- ilarly to filtering approaches, the success of the minimization depends on the initialization.
一般而言,非线性函数 F(x,m) F ( x , m ) 的最小化通常通过使用流行方法如Gauss-Newton,Levenberg-Marquardt,Gauss-Seidel弛豫或梯度下降的局部近似来简化。这些方法可以通过优化整个轨迹或使用小的位移增量进行实时使用。 类似于过滤方法,最小化的成功取决于初始化。
The TORO algorithm [76] applies a stochastic gradient descent variant with a novel node parametrization in the graph. This parametrization takes the form of a tree structure that defines and updates local regions at each iteration. A different idea is to consider not a Euclidean space but a manifold. It is the basis of the algorithm HOG-MAN [77] where a hierarchical optimization on the manifold is proposed. The lowest level represents the original data while the highest levels capture the structural information of the environment. In g 2 o [78], a similar representation is adopted. g2o uses the structure of the Hessian matrix to reduce the complexity of the system to optimize. COP-SLAM [79] optimizes a pose graph. The latter considers the displacement and the associated uncertainty to build a chain of poses. In a different approach, TreeMap [80] exploits a tree structure for the map and makes topological groups of landmarks so as to make the information matrix sparser and so quicken the processing (O (log n) for n landmarks). Even though not exploiting a graph structure, iSAM (incremental Smoothing And Mapping) [81][82] sim- plifies the information matrix to speed up the underlying optimization. Here, the objective is the QR factorization of this sparse information matrix.
TORO算法[76]在图中应用随机梯度下降变体(新颖的节点参数化)。该参数化采用树结构的形式,在每次迭代时定义和更新局部区域。不同的想法是不考虑欧几里得空间,而是在流形空间(manifold)。它是HOG-MAN算法的基础[77],其中提出了对流形的层次优化。最低级代表原始数据,最高级代表环境的结构信息。在g2o [78]中,采用了类似的表示法。 g2o使用Hessian矩阵的结构来降低系统的优化复杂度。 COP-SLAM [79]优化位姿图。后者考虑位移和相关的不确定性来建立位姿链表。在另一种方法中,TreeMap[80]利用地图的树结构并制作拓扑组的地标,以便使信息矩阵更加稀疏并加快处理(n个地标的O(log n))。即使不利用图结构,iSAM(增量平滑和映射)[81][82]简化了信息矩阵以加速底层优化,目标是这个稀疏信息矩阵的QR分解。
The comparison of filter-based techniques and optimization approaches for a SLAM application is difficult as they are usually considered in different scopes. This comparison effort has been proposed in [83] and then extended in [84] for monocular approaches. The outcome is that optimization tends to give better results than filters which are more subject to linearization issues. However, the authors conclude that “filter-based SLAM frameworks might be beneficial if a small processing budget is available, but that BA optimization is superior elsewhere that with limited resources” [83].
SLAM应用中基于滤波器的技术和优化方法的比较很困难,因为它们通常被认为是在不同的范围内。 这种比较工作已经在[83]中提出,然后在[84]中被推广用于单目系统。 结果是,与线性化问题更为接近的滤波器相比,优化会产生更好的结果。 然而,作者得出结论:“如果只有一个小的处理预算可用,基于滤波器的SLAM框架可能是有益的,但是BA优化在资源有限的其他地方更为优越”[83]。
Relevant surveys and existing data sets
As SLAM is a central topic of both mobile robotics and now autonomous driving, it continues to draw the attention of many researchers. In this section, we direct the readers towards recent survey articles that cover aspects not treated here. We also describe the different existing data sets that serve to benchmark current approaches.
由于SLAM是移动机器人和现在自动驾驶的核心话题,因此它一直引起许多研究人员的关注。 在本节中,我们将引导读者阅读最近的调查文章,其中涵盖了未在此处处理的方面。 我们还描述了用于基准当前方法的不同现有数据集。
As mentioned before, a good introduction to the field is given in [4] and [5] with an overview of the different aspects involved by SLAM. In [85], a brief survey of the most common filter-based estimation techniques is given with a list of the pros and cons of each main paradigm. An interesting take on the SLAM is offered by Dissanayake et al. in [86], and extended in [87], where the observability of SLAM, its convergence properties, its consistency as well as its com- putational efficiency are discussed. In [7], three estimation techniques are reviewed: EKF-SLAM, PF-SLAM and graph- based SLAM. Recent SLAM works that use vision are also reviewed. A specific focus is given to indoor SLAM and RGB-D cameras. There is a strong interest of the community towards vision-based SLAM due to the cost of the sensor and its information richness. Some of the works are reviewed in [88]. In [89] and [90], the authors cover the topic of Visual Odometry (VO) where the focus is on the localization (and the trajectory) and not on the map. Visual SLAM is also at the heart of [91] where feature selection, matching and map representation are addressed. Recently, Ros et al., in [92], present the challenges of Visual SLAM for driverless cars: building long-life maps, how to share maps between vehicles and the necessity to work on high-level features to ease recognition. A very complete survey on visual place recognition has also been published lately [93]. The authors go through the different modules that are essential in this field: image processing (descriptors, etc.), maps (representation) and the estimation part called belief generation. While this survey will also explore place recognition as it is an important aspect of SLAM, we will focus on its application to autonomous vehicle and on the maturity of existing approaches. Already mentioned before is the considerable work of Cadena et al. in [8] where the SLAM topic is reviewed as a whole. Some aspects, not necessarily applicable yet in autonomous driving, will not be covered here (active SLAM, for instance). Once again, we will insist on the key topics with regards to self- driving cars and the experimental results of current state- of-the-art approaches. Finally, in [94], the authors review multiple-robot SLAM approaches with a description of the underlying mathematical formulation. In our dedicated section, we will complete this survey with cloud-based approaches and also give some insights about the current results in this field and the expected future directions in autonomous driving.
如前所述,在[4]和[5]中给出了该领域的一个很好的介绍,概述了SLAM涉及的不同方面。在[85]中,对最常用的基于滤波器的估计技术进行了简要的调查,并列出了每个主要范例的优缺点。 Dissanayake等人提供了一个有趣的SLAM。在[86]中和在[87]中进行了扩展,其中讨论了SLAM的可观测性,收敛性,一致性以及其计算效率。在[7]中回顾了三种估计方法:EKF-SLAM,PF-SLAM和基于图形的SLAM。其中对最近使用视觉的SLAM方法也进行了考察。特别关注室内SLAM和RGB-D相机。由于传感器的成本及其丰富的信息,社区对基于视觉的SLAM产生了强烈的兴趣。有些方法在[88]中进行了评论。在[89]和[90]中,作者讲述了视觉里程计(VO)的主题,重点在于定位(和轨迹),而不是在地图上。视觉SLAM也是[91]的核心,其中特征选择,匹配和地图表示被解决。最近,Ros等人在[92]中提出了无人驾驶汽车的视觉SLAM的挑战:构建长寿命地图,如何在车辆之间共享地图,以及如何处理高级特征以减轻识别。最近还发表了一个关于视觉地点识别的非常完整的调查[93]。作者通过了这个领域中不同的模块:图像处理(描述符等),地图(表示)和估计部分,称为信念生成。虽然这项调查还将探索场所识别,因为这是SLAM的一个重要方面,我们将重点关注其应用于自动驾驶汽车和现有方法的成熟度。之前已经提到了Cadena等人的大量工作。在[8]中SLAM话题作为一个整体进行评估。一些方面,不一定适用于自动驾驶,这里不会涉及(例如,主动SLAM)。我们将再次坚持关于自动驾驶汽车的关键议题以及当前最先进方法的实验结果。最后,在[94]中,作者回顾了多机器人SLAM方法并描述了基础数学公式。在我们的专用章节中,我们将使用基于云的方法完成此调查,并对该领域当前的结果以及未来自动驾驶方向提供一些见解。
A strong effort has been made lately to provide data sets which can serve to benchmark the different SLAM algorithms. The Rawseeds project was among the first to propose a benchmarking tool [95] with outdoor and indoor data sets with ground truth. In a similar fashion, the New College data set [96] is a 2.2 km trajectory in outdoor environment with cameras and lasers. The most famous is the KITTI database [97] with various data sets acquired from a car in urban and peri-urban environments. Its popularity is linked to the variety of the data sets (the vehicle is equipped with stereo cameras and a 3D laser scanner) as well as to the evaluation possibility offered by the website. It is the main tool to evaluate the positioning accuracy (translation and rotation errors) of SLAM algorithms at the moment. Table I offers a summary of the different available data sets based on the comparison table proposed by [98].
We will now present the limits of SLAM approaches with regard to autonomous driving by analyzing the top-ranked methods on KITTI and then discuss how these limits can be mitigated.
最近已经做出了很大的努力来提供可以用来对不同的SLAM算法进行基准测试的数据集。 Rawseeds项目是首批提出基准工具[95]的室内和室外数据集之一。以类似的方式,新学院的数据集[96]是在室外环境中用相机和激光器的2.2公里轨道。最著名的是KITTI数据库[97],它具有从城市和城郊环境中的汽车中获取的各种数据集。其受欢迎度与各种数据集(车辆配备立体相机和3D激光扫描仪)以及网站提供的评估可能性有关。这是目前评估SLAM算法定位精度(平移和旋转误差)的主要工具。表I根据[98]提出的对照表提供了不同可用数据集的摘要。
现在我们将通过分析KITTI排名靠前的方法,介绍SLAM方法在自动驾驶方面的局限性,然后讨论如何减轻这些限制。
TABLE I: Summary of existing open data sets (GT stands for ground truth) based on [98]. Laser information includes field-of-view/horizontal resolution/vertical resolution. Ticks indicate the presence of a sensor. Crosses indicate its absence.
现有的开源数据集的总结(GT代表真值),基于[98],激光信息包括视场角、水平分辨率、竖直分辨率,对号表示使用该传感器,×表示没有使用。
Llimits Of Classical Approaches For Autonomous Driving
Problem identification
The KITTI database provides an interesting vector to evaluate the current state of odometry/SLAM algorithms for autonomous driving as the furnished data sets were carried out with an equipped vehicle mostly on urban and peri-urban roads, where these kinds of approaches are expected to be the most efficient. The top-ranked localization method on KITTI’s odometry data set is V-LOAM [105] which combines visual odometry for motion estimation with a slower 3D laser odometry. The 3D laser is used to create a depth map of the visual features and also to extract geometric feature points (edge points and planar points) that are matched with the previous 3D scan. The obtained transformation serves for the initialization of an ICP matching with the map. This approach is able to reach a 0.68 % translation error and a 0.0016 deg/m rotation error on the KITTI benchmark. The best stereo approach on KITTI is a visual odometry named SOFT[106]. This algorithm is based on a careful feature selection and tracking. Corner-like features are extracted in both images. Correspondences are sought through SAD (Sum of Absolute Differences) over small windows. These features must be matched in two consecutive images to be viable. A Normalized Cross Correlation (NCC) test is performed to remove the last outliers. The frame-to-frame motion is estimated separately for rotation and translation with different algorithms. A similar focus is chosen in [107]. An outlier removal scheme, based on an analysis of landmark reprojection errors, serves to define a criterion to reject outliers. Another interesting approach is described in [108]. The authors use state-of-the-art monocular techniques but applied to a stereo system. This again implies a strong focus on inliers identification and tracking. The performance of these visual methods reaches around 1 % for the translation error and a rotation error below 0.003 deg.m −1 .
KITTI数据库提供了一个有趣的向量来评估用于自动驾驶的odometry/SLAM算法的当前状态,因为所配备的数据集主要在城市和城郊道路上进行,预计这些方法将是最有效的。 KITTI的测距数据集中排名第一的定位方法是V-LOAM [105],该方法将运动估计的视觉里程计与较慢的三维激光里程计相结合。3D激光用于创建视觉特征的深度图,并提取与之前3D扫描相匹配的几何特征点(边缘点和平面点)。获得的变换用于初始化与地图匹配的ICP。这种方法能够在KITTI基准测试中达到0.68%的平移误差和0.0016deg/m的旋转误差。双目系统中在KITTI测试结果最好的方法是名为SOFT[106]的视觉里程计。该算法基于仔细的特征选择和跟踪。在两幅图像中提取角点特征。通过小窗口SAD寻找匹配。这些功能必须在两个连续的图像中匹配才能生效。执行归一化互相关(NCC)测试以去除最后的异常值。帧与帧之间的运动是通过不同的算法分别估算旋转和平移的。[107]中选择了类似的焦点。基于分析地标(landmark)对应重投影误差的异常值,来定义拒绝异常值的标准。另一个有趣的方法在[108]中描述。作者使用最先进的单目技术,但应用于双目系统。这再次意味着强烈关注内点的识别和跟踪。这些可视化方法的性能达到了1%左右的平移误差和0.003deg/m以下的旋转误差。
These four approaches are among the better ranked on KITTI. Even though the results are impressive, the drift that affects them makes their use over long distances impossible. For example, a 1 % translation drift alone (without rotation error) implies that after 100 meters, the vehicle is one meter away from its real, absolute localization. Ways to constrain, correct or avoid this long-term drift must thus be taken into consideration. The large-scale environments involved by autonomous driving are an important concern as a decimetric accuracy must be maintained during several kilometers (a twenty-centimeter accuracy is often recommended [109]). It is thus of vital importance to find a solution, be it under the form of absolute constraints or by studying how the drift occurs and can be modeled.
It clearly appears that the non-linearity of the models used in SLAM algorithms (vehicle models and observation models), which is for the major part induced by rotations (see Figure 4 for an example), has a strong impact on the divergence of a localization. However, it is important to mention that convergence properties have been demonstrated in a linear case [14] for SLAM, the use of Kalman Filters in this field being a good example.
这四种方法在KITTI排名中名列前茅。即使结果令人印象深刻,影响它们的漂移也使得它们无法在长距离使用。例如,单独1%的平移漂移(没有旋转误差)意味着在100米后,车辆距离其真实的绝对定位一米。因此必须考虑限制这种误差,纠正或避免这种长期漂移的方法。自动驾驶所涉及的大规模环境是一个重要的问题,因为在几公里范围内必须保持分米精度(通常建议精度为20厘米[109])。因此,寻找解决方案至关重要,无论是以绝对约束的形式还是通过研究漂移如何发生并且可以建模。
很明显,用于SLAM算法(车辆模型和观测模型)中的模型的非线性(主要由旋转引起(参见图4作为示例))对定位的发散具有强烈的影响。然而,重要的是收敛性质已经在SLAM的线性情况下被证明[14],在这个领域中使用卡尔曼滤波器就是一个很好的例子。
Fig. 4: Impact of an angular drift. In green, the ground truth of trajectory is picture. Below, example of an angular drift (orange arrow) occurring at different times on the estimation of the position.
图4 角度漂移的影响,在途中的轨迹真值为绿色,橘色的为角度漂移在发生时间不同的情况下估计的位置的例子。
Regarding filtering methods, the divergence problem of the EKF SLAM has been stated numerous times. In [17], the
authors show that the filter tends to give biased estimates when the linearization is performed far from the true value. It means that the uncertainty associated to the state (vehicle or map) is too small: the estimate is said to be inconsistent, the true value being out of the estimated uncertainty. The localization drifts progressively, the filter calculating a biased vehicle pose and map [110]. Non-linear models are not the only source of drift. Julier et al. in [16] add that the ignored correlation between data also provokes the divergence. Kalman Filters produce optimistic results, which means that they consider that a piece of data coming from a sensor is entirely new and so temporally decorrelated from the rest (white noise assumption) which is not the case and so contributes to the inconsistency.
The problems raised in this last paragraph also apply to optimization methods when they work in a local environment, which is a necessary condition for real-time estimation. The previously cited optimization-SLAM works also clearly show the importance of the rotation in that drift. It has been observed by multiple authors [111]. It has also been shown experimentally in [112] and [113]. In [114], the authors show that the 2D formulation of the SLAM problem is not convex and as such has local minima in which the optimization can be trapped. However, without including the rotation error, the problem is close to a quadratic function and thus becomes convex.
Many practical causes also affect the behavior of SLAM algorithms. For instance, the presence of dynamic obstacles when a static world assumption is a prerequisite is a factor. A low number of landmarks or the absence of sufficiently salient features in the scene can also induce wrong associations and so a divergence of the system. Of course, the accuracy of the models used with regard to noisy real sensor data is another factor that can have a strong impact on the integrity of the computed localization. Most of the works directly addressing this inconsistency problem tend to find ways to locally avoid or reduce the drift.
关于过滤方法,EKF SLAM的分歧问题已多次提及。在文献[17]中,作者表明,当线性化远离真值时,滤波器往往会给出有偏估计。这意味着与状态(车辆或地图)相关的不确定性太小:估计被认为是不一致的,真实值超出估计的不确定性。定位逐渐漂移,滤波器计算得到有偏的车辆位姿和地图[110]。非线性模型并不是漂移的唯一来源。Julier等人在[16]中补充说,数据之间被忽略的相关性也会引起分歧。卡尔曼滤波器产生乐观的结果,这意味着他们认为来自传感器的一段数据是全新的,并且与其余的时间解相关(白噪声假设),实际上却不是这种情况,因此导致不一致。
在上一段中提出的问题也适用于局部环境中工作的优化方法,这是实时估算的必要条件。先前引用的优化SLAM工作也清楚地表明了旋转在该漂移中的重要性。多位作者已经观察到[111]。它在[112]和[113]中也有实验证明。在[114]中,作者证明SLAM问题的二维表达式不是凸函数,因此导致优化陷入局部最小值。但是,如果不包括旋转误差,则该问题接近二次函数并因此变凸。
许多实际因素也会影响SLAM算法的行为。例如,当一个静态的世界假设为先决条件时,动态障碍的存在是一个因素。地标数量较少或场景中缺乏足够突出的特征也会导致错误的关联,从而导致系统的发散。当然,与有噪声的实际传感器数据相关的模型的准确性是另一个可能对计算定位的完整性产生强烈影响的因素。直接解决这个不一致问题的大多数工作都倾向于找到方法来局部地避免或减少漂移。
Avoiding or reducing the impact of drift
The first possibility to avoid, or at least reduce, the divergence is to divide the map in submaps [31]. These approaches, presented before, break the global map in submaps where each one has its own reference frame. By doing so, the drift can be avoided at a local level by allowing only short trajectories per submap. The size of a submap must thus be calibrated and becomes a compromise between reducing the drift and dealing with the induced information loss. However, even with an appropriate size (see [35] for instance), there is no guarantee that the localization consistency will be preserved. Indeed, each measure contributes to make the system inconsistent and this phenomenon can occur even with a few pieces of data. Moreover, even if the divergence is avoided in the submaps, it is not the case in the global map that keeps track of how the submaps are connected. It is thus more of a local solution to the problem.
In a similar fashion, robot-centered approaches greatly reduce the divergence [115]. Instead of having a fixed world frame, the estimates are always given with relation to the vehicle position. Inconsistency problems are less frequent because they do not accumulate like it is usually the case. The estimation in this frame allows for a better consistency than in classical EKF SLAM [116]. Nevertheless, as with submaps, the divergence is not entirely resolved as landmarks can diverge with only several measures. A few SLAM approaches have used a robot-centered frame with success [117][118]. Landmarks are forgotten as soon as they are not visible which relates these methods to visual odometry and is as such totally appropriate.
避免或至少减少分歧的第一种可能性是将地图划分为子图[31]。前面介绍的这些方法将全局地图分解成子地图,其中每个地图都有自己的参考坐标系。这样可以通过每个子图中的只保留很短的轨迹来避免局部的漂移。因此子地图的大小需要校准,该大小会成为减少漂移和处理诱导信息丢失之间的折中。但是,即使具有适当的大小(例如参见[35]),也不能保证局部一致性将被保留。事实上,每一项措施都有助于使系统不一致,即使有少量数据也会出现这种现象。此外,即使在子地图中避免了发散,全局中的情况也不一定会保持,全局地图一直跟踪局部地图的连接,因此对于该问题需要更多的局部解决方案。
以类似的方式,以机器人为中心的方法大大减少了发散[115]。估计只是相对于辆位置而不是固定的世界坐标系。这种方式中不一致性问题不常见,因为它们不像普通方法那样累积。在这个框架中的估计会得到比经典的EKF SLAM更好的一致性[116]。尽管如此,与子地图一样,发散是无法完全解决的,因为在只有少量几个观测下,地标会发生发散。一些SLAM方法使用了以机器人为中心的框架[117] [118]。一旦它们不可见,地标就会被遗忘,这将这些方法与视觉里程计联系起来并且是完全合适的。
Other practical causes can create drift and inconsistency such as the presence of mobile obstacles when a static world is expected. This problem has been tackled in SLAMMOT (Mobile Object Tracking) or SLAM-DATMO (Detection And Tracking of Moving Objects) approaches [119][120][121]. The idea is to take advantage of the map building process to directly detect and track obstacles by analyzing observations which are not coherent with the vehicle displacement. Alternatively, credibilist approaches [60], which represent ambiguous information, can indirectly deal with dynamic obstacles. They do not detect or track these obstacles but instead treat them as conflicts which allows the algorithm to affect a very low weight to these observations when estimating the vehicle displacement.
More generally, the quality and number of selected land marks in the SLAM process have a clear impact on how the system behaves. Ambiguities can cause matching errors and so the inconsistency of the produced estimates. Related to this problematic, Fault Detection and Isolation (FDI) systems propose to exploit information redundancy to measure the co- herence of different sources (sensors, models, etc.) and detect errors or failures of one of the sources. By doing so, it becomes possible to know if the system is in a situation prone to errors and so reject spurious measures to avoid drift. In [122], the authors propose to analyze the residuals of a bank of Kalman Filters using thresholds to determine if one filter is diverging. A similar approach is described in [123] but the decision part is given to a Neural Network. Sundvall and Jensfelt, in [124], add a coherence measure between different estimations. In [125], Fault Detection is applied to GPS using the Normalized Innovation Squared (NIS) test. Neural Networks are used in [126] to detect sensor failures by making measure predictions using approximators. In [127], the authors propose a visual odometry where photometric camera calibration is taken into account in the minimization step in order to reduce the drift induced by effects like lens attenuation, gamma correction, etc. The quality of the observations is increased compared to simpler camera model.
其他实际因素也可能会造成漂移和不一致,例如在假设为静态的世界存在动态障碍时。在SLAMMOT(移动对象跟踪)或SLAM-DATMO(移动对象的检测和跟踪)方法[119][120][121]中解决了这个问题。这个想法是利用地图构建过程,通过分析与车辆位移不一致的观测数据直接检测和跟踪障碍物。或者,可信度较高的方法[60]可以间接处理动态障碍。他们不检测或追踪这些障碍物,而是将它们视为冲突,使得算法在估算车辆位移排量时能够对这些观测值产生非常低的权重。
更一般地说,SLAM过程中选定的地标(landmark)的质量和数量对系统的行为有着明显的影响。歧义可能导致匹配错误,从而导致产生的估计不一致。与这个问题相关的是,故障检测和隔离(FDI)系统建议利用信息冗余来测量不同来源(传感器,模型等)的共存,并检测其中一个来源的错误或故障。通过这样做,可以知道系统是否处于容易出错的情况,并因此拒绝虚假测量以避免漂移。在[122]中,作者提出使用阈值分析一组卡尔曼滤波器的残差,以确定一个滤波器是否发散。在[123]中描述了类似的方法,但是决策部分被赋予了神经网络。 Sundvall和Jensfelt [124]在不同的估计之间增加了一致性度量。在[125]中,使用Normalized Innovation Squared(NIS)测试将故障检测应用于GPS。神经网络用于[126]通过使用逼近器进行度量预测来检测传感器故障。在[127]中,作者提出了一种视觉里程计,其中在最小化步骤中考虑了光度相机校准,以便减少由诸如透镜衰减,伽玛校正等效应引起的漂移。观测质量相对于更简单的相机模型更高。
An aspect that has often been considered to counter drift in SLAM algorithms is the use and fusion of multiple sources. Conversely to the previously described FDI methods, the idea is not to reject spurious measurements but for various sensors to compensate each other. According to Luo et al. [128], three levels of information fusion can be identified: low-level with raw data, mid-level with features and high- level with objects. In [129], the authors propose to couple laser with camera data. The fusion is done at a landmark level (laser estimation followed by camera refinement). In [130], camera, laser and GPS are fused based on information coherency. Already discussed before, the method described in [105] combines vision and 3D laser data to produce a low- drift algorithm. However, even if coupling information can benefit the accuracy and consistency of an algorithm, it does not ensure that it will not drift. It only partially corrects the drift induced by one sensor.
Even if these methods have been proven to reduce or partially avoid the drift, it does not allow for a drift-free estimation during long periods of time. Correcting the drift in a reliable manner involves that constraints about where a vehicle is with relation to a previously known, local or absolute, reference should be regularly taken into account.
在SLAM算法中常常考虑抵消漂移的一个方面是使用和融合多种传感器。与之前描述的外国FDI相反,这个想法并不是拒绝杂散测量,而是各种传感器相互补偿。根据罗等人,[128]可以识别三个层次的信息融合:低层次与原始数据,中层与特征、高层与对象。在[129]中,作者建议将激光与相机数据耦合。融合是在地标landmark级完成的(激光估计以及相机调整)。在[130]中,相机,激光和GPS基于信息一致性进行融合。之前已经讨论过,[105]中描述的方法将视觉和3D激光数据结合起来以产生低漂移算法。但是,即使耦合信息可以有益于算法的准确性和一致性,也不能确保它不会漂移。它只能部分纠正一个传感器引起的漂移。即使这些方法已被证明减少或部分避免了漂移,它不保证在很长一段时间内进行无漂移估计。实际上应以可靠的方式纠正漂移,应定期考虑车辆与先前已知的,本地或绝对参考相关的约束。
Evaluation criteria for SLAM in autonomous driving
While the previously described approaches can be successfully applied for autonomous indoor mobile robots with a dedicated exploration scheme, it is not sufficient for the large- scale environments of the autonomous driving setting. It means that it is necessary to rely on previous knowledge (absolute or local information) or to be able to improve the built map over time until it is accurate enough (loop closing, for instance). As such, the mapping aspect is central in self-driving vehicles and raises important challenges about how to build or use compact, relevant, reliable and evolutive maps.
We have identified 6 criteria that we think must be fulfilled by a SLAM approach to be viable for autonomous driving. They are described below:
- Accuracy: refers to how accurate the vehicle localization is, be it in world coordinates or with relation to an existing map. Ideally, the accuracy should be below a threshold (usually around 20 centimeters [109]) at all times. Of course, in straight lines, the longitudinal localization can be less accurate without major consequences.
- Scalability: refers to the capacity of the vehicle to handle large-scale autonomous driving. The SLAM algorithm should be able to work in constant time and with a constant memory usage. It implies the use of a map manager to load data when needed. The second aspect is that the map built and/or used has to be light (or stored on a distant server and retrieved on the fly) so as for the approach to be easily generalizable over long distances.
- Availability: refers to the immediate possibility for a SLAM algorithm to provide an adequate localization that could be used right away for autonomous driving if sufficiently accurate. In other words, no first passage of the algorithm is needed to build a map of the environment and it implies that the approach is able to leverage existing map resources (or integrate absolute information more generally). This criterion is particularly important not to restrain where a self-driving vehicle can operate (a world-wide mapping process is costly and requires dedicated means).
- Recovery: refers to the ability to localize the vehicle inside a large-scale map. Initially, the vehicle does not know where it is and a dedicated process is, most of the time, needed to have a rough idea of its position in a map. It is also a way to recover from a failure (kidnapped robot problem).
- Updatability: refers to the identification of permanent changes between the map and the current observation. It also includes the update policy that is needed to integrate these lasting changes and not the temporary ones. Long-term autonomous driving requires the automation of map updates.
- Dynamicity: refers to how the SLAM approach is able to handle dynamic environments and changes. This includes dynamic obstacles that can distort the localization estimation. It also integrates weather conditions that can vary as well as seasonal changes (trees losing leaves, etc.). One of the challenges of long-term SLAM is to find sufficiently discriminative features or methods in order to be robust against those changes.
We will now go through existing methods for both single and multi-vehicle SLAM and compare them with regard to the previously defined criteria so as to have a panorama on the maturity of SLAM approaches for autonomous driving.
尽管先前描述的方法可以成功地应用于具有专用勘探方案的自主室内移动机器人,但是对于自动驾驶设置的大规模环境来说是不够的。这意味着有必要依靠以前的知识(绝对或本地信息)或者能够随着时间的推移改进构建的地图,直到它足够准确为止(例如,闭环)。因此,建图方面是自动驾驶车辆的核心,并对如何构建或使用紧凑、相关、可靠和动态的地图提出了重要挑战。
我们已经确定了6个标准,我们认为必须通过SLAM方法来实现自动驾驶的可行性。它们如下所述:
- 准确度:指车辆定位的精确程度,无论是世界坐标还是相对于现有地图。理想情况下,准确度应始终低于阈值(通常在20厘米左右[109])。当然,用直线表示,不太准确纵向定位不会产生重大后果。
- 可扩展性:指车辆处理大规模自动驾驶的能力。SLAM算法应该能够在恒定的时间内工作并且具有恒定的内存使用量。它意味着在需要时使用地图管理器来加载数据。第二个方面是,建立和/或使用的地图必须是轻的(或存储在远处的服务器上并且随时取回),以便该方法可以在长距离上容易地推广。
- 可用性:指SLAM算法提供足够的定位,如果足够准确,可立即用于自动驾驶。换句话说,不需要算法的第一部分来构建环境地图,这意味着该方法能够利用现有的地图资源(或更一般地集成绝对信息)。这个标准对于不限制自动驾驶车辆可以在哪里运行(全球范围的测绘过程昂贵并且需要专用手段)而言特别重要。
- 恢复:指能够对车辆在大规模地图定位。最初,车辆不知道它在哪里,大多数情况下,专用过程需要大致了解其在地图中的位置。这也是从失败中恢复的一种方式(被绑架的机器人问题)。
- 可更新性:指地图和当前观测之间永久变化的识别。它还包括整合这些持久变化所需的更新策略,而不是临时变更。长期的自动驾驶需要地图更新的自动化。
- 动态性:指SLAM方法如何处理动态环境和变化。这包括可能会扭曲定位估计的动态障碍、它还整合了可能会随季节变化而变化的天气条件(树木丢失树叶等)。长期SLAM面临的挑战之一是要找到充分区分的特征或方法,以便对这些变化具有强大的适应能力。
我们现在将通过现有的单车和多车SLAM方法,并将它们与先前定义的标准进行比较,以便全面了解用于自主驾驶的SLAM方法的成熟度。
Single Vehicle SLAM
All the challenges that appear from the criteria detailed above point at the necessity to build better maps and so at environment representation and recognition both at a metric and topological level. To cover this broad topic, we divide the works in three categories which reflect how the SLAM community usually considers their contributions. The first one concerns loop closure (recognizing a previously mapped place) which is an essential part of SLAM as it allows correcting maps and making them coherent. An interesting aspect is that these algorithms are usually applicable to relocalize a vehicle inside its environment and so provide an answer to the recovery criterion. The second category deals with the localization inside a previously built map. As previously discussed, it is a direct way to constrain the drift. Theoretically, every SLAM approach could be reusing its map but we will focus here on the articles that explicitly do so and on those which address the long-term challenges of reusing maps. A third part focuses on localization approaches that leverage existing data so as to avoid the first passage of a specifically equipped vehicle. For each part of this section, we will provide a synthesis of the discussed approaches, how they respond to the previously established criteria and briefly discuss the remaining challenges.
从上述标准出现的所有挑战都表明,有必要制定更好的地图,因此在度量和拓扑级别的环境表示和识别中都是如此。为了涵盖这个广泛的话题,我们将这些工作分为三类,反映了SLAM社群通常如何考虑他们的贡献。第一个涉及闭环(识别先前映射的地方),这是SLAM的重要部分,因为它允许校正地图并使其一致。一个有趣的方面是这些算法通常适用于重新定位车辆内部的环境,因此为恢复标准提供了一个答案。第二类涉及先前构建的地图内的定位。如前所述,这是限制漂移的直接方式。从理论上讲,每个SLAM方法都可以重用其地图,但我们将在这里重点讨论明确这样做的文章,以及那些解决地图重用长期挑战的文章。第三部分侧重于利用现有数据的定位方法,以避免特定装备车辆的首次通过。对于本节的每个部分,我们将综合讨论的方法,它们如何回应先前建立的标准,并简要讨论剩余的挑战。
Relocalization and loop closure
Recognizing a previously mapped place, and thus reducing the drift induced by SLAM algorithms, is considered as an essential part of SLAM. The main difficulty comes from the fact that the estimation process is inconsistent and cannot be trusted to find those loops. It means that, most of the time, this problem is solved by a dedicated algorithm that runs at all times, independently from the estimation process. This dedicated algorithm is also often used to relocalize the vehicle inside a previously built map (kidnapped robot problem) which is of prime interest when considering autonomous vehicles. Most of the approaches use cameras to find loops because of the richness of the visual information. Williams et al. [131] have identified 3 categories in which these algorithms can be separated:
• Image-to-image methods [132]
• Map-to-map methods [133]
• Image-to-map methods [134]
识别先前映射的位置,并因此减少由SLAM算法引起的漂移,被认为是SLAM的重要部分。主要困难来自于估计过程不一致并且不能信任找到这些循环。这意味着,大部分时间,这个问题通过独立于估计过程的专用算法解决,该算法始终运行。这种专用算法通常也用于在预先构建的地图(被绑架的机器人问题)内重新定位车辆,这在考虑自动驾驶车辆时是最引人关注的。由于视觉信息的丰富性,大多数方法都使用相机来寻找循环。Williams等人[131]已经确定了可以将这些算法分开的3个类别:
• Image-to-image methods [132]
• Map-to-map methods [133]
• Image-to-map methods [134]
1) Image-to-image methods: In image-to-image methods, the loop detection takes place in the sensor space. Bag-of- words approaches based on visual clues [132][135] belong to this category. The idea is to build a dictionary where each word represents similar descriptors. SIFT [136] is often used to find descriptors and compare them due to its good robustness to scale, rotation or viewpoint changes in images. Many visual descriptors can be considered in a redundant fashion (SURF, CenSurE, etc.) so as to have the largest database possible and a representative dictionary. Once the latter is built, it can be used to check if some images can be described with the same words and thus be qualified as representing the same place. Algorithms like FAB-MAP (Fast Appearance Based Map) [99] (and its extension FAB-MAP 2.0 [137]) and PIRF-Nav (Position Invariant Robust Feature Navigation) [138] are two examples of SLAM solutions based on landmark appearance for loop closures. In the first one, information having a strong dependency is extracted so as to avoid false recognitions which often affect loop closing algorithms. The approach has been evaluated on a 1000-km outdoor data set and shows that, using a dictionary built offline, the algorithm can provide proper topological loop closures and make maps more coherent (the gain in accuracy is not evaluated). In [138], the objective is the same but features allowing to be robust against dynamic changes are favored (PIRF). A major difference with FAB- MAP is that the whole process is online and incremental. However, it does not scale well (quadratic complexity for image retrieval) and as thus has not been applied to vehicles contrary to FAB-MAP.
Still to reduce the number of false positives, SeqSLAM [139] proposes to be less discriminative on single images but to analyze over sequences in order to ensure a proper matching. Results on a 22 and 8-km data sets demonstrate the ability of SeqSLAM to handle day/night matching in real time (even though the computing time scales linearly with the size of the data set) with different weather conditions under which FAB-MAP is unable to properly operate. SeqSLAM is demonstrated as a pure place recognition approach and does not apply loop closure to correct previous measurements even though it is possible. SeqSLAM has even been extended to integrate motion estimation between matching by using a graph structure representing the potential roads and a particle filter to maintain the consistency of the positioning [140]. This approach, SMART PF, shows better results than SeqSLAM. Another interesting approach is shown in [141] where the algorithm is tested to find loop closures across different seasons. To do so, the problem is also solved by using image sequences. A flow network is built (a directed graph with start and end-of-traversal nodes) and is used to formulate the image matching as a minimum cost flow problem. On the proposed data set (summer/winter matching), the approach outperforms SeqSLAM. It requires a few seconds to process each image. An evolution of this method has been presented in [142]. The authors replaced the HOG features used to represent images with a global image feature map from a Deep Convolutional Neural Networks. They are able to obtain better performances. The found loops are taken into account in a graph-based approach (g 2 o [78]) with odometry constraints without giving insights about the reached accuracy. The approach runs on a GPU and is thus faster than the previous implementation (real time with a data set of 48,000 images) even though its speed still depends on the data set size.
在图像到图像方法中,闭环检测发生在传感器空间中。基于视觉线索的词袋方法[132] [135]属于这一类。这个想法是建立一个字典,其中每个词代表相似的描述符。SIFT[136]通常用于查找描述符并对其进行比较,因为其对图像中的缩放、旋转或视角更改具有良好的鲁棒性。许多视觉描述符可以被认为是冗余的(SURF,CenSurE等),以便拥有最大的数据库和一个有代表性的字典。一旦建立了后者,它就可以用来检查一些图像是否可以用相同的单词来描述,从而可以用来代表同一个地方。像FAB-MAP(基于快速外观的地图)[99](及其扩展FAB-MAP 2.0 [137])和PIRF-Nav(位置不变鲁棒特征导航)[138]等算法是基于地标(landmark)外观的SLAM解决方案用于闭环的两个例子。在第一个中,提取具有强依赖性的信息以避免经常影响闭环算法的错误识别。该方法已在1000公里室外数据集上进行了评估,并表明使用离线建立的字典,该算法可以提供适当的拓扑闭环并使地图更一致(不评估精度的增益)。在[138]中,目标是相同的,但允许更加鲁棒的特征来对抗动态变化(PIRF)。与FAB-MAP的主要区别在于整个过程是在线的和渐进式的。然而,它不能很好地扩展(图像检索的二次复杂度),因此也没有被应用到与FAB-MAP相反的车辆上。
尽管为了减少误报的数量,SeqSLAM[139]提出减弱对单个图像的判别性,但为了确保正确的匹配,对序列进行分析。22和8公里数据集上的结果表明,SeqSLAM能够在不同的天气条件下实时处理日夜匹配(即使计算时间与数据集的大小成线性关系)即使FAB-MAP无法正常操作。SeqSLAM被证明是一种纯粹的场所识别方法,即使有可能,也不应用闭环来矫正以前的测量。SeqSLAM甚至通过使用表示潜在道路的图结构和粒子滤波器来保持定位的一致性,从而在匹配之间集成运动估计[140]。这种方法(SMART PF)比SeqSLAM显示更好的结果。另一个有趣的方法在[141]中给出,其中算法被测试以找到跨越不同季节的闭环。要做到这一点,问题也可以通过使用图像序列来解决。建立一个流网络(一个带有开始和结束遍历节点的有向图),并用于将图像匹配制定为最小成本流问题。在提出的数据集(夏季/冬季匹配)上,该方法优于SeqSLAM。它需要几秒钟来处理每个图像。这种方法的演变已在[142]中提出。作者用Deep Convolutional Neural Networks的全局图像特征图代替了用于表示图像的HOG特征。他们能够获得更好的表现。发现的闭环在基于图表的方法(g2o[78])中与里程计约束条件考虑在一起,没有提供关于达到的准确度的见解。该方法在GPU上运行,因此比之前的实现(实时数据集为48,000个图像)更快,即使其速度仍取决于数据集大小。
2) Map-to-map methods: Map-to-map methods are solely based on information contained in the map. It is the case of the GCBB algorithm (Geometric Constraint Branch and Bound) proposed in [143]. The principle is to define geometric constraints between landmark pairs in the map and the current observation. An association tree is established so to as to find the most likely association hypothesis that respects the geometric organization of the landmarks in the map. It has been applied successfully in [133] with a hierarchical approach in outdoor experiments even if the accuracy of the resulting map is not measured.
Li et al., in [144] and [145], proposed to merge occupancy grids by using a genetic algorithm to find the best alignment possible. A low-cost GPS is used to constrain the search space to smaller regions. Similarly to the previous example, the consistency of the map is drastically improved but its accuracy is not measured. In the case that the drift can be approximately estimated, the JCBB algorithm (Joint Compatibility Branch and Bound) [146] selects compatible landmarks according to their uncertainty.
地图到地图的方法完全基于地图中包含的信息。这是在[143]中提出的GCBB算法(几何约束分支限界)的情况。其原理是定义地图中的地标对与当前观测之间的几何约束。建立关联树以便找到符合地图中地标的几何组织的最可能的关联假设。即使未测量结果图的准确性,它也已在[133]中以层级方法在户外实验中成功应用。
Li等人在[144]和[145]中提出通过使用遗传算法来合并占用网格以找到可能的最佳对齐。使用低成本的GPS将搜索空间限制在较小的区域。与前面的例子类似,地图的一致性得到了显着改善,但其准确性未被测量。在漂移可以近似估计的情况下,JCBB算法(联合兼容性分支限界)[146]根据其不确定性选择兼容的地标。
Map-to-map methods are difficult to apply without any hint of where to look for loops as maps are either too sparse to be sufficiently distinctive (monocular SLAM for instance) or too complex for a complete exploration in real-time. In such cases, the use of a GPS, like in [145], makes sense as it allows the system to considerably reduce the exploration space.
地图到地图的方法在没有任何寻找循环的提示很难实现,因为地图要么太稀疏而不能足够独特(例如单目SLAM),要么太复杂以至于无法实时完成探索。在这种情况下,像[145]中那样使用GPS是有意义的,因为它允许系统大大减少搜索空间。
3) Image-to-map methods: The last category, called image- to-map (or more generally sensor-to-map), extracts informa- tion from the sensor space and compares it directly with the map. The approach described in [134] is based on classifiers trained on the detected landmarks. The loop closing stage uses these classifiers on the image to check if there is a match. Then, a RANSAC algorithm takes the corresponding 3D positions to compute the new pose of the vehicle. Results with a single camera show that the map becomes coherent in an outdoor scenario.
3)图像到地图的方法:最后一类称为图像到地图(或更一般的传感器到地图),从传感器空间中提取信息并直接与地图进行比较。 在[134]中描述的方法基于在检测到的地标上训练的分类器。闭环检测阶段在图像上使用这些分类器来检查是否匹配。然后,RANSAC算法将采用相应的3D位置来计算车辆的新位姿。单个摄像机的结果显示地图在室外场景中变得一致。
Another possibility to close the loop or relocalize a vehicle is to use hierarchical techniques to speed up the processing. A low-resolution map first serves to identify a rough positioning of the vehicle which is then refined as the map resolution increases. These methods are particularly suited to occupancy grids and have been applied successfully to automated vehicles with laser sensors in [147]. In this example, a coarse-to-fine approach is used to relocalize the vehicle inside a map and not to perform loop closure. Experiments show that the method works well but that the size of the map is a crucial aspect.
闭环检测或重新定位车辆的另一种可能性是使用层级技术来加速处理。低分辨率地图首先用于识别车辆的粗略定位,随着地图分辨率的提高,该定位将被精化。这些方法特别适用于占用网格,并已成功应用于[147]中带有激光传感器的自动车辆。在这个例子中,粗略到精细的方法被用来重新定位地图内的车辆,而不是执行闭环。实验表明,该方法运作良好,但地图的大小是一个关键方面。
4) Discussion: We have focused on how loop closure is applied in SLAM algorithms. The recent years have seen a clear focus on how to deal with seasonal or weather changes and we refer our readers to this survey [93] for a detailed view of this field, independently of loop closure.
Even it may seem like image-to-image methods are favored, recent approaches tend to couple different methods to ensure that a place has been properly recognized. As an example, an application of FAB-MAP with a visual odometry can be found in [148]. The authors rely on dense stereo mapping and use FAB-MAP to indicate loop closures. Their metric integration is done within a graph formulation and is calculated using the Iterative Closest Point (ICP) algorithm [149] between the observation and the previously built map. Even if the accuracy is not directly measured, the built map indicates a consistent mapping. In [150], a bag-of-words method is first employed to generate candidates that are then checked using Conditional Random Fields (CRF) in which a constraint ensures a geometric consistency between the landmarks. The whole approach is applied in a visual SLAM framework. In Table II, we give a brief overview of the principal methods that have been described here with relation to the autonomous driving problematic.
4)讨论:我们专注于如何在SLAM算法中应用闭环。近年来,人们已经清楚地关注如何应对季节性或天气变化,并且我们将读者引用到本次调查[93]中,详细了解该领域,与闭环无关。即使它看起来像图像到图像的方法是受欢迎的,但最近的方法倾向于耦合不同的方法以确保一个地方被正确识别。作为一个例子,可以在[148]中找到带视觉里程计的FAB-MAP的应用。作者依赖密集的双目图,并使用FAB-MAP来指示环路闭合。它们的度量积分是在图公式内完成的,并使用观测值与之前构建的地图之间的迭代最近点(ICP)算法[149]进行计算。即使不直接测量准确度,构建的地图也会显示一致的建图。在文献[150]中,首先采用词袋算法生成候选项,然后使用条件随机场(CRF)进行检查,其中约束条件确保了地标之间的几何一致性。整个方法应用于可视SLAM框架。在表二中,我们简要概述了这里描述的与自动驾驶有关的主要方法。
TABLE II: Relocalization and loop closure methods regarding autonomous driving criteria. Tick: criterion satisfied. Cross: criterion not satisfied.
关于自动驾驶标准的重新定位和闭环方法。 对勾:标准满意。 ×:标准不满意。
Based on the established criteria, we can see that these approaches propose a solution to the recovery problem. However,loop closure was initially seen as a way to correct drift. Whileit produces more coherent maps, the articles cited here donot clearly expose the gain in accuracy. In [151], the authorsexplain that closing the loop can counter the drift inside the loop but that the result will always be overconfident. Another approach was proposed in [152] where errors are redistributedin a probabilistic fashion around the past trajectory when a loop is identified. It avoids the previously mentioned overconfidence problem but does not guarantee the consistency of the localization.
It also explains why the community does not necessarily focus on improving the accuracy but on addressing difficult conditions and building consistent maps. Even if the results are impressive, especially with seasonal changes and sunny/rain conditions, the approaches cited here, when considered for autonomous driving, can be seen as a better GPS but within a given zone. As such, it gives two main challenges for the future: can these approaches be generalized by using already available information (maps, images, etc.) to avoid the zone restriction? Can these methods be extended and integrated into SLAM frameworks to give very resilient approaches that could be viable for autonomous driving?
根据既定标准,我们可以看到这些方法提出了恢复问题的解决方案。然而,闭环最初被视为一种纠正漂移的方法。尽管它制作了更多连贯的地图,但这里引用的文章显然无法显示准确性的提高。在[151]中,作者解释说,闭环可以抵消循环内的漂移,但结果总是过于自信。在[152]中提出了另一种方法,其中当识别出闭环时,误差在过去的轨迹中以概率方式重新分配。它避免了前面提到的过度自信问题,但并不保证定位的一致性。这也解释了为什么社区不一定专注于提高准确性,而是解决困难的条件和建立一致的地图。即使结果令人印象深刻,特别是在季节性变化和晴天/降雨条件下,这里引用的方法在考虑用于自动驾驶时,可以被看作是在给定区域内更好的GPS。因此,它为未来提出了两个主要挑战:通过使用已有的信息(地图,图像等)来避免区域限制,这些方法是否可以推广?这些方法是否可以扩展并整合到SLAM框架中,以提供非常有弹性的方法,可以用于自动驾驶?
Localization in a previously built map
The localization of a vehicle in a previously built map is tightly linked to the methods presented in the previous section about loop closing and relocalization. Indeed, the first necessary action is to globally identify where the vehicle is located in the map. Once this first step is accomplished, more classical data association algorithms can be used. If countering the drift is not entirely possible with loop closure, constraining the localization inside a given map is a viable solution for autonomous driving. However, building a map that scales well, that can be updated or that can work whatever the conditions is not a trivial task. We will go through existing methods that have showed that reusing a map is possible without or with a dedicated map building process.
车辆在先前构建的地图中的定位与上一节中介绍的有关闭环和重定位的方法紧密相关。事实上,第一个必要的行动是全局识别车辆在地图上的位置。一旦完成了第一步,就可以使用更经典的数据关联算法。如果在环路闭合情况下不能完全克服漂移,那么将定位限制在给定地图内的是自主驾驶的可行解决方案。然而,建立一张可以扩展的地图,可以更新或者可以工作,无论条件是不是一项简单的任务。我们将通过现有的方法,证明在没有或没有专门的地图构建过程的情况下重复使用地图是可能的。
1) Identical method between first mapping and on-the-fly localization: The logical extension of any SLAM algorithm is to reuse a built map in order to constrain the localization. While it still requires a first passage, the map can be used right away and does not necessitate a dedicated offline processing. Nevertheless, not all methods have demonstrated that they are viable under such circumstances as it involves continuously building or enriching maps.
与先建图和后定位的方法相同:任何SLAM算法的逻辑扩展是重用一个建好的地图以约束定位。虽然它仍需要第一段时间,但地图可以立即使用,不需要专门的离线处理。尽管如此,并不是所有的方法都表明,在这种情况下它们是可行的,因为它涉及不断构建或丰富地图。
Map management is widely covered by all the approaches that involve submaps, which allows SLAM algorithms to work with a near-constant time and memory consumption [28]. However, reusing the maps is not necessarily considered in these methods. In [59], PML-SLAM is proposed to tackle these problems (map loading and unloading between RAM and hard drive based on the vehicle position, etc.) using laser scanners and occupancy grids. This approach has been proven to be viable for autonomous driving in [153] in certain areas. The map can also be continuously updated but no dedicated process handles it, meaning that temporary changes will also be integrated.
所有涉及子图的方法都广泛涉及地图管理,它允许SLAM算法以接近恒定的时间和内存消耗工作[28]。但是,重复使用地图不一定在这些方法中考虑。在[59]中,PML-SLAM提出用激光扫描仪和占用网格来解决这些问题(基于车辆位置在RAM和硬盘驱动器之间进行地图加载和分流)。这种方法已被证明在某些领域[153]中可用于自动驾驶。地图也可以不断更新,但没有专门的流程处理它,这意味着临时更改也将被整合。
In [154], the authors describe a long-term mapping process where new measurements improve the map. The approach is based on vision and inertial sensors and uses a pose graph representation to integrate new data. Scalability over known areas is achieved thanks to the reduction of the pose graph. No control is made on whether or not an update corresponds to a permanent change. Already mentioned before, the work of McDonald et al. in [150] uses anchor nodes to link together pose graphs acquired during different sessions. This way, the produced visual SLAM maps can all be taken into account. Similarly to [154], all maps are integrated without distinction. The authors of [155] propose a monocular approach that uses a low number of landmarks and their visual patches to characterize them. Landmarks are reprojected in the image and matched thanks to their patches with Normalized Cross-Correlation [156]. Even if this approach is very resource-light, no dedicated mechanism is able to handle map portions that are no longer needed. The use of a memory of images is quite common [157][158][159]. The principle is to compare the current image with a reference database stored in memory, which can be costly and a limiting factor for a map. Data association can also be a problem in such cases. In [160], the authors propose data association graphs as a way to model and solve this problem.
Building maps that are able to evolve and follow permanent changes is a challenge that has mainly been tackled in indoor environments. The Dynamic Pose Graph [161] maintains two maps: an active and a dynamic map. The laser scan points are labeled in order to identify mobile obstacles and then both maps can be updated accordingly. The pose graph representation allows the system to remove inactive nodes and keep a more tractable representation. In [162], a visual odometry approach is coupled with a place recognition mechanism in order to stitch maps acquired at different times. Old views are deleted when they are not longer relevant allowing the system to maintain an up-to-date map even though temporary changes are integrated as well.
在[154]中,作者描述了一个长期的测绘过程,其中新的测量改进了地图。该方法基于视觉和惯性传感器,并使用位姿图表示来整合新数据。由于减少了位姿图,实现了对已知区域的可扩展性。无法控制更新是否对应于永久更改。之前已经提到,McDonald等人的工作。在[150]中使用锚节点来链接不同会话期间获取的位姿图。这样,所产生的视觉SLAM地图都可以被考虑。与[154]类似,所有地图都没有区别地被整合。[155]的作者提出了一种单目方法,使用少量的地标和他们的视觉快来表征它们。图像中的地标被重新投影,并且与归一化交叉相关[156]的块相匹配。即使这种方法非常重要,但没有专门的机制能够处理不再需要的地图部分。使用图像记忆是相当常见的[157] [158] [159]。其原理是将当前图像与存储在内存中的参考数据库进行比较,这可能是昂贵的并且是地图的限制因素。在这种情况下,数据关联也可能成为问题。在[160]中,作者提出数据关联图来建模和解决这个问题。
构建能够发展并遵循永久性变化的地图是主要在室内环境中解决的挑战。动态位姿图[161]保持两张图:一张活动图和一张动态图。激光扫描点标记为识别移动障碍物,然后可以相应更新两个地图。位姿图表示允许系统删除不活动的节点并保持更易处理的表示。在[162]中,视觉里程计法与地点识别机制相结合,以便缝合在不同时间采集的地图。旧视图在不再相关时会被删除,即使临时更改也已整合,系统仍可保留最新地图。
A 3-month outdoor experiment is proposed in [163] in which the authors introduce the concept of plastic maps as a way to integrate visual experiences over time. The idea is that the visual odometry tries to relate to a past experience. If none exists, a new one is created. Experiences can be partial along the trajectory in order to only store changes on dedicated portions and not on the whole map. The authors show through experiments that the number of required experiences tends to be bounded over time. The main advantage of this concept is that it allows the detection of sudden and long-term changes in an unified method. Gathering all the needed experiences requires many traversals of a same road which can only be done at a world level with probe cars.
2) Dedicated map-building process: The increasing capacity of computers, coupled with the fact that a first traversal is needed for autonomous driving with SLAM, has led researchers to focus on how to build the best maps possible for online exploitation. Place recognition approaches, most of the time, fall also in this category as they require a previous passage to build specific databases that are then exploited. In [164], one SVM per feature is trained across several images. The robustness is ensured by discarding the detectors not accurate or unique enough across the neighboring images. These classifiers and their temporal connection can be seen as the map. The authors demonstrate a great reduction in localization failures but do not directly assess the accuracy of the method. The PoseNet algorithm, in [165], trains a Convolutional Network to associate images to the corresponding position and orientation. The CNN is then applied to locate an image in real-time. These approaches do localize the vehicle inside a map but do not ensure continuity in the localization service as with more classical methods.
在[163]中提出了一个为期3个月的户外实验,其中作者引入可塑地图的概念作为一种整合视觉体验的方法。这个想法是,视觉里程计试图与过去的经验联系起来。如果不存在,则创建一个新的。经验可以沿着轨迹,以便仅在专用部分而不是在整个地图上存储更改。作者通过实验表明,所需经验的数量趋于随着时间的推移而趋于恒定。这个概念的主要优点是它允许以统一的方法检测突然和长期的变化。收集所有需要的体验需要许多遍历同一条道路的路线,而这条道路只能在探测车的世界级别上完成。
2)专门的地图制作过程:计算机的容量越来越大,加上SLAM自动驾驶所需的第一次遍历这一事实,导致研究人员将重点放在如何构建可能用于在线开发的最佳地图上。在大多数情况下,场所识别方法也属于这一类,因为它们需要先前的一段通路来构建特定的数据库,然后才能被利用。在[164]中,每个特征的一个SVM在多个图像上被训练。通过丢弃在相邻图像上不够精确或独特的检测器,可以确保鲁棒性。这些分类器及其时间连接可以看作是地图。作者表明定位失得到巨大的减少,但不直接评估方法的准确性。 PoseNet算法在[165]中训练卷积网络以将图像关联到相应的位置和方向。CNN然后被应用于实时定位图像。这些方法确实将车辆定位在地图中,但不像传统方法那样确保定位服务的连续性。
Many approaches have also chosen to improve maps over time. In [166], the authors consider that, in the future, maps will be coming from centralizing collect services. As such, they propose a system where maps, constructed using vision by optimizing over 2D-3D associations, serve to feed a database. An offline process takes all these information to produce summary maps where all the meaningful data are contained (most seen landmarks). The localization accuracy is under 30 cm in various conditions. The problem of the scalability of such an approach is mentioned but not addressed. In [167], the authors consider an initial metric and semantic map and propose an unsupervised method to make them evolve through time in a parking context. The metric map uses a pose graph relaxation algorithm in order to take into account multiple passages. The semantic part is updated thanks to machine learning techniques. Multiple timescales are maintained in [168] in order to choose the one that best fits the current observation. A first run is performed to obtain an initial map. After that, local maps are maintained online with a short-term memory while an offline update allows the system to build more consistent global maps. Indoor experiments show that the map slowly adapts to long-term changes.
许多方法也选择随着时间的推移改进地图。在[166]中,作者认为,未来地图将来自集中收集服务。因此,他们提出了一种系统,其中通过对2D-3D关联进行优化而使用视觉构建的地图用于馈送数据库。一个离线过程会将所有这些信息生成汇总图,其中包含所有有意义的数据(最常见的地标)。定位精度在各种条件下均小于30厘米。提到了这种方法的可扩展性问题,但没有强调。在文献[167]中,作者考虑了一个初始度量和语义地图,并提出了一种无监督的方法,使它们在停车场中随着时间的推移而演变。度量地图使用位姿图松弛算法来考虑多个段落。语义部分由于机器学习技术而更新。在[168]中保持多个时间尺度,以便选择最适合当前观察的时间尺度。执行第一次运行以获得初始地图。之后,局部地图将通过短期内存在线维护,而离线更新则允许系统构建更一致的全局地图。室内实验表明,地图慢慢适应长期变化。
A different approach is proposed in [169] and [170] where spherical images (intensity, depth, saliency and sampling spheres) are built from several images. The main advantage of a sphere is to cover a given area and not only one position. An online registration method based on monocular inputs serves to localize the vehicle. In [171], a two-step process is proposed. During the first phase (teach), a database is built using SURF keypoints and submapping techniques. Then, the repeat phase localizes the robot according to the constructed map in order for it to follow the same path as previously. Similarly, in [172], a visual database is first built offline using a hierarchical bundle adjustment method. Then, in real-time, the vehicle localizes itself inside the map. Both approaches do not allow the system to modify the map once built. In [172] the results show an impressive accuracy (around 2 centimeters with the exact same path). However, the map remains quite heavy (around 100 Mb for a kilometer). Still related to vision, in [173], a geo-referenced visual map approach is proposed. The map is first built offline using SURF features and GPS constraints within a graph-based optimization framework. Online, the GPS is not used and only the map and stereovision are employed. The map memory is one of the problem cited by the authors (500,000 landmarks for 1 km) for an average accuracy of 30 centimeters.
在[169]和[170]中提出了一种不同的方法,其中球面图像(强度,深度,显着性和采样球体)由多个图像构建而成。球体的主要优点是覆盖特定区域,而不仅仅覆盖一个位置。基于单目输入的在线对齐方法用于定位车辆。在[171]中,提出了两步过程。在第一阶段(示教)期间,使用SURF关键点和子地图技术构建数据库。然后,重复阶段根据构建的地图对机器人进行定位,以使其遵循与先前相同的路径。同样,在[172]中,可视化数据库首先使用分层BA方法离线构建。然后,车辆在地图内实时地自行定位。这两种方法都不允许系统在构建完成后修改地图。在[172]中,结果显示了令人印象深刻的精确度(约2厘米,具有完全相同的路径)。但是,地图仍然很重(一公里约为100 Mb)。仍然与视觉有关,在[173]中,提出了一种地理参考视觉地图方法。该地图首先在基于图优化框架内使用SURF功能和GPS约束离线构建。在线时不使用GPS,只使用地图和立体视觉。地图记忆是作者引用的问题之一(1公里500,000个地标),平均精确度为30厘米。
A topometric localization is presented in [174]. During a first passage, a GPS is coupled with vision and range sensors to create a database of compact features (a descriptor per image and a range and standard deviation value between each image). A Bayesian filter is then used for online localization. Extensive experiments (over 4 months) have been carried out to show the resilience of the features to various conditions. The average localization error is around 1 meter. Still in a multi-sensor context, Maximally Stable Extremal Regions (MSER) [175] are extracted from both images and laser grid maps [176]. Coupled with a GPS-RTK, they serve to establish a database during a first passage. A particle filter then keeps track of the vehicle pose online. 2574 landmarks are required for 7 km for an average error below 50 centimeters. Finally, an interesting approach, first proposed in [177] then extended in [178], focuses on building high-quality road surface maps using a 3D laser. A calibration method for reflectivity values is proposed and maps are then computed using a graph approach with inertial and GPS constraints. A histogram filter is used for the online localization inside the map. Results have been demonstrated in autonomous driving for more than 6 kilometers with an accuracy of less than 10 centimeters. A map manager ensures a constant-memory usage. 10 MB are required for 1.6 km. The authors indicate that the reliance on the built map could have inappropriate consequences in complicated weather settings. Instead of a laser-built map, Napier and Newman, in [179], construct orthographic images of the road surface from a visual odometry approach. During the online localization phase, a synthetic image, based on the predicted pose, is generated and compared with the map using mutual information so as to refine the localization. The approach does not work in real time at the moment and its accuracy has not been directly assessed.
在[174]中介绍了一种拓扑定位。在第一次通过期间,GPS与视觉和距离传感器耦合以创建紧凑特征(每个图像的描述符以及每个图像之间的范围和标准差值)的数据库。贝叶斯滤波器然后用于在线定位。已经进行了大量的实验(超过4个月),以显示这些特征对各种条件的弹性。平均定位误差约为1米。仍然在多传感器环境中,从图像和激光网格图中提取最大稳定极值区域(MSER)[175][176]。再加上GPS-RTK,他们可以在第一段时间建立数据库。粒子滤波器会在线跟踪车辆位姿。7公里内需要2574个地标,平均误差低于50厘米。最后,一个有趣的方法,首先在[177]中提出,然后在[178]中进行了扩展,重点在于使用3D激光来构建高质量的路面地图。提出了反射率值的校准方法,然后使用具有惯性和GPS约束的图方法计算地图。直方图滤波器用于地图内的在线定位。在自动驾驶中已经展示了超过6公里的精确度小于10厘米的结果。地图管理器可确保常量使用内存。1.6公里需要10MB。作者指出,对复杂天气环境的依赖可能会产生不适当的后果。Napier和Newman[179]用视觉里程计构建了路面的正交图像,而不是激光建立的地图。在在线定位阶段,基于预测位姿的合成图像被生成并且使用互信息与地图进行比较以便改善定位。该方法目前无法实时运行,其准确性尚未得到直接评估。
3)Discussion: An overview of the main approaches discussed in this section and how they fill our autonomous driving criteria is exposed in Table III.
We can see that almost all approaches propose a recovery system (except [173] where it is not mentioned) which makes sense with regard to the localization in a previously built map. Vision alone, to be sufficient, requires dense representations or a high number of landmarks. It is also worth noting that experimental conditions are not the same between all these approaches and as such one method could not be able to reach a proper accuracy in a different setting. Building long-term maps remain a difficult task and it is not always clear if always updating the map is the good strategy. We also notice that this kind of methods has the disadvantage not to limit the size of the maps (even if this effect is limited) which can be problematic in the long run. Many approaches do not propose a specific mechanism to store partial maps as the scope of the experiments do not necessarily requires it. However, the information density needed for most approaches makes a world-wide deployment difficult to envisage at the moment. Direct availability of maps is the main problem but it is worth noting that [166] mentions probe cars (multiple sensor- equipped vehicles) as a future way to reduce this problem.
讨论:本节讨论的主要方法的概述以及它们如何填补我们的自主驾驶标准,该标准见表III。
我们可以看到,几乎所有的方法都提出了恢复系统(除了没有提及的[173]之外),这对于在先前构建的地图中的定位而言是有意义的。仅仅视觉就足够了,需要密集的表征或者大量的地标。值得注意的是,所有这些方法的实验条件并不相同,因此一种方法无法在不同环境下达到适当的准确度。建立长期地图仍然是一项艰巨的任务,我们并不很清楚总是更新地图是够是一个好策略。我们也注意到,这种方法的缺点是不限制地图的大小(即使这种效果是有限的),从长远来看这可能会有问题。由于实验的范围并不一定需要,所以许多方法不提出用于存储部分地图的特定机制。但是,大多数方法所需的信息密度使得目前很难设想世界范围的部署。地图的直接可用性是主要问题,但值得注意的是[166]提到探测车(多个配备传感器的车辆)作为减少此问题的未来方法。
TABLE III: Localization in a previously built map methods regarding autonomous driving criteria. Tick: criterion satisfied. Cross: criterion not satisfied.
表III:在先前构建的地图中关于自动驾驶标准的定位方法。 对勾:标准满意。 ×:标准不满意。
This will be further discussed in Section V. From this short analysis, it appears that the problem of building life-long maps that can take into account permanent changes as well as seasons and weather in a bounded manner remains a challenge. Regarding accuracy, even though some impressive results are shown, it is difficult to predict how one map representation will work in a different environment. As there is no clear way to evaluate this, more tests in various conditions are needed. The final, bigger problem is how can the creation of these maps be generalized at a world level? At the moment, it is not clear if it will happen as world-wide raw sensor data may never be available.
这将在第五部分进一步讨论。从这个简短的分析看来,建立终身地图的问题似乎是一个挑战,这个地图可以考虑到永久变化以及季节和天气。关于准确性,即使显示了一些令人印象深刻的结果,但很难预测一个地图表示如何在不同的环境中工作。 由于没有明确的方法来评估这一点,所以需要在各种条件下进行更多的测试。 最后一个更大的问题是如何将这些地图的创建在世界范围内推广? 目前,还不清楚是否会发生,因为世界范围的原始传感器数据可能还不曾可用。
Localization in existing maps
Even without world-wide raw sensor data, there exists many different sources already furnishing large-scale information. In the prospect of facilitating the deployment of autonomous vehicles, many researchers have proposed approaches to leverage these geographic information sources. In this part, we will first focus on new map (or potentially new) formats and their applications as it could drive how information are collected in the near future. Then, we will discuss localization methods that already integrate existing widely available data to build prior maps.
即使没有世界范围的原始传感器数据,也存在许多已经提供大规模信息的不同来源。为了促进部署无人车,许多研究人员提出了利用这些地理信息源的方法。 在这一部分中,我们将首先关注新地图(或可能是新的)格式及其应用,因为它可以推动不久的将来如何收集信息。 然后,我们将讨论已经集成现有广泛可用数据的定位方法,以构建先验地图。
1) Building and using future maps: In the recent years, researchers have proposed custom map formats in order to respond to the need of a prior knowledge for autonomous driving. Some have considered the practical challenges of building world-wide maps in an automatic way. In [180], the authors propose a custom format, Emap (enhanced map), usable for a lane-level localization. Lanes are represented as a series of straight lines, circles or clothoids based on GNSS and dead reckoning measurements. This map format has been utilized in [181] in a 30-minute experiment where lane accuracy was achieved (error below 1 meter most of the time). An extension of the Route Network Definition File format (RNDF), initially specified by DARPA, is discussed in [182]. This new format, RNDFGraph, overcomes some of the limitations of the original definition by including lane relationships and lane change information. This is made possible by using a graph representation. Splines are also generated based on waypoints in order to ensure a smooth trajectory. This format has been used in German highways for path planning but not directly for localization purposes. Accurate lane-level map generation is also the objective pursued in [183]. Here, the authors combine line-segment features extracted from a 3D laser and a graph SLAM approach with an OpenStreet Map map. A particle filter is used to obtain a lane estimation that is then integrated inside the map. The authors show that they are able to reach an average accuracy of 5 cm. The map utilization inside a localization algorithm is not proposed in the article. Still regarding automatic lane-level map generation, Guo et al. in [184] (and extended in [185]) propose a low-cost approach based on an OSM map, an INS, a GPS and orthographic images generated from a camera. The idea is that such a system could be used by probe cars and could generalize the map-building process. First, the INS and GNSS measurements are optimized together to obtain the vehicle localization. A second optimization using visual odometry is performed. The image is then aligned according to local map segment extracted from OSM. The lanes are finally extracted from the orthographic image. Still centered on a path planning use, lanelets have been proposed in [186]. The format is proposed with tools to manually create maps from satellite views. This map representation has been used in [109] but dedicated maps for localization were built as well. Finally, in [187], a system based on a high precision GNSS, a 3D laser and a cameras pointing downwards is proposed to build maps. Bird eye view images are generated and localized according to the GNSS. Lane markings and curbs are then extracted from them and are manually reviewed. A localization application, LaneLoc, is proposed where the map is reprojected in the images (no laser is used in the localization phase) using the estimated vehicle position. It eases the subsequent extraction of the lane markings and of the curbs. The localization is able to reach the map accuracy (around 10 centimeters) most of the time.
构建和使用未来地图
近年来,研究人员提出了自定义地图格式,以响应自动驾驶的先验知识需求。有些人已经考虑了以自动方式构建全球地图的实际挑战。在[180]中,作者提出了一种自定义格式Emap(增强型地图),可用于车道级别的定位。车道被表示为基于GNSS和航位推测测量的一系列直线,圆圈或布景。这种地图格式已经在[181]中用于30分钟的实验中,实现了车道精度(大部分时间误差低于1米)。 [182]中讨论了最初由DARPA指定的路由网络定义文件格式(RNDF)的扩展。这种新格式RNDFGraph通过包含车道关系和车道变化信息,克服了原始定义的一些局限性。这可以通过使用图形表示来实现。样条也是基于航点生成的,以确保平滑的轨迹。这种格式在德国高速公路中用于路径规划,但不直接用于定位目的。精确的车道图生成也是[183]中追求的目标。在这里,作者将从3D激光提取的线段特征和图形SLAM方法与OpenStreet Map地图相结合。粒子滤波器用于获得车道估计,然后将其集成到地图中。作者表示他们能够达到5厘米的平均精确度。本文未提出定位算法内的地图利用率。Guo等人仍然关于自动车道级地图生成。在[184](和在[185]中进行了扩展))提出了一种基于OSM地图,INS,GPS和照相机生成的正交图像的低成本方法。这个想法是探测车可以使用这样一种可以概括地图建立过程的系统。首先,INS和GNSS测量一起优化以获得车辆定位。使用视觉里程计进行第二次优化。然后根据从OSM提取的局部地图段对齐图像。最终从正交图像中提取车道。[186]仍然集中在路径规划的使用上,提出了lanelets。该格式提供了手动创建卫星视图地图的工具。该地图表示已在[109]中使用,但也建立了用于定位的专用地图。最后,在[187]中,提出了一种基于高精度GNSS,3D激光器和向下指向的摄像机来构建地图。根据GNSS生成并定位鸟瞰图像。然后从它们中提取车道标记和路缘并手动检查。定位应用LaneLoc被提出,其中地图在图像中被重投影(在定位阶段不使用激光)使用估计的车辆位置。它简化了随后提取的车道标记和路缘。定位能够在大部分时间达到地图精度(大约10厘米)。
All the previously cited methods bring geometric information to, for now, topologic-only maps. As such, approaches making use of this kind of prior information might be viable in the near future. A map-aided localization is proposed in [188] that takes advantage of prior knowledge about lanes and stop lines. A vision system that detects these lanes and compares them with the map is implemented inside a particle filter that also integrates IMU and GPS measurements. This light system is able to reach a 50-cm accuracy. In [189], the authors use only road segments and integrate them inside FastSLAM. The idea of this road constrained approach is to limit the lateral deviation as well as angular errors by matching them with the expected value computed from the map. The latter is built using a Differential GPS. A lane-accuracy is reached with an average error of 1.4 m. Similarly, an accurate digital map of the lane markings is built and used in [190]. Fused with GPS and proprioceptive information, the lane detection allows the localization to be constrained to 10 centimeters along the lateral axis. However, longitudinally, the error is around 1 meter. In general, lane approaches are difficult to apply in more complex settings like intersections and roundabouts where the accuracy is an important concern.
Instead of lanes, the authors of [191] use a map containing the walls of the surroundings. A Bayesian network decides the most appropriate wall to detect using a laser scanner in order to reach a defined accuracy objective in a top-down manner. A 20-cm accuracy is attained using a precise map of the walls composing the scene. A previously-built pole database serves as a reference map in [192]. The accuracy depends on the frequency of the poles but is on average around 1 meter. In [193], the authors exploit the entire pole-like infrastructure. A map is first built using stereovision and a high-precision DGPS/INS combination. During the localization phase, stereo matching with the map, along with odometry and GPS are all integrated inside a particle filter. The accuracy is not directly measured but the authors claim a lateral accuracy of around 20 centimeters.
以前引用的所有方法都会利用几何信息提供用于拓扑地图。因此,利用这种先验信息的方法在不久的将来可能是可行的。在[188]中提出了一种地图辅助定位,它利用了有关车道和停靠线的先验知识。检测这些通道并将其与地图进行比较的视觉系统在粒子滤波器内实现,该滤波器还集成了IMU和GPS测量。这个轻系统能够达到50厘米的精度。在[189]中,作者仅使用路段并将其集成到FastSLAM中。这种道路约束方法的想法是通过将它们与从地图计算出的期望值相匹配来限制横向偏差以及角度误差。后者是使用差分GPS建立的。车道精度达到了1.4米的平均误差。同样,一个精确的车道标志数字地图在[190]中被构建和使用。融合了GPS和本体感受信息,车道检测功能可以将定位限制在沿横轴10厘米。但纵向上,误差在1米左右。一般来说,车道方法难以应用于更复杂的设置,如交叉口和环形交叉口,其中准确性是一个重要问题。[191]的作者不使用车道,而是使用包含周围墙壁的地图。贝叶斯网络决定使用激光扫描仪检测最合适的墙壁,以自顶向下的方式达到既定的精确度目标。使用构成场景的墙的精确地图获得20厘米的精确度。先前建立的杆数据库在[192]中充当参考地图。精度取决于杆的频率,但平均在1米左右。在[193]中,作者利用了整个杆状基础设施。首先使用立体视觉和高精度DGPS/INS组合建立地图。在定位阶段,与地图的立体匹配以及里程计和GPS都集成在粒子滤波器内部。精确度不是直接测量的,但作者声称横向精度为20厘米左右。
Leveraging current map resources:
An impressive amount of data has already been gathered around the world (topological maps, panoramas, etc.) and can be used to create specific maps that do not require a prior passage of a specific vehicle. However, leveraging the available resources at hand to produce high-quality maps is not easy.
世界各地已经收集了大量的数据(拓扑图,全景图等),并且可以用来创建不需要先验通道特定车辆的特定地图。 但是,利用现有的资源制作高质量的地图并不容易。
A few approaches are starting to use Google Street View images (see Figure 5) or equivalent. In [194], an aerial vehicle takes advantage of Street Views to localize itself. Artificial views are created to overcome the difference in viewpoints and are then compared using the Approximate Nearest Neighbors on extracted features. The objective is to resolve the place recognition problem in urban environment with an aerial vehicle. In [195], SIFT descriptors are extracted from Street View images and indexed in tree structures which are then browsed with a nearest-neighbor algorithm from the descriptors of the current image. A vote across the candidates is then accomplished to choose the closest image. This work has been extended in [196] with a Bayesian tracking filter in order to ensure the continuity of the localization. Even though the filter allows for a smoother trajectory, sudden jumps around the position still occur. This city-scale localization has an accuracy that ranges from more than 1 meter to 12 meters. In [197], visual bag-of-words methods are employed to build two dictionaries from Street View images using SIFT and MSER (Maximally Stable Extremal Regions) detectors so as to have both local and regional feature descriptors. Based on these, the closest Street View from a real image can be recovered. The relationships between physically close panoramas serve to speed up the matching process. For all these approaches, the difficult task is to find sufficiently discriminating features to increase the matching ratio as explained in [198]. As such, these methods share strong relationships with the new descriptors that have emerged for the urban context [199][200][201]. Another way to improve the results is to process sequences that can then be matched against the topology of the environment as proposed in [202]. A better performance is obtained by considering visual words from a query image across multiple images than just one. The authors formulate the problem as a regression on an image graph. The approach is only evaluated by matching Street Views together and according to the capacity of the algorithm to retrieve the right image.
有几种方法开始使用Google街景视图图像(请参见图5)或同等功能。在[194]中,一架飞行器利用街道视图进行定位。创建人工视图来克服视点间的差异,然后使用近似最近邻方法对提取的特征进行比较。目标是用飞行器解决城市环境中的地点识别问题。在[195]中,从街景视图图像中提取SIFT描述符并在树结构中索引,然后用当前图像描述符中的最近邻算法浏览。然后对候选进行投票,以选择最接近的图像。为了确保定位的连续性,这项工作已经在[196]中用贝叶斯跟踪滤波器进行了扩展。即使滤波器允许更平滑的轨迹,仍然会发生位置周围的突然跳跃。这种城市规模定位的精度范围从1米到12米不等。在[197]中,使用视觉BOW法来从街景视图图像中使用SIFT和MSER(最大稳定极值区域)检测器构建两个字典,以具有局部和区域特征描述符。基于这些,可以恢复最接近真实图像的街景。物理上紧密的全景图之间的关系有助于加速匹配过程。对于所有这些方法,困难的任务是找到充分区分的特征来增加匹配率,如[198]中所述。因此,这些方法与城市背景下出现的新描述符有着密切的关系[199] [200] [201]。改进结果的另一种方法是处理顺序,然后可以按照[202]中提出的与环境拓扑相匹配。通过考虑跨多个图像的查询图像中的视觉单词而不仅仅是一个而获得更好的单词。作者将该问题表述为图像图的回归。该方法仅通过将街景视图匹配在一起并根据算法检索正确图像的能力进行评估。
Fig. 5: Available information from Google: panorama, depth map and road topology (each circle indicates a anorama/depth map). 图6 来自Google的可用信息:全景、深度图、道路拓扑(每个圆圈表示全景/深度图)
The mentioned approaches solve the place recognition problem inside Street Views. However, they do not manage to achieve a sufficient accuracy for autonomous driving. This issue is more difficult due to the sparse nature of the Street View panoramas. In [203], the authors propose to create synthetic views to improve the continuity of the matching and thus of the localization. The depth associated to Street Views is used in a 3D-to-2D matching that is injected inside an optimization framework to minimize the reprojection error. They are able to obtain an accuracy of approximately 60 cm at best. In [204], the pose estimation is performed by comparing features from the current image with the best Street Views (highest number of matching with the query image). Proper correspondences are identified by evaluating the residual of the corresponding transformation over all the matches. The pose is then estimated between the best two reference views. The accuracy of the system oscillates between less than 2 meters to 16 meters. A method with two separated modules is described in [205]. In the first one, a visual odometry algorithm is applied from feature points tracked for short periods of time. In the second one, the transformation linking these points to Street View panoramas is computed so as to obtain a global localization with an accuracy below 1.5 m. In an original approach, the authors of [206] propose a text-based geo-localization method that extracts textual contents with a camera from available shop signs. It is then compared with an annotated map built from Street View panoramas in order to estimate the pose of the camera for an average error of around 10 meters.
上述方法解决了街景视图中的地点识别问题。但是,他们无法为自动驾驶实现足够的准确性。由于街景全景图的稀疏性,此问题更加困难。在[203]中,作者提出创建合成视图以改善匹配的连续性,从而提高定位的连续性。与街景相关的深度用于3D-2D匹配,该匹配被注入到优化框架中以最小化重投影误差。它们最多可以获得约60厘米的精度。在[204]中,通过将当前图像的特征与最佳街景(与查询图像匹配的最高数量)进行比较来执行位姿估计。通过评估所有匹配的相应变换的残差来确定适当的对应关系。然后在最佳的两个参考视图之间估计该位姿。系统的精度在2米到16米之间摆动。 [205]中描述了一个有两个分离模块的方法。在第一个中,对短时间跟踪的特征点应用视觉里程计算法。在第二个中,计算将这些点链接到街景全景图的转换算,以便获得精度低于1.5米的全局定位。在原创的方法中,[206]的作者提出了一种基于文本的地理定位方法,该方法使用可用商店标志中的相机提取文本内容。然后将它与从街景全景图构建的带注释的地图进行比较,以便估计相机的位姿,平均误差约为10米。
Other than Street View imagery, in [207], the authors show that it is possible to consider traffic signs as geo-referenced landmarks coming from existing maps. 3D models of these traffic signs are matched in images and the position of the vehicle is optimized inside a Bundle Adjustment approach. Results are below 0.5 m most of the time but depend on the number of traffic signs. However, the database that is used need to be quite precise and is built by a mobile mapping system. With a more traditional traffic sign database, the authors of [208] are able to reach a 5-meter accuracy by combining vision, GPS and an INS inside a Bayes filter. The structure of the road graph can also serve to roughly localize a vehicle as proposed in [209]. In this paper, a particle filter, only fed by odometric information, progressively converges toward the true location of the vehicle by matching the trajectory followed against the map. The result is an average error of 5.8 m in a large network. The authors of [210] combine a visual odometry system along with features (lane markings) extracted from satellite imagery. The aerial images are pre-processed offline using a SVM and a clustering algorithm to identify road markings. The latter are then used online to constrain the localization. The accuracy of the system is not measured but the authors show that they are able to align satellite image features with the current image.
除[207]中的街景视图图像外,作者表明可以将交通标志视为来自现有地图的地理参考地标。这些交通标志的三维模型在图像中相匹配,并且车辆的位置在BA方法内进行了优化。大部分时间结果低于0.5米,但取决于交通标志的数量。但是,使用的数据库需要非常精确,并且构建移动地图系统。有了更传统的交通标志数据库,[208]的作者能够通过在贝叶斯滤波器内结合视觉,GPS和INS来达到5米的精度。如[209]中所提出的,路线图的结构也可以用于粗略地定位车辆。在本文中,只有通过里程信息馈送的粒子滤波器才能通过匹配地图上的轨迹而逐渐收敛到车辆的真实位置。结果是大型网络的平均误差为5.8米。 [210]的作者将视觉里程计系统与从卫星图像中提取的特征(车道标记)结合起来。航拍图像使用SVM和聚类算法进行离线预处理,以识别道路标记。后者则用于在线约束定位。系统的准确性没有被测量,但是作者表明他们能够将卫星图像特征与当前图像对齐。
Discussion: We provide a synthesis of the main approaches covered here in Table IV with relation to the established criteria.
The current state of localization methods based on existing maps show that they are not yet viable for autonomous driving. The accuracy is not sufficient or requires more precise maps that the ones currently available. An interesting observation is that even the algorithms that utilize more precise lane maps (built manually) do not reach the critical accuracy (around 20 cm [109]) for automated driving. Of course, things could evolve in the near future depending on the information available in the upcoming high-precision maps. Nevertheless, leveraging current databases is an evolving trend that can be an interesting prospect. However, even if scalability can theoretically be achieved with these methods, it will require dedicated pre-processing algorithms to transform data in the right format (for instance, creating bag of words according to zones). All of the methods depicted in Table IV are able to recover their pose inside the map, most the time using a standard GNSS which might prove to be more difficult in dense urban environments.
讨论:我们提供表IV中涉及已确定标准的主要方法的综合。
基于现有地图的当前定位方法状态表明,它们还不适合自动驾驶,精确度不够,或者需要目前可用的更精确的地图。一个有趣的观察是,即使是利用更精确的车道地图(手动构建)的算法也没有达到用于自动驾驶的临界精度(大约20cm [109])。当然,事情可能会在不久的将来发展,这取决于即将推出的高精度地图中可用的信息。尽管如此,利用当前的数据库是一个不断发展的趋势,可能是一个有趣的前景。但是,即使这些方法在理论上可以实现可伸缩性,也需要专门的预处理算法来以正确的格式转换数据(例如,根据区域创建一大堆单词)。表IV中描述的所有方法都能够在地图内恢复它们的位姿,大部分时间使用标准GNSS,在密集的城市环境中可能证明这种方法更加困难。
TABLE IV: Localization in existing maps (or potential future maps) methods regarding autonomous driving criteria. Tick: criterion satisfied. Cross: criterion not satisfied.
表IV:关于自动驾驶标准的现有地图(或潜在未来地图)方法的本地化。对勾:标准满意。 ×:标准不满意。
The main challenge for the localization methods presented in this part is how to improve the accuracy by building more relevant, accurate maps. One solution might come naturally with new world-wide sources (new maps, etc.). A key aspect remains the information density of the sources. Street Views, even though interesting, are, for now, too physically spaced to be viable alone. This is why generating synthetic data might be a possible answer. Authors tend to focus on a specific kind of data and finding a framework and an estimation method that take advantage of all the available resources at once might be an interesting prospect. Knowing in advance what will be the map elements that must be sought should also be considered in order for a vehicle to evaluate beforehand, without having directly experienced it, if, with regard to its capacity, a sufficient accuracy can be reached for autonomous driving. The approaches mentioned in this Section do not consider the update of these shared resources. While it is not a trivial task to update inaccurate maps with an approximate vehicle position, it would clearly contribute to progressively improve these maps until they are viable for autonomous driving.
本部分介绍的定位方法面临的主要挑战是如何通过构建更相关、准确的地图来提高准确性。一个解决方案可能会自然而然地带来新的全球资源(新地图等)。一个关键方面仍然是信息来源的信息密度。即使有意思,街景也是如此,现在,它们在物理空间上太过独立。这就是为什么生成综合数据可能是一个可能的答案。作者倾向于专注于特定类型的数据,并找到一个立即利用所有可用资源的框架和估算方法,这可能是一个有趣的前景。事先知道必须寻找的地图要素也应该考虑在内,以便车辆事先进行评估,而不必直接体验到它,如果就其容量而言,可以达到足够的自动驾驶准确性。本节中提到的方法不考虑更新这些共享资源。虽然用一个近似的车辆位置更新不准确的地图并不是一件容易的事,但它显然有助于逐步改进这些地图,直到它们可以用于自动驾驶。
multi-vehicle SLAM
The complexity of creating or exploiting maps for localization purposes at a world scale indicates that cooperation between vehicles might be needed in order to improve existing maps or ease the large-scale collection of data for appropriate map construction. However, doing so is not trivial as the collaboration of several vehicles has an impact on how to design an effective approach. One important initial distinction concerns how the cooperation is handled. Two systems exist: the centralized approaches and the decentralized ones. In the first case, communications are all directed towards one entity that agglomerates and fuses all data before sending the result back to the vehicles. Decentralized systems assume that each vehicle is capable of building its own decentralized map while communicating with the other vehicles of the fleet. It means that information flows must be controlled to avoid bandwidth explosions and estimation problems which typically happen in such cases (double counting a measurement, for instance). It is worth noting that this field is still fairly recent and concerns mostly mobile robotics for now. We will discuss both systems (centralized and decentralized) and how they can be applied to autonomous driving.
在世界范围内创建或利用地图用于定位目的的复杂性表明,可能需要车辆之间的合作,以便改善现有地图或者为了适当的地图构建而简化大规模数据收集。然而,这样做并不是微不足道的,因为几种车辆的合作对如何设计有效的方法会产生影响。一个重要的初始区别涉及如何处理合作。存在两种系统:集中式方法和分散式系统。在第一种情况下,通信都是针对一个实体,在将结果发送回车辆之前将所有数据聚合并融合。分散式系统假定每辆车都能够在与车队其他车辆进行通信的同时构建自己的分散式地图。这意味着必须控制信息流以避免在这种情况下通常发生的带宽爆炸和估计问题(例如重复测量)。值得注意的是,这个领域还是相当新的,目前主要涉及移动机器人技术。我们将讨论两种系统(集中式和分散式)以及它们如何应用于自动驾驶。
Centralized SLAM
Centralized SLAM may seem like a natural extension of single-vehicle SLAM algorithms where parts of the computation are offloaded to a distant server. However, there are many ways to share the SLAM task in a centralized approach depending on the objective. The main distinction that can be made is whether or not the centralized part should be running in real-time or in an offline manner.
1) Online centralized SLAM: This organization makes the extension of SLAM algorithms quite immediate. One of the first works on the subject was presented by Fenwick et al. in [211]. An Extended Kalman filter is utilized to integrate the state vectors (pose and landmarks) of all the vehicles. The paper covers more the theoretical aspects and only shows simulation results. In [212], submaps are built individually by robots using cylindrical object features detected by a laser scanner. All these submaps are fused in a centralized fashion from a nearby server. The relative locations of the robots must be known. The approach is demonstrated with a very simple experiment with two robots. The work presented in [213] was the first, to our knowledge, to propose a multi-robot visual SLAM. Observations and visual descriptors are sent to a central agent which builds a map shared among all the robots. The initial positions of the vehicles must be approximately known so as to localize everyone in this common map. The estimation of the different trajectories within this unique map is performed by a Rao-Blackwellized particle filter. The approach has only been tested in simulation and the results show that the processing power needed to compute a proper map and the localization of the vehicles is not enough to meet real-time constraints.
集中式SLAM看起来像是单车SLAM算法的自然延伸,其中部分计算被分流到远程服务器。但是根据目标有许多方法可以集中方式共享SLAM任务,可以做出的主要区别是集中部分是否应该实时运行或以离线方式运行。
1)在线集中式SLAM:该组织使得SLAM算法的扩展非常直接。关于这个问题的第一批作品之一是Fenwick等人提出的。在[211]中。利用扩展卡尔曼滤波器来整合所有车辆的状态向量(位姿和地标)。本文涵盖了更多的理论方面,只显示了仿真结果。在[212]中,子图由机器人单独构建,使用激光扫描仪检测到的圆柱形物体特征。所有这些子图都以附近服务器的集中方式进行融合。必须知道机器人的相对位置。这个方法用两个机器人进行了非常简单的实验。就我们所知,[213]中提出的工作是第一个提出多机器人视觉SLAM的工作。观察和视觉描述符被发送给中央代理,该中央代理建立所有机器人之间共享的地图。车辆的初始位置必须是近似已知的,以便将所有人都放置在该通用地图中。在这个独特的地图内对不同轨迹的估计是由Rao-Blackwellized粒子滤波器执行的。该方法仅在仿真中进行了测试,结果表明,计算适当的地图和车辆定位所需的处理能力不足以满足实时约束。
A recent trend has moved this processing from one vehicle or close entity to the cloud in order to take advantage of the available processing power. One well-known example can be found in [214]. In this article, the authors expose a cloud framework, C 2 TAM, which shares the workload between a cloud server and the robot. All the demanding tasks are moved to the cloud and only the part where a high frequency is required is executed on the robot. The proposed application performs all the mapping aspects of a RGB-D SLAM on a distant computer and only the tracking of the pose is done locally. Even if the approach is proposed as a cloud framework, the experiments use a desktop computer with a wireless con- nection. They demonstrate the ability of the system to perform cooperative update on the map as well as online map merging. In [215], the DAvinCi architecture is presented. Its objective is to offload all the computations on a cloud system. The software architecture enables heterogeneous agents to share and upload common data on the cloud. A grid-based FastSLAM has been adapted to fit the needs of this cloud approach. Each particle responsible for the pose estimation can be run on separated nodes. The experiments demonstrate a faster execution time for a single-vehicle FastSLAM but the authors do not consider the delays or latencies induced by these outsourced compu- tations on a real-time approach. Rapyuta [216] is another framework designed for cloud robotics. Its use is demonstrated with a RGB-D odometry. Different settings are evaluated: a complete offloading of the computation to the cloud, a combination where only the mapping process is offloaded and a collaborative mapping by two robots. Keyframes, sent to the cloud, are compressed in order to reduce the needed bandwidth. In their experiments, the Amazon cloud service was used. The complete offloading of information requires too much bandwidth despite the compression. However, hybrid approaches, were only a part of the computations is done on the cloud, are viable. The authors do not, however, discuss the impact of the delay on the map building in these experiments. Cloud-based robotics applications are fairly new and have a great potential to, at least, reduce the computational require- ments inside autonomous vehicles in the future. Interested readers can refer to [217] for a short survey of this recent practice and to [218] for a more general review of cloud approaches in robotics.
最近的一种趋势是将这种处理从一个车辆或紧密的实体转移到云端,以利用可用的处理能力。一个众所周知的例子可以在[214]中找到。在这篇文章中,作者公开了一个云框架C
2
2
TAM,它共享云服务器和机器人之间的工作负载。所有要求苛刻的任务都移到云端,只有需要高频率的部分才会在机器人上执行。所提出的应用程序在远程的计算机上执行RGB-D SLAM的所有建图方面,并且仅在本地完成位姿的跟踪。即使将这种方法作为云框架提出,实验也会使用带有无线连接的台式计算机。他们展示了系统在地图上执行协作更新以及在线地图合并的能力。在[215]中介绍了DAvinCi架构,其目标是分流云系统上的所有计算。该软件体系结构支持异构代理在云上共享和上传常用数据。基于网格的FastSLAM已经适应了这种云方法的需求。负责位姿估计的每个粒子可以在分离的节点上运行。实验表明单车FastSLAM的执行时间更快,但作者并没有考虑这些外包计算在实时方法中产生的延迟或等待时间。 Rapyuta [216]是另一个为云机器人设计的框架。其使用以RGB-D里程计进行演示。对不同的设置进行评估:将计算完全分流到云中,只有建图过程分流的组合和两个机器人的协作建图。发送到云的关键帧被压缩以减少所需的带宽。在他们的实验中,使用了亚马逊云服务。完全分流信息需要尽可能多的带宽压缩。然而,混合方法只是云计算中的一部分,是可行的。然而,作者并没有在这些实验中讨论延迟对地图构建的影响。基于云的机器人应用是相当新的,并且很有可能至少减少未来自动驾驶车辆的计算要求。有兴趣的读者可以参考[217]对这种最近实践进行简短的调查,并参考[218]对机器人技术中的云方法进行更全面的回顾。
2) Offline centralized SLAM: The delays and latencies involved by cloud computations can prevent the use of the previously cited methods in autonomous cars. However, with the aim of easing the creation of world-wide maps, offline computations of data gathered during the day by fleet of vehicles can be a way to build consistent large-scale maps. Offline computation of SLAM data can be seen as a natural extension of multisession SLAM algorithms. Multisession SLAM is the possibility for a SLAM algorithm to take into account several passages in overlapping areas and so to extend the map initially built. The identification of common grounds between two maps usually involves place recognition algorithms that loop closing constraints to compute more coherent maps. Approaches proposing multisession mapping are based on a graph representation which has the advantage to be flexible when it comes to adding new constraints and nodes to optimize. We refer the readers to the description of the following works in Section IV which could be utilized by several vehicles in a cloud computation: [148] [150] [154] [161] [163] [166] [167] [168]. While these approaches are only applied to a single-vehicle, the extension to a fleet is straightforward. In [166], the authors mention that the objective in the long-term is the deployment of such a system in probe cars that would gather data and fuse them in the cloud to then provide maps as a service. The main difficulties are the necessity to divide maps by sectors in order to be still optimizable in a reasonable time and to limit how maps can grow to avoid having intractable environment representation. The first problem has already been addressed in single-vehicle SLAM with submapping techniques and the second is still an open challenge. Finally, a practical problem is to have common representations within a set of probe cars to be able to build and exploit the same map.
The resources involved to roll out such an application make it more an industrial challenge than a research problem. This is the objective of the map creator HERE which intends to build HAD (Highly Automated Driving) maps by collecting data from probe cars. The data serve to feed different maps, which are computed offline in the cloud, for specific services. A noticeable research effort can be found in [219] where a cloud service is proposed in which data can be collected, stored and shared. Most of the previously cited cloud-based approaches are connected to this service and it could be a way to experiment large-scale multi-vehicle offline map building in the future.
2)离线集中式SLAM:云计算涉及的延迟和延迟可以阻止在自动汽车中使用上文引用的方法。然而,为了便于创建世界范围的地图,车队在当天收集的数据的离线计算可以用来构建一致的大规模地图。离线计算SLAM数据可以看作是多区段SLAM算法的自然延伸。多区段SLAM可以使SLAM算法考虑重叠区域中的若干段落,从而扩展最初构建的地图。两地图之间共同基础的确定通常涉及地点识别算法,这些算法可以获得闭环约束条件来计算更多的相干地图。该文章中提出基于图表示多区段建图方法,当添加新的约束和节点进行优化时,该表示法具有灵活性的优点。我们建议读者阅读对第四部分中的以下工作的描述,可以在云计算中使用若干车辆:[148] [150] [154] [161] [168] 。虽然这些方法仅适用于单一车辆,但对车队的延伸很简单。在[166]中,作者提到,长期目标是在探测汽车中部署这种系统,以收集数据并将其融合在云中,然后提供地图作为服务。主要困难在于必须按照部门划分地图,以便在合理的时间内仍然可以优化,并限制地图的增长方式以避免形成难以处理的环境。第一个问题已经在使用子建图技术的单车SLAM中得到解决,第二个问题仍然是一个公开挑战。最后,一个实际的问题是在一组探测车内有共同的表示,以便能够构建和利用相同的地图。
所涉及的资源推出这样的应用程序使其成为一个工业挑战而不是研究问题。这是地图创建者HERE的目标,它打算通过从探测车收集数据来构建HAD(高度自动驾驶)地图。这些数据用于为特定服务提供不同的地图,这些地图是在云中离线计算的。在[219]中可以找到一个值得注意的研究工作,其中提出了一个可以收集、存储和共享数据的云服务。以前引用的大多数基于云的方法都与该服务相关,并且可能是未来实现大规模多车辆离线地图构建的一种方式。
Decentralized SLAM
The increasing connectivity capacities of the autonomous vehicle with the infrastructure (Vehicle To Infrastructure, V2I) and other cars (Vehicle To Vehicle, V2V) bring the question of how localization methods can take advantage of them. One major aspect, as with other multi-vehicle approaches,
is the possibility to map new areas in a quicker way. The main difference with centralized methods is that data can directly be exchanged between vehicles without the need of a dedicated infrastructure. Vehicles could benefit from a direct communication to quickly update maps in case of sudden changes or to anticipate dynamic conditions (pre-load a map adapted to rain conditions, for instance).
However, fully decentralized SLAM is difficult to achieve as there are many new constraints that must be taken into account. As such, the closest current applications concern mobile robotics and not directly autonomous vehicles. We will thus focus on them in this section before discussing how they could be integrated inside autonomous driving applications. Interest readers can refer to [94] for a complete overview of this field.
The first works on that matter come from the data fusion community where a task was split between several CPUs to speed up the computations. For instance, the multi-vehicle localization algorithm proposed by Nettleton et al. in [220] is derived from the advances in decentralized data fusion presented in [221] and [222]. We will focus here on the main design difficulties of such methods: data incest, data association and communication issues before discussing the main experimental results.
随着基础设施(车辆到基础设施,V2I)和其他车辆(车辆到车辆,V2V)和自动车辆的连接能力的增加带来了如何利用它们进行定位的问题。与其他多车辆方法一样,一个主要方面是,
是以更快的方式在新区域建图的可能性。与集中式方法的主要区别在于数据可以直接在车辆之间进行交换,而无需专用的基础设施。车辆可以从直接通信中受益,以便在突然变化的情况下快速更新地图或预测动态条件(例如预加载适用于下雨条件的地图)。
然而,完全分散的SLAM很难实现,因为必须考虑许多新的约束条件。因此,目前最接近的应用涉及移动机器人而不是直接自动驾驶的车辆。在讨论如何将它们集成到自动驾驶应用程序中之前,我们将在本节中重点介绍它们。感兴趣的读者可以参考[94]了解该领域的完整概述。关于这个问题的第一个作品来自数据融合社区,在这个社区中,几个CPU之间的任务被分开以加速计算。例如,Nettleton等人提出的多车辆定位算法在[220]中是从[221]和[222]中提出的分散式数据融合的进展中推导出来的。在讨论主要实验结果之前,我们将重点讨论这种方法的主要设计难点:数据乱伦,数据关联和沟通问题。
1) Data incest: One common difficulty between the applications sharing information is data incest which is responsible for inconsistency by double-counting data in the estimation process. Decentralized SLAM (or decentralized localization as a whole) is particularly prone to this phenomenon as observations and maps are shared, relayed and fused by vehicles [223][224]. The data incest phenomenon in a SLAM context is illustrated in Figure 6. In this example, the red vehicle receives from the blue one a fused state already integrating its own map. As it does not know it, it will fuse this landmark within its map and thus double-count its own landmark and become overconfident (inconsistent). Keeping a track of what is exchanged is a possibility with a few vehicles but does not scale well with a large fleet.
To solve this problem in a SLAM context, Nettleton et al. in [220] proposed a dedicated network architecture to avoid double-counting. Only point-to-point communications are allowed thus forcing information to go through several nodes before reaching its target. In [225], an approach where submaps are only exchanged when closed is exposed. A topological global map puts all the submaps in a common frame. The main constraint comes from the fact that submaps cannot be updated and are only sent once on the network. The work of Vidal-Calleja et al. in [226] also proposes an extension of submap SLAM, and more specifically of the hierarchical approach introduced in [31]. Only the topological map is exchanged inside the fleet and a global metric map cannot be recovered on the fly. In [227], a multi-robot graph-based SLAM is introduced. The map of each vehicle is compressed and sent to the neighbors. A data cache filters out already received information. Similarly, in [228] and [229], a consensus is sought between neighbors so as to find the best map possible that avoids double-counting information. In [113], each vehicle state (pose and map) is stored separately from the others. Each vehicle can obtain a global map by fusing all the received states with its own. In [230], the Split Covariance Intersection Filter (SCIF), initiated by Julier et al. in [231], is applied to a multi-vehicle context. This filter is based on Covariance Intersection so as to produce consistent estimates even with unknown correlations between data.
1)数据乱伦:共享信息的应用程序之间的一个共同困难是数据乱伦,它通过对估计过程中的数据进行重复计算而导致不一致。分散式SLAM(或整体分散式定位)特别容易出现这种现象,因为观测和地图由车辆共享、转播和融合[223] [224]。 SLAM环境中的数据乱伦现象如图6所示。在这个例子中,红色车辆从蓝色车辆接收到一个已经融合了自己的地图的融合状态。因为它不知道它,它会将地标融入其地图中,从而重复计算自己的地标并变得过于自信(不一致)。保持跟踪什么进行了交换是对于少数车辆是可能的,但与大型车队不能很好地协调。
图6a The red landmark is mapped by the red vehicle and is then sent to the blue one.
红色路标通过红色的车辆进行建图,然后发送至蓝色车辆。
(b) The blue landmark is mapped by the blue vehicle. The red landmark is the one previously received.
蓝色的地标同蓝色车辆建图,红色地标被接收。
(c)The green landmark is the fusion between the blue and red ones by the blue vehicle and sent to the red one.
在蓝色车辆中蓝色和红色地标融合得到绿色地标,然后发送至红色车辆。
(d) The grey triangle is the true pose of the red vehicle (and similarly for the landmark). Dotted ellipses are the true uncertainties. Plain ones are those estimated after the fusion of the green landmark with the red one (data incest).
灰色三角为红色车辆的真实位姿(对于路标表示类似), 点椭圆为真实的误差。 扁平的椭圆通过融合绿色和红色地标后的估计(数据乱伦)。
为了在SLAM环境中解决这个问题,Nettleton等在[220]中提出了专门的网络架构以避免重复计算。只允许点对点通信,从而迫使信息在到达目标之前通过多个节点。在[225]中,给出了一种只在关闭时交换子图的方法。拓扑全局地图将所有子地图放在一个通用框架中。主要约束来自于子图不能更新并且只在网络上发送一次的事实。 Vidal-Calleja等人的工作。在[226]中也提出了子图SLAM的扩展,更具体的是[31]中引入的分层方法。只有拓扑地图在车队内部进行交换,并且全局度量地图无法联机恢复。在[227]中,介绍了一个基于多机器人图模型SLAM。每辆车的地图都被压缩并发送给邻居。数据缓存过滤掉已收到的信息。同样,在[228]和[229]中,邻居之间寻求共识以找到可能避免重复计算信息的最佳地图。在[113]中,每个车辆状态(位姿和地图)与其他车辆分开存储。每个车辆可以通过融合所有接收到的状态来获得全局地图。在[230]中,由Julier等人发起的分裂协方差相交滤波器(SCIF)在[231]中,应用于多车辆情境。该滤波器基于协方差相交,即使数据之间存在未知的相关性,也可以产生一致的估计值。
2) Data association: Data association in a multi-vehicle context can be extremely complicated as the relative positions between the vehicles are not necessarily known and the information that can be communicated is limited by the bandwidth. Some approaches have chosen to consider that the initial distance between the vehicles is known beforehand [232][220]. Another possibility, introduced in [233], is to define a rendezvous towards which vehicles will converge to observe themselves directly.
Among the data association algorithms specifically designed for a multi-vehicle application, we can cite [234] in which each landmark is characterized by measures (distance and angle) to other close landmarks. Similar configurations of landmarks are then sought in maps from different vehi cles. In [235], landmarks are grouped by 3 and a Delaunay triangulation is applied to obtain a unique map. The perime ters and areas of these triangles are given to a RANSAC algorithm in order to find the best correspondence between two map pieces coming from two different vehicles. In [236], the authors analyze the distribution of potential multi-vehicle correspondences. Inliers are assumed to produce the same transformation between two vehicles conversely to outliers. A clustering algorithm is used to find initial candidates that can then be further analyzed with more classical methods. The works of Li et al. [144][145] solve the initial alignment of the maps from different vehicles using a low-cost GPS and a genetic algorithm in the restrained search space.
2)数据关联:多车辆情况下的数据关联可能非常复杂,因为车辆之间的相对位置不一定是已知的,并且可以传送的信息受到带宽的限制。有些方法选择考虑车辆之间的初始距离是事先已知的[232] [220]。在[233]中介绍的另一种可能性是定义一个会聚车辆直接观察自己的会合点。
在专门为多车应用设计的数据关联算法中,我们可以引用[234],其中每个地标都通过度量(距离和角度)来表征其他近地标。然后在来自不同车辆的地图中寻找类似的地标配置。在[235]中,地标按3分组,并应用Delaunay三角剖分以获得独特的地图。这些三角形的周边和区域被赋予RANSAC算法,以便找到来自两个不同车辆的两个地图块之间的最佳对应关系。在[236]中,作者分析了潜在的多车辆对应的分布。假设内点在两个车辆之间产生相同的变换,反过来是异常值。聚类算法被用来找到初始候选,然后可以用更经典的方法进一步分析。李等人的作品[144] [145]在约束搜索空间中使用低成本GPS和遗传算法解决来自不同车辆的地图的初始对准。
3) Communication issues: The bandwidth needs are often neglected in decentralized SLAM. For instance, in [225], submaps are directly sent. In [237], graphs (nodes comprise raw data) are exchanged, which is only suitable for small vehicle fleets. In [238], the authors propose to exchange maps only when vehicles can detect each other, which means that big amount of data will be sent punctually. Methods communicating only topological maps, like in [226] and [227], avoid this problem by providing very light maps. The exchange strategy should also be able to cope with potential losses and delays. When not considered, a temporary saturation or failure can lead to information definitively lost [237][225]. The algorithms built around interactions with neighbors are usually capable of identifying and asking for missing data [227]. Another possibility, described in [239], is the Lazy Belief Propagation which integrates observations in a Particle Filter that does not depend on the temporal order in which data are added to the filter. Thus, missing or late information can still be integrated later without a problem. However, it is necessary to be able to identify missing values. More generally, all the approaches based on the inverse form of the Kalman Filter, namely the Information Filter, handle delay natively. Indeed, the update becomes additive and does not depend on any temporal order. It has become a common choice in the multi-vehicle community [234][240][241].
4) Experimental results: The experiments carried out in the previously cited articles give a clear visibility on the potential deployment of such methods for autonomous driving. Indeed, most methods are only demonstrated in simulated ex periments [220][225][227][235][239][240]. The main reason is that addressing all the issues mentioned in this section in order to conduct real data experiments can be difficult as they are not all directly connected to the estimation problem. Some approaches have been demonstrated indoor in real time [242][233]. In [226], the authors demonstrate an air/ground cooperation in outdoor environment. The map is built in a cooperative fashion with vision landmarks. The accuracy has not been measured. In [113], outdoor experiments with two or three vehicles are performed. The method relies on the exchange of visual landmarks. The accuracy oscillates between 50 cm and a few meters depending on the capacity of the vehicle to identify common landmarks. Finally, in [145], 2 vehicles equipped with laser merge their maps together in an outdoor experiment. The method is able to reach a 3-meter error on average.
This short survey of recent results shows that we are still far from generalizing the use of decentralized SLAM for autonomous driving. However, the communication means themselves continue to improve and bandwidth limitations might be less of a problem in the coming years. Still, large-scale demonstrations of the capabilities of decentralized SLAM are yet to be proven and applying the previously described approaches to autonomous driving is one of the upcoming challenges of SLAM.
3)通信问题:在分散式SLAM中,带宽需求往往被忽略。例如,在[225]中,直接发送子地图。在[237]中,图(节点包含原始数据)被交换,这只适用于小型车队。在[238]中,作者建议仅在车辆可以互相检测时交换地图,这意味着大量的数据将被准时发送。仅通信拓扑图的方法(如[226]和[227])通过提供非常轻的地图来避免此问题。交换策略也应该能够应对潜在的损失和延误。如果不考虑,暂时的饱和或失败会导致信息明确丢失[237] [225]。围绕与邻居相互作用构建的算法通常能够识别和询问丢失的数据[227]。 [239]中描述的另一种可能性是Lazy Belief Propagation,它集成了粒子滤波器中的观测值,并不依赖于数据添加到滤波器的时间顺序。因此,缺失或迟到的信息仍然可以在稍后被整合而没有问题。但是,有必要能够识别缺失值。更一般地说,基于卡尔曼滤波器的逆形式(即信息滤波器)的所有方法能够自然而然地处理延迟。事实上,更新变得可加,并且不依赖任何时间顺序。它已成为多车辆社区的常见选择[234] [240] [241]。
4)实验结果:在先前引用的文章中进行的实验清楚地看到这种自动驾驶方法的潜在部署。事实上,大多数方法仅在仿真实验中被证明[220,225,247,235,239,240]。主要原因在于,解决本节提到的所有问题以进行实际的数据实验可能很困难,因为它们并非都与估计问题直接相关。有些方法已经在室内实时演示[242] [233]。在[226]中,作者展示了室外环境中的空中/地面合作。该地图是与视觉地标合作建立的,精度尚未测量。在[113]中,进行了两辆或三辆车的室外实验。该方法依赖于视觉地标的交换。精确度在50厘米和几米之间摆动,具体取决于车辆识别常见地标的能力。最后,在[145]中,配备有激光的2辆车在户外实验中将他们的地图合并在一起。该方法平均可以达到3米的误差。
这个对最近结果的简短调查表明,我们还没有能力使用分散式SLAM于自动驾驶。然而,通信手段本身不断改进,带宽限制在未来几年可能不会成为问题。尽管如此,大规模分散式SLAM的能力的例子尚未得到证实,并将先前描述的方法应用于自动驾驶是SLAM即将面临的挑战之一。
C. Discussion
This overview of multi-robot SLAM approaches has showed that the maturity of such methods for autonomous driving differs vastly depending on the way data are handled (in a centralized or decentralized fashion). Online central ized SLAM systems have been demonstrated on small-scale experiments with either a classic computer that serves as a hub to collect observations or with cloud processing. The extension from mobile robots to driverless cars raises the question of the bandwidth capacity that will be available in tomorrow’s cars but most importantly of the availability of an Internet connection in the vehicle. Completely offloading critical processing onto the cloud will probably not be safe enough. However, delegating the task of updating maps gath ered from several vehicles in a reasonable amount of time is an interesting prospect. Real-time updates for map portions that might have changed could help driverless cars to adapt in a quicker way. Moreover, environmental conditions that have a low dynamicity could be alerted beforehand in order for the vehicle to anticipate by choosing more adapted features, etc. To attain this goal, large-scale experiments, involving a realistic number of vehicles, are necessary. It also means that a major prospect remains the design of software architectures able to cope with data flows and to properly segment updates. One key element to make online cloud-based methods viable is the choice of a map representation able to integrate all incoming information.
这种多机器人SLAM方法的概述表明,这种自主驾驶方法的成熟度取决于数据处理方式(以集中或分散的方式)。在线集中式SLAM系统已经在小规模实验中进行了演示,可以使用经典计算机作为收集观测数据的中心或云处理。从移动机器人到无人驾驶汽车的扩展引发了未来汽车可用的带宽容量问题,但最重要的是车载互联网连接的可用性。将关键处理完全分流到云上可能不够安全。然而,在合理的时间内委托更新几部车辆地图的任务是一个有趣的前景。实时更新可能已经改变的地图部分可以帮助无人驾驶汽车更快地适应。此外,可以预先警告具有低动态性的环境条件,以便通过选择更适应的特征等来预测车辆。为了实现该目标,需要涉及实际数量的车辆的大规模实验。这也意味着一个主要的前景依然是能够处理数据流和正确分段更新的软件体系结构的设计。使基于云的在线方法可行的一个关键要素是选择能够整合所有传入信息的地图表示。
In that sense, fully decentralized methods share similar goals. The main difference comes from the fact that, by receiving the information from nearby cars, the ego-vehicle can select which information to integrate in its map. An interesting challenge thus becomes to establish criteria in order to evaluate the information gain that can bring maps coming from different vehicles with relation to the goal of the ego- vehicle (destination, energetic constraints, etc.). However, all the practical implications of decentralized methods are the first challenge to tackle. Contrary to cloud approaches, where processing power is less of a limit, the tractability of the final solution will also be an important factor to consider. Large-scale experiments are also an important milestone to reach in order to clearly exhibit the benefits of decentralized SLAM for autonomous vehicles.
Offline centralized methods might be the most mature for autonomous driving mainly because this challenge is very close to multisession SLAM. While future HAD maps may not provide all the needed information to reliably localize a vehicle in various conditions, probe cars might be a way to quickly gather enough data to build such maps. It is also an interesting perspective regarding the capacity of vehicles to detect long-term changes. Indeed, the agglomeration of differ ent viewpoints on a situation should ease this process. Being able to limit the growth of these maps remains a blocking point. Finally, a considerable challenge is to build sufficiently robust software architectures to integrate and process all these data. The cost of such an infrastructure directs large-scale experiments with such methods towards industrial companies rather than small research teams.
从这个意义上讲,完全分散的方法有着类似的目标。主要区别在于,通过接收来自附近车辆的信息,自我车辆可以选择将哪些信息集成到其地图中。因此,一个有趣的挑战就是建立标准,以便评估信息增益,从而可以将来自不同车辆的地图与自我车辆的目标(目的地,能量限制等)相关联。然而,要解决的第一个挑战是分散方法的全部实际影响。与云计算方法相反,处理能力不受限制,最终解决方案的易处理性也是一个需要考虑的重要因素。大规模实验也是一个重要的里程碑,以便清晰地展示分散SLAM在自动驾驶汽车中的优势。
离线集中式方法可能是自动驾驶最成熟的方法,主要是因为这种挑战非常接近多区段SLAM。尽管未来的HAD地图可能不能提供所有必要的信息以在各种条件下可靠地定位车辆,但探测车可能是快速收集足够数据来构建此类地图的一种方式。关于车辆检测长期变化的能力也是一个有趣的观点。事实上,不同观点对某种情况的集聚应该可以缓解这一过程。能够限制这些地图的增长仍然是一个阻碍点。最后,一个相当大的挑战是构建足够强大的软件体系结构来整合和处理所有这些数据。这样的基础设施的成本指导着这种方法的大规模实验,而不是小型研究团队。
Large-scale Experiment For Autonomous Vehicles
During the recent years, the scope of the experiments used to validate localization algorithms has broadened with the availability of large-scale data sets. Moreover, many research teams now have a dedicated platform to test, during many kilometers, their algorithms. In this section, we give an overview of the large-scale experiments that have involved fully autonomous vehicles. We will, of course, focus on the localization algorithms and the context in which they have been used.
The first large-scale experiments took place at the beginning of the 90’s and were mostly on highways [243][244]. In both approaches, localization was performed using lane detection with vision sensors to laterally position the vehicle on the road. Lane detection algorithms have been the default localization system in autonomous driving demonstration for a long time as it is totally appropriate in a highway context. The DARPA Grand Challenge [245] with its 244-km race across the desert changed things with a new setting. In [246], a GPS is coupled with elevation maps built using laser scanners and a road-finding algorithm in order to avoid cliffs, rocks and the like. A similar approach is chosen by the winner of the challenge [247]. The GPS is coupled with an IMU and an algorithm to detect the road. In [248], a terrain reconstruction with lasers was also used. In mostly empty territories, the use of a GPS alone was sufficient for navigation. Obstacles and the difficult terrain added the necessity to build maps that can provide a safe corridor in which vehicles could evolve.
近年来,用于验证定位算法的实验范围随着大规模数据集的可用性而扩大。此外,许多研究团队现在都有专门的平台,可以在数公里内测试他们的算法。在本节中,我们概述了涉及完全自动驾驶汽车的大型实验。当然,我们将重点关注定位算法以及它们的使用环境。第一次大规模的实验发生在90年代初,主要集中在高速公路[243] [244]。在两种方法中,使用视觉传感器进行车道检测以将车辆侧向定位在道路上进行定位。车道检测算法长期以来一直是自动驾驶演示中的默认定位系统,因为它在高速公路环境中完全合适。DARPA大挑战[245]在沙漠中穿越244公里的比赛使其变成一个新的环境。在[246]中,GPS与使用激光扫描仪和寻路算法建立的海拔地图耦合,以避免悬崖、岩石等。挑战获胜者选择类似的方法[247]。 GPS与IMU和一种算法相结合来检测道路。在[248]中,还使用了激光地形重建。在大部分空地上,仅使用GPS就足以导航。障碍和困难的地形增加了建立地图的必要性,这些地图可以提供车辆可推进的安全走廊。
The DARPA Urban Challenge that followed focused on urban environments with a 97-km autonomy test in a city-like traffic [249]. It has to be noted that the environment was still sufficiently open for a Differential GPS to operate properly. The reliance on maps was more important this time. The Route Network Definition File (RNDF) format was specially designed for this challenge in order to furnish the topology and geometry of the road. Junior, the vehicle proposed by Stanford [250], used a GPS and the RNDF map to position the vehicle. The accuracy was improved with a laser system performing lane and curb detection. In [251], a visual lane detection algorithm combined with the RNDF was responsible for the vehicle localization. The authors pointed out the difficulty to have a reliable localization with a vision-only system because of environmental conditions like shadows, etc. The winner of this challenge [252] combined a GPS along with an IMU and the RNDF map. Localization was improved with a laser-based lane detection. The authors mention the necessity of prior road models for an efficient localization. Semantic information is also referred to as an important point in order to disambiguate situations.
随后的DARPA城市挑战集中在城市环境中,在城市交通中进行了97公里的自主测试[249]。必须指出的是,环境仍然充分开放,差分GPS能够正常工作。这次对地图的依赖更重要。路线网络定义文件(RNDF)格式是专门为此挑战而设计的,目的是提供道路的拓扑和几何形状。斯坦福[250]提出的小型车-使用GPS和RNDF地图来定位车辆。激光系统执行车道和路边检测可以提高精度。在[251]中,结合RNDF的视觉车道检测算法负责车辆定位。作者指出,由于环境条件如阴影等原因,仅使用只有视觉的系统进行可靠定位存在困难。这一挑战的胜利者[252]将GPS与IMU以及RNDF地图结合在一起。基于激光的车道检测改进了定位。作者提到先行道路模型对于高效定位的必要性。语义信息也被称为重要的一点,以消除歧义。
Another interesting challenge conducted in 2011 is the Grand Cooperative Driving Challenge (GCDC) [253]. The idea was to evaluate how collaboration in platooning can help to reduce congestion. The winner of the GCDC challenge in 2011 [254] used a previously built map in which received positions of the other fleet members were matched and coupled with on-board radar detection. This showed that sharing simple localization information can still benefit autonomous driving by agreeing on speed or anticipating maneuvers.
A very large-scale experiment has been conducted by the VisLab team [255] with a 13,000-km trip from Italy to China. Technical details about the embedded technologies can be found in [256]. The localization was performed using a lane keeping algorithm (stereo and monocular) and a leader following system. A laser terrain mapping algorithm was also implemented for off-road driving but not used in this experiment. The leader following is an interesting possibility for autonomous driving if vehicles agree on their destination (or at least, a part of the trip). It still implies that one vehicle should be able to have a full localization system. For instance, cars with less expensive sensors could use platooning-like formations.
Automatic guidance of a vehicle was demonstrated in [257] for the Stadpilot project. This work can be seen as an update of the results of the Urban Challenge [249]. During a 15- km experiment on open roads, a combination of DGPS, IMU and lane keeping algorithm was used. Prior digital maps were also employed. Based on their experience in DARPA, the authors clearly indicate that GPS and IMU are not sufficient for autonomous driving in urban environments. An evolution of [255] was presented in [258]. In a 13-km experiment in different environments and real traffic, VisLab demonstrated the possibility of using lane or leader following based on vision, IMU and DGPS. An interesting aspect is that a map of the environment was used to trigger the appropriate perception module.
在2011年进行的另一个有趣的挑战是大合作驾驶挑战赛(GCDC)[253]。这个想法是评估如何在排队合作可以帮助减少拥堵。 2011年GCDC挑战赛获胜者[254]使用先前建造的地图,其中接收到的其他车队成员的位置与车载雷达检测相匹配。这表明,分享简单的定位信息仍然有利于自动驾驶,方法是达成速度或预测机动性。
VisLab团队进行了一次非常大规模的实验[255],从意大利到中国旅行了13,000公里。关于嵌入式技术的技术细节可以在[256]中找到。定位使用车道保持算法(立体和单目)和领导者跟随系统进行。激光地形地图算法也用于越野驾驶,但在本实验中未使用。如果车辆同意他们的目的地(或至少是旅程的一部分),领导者追随是自动驾驶的一个有趣的可能性。它仍然意味着一辆车应该能够拥有完整的定位系统。例如,传感器价格较低的汽车可以使用类似排队的编队。
Stadpilot项目在[257]中演示了车辆的自动导航。这项工作可以看作是对城市挑战结果的更新[249]。在开放道路上进行15公里的实验期间,使用了DGPS、IMU和车道保持算法的组合。之前的数字地图也被采用。根据他们在DARPA的经验,作者明确指出GPS和IMU不足以在城市环境中进行自动驾驶。 [258]中提出了[255]的演变。在不同环境和实际交通情况下的13公里实验中,VisLab展示了基于视觉,IMU和DGPS使用车道或领航者的可能性。一个有趣的方面是使用环境地图来触发适当的感知模块。
The V-Charge project aims at providing automated valet parking with close-to-market sensors. In [259], the authors evaluate their localization approach in a real-world scenario. First, the map is built using SURF keypoints extraction from fisheye cameras. Loop closures are defined manually and the map is optimized using global Bundle Adjustment. Localization is then performed against the known map still using SURF features. The algorithm is evaluated 0, 1 and 2 months after the map creation with a sufficient accuracy. However, the authors state that map update is needed over long periods of time to reflect changes. Another open question raised here but not addressed is the map portability between different vehicles. Still in the V-Charge project, [260] completed this system with height maps built from stereovision. A semantic map and a road graph were also needed to identify parking places and how to navigate between them.
Large-scale outdoor experiments with mobile robots have also been carried out in real-world environments. In [261], a robot traveled 20 km in crowded streets. The map was built beforehand with a graph-based laser SLAM. The on-the-fly localization was based on a particle filter and a GPS for initialization. The system was able to perform reinitialization when lost. The authors indicate that mobile obstacles masking the map were a source of localization losses. In [262], a robot traveled several times at different periods of the year a 1-km road. The map was built beforehand as well and the localization was based on road boundaries detected by laser and integrated inside an EKF. One of the problems cited by the authors is the recovery when the robot is lost and the GPS is not available. Also appearance changes, like leaves on the ground, proved to be difficult to handle. In [263], a 6-km experiment in pedestrian walkways is discussed. A 3D laser map of edges was built beforehand with 2D lasers. The localization system compared the map with the current observation to weight the particles of a particle filter. An important difficulty faced by the authors is the presence of glass windows that disturbed the localization system based on laser. They conclude by saying that 2D lasers are not sufficient for this kind of experiments.
Shuttle demonstrations in cities are also becoming more frequent as the mapping is confined to a dedicated area. In [264], the authors discuss the lessons learned after 1,500 km of a vision-guided shuttle in a private site. The experiments took place over 3 months and the localization was performed using a hierarchical Bundle Adjustment method over a previously built map. This approach could be used with either a front or a back camera. The main difficulties were the lightning conditions despite the use of front and back cameras. The dynamic range of vision sensors was not sufficient to handle day-time changes that were qualified by the authors as having a greater impact on the localization than the 3-month gap. The CityMobil2 project has cumulated 26,000 km in various cities for several months [265]. The localization differed depending on the experiments but was based on pre-built maps with laser or vision and GPS. The weather and the general reliability of the software had a significant impact on the results.
Already discussed before, Levinson et al. showed an update of the Stanford’s vehicle from the Urban Challenge in [266]. Ground maps were generated using precise GPS, IMU and a 64-layer 3D laser from multiple passages in an offline manner. The localization inside this map was then performed in real-time with a 2D histogram filter for a 10-cm accuracy on many kilometers. The approach of Ulm University is exposed in [267]. A map, composed of MSER features coming from vision and of a laser grid, is built beforehand and geo-referenced using a RTK-GPS. This map representation is light and allowed for a 10-cm accuracy on average during the 5-km test. Using both sensors for localization improved the results but the need of a highly-precise built-in-advance map is seen as a problem by the authors.
V-Charge项目旨在通过贴近市场的传感器提供自动代客泊车服务。在[259]中,作者在现实世界中评估他们的定位方法。首先,使用从鱼眼相机提取的SURF关键点来构建地图。环路闭合是手动定义的,并且地图使用全局BA进行了优化。然后使用SURF功能对已知地图执行定位。该算法在地图创建后的0,1和2个月以足够的准确性进行评估。然而,作者指出,需要长时间更新地图以反映变化。这里提出的另一个未解决的问题是不同车辆之间的地图可移植性。仍然在V-Charge项目中,[260]用立体视觉建立了高度图,完成了该系统。还需要语义地图和路线图来识别停车位以及如何在它们之间导航。
移动机器人的大规模室外实验也在现实世界中进行。在[261]中,一个机器人在拥挤的街道上行驶20公里。该地图是事先用基于图模型的激光SLAM构建的。实时定位基于粒子滤波器和GPS进行初始化。系统在丢失时能够执行重新初始化。作者指出,掩盖地图的移动障碍是定位损失的来源。在[262]中,一名机器人在一年的不同时段行驶数次,在1公里的道路上行驶。该地图也是事先建好的,定位基于激光检测到的道路边界,并集成在EKF内部。作者引用的一个问题是当机器人丢失并且GPS不可用时的恢复。事实证明,外观变化,如地上的叶子,很难处理。在[263]中,我们讨论了一个6公里的人行道实验。2D激光器预先建立了边缘的3D激光图。定位系统将地图与当前的观察结果进行比较以对颗粒过滤器的颗粒进行加权。作者面临的一个重要困难是玻璃窗的存在干扰了基于激光的定位系统。他们总结说2D激光器不足以进行这种实验。
由于地图仅限于专用区域,因此在城市进行的穿梭示范也变得更加频繁。在[264]中,作者讨论了在私人场地1,500公里的视觉引导穿梭之后所学到的经验教训。实验进行了3个多月,定位是在先前建成的地图上使用分层Bundle Adjustment方法进行的。这种方法可以用于前置或后置摄像头。尽管使用了前置和后置摄像头,但主要困难还是光照状况。视觉传感器的动态范围不足以处理作者认为对定位影响比3个月差距更大的日间变化。 CityMobil2项目已经在几个月的时间里在各个城市累计26,000公里[265]。根据实验不同,定位有所不同,但是基于预先构建的激光或视觉和GPS地图。天气和软件的总体可靠性对结果有重大影响。
之前已经讨论过,Levinson等人在[266]中展示了来自城市挑战赛的斯坦福车辆的更新。使用精确的GPS、IMU和来自多个段落的64层3D激光以离线方式生成地面地图。然后在这张地图内的定位通过一个2D直方图滤波器实现,实现了数公里内10厘米的精度。乌尔姆大学的方法于[267]给出。由视觉和激光网格构成的MSER特征组成的地图预先构建,并使用RTK-GPS进行地理参考。该地图表示较轻,并且在5公里测试期间平均允许10厘米的精度。使用这两种传感器进行定位改善了结果,但对作者而言,需要高精度的内置高级地图是一个问题。
BMW, in [268], gives an interesting overview of their experience with autonomous driving over thousands of kilometers on public roads (highways). Their approach relies on lane marking using vision and laser, as well as odometry and a DGPS. Road boundaries were also detected using laser and radar. A high-precision map is needed for a proper localization. Their map integrates semantic and geometric information (lane models, connectivity, etc.) as well as localization data (lane marking and road boundary positions) with different layers. The difficulties presented by the authors were the necessity to remap from time to time the environment for a proper online exploitation. This process should be automated according to the authors. Large-scale maps are also a concern and they should be broken down in submaps. The standardization of these maps is also an important aspect that needs to be addressed to be able to use probe cars and crowd-source the required information for digital maps.
Daimler experimented with their autonomous vehicle over 103 km in mixed environments (urban, highways, etc.) with close-to-market sensors [109]. Their system is a combination of vision, radar and accurate digital maps. The localization fuses lane detection with feature-based localization for a 20-cm accuracy on average. All the lanes and features were acquired during a first passage and fused in a UKF. Among the topics of importance cited by the authors, the scalability of the maps is crucial. Two other major aspects that need to be addressed for a commercial use are the reliance on an up-to-date digital map as well as its accuracy. For the authors, the sensor setup should be improved in order to be less dependent of the map.
Among the industrial works in autonomous driving, Waymo (formerly Google Car project) was a pioneer. The project started following the DARPA Urban Challenge on the foundation of the winner [252]. While the algorithms are not specifically known, the localization relies on dedicated 3D maps built beforehand and corrected by hand. The capacity of Google makes that solution viable and more than 40,000 km per week are said to be driven by Waymo. Even with these impressive results, the disengagements reports published each year [269] [270] indicate that drivers must occasionally take back the wheel due to software failures and in a more frequent manner in urban environments. Conversely to Google, Tesla’s strategy is to collect data from their own vehicles in order to improve the capabilities of their vehicles [271]. This method offers, by definition, less control on the quality of the produced outputs. Anyway the role of this philosophy on the localization is not known.
宝马在[268]中对自己在公路(公路)上行驶数千公里的自主驾驶经验进行了有趣的介绍。他们的方法依赖于使用视觉和激光的车道标记,以及测距和DGPS。使用激光和雷达也可以检测到道路边界。进行适当的定位时需要高精度地图。他们的地图将不同层次的语义和几何信息(车道模型,连通性等)以及定位数据(车道标记和道路边界位置)集成在一起。作者提出的困难是必须不时重新地图以适当地在线利用环境。根据作者的说法,这个过程应该是自动的。大规模的地图也是一个问题,它们应该在子图中分解。这些地图的标准化也是需要解决的一个重要方面,以便能够使用探测车并将所需的信息用于数字地图。
戴姆勒在拥有近距离市场传感器的混合环境(城市,高速公路等)中试验了超过103公里的自动驾驶汽车[109]。他们的系统是视觉、雷达和精确数字地图的组合。定位融合了车道检测和基于特征的定位,平均精度为20厘米。所有车道和特征都是在第一段时间获得的,并融合在UKF中。在作者引用的重要话题中,地图的可伸缩性至关重要。商业用途需要解决的另外两个主要方面是依赖于最新的数字地图以及其准确性。对于作者而言,应该改进传感器设置以减少对地图的依赖。
在自动驾驶的工业作品中,Waymo(前身为Google Car项目)是先驱者。该项目是在获胜者基础上开始遵循DARPA城市挑战赛的[252]。虽然这些算法并未明确知道,但定位依赖于事先构建的专用3D地图并通过手工修正。据说Waymo推动Google实现该解决方案的能力,并且每周超过40,000公里。即使有了这些令人印象深刻的结果,每年发布的脱离报告[269] [270]也表明,司机必须偶尔在软件故障后以更频繁的方式在城市环境中收回车轮。与谷歌相反,特斯拉的战略是从他们自己的车辆收集数据,以提高他们的车辆的能力[271]。根据定义,这种方法可以减少对生产产品质量的控制。无论如何,这种哲学在定位中的作用还不得而知。
Discussion and Conclusion
In this survey, we have focused on the individual challenges that should be considered depending on the approach chosen. To conclude this paper, we will now discuss how some more general aspects could have an impact on the localization of autonomous vehicles.
During the 23rd ITS World Congress, a localization competition was proposed. Based on low-resolution voxel maps of above-the-ground objects and lanes furnished by HERE, participants were asked to propose an accurate localization algorithm. The competition ended with no winner as the targeted accuracy was not reached. It is not yet clear if HAD maps, that are going to be available in the coming years, will provide a strong enough prior knowledge for localization algorithms. Depending on the outcome, the community will surely adapt what kind of prior knowledge is used in SLAM. These maps are also intended to be used as part of the Local Dynamic Map (LDM) that gathers static and dynamic information to which SLAM could contribute. As such, standardization is needed [272]. An aspect that is not often discussed is sensor placement.
It can have a tremendous impact on the performance of a localization system. For instance, a laser on the car roof is going to have a clear view on the infrastructure and avoid most of the mobile obstacles while one in the undercarriage might be affected by masked information. Similarly, the minimal set of sensors necessary for localization is not clearly defined yet (even if lasers and cameras seem to be favored). Such a definition should be made conjointly with the map represen- tation that is going to serve as prior knowledge. Of course, this aspect is also tightly linked to the cost of those sensors. An interesting perspective in that sense is multi-modality where a map could be built by an expensive 3D laser but then exploited by a simple camera as proposed in [273]. Another interesting challenge is to build flexible architectures in which a decision system can choose what are the sensors, detectors or maps to favor depending on the context.
在这次调查中,我们侧重于根据所选方法考虑的各自的挑战。总结本文,我们现在将讨论一些更一般的方面会如何影响自主车辆的定位。
在第二十三届ITS世界大会期间,提出了一个定位比赛。根据HERE提供的地上物体和车道的低分辨率体素图,参与者被要求提出一个精确的定位算法。由于没有达到目标精确度,比赛结束时没有获胜者。目前还不清楚HAD地图是否将在未来几年提供,将为定位算法提供足够强大的先验知识。取决于结果,社区肯定会调整SLAM中使用的先前知识。这些地图还旨在用作本地动态地图(LDM)的一部分,以收集SLAM可能贡献的静态和动态信息。因此,需要标准化[272]。
传感器放置不常被讨论的一个方面。它可以对定位系统的性能产生巨大影响。例如,汽车车顶上的激光将会清楚地看到基础设施,避免大部分移动障碍物,而底盘中的一个可能会受到蒙面信息的影响。同样地,定位所需的最小传感器集合尚未明确定义(即使激光器和相机似乎受到青睐)。这样一个定义应该与作为先验知识的地图代表相结合。当然,这方面也与这些传感器的成本紧密相关。从这个意义上说,一个有趣的视角是多模态,其中地图可以由昂贵的3D激光器构建,但之后由[273]中提出的简单相机利用。另一个有趣的挑战是建立灵活的架构,其中决策系统可以根据上下文选择传感器、探测器或地图。
Context by itself is an important part of autonomous driving and its understanding would help place recognition algorithms or could even be directly integrated to make SLAM more robust. For now, most of the works carried out have concerned indoor localization. They have been initiated by Chatila et al. in [274] where semantic (object) maps are used for high-level decision-making. SLAM methods have since then been frequently considered for semantic map building. In [275] and [276], the authors take advantage of a 3D laser to identify the ceiling, the floor and objects. In [277], a conceptual space representation is proposed based on three different maps: metric (from a SLAM system), navigation (free space) and topological (connections between door-separated areas). The conceptual map allows the robot to have a semantic representation for rooms and objects that helps the interactions with users. In [278], the authors integrate well known objects inside a monocular SLAM. A similar approach is followed in SLAM++ [279] where objects are detected and optimized in a common localization framework. In these last two works, context and relations between objects are not directly used to reason upon. In that sense, semantic SLAM in outdoor environments is still a challenge that needs to be tackled. However, semantic mapping from images alone is a field that has grown lately [280][281]. Deep Convolutional Neural Networks have largely contributed to this trend and methods like SegNet [281] show impressive results. Their application in localization methods remains an open challenge but a direct use would be to remove moving, or temporally static, obstacles that can disturb the proper behavior of a SLAM algorithm.
More generally, CNNs could change how place recognition is performed by using feature maps coming from these networks as in [142]. The impact and the use of CNNs for localization in autonomous driving will surely evolve in the coming years. A recent example like [282] shows that localization might not even be needed. In this paper, Nvidia demonstrates the possibility to directly learn the steering angle that should be applied from camera clues. Nevertheless, the scalability of such methods has yet to be demonstrated.
上下文本身是自动驾驶的重要组成部分,其理解有助于识别算法的位置,甚至可以直接进行整合以使SLAM更加健壮。目前,大部分作品都涉及室内定位。它们由Chatila等人发起。在[274]中,语义(对象)图用于高层决策。从那以后,SLAM方法经常被考虑用于语义地图构建。在[275]和[276]中,作者利用3D激光识别天花板、地板和物体。在[277]中,基于三种不同的地图提出概念性空间表示:度量(来自SLAM系统),导航(自由空间)和拓扑(门与门之间的连接区域)。概念图允许机器人具有帮助与用户交互的房间和对象的语义表示。在[278]中,作者将众所周知的物体集成到单目SLAM中。在SLAM++ [279]中采用了类似的方法,其中在通用的定位框架中检测和优化对象。在最后两篇作品中,对象之间的上下文和关系并不直接用于推理。从这个意义上说,户外环境中的语义SLAM仍然是一个需要解决的挑战。然而,单从图像语义地图是最近增长的领域[280] [281]。深度卷积神经网络在很大程度上促成了这一趋势,像SegNet[281]这样的方法显示出令人印象深刻的结果。他们在定位方法中的应用仍然是一个公开的挑战,但直接用途是去除可能干扰SLAM算法正确行为的移动或时间静态障碍。
更一般地说,CNN可以通过使用来自这些网络的特征地图来改变如何进行地点识别[142]。 CNNs在自动驾驶定位方面的影响和使用必将在未来几年得到发展。像[282]这样的最近的例子表明,甚至可能不需要定位。在本文中,Nvidia展示了直接学习从相机线索应用的转向角度的可能性。尽管如此,这种方法的可扩展性尚未得到证明。
In the recent years, experiments have broadened their scope and autonomous driving for several kilometers and over long periods of time is more common. Shuttle experiments are also starting in various cities. Even though, in that case, SLAM approaches can use a previously built map without scalability issues, it is a good way to test them over long periods of time and confront them to the environment variability. In most roadmaps [283], the automation of transit systems is actually foreseen at a shorter term than other vehicles as it has the advantage to restrain the covered area.
Finally, the safety of localization algorithms is an important issue to consider. Multiple sources should be envisaged and strategies to safely switch among them must be designed. Taking into account the failures and their impacts on the localization system could help in creating degraded localization modes that ensure that a vehicle can reach a stopping spot safely.
近年来,实验范围扩大、自动驾驶行驶数公里、长时间更为常见。 穿梭实验也开始于各个城市。 即使在这种情况下,SLAM方法可以使用先前构建的地图而没有尺度问题,但它是长时间测试它们并将其与环境变化对抗的好方法。 在大多数路线图[283]中,交通系统的自动化实际上预见的时间短于其他车辆,因为它具有抑制覆盖区域的优势。
最后,定位算法的安全性是一个需要考虑的重要问题。 应该设想多种来源,并且必须设计安全地在它们之间切换的策略。考虑到故障及其对定位系统的影响可能有助于创建退化的定位模式,以确保车辆能够安全地到达停车点。
更多推荐
Simultaneous Localization And Mapping: A Survey of Current Trends in Autonomous
发布评论