<h1 align = "center">Differential Evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces</h1>

by Rainer Storn and Kenneth Price TR-95-012

March 1995

Users generally demand that a practical optimization technique should fulfill three requirements. First, the method should find the true global minimum, regardless of the initial system parameter values. Second, convergence should be fast. Third, the program should have a minimum of control parameters so that it will be easy to use. In our search for a fast and easy to use "sure fire" technique, we developed a method which is not only astonishingly simple, but also performs extremely well on a wide variety of test problems. It is inherently parallel and hence lends itself to computation via a network of computers or processors. The basic strategy employs the difference of two randomly selected parameter vectors as the source of random variations for a third parameter vector. In the following, we present a more rigorous description of the new optimization method which we call Differential Evolution.

优化算法使用者通常需要实际优化算法满足三个要求。第一个，算法应找到实际的全局最小，无论系统的初始参数是什么值。第二，收敛要迅速。第三，程序应该有最小控制参数数目以易于使用。在我们的研究中为了使得使用确定正确的方法更快捷简便，我们推出了一种算法不仅非常简单，而且对于各种测试问题表现很好。它内在可以并行因此适用于使用电脑和处理器网络进行计算。最初的策略使用两个随机挑选的参数向量之间的差异作为第三个参数向量。在之后的文章中，我们将对于新的优化方法--差分进化进行更严格的地描述。

Problem Formulation

Consider a system with the real valued properties
g_m; m = 0, 1, 2, ... , P-1 (1)
which constitute the objectives of the system to be optimized. Additionally, there may be real valued constraints
g_m; m = P, P+1, ... , P+C-1 (2)
which describe properties of the system that need not be optimized but neither shall be degraded. For example, one may wish to design a mobile phone with the dual objectives of maximizing the transmission power g₁ and minimizing the noise g₂ of the audio amplifier while simultaneously keeping the battery life g₃ above a certain threshold. The properties g₁ and g₂ represent objectives to be optimized whereas g₃ is a constraint. Let all properties of the system be dependent on the real valued parameters
x_j; j = 0, 1, 2, ... , D-1. (3)

想象一个系统有P个实值特征，用g₀到g_P-1表示，他们是系统被优化的目标。另外，也有C个实值约束，从g_P到g_P+C-1，不用被优化，但是也不能不满足。例如，有个人想要设计一个手机，有两个目标，一个是最大化传输功率g₁，最小化扬声器噪音g₂同时保持电池寿命g₃超过特定阈值。特征g₁和g₂代表被优化的目标函数然而g₃是一个约束。使所有特征依赖D个实值参数x₀到x_D-1。

In the case of the mobile phone the parameters could be resistor and capacitor values. For most technical systems realizability requires
x_j ∈ [x_jl, x_jh] . (4)
Usually, restrictions on the xj will be incorporated into the collection g_m, m>=P, of constraints.
Optimization of the system means to vary the D-dimensional parameter vector
x = (x₀, x₁, ... , x_D-1)^T (5)
until the properties g_m are optimized and the constraints g_m, m>=P, are met. An optimization task can always be reformulated as the minimization problem
min f_m(x) (6)

在手机的例子中参数可以是电阻和电容值，对于大部分技术体系的实现需要一个x_j 在[x_jl, x_jh] 区间里的约束。通常，x_j会参与下标大于等于P的约束。系统的优化问题意味着改变D维参数向量x = (x₀, x₁, ... , x_D-1)^T 直到特征被优化约束被满足。一个优化问题总能重新建模为最小化问题min f_m(x) 。

用f_m(x)表示特征 g_m的计算函数，并且它的优化或者限制维持可以表达为最小化f_m(x)。所有的函数f_m(x)可以结合为一个单目标目标函数z(x)，它的形成或者通过计算加权平均或者最大加权f_m(x)值（权重为正）。权重向量w_m用于定义不同目标和限制的重要程度也可以用于标准化不同实际单位。现在最优化问题可以被表述成minz(x)。

The min-max formulation (8) and (10) guarantees that all local minima, the Pareto critical points, including the possibly multiple global minima, the Pareto points, can at least theoretically be found [2], [12]. For the objective function (7) and (10) this is true only if the region of realizability of x is convex [1], [2], which in general does not apply in most technical problems.

(8)和(10)的最小化最大化公式保证了所有的局部最小值（Pareto临界点），包括可能的多个全局最小值(Pareto点)，在理论上可以被找到。对于目标函数(7)和(10)，只有当函数是凸函数才能确保找到，通常在多数实际问题达不到这样的条件。

The Method of Differential Evolution

Differential Evolution (DE) is a novel parallel direct search method which utilizes NP parameter vectors x_i,G, i = 0, 1, 2, ... , NP-1. (11)
as a population for each generation G. NP doesn't change during the minimization process. The initial population is chosen randomly if nothing is known about the system. As a rule, we will assume a uniform probability distribution for all random decisions unless otherwise stated. In case a preliminary solution is available, the initial population is often generated by adding normally distributed random deviations to the nominal solution x_nom,0. The crucial idea behind DE is a new scheme for generating trial parameter vectors. DE generates new parameter vectors by adding the weighted difference vector between two population members to a third member.

差分进化是一种创新的并行直接搜索方法，利用NP个参数向量（并行NP次），x_i,G表示第G代的分布。最小化过程中NP不变化。在对系统不了解时，初始分布随机选择。作为一个原则，我们通常假定所有随机选择都采用均匀分布除非特别说明。一旦赋了初值，初始分布通常通过把正态分布偏移量加到初值x_nom,0上。DE背后的核心思想是一个新的产生试探参数向量的方法。DE通过将两个点的差加权加到第三个点上产生新的参数向量。

If the resulting vector yields a lower objective function value than a predetermined population member, the newly generated vector replaces the vector with which it was compared. The comparison vector can, but need not be part of the generation process mentioned above. In addition the best parameter vector x_best,G is evaluated for every generation G in order to keep track of the progress that is made during the minimization process.
Extracting distance and direction information from the population to generate random deviations results in an adaptive scheme with excellent convergence properties. We tried several variants of DE, the two most promising of which we subsequently present in greater detail.

如果结果向量比之前的目标函数值低,新产生的向量代替和它比较的点。比较的向量可以是三个产生结果向量的向量之中的，也可以不是。另外，最佳参数向量x_best,G在每一代计算，为了在最小化过程中保留运算轨迹。从分布中提取距离和方向信息以产生随机偏移产生了有很好收敛性质的适应性机制。我们试了DE的很多个变体，两个最好的我们随后详细说明。

Scheme DE1

Our first variant of DE works as follows: for each vector x_i,G , i = 0,1,2,...,NP-1, a trial vector v is generated according to
v = x_r1,G + F *(x_{r2 ,G} x_r3,G ), (12)
with r1,r2,r3∈[0,NP-1],integer and mutually different,andF>0. (13)
The integers r1, r2 and r3 are chosen randomly from the interval [0, NP-1] and are different from the running index i. F is a real and constant factor which controls the amplification of the differential variation (x_{r2 ,G} x_r3,G ).Fig. 1 shows a two dimensional example that illustrates the different vectors which play a part in DE1.

我们第一个DE变种工作流程如下：对于每个向量x_i,G，一个试探向量v根据v = x_r1,G + F *(x_{r2 ,G} x_r3,G )产生，其中r1,r2,r3在[0,NP-1]区间内并且是不相同的整数，F>0。r1,r2,r3从区间[0,NP-1]中随机选择并且和当前序号i不同。F是一个实值并且常数因子，控制偏差的扩大缩小量。图1演示了二维参与DE1的不同向量的例子。

I.e. a certain seuence of the vector elements of u are identical to the elements of v, the other elements of u acquire the original values of x_i,G . Choosing a subgroup of parameters for mutation is similiar to a process known as crossover in evolution theory. This idea is illustrated in Fig. 2 for D=7, n=2 and L=3. The starting index n in (15) is a randomly chosen integer from the interval [0, D-1]. The integer L is drawn from the interval [0, D-1] with the probability Pr(L=v) = (CR)^v . CR∈[0,1] is the crossover probability and constitutes a control variable for the DE1-scheme. The random decisions for both n and L are made anew for each trial vector v.

为了增加参数向量的多样性，u向量产生自(15)函数。换句话说，一个特定序号的向量元素u和v相同，其他u需要x_i,G的原始的值。选择几个参数进行变异和进化理论的交叉变异过程类似，这个思想在图2 中演示，其中D=7, n=2 and L=3。(15)中的开始序号n是从[0, D-1]区间随机选择的，整数L是从区间[0, D-1]中通过概率 Pr(L=v) = (CR)^v得出的？？？。CR是交叉概率并且是DE1算法中的控制变量。n和L的随机选择是为了再次更新每个试探向量v。

In order to decide whether the new vector u shall become a population member of generation G+1, it will be compared to x_i,G . If vector u yields a smaller objective function value than x_i,G, x_i,G+1 is set to u, otherwise the old value x_i,G is retained.

为了判断新向量u能否成为G+1代的成员，它应当和x_i,G比较。如果u产生更小的目标函数值，x_i,G+1被赋值为u，否则，旧值x_i,G保留。

Scheme DE2

Basically, scheme DE2 works the same way as DE1 but generates the vector v according to
v = x_i,G + λ*(x_best,G - x_i,G) + F (x_r2,G - x_r3,G), (16)
introducing an additional control variable λ. The idea behind λ is to provide a means to enhance the greediness of the scheme by incorporating the current best vector x_{best ,G} . This feature can be useful for non-critical objective functions. Fig. 3 illustrates the vector-generation process defined by (16). The construction of u from v and x_i,G as well as the decision process are identical to DE1.

根本上来说，DE2方案和DE1方案运作方法类似但是产生v变量根据v = x_i,G + λ*(x_best,G - x_i,G) + F (x_r2,G - x_r3,G)，引入额外控制变量λ。λ背后的思想是通过纳入当前最优向量x_{best ,G}提供一种提高方案贪心性的方法。这种方法对于非临界目标函数？有用。图3展示了一个(16)定义的向量产生过程。u从v和x_i,G中的构造以及决策过程和DE1相同。

Competing minimization methods

In order to compare the DE method with other global minimizing strategies, we looked for approaches where the source code is readily available, which are known to be powerful and which are capable of coping with nonlinear and non differentiable functions. Two methods in particular piqued our interest. The first was the annealed version of the Nelder&Mead strategy (ANM) [10] which is appealing because of its adaptive scheme for generating random parameter deviations. When the annealing part is switched off, a fast converging direct search method remains which is especially useful for non-critical objective functions. The basic control variables in ANM are T, the starting temperature, TF, the temperature reduction factor and NV, the number of random variations at a given temperature level.

为了比较DE方法和其他全局最小策略，我们寻找源代码可得的方法，并且这种方法要有影响力的并且可以处理非线性和不可导函数。两种方法激起了我们的兴趣。第一个是Nelder&Mead方法的退火版本(ANM)，这个策略因为它产生随机参数偏移的适应性方案而吸引人。当退火部分关闭，一个快速收敛的直接搜索方法保留，这个方法对非临界目标函数特别有效。ANM的基本控制变量有初始温度T，温度递减因子TF，给定温度水平的随机变量数目。

The second method of interest was Adaptive Simulated Annealing (ASA) [8] which claims to converge very quickly and to outperform genetic algorithms on the De Jong test suite [9]. Although ASA provides more than a dozen control variables, it turned out that just two of them, TEMPERATURE_RATIO_SCALE (TRS) and TEMPERATURE_ANNEAL_SCALE (TAS), had significant impact on the minimization process. We will compare both ANM and ASA to DE1 and DE2. During our research we also wrote an annealed version of the Hooke&Jeeves method [5] and tested two Monte Carlo methods [3] one of which used NP parallel vectors and the differential mutation scheme of DE. Although these approaches all worked, they quickly turned out not to be competitive.

第二个有趣的方法是自适应模拟退火(ASA)，这个方法收敛迅速并且在De Jong测试中做的比遗传算法好。即使ASA需要超过一沓变量，结果是只有两个变量TRS和TAS对最小化过程有显著影响。我们会比较ANM、ASA和DE1、DE2。在我们的研究中，我们也写了Hooke&Jeeves的退火版本并且试了两个蒙特卡洛方法，一个用了NP并行向量和DE的分变异方案。即使这些方案全部有效，他们都没有竞争力。

20170721_DE