In this chapter, we'll take a closer look at the mechanics of concurrency. In particular, we'll point out some of the problems associated with sharing variables among multiple goroutines, the analytical techniques for recognizing those problems, and the patterns for solving them. Finally, we'll explain some of the technical differences between goroutines and operating system threads.
- 本章中,我们要看看并发的机制。
- 特别的,我们要指出在使用多个 goroutine 时,共享变量的问题,并发问题的分析手段,以及解决他们的方法。
- 最终我们要解释一些 goroutines 和 操作系统线程的区别。
9.1 竞争条件
- x y 事件是并发的
当我们没有办法自信地确认一个事件是在另一个事件的前面或者后面发生的话,就说明x和y这两个事件是并发的。 - 并发安全
考虑一下,一个函数在线性程序中可以正确地工作。如果在并发的情况下,这个函数依然可以正确地工作的话,那么我们就说这个函数是并发安全的,并发安全的函数不需要额外的同步工作。我们可以把这个概念概括为一个特定类型的一些方法和操作函数,对于某个类型来说,如果其所有可访问的方法和操作都是并发安全的话,那么该类型便是并发安全的。 - 一个函数在并发调用时没法工作的原因太多了
死锁(deadlock)、活锁(livelock)、饿死(resource starvation)
A race condition is a situation in which the program does not give the correct result for some interleaving of the operations of multiple goroutines. Race conditions are pernicious because they may remain latent in a program and appear infrequently, perhaps only under heavy load or when using certain compilers, platforms, or architectures. This makes them hard to reproduce and diagnose.
竞争条件指的是程序在多个goroutine交叉执行操作时,没有给出正确的结果。竞争条件是很恶劣的一种场景,因为这种问题会一直潜伏在你的程序里,然后在非常少见的时候蹦出来,或许只是会在很大的负载时才会发生,又或许是会在使用了某一个编译器、某一种平台或者某一种架构的时候才会出现。这些使得竞争条件带来的问题非常难以复现而且难以分析诊断。
竞争条件的解决方法
- 不要去写变量
举的例子是,填充一个 Icon map,如果是“懒”填充,也就是每个 key 第一次用到的时候才去填充,如果 Icon 是被顺序调用的时候,是不会有问题的,但并非调用时,就会产生问题
- 避免从多个goroutine访问变量
这也是前一章中大多数程序所采用的方法。例如前面的并发web爬虫(§8.6)的main goroutine是唯一一个能够访问seen map的goroutine,而聊天服务器(§8.10)中的broadcaster goroutine是唯一一个能够访问clients map的goroutine。这些变量都被限定在了一个单独的goroutine中。
“不要使用共享数据来通信;使用通信来共享数据”
一个提供对一个指定的变量通过channel来请求的goroutine叫做这个变量的monitor(监控)goroutine。例如broadcaster goroutine会监控clients map的全部访问。
- 互斥(锁)
第三种避免数据竞争的方法是允许很多goroutine去访问变量,但是在同一个时刻最多只有一个goroutine在访问。这种方式被称为“互斥”,在下一节来讨论这个主题。
9.2 Mutual Exclusion: sync.Mutext
In Section 8.6, we used a buffered channel as a counting semaphore to ensure that no more than 20 goroutines made simultaneous HTTP requests. With the same idea, we can use a channel of capacity 1 to ensure that at most one goroutine accesses a shared variable at a time. A semaphore that counts only to 1 is called a binary semaphore.
我们使用了一个buffered channel作为一个计数信号量,来保证最多只有20个goroutine会同时执行HTTP请求。同理,我们可以用一个容量只有1的channel来保证最多只有一个goroutine在同一时刻访问一个共享变量。一个只能为1和0的信号量叫做二元信号量(binary semaphore)。
package bank
//!+
var (
sema = make(chan struct{}, 1) // a binary semaphore guarding balance
balance int
)
func Deposit(amount int) {
sema <- struct{}{} // acquire token
balance = balance + amount
<-sema // release token
}
func Balance() int {
sema <- struct{}{} // acquire token
b := balance
<-sema // release token
return b
}
This pattern of mutual exclusion is so useful that it is supported directly by the Mutex type from the sync package. Its Lock method acquires the token (called a lock) and its Unlock method releases it:
package main
import "sync"
var (
mu sync.Mutex
balance int
)
func Deposit(amount int) {
mu.Lock()
balance = balance + amount
mu.Unlock()
}
func Balance() int {
mu.Lock()
b := balance
mu.Unlock()
return b
}
Each time a goroutine accesses the variables of the bank(just balance here), it must call the mutex't Lock method to acquire an exclusive lock. If some other goroutine has acquired the lock, this operation will block until the other goroutine calls Unlock and the lock becomes available again. The mutex guards the shared variables. By convention, the variables guarded by a mutex are declared immediately after the declaration of the mutex itself. If you deviate from this, be sure to document it.
- critical section(临界区)
The region of code between Lock and Unlock in which a goroutine is free to read and modify the shared variables is called a critical section. The lock holder's call to Unlock happens before any other goroutine can acquire the lock for itself. It is essential that the goroutine release the lock once it is finished, on all paths through the function, including error paths.
在Lock和Unlock之间的代码段中的内容goroutine可以随便读取或者修改,这个代码段叫做临界区。锁的持有者在其他goroutine获取该锁之前需要调用Unlock。goroutine在结束后释放锁是必要的,无论以哪条路径通过函数都需要释放,即使是在错误路径中,也要记得释放。
The bank program above exemplifies a common concurrency pattern. A set of exported functions encapsulates one or more variables so that the only way to access the variables is through these functions (or methods, for the variables of an object). Each function acquires a mutex lock at the beginning and releases it at the end, thereby ensuring that the shared variables are not 上面的bank程序例证了一种通用的并发模式。一系列的导出函数封装了一个或多个变量,那么访问这些变量唯一的方式就是通过这些函数来做(或者方法,对于一个对象的变量来说)。每一个函数在一开始就获取互斥锁并在最后释放锁,从而保证共享变量不会被并发访问。这种函数、互斥锁和变量的编排叫作监控monitor(这种老式单词的monitor是受"monitor goroutine"的术语启发而来的。两种用法都是一个代理人保证变量被顺序访问)。
defer 的采用解决了 Unlock() 的遗忘问题
Since the critical sections in the Deposit and Balance functions are so short - a single line, no branching ---- calling Unlock at the end is straightforward. In more complex critical sections, especially those in which errors must be dealt with by returning early, it can be hard to tell that calls to Lock and Unlock are strictly paired on all paths. Go's defer statement comes to the rescue: by deferring a call to Unlock, the critical section implicitly extends to the end of the current function, freeing us from having to remember to insert Unlock calls in one or more places far from the call to Lock.
- 复杂情况下使用
func Balance() int {
mu.Lock()
defer mu.Unlock()
return balance
}
In the example above, the Unlock executes after the return statement has read the value of balance, so the Balance function is concurrency-safe. As a bonus, we no longer need the local variable b.
Furthermore, a deferred Unlock will run even if the critical section panics, which may be important in programs that make use of recover. A defer is marginally more expensive than an explicit call to Unlock, but not enough to justify less clear code. As always with concurrent programs, favor clarity and resist premature optimization. Where possible, use defer and let critical sections extend to the end of a function.
- 这样,Balance 函数就是变成了并发安全了,作为奖励,我们业不需要局部变量 b 了。
- 一个deferred Unlock即使在临界区发生panic时依然会执行,这对于用recover (§5.10)来恢复的程序来说是很重要的。
- defer调用只会比显式地调用Unlock成本高那么一点点,不过却在很大程度上保证了代码的整洁性。
- 大多数情况下对于并发程序来说,代码的整洁性比过度的优化更重要。如果可能的话尽量使用defer来将临界区扩展到函数的结束。
consider the Withdraw function below. On success, it reduces the balance by the specified amount and returns true. but if the account holds insufficient funds for the transaction, Withdraw restores the balance and returns false.
成功的时候,它会正确地减掉余额并返回true。但如果银行记录资金对交易来说不足,那么取款就会恢复余额,并返回false。
// NOTE: not atomic!
func Withdraw(amount int) bool {
Deposit(-amount)
if Balance() < 0 {
Deposit(amount)
return false // insufficient funds
}
return true
}
This function eventually gives the correct result, but it has a nasty side effect. When an excessive withdrawal is attempted, the balance transiently dips below zero. This may cause a concurrent withdrawal for a modest sum to be spuriously rejected. So if Bob tries to buy a sports car, Alice can't pay for her morning coffee. The problem is the Withdraw is not atomic: it consists of a sequence of three separate operations, each of which acquires and then releases the mutex lock, but nothing locks the whole sequence.
函数终于给出了正确的结果,但是还有一点讨厌的副作用。当过多的取款操作同时执行时,balance可能会瞬时被减到0以下。这可能会引起一个并发的取款被不合逻辑地拒绝。所以如果Bob尝试买一辆sports car时,Alice可能就没办法为她的早咖啡付款了。这里的问题是取款不是一个原子操作:它包含了三个步骤,每一步都需要去获取并释放互斥锁,但任何一次锁都不会锁上整个取款流程。
Ideally, Withdraw should acquire the mutex lock once around the whole operation. However, this attempt won't work:
// NOTE: incorrect!
func Withdraw(amount int) bool {
mu.Lock()
defer mu.Unlock()
Deposit(-amount)
if Balance() < 0 {
Deposit(amount)
return false // insufficient funds
}
return true
}
- 这里加锁的原因,是因为过多的 取款操作同时执行时,balance 可能会瞬间被减到 0 以下,这可能会引起一个并发的取款被不合逻辑地拒绝。
- 但上面的例子中,直接加锁,会导致一个问题,就是Deposit函数会调用 mu.Lock()第二次去获取互斥锁,但是因为 mutex 已经上锁了,而无法被重入(go里没有重入锁,关于重入锁的概念,请参考java)--- 也就是说没法对一个已经锁上的mutex来再次上锁--这会导致程序死锁,没法继续执行下去,Withdraw会永远阻塞下去。
There is a good reason Go's muteness are not re-entrant. The purpose of mutex is to ensure that certain invariants of the shared variables are maintained at critical points during program execution. One of the invariants is "no goroutine is accessing the shared variables," but there may be additional invariants specific to the data structures that the mutex guards. When a goroutine acquires a mutex lock, it may assume that the invariants hold. While it holds the lock, it may update the shared variables so that the invariants are temporarily violated. However, when it releases the lock, it must guarantee that order has been restored and the invariants hold once again. Although a re-entrant mutex would ensure that no other goroutines are accessing the shared variables, it cannot protect the additional invariants of those variables.
Go 不支持重入锁有很好的理由。锁的目的是确保在临界区里,那些共享的变量能保持不变性。不变性的其中之一是“没有goroutine访问共享变量”,但是锁还对数据结构提供额外的不变性保护。当一个 goroutine 获取
9.8 Goroutines and Threads
In the previous chapter we said that the difference between goroutines and operating system(OS) threads could be ignored until later. Although the differences between them are essentially quantitative, a big enough quantitative difference becomes a qualitative one, and so it is with goroutines and threads. The time has now come to distinguish them.
- 尽管两者的区别实际上只是一个量的区别,但量变会引起质变的道理同样适用于goroutine和线程。
goroutine 和 thread 的区别有三个