Swift 解读 - Collection 大家族（上篇）

概要

集合类型对任何一门现代化编程语言都至关重要，它们在许多可见或者不可见的地方，影响着代码质量与执行效率。Swift 在集合类型的设计和实现上，进行了诸多的考量，让它兼具易用性、高性能以及扩展性，但同时也增加了集合的复杂性。很多时候我们都在说 Swift 会比 OC 更快，但它到底快在哪里呢？对于名 Swifter 来说，单纯的数据统计分析的结果无说服力的，我们要知其然，同时更要知其所以然，那么本系列文章将围绕着 Swift 是如何实现高性能的集合类型来展开。

起点 Sequence

在 Swift 中，集合是一个复杂的类型家族，是由许多 Protocol 组成的，其中 Sequence 是 Swift 整个集合类型体系中的起点，它抽象了一个集合最原始的协议。仅表示一系列类型相同的元素，而不对这一系列元素的性质有任何额外的约定。它只约定一个动作，就是可以从序列的当前位置读取下一个元素，也即表示序列需要对外提供一个迭代器来实现对外的访问。

Sequence 的实现

我们可以在 Sequence.swift 文件中查看到 Sequence 协议的实现方法（在前言篇中在已经讲述过如何编译和查看 Swift 的源码，系列文章将不作重复说明）：

public protocol Sequence {
  associatedtype Element
  associatedtype Iterator : IteratorProtocol where Iterator.Element == Element
  __consuming func makeIterator() -> Iterator    
}

Sequence 协议中定义了大量的辅助函数，大部分都已通过 Protocol Extension 实现了，不一一列举，这里的事例代码只保留了 Sequence 协议最核心代码。从上面源码可以看出，序列的实现思路非常简单：

Element：表示序列中元素的类型；
Iterator：表示 IteratorProtocol 并且 Iterator 的 Element 与序列的 Element 相同；
makeIterator()：返回一个 Iterator，Iterator 迭代器提供了访问序列中下一个元素的能力，它的具体实现如下：

public protocol IteratorProtocol {
  associatedtype Element
  mutating func next() -> Element?
}

Element：表示 next() 返回的元素类型，如上面所述，Element 的类型与相应的序列的 Element 保持一致；
next()：方法是问序列下一个元素的接口，如果没有下一个元素则返回 nil;

IteratorProtocol 协议与 Sequence 协议是一对紧密相连的协议。序列通过创建一个提供对其元素进行访问的迭代器，它通过跟踪迭代过程并在调用 next() 时返回一个元素。

在我们使用 for-in 来访问序列或者集合时， Swift 实质在底层通过迭代器来循环遍历数组，正常我们通过 for-in 来遍历一个数组的用法：

let colors = ["red", "white", "black"]
for color in colors {
    print(color)
}

实质上 Swift 在底层通过 Sequence 的 Iterator 能力，做了如下转换：

let colors = ["red", "white", "black"]
var iterator = colors.makeIterator()
while let color = iterator.next() {
    print(color)
}

这两种方式访问序列的效果是等同的，只是一种简单的语法转换，所以所有基于 Sequence 的协议，除了可以通过过 for 来访问元素外，还可以通过迭代器来序列元素。

自定义序列

我了解 Sequence 的具体实现后，我们来尝试实现自己的序列，下面我们来实例一个输出 1…n 的平方数的序列。

首先我们需要先实现一个 SquareIterator 的迭代器来遍历平方数序列：

struct SquareIterator: IteratorProtocol {
    typealias Element = Int
    var state = (curr: 0, next: 1)
    mutating func next() -> SquareIterator.Element? {
        let curr = state.curr
        let next = state.next
        state = (curr: next, next: next + 1)
        if curr == 0 {
            return 1
        }
        return curr * curr
    }
}

通过在迭代器中定义一个 state 来记录当前迭代过程的状态信息，实现了该迭代器后，我们就需要实现 Square 序列的 Sequence 协议：

struct Square: Sequence {
    typealias Element = Int
    func makeIterator() -> SquareIterator {
        return SquareIterator()
    }
}

通过实现了 Sequence 与 IteratorProtocol 两个协议，那么一个简单的 Square 序列即开发完毕，我们可以尝试去执行它：

let square = Square()
var iterator = square.makeIterator()
while let num = iterator.next(), num <= 100 {
    print(num) // 1,1,3,9,16,25,36,49,64,81,100
}

到这里我们已经完成一个自定义的序列，它支持通过迭代器来遍历序列的所有元素，但是没有办法通过下标的方式来访问序列元素，至于如何实现下标访问，这其中又涉及到另一个协议：Collection，接下来我们就来谈谈 Collection Protocol。

Collection

Collection 是一个继承于 Sequence 序列，是一个元素可以反复遍历并且可以通过索引的下标访问的有限集合。集合在标准库中广泛使用，当我们在使用数组、字典和其他集合时，大多将受益于 Collection 协议声明和实现的操作。除了集合从 Sequence 协议继承的操作之外，最大的不同点是可以访问依赖于访问集合中特定位置的元素的方法。

Collection 的实现

源码

我们可以在 Collection.swift 查看到 Collection 的具体实现：

public protocol Collection: Sequence {
  override associatedtype Element
  associatedtype Index : Comparable
  var startIndex: Index { get }
  var endIndex: Index { get }
  associatedtype Iterator = IndexingIterator<Self>
  override __consuming func makeIterator() -> Iterator

  associatedtype SubSequence: Collection = Slice<Self>
  where SubSequence.Index == Index,
        Element == SubSequence.Element,
        SubSequence.SubSequence == SubSequence
  
  @_borrowed
  subscript(position: Index) -> Element { get }
  subscript(bounds: Range<Index>) -> SubSequence { get }

  associatedtype Indices : Collection = DefaultIndices<Self>
    where Indices.Element == Index, 
          Indices.Index == Index,
          Indices.SubSequence == Indices
       
  var indices: Indices { get }
}

Element、makeIterator：重写 Sequence 的 Element、makeIterator；
startIndex、endIndex：非空集合中第一个、最后一个元素的位置；
subscript：下标访问集合元素，例如 collection[i]、collection[0...i]；
indices: 集合的索引

通过源码，我们可以发现 Collection 与 Sequence 最大的不同点是提供了索引能力，以此基础上提供了通过下标访问元素的能力。 Collection 的自定义了迭代器：IndexingIterator，接下来我们了解下 What is IndexingIterator。

IndexingIterator

Collection 的迭代器使用 IndexingIterator，从名字的字面意思来看，这就是与下标位置有关的迭代器，我们先来看下它的具体实现：

public struct IndexingIterator<Elements : Collection> {
  internal let _elements: Elements
  internal var _position: Elements.Index
  
  init(_elements: Elements) {
    self._elements = _elements
    self._position = _elements.startIndex
  }
  init(_elements: Elements, _position: Elements.Index) {
    self._elements = _elements
    self._position = _position
  }
}
extension IndexingIterator: IteratorProtocol, Sequence {
  public typealias Element = Elements.Element
  public typealias Iterator = IndexingIterator<Elements>
  public typealias SubSequence = AnySequence<Element>
  
  public mutating func next() -> Elements.Element? {
    if _position == _elements.endIndex { return nil }
    let element = _elements[_position]
    _elements.formIndex(after: &_position)
    return element
  }
}

_elements：需要迭代的集合，类型为 Collection；
_position：记录遍历时的位置信息；

IndexingIterator 的作用主要是在迭代器执行 next() 方法时，记录了 position，通过 position 记录索引的同时，还可以与 elements.endIndex 比较来判断是否有下一个元素。

Slice

Collection 协议中，通过闭包去访问多个元素时，返回了一个 Slice<Self>，这里的 Slice 是一个可以获取集合的元素的子序列。切片存储的只是集合的开始和结束索引，所以它不会将集合中的元素复制一份，因此创建切片具有O（1）复杂度。Collection 中通过 Slice 来截取子序列，具体可以看下 Slice 到底做了些什么：

public struct Slice<Base: Collection> {
  public var _startIndex: Base.Index
  public var _endIndex: Base.Index
  internal var _base: Base

  public init(base: Base, bounds: Range<Base.Index>) {
    self._base = base
    self._startIndex = bounds.lowerBound
    self._endIndex = bounds.upperBound
  }
  public var base: Base {
    return _base
  }
}
extension Slice: Collection {
  public typealias Index = Base.Index
  public typealias Indices = Base.Indices
  public typealias Element = Base.Element
  public typealias SubSequence = Slice<Base>
  public typealias Iterator = IndexingIterator<Slice<Base>>

  public var startIndex: Index {
    return _startIndex
  }
  public var endIndex: Index {
    return _endIndex
  }
  public subscript(index: Index) -> Base.Element {
    get {
      _failEarlyRangeCheck(index, bounds: startIndex..<endIndex)
      return _base[index]
    }
  }
  public subscript(bounds: Range<Index>) -> Slice<Base> {
    get {
      _failEarlyRangeCheck(bounds, bounds: startIndex..<endIndex)
      return Slice(base: _base, bounds: bounds)
    }
  }
}

Slice 实质上仅且修改 Collection 的索引，从而达到获取子序列的效果。

Custom Collection

在了解了 Collection 及其相关的一些协议后，我们来尝试自己实现一个简单的 Collection，这个集合包含 1 ~ 10 的 Int 类型的元素，并支持下标访问和多次遍历：

struct SquareCollection: Collection { 
    typealias Element = Int
    typealias Index = Int
    typealias Indices = Range<Int>
    let contents = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    var indices: Range<Int> = 0..<10
    
    public func index(after i: Int) -> Int {
        return i + 1
    }
    public var startIndex: Int {
        get {
            return indices.startIndex
        }
    }
    public var endIndex: Int {
        get {
            return indices.endIndex
        }
    }
    public subscript(index: Int) -> Element {
        get {
            return contents[index]
        }
    }
}

我们在 SquareCollection 实现了 Collection 协议，提供了索引功能，然后通过 subscript 方法就可以达到通过下标访问元素的效果：

let collection = SquareCollection()
print(collection[1])  // 2
print(collection[3])  // 4

这个自定义的集合，有点类似一个写死的数组，它拥有 Sequence 与 Collection 协议的所有已实现的方法，比如：

let newCollection = collection.dropLast()
print(newCollection.count) // 9

当然这离真正的数组还有很远的距离，比如设置初始化元素、越界判断、通过下标修改元素值，关于通过下标修改元素，Swift 中通过 MutableCollection 协议来实现，可以看下 MutableCollection 的核心代码：

public protocol MutableCollection: Collection
where SubSequence: MutableCollection
{
  override subscript(position: Index) -> Element { get set }
  override subscript(bounds: Range<Index>) -> SubSequence { get set }
}

extension MutableCollection {
  public subscript(bounds: Range<Index>) -> Slice<Self> {
    get {
      _failEarlyRangeCheck(bounds, bounds: startIndex..<endIndex)
      return Slice(base: self, bounds: bounds)
    }
    set {
      _writeBackMutableSlice(&self, bounds: bounds, slice: newValue)
    }
  }
}

最核心的是重写了 subscript 方法，提供了对外的 set 方法，当然集合家族中，还有非常的辅助协议，用来提高集合家族的效率与性能，这里暂不深入讲解，后面内容涉及到的话再详细介绍。

集合与算法

集合大家族中，除了定义相关的数据结构外，还提供了大量的算法实现，以提高 Swift 的便捷性与高性能，以 Sequence 为例，我们可以看下 SequenceAlgorithms.swift 内容：

public func enumerated() -> EnumeratedSequence<Self>
public func min() -> Element?
public func max() -> Element?
public func starts<PossiblePrefix: Sequence>(
    with possiblePrefix: PossiblePrefix
  ) -> Bool
public func elementsEqual<OtherSequence: Sequence>
...
...

SequenceAlgorithms 实现了大量的基础算法，其具体的实现就不展示讨论，当然 CollectionAlgorithms 也是类似的，目的就是让开发者更便捷地使用集合的同时又保证它的性能稳定。如果我们开发组件或者开源库的时候，这是一种值得参考的思路，在功能完善的同时，我们还要保证使用者的便捷与接口的高性能，当然这是题外话了，就不过多去讨论。

Lazy

What is Lazy

Lazy 惰性加载，最典型的用例是懒加载，定义时声明具体操作，但只有在第一次使用时才去真正创建：

// 仅声明了 titleLabel，并没 UILabel 对象
lazy var titleLabel : UILabel = {
    let label = UILabel()
    return label
}() 
// 访问 titleLabel 时会去创建 UILabel 对象
self.titleLabel

通过这么一个简单的例子，初步了解 Swift 的 Lazy 是什么意思，当然懒加载只是其中的一种用法，下面我们来了解下集合中的 Lazy 作用。

LazySequence

在 Sequence 和 Collection 中已经了很多类似 map 和 filter 这些高阶函数，它们的思路是将一些通用的算法封装成简单的单行函数。但是有些时候我们并不需要立马执行这些辅助数，我们希望是的使用时再去执行辅助函数，定义时只做声明，这个时候就需要使用到 lazy，举个例子：

let allUser = [0, 1, 0, 1, 0, 2, 3, 4, 2, 1]
// 定义时直接执行 filter 方法，如果 agents 没有被使用，则浪费了资源
let agents = allUser.filter { $0 == 1 }

// 使用时才执行 filter 方法，如果不使用也不会浪费资源
let lazyAgents = allUser.lazy.filter { $0 == 1 }

Swift Sequences 和 Collections 给我提供通过 lazy 的方式来执行辅助函数，目的是为了给开发者提供便捷的辅助函数并又保持其高性能。那么这里来看下 Sequences 是怎么实现 lazy 的。

我们可以通过查看 LazySequence.swift 了解具体实现，核心代码如下：

public protocol LazySequenceProtocol : Sequence {
  associatedtype Elements: Sequence = Self where Elements.Element == Element
  var elements: Elements { get }
}
extension LazySequenceProtocol where Elements == Self {
  public var elements: Self { return self }
}
extension LazySequenceProtocol {
  public var lazy: LazySequence<Elements> {
    return elements.lazy
  }
}
extension LazySequenceProtocol where Elements: LazySequenceProtocol {
  public var lazy: Elements {
    return elements
  }
}
public struct LazySequence<Base : Sequence> {
  internal var _base: Base
  internal init(_base: Base) {
    self._base = _base
  }
}

从源码角度来看，非常有意思的是 LazySequence 并没有做任何的额外事情，只是新定义了一种类型。那么它具体是怎么实现，这个我们需要看具体的辅助方法，比如 Filter，我们可以从 Filter.swift 中找到以下代码：

extension LazyFilterSequence.Iterator: IteratorProtocol, Sequence {
  public typealias Element = Base.Element
  public mutating func next() -> Element? {
    while let n = _base.next() {
      if _predicate(n) {
        return n
      }
    }
    return nil
  }
}

Lazy 的功能，实质上是通过 extension 方法，在迭代器执行 next() 方法时，对获取到的元素执行 filter() 后再进行返回，从而实现了 lazy 惰性执行的效果。当然其它的 map()、flatMap() 等方法 lazy 实现也是一样的，这里就不一一列举了。

总结

在这一章内容中，介绍 Collection 大家族中几个重要的协议，初步了解它们是如何去实现与工作的。但是，它们在 Collection 大家族中，不过是冰山一角，还有更多的内容等待我们去深入挖掘，我们可以尝试通过解决下面几个问题去加深对集合类型的了解：

有几种不同形式的 Sequence 与 Collection，它们的作用是什么？
为什么 Iterator、LazySequence 等使用的是值类型，这样的目的是为了什么？
Sequence 与 Collection 有哪些高阶函数，他们是如何去实现的？

如果能答对上面的问题，估计会对 Swift 的集合类型有更深入的了解，下一篇我们会从 Array、Dictionary、Set 的具体实现来深入了解 Swift 的集合家族，同时也会尝试去实现我们自定义的功能齐全的集合类型。

更多内容，请关注我的公众号：

关注公众号

最后编辑于：2020.04.12 23:53:14

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 203,324评论 5赞 476
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 85,303评论 2赞 381
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 150,192评论 0赞 337
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 54,555评论 1赞 273
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 63,569评论 5赞 365
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 48,566评论 1赞 281
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 37,927评论 3赞 395
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 36,583评论 0赞 257
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 40,827评论 1赞 297
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 35,590评论 2赞 320
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 37,669评论 1赞 329
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 33,365评论 4赞 318
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 38,941评论 3赞 307
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 29,928评论 0赞 19
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 31,159评论 1赞 259
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 42,880评论 2赞 349
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 42,399评论 2赞 342