LeetCode 146 - LRU Cache

This is my 72nd LeetCode solution write-up, and it is a design problem.

1 Problem Description

Using the data structures you know, design and implement an LRU (Least Recently Used) cache. It should support the following two operations: get and put.

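get(key): if the key exists in the cache, return its value (always positive); otherwise return -1.

put(key, value): if the key already exists, update its value; otherwise insert the key/value pair. When the cache has reached its capacity, it should evict the least recently used entry before writing the new one, to make room for it.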
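To make the get/put semantics concrete, here is a minimal reference model for checking behaviour only; it is not the solution developed below and makes no O(1) claim. It assumes Python 3.7+ (where a plain dict preserves insertion order) and capacity >= 1; the helper name replay and the (op, args) encoding are hypothetical.

```python
def replay(capacity, ops):
    """Replay (op, args) pairs against a minimal LRU reference model.

    Returns the results of the get calls, in order. Assumes capacity >= 1.
    Relies on Python 3.7+ dicts preserving insertion order, so the first
    key in `cache` is always the least recently used one.
    """
    cache = {}
    results = []
    for op, args in ops:
        if op == "get":
            key = args[0]
            if key in cache:
                cache[key] = cache.pop(key)  # re-insert: key becomes most recently used
                results.append(cache[key])
            else:
                results.append(-1)
        else:  # "put"
            key, value = args
            if key in cache:
                cache.pop(key)  # updating an existing key also refreshes its recency
            elif len(cache) >= capacity:
                cache.pop(next(iter(cache)))  # evict the least recently used key
            cache[key] = value
    return results


EXAMPLE_OPS = [
    ("put", (1, 1)), ("put", (2, 2)), ("get", (1,)),
    ("put", (3, 3)),  # cache full: evicts key 2
    ("get", (2,)),
    ("put", (4, 4)),  # cache full: evicts key 1
    ("get", (1,)), ("get", (3,)), ("get", (4,)),
]
print(replay(2, EXAMPLE_OPS))  # [1, -1, -1, 3, 4]
```

Note how the get(1) just before put(3, 3) makes key 1 recently used, so key 2 is the one evicted.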

2 Solution

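First, a quick review of what LRU (Least Recently Used) means.

LRU is a widely used page replacement algorithm. Because a cache has limited capacity, once it is full some entries must be evicted; LRU evicts the page/entry that has gone the longest without being accessed, so that recently accessed content stays in the cache.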
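The eviction rule is easy to see on a page reference string. The sketch below (the function name lru_page_faults is hypothetical) keeps resident pages in a Python list ordered from least to most recently used. That costs O(n) per access, but it makes the rule easy to follow; it assumes frames >= 1.

```python
def lru_page_faults(refs, frames):
    """Simulate LRU page replacement; return (number of faults, evicted pages in order)."""
    resident = []  # index 0 = least recently used, last index = most recently used
    faults = 0
    evicted = []
    for page in refs:
        if page in resident:
            resident.remove(page)  # hit: take it out so it can be re-appended as most recent
        else:
            faults += 1  # miss: the page has to be loaded
            if len(resident) == frames:
                evicted.append(resident.pop(0))  # evict the least recently used page
        resident.append(page)  # the accessed page is now the most recently used
    return faults, evicted


print(lru_page_faults([1, 2, 3, 1, 4, 2], 3))  # (5, [2, 3])
```

Page 1 survives the arrival of page 4 because it was referenced again just before, which leaves page 2 as the least recently used.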

2.1 Baseline

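If time complexity does not matter, this problem is easy: on every put or get, simply scan all entries in O(n). Each entry records a timestamp of its last access; get scans for the key, and when the cache is full, put scans for the entry with the smallest timestamp and overwrites it. The code below is about the most naive version possible, so just have a laugh at it~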

class Base(object):
    def __init__(self, key = None, value = None, timestamp = -1):
        self.key = key
        self.value = value
        self.timestamp = timestamp
    

class LRUCache(object):

    def __init__(self, capacity):
        """
        :type capacity: int
        """
        self.capacity = capacity
        self.size = 0
        self.records = [Base() for i in range(capacity)]
        self.max_ts = 0

    def get(self, key):
        """
        :type key: int
        :rtype: int
        """
        for i in range(self.size):
            if self.records[i].key == key:
                self.records[i].timestamp = self.max_ts
                self.max_ts += 1
                return self.records[i].value
        return -1


    def put(self, key, value):
        """
        :type key: int
        :type value: int
        :rtype: None
        """
        # If the key already exists, just update its value and refresh its timestamp
        for i in range(self.size):
            if self.records[i].key == key:
                self.records[i].value = value
                self.records[i].timestamp = self.max_ts
                self.max_ts += 1
                return
        
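        # If the key does not exist, there are two cases:
        # 1 size < capacity: fill the next unused slot (index size)
        # 2 size == capacity: scan all slots for the smallest timestamp (least recently used) and overwrite it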
        if self.size < self.capacity:
            self.records[self.size].key = key
            self.records[self.size].value = value
            self.records[self.size].timestamp = self.max_ts
            self.max_ts += 1
            self.size += 1
        else:
            min_ts_idx = 0
            for i in range(self.capacity):
                if self.records[i].timestamp < self.records[min_ts_idx].timestamp:
                    min_ts_idx = i
            self.records[min_ts_idx].key = key
            self.records[min_ts_idx].value = value
            self.records[min_ts_idx].timestamp = self.max_ts
            self.max_ts += 1


# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)

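Of course, the problem is not meant to be that simple: the follow-up asks for get and put in O(1) time. First, let's analyze what is wrong with the solution above: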

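First, the <key, value> pairs are simply stored in an array whose elements have no relationship to one another, so when the cache is full, put has to scan from left to right to find the entry with the smallest timestamp before overwriting it. That is far too slow.
Second, since there is no index such as a hash table, every get has to scan the array from left to right, which can never be O(1). A better approach is to build a hash index that locates a key directly.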

2.2 Optimization

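How do we optimize it? For the second problem, use a hash map keyed by key, so we can tell in O(1) whether a <key, value> pair is in the cache; the hash map's value is a pointer (reference) to the linked-list node holding that pair. For the first problem, take advantage of the fact that a linked list can insert or delete a node in O(1) once you hold a reference to it: keep the nodes ordered by access time, with the most recently accessed node at the head, so the least recently used node is always at the tail. To make deleting and inserting nodes easy, use a doubly linked list.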
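As an aside, Python's standard library already pairs a hash table with an ordered sequence: collections.OrderedDict remembers key order and offers move_to_end(key) and popitem(last=False), both O(1) (CPython implements it as a dict plus a doubly linked list). A compact alternative, assuming the library class is allowed (interviews often forbid it, which is why the hand-written list below is still worth knowing), might look like the sketch below; the class name OrderedDictLRU is hypothetical. Unlike the design below, it keeps the most recently used key at the end rather than at the head.

```python
from collections import OrderedDict


class OrderedDictLRU:
    """Sketch of an LRU cache on top of OrderedDict (oldest key first, newest last)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()  # key -> value, least recently used first

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)  # updating also refreshes recency
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used key (the front)


cache = OrderedDictLRU(2)
cache.put(1, 1)
cache.put(2, 2)
cache.put(1, 10)  # key 1 is now the most recently used
cache.put(3, 3)   # over capacity: evicts key 2
print(cache.get(2), cache.get(1), cache.get(3))  # -1 10 3
```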

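Now that the overall idea is clear, let's work out how to implement this data structure, one piece at a time. First, here is the structure diagram of my hash map + doubly linked list design: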

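Since Python has dict (and C++ has std::unordered_map; note that std::map is a balanced tree with O(log n) lookups), the hash map needs no special design. What is worth implementing or customizing yourself is the doubly linked list. My design defines two functions: remove deletes a given node from the list, and insert inserts a node at the head of the list. These two functions are all we need; their implementation is shown in the figure below: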

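One more trick for the doubly linked list: create the head and tail nodes when the list is initialized. These two dummy (sentinel) nodes always exist and store no useful data, but they remove many unnecessary boundary checks from the list operations, because every real node is guaranteed to have a non-null pre and next~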
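To see what the dummy nodes save, here is a sketch of the same insert and remove on a doubly linked list without them (the class names Node and NoSentinelList are hypothetical, for contrast only). Every operation now has to check whether a neighbor is missing and patch head or tail by hand; with a dummy head and tail, all of these branches disappear.

```python
class Node:
    def __init__(self, key=None, val=None):
        self.key = key
        self.val = val
        self.pre = None
        self.next = None


class NoSentinelList:
    """Doubly linked list WITHOUT dummy head/tail nodes, for contrast only."""

    def __init__(self):
        self.head = None  # first real node, or None when the list is empty
        self.tail = None  # last real node, or None when the list is empty

    def insert(self, node):
        """Insert node at the front of the list."""
        node.pre = None
        node.next = self.head
        if self.head is None:  # empty list: the new node is also the tail
            self.tail = node
        else:
            self.head.pre = node
        self.head = node

    def remove(self, node):
        """Unlink node; either neighbor may be missing."""
        if node.pre is None:  # node was the head
            self.head = node.next
        else:
            node.pre.next = node.next
        if node.next is None:  # node was the tail
            self.tail = node.pre
        else:
            node.next.pre = node.pre
        node.pre = node.next = None

    def keys(self):
        """Return keys from head to tail."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out


lst = NoSentinelList()
a, b, c = Node(1), Node(2), Node(3)
for n in (a, b, c):
    lst.insert(n)
print(lst.keys())  # [3, 2, 1]
lst.remove(c)      # removing the head needs its own branch
lst.remove(a)      # and so does removing the tail
print(lst.keys())  # [2]
```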

class ListNode(object):
    def __init__(self, key = None, val = None):
        self.key = key
        self.val = val
        self.pre = None
        self.next = None

class LinkList(object):
    def __init__(self):
        self.head = ListNode()
        self.tail = ListNode()
        self.head.next = self.tail
        self.tail.pre = self.head
    
    def remove(self, node):
        """从双向链表中删除当前节点"""
        node.pre.next = node.next
        node.next.pre = node.pre
    
    def insert(self, node):
        """添加节点到链表的头部"""
        node.pre = self.head
        node.next = self.head.next
        self.head.next = node
        node.next.pre = node

class LRUCache(object):
    def __init__(self, capacity):
        """
        :type capacity: int
        """
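        self.val_hash = dict()  # key -> ListNode holding that key/value pair
        self.capacity = capacity
        self.linklist = LinkList()  # nodes ordered from most recently used (head) to least (tail)
        self.size = 0

    def _print_val_hash(self):
        """Debug helper: print values from most to least recently used."""
        ret = ""
        node = self.linklist.head.next
        while node is not self.linklist.tail:
            ret += str(node.val) + "->"
            node = node.next
        print(ret)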

    def get(self, key):
        """
        :type key: int
        :rtype: int
        """
        if key in self.val_hash:
            cur_node = self.val_hash[key]
            self.linklist.remove(cur_node)
            self.linklist.insert(cur_node)
            return self.val_hash[key].val
        return -1

    def put(self, key, value):
        """
        :type key: int
        :type value: int
        :rtype: None
        """
        # If the key already exists, update its value and move its node to the front of the list
        if key in self.val_hash:
            cur_node = self.val_hash[key]
            cur_node.val = value
            self.linklist.remove(cur_node)
            self.linklist.insert(cur_node)

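        # If the key does not exist, there are two cases:
        # 1 size < capacity: create a new node and insert it at the head of the list
        # 2 size == capacity: evict the node just before tail (the least recently used), then insert the new node at the head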
        elif self.size < self.capacity:
            new_node = ListNode(key, value)
            self.val_hash[key] = new_node
            self.linklist.insert(new_node)
            self.size += 1
        else:
            del self.val_hash[self.linklist.tail.pre.key]
            self.linklist.remove(self.linklist.tail.pre)
            new_node = ListNode(key, value)
            self.val_hash[key] = new_node
            self.linklist.insert(new_node)


# Your LRUCache object will be instantiated and called as such:
# obj = LRUCache(capacity)
# param_1 = obj.get(key)
# obj.put(key,value)

That wraps up this problem. Besides LRU, there is also a Hard problem, LFU Cache (LeetCode 460); I'll leave that one for a future post.