244. Shortest Word Distance II (M)

Design a class which receives a list of words in the constructor, and implements a method that takes two words word1 and word2 and return the shortest distance between these two words in the list. Your method will be called repeatedly many times with different parameters.

Example:
Assume that words = ["practice", "makes", "perfect", "coding", "makes"].

Input: word1 = “coding”, word2 = “practice”
Output: 3
Input: word1 = "makes", word2 = "coding"
Output: 1
Note:
You may assume that word1 does not equal to word2, and word1 and word2 are both in the list.

我的答案：offline把每个word的index数组都存起来，然后online的时候找最近

class WordDistance {
private:
    unordered_map<string, vector<int>> word2index;
public:
    WordDistance(vector<string>& words) {
        int len = words.size();
        string word;
        for (int i=0; i<len; ++i) {
            word = words[i];
            if (word2index.find(word) == word2index.end())
                word2index[word] = {i};
            else
                word2index[word].push_back(i);
        }
    }
    
    int shortest(string word1, string word2) {
        int ans = INT_MAX;
        
        vector<int> indexes1 = word2index[word1];
        vector<int> indexes2 = word2index[word2];
        int len1 = indexes1.size();
        int len2 = indexes2.size();
        
        int i1 = 0, i2 = 0;
        while (i1 < len1 and i2 < len2) {
            int ans_temp = abs(indexes1[i1] - indexes2[i2]);
            if (ans_temp < ans) { //check 
                ans = ans_temp;
            }
            
            if (indexes1[i1] < indexes2[i2])
                ++i1;
            else
                ++i2;
        }
        
        return ans;
    }
};

/**
 * Your WordDistance object will be instantiated and called as such:
 * WordDistance* obj = new WordDistance(words);
 * int param_1 = obj->shortest(word1,word2);
 */

Runtime: 96 ms, faster than 5.76% of C++ online submissions for Shortest Word Distance II.
Memory Usage: 20.3 MB, less than 25.08% of C++ online submissions for Shortest Word Distance II.

很慢！
看了下其他答案和标准答案，意思是一样的，可以改进的地方是，constructor里面不需要判断是否存在words[i]，直接push_back就行了

Solution

Before looking at the solution for this problem, let's look at what the problem asks us to do in simpler terms. We have to design a class which receives a list of words as input in the constructor. The class has a function which we need to implement and that function is shortest which takes two words as input and returns the minimum distance between the two as the output.

When the problem talks about the distance between two words, it essentially means the absolute gap between the indices of the two words in the list. For e.g. if the first word occurs at a location i and the second word occurs at the location j, then the distance between the two would be abs(i - j).

The question asks us to find the minimum such different between words which clearly indicates that the words can occur at multiple locations. If we have K occurrences for the word1 and L occurrences for the word2, then iteratively checking every pair of indices will give us a <math><semantics><annotation encoding="application/x-tex">O(N^2)</annotation></semantics></math>O(N2) algorithm which won't be optimal at all. We won't discuss that algorithm here since it is very straightforward.

The brute-force algorithm would simple consider all possible pairs of indices for (word1_location, word2_location) and see which one produces the minimum distance. Let's try and build on this idea and see if some pre-processing can help us out reduce the complexity of the brute-force algorithm.

Approach 1: Using Preprocessed Sorted Indices

Intuition

A given word can occur multiple times in the original word list. Let's suppose the first word, word1 in the input to the function shortest occurs at the indices [i1, i2, i3, i4] in the original list. Similarly, let's assume that the second word, word2, appears at the following locations inside the word list [j1, j2, j3].

Now, given these list of indices, we are to simply find the pair of indices (i, j) such that their absolute difference is minimum.

The main idea for this approach is that if the list of these indices is in sorted order, we can find such a pair in linear time.

The idea is to use a two pointer approach. Let's say we have a pointer i for the sorted list of indices of word1 and j for the sorted list of indices of word2. At every iteration, we record the difference of indices i.e. abs(word1[i] - word2[j]). Once we've done that, we have two possible choices for progressing the two pointers.

If this is the case, that means there is no point in moving the j pointer forward. The location indices for the words are in a sorted order. We know that word2[j + 1] > word2[j] because these indices are sorted. So, if we move j forward, then the difference abs(word1[i] - word2[j + 1]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.

So, if we have (word1[i] < word2[j]), we move the pointer 'i' one step forward i.e. (i + 1) in the hopes that abs(word1[i + 1] - word2[j]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because it is not certain this improvement would happen.

Let's look at two different examples. In the first example we will see that moving i forward gave us the best difference overall (0). In the second example we see that moving i forward leads us to our second case (yet to discuss) but doesn't lead to any improvement in the difference.

Example-1

word1_locations = [2,4,5,9]
word2_locations = [4,10,11]

i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
  move i one step forward

i, j = 1, 0 (abs(4 - 4))
min_diff = 0 (We hit the jackpot!)

Example-2

word1_locations = [2,7,15,16]
word2_locations = [4,10,11]

i, j = 0, 0
min_diff = 2 (abs(2 - 4))
word1[i] < word2[j] i.e. 2 < 4
  move i one step forward

i, j = 1, 0
min_diff = 2 (2 < abs(7 - 4))

Here, we did not update out global minimum difference.
That is why we said earlier, moving 'i' forward may or
may not give a lower difference. But moving 'j' forward in
our case would definitely worsen the difference (or keep it same!).

Let's move onto our second scenario.

If this is the case, that means there is no point in moving the i pointer forward. We know that word1[i + 1] > word2[j] because these indices are sorted. So, if we move i forward, then the difference abs(word1[i + 1] - word2[j]) would be even greater than abs(word1[i] - word2[j]). That doesn't help us since we want to find the minimum possible distance (difference) overall.

So, along the similar lines of thought as the previous case, if we have (word1[i] > word2[j]), we move the pointer 'j' one step forward i.e. (j + 1) in the hopes that abs(word1[i] - word2[j + 1]) would give us a lower distance than abs(word1[i] - word2[j]). We say "hopes" because as showcased in the previous scenario, it is not certain this improvement would happen.

Now let's formally look at the algorithm for solving this problem.

Algorithm

In the constructor of the class, we simply iterate over the given list of words and prepare a dictionary, mapping a word to all it's locations in the array.
Since we process all the words from left to right, we will get all the indices in a sorted order by default for all the words. So, we don't have to sort the indices ourselves.
Let's call the dictionary that we build, locations.
For a given pair of words, obtain the list of indices (appearances inside the original list/array of words). Let's call the two arrays loc1 and loc2.
Initialize two pointer variables l1 = 0 and l2 = 0.
For a given l1 and l2, we first update (if possible) the minimum difference (distance) till now i.e. dist = min(dist, abs(loc1[l1] - loc2[l2])). Then, we check if loc1[l1] < loc2[l2] and if this is the case, we move l1 one step forward i.e. l1 = l1 + 1. Otherwise, we move l2 one step forward i.e. l2 = l2 + 1.
We keep doing this until all the elements in the smaller of the two location arrays are processed.
Return the global minimum distance between the words.

image

</center>

This represents the locations dictionary that we should build given the original words list in the constructor. The key represents the word and the value is a list containing indices in ascending order of occurrences throughout the array. Let's look at the minimum distance between the words apple and football in the array. So, we will be considering the two sorted lists of indices: [3, 6, 8, 12] and [2, 7, 9].

image.png

13 / 14

</center>

Complexity analysis

Time complexity : The time complexity of the constructor of our class is <math><semantics><annotation encoding="application/x-tex">O(N)</annotation></semantics></math>O(N) considering there were <math><semantics><annotation encoding="application/x-tex">N</annotation></semantics></math>N words in the original list. We iterate over them and prepare a mapping from key to list of indices as described before. Then, for the function that finds the minimum distance between the two words, the complexity would be <math><semantics><annotation encoding="application/x-tex">O(max(K, L))</annotation></semantics></math>O(max(K,L)) where <math><semantics><annotation encoding="application/x-tex">K</annotation></semantics></math>K and <math><semantics><annotation encoding="application/x-tex">L</annotation></semantics></math>L represent the number of occurrences of the two words. However, <math><semantics><annotation encoding="application/x-tex">K = O(N)</annotation></semantics></math>K=O(N) and also <math><semantics><annotation encoding="application/x-tex">L = O(N)</annotation></semantics></math>L=O(N). Therefore, the overall time complexity would also be <math><semantics><annotation encoding="application/x-tex">O(N)</annotation></semantics></math>O(N). The reason the complexity is <math><semantics><annotation encoding="application/x-tex">O(max(K, L))</annotation></semantics></math>O(max(K,L)) and not <math><semantics><annotation encoding="application/x-tex">O(min(K, L))</annotation></semantics></math>O(min(K,L)) is because of the scenario where the minimum element of the smaller list is larger than all the elements of the larger list. In that scenario, the pointer for the smaller list will not progress at all and the one for the longer list will reach to the very end.
Space complexity: <math><semantics><annotation encoding="application/x-tex">O(N)</annotation></semantics></math>O(N) for the dictionary that we prepare in the constructor. The keys represent all the unique words in the input and the values represent all of the indices from <math><semantics><annotation encoding="application/x-tex">0 ... N</annotation></semantics></math>0...N.

244. Shortest Word Distance II (M)

Solution

Approach 1: Using Preprocessed Sorted Indices

推荐阅读更多精彩内容