CodingBison - Hash Tables: Problems

Let us go through some of the common problems related to hash tables.

Canonical Forms

Some of the hash table problems require us to use canonical form. Here is the definition of canonical form (from Wikipedia): In computer science, data that has more than one possible representation can often be canonicalized into a completely unique representation called its canonical form.

Two more examples of canonical forms:

Irreducible fractions. So, 12/15 = 4/5, 24/30 is same as 4/5.. So, 4/5 is the canonical form of all of these fractions.
Irreducible form for equations: Simplified linear equations. Equations "3y=15x+21" or "2*y-10*x-14" have the same form "y=3x+7"

Group Anagrams

Problem: https://leetcode.com/problems/group-anagrams/

The goal is to group words that are anagrams of each other. We are given an array of strings.

 Input: ["eat", "tea", "tan", "ate", "nat", "bat"],
 Output::
 [
   ["ate", "eat","tea"],
   ["nat","tan"],
   ["bat"]
 ]

All anagrams have same chars (including their frequencies) in their string. So, if we can sort their chars, then we can use that as a signature for this canonical form. In Java, we can simply sort the String. Since String does not have a direct method to sort, we would have to create a new char array (using toCharArray() method), then we need to call Arrays.sort() on that, and lastly, we need to create a new String. This new String would be the key.

 public List<List groupAnagrams(String[] strs) {
     List<List ret = new ArrayList<>();
     Map<String, List map = new HashMap<>();
     List<String> tempList;

     for (String s: strs) {
         char arr[] = s.toCharArray();
         Arrays.sort(arr);
         String key = new String(arr);

         if (!map.containsKey(key)) {
             map.put(key, new ArrayList<String>());
         }
         map.get(key).add(s);
     }

     return new ArrayList<List(map.values());
 }

Group Shifted Strings

Problem: https://leetcode.com/problems/group-shifted-strings/

Group strings that are shifted versions of each other. Thus, given a string, we can "shift" each of its letter to its successive letter, for example: "abc" -> "bcd". We can keep "shifting" which forms the sequence:

"abc" -> "bcd" -> ... -> "xyz" Given a list of strings which contains only lowercase alphabets, group all strings that belong to the same shifting sequence.

For example:

 Input: ["abc", "bcd", "acef", "xyz", "az", "ba", "a", "z"],
 Output: 
 [
   ["abc","bcd","xyz"],
   ["az","ba"],
   ["acef"],
   ["a","z"]
 ]

We need to find the signature of each word (their canonical form). Get the difference of each character wrt first character and then add 'a'. Btw, if the current character is less than first character, then we should add 26 as well. Once you have the signature, we can put them in a hashtable (hashmap in Java).

 public List<List groupShiftedStrings(String[] strings) {
     Map<String,List group_map = new HashMap<>();
     for(String s: strings){
         StringBuilder key  = new StringBuilder();
         for(int i = 0; i < s.length();i++){
             int relative_distance = (s.charAt(i) - s.charAt(0) + 26) % 26;
             // in case [0,1,11] [0,11,1], so i add "." to key.
             key.append(".");
             key.append(Integer.toString(relative_distance));
         }
         String k = key.toString();
         if(!group_map.containsKey(k)){
             List<String> value = new ArrayList<>();
             group_map.put(k,value);
         }
         group_map.get(k).add(s);
     }

     List<List output = new ArrayList<>();
     for(String key: group_map.keySet()){
         List<String> value = new ArrayList<>();
         value = group_map.get(key);
         output.add(value);
     }
     return output;
 }

Unique Word Abbreviation

Problem: https://leetcode.com/problems/unique-word-abbreviation/

Goal is to determine if a word's abbreviation is unique in a dictionary. An abbreviation of a word follows the form <first letter><number><last letter>. Below are some examples of word abbreviations:

Example of abbreviations:


 a) it                      --> it    (no abbreviation)

      1
 b) d|o|g                   --> d1g

               1    1  1
      1---5----0----5--8
 c) i|nternationalizatio|n  --> i18n

               1
      1---5----0
 d) l|ocalizatio|n          --> l10n

Now, given a dictionary and given a word, find whether its abbreviation is unique in the dictionary. A word's abbreviation is unique if no other word from the dictionary has the same abbreviation.

 Input 
 dictionary = [ "deer", "door", "cake", "card" ]

 Output:
 isUnique("dear") -> false
 isUnique("cart") -> true
 isUnique("cane") -> false
 isUnique("make") -> true

The idea here is that given a word, generate a signature for the canonical form (much like "Group Anagrams" and "Group Shifted Strings"). Once you have the signature, add that signature as the key in a hashtable. The value of the hashtable can be a pair (that holds a string and a count). Do this for all the words in the dictionary.

Next, when you need to check if a word is unique, make a signature, do a lookup. If the lookup fails, then yes. But, if the lookup does not fail and the count for that key in the hashtable is 1 AND the string is same as the passed string, then also, this woudl be considered unique.

Please note that, if the input array has the word twice {"dog", "dog"} in that case, the count should be 1. Else, if someone checks for "dog", then we would find the count as 2 and return false (since we check only if the count is 1). One way to handle this would be to increment the count (when adding to the hashtable) only if the word is different from the one that is already in the hashtable for that signature.

So, if input is ["door","cake","card"], then the word "door" is unique. The word "door" would be unique also if we have ["door","door","cake","card"]. So, since the "door" is a duplicate in the key.

Here is the solution in Java:

 public class ValidWordAbbr {
     HashMap<String, String> map;
     public ValidWordAbbr(String[] dictionary) {
         map = new HashMap<String, String>();
         for(String str:dictionary){
             String key = getKey(str);
             // If there is more than one string belong to the 
             // same key then the key will be invalid, we set 
             // the value to ""
             if(map.containsKey(key)){
                 if(!map.get(key).equals(str)){
                     map.put(key, "");
                 }
             }
             else{
                 map.put(key, str);
             }
         }
     }

     public boolean isUnique(String word) {
         return !map.containsKey(getKey(word))||map.get(getKey(word)).equals(word);
     }

     String getKey(String str){
         if(str.length()<=2) return str;
         return str.charAt(0)+Integer.toString(str.length()-2)+str.charAt(str.length()-1);
     }
 }

Caches

LRU cache

Problem: https://leetcode.com/problems/lru-cache/

We need to implement a cache with the Least Recently Used (LRU) eviction policy. This is a very popular question.

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and put.

get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
put(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

 Example:

 LRUCache cache = new LRUCache( 2 /* capacity */ );

 cache.put(1, 1);
 cache.put(2, 2);
 cache.get(1);       // returns 1
 cache.put(3, 3);    // evicts key 2
 cache.get(2);       // returns -1 (not found)
 cache.put(4, 4);    // evicts key 1
 cache.get(1);       // returns -1 (not found)
 cache.get(3);       // returns 3
 cache.get(4);       // returns 4

Keep a hashtable, where every value is a pointer to a doubly-linked list (DLL). Keep key and value both in the DLL node. For the DLL, using a dummy head and dummy tail, makes the code simpler.

When we insert an element in cache (put() method), insert it into the hashtable and also add it in teh DLL (at the tail of the DLL using addNode()). If the element exists in the hashtable, then simply move the pointer to the head of the DLL.

If the size of the DLL goes beyond a LIMIT, then delete the element from the head. You would need to store the key in the DLL node since when we delete the DLL node, we should also delete the hashtable value.

When we delete an element in the cache, delete the hashtable entry and also in the DLL node.

 class LRUCache {
     int size;
     HashMap<Integer, Node> map = new HashMap<>();
     Node dummyHead, dummyTail;

     public LRUCache(int capacity) {
         size = capacity;
         dummyHead = new Node(0,0);
         dummyTail = new Node(0,0);
         dummyHead.next = dummyTail;
         dummyTail.prev = dummyHead;
     }

     public int get(int key) {
         int value = -1;

         if (map.containsKey(key)) {
             Node node = map.get(key);
             deleteNode(node);
             addNode(node);
             value = node.val;
         }
         return value;
     }

     public void put(int key, int value) {
         Node node;
         if (map.containsKey(key)) {
             node = map.get(key);
             node.val = value;
             deleteNode(node);
         } else {
             node = new Node(key, value);
             map.put(key, node);
         }
         addNode(node);
         resetSize();
     }

     // add at tail
     void addNode (Node node) {
         dummyTail.prev.next = node;
         node.prev = dummyTail.prev;
         node.next = dummyTail;
         dummyTail.prev = node;
     }

     void deleteNode (Node node) {
         node.prev.next = node.next;
         node.next.prev = node.prev;
     }

     // delete from head
     public void resetSize() {
         if (map.size() > size) {
             map.remove(dummyHead.next.key);
             deleteNode(dummyHead.next);
         }
     }
 }

 class Node {
     Node prev, next;
     int key, val;

     public Node (int k, int v) {
         key = k;
         val = v;
     }
 }

It would be incomplete if we were not to not provide an implementation of this in Java LinkedHashMap which internally stores the insertion order (using a DLL!). When resetting the size we simply traverse the keys in the LinkedHashMap (using map.keySet()) and that allows us to get hold of the earliest element.

 class LRUCache {
     int size;
     LinkedHashMap<Integer, Integer> map = new LinkedHashMap<>();

     public LRUCache(int capacity) {
         size = capacity;
     }

     public void resetSize() {
         int out = 0;
         if (map.size() > size) {
             // instead of this, you can use starKey = map.iterator().next()
             for (int key: map.keySet()) {
                 out = key;
                 break;
             }
             map.remove(out);
         }
     }

     public int get(int key) {
         int ret = -1;
         if (map.containsKey(key)) {
             int ret = map.get(key);
             map.remove(key);
             map.put(key, ret);
          }
          return ret;
     }

     public void put(int key, int value) {
         if (map.containsKey(key)) { // you can remove without check. If the key is not there, then it will be a no-op.
             map.remove(key);
         }

         map.put(key, value);
         resetSize();
     }
 }

Hash-table with Binary Search Tree

Design a Leaderboard

Problem: https://leetcode.com/problems/design-a-leaderboard/

The goal here is to design a Leaderboard class, which has 3 functions:

addScore(playerId, score): Update the leaderboard by adding score to the given player's score. If there is no player with such id in the leaderboard, add him to the leaderboard with the given score.
top(K): Return the score sum of the top K players.
reset(playerId): Reset the score of the player with the given id to 0. It is guaranteed that the player was added to the leaderboard before calling this function.

Initially, the leaderboard is empty. Here is an example:

 Input:
 ["Leaderboard","addScore","addScore","addScore","addScore","addScore","top","reset","reset","addScore","top"]
 [[],[1,73],[2,56],[3,39],[4,51],[5,4],[1],[1],[2],[2,51],[3]]
 Output:
 [null,null,null,null,null,null,73,null,null,null,141]

 Explanation:
 Leaderboard leaderboard = new Leaderboard ();
 leaderboard.addScore(1,73);   // leaderboard = [[1,73]];
 leaderboard.addScore(2,56);   // leaderboard = [[1,73],[2,56]];
 leaderboard.addScore(3,39);   // leaderboard = [[1,73],[2,56],[3,39]];
 leaderboard.addScore(4,51);   // leaderboard = [[1,73],[2,56],[3,39],[4,51]];
 leaderboard.addScore(5,4);    // leaderboard = [[1,73],[2,56],[3,39],[4,51],[5,4]];
 leaderboard.top(1);           // returns 73;
 leaderboard.reset(1);         // leaderboard = [[2,56],[3,39],[4,51],[5,4]];
 leaderboard.reset(2);         // leaderboard = [[3,39],[4,51],[5,4]];
 leaderboard.addScore(2,51);   // leaderboard = [[2,51],[3,39],[4,51],[5,4]];
 leaderboard.top(3);           // returns 141 = 51 + 51 + 39;

Maintain a leaderboard with efficient updates and retrieval using a hash table and binary search tree.

The key here is to use a two way mapping (basically, when it comes to things like score, grades, etc). We can use a HashMap to frontend the player and its current score and a TreeSet in the backend, to keep the scores sorted. Btw, different players can have the same score, so the TreeSet should also contain the player-id.

Here is the Code:

 class Leaderboard {
     HashMap<Integer, Integer> map;
     TreeSet<int[]> set;
     public Leaderboard() {
         map = new HashMap<>();
         set = new TreeSet<>((a, b) -> {
             if (a[0] != b[0]) return b[0]-a[0];
             else return b[1]-a[1];
         });
     }

     public void addScore(int id, int score) {
         if (map.containsKey(id)) set.remove(new int[]{map.get(id), id});

         map.put(id, map.getOrDefault(id, 0)+score);
         set.add(new int[]{map.get(id), id});
     }

     public int top(int K) {
         int sum = 0, i = 0;
         for (int[] elem: set) {
             sum += elem[0];
             if (++i >= K) break;
         }
         return sum;
     }

     public void reset(int id) {
         if (map.containsKey(id)) set.remove(new int[]{map.get(id), id});
         map.remove(id);
     }
 }