Sunday, 12 February 2017

ConcurrentHashMap - Detailed discussion


Before proceeding with ConcurrentHashMap, let’s recap how map synchronization works.
The Java API provides three different ways to get a synchronized Map:
1.      Hashtable
2.      Collections.synchronizedMap(Map)
3.      ConcurrentHashMap

Hashtable is a synchronized data structure, but it has only one lock for the whole table, so when a thread acquires the lock the entire table is locked and no other thread can access it.

To synchronize a HashMap, use the synchronizedMap utility method of the Collections class. This again puts a single lock on the complete HashMap.

ConcurrentHashMap provides a higher concurrency level than a synchronized HashMap or Hashtable. By default a ConcurrentHashMap is divided into 16 segments, so up to 16 threads can write to it at the same time, while with Hashtable and Collections.synchronizedMap(Map) only one thread can hold the lock at a time. Performance may therefore be marginally poorer when only a single thread accesses the map, but significantly better when multiple threads access it concurrently. In ConcurrentHashMap a thread locks only the portion of the data it is updating, while other portions remain accessible to other threads.
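As a small illustration (the class name SynchronizedMapOptions is just for this example; Java 7+ assumed for the diamond operator), the three options can be created side by side like this:

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SynchronizedMapOptions {
    public static void main(String[] args) {
        // 1. Hashtable: every method locks the whole table
        Map<String, Integer> hashtable = new Hashtable<>();

        // 2. Synchronized wrapper: every call locks the whole wrapper object
        Map<String, Integer> syncMap = Collections.synchronizedMap(new HashMap<String, Integer>());

        // 3. ConcurrentHashMap: writers lock only the affected portion of the map
        Map<String, Integer> chm = new ConcurrentHashMap<>();

        hashtable.put("a", 1);
        syncMap.put("b", 2);
        chm.put("c", 3);
    }
}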

ConcurrentHashMap doesn’t allow null keys or null values.
The main reason nulls aren’t allowed in ConcurrentMaps (ConcurrentHashMap, ConcurrentSkipListMap) is ambiguity: if map.get(key) returns null, you can’t tell whether the key explicitly maps to null or the key isn’t mapped at all. See the code below:
           
if (map.containsKey(k)) {
   return map.get(k);
} else {
   throw new KeyNotPresentException();
}

In a concurrent map the key k might be removed between the containsKey(k) and get(k) calls. As a result, the code would return null instead of throwing KeyNotPresentException (the expected result when the key is not present).
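As a quick sketch (the class name NullKeyDemo is only for this example), ConcurrentHashMap rejects nulls immediately at put() time with a NullPointerException:

import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> chm = new ConcurrentHashMap<String, String>();
        try {
            chm.put(null, "value");          // null key is rejected
        } catch (NullPointerException e) {
            System.out.println("null key rejected");
        }
        try {
            chm.put("key", null);            // null value is rejected
        } catch (NullPointerException e) {
            System.out.println("null value rejected");
        }
    }
}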

Code Snippet –
import java.util.concurrent.ConcurrentHashMap;
/**
 * Simple example of ConcurrentHashMap
 * @author Manoj
 */
public class ConcurrentHashMapTest {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, String> chm = new ConcurrentHashMap<Integer, String>();
        chm.put(1, "One");
        chm.put(2, "Two");
        // Print every entry as key|value
        chm.forEach((k, v) -> System.out.println(k + "|" + v));
    }
}


ConcurrentHashMap internal implementation –

ConcurrentHashMap provides a constructor that accepts custom values for initial capacity, load factor and concurrency level.
ConcurrentHashMap m = new ConcurrentHashMap(initialCapacity , loadFactor, concurrencyLevel);

ConcurrentHashMap m = new ConcurrentHashMap(100 , 0.50f, 5);

The initial capacity is 100, meaning the CHM makes sure it has room for 100 key-value pairs right after creation.
The default capacity of HashMap is 2^4 = 16 buckets.

The load factor is 0.50f, meaning that when the number of entries exceeds initial capacity * load factor = 100 * 0.50 = 50, the map is resized and the existing entries are rehashed into the new, larger table.
The load factor is the measure that decides exactly when to increase the map capacity (number of buckets) so that get and put operations stay at O(1).
The default load factor of HashMap is 0.75f (i.e. 75% of the current capacity).


The concurrency level is 5, which means the Segment array size will be 5 or greater, so that 5 threads can write to the map in parallel.

The actual number of segments a ConcurrentHashMap instance maintains is the specified concurrencyLevel rounded up to the next power of two. In the example above the concurrency level is 5, so the number of segments is 8; if the concurrency level were 90, the number of segments would be 128.
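A small sketch of this rounding (in the spirit of the ssize loop in the JDK 7 constructor; the class name SegmentCountDemo is only for illustration):

public class SegmentCountDemo {
    // Round concurrencyLevel up to the next power of two, which is how
    // the JDK 7 ConcurrentHashMap constructor sizes its Segment[] array.
    static int segmentCount(int concurrencyLevel) {
        int ssize = 1;
        while (ssize < concurrencyLevel) {
            ssize <<= 1;
        }
        return ssize;
    }

    public static void main(String[] args) {
        System.out.println(segmentCount(5));   // 8
        System.out.println(segmentCount(90));  // 128
        System.out.println(segmentCount(16));  // 16 (the default)
    }
}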

ConcurrentHashMap creates segments lazily, as needed. Only one segment, s0, is created within the constructor.
The segments array, however, is an immutable (final) instance field, which already guarantees visibility of s0 to other threads after the final-field freeze.

How does concurrency work?
In the customized ConcurrentHashMap above there are 8 segments. A segment is defined as a static final nested class, Segment, which extends ReentrantLock and implements the Serializable interface. Each segment uses a single lock to consistently update its elements, flushing all changes to main memory. Each segment manages its own internal hash table of size 2^x >= (capacity / number of segments).
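For reference, a simplified sketch of that nested class, trimmed from the JDK 7 layout (most fields and all methods omitted; HashEntry is shown later in this post):

// Simplified sketch - Segment is a static final nested class inside ConcurrentHashMap
static final class Segment<K,V> extends ReentrantLock implements Serializable {
    // Each segment owns its own table of HashEntry buckets
    transient volatile HashEntry<K,V>[] table;
    // Number of entries currently stored in this segment
    transient int count;
    // Resize this segment's table once count exceeds this threshold
    transient int threshold;
    final float loadFactor;

    Segment(float loadFactor, int threshold, HashEntry<K,V>[] table) {
        this.loadFactor = loadFactor;
        this.threshold = threshold;
        this.table = table;
    }
}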

How is a value stored? How does the put() method work?
First ConcurrentHashMap calculates the hash value of the key and searches for it; if the key is found, its value is simply updated. If it is a new entry, the map first finds the segment into which the entry should be written (the segment may already exist or may be created at this point) and then invokes Segment.put().
The key cannot be null; its hash is used to locate the segment index and then the actual bucket within the segment.
The upper bits of the hash are used to locate the segment index and the lower bits to locate the table index within the segment. This is why the re-hashing function differs from HashMap's: the hash bits need to be spread better to regularize both segment and bucket locations.
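A runnable sketch of that split (the class name HashSplitDemo is only for illustration; in the real class the hash is re-spread before the split, which is omitted here):

public class HashSplitDemo {
    public static void main(String[] args) {
        int concurrencyLevel = 5;

        // Round the segment count up to a power of two (here 8), as shown earlier
        int sshift = 0, ssize = 1;
        while (ssize < concurrencyLevel) { ++sshift; ssize <<= 1; }
        int segmentShift = 32 - sshift;      // 29: shift down to reach the upper bits
        int segmentMask = ssize - 1;         // 7:  mask for the segment index

        int hash = "someKey".hashCode();
        int segmentIndex = (hash >>> segmentShift) & segmentMask;  // upper bits -> segment
        int tableLength = 16;                // per-segment table size (power of two)
        int bucketIndex = hash & (tableLength - 1);                // lower bits -> bucket

        System.out.println("segment " + segmentIndex + ", bucket " + bucketIndex);
    }
}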
In the Segment.put() method the key's hash determines the index in Segment.table, and a new entry of the static final class HashEntry<K,V> is created.
The code snippet of the HashEntry<K,V> class is:

static final class HashEntry<K,V> {
    final K key;
    final int hash;
    volatile V value;
    final HashEntry<K,V> next;

    HashEntry(K key, int hash, HashEntry<K,V> next, V value) {
        this.key = key;
        this.hash = hash;
        this.next = next;
        this.value = value;
    }
}

Before making an entry the thread acquires the lock on the segment. If the lock cannot be acquired immediately, the scanAndLockForPut() method uses a while loop to optimistically acquire the segment lock. Each time it fails to acquire the lock, the applicable linked list is traversed, one element per failed lock attempt. If the current element of the traversal has the same key as the one being added, the traversal ends.
Alternatively, if the key is not found in the linked list (after a complete traversal), a new HashEntry is speculatively created on behalf of put().
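A simplified sketch of this idea (not the exact JDK code; it assumes the table field and the HashEntry constructor shown above):

// Simplified sketch of scanAndLockForPut: spin on tryLock(), and while spinning
// walk the bucket's linked list so the matching node (or a speculatively created
// new node) is already at hand once the segment lock is finally held.
private HashEntry<K,V> scanAndLockForPut(K key, int hash, V value) {
    HashEntry<K,V> e = table[(table.length - 1) & hash]; // first node of the bucket
    HashEntry<K,V> node = null;
    while (!tryLock()) {                  // keep trying to acquire the segment lock
        if (e != null && !key.equals(e.key)) {
            e = e.next;                   // advance one element per failed attempt
        } else if (e == null && node == null) {
            // key not found after a full traversal: speculatively create the new entry
            node = new HashEntry<K,V>(key, hash, null, value);
        }
    }
    return node;
}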

No lock is needed for the get() method (a read operation): because the value field in the HashEntry class is volatile, readers always see the most recently written value, so no lock is required.
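A simplified sketch of such a lock-free read (not the exact JDK code; the real implementation also handles edge cases such as a value observed mid-construction):

// Simplified sketch of get(): walk the bucket's linked list without taking the
// segment lock, relying on the volatile value field for visibility of writes.
V get(Object key, int hash) {
    HashEntry<K,V> e = table[(table.length - 1) & hash]; // first node of the bucket
    while (e != null) {
        if (e.hash == hash && key.equals(e.key)) {
            return e.value;   // volatile read - no segment lock needed
        }
        e = e.next;
    }
    return null;              // key not present
}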
