I was given the task to make two different objects of the class User with the same fields and 4 options for adding them to the HashSet :

  1. Do not override either equals() or hashCode() of the User class.
  2. Override equals() .
  3. Override only hashCode() .
  4. Override both equals() and hashCode() .

In all cases, you need to say what will be the size and why.

The User class looks like this:

 public class User { public String name; public int children; public Calendar birthday; public User(String name, int children, Calendar birthday) { this.name = name; this.children = children; this.birthday = birthday; } } 

I override hashCode() like this:

 @Override public int hashCode() { int hash = 31; hash = hash * 17 + name.hashCode(); hash = hash * 17 + birthday.hashCode(); hash = hash * 17 + children; return hash; } 

equals() I redefine as:

 @Override public boolean equals(Object obj) { if (this == obj) return true; if (obj == null) return false; if (this.getClass() != obj.getClass()) return false; UserEquals ue = (UserEquals) obj; return this.hashCode() == ue.hashCode() || (this.birthday.getTimeInMillis() == ue.getBirthday().getTimeInMillis() && this.children == ue.getChildren() && this.name.equals(ue.getName())); } 

If my options for overriding methods are incorrect, please correct me.

The test generally looks like this:

 @Test public void whenThen() { Calendar calendar = new GregorianCalendar(1988, Calendar.JUNE, 19); User user1 = new User("Pavel", 0, calendar); User user2 = new User("Pavel", 0, calendar); Set<User> set = new HashSet<>(); set.add(user1); set.add(user2); int result = set.size(); assertThat(result, is(2)); } 

And here is the question:

When I do not override anything in the User class, then I have size == 2 , and when I override only equals() is also size == 2 . And I thought that HashSet does not use the equals() method at all, but immediately uses hashCode() (that is, a comparison of the contents of the fields in equals() does not affect the situation).

But when I redefined only hashCode() separately, the situation also did not change: size still remained equal to 2 . And only when I redefined both methods (and hashCode() , and equals() ), size became equal to 1 .

No matter how much I wander through the HashSet source, I can’t fully understand how it still works.

What is the idea of ​​the HashSet class in identifying what to consider as a duplicate and what is not?

  • If you look at the HashSet sources, you will see that the data in it is stored in the HashMap , therefore, HashSet works exactly the same as HashMap with the correction that Object used as a value for all objects. I’ll write the answer now, but I’ll have to wait a bit, since this topic cannot be explained in two words. - post_zeew
  • Yes, I saw that there HashMap is used climbed there and finally got confused ((. I would be very grateful if you find time for a full answer. - Pavel

2 answers 2

To understand why such results are obtained, you need to understand how the hash table works.

For the equals and hashCode methods a specific contract is established:

  • if the elements are equal, i.e. equals returns true , then the hashCode value for these objects must match.
  • if the hashCode value for objects matches, then this does not mean that equals will return true for them, i.e. objects do not have to be equal to each other, i.e. collisions are possible.

And so, consider a hash table with bucket'ами . It consists of an array, in each cell of which a list of elements is stored.

When we add an item, the cachecode is calculated for it, then the cell index is determined by where the list containing the item should be stored:

 index = hashcode % table_size 

where table_size is the size of the table.

The index gets a list, the equals method checks whether the given element is in the list, if not, it is added.
Similarly, the operation is performed remove , contains .

Now consider each case separately:

  1. equals and hashCode not redefined , which means that equals will return true only if the links are equal, and hashCode can be equal or not. The size, regardless of the hashCode value, will be 2.
  2. equals and hashCode redefined , then we will have the same hash value, we will get into the same table cell, equals will determine that the object is already in the list, respectively, the size will be equal to 1.
  3. equals redefined , and hashCode redefined . In this case, the cell index will be the same, but the list will not show the same element and therefore the size will be equal to 2.
  4. equals redefined , and hashCode redefined . It depends on how the value is generated for hashCode in the Object class. If the values ​​are the same, the list will be the same, and accordingly, the number of elements in the table will be 1. If different, then the search will take place in different lists, and no duplicates will be detected, then the size will be equal to 2.
  • >>> And so, consider a hash table with buckets <<< And what is a bucket? - Pavel
  • one
    This is a list, just taken so called. there is still a hash table with open indexing, but they are not used in java - Artem Konovalov
  • >>> equals is not redefined, and hashCode is redefined. In this case, the cell index will be the same, but the list will not show the same element and therefore the size will be 2. <<< but why, if the indices match, does the value have to be overwritten by the same table? - Pavel
  • in this case, we get the same index. further we get the list of elements from the table stored at the given index. using the equals method, we are trying to find the same element, but we will not find a duplicate, because if the equals are not redefined, only the links are compared, respectively, the size will be equal to 2 - Artem Konovalov
  • but where does this value go if its index is already taken, because the index can be occupied either by one or the other, isn't it, and we have 2 applicants for the same place, how can that be? - Pavel

( be sure to read the last paragraph related to the changes in JDK 8 )

How does HashSet work

Consider the source code of the boolean add(E e) method of the HashSet class:

 public boolean add(E e) { return map.put(e, PRESENT)==null; } 

This shows that when an element is added to the HashSet , this element is added to the map , which is an object of the HashMap class:

 private transient HashMap<E,Object> map; 

The key used here is the object passed to the add(...) method, and the object of the Object class is used as a value:

 // Dummy value to associate with an Object in the backing Map private static final Object PRESENT = new Object(); 

From here we conclude that HashSet works exactly the same way as HashMap with the correction that some value stub is used as the value - the same Object class.

How HashMap Works

HashMap works on the principles of hashing, due to which access to elements by key is achieved at best in constant time - O(1) .

HashMap stores ключ-значение type elements in an array:

 transient Node<K,V>[] table; 

where Node<K,V> is an nested class with the following structure:

 static class Node<K,V> implements Map.Entry<K,V> { final int hash; final K key; V value; Node<K,V> next; ... } 

As you can see, the Node<K,V> stores the hash, the key, the value and the link is not the next element (I will discuss this field further).

In hashing theory, there is such a thing as a collision - this is a phenomenon when the same hash code is obtained for different objects.

In Java, a hash code is of type int , therefore, the set of hash codes is limited to a set of values ​​of type int - [-2147483648;2147483647] .

The set of objects is limited only by your imagination, therefore, it is possible that different objects will have the same hash code.

In short, the process of adding a key-value pair to a HashMap as follows:

  1. If key == null , then the putForNullKey(...) method is putForNullKey(...) into which the value is passed. This method adds key-value to the zero position of the array;
  2. If key != null , then the hash code of the key object is calculated, which is passed to the hash(...) method:

     static int hash(int h) { h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } 
  3. The hash code obtained from the hash(...) method is used to calculate the index of the array according to which you need to place this key-value pair:

     static int indexFor(int h, int length) { return h & (length-1); } 
  4. If the resulting index in the array does not contain anything, then this pair is placed on this index.

  5. If there is already a pair (or pairs) on the resulting index, then a sequential bypass of all pairs is performed, checking the equality of the hash code and the key of the added pair with the corresponding values ​​of the pairs already in this basket. For comparison, the hash code is used == , and for comparison of the key == or key.equals(...) . If these parameters match, the element value is overwritten.

Thus, each element of the array (basket, bucket ) is a linked list ( linked list ), in which elements are sequentially stored.

From here it becomes clear that:

  • In the best case (when there is at most one element in one basket), the value by key can be obtained as O(1) ;
  • In the worst case (when all items are in the same basket, and the item you are looking for is the last one in the list), you can get the value by key for O(n) (since you have to go round all the items in the list).

One very important note

  • In JDK 7 and below, the linked list is used to store pairs in the same basket;

  • In JDK 8, a balanced tree is used for this purpose, therefore, in the worst case, the key value can be obtained as O(log n) .

It is possible that there are some inaccuracies in the text above, but in general, the HashMap algorithm works just like this.

I will not describe the answer to your immediate task, as I agree with the answer of my colleague.