The point of using HashMap when there is a risk of collisions

Question

Hello,

I recently tried to use HashMap as follows: there is a register consisting of 2 bytes that is the address and the contents of this register of 2 bytes.

For example, at the address 0xF1 0x15 (low, high byte) the register contains data 0x12 0x45 (low, high byte).

Because the address consists of two bytes, then just maybe 256 * 256 = 65536 registers. Using HashMap create a database of the following form:

  HashMap<String,Data2BytesCell> mDataBase = new HashMap<>(); mDataBase.put( Byte.toString( (byte) 0xF1 ) + Byte.toString( (byte) 0x15 ), new data2BytesCell((byte) 0x12, (byte) 0x45 );

I create the key as a string of two bytes, here in particular "F115". Using the same key, I similarly extract the register value.

As a result, I created 65536 elements, but when I started checking these elements for collisions, there were a lot of them. Those. the register must contain one value at the given address, and it displays a completely different one.

What is the point of using such an unreliable repository as a HashMap when such a high probability of collisions? Maybe I do not use it either? Or use for other purposes?

Thank you in advance for all the answers.

For specifics, I will give a more detailed code with methods:

  // в классе создается объект базы данных HashMap<String,Data2BytesCell> mDataBase = new HashMap<>(); // Описание метода записи данных по адресу. // Данные заносятся в базу по данному адресу // (является ключом). public void setData(byte lBAddress, byte hBAddress, byte lBData, byte hBData){ mDataBase.put(Byte.toString(lBAddress) + Byte.toString(hBAddress), new Data2BytesCell( lBData, hBData ) ); } // Данные считываются с определенного адреса (ключа). // Возвращает массив из 2 байт (данные) public byte[] getData(byte lBAddress, byte hBAddress){ Data2BytesCell data2BytesCell = mDataBase.get( Byte.toString(lBAddress) + Byte.toString(hBAddress) ); return data2BytesCell.getData(); }

I will also give a method for checking on collisions:

  public void validateCollisions(){ // Здесь в данные data2BytesCell в 2 байта записываются значение // адреса. Т.е. например в регистр с адресом 0х01 0х02 (ключ "0102") // записывается значение 0х01 и 0х02. // Теперь когда я захочу считать значение по адресу 0х01 0х02 // (ключ "0102") я должен на выходе получить массив из двух элементов // 0х01 0х02. Аналогично заполняется регистры для остальных 256х256 // адресов. for (int i = 0; i < 256; i++){ for (int j = 0; j < 256; j++){ Data2BytesCell data2BytesCell = new Data2BytesCell( (byte) i, (byte) j ); mDataBase.put( Byte.toString( (byte) i ) + Byte.toString( (byte) j ), data2BytesCell); } } // Здесь идет проверка берем регистр с адресом 0х01 0х02 // (ключ "0102") извлекаем с помощью get извлекаем массив из двух // байт если первый элемент равен 0х01, а второй 0х02, то все норм, // если нет, то выводим ошибку коллизия. for (int i = 0; i < 256; i++){ for (int j = 0; j < 256; j++){ Data2BytesCell data2BytesCell = mDataBase.get(Byte.toString( (byte) i ) + Byte.toString( (byte) j)); byte lbHb[] = data2BytesCell.getData(); if (lbHb[0] != (byte)i || lbHb[1] != (byte)j ){ Log.e(TAG, "Collision in: i = " + Integer.toString(i) + " j = " + Integer.toString(j) + "/n" + "lbHb[0] = " + Byte.toString(lbHb[0]) + "lbHb[1] = " + Byte.toString(lbHb[1]) ); } } }

you have a problem in key generation, the pairs (11, 1) and (1, 11) give the same string 111, etc.
@zRrr, brilliant, thank you so much, now I understand why I have problems with the String key, and Integer has all the rules.

Mstislav Pavlov Mstislav Pavlov 6,439 14 silver marks 36 bronze marks · Answer 1 · 2016-07-10T19:31:42

The meaning of Hashtable is that the time complexity of add, delete, search operations on average require O (1). When inserting into a hash table of 365 cells of just 23 elements, the probability of a collision already exceeds 50%. Therefore, as a rule, there are mechanisms for resolving conflicts. I do not know what problem you are solving, so I can’t say if you chose the right data structure. I can recommend you to read about data structures.

Mstislav Pavlov

6,439 14 silver marks 36 bronze marks

Thank you very much for the answer. I read about the structure, I realized that they HashMap used to break a large amount of data into groups of LinkedList-ы , thereby simplifying the task of finding data. I read about these mechanisms for overcoming collisions, but they boil down to rewriting the HashCode and Equals methods. Then immediately there are 2 questions why it was impossible to immediately create the correct HashMap structure without collisions? - foxis
And why use a structure in which you know in advance that a collision is possible with a 1% probability, which means the program automatically becomes unreliable, is it not easier to use other structures that are 100% reliable? - foxis
3
What do you mean by "collisions"? Within HashMap collision is when two keys have the same hash. Collision slows down access to data stored on these keys, but the data itself is saved. The worst thing that can happen is to slow down data access to O (n), which is possible only if all keys have the same hash. You write about HashMap as "unreliable" storage, and it seems that you have lost data there. Add a code to the question with which you add data and check it for "collisions" to make your problem clearer. - fori1ton
3
The real task of HashMap is not to “split” the data, but to store them in the form of “key -> value” pairs and a quick search for such pairs. If values are written to one key several times, only the last one is saved. But this is not a sign of the "unreliability" of the structure, it is its direct purpose, described in the documentation. - fori1ton
@ fori1ton, thank you very much for the answer. Honestly, I thought that a collision is when one value for example "1" under one key is written down, and the other "2" under another key. And it so happened that the hash codes of these keys became the same. As a result, the compiler randomly selects one of them. For example, under the key "first" I want to get 1, and he gives me (because the hash codes match) gives 2 (which has a different key "second"). Isn't it called a collision? - foxis

|

PinkTux PinkTux one · Answer 2 · 2016-07-10T21:59:38

We test your code of check on collisions. Slightly simplifying it:

 import java.util.HashMap; public class Main { public static void main(String[] args) { HashMap<Integer, Integer> mDataBase = new HashMap<>(); for (int i = 0; i < 256; i++) { for (int j = 0; j < 256; j++) { int key = i * 256 + j; mDataBase.put(key, key); } } int errors = 0; for (int i = 0; i < 256; i++) { for (int j = 0; j < 256; j++) { int key = i * 256 + j, expect = key; int val = mDataBase.get(key); if (val != expect) { errors++; System.out.format("Error: key %s, expect %d, got %d\n", key, expect, val); } } } System.out.format("Errors: %d\n", errors); } }

Conclusion:

 Errors: 0

And nothing changes if the key is made according to your recipe.

The point of using HashMap when there is a risk of collisions

2 answers 2

More articles: