It is required to generate unique non-consecutive n-digit codes from consecutive integers. Imagine hashes, serials, or coupon numbers. An integer id from 0 to K corresponds to a strictly one n-digit code. And it is important that the successive codes differ greatly, and not by one or two characters.

Example:

0 KXBR6Z 1 8FLWGG 2 PAZT73 

Due to the collisions and brevity of the required codes, popular hash functions are not suitable. Crypt resistance is not required. If the villains solve the algorithm, even if all the codes are printed. It is done for beauty. Although, it is also interesting to consider what the difficulty of guessing the algorithm in X on the hands on the hands depends on!

Question: tell the algorithm / function f(i) = s for the correspondence between the sets 0..K and n-digit ABC123XYZ . The inverse function would also be interesting: f1(s) = i to get an integer from the code, or to find out that the code is invalid.

Surely, the task is not new, but did not figure out how to find the answer. Requires human-broadcast. )

PS I am looking for exactly "mathematics", an algorithm, and not a way to score unique values ​​in the database, each time you create a new one, checking the uniqueness.

  • And what prevents juzat hexadecimal number system, leaving zero orders? Unique values, visually shorter than the decimal, you can pack more values ​​in the same number of orders - neoascetic
  • Question:> And what prevents juzat 16-ranks number system, leaving zero orders? Answer:> The only thing - successively running codes differ little - Specter
  • And if this hexadecimal numbering system after this is still something to encrypt, then there will be a strict correspondence between n and f (n), and, with the right choice of algorithm, the results will vary greatly + complexity when hacking. - Alexey Lobanov


4 answers 4

Substitution ciphers work well with this task. The first thing you need to do is expand the alphabet (go from numbers to letters, where several letters correspond to a single number). If it is very important for you to hide the original values, then it makes sense to pre-encrypt the data with block or asymmetric encryption.

I added a code example with the substitution, the result:

 input/encoded/decoded - 123 / efg / 123 input/encoded/decoded - 456 / hij / 456 input/encoded/decoded - 789 / abc / 789 input/encoded/decoded - 012 / def / 012 input/encoded/decoded - 1157232188 / eoiafgpybl / 1157232188 

code itself:

 import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; public class Substitution { private static final int NUMBER_COUNT = 10; private static final int LETTER_COUNT = 26; private static final int ZERO_ASCII_CODE = 48; private static final int A_CHAR_ASCII_CODE = 97; // substitution tables private static final Map<Character, List<Character>> numToChar = new HashMap<Character, List<Character>>( NUMBER_COUNT); private static final Map<Character, Character> charToNum = new HashMap<Character, Character>( LETTER_COUNT); // init tables static { // add list containers for (int i = 0; i < NUMBER_COUNT; i++) { numToChar.put(Character.valueOf((char) (ZERO_ASCII_CODE + i)), new LinkedList<Character>()); } // init substitution table for (int i = A_CHAR_ASCII_CODE; i < A_CHAR_ASCII_CODE + LETTER_COUNT; i++) { // number = (symbol ascii code) mod 9 Character num = Character .valueOf((char) (i % NUMBER_COUNT + ZERO_ASCII_CODE)); Character ch = Character.valueOf((char) i); numToChar.get(num).add(ch); charToNum.put(ch, num); } } public static void main(String[] args) { System.out.println("dumping straight substitution table"); for (Character num : numToChar.keySet()) { System.out.println("num to char - " + num + ", replacements = " + numToChar.get(num)); } System.out.println(); System.out.println("dumping reverse substitution table"); for (Character ch : charToNum.keySet()) { System.out.println("char to num - " + ch + ", replacements = " + charToNum.get(ch)); } // test String[] nums = new String[] { "123", "456", "789", "012", "1157232188" }; String r = null; for (int i = 0; i < nums.length; i++) { r = decode(nums[i], true); System.out.println("input/encoded/decoded - " + nums[i] + " / " + r + " / " + decode(r, false)); } } public static String decode(String input, boolean direction) { // result buffer StringBuilder b = new StringBuilder(input.length()); // ensure all latters are in lower case String data = input.toLowerCase(); // store indexes (determines which letter to use when coding a number) Map<Character, Integer> indexes = new HashMap<Character, Integer>(); // encode string for (int i = 0; i < input.length(); i++) { b.append(decode(indexes, data.charAt(i), direction)); } return b.toString(); } // straight - number to letter // reverse - letter to number private static Character decode(Map<Character, Integer> indexes, char ch, boolean direction) { // convert character Character character = Character.valueOf(ch); if (direction) { // get list List<Character> list = numToChar.get(character); // get index to use Integer index = indexes.get(character); if (null == index) { index = Integer.valueOf(0); } // update index map int next = index.intValue() + 1; if (next == list.size()) { next = 0; } indexes.put(character, Integer.valueOf(next)); return list.get(index); } return charToNum.get(character); } } 

This example is strictly "sharpened" by numbers and letters, but, by analogy, you can make code that will make substitutions for individual characters and / or their sequences

    • As one of the bad options, which, nevertheless, solves the problem, I can suggest the following approach (it is bad for the reasons described below):
    1. Encode the original coupon number N in the Base36. coding Base36.
    2. Add missing zeros to the beginning in order to get the desired code length.
    3. Perform any bit conversions so that the codes for N and N + 1 do not look similar, preserving the bijectivity and effective computability of the inverse function.
    • By itself, the formulation of the problem is a bit strange - it is required to find a bijective function for which there is an effectively computable inverse function (“in order to get the whole from the code”). In this setting, any coupon or serial number issued to the user is automatically vulnerable to reversing.

    If the benefit obtained from such a coupon / serial / ... is sufficient, then there will surely be a person who can find the opposite function and, therefore, be able to operate the system.

    • If, however, we remove the restriction on the effective computability of the inverse function, then the practical task loses its meaning, since it becomes an order of magnitude more difficult to organize a validity check for an arbitrarily taken coupon code.

    It is necessary again to somehow compare hashes of coupon codes, exclude collisions, etc. It seems to me that for codes of the type KXBR6Z it can be proved that with an acceptable from a practical point of view, the probability of this will not work. Although, perhaps, here I am missing something obvious. [?]

    • Generally speaking, the correct approach to a problem of this kind (generation of a unique set of coupons in the database and operations on this database) has a whole set of advantages compared to a solution that uses some predefined function.

    Try to imagine, for example, how to affix a coupon received with the help of a certain function, the status of revoked or invalid . Or, say, allow you to use a coupon twice.

    • From references that can be useful:

      We have two tasks: to generate a unique number for each coupon. This is called perfect hash. The second task is to make each coupon have a number from 0 to N-1. How to solve both of these tasks with one algorithm, I can’t imagine, but separately - completely.

      The coupon code consists of 6 characters, each of which can have one of 36 values. That is, the code is a six-digit number in the 36-number system! Fortunately, this number fits into 32 bits. You can check on the calculator that 36 ^ 6 <2 ^ 32.

       unsigned to_hash(const char* coupon) { unsigned hash = 0; char c; for( int i = 0; i < 6; ++i ) { c = coupon[i]; if( c < 'A' ) { c -= '0'; } else { c -= '7'; } hash *= 36; hash += c; } return hash; } 

      Now the second task is the numbering of these hashes and the search for the hash number by the hash itself. It is very simple if you pre-compose an array of coupon hashes in ascending order and determine the index by binary search. If the index is not found, then the coupon is incorrect. If it is impossible to create an array of hashes in advance, the task becomes more complicated, but only slightly. You will need to use an indexed list and sort by inserts. But that's another story.

      • It also requires that the consecutive coupon numbers do not look like the latest. Those. 000001, 000002, 000003 - no ice. - Sergiks
      • That is, coupons ABCDE1 and ABCDE2 should turn into very different numbers? It is enough to mix the hash bits and / or XOR. Obviously, there will be no collisions. Here's the first thing that came to mind: hash = hash << 22 & (hash & 0x3ffc00) & hash >> 22; hash ^ = 0x811c9dc5; - VadimTukaev

      Without collisions it is impossible (if a hash of constant length), on the general data set. For a sequence of numbers than md5 with salt and the selection of the part of the hash of the required length is not suitable?

      upd: If the algorithm is on the server side, take a simple one: a series, if there is one, and your unique salt. Check, you will again perform on the server, what is the difference from the function. Example:

       Серия: 101010 Код: e25019 

      For them on the server:

       Серия:101010 Соль(приватные данные): Yap! Md5(101010Yap!) = e2501912913cd181d4199bcea5dd401c Выделил 6 символов. Проверил. И ты не зависишь от коллизий. 

      The only negative is that the salt will leak, or it will be too small in length (brute force) - consider that you have to change the algorithm (change the salt / edit the security settings, etc.).

      • The fact of the matter is that what is needed is not a hash, but a seemingly similar function. - Sergiks
      • md5(n) => x; need a function that would do: f(x) => n , but with md5 this is not a ride - Specter
      • one
        If md5 hash by itself [guarantees practical uniqueness,] [1] then this part of this guarantee does not apply to its individual part. Proving can be trivial, taking, for example, as the length of the part of the hash unit. [1]: stackoverflow.com/questions/2444321/… - Costantino Rupert
      • @ Kotyk_khokhet_kusat, there is a share of the likelihood of hash collisions, but there is a binding not only to the code (hash), but also to the serial number, which guarantees uniqueness on the set {0; K} of integers, not repeating numbers. Example. Take 1 as the length of the md5 hash. We get the sets: (1Yap!; E); (2Yap!; 7) ... Even if the hash is repeated somewhere, it will in any case be a unique set, provided that the serial number is not repeated and goes on increasing. In this case, a validation check (even a digital signature of the number can be said) is possible. - vv2cc
      • @ vv2 - We, in the context of this task, do not consider a pair {что-то, фрагмент хэша} , but simply {фрагмент хэша} . That is, if, say, to consider the task of issuing coupons or serials, the idea is that the corresponding coupon number can be printed on paper and that all such coupons are practically unique. - It is clear that for the case with the md5 fragment of length 1, there is no need to talk about uniqueness. Or am I missing something? :) - Costantino Rupert