Write an analog of the native function array_unique without using the built-in PHP functions for working with arrays. It is necessary to write optimally, because up to 1,000,000 items can be in the incoming array

My version:

// аналог встроенной ф-ции in_array function myInArray($needle, array $haystack, $strict = false) { foreach ($haystack as $item) { if ($strict ? ($needle === $item) : ($needle == $item)) { return true; } } return false; } function myArrayUnique(array $array) { $unique_array = array(); foreach ($array as $item) { if (!myInArray($item, $unique_array)) { $unique_array[] = $item; } } return $unique_array; } echo '<pre>'; print_r(myArrayUnique($array)); echo '</pre>'; 

Why is not my option (from the comment)

My method will not save keys for example in working with an array:

 $arr = array("a" => "green", "red", "b" => "green", "blue", "red"); 

The result should be:

Array ([a] => green [0] => red [1] => blue).

And my code version will return:

Array ([0] => green [1] => red [2] => blue)

  • one
    At a minimum, when declaring parameters, use & $ array; otherwise, every time the functions are called, the array is completely copied and something tells me that the search for “whether” will be faster if it is right in the unique function, and not rendered separately. And then the question is how much is expected doubles. Perhaps it will be faster to make a parallel hash in which to reflect which elements already existed. searching by key in the hash is much faster than searching through the array. - Mike
  • And the question is what? You needed a method, you wrote it. - Dmitriy Simushev
  • my method will not save keys for example in working with an array: $ arr = array ("a" => "green", "red", "b" => "green", "blue", "red"); The result should be: Array ([a] => green [0] => red [1] => blue). And my code version will return: Array ([0] => green [1] => red [2] => blue) - Eugene

1 answer 1

If the problem is only in the keys, here is a slightly modified code that will return the keys you need (the myArrayUnique function):

 foreach ($array as $key => $item) { if (!myInArray($item, $unique_array)) { $unique_array[$key] = $item; } } 

I also wrote another implementation of the necessary function and did a small performance test for an array with 1000000 elements:

 <?php $array = array(); $size = 1000000; $min = 1; $max = 100; for ($i = 0; $i < $size; $i++) { $array[] = rand($min, $max); } function myInArray($needle, array $haystack, $strict = false) { foreach ($haystack as $item) { if ($strict ? ($needle === $item) : ($needle == $item)) { return true; } } return false; } function myArrayUnique(array $array) { $unique_array = array(); foreach ($array as $key => $item) { if (!myInArray($item, $unique_array)) { $unique_array[$key] = $item; } } return $unique_array; } function myArrayUnique2(array &$array) { $uniqueArray = array(); $usedItems = array(); foreach ($array as $key => $item) { if (empty($usedItems[$item])) { $uniqueArray[$key] = $item; $usedItems[$item] = true; } } return $uniqueArray; } $start = microtime(true); $result = myArrayUnique($array); $timeElapsedSecs = microtime(true) - $start; echo "Test 1: {$timeElapsedSecs} sec \n"; $start = microtime(true); $result = myArrayUnique2($array); $timeElapsedSecs = microtime(true) - $start; echo "Test 2: {$timeElapsedSecs} sec \n"; 

On my system, the result was:

 $ php test.php Test 1: 10.178105831146 sec Test 2: 0.10400199890137 sec 

Those. The acceleration is about 100 times compared to your option. The optimizations used in my version have already been indicated to you in the first comment.

  • It helped a lot, thanks)) I just haven't quite understood how this part works if (empty ($ usedItems [$ item])) {$ uniqueArray [$ key] = $ item; $ usedItems [$ item] = true; } - Eugene
  • The $ usedItems simply fit the already used members of the original array. The array takes the form [item1 => true, item2 => true, item3 => true, .., itemN => true]. Those. the members used are indices of this array, and therefore we can easily check their presence without cycles by this check: empty($usedItems[$item]) . If there is no element in the array of used values ​​(the condition is false ), then we add it. The answer can also be noted useful if it helped you;) - Gino Pane