Greetings. I am trying to solve this training problem for quite a long time, but I have never thought of a suitable algorithm.

Task.

Input data:

  1. Number N (1 <= N <= 10 ^ 5) - the number of numbers in the sequence;
  2. N natural numbers Hi (1 <= Hi <= 10 ^ 5), written through space.

Output:

The length of the longest sequence of numbers, going in ascending order, which can be obtained from the original one, removing some numbers from it, and keeping the order of the rest, and so the numbers are arranged in ascending order.

Example.

Input data:

10

4 3 6 7 9 5 2 8 7 3

Output:

4

In this case, there are several options:

  1. We leave only the numbers number 1, 3, 4, 5 (these are numbers 4, 6, 7, 9) and remove all the others. The length of the resulting sequence is 4, 6, 7, 9 - 4.
  2. In another case, we can leave the numbers number 2, 3, 4, 8 and remove all the rest. Then the length of the resulting sequence 3, 6, 7, 8 is also 4.

My solutions that need correction:

  • Search for the maximum number in the sequence, then search for the maximum in the part of the sequence that lies to the left of the first maximum, and so on, until the sequence ends. The algorithm does not work in such cases: 10 1 2 3 4 5 6 7 8 9 , according to this method, the length will be equal to 1, although in fact it should be 9.
  • Decision using trees. It leads to a huge number of combinations for the search with a large length of the sequence.
  • Various brute force solutions. Form a huge number of options that can not be bruised.

I would be grateful for the help! I can not think of the algorithm itself.

  • one
    it is not so solved. e-maxx.ru/algo/longest_increasing_subseq_log somewhere on the SO seem to be answered even. And the tree is the right option - pavel
  • @vp_arth and sense to find increasing chains? There may be none at all, for example: 1 0 2 0 3 0 4 . Here you need to remove the zeros and then we get the sequence 1 2 3 4 . And if we look for increasing chains, we will find 0 2 ; 0 3 and 0 4 , which is not at all correct. - Nik
  • Misunderstood the task, weir) - vp_arth
  • ready! ........ - Yuri Negometyanov

2 answers 2

Optimized solution in full statement

04.02.17

Formulation of the problem

This task can be used as a simple illustration of the methods for the secondary processing of a radar signal used to combine single target marks into tracks (trajectories). The success of such methods is determined by the choice of decision rules for setting, tracking and dumping tracks.

Decision rules

  1. The plot of the new track

Each new element of the original array generates new tracks. In this case, one track appears "from scratch", and the rest appear as duplicates of the tracks continued by this mark.

  1. Track maintenance

Accompanying the existing track is binding new points to it. If the new point has a greater value than the last element of the track, it should be added to the end of the track, and a duplicate of the old track should be used to tie the new trajectory (see item 1).

  1. Reset track

The basis for resetting a track is the presence of a track of the same length, having a smaller or coincident last element. To implement this rule, you must first calculate the array $last , which stores the least recent elements among the tracks of the same length. When one of the optimal tracks is found, the corresponding value of the $last array is decremented (this allows you to leave the first array among the equal ones and thus save resources).

About the program

The program is written in PHP 7.0 (memory-limit = 512M), this was enough to solve the problem in its original formulation.
The decision rules are implemented by the tracking function. The standard array_walk function array_walk used to encapsulate data and can be replaced with a foreach .

Program text:

 $issue = array(4,3,6,7,9,5,2,8,7,3); print("<b>The longest increasing subsequence</b><br>"); function print_1d($text,$arr){ print $text; foreach($arr as $key=>$item) print "$item "; } function print_1dk($text,$arr){ print $text; foreach($arr as $key=>$item) print "$key=>$item&emsp;"; } function print_longest_track($tracks, $prn=1){ $track = reset($tracks); if($track === false) return; $len = count($track); if($prn){ print_1dk("<b><br>maximal track long is:</b> $len<br><b>the first longest track is:</b>&emsp;", $track); }else{ print("<b><br>maximal track long is:&emsp;</b> $len"); } } function print_track($issue, $key, $track){ print_1dk("<br><b>track#$key:&emsp;</b> ",$track); } function tracking($issue, $prn = 1){ $len = count($issue); $arr = array_count_values($issue); $digs = count($arr); print"<br><b>tracking:<br>&emsp;</b>length = $len<br>&emsp;digs_all = $digs"; if($prn>0) print_1d("<br>&emsp;issue:&emsp;",$issue); $tracks = []; array_walk($issue, function($item, $key) use(&$tracks){ $old = $tracks; // copy of $track foreach($old as $k=>$track){ if($item > end($track)){ $tracks[$k][$key] = $item; $tracks[] = $track; // track duplication } } $tracks[] = [$key=>$item]; // new track $last = []; // array of minimal last tracks' elements foreach($tracks as $k=>$track){ $cnt = count($track); if(array_key_exists($cnt, $last)){ $last[$cnt] = min($last[$cnt], end($track)); }else{ $last[$cnt] = end($track); } } $old = $tracks; foreach($old as $k=>$track){ if(end($track) > $last[count($track)]){ unset($tracks[$k]); }elseif(end($track) == $last[count($track)]){ $last[count($track)]--; } } }); return $tracks; } $tracks = tracking($issue); foreach($tracks as $k=>$track) print_track($issue, $k, $track); print_longest_track($tracks); $dim = 50; $dig = 20; $issue = []; for($i=0; $i<$dim; $i++) $issue[] = mt_rand(1,$dig); print "<br><br><b>Model parameters:</b><br>&emsp;dim = $dim<br>&emsp;dig = $dig"; $start_time = microtime(true); $tracks = tracking($issue); $time = microtime(true) - $start_time; foreach($tracks as $k=>$track) print_track($issue, $k, $track); print_longest_track($tracks); print"<br>time=$time s"; $dim = 100000; $dig = 100000; $issue = []; for($i=0; $i<$dim; $i++) $issue[] = mt_rand(1,$dig); print "<br><br><b>Model parameters:</b><br>&emsp;dim = $dim<br>&emsp;dig = $dig"; $start_time = microtime(true); $tracks = tracking($issue, 0); $time = microtime(true) - $start_time; //foreach($tracks as $k=>$track) print_track($issue, $k, $track); print_longest_track($tracks, 0); print"<br>time=$time s"; 

Results:

 The longest increasing subsequence

 tracking:
 length = 10
 digs_all = 8
 issue: 4 3 6 7 9 5 2 8 7 3 
 track # 7: 1 => 3 2 => 6 3 => 7 7 => 8 
 track # 14: 1 => 3 2 => 6 3 => 7 
 track # 19: 6 => 2 9 => 3 
 track # 21: 6 => 2 
 maximal track long is: 4
 the first longest track is: 1 => 3 2 => 6 3 => 7 7 => 8 

 Model parameters:
 dim = 50
 dig = 20
 tracking:
 length = 50
 digs_all = 20
 issue: 13 2 4 13 8 2 11 8 1 8 17 14 18 20 11 3 2 6 2 6 5 7 7 5 5 10 20 6 6 5 4 13 10 19 20 6 9 12 6 16 13 13 13 10 4 1 9 9 15 6 
 track # 159: 8 => 1 16 => 2 20 => 5 27 => 6 36 => 9 37 => 12 41 => 13 48 => 15 
 track # 200: 8 => 1 16 => 2 20 => 5 27 => 6 36 => 9 37 => 12 41 => 13 
 track # 201: 8 => 1 16 => 2 20 => 5 27 => 6 36 => 9 43 => 10 
 track # 202: 8 => 1 16 => 2 20 => 5 27 => 6 36 => 9 
 track # 203: 8 => 1 16 => 2 20 => 5 27 => 6 
 track # 208: 8 => 1 16 => 2 30 => 4 
 track # 209: 8 => 1 16 => 2 
 track # 210: 8 => 1 
 maximal track long is: 8
 the first longest track is: 8 => 1 16 => 2 20 => 5 27 => 6 36 => 9 37 => 12 41 => 13 48 => 15 
 time = 0.0058760643005371 s

 Model parameters:
 dim = 100,000
 dig = 100000
 tracking:
 length = 100000
 digs_all = 63147
 maximal track long is: 614
 time = 2636.3067319393 s

Conclusion

The problem can be solved in full statement when using resources within reason.

Simplified Problem Statement

The task can be significantly simplified by looking for a sequence of numbers.

Benefits:

  1. You can use strings.
  2. It is possible to organize a cycle not by the elements of a given array, but by increasing sequences of digits. In this case, we are satisfied with the first occurrence of the required digit in the data line.

PHP program:

 $issue = "4367952873"; function cons($str, $digit=0, $offset=0){ $result = ""; for($dig = $digit; $dig <10; $dig++){ $pos = strpos($str, "$dig", $offset); if($pos === false) continue; $res = "$dig"; if(($dig < 9) && ($pos < strlen($str)-1)) $res .= cons($str, $dig+1, $pos+1); if(strlen($res) > strlen($result)) $result = $res; } return $result; } $result = cons($issue); print"issue = $issue&emsp;result = $result"; $issue = ""; for($i=0; $i<10000; $i++) $issue .= mt_rand(0,9); $start_time = microtime(true); $result = cons($issue); $time = microtime(true) - $start_time; print"<br>issue = $issue<br>result = $result&emsp;time=$time s"; 

Results:

 issue = 4367952873 result = 3678
 issue =
 result = 0123456789 time = 0.0039651393890381 s

For N = 100,000, the counting time is the same. The calculation of the length of the sequence of difficulties is not.

UPD

Taking into account the clarification of the concept of numbers, the following PHP program demonstrates the limit of possibilities of this approach:

 $issue = array(4,3,6,7,9,5,2,8,7,3); $dig_min = 1; $dig_max = 99; function cons($arr, $digit, $offset=0){ global $dig_max, $level; $level++; $ar = array_slice($arr, $offset); $result = ""; for($dig = $digit; $dig < $dig_max; $dig++){ $pos = array_search($dig, $ar); if($pos === false) continue; $res = "$dig "; if(($dig <= $dig_max) && ($pos < count($ar)-1)) $res .= cons($ar, $dig+1, $pos+1); if(substr_count($res, " ") > substr_count($result, " ")) $result = $res; } if($level < 3) print"<br>".str_repeat("&emsp;", $level)."level = $level&emsp;digit = $digit&emsp;result = $result"; $level--; return $result; } function print_1d($text,$arr){ print $text; foreach($arr as $item) print $item; } $level = 0; print_1d("<br>issue:",$issue); $result = cons($issue,$dig_min); print"&emsp;result = $result"; $level = 0; $issue = []; for($i=0; $i<100; $i++) $issue[] = mt_rand(0,99); print_1d("<br>issue:",$issue); $start_time = microtime(true); $result = cons($issue, $dig_min); $time = microtime(true) - $start_time; print"&emsp;result = $result"; print"<br>result = $result&emsp;time=$time s"; 

The results are modest:

 issue: 4367952873
 level = 2 digit = 3 result = 3 
 level = 2 digit = 4 result = 6 7 8 
 level = 2 digit = 5 result = 6 7 8 
 level = 2 digit = 6 result = 7 
 level = 2 digit = 7 result = 7 8 
 level = 2 digit = 8 result = 8 
 level = 2 digit = 9 result = 
 level = 2 digit = 10 result = 
 level = 1 digit = 1 result = 3 6 7 8 result = 3 6 7 8 
 Issue: 403997771521881192526498019529069301665217591939759236017141530407174576376572273129222349960150686099207835908279732551733762575760150008858851526388588AAAAAAAAAAAAAAAAI shouldAtAndT!
 level = 2 digit = 2 result = 9 14 15 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 3 result = 7 17 21 51 57 60 76 78 81 92 
 level = 2 digit = 4 result = 7 51 57 60 76 78 81 92 
 level = 2 digit = 8 result = 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 10 result = 11 14 15 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 11 result = 12 41 60 61 73 
 level = 2 digit = 12 result = 14 15 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 13 result = 17 21 51 57 60 76 78 81 92 
 level = 2 digit = 15 result = 15 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 16 result = 17 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 17 result = 21 23 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 18 result = 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 20 result = 21 23 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 21 result = 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 22 result = 23 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 23 result = 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 24 result = 27 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 28 result = 31 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 29 result = 61 73 
 level = 2 digit = 31 result = 39 40 45 50 51 57 60 76 78 81 92 
 level = 2 digit = 32 result = 32 35 51 57 60 76 78 81 92 
 level = 2 digit = 33 result = 35 51 57 60 76 78 81 92 
 level = 2 digit = 35 result = 35 51 57 60 76 78 81 92 
 level = 2 digit = 36 result = 51 57 60 76 78 81 92 
 level = 2 digit = 38 result = 50 51 57 60 76 78 81 92 
 level = 2 digit = 40 result = 40 45 50 51 57 60 76 78 81 92 
 level = 2 digit = 41 result = 52 64 65 75 76 78 82 87 88 92 
 level = 2 digit = 42 result = 60 61 73 
 level = 2 digit = 43 result = 47 60 61 73 
 level = 2 digit = 46 result = 50 51 57 60 76 78 81 92 
 level = 2 digit = 48 result = 60 61 73 
 level = 2 digit = 51 result = 51 57 60 76 78 81 92 
 level = 2 digit = 52 result = 57 60 76 78 81 92 
 level = 2 digit = 53 result = 64 65 75 76 78 82 87 88 88 92 
 level = 2 digit = 56 result = 57 60 76 78 81 92 
 level = 2 digit = 58 result = 60 76 78 81 92 
 level = 2 digit = 61 result = 65 68 78 82 87 88 92 
 level = 2 digit = 62 result = 76 78 81 92 
 level = 2 digit = 65 result = 65 75 76 78 82 87 88 92 
 level = 2 digit = 66 result = 75 76 78 82 87 88 92 
 level = 2 digit = 69 result = 78 82 87 88 92 
 level = 2 digit = 70 result = 75 76 78 82 87 88 92 
 level = 2 digit = 71 result = 81 92 
 level = 2 digit = 72 result = 75 76 78 82 87 88 92 
 level = 2 digit = 73 result = 78 82 87 88 92 
 level = 2 digit = 76 result = 76 78 82 87 88 92 
 level = 2 digit = 77 result = 78 82 87 88 92 
 level = 2 digit = 78 result = 78 82 87 88 92 
 level = 2 digit = 79 result = 82 87 88 92 
 level = 2 digit = 82 result = 92 
 level = 2 digit = 83 result = 87 88 92 
 level = 2 digit = 88 result = 88 92 
 level = 2 digit = 89 result = 90 91 92 95 
 level = 2 digit = 91 result = 91 92 95 
 level = 2 digit = 92 result = 92 95 
 level = 2 digit = 93 result = 95 
 level = 2 digit = 96 result = 
 level = 2 digit = 98 result = 
 level = 2 digit = 99 result = 
 level = 1 digit = 1 result = 1 9 14 15 17 27 31 32 35 51 57 60 76 78 81 92 result = 1 9 14 15 17 27 31 32 35 51 57 60 76 78 81 92 
 result = 1 9 14 15 17 27 31 32 35 51 57 60 76 78 81 92 time = 229.34637498856 s

Thus, solving a problem in its full formulation requires optimized approaches.

  • As I understand it, you consider that each number occupies one character, i.e. lies in the interval [0; 9], but in reality each number lies in the interval [0; 10 ^ 5] ... - Nik
  • The title says about the sequence of numbers - Yuri Negometyanov
  • quite right. Is 99999 not a figure? In the input section I described what numbers are fed to the input. Thanks for the help. - Nik
  • Not a figure. But I will see) - Yuri Negometyanov
  • Yes, I'm sorry, I did not mean numbers, but numbers. But in the condition I described everything correctly. - Nik

Here is the translation into Python, the pseudo-code from the task of finding the largest increasing subsequence :

 #!/usr/bin/env python3 def max_increasing_subseq_length(numbers): # pile is always increasing pile = [] for i, x in enumerate(numbers): # Binary search for the largest positive j <= len(pile) # such that pile[j] < X[i] lo, hi = 0, len(pile) while lo < hi: mid = (lo + hi) // 2 if pile[mid] < x: lo = mid + 1 else: hi = mid # After searching, lo is 1 greater than the length of the longest # prefix of X[i] pile[lo:lo+1] = [x] return len(pile) N, numbers = int(input()), list(map(int, input().split())) assert len(numbers) == N print(max_increasing_subseq_length(numbers)) 

This is O(n log n) in time and O(n) in memory algorithm. An example .

Here is a detailed explanation of why the greedy algorithm for patience sorting (a type of tapeworm) finds the length of the largest increasing sequence (equal to the number of heaps of cards at the end).