check binary search tree for correctness

Question

There is a kind of binary search tree with nodes of the form:

struct node { struct node *left; struct node *right; int value; };

check it for correctness, i.e. whether all elements from the left subtrees are smaller than the current node, and from the right one - more.

I wrote the following functions (only the walk is called in main ):

 bool proverka_l (struct node *nNode, int be) { bool rez = true; if (nNode->value >= be) return false; if (nNode->left) rez = proverka_l (nNode->left, be); if (rez && nNode->right) rez = proverka_l (nNode->right, be); return rez; } bool proverka_r (struct node *nNode, int be) { bool rez = true; if (nNode->value <= be) return false; if (nNode->left) rez = proverka_r (nNode->left, be); if (rez && nNode->right) rez = proverka_r (nNode->right, be); return rez; } bool walk (struct node *nNode) { bool rez = true; if (nNode->left) rez = proverka_l (nNode->left, nNode->value); if (rez && nNode->right) rez = proverka_r (nNode->right, nNode->value); if (rez && nNode->left) rez = walk (nNode->left); if (rez && nNode->right) rez = walk (nNode->right); return rez; }

But if the tree is degenerate (for each node there is only one descendant), then the time costs are too large. How to speed up the check?

The algorithmic complexity of this test does not depend on the structure of the tree at all.
Therefore, if in your case you observe dependence on the structure of the tree, it is precisely the cost of the "heavy" implementation.
In this case, I see some strange double recursion on proverka_ and on walk .

Accepted Answer · 2018-01-16T17:52:38

You implemented the check too "literally", i.e. for each node in the tree, check its subtrees for inequality with the value in this node. This is a very irrational quadratic (!) Approach, which makes many passes through the same nodes and checks the same thing many times, i.e. "generates more heat than light."

In other words, this approach does not use transitive comparison properties and pays for it with terrible inefficiency. Already checking that a < b and b < c , your algorithm is still eager to check, just in case, that a < c , although there is no need for that.

More reasonable algorithms:

Bottom-up algorithm
Implement a recursive function that returns the range of values stored in the [under] tree. On the reverse course of recursion for each node, check that the range of values of the left subtree lies entirely "to the left" of the node's value, and the range of values of the right subtree is entirely to the "right" of the node's value. If at least somewhere in the tree these conditions are not fulfilled - the tree is composed incorrectly.

 bool get_and_check_tree_range(const struct node *root, int *lo, int *hi) { assert(root != NULL && lo != NULL && hi != NULL); *lo = *hi = root->value; if (root->left != NULL) { int left_lo, left_hi; if (!get_and_check_tree_range(root->left, &left_lo, &left_hi)) return false; assert(left_lo <= left_hi); if (left_hi > root->value) return false; *lo = left_lo; } if (root->right != NULL) { int right_lo, right_hi; if (!get_and_check_tree_range(root->right, &right_lo, &right_hi)) return false; assert(right_lo <= right_hi); if (right_lo < root->value) return false; *hi = right_hi; } return true; }

Call for root root

 int full_lo, full_hi; if (root == NULL || get_and_check_tree_range(root, &full_lo, &full_hi)) /* Все в порядке */; else /* Дерево не упорядочено */;

Top-down algorithm
Recursively bypassing the tree, in each node to "lower" on top of the requirement for what range the values of the corresponding subtree should lie. That is, if we arrive at a node with a “lowered” range [a, b] , and the value at this node is c , then we drop the requirement [a, c] to the left subtree, and [c, b] to the right subtree. If at least in some node the value of c does not lie in the range [a, b] that came from above [a, b] tree is incorrectly compiled.
```
 bool check_tree_in_range(const struct node *root, int lo, int hi) { assert(lo <= hi); if (root == NULL) return true; if (root->value < lo || root->value > hi) return false; return check_tree_in_range(root->left, lo, root->value) && check_tree_in_range(root->right, root->value, hi); } 
```
Call for root root
```
 if (check_tree_in_range(root, INT_MIN, INT_MAX)) /* Все в порядке */; else /* Дерево не упорядочено */; 
```

Answer 2 · 2018-01-16T17:30:35

To avoid degeneration, you can use trees with balancing, for example, AVL or red-black .

For example, if you insert a sequence [1,2,3,4,5,6] into a regular binary tree, you get this:

 +-1 +-2 +-3 +-4 +-5 +-6

The reverse sequence ( [6,5,4,3,2,1] ) will build such a "tree":

 +-6 +-5 +-4 +-3 +-2 +-1

And here is what we get in both versions for the AVL-tree:

 +-4 |-2 | |-1 | +-3 +-5 +-6

AND:

 +-3 |-2 | +-1 +-5 |-4 +-6

PS In general, it is necessary to decide what is more interesting: checking the tree for correctness (in fact, it is a one-time operation and it is not entirely clear why it is needed in regular work with the tree), or the overhead of balancing: it was experimentally found out that one balancing falls on every 2 inclusions and every 5 exceptions (wiki) .

The algorithm for checking the correctness of the tree structure will require O(n) operations regardless of the tree structure.
You can do worse (which the author of the question managed to do), but it is better not to do it
And I still do not understand what the difference is how much time validation takes?
Of course, from the point of view of the core functionality, such a check is meaningless.
I know how to avoid degeneration - that's just not the question.

check binary search tree for correctness

2 answers 2

More articles: