
A binary tree in Swift built with an enum, using OCR as an example

The goal was to build a Mac application that recognizes source code text in images and videos.

I wanted recognition to take less than a second, even for a large amount of code.

Two facts make the problem easier: the code is always written in English, and programming fonts are monospaced, so every letter has the same width. In such fonts it is also easy to tell 1 from I, 0 from O, and so on.

In short, the task comes down to two parts:

1. Finding each letter and its bounding box


Vision, the new framework from Apple, did a great job with this.

Here is a screenshot of how it works.
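The article does not say which Vision request was used, so the following is only a sketch under an assumption: it uses `VNDetectTextRectanglesRequest` with `reportCharacterBoxes` enabled to get one box per character. The function names `characterBoxes(in:)` and `pixelRect(fromNormalized:imageWidth:imageHeight:)` are hypothetical helpers, not part of the original program.

```swift
import Foundation
#if canImport(Vision)
import Vision

/// Returns the bounding boxes of the individual characters Vision finds in `image`.
/// The boxes are normalized (0...1) with the origin at the bottom-left corner.
/// Assumption: the article's app may use a different request or configuration.
func characterBoxes(in image: CGImage) throws -> [CGRect] {
    let request = VNDetectTextRectanglesRequest()
    request.reportCharacterBoxes = true  // ask for per-character boxes, not only text regions
    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])
    let observations = request.results as? [VNTextObservation] ?? []
    return observations
        .flatMap { $0.characterBoxes ?? [] }
        .map { $0.boundingBox }
}
#endif

/// Hypothetical helper: converts a normalized, bottom-left-origin box
/// into pixel coordinates with the origin at the top-left corner.
func pixelRect(fromNormalized box: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    return CGRect(x: box.minX * w,
                  y: (1 - box.maxY) * h,  // flip the vertical axis
                  width: box.width * w,
                  height: box.height * h)
}
```

The conversion step matters because the second stage inspects individual pixels, and bitmap rows are usually counted from the top while Vision's normalized boxes are measured from the bottom.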


2. Recognizing the letter within those bounds


I decided to take the simple route: check certain pixels of the square that bounds the letter (say, the center, the corners, the sides) and, based on whether the letter is present or absent at each of those points, classify which letter it is.
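A minimal sketch of this probing idea, under stated assumptions: the image is already binarized, and the types `Bitmap`, `Box`, and `Probe` are hypothetical names invented for this illustration rather than taken from the original program.

```swift
/// Hypothetical binarized image: `true` marks a dark ("ink") pixel.
struct Bitmap {
    let rows: [[Bool]]

    /// Out-of-range coordinates count as "no ink".
    func hasInk(x: Int, y: Int) -> Bool {
        guard rows.indices.contains(y), rows[y].indices.contains(x) else { return false }
        return rows[y][x]
    }
}

/// Hypothetical bounding box in pixel coordinates, origin at the top-left.
struct Box {
    let x: Int, y: Int, width: Int, height: Int
}

/// A few probe positions inside a letter's bounding box.
enum Probe {
    case center, topCenter, bottomCenter, leftCenter, rightCenter

    /// Pixel coordinates of this probe inside `box`.
    func point(in box: Box) -> (x: Int, y: Int) {
        let midX = box.x + box.width / 2
        let midY = box.y + box.height / 2
        let maxX = box.x + box.width - 1
        let maxY = box.y + box.height - 1
        switch self {
        case .center:       return (midX, midY)
        case .topCenter:    return (midX, box.y)
        case .bottomCenter: return (midX, maxY)
        case .leftCenter:   return (box.x, midY)
        case .rightCenter:  return (maxX, midY)
        }
    }
}

/// True if the letter has ink at the probe position.
func hasInk(at probe: Probe, in box: Box, of bitmap: Bitmap) -> Bool {
    let p = probe.point(in: box)
    return bitmap.hasInk(x: p.x, y: p.y)
}

/// Builds a bitmap from rows of text where "#" is ink.
func bitmap(_ lines: [String]) -> Bitmap {
    Bitmap(rows: lines.map { line in line.map { $0 == "#" } })
}

let letterH = bitmap(["#...#", "#...#", "#####", "#...#", "#...#"])
let letterO = bitmap([".###.", "#...#", "#...#", "#...#", ".###."])
let box = Box(x: 0, y: 0, width: 5, height: 5)

print(hasInk(at: .center, in: box, of: letterH))  // true
print(hasInk(at: .center, in: box, of: letterO))  // false
```

On these toy 5x5 glyphs a single center probe is enough to separate "H" from "O"; a real font needs more probes, which is what makes a decision tree a natural fit.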

Illustrative example:



And here is how it looks as a tree.
Only a part is shown, because the whole tree would not fit and is not needed here.


How can this schematic drawing be carried over into code so that it stays just as clear and you do not have to dig through it?

This is where a binary tree comes to the rescue. Here is its skeleton.

enum Tree<Node, Result> {
    /// Empty result
    case empty
    /// Result with generic type
    case r(Result)
    /// Recursive case with generic tree
    indirect case n(Node, Tree<Node, Result>, Tree<Node, Result>)
}

Now, based on it, we can transfer the whole drawing into code.

// .c means: if a pixel of the letter is found at the center, it is "H", otherwise "O"
// in place of .c goes the condition on which the branching happens
let HorOTree = TreeOCR.n(.c, .r("H"), .r("O"))

This is what a larger piece of the tree looks like.



Everything can be laid out very schematically, and the right letter is easy to find.

One last point: this is what the model itself, where all the work takes place, looks like.

extension Tree where Node == OCROperations, Result == String {
    func find(_ colorChecker: LetterExistenceChecker, with frame: CGRect) -> String? {
        switch self {
        case .empty:
            return nil
        case .r(let element):
            return element
        case let .n(operation, left, right):
            let exist = operation.action(colorChecker, frame)
            return (exist ? left : right).find(colorChecker, with: frame)
        }
    }
}

We pass the tree a LetterExistenceChecker, a class responsible for checking whether the letter has a pixel at a given point within the bounds of its square. Of course, I omitted a lot of details, otherwise the article would have become too cumbersome. The program also has many more stages than the two described here; they were left out because the goal was to show how to use a binary tree with an enum.

Here is a demo of how the program works. Note that since my goal was to recognize only code, the program ignores any text that is not code.
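To see the pieces work together, here is a self-contained sketch that can run as a script. It repeats the article's `Tree` and `find`, but everything else is an assumption made for illustration: `Frame` replaces `CGRect` to avoid platform imports, `LetterExistenceChecker` is a toy stand-in that reads ink from rows of text, the `.top` operation is invented (only `.c` appears in the article), and `TreeOCR` is assumed to be a typealias for the specialized tree.

```swift
/// The article's tree, repeated so this sketch compiles on its own.
enum Tree<Node, Result> {
    case empty
    case r(Result)
    indirect case n(Node, Tree<Node, Result>, Tree<Node, Result>)
}

/// Hypothetical stand-in for the CGRect frame used in the article.
struct Frame {
    let x: Int, y: Int, width: Int, height: Int
}

/// Toy stand-in for the article's LetterExistenceChecker: "#" marks ink.
final class LetterExistenceChecker {
    private let rows: [[Character]]

    init(_ lines: [String]) {
        rows = lines.map { Array($0) }
    }

    func exists(x: Int, y: Int) -> Bool {
        guard rows.indices.contains(y), rows[y].indices.contains(x) else { return false }
        return rows[y][x] == "#"
    }
}

/// Branching conditions. `.c` comes from the article; `.top` is an assumed extra probe.
enum OCROperations {
    case c    // ink at the center of the frame
    case top  // ink at the middle of the top edge

    func action(_ checker: LetterExistenceChecker, _ frame: Frame) -> Bool {
        let midX = frame.x + frame.width / 2
        switch self {
        case .c:   return checker.exists(x: midX, y: frame.y + frame.height / 2)
        case .top: return checker.exists(x: midX, y: frame.y)
        }
    }
}

extension Tree where Node == OCROperations, Result == String {
    /// Walks left when the condition holds and right otherwise, until a leaf is reached.
    func find(_ checker: LetterExistenceChecker, with frame: Frame) -> String? {
        switch self {
        case .empty:
            return nil
        case .r(let element):
            return element
        case let .n(operation, left, right):
            let exist = operation.action(checker, frame)
            return (exist ? left : right).find(checker, with: frame)
        }
    }
}

// Assumed typealias; the article uses the name TreeOCR without defining it.
typealias TreeOCR = Tree<OCROperations, String>

// Center ink means "H"; otherwise top ink means "O"; otherwise there is no match.
let tree = TreeOCR.n(.c, .r("H"), .n(.top, .r("O"), .empty))

let frame = Frame(x: 0, y: 0, width: 5, height: 5)
let glyphH = LetterExistenceChecker(["#...#", "#...#", "#####", "#...#", "#...#"])
let glyphO = LetterExistenceChecker([".###.", "#...#", "#...#", "#...#", ".###."])
let blank = LetterExistenceChecker([".....", ".....", ".....", ".....", "....."])

print(tree.find(glyphH, with: frame) ?? "nil")  // H
print(tree.find(glyphO, with: frame) ?? "nil")  // O
print(tree.find(blank, with: frame) ?? "nil")   // nil
```

The `.empty` leaf is what lets the lookup return `nil` for a shape the tree does not describe, instead of forcing every input into some letter.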


I would be glad to hear your comments and criticism.

Source: https://habr.com/ru/post/439324/