Text tokenization

Question

I'm trying to write a text tokenizer.
For example, from this text:

10 meters 192.168.0.1 100

it should produce an array like this (roughly):

 [ { type: "units:meters", value: "10" }, { type: "ipaddress", value: "192.168.0.1" }, { type: "number", value: "100" } ]

I have already written a separate parser for each token type, but each one scans the entire text and returns all of its matches.

How can I combine them so that the tokens come out in the correct order and do not overlap? For example, the number parser should not pick up the beginning and the end of the IP address as separate numbers.
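If the existing whole-text parsers are to be kept, one option (not taken from this thread) is to have each of them report where every match starts and ends, then merge the results: sort the candidates by position, prefer the longest match at each position, and skip any candidate that overlaps a token already accepted. A minimal sketch, assuming the three regular expressions below stand in for the asker's parsers (they are hypothetical; the IP pattern, for instance, does not check that each octet is at most 255):

```javascript
// Hypothetical whole-text parsers, one per token type. Each regex needs the
// g flag because matchAll() returns every match in the text.
// Array order is used only as a tie-breaker (lower index wins).
const wholeTextParsers = [
  { type: "ipaddress", re: /\b\d{1,3}(?:\.\d{1,3}){3}\b/g },
  { type: "units:meters", re: /\b(\d+)\s+meters\b/g }, // value = group 1
  { type: "number", re: /\b\d+\b/g },
];

// Run every parser over the whole text and record each match's span.
function findAllCandidates(text) {
  const found = [];
  wholeTextParsers.forEach(({ type, re }, priority) => {
    for (const m of text.matchAll(re)) {
      found.push({
        type,
        value: m[1] ?? m[0], // use the capture group when the parser has one
        start: m.index,
        end: m.index + m[0].length,
        priority,
      });
    }
  });
  return found;
}

// Keep candidates left to right, longest first at each position,
// dropping anything that overlaps a token that was already kept.
function tokenizeByMerging(text) {
  const candidates = findAllCandidates(text).sort(
    (a, b) =>
      a.start - b.start ||
      (b.end - b.start) - (a.end - a.start) ||
      a.priority - b.priority
  );
  const tokens = [];
  let lastEnd = 0;
  for (const c of candidates) {
    if (c.start < lastEnd) continue; // overlaps an accepted token
    tokens.push({ type: c.type, value: c.value });
    lastEnd = c.end;
  }
  return tokens;
}

console.log(tokenizeByMerging("10 meters 192.168.0.1 100"));
```

Preferring the longest match is what keeps `192.168.0.1` from being split into four numbers; the `priority` field only breaks ties between equally long matches that start at the same position.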

Try a regular expression, or take one character at a time: check whether it can form a token, and if not, take another character, and so on.
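The regular-expression route can be sketched with a single combined pattern and the sticky `y` flag, which makes `exec` match only at `lastIndex`, so the input is consumed strictly left to right with no gaps. This is an illustrative sketch: the pattern and the group names (`ip`, `meters`, `number`, `space`) are assumptions, and the alternatives are ordered most specific first because the regex engine takes the first alternative that matches at a given position.

```javascript
function tokenizeSticky(text) {
  // Assumed combined pattern; a fresh regex object per call, because
  // lastIndex is per-object state. Alternatives: IP address, number followed
  // by "meters", plain number, or whitespace (matched but not emitted).
  const re = /(?<ip>\d{1,3}(?:\.\d{1,3}){3})|(?<meters>\d+)\s+meters\b|(?<number>\d+)|(?<space>\s+)/y;
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    re.lastIndex = pos; // sticky: match must start exactly here
    const m = re.exec(text);
    if (m === null) {
      throw new SyntaxError(`Unexpected character at position ${pos}`);
    }
    const g = m.groups; // groups that did not participate are undefined
    if (g.ip !== undefined) tokens.push({ type: "ipaddress", value: g.ip });
    else if (g.meters !== undefined) tokens.push({ type: "units:meters", value: g.meters });
    else if (g.number !== undefined) tokens.push({ type: "number", value: g.number });
    pos = re.lastIndex; // advance past whatever was matched
  }
  return tokens;
}

console.log(tokenizeSticky("10 meters 192.168.0.1 100"));
```

Because every alternative consumes at least one character, the loop always makes progress, and a character that no alternative accepts raises an error instead of being silently skipped.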

NeoKat · Accepted Answer · 2016-07-18T19:45:56

I found a solution:
You can simply search with one parser at a time and delete the part of the string that has already been processed, then continue with what is left.
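A minimal sketch of that approach, assuming each parser is rewritten as a regular expression anchored with `^` so that it can only match at the beginning of the remaining string (the parser list below is hypothetical). The parsers are tried in priority order; the first one that matches produces a token, and the consumed piece is sliced off before the next round:

```javascript
// Hypothetical parsers, anchored with ^ so they only look at the start of
// the remaining text. Order is priority: more specific parsers come first.
const prefixParsers = [
  { type: "ipaddress", re: /^\d{1,3}(?:\.\d{1,3}){3}/ },
  { type: "units:meters", re: /^(\d+)\s+meters\b/ }, // value = group 1
  { type: "number", re: /^\d+/ },
];

function tokenizeByConsuming(text) {
  const tokens = [];
  let rest = text.trimStart();
  while (rest.length > 0) {
    let matched = false;
    for (const { type, re } of prefixParsers) {
      const m = re.exec(rest);
      if (m === null) continue; // this parser does not fit here, try the next
      tokens.push({ type, value: m[1] ?? m[0] });
      // Delete the consumed piece (plus following whitespace) from the string.
      rest = rest.slice(m[0].length).trimStart();
      matched = true;
      break;
    }
    if (!matched) {
      throw new SyntaxError(`No parser matches: "${rest.slice(0, 20)}"`);
    }
  }
  return tokens;
}

console.log(tokenizeByConsuming("10 meters 192.168.0.1 100"));
```

The order of `prefixParsers` matters: `number` is last, so it only gets a chance once the IP-address and unit parsers have both failed at the current position.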
