Some tags need to be cut out together with their content, and others simply removed (keeping their content). For example, this:

baz<foo>foo</foo><bar>bar</bar> 

should turn into:

 baz bar 

Please don't suggest regexes :) HTML::TagFilter can't do the first part, but nothing else comes to mind...

UPD: I solved it with HTML::Parser.

  • Why not regexes, though? The advantages are obvious: easy maintenance, short code, and more. - ReinRaus

3 answers

use CPAN

e.g.

  • I did think about parsers, but that's not quite a “turnkey solution”. - user6550 February
  • In what sense? Do you need an example of using one of these packages to manipulate the DOM? There are examples in the documentation. - zb '
  • No, I know how to write event handlers :) I was just hoping for something completely ready-made, where you only pass lists of tags as arguments, that's all. In the end I went with HTML::Parser; a DOM is too heavy for tasks like this. - user6550 February
  • See UPD and comments below :) - user6550
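
For reference, the "pass tag lists as arguments" idea can be sketched on top of HTML::Parser roughly like this. `filter_tags` is a hypothetical helper, not part of HTML::Parser: tags in the first list are dropped together with their content, tags in the second list are unwrapped, and everything else is copied through as source text. It assumes HTML::Parser 3.x (`api_version => 3`) and only tracks nesting of the dropped tags, so badly unbalanced markup may need extra care.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of HTML::Parser):
#   $drop  - tags removed together with everything inside them
#   $strip - tags removed, but their content is kept
sub filter_tags {
    my ($html, $drop, $strip) = @_;
    my %drop  = map { lc($_) => 1 } @$drop;
    my %strip = map { lc($_) => 1 } @$strip;
    my $out   = '';
    my $depth = 0;    # > 0 while we are inside a dropped element

    # Shared logic for start and end tags; HTML::Parser lowercases tagname.
    my $tag_h = sub {
        my ($event, $tag, $text) = @_;
        if ($drop{$tag}) {
            if    ($event eq 'start') { $depth++ }
            elsif ($depth)            { $depth-- }
            return;
        }
        $out .= $text unless $depth || $strip{$tag};
    };

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ $tag_h, 'event, tagname, text' ],
        end_h       => [ $tag_h, 'event, tagname, text' ],
        # Text, comments, declarations, ...: copy unless inside a dropped tag
        default_h   => [ sub { $out .= $_[0] if defined $_[0] && !$depth }, 'text' ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

print filter_tags('baz<foo>foo</foo><bar>bar</bar>', ['foo'], ['bar']), "\n";
```

For the sample from the question this prints `bazbar`, since the input markup itself has no whitespace between the two words. Copying the original `text` of each kept event (rather than rebuilding tags from attributes) leaves untouched markup and entities exactly as they were.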

Use Mojo::DOM

  • And pull in all of Mojo? - user6550 February
  • Pulling in Mojo bothers you, but pulling in the HTML:: modules doesn't? - Kirill Novgorodtsev
  • For reference, here's the list of dependencies of HTML::Parser, which the tag filter uses: perlmonks.org ... - Kirill Novgorodtsev
  • I know what TagFilter depends on :) It's all already installed. Besides, DOM-based solutions are much heavier, I've checked. Still, I should run some benchmarks; we'll see. - user6550 February
  • I haven't got to Mojo yet :) But here is a benchmark:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw(:all);
        use HTML::Parser;
        use HTML::DOM;

        my $dp = HTML::DOM->new;
        my $hp = HTML::Parser->new;

        # Parse the same file 10000 times with each module
        my $t = timeit(10000, sub { eval { $hp->parse_file('1.html') } });
        print "Parser", timestr($t), "\n";
        $t = timeit(10000, sub { eval { $dp->parse_file('1.html') } });
        print "DOM", timestr($t), "\n";

    Results:

        Parser 0 wallclock secs (0.12 usr + 0.39 sys = 0.52 CPU) @ 19379.84/s (n=10000)
        DOM 7 wallclock secs (3.81 usr + 3.22 sys = 7.03 CPU) @ 1422.27/s (n=10000)

    - user6550
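
With Mojo::DOM the task comes down to two methods: `remove` deletes an element together with its content, and `strip` removes the element but keeps its content in place. A minimal sketch, where `filter_tags_mojo` is a hypothetical helper and the two lists hold CSS selectors rather than bare tag names:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper (not part of Mojolicious):
#   $drop  - CSS selectors; matching elements are removed with their content
#   $strip - CSS selectors; matching elements are unwrapped, content is kept
sub filter_tags_mojo {
    my ($html, $drop, $strip) = @_;
    my $dom = Mojo::DOM->new($html);

    # Drop first, so elements inside dropped ones are never unwrapped
    for my $sel (@$drop) {
        # remove() deletes the element and everything inside it
        $dom->find($sel)->each(sub { $_[0]->remove });
    }
    for my $sel (@$strip) {
        # strip() deletes the element but keeps its children in place
        $dom->find($sel)->each(sub { $_[0]->strip });
    }
    return $dom->to_string;
}

print filter_tags_mojo('baz<foo>foo</foo><bar>bar</bar>', ['foo'], ['bar']), "\n";
```

Whether pulling in Mojolicious for this is acceptable is exactly the trade-off discussed in the comments; the sketch only shows that the filtering itself stays short.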

HTML::TreeBuilder
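
A rough HTML::TreeBuilder equivalent, again with a hypothetical helper (`filter_tree`): `delete` removes an element with its whole subtree, and `replace_with_content` unwraps it. Note that TreeBuilder ignores unknown tags such as `<foo>` by default, hence the `ignore_unknown(0)` call. The tree also gets implied `<html>`/`<head>`/`<body>` elements, so this sketch only prints its text; serializing the kept markup would need a bit more work.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not part of HTML::Tree):
#   $drop  - tags whose elements are deleted with their whole subtree
#   $strip - tags whose elements are replaced by their own content
# Returns the tree; the caller should call ->delete on it when done.
sub filter_tree {
    my ($html, $drop, $strip) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0);    # keep non-HTML tags like <foo> as elements
    $tree->parse($html);
    $tree->eof;

    for my $tag (map { lc } @$drop) {
        $_->delete for $tree->look_down(_tag => $tag);
    }
    for my $tag (map { lc } @$strip) {
        # replace_with_content() detaches the emptied element; delete() frees it
        $_->replace_with_content->delete for $tree->look_down(_tag => $tag);
    }
    return $tree;
}

my $tree = filter_tree('baz<foo>foo</foo><bar>bar</bar>', ['foo'], ['bar']);
print $tree->as_text, "\n";
$tree->delete;
```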