<object type="application/x-shockwave-flash" data="//site.ua/uploads/uppod.swf" id="videoplayer15841" style="visibility: visible;" width="688" height="464"> <param name="wmode" value="opaque"> <param name="bgcolor" value="#ffffff"> <param name="allowFullScreen" value="true"> <param name="allowScriptAccess" value="always"> <param name="id" value="videoplayer15841"> <param name="flashvars" value="uid=videoplayer15841&amp;st=//site.ua/img/123123.txt&amp;pl=http://site.ua/speacker.txt"> </object> 

How can I extract the values contained in the value attribute of this media object?

 value="uid=videoplayer15841&amp;st=//site.ua/img/123123.txt&amp;pl=http://site.ua/speacker.txt"> 

I usually grab elements from the page with Nokogiri like this:

 doc = Nokogiri::HTML(open(link))
 doc.css(".linker")

But I can't get at the object this way.

I tried it like this:

 doc = Nokogiri::HTML(open(link))
 param = doc.css('param')
 param['value']

Do media objects need a different approach?

    2 answers

    HTML itself does not assume that attributes can hold anything smarter than strings. So when you read an attribute, Nokogiri returns nothing but strings (and nil for attributes that are absent). However, whatever consumes the HTML may have its own ideas about how the information is encoded. Here, one format is nested inside another.

    We start with this:

     markup = <<~HTML
       <param name="flashvars" value="uid=videoplayer15841&amp;st=//site.ua/img/123123.txt&amp;pl=http://site.ua/speacker.txt">
     HTML

    Nokogiri removes the HTML-level escaping (as in XML, &amp; is replaced by &):

     require 'nokogiri'
     flashvars = Nokogiri::HTML(markup).at_css('param')['value']
     # => "uid=videoplayer15841&st=//site.ua/img/123123.txt&pl=http://site.ua/speacker.txt"

    The data is still in the Flash-specific parameter format, which, quite by chance, looks very much like the key-value encoding of a URL query string, so it can be expanded into a hash with CGI.parse:

     require 'cgi'
     CGI.parse(flashvars)
     # => {"uid"=>["videoplayer15841"],
     #     "st"=>["//site.ua/img/123123.txt"],
     #     "pl"=>["http://site.ua/speacker.txt"]}

    The structure of the result may seem strange, but it is the most correct one, since it allows for a key occurring more than once.
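    To see why the values come back as arrays, here is a minimal sketch with a hypothetical input in which the "pl" key is repeated. It uses URI.decode_www_form from the standard library instead of CGI.parse, since it returns the pairs in order and makes the difference easy to show; the variable names are illustrative only.

```ruby
require 'uri'

# Hypothetical input in which the "pl" key occurs twice.
pairs = URI.decode_www_form("uid=videoplayer15841&pl=a.txt&pl=b.txt")

# Collapsing to a flat hash keeps only the last value of a repeated key.
flat = pairs.to_h

# Grouping by key keeps every value, the same shape CGI.parse produces.
grouped = pairs.group_by(&:first).transform_values { |kv| kv.map(&:last) }

p flat
p grouped
```

    If every key is known to occur exactly once, the flat hash is enough; otherwise the array form is the one that loses nothing.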

    • A downvote with no explanation. OK :) - D-side
     string_value = doc.xpath("//object@id='videoplayer15841'/param@name='flashvars'").to_s
     hash_value = string_value.split(';').map { |i| i.split('=') }.to_h

    Something like this?

    • First, a predicate in XPath has to be enclosed in square brackets, like param[@name='flashvars']. Second, this does not work (even with the XPath fixed; try it yourself). Third, even if it did work, it would give the wrong answer, because the value inside the HTML is XML-escaped. - D-side
    • Well, I wrote it off the top of my head without checking the reference. Please explain the XML escaping, this is the first I've heard of it. - Denis Epishkin
    • str = doc.xpath("object[@id='videoplayer15841']//param[@name='flashvars']").first['value'] This is how it works (you made me refine it). We get a clean string from value. - Denis Epishkin
    • P.S. As for escaping and the like, Nokogiri has a great parse_html method, and the use is clear: Nokogiri::HTML is clever enough to let you search with XPath in any not-quite-valid HTML, after parse_html of course. - Denis Epishkin
    • That's almost it; post it as an answer. True, it will also give the wrong answer if the values contain URL-unsafe characters :) But the question doesn't have any. - D-side
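
    The last comment's point about URL-unsafe characters can be sketched as follows. The flashvars string here is hypothetical (it is not from the question), and the sketch assumes the values use query-string encoding, as the first answer suggests. It compares plain splitting with form decoding via URI.decode_www_form from the standard library.

```ruby
require 'uri'

# Hypothetical flashvars: the title value is percent-encoded
# and contains an encoded '=', an encoded '&', and '+' for a space.
flashvars = "uid=player1&title=a%3Db%26c+d"

# Naive approach: split on '&' and '=' without any decoding.
naive = flashvars.split('&').map { |pair| pair.split('=', 2) }.to_h

# Form decoding turns %XX sequences and '+' back into the original characters.
decoded = URI.decode_www_form(flashvars).to_h

puts naive["title"]    # still percent-encoded
puts decoded["title"]  # decoded text
```

    For the string in the question both approaches agree, because none of its values contain encoded characters.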