What I have:

Array ( [0] => 5 [1] => Array ( [/Subtype] => Array ( [0] => 2 [1] => /Image ) [/Intent] => Array ( [0] => 2 [1] => /RelativeColorimetric ) [/Length] => Array ( [0] => 1 [1] => 1431417 ) [/Filter] => Array ( [0] => 2 [1] => /FlateDecode ) [/Name] => Array ( [0] => 2 [1] => /X ) [/BitsPerComponent] => Array ( [0] => 1 [1] => 8 ) [/ColorSpace] => Array ( [0] => 2 [1] => /DeviceCMYK ) [/Width] => Array ( [0] => 1 [1] => 935 ) [/DecodeParms] => Array ( [0] => 5 [1] => Array ( [/Columns] => Array ( [0] => 1 [1] => 935 ) [/Predictor] => Array ( [0] => 1 [1] => 15 ) [/BitsPerComponent] => Array ( [0] => 1 [1] => 1 ) [/Colors] => Array ( [0] => 1 [1] => 4 ) ) ) [/Height] => Array ( [0] => 1 [1] => 541 ) [/Type] => Array ( [0] => 2 [1] => /XObject ) ) ) 

There is also $stream, which holds the raw image data.

Problem: simply calling gzuncompress($stream) returns something, but it is not a picture. Apparently the result still has to be transformed somehow.

Here is the raw object itself, so you can check that the array was parsed correctly:

 29 0 obj <</Subtype/Image/Intent/RelativeColorimetric/Length 1431417/Filter/FlateDecode/Name/X/BitsPerComponent 8/ColorSpace/DeviceCMYK/Width 935/DecodeParms<</Columns 935/Predictor 15/BitsPerComponent 1/Colors 4>>/Height 541/Type/XObject>>stream 

I tried saving it as jpeg, jp2 and jpg, but the picture does not open. I also tried opening it with Imagick, which can work out what kind of file it is from the contents, without relying on the extension.

The PDF version is 1.7. Please don't send me the spec: I have already found it and read it, and I still don't get it!)

If it helps, I can post the PDF parser! It already extracts the text.)

  • It smells like PNG (Predictor 15). Is there any way to look at the raw stream? - mantigatos
  • Alas, it is not PNG (nor GIF, nor TIFF). I posted a couple of lines from the stream, both decoded and not decoded. In theory the FlateDecode filter says that gzuncompress should be applied. Undecoded piece: H ‰ TWiP [H Ts. Ѕ Ѕ Ѓ $ Ryu $$ ф f Decoded piece: & nd (F + i) D Stl E't b) from I) with Looks like I will have to finish reading the doc!) I found some useful lines there, but no result so far. The ColorSpace value determines how the image is embedded: DeviceCMYK means 4 color components of 8 bits each, so every 4 8-bit values define one pixel. How do I assemble a picture from this? - org
  • I take my words back: this is PNG. At least the manual stubbornly insists on it. The question remains how to feed this stream to gzinflate. Cutting two characters off the beginning does not help; those two characters give H (I run unpack('v', ...) on the first 2 characters). People online write that if it gives x, then cutting off two characters is enough to get a picture. - org
  • Have you tried prepending some header bytes before unpacking the stream? 31, 139, 8, 0, 0, 0, 0, 0, 0, 1 (this is from the Adobe forum) - mantigatos
  • I tried; alas, something is still missing and the picture is not displayed. I took exactly those values from the same Adobe forum. ( - org
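
On the gzuncompress / gzinflate confusion in the comments above: the three PHP functions expect different wrappers around the same deflate data, and the first two bytes usually show which one is present. The following is only a minimal sketch under that assumption; inflateFlateStream() is a hypothetical helper name, not something from the thread or from any library.

```php
<?php
/**
 * Inflate a /FlateDecode stream, choosing the zlib function from the header bytes.
 * inflateFlateStream() is a hypothetical helper, not part of any library.
 */
function inflateFlateStream(string $stream): string
{
    if (strlen($stream) < 2) {
        throw new InvalidArgumentException('Stream is too short to inflate');
    }
    $b0 = ord($stream[0]);
    $b1 = ord($stream[1]);

    if (($b0 & 0x0F) === 8 && (($b0 << 8) | $b1) % 31 === 0) {
        // zlib wrapper (RFC 1950): method 8 = deflate, 16-bit header divisible by 31.
        $out = @gzuncompress($stream);
    } elseif ($b0 === 0x1F && $b1 === 0x8B) {
        // gzip wrapper (RFC 1952): magic bytes 31, 139.
        $out = @gzdecode($stream);
    } else {
        // No recognizable wrapper: assume raw deflate data (RFC 1951).
        $out = @gzinflate($stream);
    }
    if ($out === false) {
        throw new RuntimeException('Could not inflate the stream');
    }
    return $out;
}
```

The familiar x is byte 0x78, which is just one valid zlib header. If the H ‰ quoted above corresponds to bytes 0x48 0x89 (an assumption about how the text was displayed), it passes the same zlib check with a smaller window size, so gzuncompress on the untouched stream would be the matching call and no bytes need to be cut or prepended.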

1 answer

In Zend Framework, this is implemented here:

 abstract class Zend_Pdf_Filter_Compression implements Zend_Pdf_Filter_Interface { protected static function _applyDecodeParams($data, $params){ ... 

That method handles the decode parameters, including the PNG predictors.

UPD. After that, everything is straightforward:

 class Zend_Pdf_Filter_Compression_Flate extends Zend_Pdf_Filter_Compression { public static function decode($data, $params = null) { 
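
For a project without Zend, the PNG part of that decode-params step can be sketched in plain PHP. This is not Zend's code: it is a minimal illustration of reversing the PNG row filters (None, Sub, Up, Average, Paeth) that /Predictor values 10 to 15 refer to, and pngUnpredict() is a hypothetical name. It assumes the data has already been inflated and that every row starts with one filter-type byte.

```php
<?php
/**
 * Reverse PNG row filters on already-inflated data (PDF /Predictor 10..15).
 * pngUnpredict() is a hypothetical helper, not Zend's implementation.
 */
function pngUnpredict(string $data, int $columns, int $colors = 1, int $bitsPerComponent = 8): string
{
    $bpp    = max(1, intdiv($colors * $bitsPerComponent + 7, 8));      // bytes per pixel, at least 1
    $rowLen = intdiv($columns * $colors * $bitsPerComponent + 7, 8);   // row bytes without the filter byte
    if ($rowLen < 1) {
        throw new InvalidArgumentException('Row length must be positive');
    }
    $stride = $rowLen + 1;                                             // one filter byte per row
    $rows   = intdiv(strlen($data), $stride);
    $prev   = array_fill(0, $rowLen, 0);                               // row above the first one is all zeros
    $out    = '';

    for ($r = 0; $r < $rows; $r++) {
        $filter = ord($data[$r * $stride]);
        $cur = array_values(unpack('C*', substr($data, $r * $stride + 1, $rowLen)));
        for ($i = 0; $i < $rowLen; $i++) {
            $a = $i >= $bpp ? $cur[$i - $bpp] : 0;    // left neighbour (already decoded)
            $b = $prev[$i];                           // byte above
            $c = $i >= $bpp ? $prev[$i - $bpp] : 0;   // upper-left neighbour
            switch ($filter) {
                case 0: $pred = 0; break;                    // None
                case 1: $pred = $a; break;                   // Sub
                case 2: $pred = $b; break;                   // Up
                case 3: $pred = intdiv($a + $b, 2); break;   // Average
                case 4:                                      // Paeth
                    $p  = $a + $b - $c;
                    $pa = abs($p - $a);
                    $pb = abs($p - $b);
                    $pc = abs($p - $c);
                    $pred = ($pa <= $pb && $pa <= $pc) ? $a : ($pb <= $pc ? $b : $c);
                    break;
                default:
                    throw new InvalidArgumentException("Unknown PNG filter type $filter in row $r");
            }
            $cur[$i] = ($cur[$i] + $pred) & 0xFF;
        }
        $out .= pack('C*', ...$cur);
        $prev = $cur;
    }
    return $out;
}
```

For the object in the question the call would take /Columns 935 and /Colors 4 from /DecodeParms. Note that /DecodeParms lists /BitsPerComponent 1 while the image dictionary says 8; which of the two gives a row length that divides the inflated data evenly is worth checking before trusting the output.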
  • I'll try. The thing is, I don't use Zend in this project. Where can I find the internals of this method (so I can write it in pure PHP)? - org
  • Here is something useful I found: tig12.net/downloads/apidocs/zf/Pdf/Filter/Compression ... ... All that remains is to find _applyEncodeParams in the parent class. - org
  • I'm about to squeal with happiness!))) Just kidding. Here is the base class: tig12.net/downloads/apidocs/zf/Pdf/Filter/… I'm hooking it up now! Thank you so much! - org
  • The joy did not last long. The picture gets decoded, but I don't know what the result actually is. In short: I applied the Zend decoding algorithm with the parameters. I tried saving the output in every format I know, but the file does not open. Any more ideas? - org
  • Try interpreting the result as PPM (Portable Pixmap, image/x-portable-pixmap); at least IrfanView and ACDSee can open it. Maybe that will work. If not, use pdfimages, which extracts content in exactly this format. It also extracts superbly: I just fed it a bunch of PDFs and everything was extracted, even what was not in them :-) And naturally it will run faster than a PHP implementation ... - mantigatos
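
To try the PPM suggestion from PHP, note that a binary PPM (P6) holds RGB samples, so DeviceCMYK data has to be converted first. The sketch below uses the naive formula R = (255 - C) * (255 - K) / 255 with no ICC profile, so colors will only be approximate; cmykToPpm() is a hypothetical helper, and it assumes 8 bits per component with 4 bytes per pixel, as in the question.

```php
<?php
/**
 * Wrap raw 8-bit CMYK samples into a binary PPM (P6) file body.
 * cmykToPpm() is a hypothetical helper; the conversion is the naive one, without color management.
 */
function cmykToPpm(string $cmyk, int $width, int $height): string
{
    $need = $width * $height * 4;   // 4 bytes (C, M, Y, K) per pixel
    if (strlen($cmyk) < $need) {
        throw new LengthException("Expected $need bytes of CMYK data, got " . strlen($cmyk));
    }
    $rgb = '';
    for ($i = 0; $i < $need; $i += 4) {
        $k = 255 - ord($cmyk[$i + 3]);                              // inverted black component
        $rgb .= chr(intdiv((255 - ord($cmyk[$i])) * $k, 255))       // R from C
              . chr(intdiv((255 - ord($cmyk[$i + 1])) * $k, 255))   // G from M
              . chr(intdiv((255 - ord($cmyk[$i + 2])) * $k, 255));  // B from Y
    }
    // P6 header: magic number, width and height, maximum sample value, then raw RGB bytes.
    return "P6\n$width $height\n255\n" . $rgb;
}
```

With the decoded pixel data in a variable (here assumed to be $pixels), something like file_put_contents('image.ppm', cmykToPpm($pixels, 935, 541)) would produce a file that PPM-capable viewers can open.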