I'm interested in an implementation, specifically examples in Python or PHP of parsing RTF or DOC files from Microsoft =).

Update: the files contain Cyrillic! =).

  • search.cpan.org/~sargie/RTF-Parser-1.09/lib/RTF/Parser.pm True, this is neither Python nor even PHP. - alexlz
  • The sad part is that it does not work with Cyrillic :D - Dartanyan
  • codeproject.com/Articles/27431/Writing-Your-Own-RTF-Converter Take a look at this. It's probably easier to find some RTF-to-XML converter (there are a lot of them). And what is the task, if it's not a secret? - alexlz
  • I'll google it, but I'm on *nix here, and open-source C# is a little amusing =). - Dartanyan
  • In ancient times, B. Tobotras used a parser written in OmniMark. (1) I haven't tried the library above on my Linux (too lazy), but I have installed C# under Mono. (2) Maybe your task does not require full parsing of RTF (but that is up to you). - alexlz

1 answer

You do realize that doc, rtf, and docx are three very different formats? RTF is not that hard to parse yourself, and docx is basically a container with XML files inside, if memory serves.
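To illustrate the point about docx being a container: a .docx file is a ZIP archive, and the main text normally sits in `word/document.xml`. Below is a minimal sketch using only the Python standard library. The function name `docx_paragraphs` is made up for this example, and it only collects plain paragraph text (no tables, headers, or footnotes handled separately).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of each paragraph of a .docx (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # The visible text of a paragraph is spread over one or more w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

Since the XML inside is UTF-8, Cyrillic text comes out as ordinary Python strings with no extra decoding step.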

  • Yes, I know the difference very well. - Dartanyan
  • @actionless: are you sure that this is your answer and not a comment? - VladD
  • @Dartanyan: then you should understand that docx is parsed with the same tools as regular XML, and RTF can be parsed perfectly well with standard tools, but you can still try pyRTF-next (or something like that). - actionless
  • @VladD, it is an answer to the same extent that the question was asked in detail... - actionless
  • I'll try the library, but again you've forgotten the essence of the question: Cyrillic in RTF is encoded in such a way that there is simply no way to read it back. - Dartanyan
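On the Cyrillic problem raised in the last comment: in RTF, non-ASCII characters are usually written either as `\'hh` (one byte in the document code page, which the header declares with `\ansicpgN`, e.g. 1251 for Russian) or as `\uN` (a signed 16-bit Unicode value followed by fallback characters, one by default, adjustable with `\ucN`). The following is a rough sketch of decoding those escapes in a body fragment, not a full RTF parser. The names `detect_codepage` and `decode_rtf_text` are made up for this example; it ignores group scoping, font tables, per-font charsets, and surrogate pairs.

```python
import re

TOKEN = re.compile(
    r"\\'([0-9a-fA-F]{2})"        # 1: \'hh  one byte in the document code page
    r"|\\u(-?\d+) ?"              # 2: \uN   signed 16-bit Unicode value
    r"|\\([a-zA-Z]+)(-?\d+)? ?"   # 3, 4: other control word, optional parameter
    r"|\\([\\{}])"                # 5: escaped backslash or brace
    r"|\\[^a-zA-Z]"               # other control symbols (\~, \*, ...): ignored
    r"|[{}\r\n]"                  # group braces and raw line breaks: ignored
    r"|([^\\{}\r\n]+)"            # 6: run of plain text
)


def detect_codepage(rtf, default="cp1252"):
    """Read \\ansicpgN from the RTF header and map it to a Python codec name."""
    m = re.search(r"\\ansicpg(\d+)", rtf)
    return "cp" + m.group(1) if m else default


def decode_rtf_text(fragment, codepage="cp1251"):
    """Decode \\'hh and \\uN escapes in an RTF body fragment (simplified)."""
    out = []                # decoded pieces of text
    pending = bytearray()   # \'hh bytes collected but not yet decoded
    uc = 1                  # number of fallback characters after each \uN
    skip = 0                # fallback characters still to be dropped

    def flush():
        if pending:
            out.append(pending.decode(codepage, errors="replace"))
            pending.clear()

    for m in TOKEN.finditer(fragment):
        hex_byte, uni, word, param, literal, text = m.groups()
        if hex_byte is not None:
            if skip:
                skip -= 1            # this byte is only a fallback for \uN
            else:
                pending.append(int(hex_byte, 16))
            continue
        flush()
        if uni is not None:
            out.append(chr(int(uni) % 65536))   # negative values wrap around
            skip = uc
        elif text is not None:
            out.append(text[skip:])
            skip = max(0, skip - len(text))
        else:
            skip = 0
            if word == "uc" and param is not None:
                uc = int(param)
            elif word in ("par", "line"):
                out.append("\n")
            elif word == "tab":
                out.append("\t")
            elif literal is not None:
                out.append(literal)
    flush()
    return "".join(out)
```

A possible usage, assuming the whole file was read as text with a Latin-1 or ASCII-compatible encoding: `decode_rtf_text(body, detect_codepage(rtf))`. Parsers that appear to "not work with Cyrillic" often treat `\'hh` bytes as Latin-1 instead of using the declared code page.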