Remove BOM from file

Question

I have a byte array, byte[] buffer, which I convert to a Base64 string:

string chunk = Convert.ToBase64String(buffer);

I then pass it to a PowerShell script that is built in C#, where I decode it back and append the resulting string to a file:

 $DataDecoded = [System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String('Chunk'))
 Invoke-Command {$using:DataDecoded | Add-Content -Path C:\Test\Script1.ps1 -NoNewLine} -ComputerName " + configuration.Ip + @" -Credential $Cred

Because the file is copied in pieces, this cycle may run several times. The resulting file has a question mark '?' at the very beginning of the first line and nowhere else. If I change the encoding in the decoding line to ASCII, there are three question marks '???' instead.

As I understand it, this is the BOM. I need the file without these question marks. Is it possible to remove them somehow? I have already tried every option with these encodings.

@PavelMayorov I was told to encode to Base64 first and then decode in PowerShell.
And if you were told to jump off the roof of a skyscraper, would you jump?
If you had described the task in full, you would have received the correct answer yesterday (in the previous question).
That you are working with text, that you are reading it in blocks ... etc.

Alexander Petrov · Accepted Answer · 2018-06-29T09:23:34

The BOM (byte order mark) is in the source file, so the problem has to be solved when reading from the file.

Presumably you read the byte array from the file like this:

 var bytes = File.ReadAllBytes(path);

Naturally, if the file contains a BOM, those bytes end up in the array.

So read the data from the file as text, and the BOM will be dropped. Then get the bytes from the text in the desired encoding:

 var text = File.ReadAllText(path);
 var bytes = Encoding.UTF8.GetBytes(text);
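A minimal sketch of the difference between the two ways of reading, assuming a UTF-8 file that was saved with a BOM. The class name BomDemo, the method Compare, and the temporary file are hypothetical and exist only for illustration:

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes a small UTF-8 file WITH a BOM, then returns the byte count seen by
    // ReadAllBytes (Raw) and the byte count after a round trip through text (Clean).
    public static (int Raw, int Clean) Compare(string content)
    {
        string path = Path.GetTempFileName();
        try
        {
            // UTF8Encoding(true) emits the EF BB BF preamble when writing.
            File.WriteAllText(path, content, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));

            byte[] raw = File.ReadAllBytes(path);         // includes the 3 BOM bytes
            string text = File.ReadAllText(path);         // BOM is detected and dropped
            byte[] clean = Encoding.UTF8.GetBytes(text);  // GetBytes does not add a preamble

            return (raw.Length, clean.Length);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

For non-empty content, Raw should come out three bytes larger than Clean; those three bytes are what shows up as '?' or '???' after decoding.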

I need to read the data from the file in parts (1 KB), so I read it into a byte array. - Alex
@Alexey - As far as I can tell from your questions, you are reading from a text file. Is that right? Please clarify (edit the question). If so, you cannot treat text as bytes. Reading a kilobyte at a time, you risk splitting a multibyte character in the middle. So use StreamReader to read blocks of text (not bytes!). Then there will be no problems with the BOM. - Alexander Petrov
The file will be a PowerShell script, powershell.ps1, and I read it through FileStream. - Alex
@Alexey - A script is a text file. FileStream reads bytes. That is not right! In multibyte encodings (for example, UTF-8), one character (letter) can be encoded with several bytes, for example bytes number 1023, 1024 and 1025. Reading exactly one kilobyte (1024 bytes), you will read only part of that character and break it. - Alexander Petrov
@Alexey - Yes, in that scenario the characters will eventually end up back in place and be correct. But in general you cannot treat text as bytes, and the BOM is just one of the reasons. Simply replace FileStream with StreamReader, and read an array of characters (char) instead of an array of bytes. - Alexander Petrov
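The StreamReader suggestion from the comments above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: the names TextChunker and ReadBase64Chunks are hypothetical, and the input is assumed to be UTF-8. The reader consumes the BOM, and a stateful Encoder keeps a surrogate pair together if it happens to straddle two blocks, so every Base64 chunk decodes to valid UTF-8 on its own:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TextChunker
{
    // Reads text in blocks of up to chunkChars characters and yields each block
    // as the Base64 of its UTF-8 bytes. No BOM ends up in any chunk.
    public static IEnumerable<string> ReadBase64Chunks(Stream input, int chunkChars)
    {
        using (var reader = new StreamReader(input, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            Encoder encoder = Encoding.UTF8.GetEncoder();   // remembers a trailing high surrogate
            var chars = new char[chunkChars];
            var bytes = new byte[Encoding.UTF8.GetMaxByteCount(chunkChars)];

            int read;
            while ((read = reader.ReadBlock(chars, 0, chars.Length)) > 0)
            {
                int n = encoder.GetBytes(chars, 0, read, bytes, 0, flush: false);
                if (n > 0)
                    yield return Convert.ToBase64String(bytes, 0, n);
            }

            // Flush whatever the encoder was still holding at end of input.
            int tail = encoder.GetBytes(chars, 0, 0, bytes, 0, flush: true);
            if (tail > 0)
                yield return Convert.ToBase64String(bytes, 0, tail);
        }
    }
}
```

Each yielded string can take the place of 'Chunk' in the PowerShell line from the question. Note that the block size here is counted in characters, so the byte size of a chunk varies with the text.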

Answer 2 · 2018-06-28T22:19:03

For example:

 if (String1.StartsWith(Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble()), StringComparison.Ordinal))
 {
     String2 = String1.Remove(0, Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble()).Length);
 }

Or, more simply, the blunt way:

 String1 = String1.Trim(new char[]{'\uFEFF'});

or, to also remove a zero-width space between the BOM and the text:

 String1 = String1.Trim(new char[]{'\uFEFF','\u200B'});

Better still, use TrimStart(), so that nothing is stripped from the end of the string.
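A small sketch of the TrimStart variant; the helper name StripLeadingBom is hypothetical. Since .NET strings are immutable, the trimmed result has to be returned or assigned rather than discarded:

```csharp
using System;

public static class BomTrim
{
    // Removes U+FEFF only from the start of the string.
    // A U+FEFF at the end (or in the middle) is left untouched.
    public static string StripLeadingBom(string s) => s.TrimStart('\uFEFF');
}
```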

Alexey · Answer 3 · 2018-06-29T07:55:35

I removed the characters responsible for the BOM, but from the Base64-encoded string rather than from the decoded text. I did it like this:

 fileChunk = fileChunk.Replace("77u/", "");
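The string "77u/" is the Base64 encoding of the three UTF-8 BOM bytes EF BB BF. Replace removes every occurrence in the chunk, not only one at the start, so a narrower variant is to skip the BOM bytes before encoding. A hedged sketch, with the hypothetical helper name EncodeWithoutBom, meant to be applied to the first chunk of a file only:

```csharp
using System;
using System.Linq;
using System.Text;

public static class Base64Bom
{
    // Encodes a chunk to Base64, skipping the UTF-8 preamble (EF BB BF)
    // when the chunk starts with it. All other bytes are encoded unchanged.
    public static string EncodeWithoutBom(byte[] chunk)
    {
        byte[] bom = Encoding.UTF8.GetPreamble();
        bool hasBom = chunk.Length >= bom.Length
                      && chunk.Take(bom.Length).SequenceEqual(bom);
        int offset = hasBom ? bom.Length : 0;
        return Convert.ToBase64String(chunk, offset, chunk.Length - offset);
    }
}
```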
