A site on ASP.NET MVC using Razor. Occasionally, question marks like this one appear in random places, while the rest of the Russian text is displayed normally.

[Screenshot: a Unicode question mark on the page]

The position of these characters depends on the page content: if the content before the character changes by even one byte, the character either disappears or moves. If the content before it does not change, it shows up consistently on every page refresh. Evidently, it comes from some kind of corruption of two-byte characters. UTF-8 is set as the encoding everywhere.

My guess is that, for some reason, the bytes get corrupted at the boundary between two TCP packets. I know that TCP is a transport-layer protocol that guarantees the integrity and ordering of the data it delivers to the layers above, but I haven't found any other explanation. Nor have I found any reason why TCP would corrupt data at packet boundaries.

The problem shows up on different servers running the same site (it has been migrated a couple of times), and even on different sites created independently from scratch from the standard templates.

UPD

An example of a broken line: "Не обработано" ("Not processed").

Bytes of the same string:

0xD0, 0x9D, 0xD0, 0xB5, 0x20, 0xD0, 0xBE, 0xD0, 0xB1, 0xD1, 0x80, 0xD0, 0xB0, 0xD0, 0xB1, 0xD0, 0xBE, 0xB1, 0xD0, 0xBE, 0xB0, 0xD0, 0xBE, 0xD0, 0xB1, 0xD0, 0xB1, 0xD0, 0xB1, 0xD0, 0xB1, 0xD0

Here you can see that in the first occurrence the Cyrillic letter "а" is encoded as 0xD0, 0xB0, while in the second (broken) occurrence it appears as 0xB0, 0xD0. For some reason, the bytes are swapped.
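To see why swapped bytes show up as a question mark, here is a small C# sketch (the class and method names are illustrative, not from the question). It assumes the default `Encoding.UTF8`, which replaces every invalid byte sequence with U+FFFD, the replacement character browsers draw as a question mark in a diamond.

```csharp
using System.Text;

// Illustrative helper (not from the question). Encoding.UTF8 uses a
// replacement fallback by default, so each invalid byte sequence is
// decoded as U+FFFD instead of throwing.
public static class SwappedBytesDemo
{
    public static string Decode(params byte[] bytes) => Encoding.UTF8.GetString(bytes);

    // 0xD0 0xB0: a lead byte followed by a continuation byte,
    // i.e. the Cyrillic letter "а" (U+0430).
    public static string InOrder() => Decode(0xD0, 0xB0);

    // 0xB0 0xD0: 0xB0 is a continuation byte with no lead byte before it,
    // and 0xD0 is a lead byte with nothing after it, so each becomes U+FFFD.
    public static string Swapped() => Decode(0xB0, 0xD0);
}
```

Inside a real page the dangling 0xD0 is followed by the bytes of the next character, so how many replacement characters appear, and whether the next letter survives, depends on what comes after it.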

  • Save the raw HTTP response as is and look at what happens in the byte sequence. - Athari
  • For now this is just guesswork. Show the relevant code. Maybe you are assembling the text incorrectly (for example, converting the bytes of each packet into a string and concatenating the strings, instead of concatenating the bytes first and decoding them afterwards). - VladD
  • @Athari, I've updated the question with an example of a broken line. - Almeonamy
  • You have both the data and the code to reproduce this. Did you try debugging locally with the same data? Does the string break there too? - PashaPash ♦
  • @PashaPash, the data that triggers it is very hard to reproduce, so I haven't been able to debug it locally. But on production such characters keep appearing from time to time. - Almeonamy
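Following Athari's suggestion, once the raw response body has been saved to a file, a small scan can point to the exact byte offset where the UTF-8 structure breaks. `Utf8Scanner.FindFirstInvalid` below is a hypothetical, simplified sketch: it checks only lead and continuation bytes and does not reject overlong forms or encoded surrogates.

```csharp
// Hypothetical helper for inspecting a saved raw response body.
// Simplified: it validates lead/continuation byte structure only and does not
// reject overlong 3-/4-byte forms, encoded surrogates, or code points above U+10FFFF.
public static class Utf8Scanner
{
    // Returns the offset of the first byte that starts an invalid or truncated
    // UTF-8 sequence, or -1 if the whole buffer is structurally valid.
    public static int FindFirstInvalid(byte[] data)
    {
        int i = 0;
        while (i < data.Length)
        {
            byte b = data[i];
            int trailing;
            if (b < 0x80) { i++; continue; }                // ASCII
            else if (b >= 0xC2 && b <= 0xDF) trailing = 1;  // 2-byte lead (0xD0/0xD1 for Cyrillic)
            else if (b >= 0xE0 && b <= 0xEF) trailing = 2;  // 3-byte lead
            else if (b >= 0xF0 && b <= 0xF4) trailing = 3;  // 4-byte lead
            else return i;                                  // stray continuation byte or a byte that never starts a sequence

            if (i + trailing >= data.Length) return i;      // sequence cut off at the end of the buffer
            for (int k = 1; k <= trailing; k++)
            {
                if ((data[i + k] & 0xC0) != 0x80) return i; // expected a continuation byte 10xxxxxx
            }
            i += trailing + 1;
        }
        return -1;
    }
}
```

For example, calling `Utf8Scanner.FindFirstInvalid(File.ReadAllBytes(path))` on the saved body after several refreshes would show whether the break point moves with the content, as described in the question.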

2 answers

This may be caused by an incorrect implementation of HttpResponse.Filter (either your own or a ready-made third-party one).

The code that installs a filter usually looks like this:

 Response.Filter = new MyFilter(Response.Filter); 

You can check whether a filter is installed in the debugger: look at the type of the current value of Response.Filter.

ASP.NET writes the response to the filter in chunks via the method

 public override void Write(byte[] buffer, int offset, int count) { ... } 

and if the filter is not Unicode-aware, it processes each chunk "as is", so a multi-byte character that straddles two chunks gets cut in the middle.
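A hedged sketch of what a Unicode-aware filter could look like, assuming the filter needs to work on text. `Utf8SafeFilter` and its `transform` delegate are hypothetical names, not a real library type. Since HttpResponse.Filter is an ordinary `Stream`, the sketch derives from `Stream` and decodes chunks with a stateful `System.Text.Decoder`, which holds back the bytes of a character cut off at the end of one `Write` call and completes it on the next.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical Unicode-aware response filter (not a real library type).
// It decodes incoming chunks with a stateful Decoder, so a UTF-8 character
// split across two Write calls is reassembled instead of becoming U+FFFD.
public sealed class Utf8SafeFilter : Stream
{
    private static readonly Encoding Utf8 = new UTF8Encoding(false);
    private readonly Stream _inner;
    private readonly Func<string, string> _transform;
    private readonly Decoder _decoder = Utf8.GetDecoder();
    private bool _disposed;

    public Utf8SafeFilter(Stream inner, Func<string, string> transform)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _transform = transform ?? throw new ArgumentNullException(nameof(transform));
    }

    public override void Write(byte[] buffer, int offset, int count) =>
        WriteDecoded(buffer, offset, count, flush: false);

    private void WriteDecoded(byte[] buffer, int offset, int count, bool flush)
    {
        // With flush == false the decoder keeps the bytes of an incomplete
        // trailing character and combines them with the next chunk.
        var chars = new char[_decoder.GetCharCount(buffer, offset, count, flush)];
        int decoded = _decoder.GetChars(buffer, offset, count, chars, 0, flush);
        byte[] output = Utf8.GetBytes(_transform(new string(chars, 0, decoded)));
        _inner.Write(output, 0, output.Length);
    }

    public override void Flush() => _inner.Flush();

    protected override void Dispose(bool disposing)
    {
        if (disposing && !_disposed)
        {
            _disposed = true;
            // End of the response: flush the decoder so an incomplete final
            // sequence is emitted as U+FFFD rather than silently dropped.
            WriteDecoded(Array.Empty<byte>(), 0, 0, flush: true);
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}
```

It would be installed the same way as any other filter, e.g. `Response.Filter = new Utf8SafeFilter(Response.Filter, s => s);`. Writing the bytes of "Не обработано" in two chunks split between 0xD0 and 0x9D comes out intact through this filter, whereas decoding each chunk separately with `Encoding.UTF8.GetString` produces replacement characters at the split. The sketch only addresses split characters: a transform that searches for multi-character patterns could still miss a match that straddles two chunks.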

UPD: the problem can also be caused by one of the already-fixed bugs in New Relic Browser Monitoring:

.NET Agent 2.25.208.0:

Fixed a problem that occurred when browser monitoring was enabled.

.NET Agent 4.1.134.0:

A problem affecting non-HTML resources has been fixed.

It is worth updating the agent to the latest version.

  • Response.Filter seems to contain System.Web.HttpResponseStreamFilterSink. If I'm not mistaken, that is the standard one. I checked this on production. I also remembered that the server uses New Relic for monitoring. In theory it could modify the output stream, but I don't know how to check that. - Almeonamy
  • @Almeonamy New Relic definitely modifies the content (I'm just not sure exactly how). Disable end-user monitoring in New Relic and check that its script has disappeared from the page. If that helps, contact their support (and post here as well, since I also use New Relic in my application). - PashaPash ♦
  • @Almeonamy found it. Update the agent. - PashaPash ♦
  • I have version 3.1, i.e. newer than the one that fixed this bug. However, after I turned off browser monitoring, I could no longer find this bug anywhere on the site. In the coming days I will update the agent and turn browser monitoring back on. If the bug no longer shows up, I'll accept the answer. - Almeonamy
  • @Almeonamy better upgrade to the latest version first and then turn monitoring back on. If the bug reappears right away, write to New Relic support. - PashaPash ♦

This happens when two encoding problems partially cancel each other out.

Suppose you have a file saved in UTF-8, but Razor reads it as windows-1251. The result is garbage, but garbage that is "mostly correct", except for the characters that could not be converted. This garbage is then sent to the browser in windows-1251, the browser's automatic encoding detection kicks in, and the same bytes are displayed to you as UTF-8. Net result: you see the original text, except for the characters lost in the conversion.
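A rough C# sketch of that scenario, assuming a UTF-8 file that is misread as windows-1251 and then sent back out as windows-1251 (`DoubleMisdecodingDemo` and its methods are hypothetical names). It targets .NET Core 3.0 or later, where code page 1251 is only available after registering `CodePagesEncodingProvider`.

```csharp
using System.Text;

// Hypothetical illustration of two encoding mistakes cancelling out:
// UTF-8 bytes are misread as windows-1251, the garbled string is sent
// as windows-1251, and the browser decodes those bytes as UTF-8 again.
public static class DoubleMisdecodingDemo
{
    private static readonly Encoding Cp1251;

    static DoubleMisdecodingDemo()
    {
        // .NET Core 3.0+ / .NET 5+: legacy code pages must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Cp1251 = Encoding.GetEncoding(1251);
    }

    // Step 1: a UTF-8 file read as if it were windows-1251 (the "garbage").
    public static string Misread(string original) =>
        Cp1251.GetString(Encoding.UTF8.GetBytes(original));

    // Steps 2-3: the garbage is written out as windows-1251, and the browser
    // interprets the resulting bytes as UTF-8.
    public static string RoundTrip(string original) =>
        Encoding.UTF8.GetString(Cp1251.GetBytes(Misread(original)));
}
```

For "Не обработано" the intermediate string is unreadable, yet the round trip returns the original text, which is why such a page can look almost correct. Characters can only be lost at bytes that windows-1251 leaves undefined, such as 0x98 (the second UTF-8 byte of capital "И"); how that byte survives the trip depends on the runtime's code page tables.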

Check the encoding at every stage: how the file is stored (the matching encoding should be set in the system.web/globalization section of your web.config). Check the behavior of any intermediate proxy servers. And, of course, check that the encoding the server sends in the headers matches the one the browser has detected.

  • All project files are in UTF-8. The response header says UTF-8. web.config also says UTF-8. I liked the idea, but the encoding seems to be correct everywhere. - Almeonamy
  • And the lines where the question marks appear: are they read from a file or from the database? How are they stored in the database? - Pavel Mayorov
  • From different sources. Sometimes these are strings from the database, sometimes text in the .cshtml file itself. It does not depend on the source. - Almeonamy
  • Oh, it does depend. At some stage your encoding gets switched. The TCP theory is voodoo, the kind of thing you come up with at 3 am after your 100th cup of coffee. Maybe something is encoded in a JS script and then decoded incorrectly on the server. - hardsky
  • @Almeonamy, my comment was meant to say that the answer seems correct. You need to check the encoding along the entire chain: the encoding of the .htm page (views may contain directives), whether different encode/decode methods are used, and the encoding in the database. - hardsky