I have the string '67 469 250 004' (taken from a column of the table on which I need to perform this operation, namely removing the spaces). It is a text field. The table itself was originally UTF-8 encoded, if that matters.

None of these commands works; each one returns the original string.

select '67 469 250 004' , replace('67 469 250 004', ' ', '') , regexp_replace('67 469 250 004', '\\s* ', '', 'g') , trim('67 469 250 004') 

In regexp_replace I tried different patterns in place of '\\s*' ('\s', '\s*', '').

  • select replace('67 469 250 004', ' ', ''); => 67469250004 is what I get, as expected. Are you sure that the exact query given in the question returns the original string? Or did you write this query just for the question, while in reality you are trying to convert a table field? - Small
  • Your query works - Sergey Gornostaev
  • @SergeyGornostaev, maybe it works for you because you copied the text from here. I copy this text field from the source table with Ctrl+C, Ctrl+V, and nothing works on it... - Tatyana Aulova
  • @Small I am running exactly this query. Apparently it works for you because you copied the value '67 469 250 004' from here, while I copied it from the source table. Maybe the field is in some unknown encoding and I should pass some other character instead of the space to these functions, but I found no other options on Google. No other reason why it does not work comes to mind. - Tatyana Aulova
  • Then what you have there are not spaces. select encode(originalstring, 'hex') will display the hex representation of the string, and checking it byte by byte against the encoding table will show what you actually have there. - Small
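
The check suggested in the last comment can be sketched as follows, assuming PostgreSQL with a UTF8 database. encode expects a bytea argument, so for a column of type text the value may first need converting with convert_to; the literal below is only a stand-in for the real column and is typed with ordinary spaces.

```sql
-- Hex-dump a text value to see which bytes it really contains.
-- Replace the literal with the real column when running against the table.
select encode(convert_to('67 469 250 004', 'UTF8'), 'hex') as hex_dump;
```

Every ordinary space shows up as the byte 20 in the dump, so any other byte sequence between the digit groups points to a different separator character.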

1 answer

The hex representation of your source string 67 469 250 004 looks like this:

3637c2a0343639c2a0323530c2a0303034

Compare it with a manually typed equivalent that uses ordinary spaces, whose hex representation is:

3637203436392032353020303034

In place of the spaces (20 in all ASCII-based encodings), we find the byte sequence c2 a0. Knowing from the question that you use UTF-8, we look it up in the Unicode table and immediately find that this is the NO-BREAK SPACE character (U+00A0). It is quite an expected place for it, in fact: the program that wrote this data wanted to keep the value from being broken across several lines when it is displayed later.
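
The lookup can also be done in the database itself. This is a minimal sketch, assuming a UTF8 database and the default standard_conforming_strings = on (so that '\xc2a0' is read as bytea hex input): it decodes the two bytes as UTF-8 and reports the code point of the resulting character.

```sql
-- Decode the two suspicious bytes as UTF-8 and report the code point.
-- In a UTF8 database ascii() returns the Unicode code point of the first character.
select ascii(convert_from('\xc2a0'::bytea, 'UTF8')) as code_point;  -- 160, i.e. U+00A0
```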

Interestingly, the \s class in regular expressions does not match this character.

You can delete it like this, specifying the non-breaking space explicitly (in hex, for readability):

 select replace(input_value, E'\xc2\xa0', ''); 
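
The byte escape above relies on the database encoding being UTF-8. As a sketch of an alternative, the same character can be named by its code point with chr(160) or with a Unicode escape literal. The sample CTE and its test value are hypothetical stand-ins for the real table, and a UTF8 database with standard_conforming_strings = on is assumed.

```sql
with sample(input_value) as (
    -- hypothetical test value: the digit groups joined by U+00A0 instead of spaces
    select concat_ws(chr(160), '67', '469', '250', '004')
)
select replace(input_value, chr(160), '')  as via_chr,            -- 67469250004
       replace(input_value, U&'\00A0', '') as via_unicode_escape  -- 67469250004
from sample;
```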

Alternatively, if the task is to remove all non-digit characters from the string, you can write a regular expression like this:

 select regexp_replace(input_value, '\D+', '', 'g'); 
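
Note that '\D+' removes everything that is not a digit, including a minus sign or a decimal point. If only the separators should go, a narrower pattern may be safer. The sketch below compares the two on a hypothetical value (not taken from the question), building a bracket expression that covers ordinary whitespace plus U+00A0; a UTF8 database is assumed.

```sql
with sample(input_value) as (
    -- hypothetical value with an ordinary space, a no-break space, a sign and a decimal point
    select '-1 234' || chr(160) || '567.5'
)
select regexp_replace(input_value, '\D+', '', 'g')                    as digits_only,        -- 12345675
       regexp_replace(input_value, '[\s' || chr(160) || ']+', '', 'g') as separators_removed  -- -1234567.5
from sample;
```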
  • Ahhhhh!!! It works! Lord, how clever you all are here! Where can I add +1000 to your karma? Damn, I struggled with this for three days. How beautiful it all is. - Tatyana Aulova
  • No, actually, a whole week. - Tatyana Aulova