There is a file from which data is read and added to the Oracle database.

Problem on an example:

  • Field VARCHAR2 , size 3 bytes

  • We try to insert there 'abb' and immediately catch the exception:

    value too large for column (actual: 6, maximum: 3)

That is, each character is encoded in two bytes. Okay, I thought, now recoded. In the database there is an AL32UTF8 encoding. The file encoding is CP866 .

Unsuccessful recoding attempt:

 Encoding srcEncodingFormat = Encoding.GetEncoding(866); Encoding dstEncodingFormat = Encoding.UTF8; byte[] originalByteString = srcEncodingFormat.GetBytes(s); byte[] convertedByteString = Encoding.Convert(srcEncodingFormat, dstEncodingFormat, originalByteString); s = dstEncodingFormat.GetString(convertedByteString); 

You cannot change the type of columns. How can this problem be solved? Is it possible to somehow explicitly specify the encoding of the added value when inserting data into the database?

  • Is it possible to somehow explicitly specify the encoding of the added value when inserting data into the database? CONVERT (), docs.oracle.com/database/121/SQLRF/functions041.htm#SQLRF00620 - Akina
  • Could you write the pseudo code for the insert request using the CONVERT function? - MrLebovsky
  • If the session encoding does not differ from the DB storage encoding, then oracle will not convert anything by itself. and the value is inserted as it is (provided that you do not convert it at the client and leave it at 866) after which the value from the database can read the same exactly crooked-tuned client and it will receive it, of course, at 866 - Mike
  • one
    And it is absolutely not clear why you can not change the column. varchar2 (3) when encoding a utf-8 database, in principle, it can store only one and a half characters, which is, to put it mildly, strange. obvious DB design error - Mike
  • 2
    It is necessary to replace the semantics of lowercase fields with byte to char . This is not a column type change, it will look like - mycol varchar2(3 char) . - 0xdb

1 answer 1

The simplest solution is to change the semantics of the length of the fields in which it is supposed to store information with non-ASCII characters.

Unicode is used in the examples:
\0430\0431\0432 = абв = 160,161,162 в CP866 или 208,176,208,177,208,178 в UTF8

 host chcp Active code page: 866 @[%nls_lang%] SP2-0310: unable to open file "[.RU8PC866]" --кодовые страницы ОС и ДБ сессии клиента совпадают alter table tstcp866 modify (code varchar2(3 char)); Table altered. insert into tstcp866 values ('абв'); 1 row created. select code, dump(code) dc from tstcp866; COD DC --- ----------------------------------------------- абв Typ=1 Len=6: 208,176,208,177,208,178 

Thus, in a field 3 characters long, 6-byte fit. There will be no problems with queries with this field, either from other clients or at the database level.

Not recommended solution

You can insert into the same 3rd byte and without changes to the database, but this is the same as polling one and a half to pour out.

To save the source encoding, the client's code page must be the same as the database. In this case, there will be no implicit conversion of data. Those. set the client's environment set NLS_LANG=.AL32UTF8 (or in the registry), and the data will be loaded into the database as it is in CP866.

 host chcp Active code page: 866 select * from nls_database_parameters where parameter = 'NLS_CHARACTERSET'; PARAMETER VALUE ------------------------------ -------------------------- NLS_CHARACTERSET AL32UTF8 @[%nls_lang%] SP2-0310: unable to open file "[.AL32UTF8]" create table tstcp866 (code varchar2(3)); insert into tstcp866 values ('абв'); 1 row created. select code, dump(code) dc from tstcp866; COD DC --- ----------------------------------------------- абв Typ=1 Len=3: 160,161,162 

The result seems to be achieved, but the field is not valid Unicode. How difficult the query should look at the database level to transcode the field:

 select convert(code, 'AL32UTF8', 'RU8PC866') code from tstcp866 where convert(code, 'AL32UTF8', 'RU8PC866')=to_nchar(unistr('\0430\0431\0432')) ; CODE --------------- абв 

Once again, this hack is not recommended, because Access only from the client with the "incorrect" installed encoding. In the worst case, for example, with implicit converting, data loss will occur.