KvE> Here it looks correct with Latin-1 as localcharset.
Understood. On a UTF-8 terminal, Latin-1 characters look like this:
... A M<F8><F8>se once bit my sister ...
When I convert it to UTF-8 characters, they show up as slashed o characters, and a proper multibyte UTF-8 application reports the code as 0xF8 (U+00F8). Sending it as UTF-8 to an 8-bit app produces something like this:
------------------- ye olde cut n' paste starts
0000000: 2e2e 2e20 4120 4dc3 b8c3 b873 6520 6f6e  ... A M....se on
0000010: 6365 2062 6974 206d 7920 7369 7374 6572  ce bit my sister
0000020: 202e 2e2e                                 ...
------------------- ye olde cut n' paste ends
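For anyone who wants to reproduce that dump without a terminal, here is a minimal Python sketch. It encodes the tagline as UTF-8 and prints it in the same offset/hex/ASCII layout. The `hexdump` helper is hypothetical and only mimics that layout; it is not the tool used for the paste.

```python
# The tagline, with each slashed o written as its code point U+00F8.
MESSAGE = "... A M\u00f8\u00f8se once bit my sister ..."


def hexdump(data: bytes, width: int = 16) -> list[str]:
    """Return offset / hex-pair / ASCII lines, roughly like the paste above."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        # Group the hex digits two bytes at a time: "2e2e 2e20 ..."
        pairs = [chunk[i:i + 2].hex() for i in range(0, len(chunk), 2)]
        # Printable ASCII is shown as-is, every other byte as "."
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:07x}: {' '.join(pairs)}  {text}")
    return lines


utf8_bytes = MESSAGE.encode("utf-8")  # each slashed o becomes C3 B8
lines = hexdump(utf8_bytes)
for line in lines:
    print(line)
```

The first printed line is `0000000: 2e2e 2e20 4120 4dc3 b8c3 b873 6520 6f6e  ... A M....se on`, with the two-byte C3 B8 sequences showing up as dots.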
To see the UTF-8 codes as actually sent, one needs a real UTF-8 application that can recognize multibyte sequences. An 8-bit one will always show the bytes as single-byte characters, which they aren't, or at least the slashed o's in this example aren't. However, if I first convert it to Latin-1 and send it to the same 8-bit app as above, this happens:
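A small Python sketch of the difference between the two kinds of application, assuming Latin-1 as the stand-in for the 8-bit charset (a real 8-bit app may use some other single-byte codepage):

```python
# The message as a UTF-8 terminal sends it: each slashed o is the bytes C3 B8.
raw = "... A M\u00f8\u00f8se once bit my sister ...".encode("utf-8")

# A UTF-8-aware application groups C3 B8 into one code point.
as_utf8 = raw.decode("utf-8")

# An 8-bit application maps every byte to its own character.
as_8bit = raw.decode("latin-1")

print(hex(ord(as_utf8[7])))                  # 0xf8
print([hex(ord(c)) for c in as_8bit[7:9]])   # ['0xc3', '0xb8']
print(len(as_utf8), len(as_8bit))            # 34 36
```

The 8-bit view ends up two characters longer because each slashed o is read as two unrelated single-byte characters.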
------------------- ye olde cut n' paste starts
0000000: 2e2e 2e20 4120 4df8 f873 6520 6f6e 6365  ... A M..se once
0000010: 2062 6974 206d 7920 7369 7374 6572 202e   bit my sister .
0000020: 2e2e                                     ..
------------------- ye olde cut n' paste ends
Note that the slashed o characters now show up as single 0xF8 bytes in an 8-bit application.
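The "convert it first to latin1" step can be sketched in Python as a decode followed by an encode, roughly what a tool like iconv does. The helper name `utf8_to_latin1` is just for illustration:

```python
# UTF-8 bytes as they arrive from a UTF-8 terminal (same bytes as the first dump).
utf8_bytes = b"... A M\xc3\xb8\xc3\xb8se once bit my sister ..."


def utf8_to_latin1(data: bytes) -> bytes:
    # Decode the multibyte sequences into code points, then re-encode each
    # code point as one Latin-1 byte. Raises UnicodeEncodeError for any
    # character above U+00FF, which Latin-1 cannot represent.
    return data.decode("utf-8").encode("latin-1")


latin1_bytes = utf8_to_latin1(utf8_bytes)
print(latin1_bytes[6:12].hex(" "))   # 4d f8 f8 73 65 20
print(len(utf8_bytes), len(latin1_bytes))  # 36 34
```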
Converting multibyte UTF-8 characters to single-byte characters requires mapping them to a codepage that contains the same character. For example, Møøse will never convert to cp437 or cp866, but it will convert to cp850, since cp850 has a mapping for ø (at 0x9B).
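A quick way to check which codepages can hold a given string is to try encoding it with Python's built-in codecs. `encodable` is a hypothetical helper name:

```python
def encodable(text: str, codepage: str) -> bool:
    # True only if every character in text has a mapping in the codepage.
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


word = "M\u00f8\u00f8se"  # the slashed o is U+00F8
for cp in ("cp437", "cp866", "cp850"):
    print(cp, encodable(word, cp))
# cp437 False
# cp866 False
# cp850 True

print(word.encode("cp850").hex(" "))  # 4d 9b 9b 73 65
```

Note that in cp850 the slashed o lands on 0x9B, not 0xF8, so the single-byte value depends entirely on the target codepage.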
It is just that simple.
Life is good,
Maurice

... A Møøse once bit my sister ...
--- GNU bash, version 4.2.45(2)-release (x86_64-unknown-linux-gnu)
 * Origin: Pointy Stick Society - Ladysmith BC, Canada (1:153/7001.0)