FTN Header versus actual message body conveying Unicode.
When I telnet to a SQL server that speaks Unicode only, it always returns the following characters (Pascal notation): #239#187#191, i.e. hex EF BB BF.
When I telnet to a web page that speaks Unicode, it too returns #239#187#191, followed by the <!doctype html> etc.
So... would it not hold true that systems posting UTF-8 should put the same introduction at the start of the message body? Then authors *know* it potentially contains Unicode and leave it damn well alone, and also parse it as UTF-8 instead of as 8-bit chars...
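To make the reader-side idea concrete, here is a rough sketch in Python (not the Pascal I am actually coding in). The function name decode_body and the CP437 fallback are my own assumptions for illustration only; a real reader would pick whatever legacy code page it already uses for bodies without the marker.

```python
import codecs

# The three marker bytes: #239#187#191 in Pascal notation, EF BB BF in hex.
UTF8_BOM = codecs.BOM_UTF8


def decode_body(raw: bytes, fallback: str = "cp437") -> str:
    """Decode a message body, treating a leading BOM as a UTF-8 marker.

    `fallback` is an assumed legacy 8-bit code page, used only when the
    body does not start with the marker.
    """
    if raw.startswith(UTF8_BOM):
        # Strip the marker, then decode the rest as UTF-8.
        # Invalid sequences become U+FFFD instead of raising.
        return raw[len(UTF8_BOM):].decode("utf-8", errors="replace")
    # No marker: treat the body as plain 8-bit characters.
    return raw.decode(fallback, errors="replace")
```

A body that starts with the marker is read as UTF-8; anything else goes through the 8-bit path untouched.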
This is how I am coding things here, just based upon NexusSQL, PremierSQL, MS SQL, Apache and Nexus Web Service. I do not have access to my Oracle box nor the MySQL 5 server to see if they do the same during the initial connection negotiation(s).
A quick google: it's the UTF-8 byte order mark (BOM). Some editors save the BOM at the start of the file (so it can be used as a header), which regularly causes confusion because it is optional.
So, if we wanted to help enforce, at the reader (or even the tosser) level, how such messages are handled, I would offer this up: a required BOM at the start of any message body that is UTF-8.
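On the posting side, the proposal could look something like the sketch below, again in Python and purely illustrative. The name tag_utf8_body is hypothetical, and it assumes the editor hands over the body as already-decoded text.

```python
import codecs


def tag_utf8_body(text: str) -> bytes:
    """Encode a message body as UTF-8 with exactly one leading BOM."""
    # If the text already starts with U+FEFF (a decoded BOM), drop it
    # so the marker is not written twice.
    if text.startswith("\ufeff"):
        text = text[1:]
    # Prepend #239#187#191 and encode the rest as UTF-8.
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

Whether a tosser should ever add the marker itself, rather than just pass it through, is a separate question; this only shows what the posting software would emit.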
Ozz
--- ExchangeBBS NNTP Server v3.1/Linux64 * Origin: (1:1/123)