FGHIGate на GaNJa NeTWoRK ST@Ti0N - Просмотр сообщения в эхоконференции UTF-8


	Добро пожаловать, Гость. Пожалуйста авторизуйтесь здесь.

Введите FGHI ссылку:

Присутствуют сообщения из эхоконференции UTF-8 с датами от 19 Apr 14 00:41:48 до 01 Apr 24 00:03:00, всего сообщений: 1367

Ответить на сообщение

К списку сообщений

Предыдущее сообщение

Следующее сообщение

= Сообщение: 275 из 1367 ============================================= UTF-8 =
От   : Eugene Subbotin                  2:5075/35          01 Oct 16 02:31:58
Кому : Michiel van der Vlist                               01 Oct 16 02:31:58
Тема : Re: UTF-8 nodelist report
FGHI : area://UTF-8?msgid=2:5075/35@fidonet+57eeed49
На   : area://UTF-8?msgid=2:280/5555+57eed20f
= Кодировка сообщения определена как: UTF-8 ==================================
==============================================================================
   Hello Michiel!

30 Sep 16 22:26, you wrote to me:

MvdV> There are some 65 NCs in the R50 section of the nodelist. So to get
MvdV> five, you only need to convince another 4 out of 64 or 6% of the
MvdV> remaining NCs. Should be doable... ;-)

May be :) We'll see.

MvdV> So convince as many of your fellow NC to particpate as you can and get
MvdV> your RC to produce a segment for the DAILYUTF along with the regular
MvdV> nodelist segment.

Most of them didn't see the need of that or unaware of UTF-8 FTN techs.
I will try to convince them.

MvdV> Indeed. The concatenated Dutch 'ij' does not fit in any of the common
MvdV> code pages. And the o with slash does not fit in CP866.

MvdV> What I do is that I do some preprocessing.  For thet I use sed, the
MvdV> serial editor. You are probably familiar with that as well.

MvdV> Before calling iconv I run sed with the following script to convert
MvdV> some "impossible" characters to CP850:

Thank you :)

ES>> P.S. Did you see the Mithgol's UTF-7 draft?

MvdV> No, I am afraid not. UTF-7? Why bother with UTF-7 when we have an 8
MvdV> bit transparent medium?

Backward compatibility may be :) Please look on it here:
https://github.com/Mithgol/fiunis/blob/master/fiunis.txt

=== Cut ===
**********************************************************************
FGHI                                FIDONET GLOBAL HYPERTEXT INTERFACE
**********************************************************************
Status:         draft
Revision:       draft 2.0
Title:          Fidonet Unicode substrings
Author:         Mithgol the Webmaster   (aka Sergey Sokoloff, 2:50/88)
Revision Date:  29 Aug 2016
-+--------------------------------------------------------------------
Contents:
   1. Status of this document
   2. Introduction
   3. Key words to indicate requirement levels
   4. 8-bit encoding of a Fidonet message
      containing Unicode substrings
   5. Decoding of an 8-bit Fidonet message
      containing Unicode substrings
   6. Important notes
   Appendix A. Known implementations
----------------------------------------------------------------------

1. Status of this document
--------------------------

This document is a draft of a Fidonet Standards Proposal (FSP).

This document specifies an optional Fidonet standard
that can be used in the Fidonet community.

Implementation of the standard defined in this document is not
mandatory, but all implementations are expected to adhere
to this standard.

Distribution of this document is unlimited,
provided that its text is not altered without notice.

2. Introduction
---------------

Many classic Fidonet message editors (such as GoldED+, for example)
were designed as 8-bit applications. They expect that each character
in a Fidonet message is coded by one byte. Therefore they won't ever
support Unicode UTF-8 or UTF-16 encoding.

This situation is a problem of "chicken and egg" type. The messages
in UTF-8 charset do not appear in Fidonet because they won't ever be
read in any of the popular readers. On the other hand, the absence
of such messages means that there's no need for the developers of
popular readers to improve their soft, or for their users to upgrade
their readers or to choose some newer (Unicode-supporting) readers.

This document specifies a simple method that would allow Unicode
substrings to appear (encoded and escaped) within 8-bit strings.

The method of encoding is based on the UTF-7 format (RFC 2152).

The method of escaping is inspired by HTML character references
(HTML 4.01 subsection 5.3.1, subsection 5.3.2).

By implementing this method, the following situation is achieved:

*) The users of newer (Unicode-supporting) Fidonet applications
     can read and write Unicode substrings in 8-bit messages.

*) The users of older (8-bit) Fidonet applications can read the
     8-bit parts of the message. The Unicode susbstrings remain
     unintelligible, but that's natural for an 8-bit application,
     and brings only a minor discomfort, and serves as a reason
     for an upgrade.

3. Key words to indicate requirement levels
-------------------------------------------

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY",
and "OPTIONAL" in this document are to be interpreted as described
in FTA-1006 (based on RFC 2119).

4. 8-bit encoding of a Fidonet message containing Unicode substrings
--------------------------------------------------------------------

First of all, the source (Unicode) text is split into an array of
substrings following each other (successively), where the substrings
that have even indices (0, 2, 4...) contain characters that can be
encoded with the target encoding and the substrings that have odd
indices (1, 3, 5...) contain characters that cannot be encoded with
the target encoding. (Or vice versa; if a character that cannot be
encoded with the target encoding appeared first, then its substring
has zeroeth index, and all such substrings also have even indices.)

The traditional 8-bit encoding is performed for the substrings that
contain characters that can be encoded that way, i.e. each of such
characters is represented by a byte.

The other substrings ("Unicode substrings") are converted to the
UTF-7 format (RFC 2152). For example, a string that consists of
Unicode characters U+9802, U+5C16, U+5C0D, U+6C7A, U+4E4B, U+7A7F,
U+8932, U+5B50, U+7BC7 is represented by the following string:

        +mAJcFlwNbHpOS3p/iTJbUHvH-

However, the UTF-7 method of escaping (a plus before such string and
a minus after) is not sufficient for Fidonet. Thus the minus MUST be
followed by a semicolon, the plus MUST be prepended by an ampersand.
For example, a string that consists of Unicode characters U+9802,
U+5C16, U+5C0D, U+6C7A, U+4E4B, U+7A7F, U+8932, U+5B50, U+7BC7 is
represented by the following string:

        &+mAJcFlwNbHpOS3p/iTJbUHvH-;

Afterwards the traditional 8-but encoding is performed for these
(ASCII-compatible) characters.

The encoding results are concatenated (successively) in the order
that substrings had in the initial array (i.e. in the order of
their appearance in the source text).

5. Decoding of an 8-bit Fidonet message containing Unicode substrings
---------------------------------------------------------------------

First of all, the message is decoded by a traditional 8-bit decoder,
each byte is decoded to a character.

Encoded Unicode substrings are then found in the message (using
their unique form: an ampersand, then a plus, then one or more of
base64 characters, then a minus and a semicolon) and replaced by
their decoded equivalents.

For finding that encoded forms, the following PECL (Perl-compatible
regular expression) might be useful:

      /&\+[A-Za-z0-9+/]+-;/

For decoding them, some RFC2152-compatible UTF-7 decoder MUST be
used. (As explained in the previous section, Fidonet Unicode
substrings use UTF-7 encoding and a different escaping. If the
decoder expects RFC2152-compatible escaping, the ampersand before
the substring and the semicolon after the substring MUST be removed
before the substring is given to the decoder.)

6. Important notes
------------------

Note 1. An ampersand, a semicolon, a plus, a minus and some of
base64 codes (for example, capital Latin letters) might appear in
UUE code blocks in Fidonet. If a Fidonet message reader interpretes
UUE codes, then it MUST isolate and decode UUE before it applies the
decoder of Fidonet Unicode substrings to the rest of the message. If
a Fidonet message reader does not interprete UUE codes (i.e. just
presents UUE as a big lump of human-unreadable codes), it MAY not
care if some of these codes end up converted to Unicode substrings.

Note 2. Fidonet Unicode substrings MAY appear in source message even
before it is encoded (for example, when Fidonet Unicode substrings
are discussed). An encoder of Fidonet Unicode substrings SHOULD be
applied to them (so that their original form is restored after the
decoding; otherwise such substrings would be decoded to their
Unicode equivalents). Keep in mind the following:

2.1) Such second level of encoding MUST NOT be applied to Fidonet
       Unicode susbtrings when they (accidentally) are generated in
       UUE blocks. Otherwise UUE decoding in older Fidonet message
       readers (that do not know anything about Fidonet Unicode
       substrings) becomes prevented.

2.2) Fidonet Unicode substrings of the source message MAY be left
       untouched by the decoder for the sake of users of older Fidonet
       message readers (otherwise Fidonet Unicode substrings, encoded
       twice, become even more incomprehensible for them).

Appendix A. Known implementations
---------------------------------

By the time of this writing there are several implementation of the
draft editions of this standard.

Reference implementation (free open source):

https://github.com/Mithgol/fiunis

Application-level implementations written by the standard's author:

*) Fido2RSS https://github.com/Mithgol/fido2rss

*) PhiDo https://github.com/Mithgol/phido

*) twi2fido https://github.com/Mithgol/node-twi2fido/

**********************************************************************
EOTD                                               END OF THE DOCUMENT
**********************************************************************
=== Cut ===

Eugene

--- GoldED+/LNX 1.1.5-b20160827
* Origin: FireFox Station (2:5075/35)

К главной странице гейта