... darkrealms ...

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.
DBRIDGE
D'Bridge Support Echo
10,398 messages
Message 7,766 of 10,398
mark lewis to Nicholas Boel
BBS Promotion
10 Feb 17 20:35:06
    On 2017 Feb 10 07:32:52, you wrote to me:   
      
    NB>>> TimEd is probably trying to convert the UTF-8 Russian characters to   
    NB>>> IBMPC, which won't happen.   
      
    ml>> FWIW: there is no "conversion"... it is simply displaying the
    ml>> glyphs represented by those raw bytes in their CP437 codepage
    ml>> positions... CP437 and other old-school codepage characters are only
    ml>> one byte wide... any "conversion" might come from translating
    ml>> between single byte codepages where the character glyph is   
    ml>> transliterated from one position in the first codepage to another   
    ml>> position in the second codepage where its glyph is stored... in that   
    ml>> case, the raw byte changes because the position in the codepage   
    ml>> changed and the byte is the position...   
      
    NB> You say potato, etc..   
      
   yes and no... it is really easy to understand though...   
      
    NB> Fact of the matter is CP437/IBMPC will not display Russian characters   
    NB> properly,   
      
   of course not... their glyphs are different than latin glyphs... this is   
   really simple when looking at the old school way... there are numerous tables   
   of 256 bytes... each byte represents one character, a glyph... some are   
   actually control characters (eg: CR, LF) and others are just language   
   characters aka glyphs... in one table, the space character is held in position   
   32decimal (aka 20hex)... another table also has the space in position   
   32decimal (aka 20hex)... great! no "conversion" is needed for the space
   character... now, if the capital letter 'A' is held in the first table at
   position 65decimal (aka 41hex) and the capital letter 'A' is held in position
   25decimal (aka 19hex) in the second table then some "conversion" is needed
   or you will see the wrong character when using one of the two pages... one
   will be right and the other just won't be... this is actually
   transliteration... there are mapping files created to point to the proper
   position for the 'A' when using the second table (aka codepage)... this is
   easily seen when overlaying CP855 on top of CP437... most characters will   
   align in the same cells of the table but some are different... they are   
   generally up in the higher-than-127 range where the line drawing and box   
   characters reside in CP437...   
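
The overlay idea can be tried with Python's built-in codecs. This is only a sketch: it assumes Python's `cp437` and `cp855` codec tables are a fair stand-in for the code pages discussed here, and the helper name `table` is made up for illustration.

```python
# Compare two single-byte code pages position by position.
ALL_BYTES = bytes(range(256))

def table(codec: str) -> str:
    """Return the 256 characters a code page assigns to byte values 0..255."""
    # errors="replace" keeps the result 256 characters long even if a
    # codec leaves some position undefined.
    return ALL_BYTES.decode(codec, errors="replace")

cp437 = table("cp437")
cp855 = table("cp855")

# Positions where the same byte value shows a different character.
differing = [i for i in range(256) if cp437[i] != cp855[i]]

print(f"space: {cp437[0x20]!r} vs {cp855[0x20]!r}")  # same cell in both
print(f"'A':   {cp437[0x41]!r} vs {cp855[0x41]!r}")  # same cell in both
print(f"0x80:  {cp437[0x80]!r} vs {cp855[0x80]!r}")  # different characters
print(f"lowest differing position: {min(differing)}")
```

Every differing position should land above 127, which matches the description above: the low half of the two tables lines up and only the upper half diverges.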
      
   then someone came along and said "hey! we can do better" so UTF-8, UTF-16 and   
   UTF-32 were born... UTF-8 uses 8bit units, is lossless and can address
   1112064 positions in its table (the Unicode code space) instead of the
   original 256... converting from codepages to UTF-8 is
   easy because every character exists in its huge table... going the other way   
   is not guaranteed because the glyphs just don't all map over... in some   
   languages, they have used "double characters" like "ae" to indicate the single   
   ae character which i don't know how to make on this OS... other languages may   
   also have an "ae" character but in them you cannot use "a" and "e" side by   
   side to indicate the single "ae" character... i don't know why, that's just   
   the way it is...   
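
A small sketch of the one-way nature of this, using Python's built-in codecs. The sample word and the helper name `fits` are assumptions for illustration only. The 1112064 figure is the Unicode code space minus the surrogate range.

```python
# Unicode has 0x110000 code points; the 2048 surrogates cannot be encoded.
positions = 0x110000 - (0xDFFF - 0xD800 + 1)
print(positions)  # 1112064

# A Russian word stored as CP866 bytes, one byte per character.
cp866_bytes = "Привет".encode("cp866")
text = cp866_bytes.decode("cp866")

# Codepage -> UTF-8 always works: every character has a place in the table.
utf8_bytes = text.encode("utf-8")
print(len(cp866_bytes), len(utf8_bytes))  # 6 12

def fits(s: str, codec: str) -> bool:
    """Report whether every character of s has a position in the codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Going back only works if the target codepage holds those characters.
print(fits(text, "cp866"))  # True
print(fits(text, "cp437"))  # False
```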
      
   anyway, i'm just trying to help you understand why there's no "conversion"
   as such in the old school code pages... there is transliteration where one
   glyph lives in one spot in this table and another spot in that table... UTF
   stuff just greatly expands the size of the tables which means that the glyphs   
   are now represented by one or more bytes which are/were the old table position   
   numbers in the old school code pages...   
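
To see what a CP437-only reader does with UTF-8 text, here is a minimal sketch. It assumes the reader simply shows each raw byte at its CP437 position, as described earlier in this message. It is not taken from TimEd's code.

```python
# One Cyrillic capital letter, encoded as UTF-8: two bytes.
utf8_bytes = "П".encode("utf-8")
print(utf8_bytes.hex(" "))  # d0 9f

# A UTF-8 aware reader joins the two bytes into one character.
as_utf8 = utf8_bytes.decode("utf-8")

# A CP437 reader shows each byte at its own position in the table,
# so one letter turns into two unrelated glyphs.
as_cp437 = utf8_bytes.decode("cp437")

print(repr(as_utf8))   # 'П'
print(repr(as_cp437))  # '╨ƒ'
```

Nothing is converted in the second case. The same two bytes are just looked up in a table that holds different glyphs at those positions.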
      
    NB> whether they're UTF-8 or not.   
      
   true...   
      
    NB> The only somewhat possible way for him to read it properly would be to   
    NB> change his default encoding to CP866 or KOI8-R,   
      
   exactly...
      
    NB> and even then there is no guarantee that the translation from UTF-8   
    NB> will work as expected.   
      
   because it depends also on what his OS can display... what i mean by this is   
   that he has to be able to load the OS with the needed code page to view them   
   correctly but if he does that, he'll lose all the normal latin glyphs...   
   switching to UTF-8 on the OS will alleviate this but it requires that the   
   software is also able to transliterate the characters to their new positions   
   in the UTF-8 table so they can be rendered properly... we've seen this with   
   the box and line drawing characters... there's one or two BBS related packages   
   out there that do properly transliterate them to their new positions in the   
   UTF-8 table... i don't recall who did them or what packages they are/were but   
   they are or have been participants in AGORANET and at least one of them was   
   either a BBS or a terminal program...   
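
The box and line drawing case can be sketched the same way. This shows only the general idea using Python's `cp437` codec, not how any of the packages mentioned above actually do it. The sample bytes are an assumed top edge of a double-line frame.

```python
# Top edge of a double-line box as raw CP437 bytes.
cp437_frame = b"\xc9\xcd\xcd\xbb"

# Look each byte up in the CP437 table to find which character it means...
text = cp437_frame.decode("cp437")

# ...then write those characters out at their UTF-8 positions.
utf8_frame = text.encode("utf-8")

print(text)                 # ╔══╗
print(utf8_frame.hex(" "))  # e2 95 94 e2 95 90 e2 95 90 e2 95 97
```

Each one-byte CP437 position becomes a three-byte UTF-8 sequence, so software that passes the original bytes through untouched to a UTF-8 display will not draw the box.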
      
   so, ok... too long a day... only 20:30 here and i'm already going to call it a   
   night... on a friday damned night at that :(   
      
   )\/(ark   
      
   Always Mount a Scratch Monkey   
   Do you manage your own servers? If you are not running an IDS/IPS yer doin' it   
   wrong...   
   ... Well done! is better than well said!   
   ---   
    * Origin:  (1:3634/12.73)