Just a sample of the Echomail archive
Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.
|    DBRIDGE    |    D'Bridge Support Echo    |    10,398 messages    |
|    Message 8,471 of 10,398    |
|    Rob Swindell to mark lewis    |
|    Dupeloops    |
|    20 Jun 18 11:44:26    |
  Re: Dupeloops
  By: mark lewis to Rob Swindell on Wed Jun 20 2018 08:08 am

 > On 2018 Jun 19 22:43:24, you wrote to me:
 >
 > >> AFAIK, seenbys and paths are not included in most dupe detection
 > >> schemes... other non-changing control lines are fine to be included...
 > >> one of the problems comes when some system sort those control lines on
 > >> messages they are passing along... we don't see so much of that like we
 > >> did at one time ;)
 >
 > RS> So some metadata is included in the data that is hashed for dupe
 > RS> detection and some is not?
 >
 > yes...
 >
 > RS> Are you sure about that?
 >
 > yes... in fact, and i don't recall who pointed this out to me back in the
 > '90s, dbridge does exactly this in a manner of speaking... it takes the
 > whole message header plus X bytes immediately following the message
 > header and uses all of that as at least part of the checksum
 > calculation... this was pointed out to me when i was working on my
 > posting tool and was adding MSGID support to it...
 >
 > i was using a library and just letting it do its thing... some of my test
 > posts were reported as dupes when they clearly weren't... IIRC, they were
 > detected as dupes because they were posted within the same second... it
 > turned out that my MSGID was somewhere in the middle of the control lines
 > at the beginning of the message body and only my dbridge using testers
 > were seeing this... someone pointed out this thing about dbridge also
 > using X bytes from the beginning of the message body in addition to the
 > message header so i moved my posting tool's MSGID to the top of the list
 > and no more dupes were detected by those dbridge systems...
 >
 > i don't know what other systems do... there's only a very few that
 > provide this information... SBBS is one of them... when i was testing
 > Mystic, there was some discussion about dupe detection as james worked to
 > try to figure out the best method he liked... i have used fastecho here
 > for decades but i don't know what data it uses for its checksums... i do
 > know it uses two checksums, though... i know this because i was being
 > nosy one day and looking at FE's dupe database file (one for all message
 > areas) with a hex viewer and noticed that groups of bytes were repeated
 > all throughout the file... i asked about this and was told i found a
 > bug... basically, FE has two checksums that it uses for each message and
 > both are supposed to be stored in the database... what i found was that
 > only one was being used and written to both fields... toby fixed that
 > problem right quick... i just don't know what data is used to calculate
 > them...
 >
 > back in the day, dupe detection formulas were not really shared around...
 > maybe a couple of developers talking amongst themselves would tell each
 > other what they were doing but this information was not published where
 > everyone could find it... it was more or less black majik to a point...

To complete the discussion, Synchronet (smblib) actually uses multiple methods
of body text dupe detection:

1. A "legacy" CRC-32 hash of the body text, excluding any metadata, like FTN
   control lines and excluding any trailing white-space or control-characters
2. A tuple of hashes (MD5 digest, CRC-32, and CRC-16) and length (char count)
   of the body text excluding any metadata and *all* white-space characters

These, in addition to duplicate Internet (RFC-822) compliant Message-ID and
FTN-compliant Message-ID checks.

No black majik here. :-)

                                            digital man

Synchronet "Real Fact" #64:
Synchronet PCMS (introduced w/v2.0) is Programmable Command and Menu Structure.
Norco, CA WX: 77.6°F, 57.0% humidity, 8 mph ENE wind, 0.00 inches rain/24hrs
--- SBBSecho 3.05-Linux
 * Origin: Vertrauen - [vert/cvs/bbs].synchro.net (1:103/705)
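For anyone wanting to experiment with the two body-text checks described in the message above, here is a rough Python sketch. It is not smblib code and makes several assumptions the message does not confirm: the function names are made up for illustration, "metadata" is taken to mean only SOH-prefixed kludge lines and SEEN-BY lines, the text is hashed as UTF-8 bytes, trailing characters are trimmed at the end of the body only, and the CRC-16 is the CCITT/XMODEM variant that Python's binascii.crc_hqx happens to provide.

```python
import binascii
import hashlib
import zlib


def strip_metadata(text: str) -> str:
    """Drop lines that look like FTN control lines.

    Assumption: only SOH-prefixed kludges and SEEN-BY lines count as metadata.
    """
    kept = [
        line
        for line in text.splitlines()
        if not line.startswith("\x01") and not line.startswith("SEEN-BY:")
    ]
    return "\n".join(kept)


def legacy_crc32(text: str) -> int:
    """CRC-32 of the body with trailing white-space/control characters removed."""
    body = strip_metadata(text)
    end = len(body)
    # Treat every code point up to and including space as trimmable.
    while end > 0 and ord(body[end - 1]) <= 0x20:
        end -= 1
    return zlib.crc32(body[:end].encode("utf-8"))


def body_fingerprint(text: str) -> tuple:
    """(MD5 hex digest, CRC-32, CRC-16, char count) of the body, ignoring
    metadata and all white-space characters."""
    body = strip_metadata(text)
    squeezed = "".join(body.split())  # str.split() with no args drops all white-space
    data = squeezed.encode("utf-8")
    return (
        hashlib.md5(data).hexdigest(),
        zlib.crc32(data),
        binascii.crc_hqx(data, 0),  # assumed CRC-16 variant
        len(squeezed),
    )


if __name__ == "__main__":
    first = "\x01MSGID: 1:103/705 12345678\nHello  world\n\n"
    second = "\x01MSGID: 1:103/705 87654321\nHello world"
    # Same text, different MSGID and spacing: the tuple check treats them as dupes.
    print(body_fingerprint(first) == body_fingerprint(second))
```

Under these assumptions the tuple check ignores re-wrapping and spacing changes made by systems along the path, while the legacy CRC-32 only forgives differences at the very end of the body.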
(c) 1994, bbs@darkrealms.ca