
Just a sample of the Echomail archive

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.

   DOS      DOS operating systems      183 messages   


   Message 174 of 183   
   Ben Collver to All   
   Running GNU on DOS with DJGPP part 2   
   18 Feb 24 11:37:52   
   
   TZUTC: -0600   
   MSGID: 176.fido_dos@1:124/5016 2a377720   
   PID: Synchronet 3.20a-Linux master/862753d6c Feb 16 2024 GCC 11.4.0   
   TID: SBBSecho 3.20-Linux master/862753d6c Feb 16 2024 GCC 11.4.0   
   BBSID: EOTLBBS   
   CHRS: ASCII 1   
   NOTE: SlyEdit 1.88d (2024-02-16) (ICE style)   
   Leaving aside the fact that the DOS API is... ehem... bad, this   
   fundamental difference means that any Unix program ported to DOS has   
   a usability problem: you cannot use globs anymore when invoking it!   
   Something as simple and common as gcc -o program.exe *.c would just   
   not work. So then... how can we explain the following output from the   
   showargs.c program, a little piece of code that prints argv?   
      
       D:\>gcc -o showargs.exe showargs.c   
          
       D:\>.\showargs.exe *.c   
       argv[1] = headself.c   
       argv[2] = longcmd1.c   
       argv[3] = longcmd2.c   
       argv[4] = showargs.c   
       argv[5] = showpath.c   
          
       D:\>   
      
   In the transcript above, you can see how I ran the showargs.c program
   with *.c as its only argument and somehow it worked as you would
   expect. But if we build it with a standard DOS compiler, we get
   different results:
      
       D:\>tcc showargs.c   
       Turbo C++ Version 3.00 Copyright (c) 1992 Borland International   
       showargs.c:   
       Turbo Link  Version 5.0 Copyright (c) 1992 Borland International   
          
               Available memory 4133648   
          
       D:\>.\showargs.exe *.c
       argv[1] = *.c

       D:\>_
      
   GCC is actually doing something to make glob expansion work--and it   
   has to, because remember that DJGPP was not just about porting GCC:   
   it was about porting many more GNU developer tools to DOS. Having had   
   to patch them one by one to work with DOS' COMMAND.COM semantics   
   would have been a sad state of affairs.   
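
   For readers who do not have showargs.c at hand, the following is a
   minimal sketch of what the core of such a program might look like.
   The show_args name and its FILE* parameter are assumptions made for
   illustration; the real showargs.c is not reproduced in this message.

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical core of a showargs-style program: prints argv[1] through
// argv[argc - 1] in the format seen in the transcripts and returns how
// many lines it wrote.
int show_args(int argc, char **argv, FILE *out) {
    int i;

    assert(argv != NULL && out != NULL);
    for (i = 1; i < argc; i++) {
        fprintf(out, "argv[%d] = %s\n", i, argv[i]);
    }
    return argc > 1 ? argc - 1 : 0;
}
```

   A main function that calls show_args(argc, argv, stdout) and returns
   would complete the program. Nothing in it expands globs, so any
   expansion has to happen before main runs.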
      
   To understand what's happening here, know that all C programs   
   compiled by any compiler include a prelude: main is not the program's   
   true entry point. All compilers wrap main with some code of their own   
   to set up the process and the C library, and DJGPP is no different.   
   Such code is often known as the crt (or C Runtime) and it comes in   
   two phases: crt0, written in assembly for early bootstrapping, and   
   crt1, written in C.   
      
   As you can imagine, this is where the magic lives. DJGPP's crt1 is in   
   charge of processing the flat command line that it receives from DOS   
   and transforming it into the argv that POSIX C programs expect,   
   following common Unix semantics. In a way, this code performs the job   
   of a Unix shell.   
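
   To make the idea concrete, here is a rough sketch of the kind of work
   described above: take a flat command tail, split it on blanks, and
   expand each word. This is not DJGPP's c1args.c code. It uses the
   POSIX glob() function as a stand-in, the build_argv name is made up,
   and quoting, response files and the proxy mechanism are all ignored.

```c
#include <assert.h>
#include <glob.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: splits a flat command tail on blanks and expands
// each word with glob(), accumulating the results in *out.
// Returns 0 on success and -1 on failure. The caller releases the
// result with globfree().
int build_argv(const char *flat, glob_t *out) {
    char *copy;
    char *word;
    int flags = GLOB_NOCHECK;  // Keep words that match nothing unchanged.
    int status = 0;

    assert(flat != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    // strtok modifies its input, so work on a private copy.
    copy = malloc(strlen(flat) + 1);
    if (copy == NULL) {
        return -1;
    }
    strcpy(copy, flat);

    for (word = strtok(copy, " \t"); word != NULL;
         word = strtok(NULL, " \t")) {
        if (glob(word, flags, NULL, out) != 0) {
            status = -1;
            break;
        }
        flags |= GLOB_APPEND;  // Later words extend the same vector.
    }

    free(copy);
    return status;
}
```

   After a successful call, out->gl_pathc and out->gl_pathv play the
   role of argc and argv minus the program name, which the runtime has
   to supply separately.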
      
   Once again, take a break to inspect the crt0 sources and, in   
   particular, the contents of the c1args.c file. Pay attention to file   
   reads and the "proxy" thing, both of which bring us to the next   
   section.   
      
   # Long command lines   
      
   Unix command lines aren't different just because of glob expansion.   
   They are also different because they are usually long, and they are   
   long in part because of glob expansion and in part because Unix has   
   supported long file names for much longer than DOS.   
      
   Unfortunately... DOS restricted command lines to a maximum of   
   126 characters--fewer characters than you can fit in a Tweet or an   
   SMS--and this posed a problem because the build process of most GNU   
   developer tools, if not all, required using long command lines. To   
   resolve these issues, DJGPP provides two features.   
      
   The first is support for response files. Response files are text   
   files that contain the full command line. These files are then passed   
   to a process with the @file.txt syntax, which then causes DJGPP's   
   crt1 code to load the response files and construct the long command   
   line in extended memory.   
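
   The receiving side of a response file can be sketched in a few lines
   of portable C. This is only an illustration of the idea, not what
   DJGPP's runtime does internally: the load_response_file name, the
   fixed 255-character word limit and the whitespace-only splitting are
   all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: if arg has the form @file, reads
// whitespace-separated words from that file into out (at most max
// entries) and returns how many were stored. Returns -1 if arg does not
// start with '@' or the file cannot be opened. Each stored word is
// heap-allocated and owned by the caller.
int load_response_file(const char *arg, char **out, int max) {
    FILE *fp;
    char word[256];
    int count = 0;

    assert(arg != NULL && out != NULL);
    if (arg[0] != '@') {
        return -1;
    }
    fp = fopen(arg + 1, "r");
    if (fp == NULL) {
        return -1;
    }
    while (count < max && fscanf(fp, "%255s", word) == 1) {
        out[count] = malloc(strlen(word) + 1);
        if (out[count] == NULL) {
            break;
        }
        strcpy(out[count], word);
        count++;
    }
    fclose(fp);
    return count;
}
```

   A runtime built along these lines would call the helper for every
   argument that starts with @ and splice the returned words into argv
   in place of that argument.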
      
   Let's take a look. If we reuse our previous showargs.c program that   
   prints the command line arguments, we can observe how the behavior   
   differs between building this program with a standard DOS compiler   
   and with DJGPP:   
      
       D:\>type args.txt   
       first   
       second   
          
          
       D:\>gcc -o showargs.exe showargs.c   
          
       D:\>.\showargs.exe @args.txt   
       argv[1] = first   
       argv[2] = second   
          
       D:\>tcc showargs.c   
       Turbo C++ Version 3.00 Copyright (c) 1992 Borland International   
       showargs.c:   
       Turbo Link  Version 5.0 Copyright (c) 1992 Borland International   
          
               Available memory 4133648   
          
       D:\>.\showargs.exe @args.txt   
       argv[1] = @args.txt   
          
       D:\>   
      
   Response files are easy to implement and they are sufficient to
   support long command lines: even if they require special handling on
   the caller side to write the arguments to disk and then pass the
   response file as an argument, this could all be hidden inside the
   exec family of system calls. Unfortunately, using response files is
   slow because, in order to invoke a program, you need to write the
   command line to a file--only to load it immediately afterwards. And
   disk I/O used to be really slow.
      
   For this reason, DJGPP provides a different mechanism to pass long   
   command lines around, and this is via the transfer buffer described   
   earlier. This mechanism involves putting the command line in the   
   transfer buffer and telling the executed command where its command   
   line lives. This mechanism obviously only works when executing a   
   DJGPP program from another DJGPP program, because no matter what,   
   process executions are still routed through DOS and thus are bound by   
   DOS' 126 character limit.   
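
   The decision a caller faces can be sketched as a simple length check.
   The helper names below are hypothetical, and the one-space-per-argument
   accounting is an assumption; the sketch only shows when a plain DOS
   command tail is enough and when some other channel is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DOS_TAIL_MAX 126  // Limit quoted in the text above.

// Hypothetical helper: length of the command tail DOS would receive for
// argv, assuming each argument after argv[0] is preceded by one space.
// argv[0] is excluded because DOS does not carry it in the tail.
size_t command_tail_length(char *const argv[]) {
    size_t total = 0;
    int i;

    assert(argv != NULL && argv[0] != NULL);
    for (i = 1; argv[i] != NULL; i++) {
        total += 1 + strlen(argv[i]);
    }
    return total;
}

// Returns 1 when the arguments fit in a plain DOS command tail and 0
// when another channel (a response file or the transfer buffer) would
// be needed to deliver them intact.
int fits_in_dos_tail(char *const argv[]) {
    return command_tail_length(argv) <= DOS_TAIL_MAX;
}
```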
      
   Let's try this too. For this experiment, we'll play with two   
   programs: one that prints the length of the received command line and   
   another one that produces a long command line and executes the former.   
      
   The first program is longcmd1.c and is depicted below. All this   
   program does is allocate a command line longer than DOS' maximum   
   length of 126 characters and, once it has built the command line,   
   invokes longcmd2.exe with said long command line:   
      
       #ifdef __GNUC__
       #include <unistd.h>
       #else
       #include <process.h>
       #endif
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
          
       int main(int argc, char** argv) {   
           char** longcmd;   
           int i;   
          
           // Generate a command line that exceeds DOS' limits.   
           longcmd = (char**)malloc(32 * sizeof(char*));
           longcmd[0] = argv[0];   
           for (i = 1; i < 31; i++) {   
               longcmd[i] = strdup("one-argument");   
           }   
           longcmd[i] = NULL;   
          
           // Execute the second stage of this demo to print the received   
           // command line.   
           if (execv(".\\longcmd2.exe", longcmd) == -1) {   
               perror("execv failed");   
               return EXIT_FAILURE;   
           }   
           return EXIT_SUCCESS;   
       }   
      
   The second program is longcmd2.c and is depicted below. This program   
   prints the number of arguments it received and also computes the   
   length of the command line (assuming all arguments were separated by   
   just one space character):   
      
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
          
       int main(int argc, char** argv) {   
           int i;   
           int total;   
          
           total = 0;   
           for (i = 0; i < argc; i++) {   
               if (i > 0) {   
                   total += 1;  // Assume 1 space between arguments.   
               }   
               total += strlen(argv[i]);   
           }   
           printf("argc after re-exec: %d\n", argc);   
           printf("textual length: %d\n", total);   
          
           return EXIT_SUCCESS;   
       }   
      
   Now let's see what happens when we compile these two programs with   
   Turbo C++ and with DJGPP. First, let's build both with Turbo C++ and   
   run the longcmd1.exe entry point:   
      
       D:\>tcc longcmd1.c   
       Turbo C++ Version 3.00 Copyright (c) 1992 Borland International   
       longcmd1.c:   
       Warning longcmd1.c 29: Parameter 'argc' is never used in function main   
       Turbo Link  Version 5.0 Copyright (c) 1992 Borland International   
          
               Available memory 4116968   
          
       D:\>tcc longcmd2.c   
       Turbo C++ Version 3.00 Copyright (c) 1992 Borland International   
       longcmd2.c:   
       Turbo Link  Version 5.0 Copyright (c) 1992 Borland International   
          
               Available memory 4124048   
          
       D:\>.\longcmd1.exe   
       execv failed: Not enough memory.   
          
       D:\>   
      
   Running longcmd1.exe fails because the command line is too long and   
   execv cannot process it. (I'm not exactly sure why execv returns   
   ENOMEM because the Turbo C++ documentation claims that this function   
   should return E2BIG on this condition, but alas.)   
      
   Now, let's build just longcmd1.c with DJGPP and run it:   
      
       D:\>gcc -o longcmd1.exe longcmd1.c
          
       D:\>tcc longcmd2.c   
       Turbo C++ Version 3.00 Copyright (c) 1992 Borland International
       longcmd2.c:
       Turbo Link  Version 5.0 Copyright (c) 1992 Borland International
          
               Available memory 4124048   
          
       D:\>.\longcmd1.exe   
       argc after re-exec: 13   
       textual length: 141   
          
       D:\>   
      
   We get a bit further now! longcmd1.exe runs successfully and executes   
   longcmd2.exe... but longcmd2.exe claims that the command line is   
   shorter than we expect. This is because DJGPP's execv implementation   
   knew that it was running a standard DOS application not built by   
   DJGPP, so it had to place a truncated command line in the system call
   issued to DOS. (Note that this shows 141 and not 126 because DOS
   does not place argv[0] on the command line, so the C runtime has to
   synthesize this value and longcmd2 counts it too.)
      
   But now look at what happens when we also compile longcmd2.c with   
   DJGPP:   
      
       D:\>gcc -o longcmd1.exe longcmd1.c
          
       D:\>gcc -o longcmd2.exe longcmd2.c   
          
       D:\>.\longcmd1.exe   
       argc after re-exec: 31   
       textual length: 377   
          
       D:\>   
      
   Ta-da! When longcmd2.exe runs, it now sees the full command line.   
   This is because longcmd1.exe now knows that longcmd2.exe understands   
   the transfer buffer arrangement and can send the command line to it   
   this way.   
      
   You can read more about this in the spawn documentation from DJGPP's
   libc and peek at the dosexec.c sources.

   # Unix-style paths   
      
   Let's move on to one more Unix-y thing that DJGPP has to deal with,   
   which is paths and file names. You see, paths are paths in both DOS   
   and Unix: a sequence of directory names (like /usr/bin/) followed by   
   an optional file name (like /usr/bin/gcc). Unfortunately, DOS and   
   Unix paths differ in two aspects.   
      
   The first is that DOS paths separate directory components with a   
   backslash, not a forward slash. This is a historical artifact of the   
   early CP/M and DOS days, where command-line flags used the forward   
   slash (DIR /P) instead of Unix's dash (ls -l). When DOS gained   
   support for directories in its 2.0 release, it had to pick a   
   different character to separate directories, and it picked the   
   backslash. Dealing with this duality in DJGPP-built programs seems   
   easy: just make DJGPP's libc functions allow both and call it a day.   
   And for the most part, this works--and in fact even PowerShell does   
   this on Windows today.   
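
   As an illustration of "allow both", here is a small sketch of a
   basename-style helper that treats either slash as a separator. The
   function names are made up for this example, and drive-relative paths
   such as C:file are deliberately not handled.

```c
#include <assert.h>
#include <string.h>

// Treats the DOS backslash and the Unix forward slash as equivalent.
int is_separator(char c) {
    return c == '/' || c == '\\';
}

// Hypothetical helper: returns a pointer to the final component of
// path, that is, whatever follows the last separator of either kind.
// The result points into path itself; nothing is allocated.
const char *last_component(const char *path) {
    const char *base = path;
    const char *p;

    assert(path != NULL);
    for (p = path; *p != '\0'; p++) {
        if (is_separator(*p)) {
            base = p + 1;
        }
    }
    return base;
}
```

   With this approach, C:\DEVEL\BIN/gcc.exe and /usr/bin/gcc are both
   parsed without the caller caring which convention was used.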
      
   The second is that DOS paths may include an optional drive name such
   as C: and... the drive name has the colon character in it. While Unix
   uses the colon character to separate multiple components of the
   search PATH, DOS could not do that: it had to pick a different
   character, and it picked the semicolon. Take a look:
      
       C:\>path   
       PATH=Z:\;C:\DEVEL\BIN;C:\DEVEL\DJGPP\BIN;C:\DEVEL\TC\BIN   
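
   A quick sketch shows why the separator matters. The helper below is
   hypothetical and only counts components; splitting the PATH above on
   a semicolon gives the four directories, while splitting it on a colon
   cuts every drive name in half.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: counts the components of a search path when it
// is split on sep. An empty path has zero components; empty components
// between two adjacent separators are counted.
int count_path_entries(const char *path, char sep) {
    const char *p;
    int count = 1;

    assert(path != NULL);
    if (*path == '\0') {
        return 0;
    }
    for (p = path; *p != '\0'; p++) {
        if (*p == sep) {
            count++;
        }
    }
    return count;
}
```

   For the PATH shown above, a semicolon yields 4 entries and a colon
   yields 5 fragments, none of which is a usable directory.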
      
   The problem here is that many Unix applications, particularly shell   
   --- SBBSecho 3.20-Linux   
    * Origin: End Of The Line BBS - endofthelinebbs.com (1:124/5016)   
   SEEN-BY: 15/0 90/1 105/81 106/201 987 124/5014 5016 128/260 129/305   
   SEEN-BY: 135/225 153/7715 218/700 226/30 227/114 229/110 112 113 206   
   SEEN-BY: 229/307 317 400 426 428 470 664 700 266/512 282/1038 291/111   
   SEEN-BY: 320/219 322/757 342/200 387/25 396/45 460/58 633/280 712/848   
   SEEN-BY: 5020/400 5075/35   
   PATH: 124/5016 396/45 229/426   
      



(c) 1994,  bbs@darkrealms.ca