
Just a sample of the Echomail archive

Cooperative anarchy at its finest, still active today. Darkrealms is the Zone 1 Hub.

   DOS      DOS operating systems      183 messages   


   Message 173 of 183   
   Ben Collver to All   
   Running GNU on DOS with DJGPP   
   18 Feb 24 11:32:51   
   
   # Running GNU on DOS with DJGPP   
      
   Peeking under the covers to see how DJGPP manages to run GCC on DOS   
      
   by Julio Merino   
   Feb 14, 2024   
      
   The recent deep dive into the IDEs of the DOS times 30 years ago made   
   me reminisce about DJGPP, a distribution of the GNU development tools
   for DOS.   
      
   [Cover image consisting of a tiny portion of the sources of DJGPP's
   dosexec.c source file, with a big MS-DOS logo in the center   
   surrounded by the logos of GNU, GCC, Bash, and Emacs.]   
      
   I remember using DJGPP back in the 1990s before I had been exposed to   
   Linux and feeling that it was a strange beast. Compared to the   
   Microsoft C Compiler and Turbo C++, the tooling was bloated and alien   
   to DOS, and the resulting binaries were huge. But DJGPP provided a   
   complete development environment for free, which I got from a monthly   
   magazine, and I could even look at its source code if I wished. You   
   can't imagine what a big deal that was at the time.   
      
   But even if I could look under the cover, I never did. I never really
   understood why DJGPP was so strange, slow, and huge, or why it even
   existed. Until now. As I'm in the mood for looking back, I've spent
   the last couple of months figuring out what the foundations of this   
   software were and how it actually worked. Part of this research has   
   resulted in the previous two posts on DOS memory management. And part   
   of this research is this article. Let's take a look!   
      
   Special thanks go to DJ Delorie himself for reviewing a draft of this   
   article. Make sure to visit his website for DJGPP and a lot more cool   
   stuff!   
      
      
      
   # What is DJGPP?   
      
   Simply put, DJGPP is a port of the GNU development tools to DOS. You   
   would think that this was an easy feat to achieve given that other   
   compilers did exist for DOS. However... you should know that Richard   
   Stallman (RMS)--the creator of GNU and GCC--thought that GCC, a   
   32-bit compiler, was too big to run on a 16-bit operating system   
   restricted to 1 MB of memory. DJ Delorie took this as a challenge in   
   1989 and, with all the contortions that we shall see below, made GCC   
   and other tools like GDB and Emacs work on DOS.   
      
   To a DOS and Windows user, DJGPP was, and still is, an alien   
   development environment: the tools' behavior is strange compared to   
   other DOS compilers, and that's primarily due to their Unix heritage.   
   For example, as soon as you start using DJGPP, you realize that flags   
   are prefixed by a dash instead of a slash, paths use forward slashes   
   instead of backward slashes, and the files don't ship in a flat   
   directory structure like most other programs did. But hey, all the   
   tools worked and, best of all, they were free!   
      
   In fact, from reading about the historical goals of the project, I   
   gather that a secondary goal was for DJ to evangelize free software   
   to as many people as possible, meeting them where they already were:   
   PC users with a not-very-powerful machine that ran DOS. Mind you,   
   this plan worked on some of us as we ended up moving to Linux and the   
   free software movement later on.   
      
      
      
   In any case, being a free alien development environment doesn't   
   explain why it had to be huge and slow compared to the others. To
   explain this, we need to look at the "32-bit compiler" part.   
      
   # DOS and hardware constraints   
      
   As we saw in a previous article, Intel PCs based on the 80386 have   
   two main modes of operation: real mode and protected mode. In real   
   mode, the processor behaves like a fast 16-bit 8086, limiting   
   programs to a 1 MB address space and with free rein to access memory
   and hardware peripherals. In protected mode, programs are 32-bit,   
   have access to a 4 GB address space, and there are protection rules   
   in place to access memory and hardware.   
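
   As a reminder of where the 1 MB real-mode limit comes from, here is
   a minimal sketch of how an 8086-style segment:offset pair maps to a
   linear address. The function name is made up for illustration, and
   the sketch does not model the address wrap-around of a real 8086.

```c
#include <stdint.h>

/* Hypothetical helper: real-mode address translation.  The 16-bit segment
 * is shifted left by 4 bits (multiplied by 16) and added to the 16-bit
 * offset.  Two 16-bit values therefore reach only slightly past 1 MB. */
static uint32_t real_mode_linear(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

   For example, B800:0000 maps to linear address 0xB8000, and the
   largest possible pair, FFFF:FFFF, maps to 0x10FFEF, which is just
   under 64 KB above the 1 MB mark. Nothing beyond that is reachable
   without leaving real mode.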
      
      
      
   DOS was a 16-bit operating system that ran in real mode. Applications   
   that ran on DOS leveraged DOS' services for things like disk access,   
   were limited to addressing 1 MB of memory, and had complete control   
   of the computer. Contrary to that, GCC was a 32-bit program that had   
   been designed to run on Unix (oops sorry, GNU is Not Unix) and   
   produce binaries for Unix, and Unix required virtual memory from the   
   ground up to support multiprocessing. (I know that's not totally   
   accurate but it's easier to think about it that way.)   
      
      
      
      
      
   Intel-native compilers for DOS, such as the Microsoft C compiler and   
   Turbo C++, targeted the 8086's weird segmented architecture and   
   generated code accordingly. Those compilers had to deal with short,
   near, and far jumps--which is to say that I have more research to do
   and another article to write on ancient DOS memory models. GCC, on
   the other
   hand, assumes the full address space is available to programs and   
   generates code making such assumptions.   
      
   GCC was not only a 32-bit program, though: it was also big. In order   
   to compile itself and other programs, GCC needed more physical memory   
   than PCs had back then. This means that, in order to port GCC to DOS,   
   GCC needed virtual memory. In turn, this means that GCC had to run in   
   protected mode. Yet... DOS is a real mode operating system, and   
   calling into DOS services to access files and the like requires the   
   processor to be in real mode.   
      
   To address this conundrum, DJ had to find a way to make GCC and the   
   programs it compiles integrate with DOS. After all, if you have a C   
   program that opens a file and you compile said program with GCC, you   
   want the program to open the file via the DOS file system for   
   interoperability reasons.   
      
   Here, witness this. The following silly program, headself.c, goes out   
   of its way to allocate a buffer above the 2 MB mark and then uses   
   said buffer to read itself into it, printing the very first line of   
   its source code:   
      
       #include <fcntl.h>
       #include <stdint.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <unistd.h>
          
       #define BUFMINBASE  2 * 1024 * 1024   
       #define BUFSIZE     1 * 1024 * 1024   
          
       int main(void) {   
           // Allocate a buffer until its base address is past the 2MB boundary.   
           char* buf = NULL;   
           while (buf < (char*)(BUFMINBASE))   
               buf = (char*)malloc(BUFSIZE);   
           printf("Read buffer base is at %zd KB\n", ((intptr_t)buf) / 1024);   
          
           // Open this source file and print its first line.  Really unsafe.   
           int fd = open("headself.c", O_RDONLY);   
           read(fd, buf, BUFSIZE);   
           char *ptr = buf; while (*ptr != '\n') ptr++; *(ptr + 1) = '\0';   
           printf("%s", buf);   
          
           return EXIT_SUCCESS;   
       }   
      
   Yes, yes, I know the above code is really unsafe and lacks error   
   handling throughout. But that's not important here. Watch what
   happens when we compile and run this program with DJGPP on DOS:   
      
       D:\>head -n1 headself.c
       #include <fcntl.h>

       D:\>gcc -o headself.exe headself.c

       D:\>.\headself.exe
       Read buffer base is at 2673 KB
       #include <fcntl.h>
          
       D:\>_   
      
   Note two things. The first is that the program has to have run in   
   protected mode because it successfully allocated a buffer above the   
   1 MB mark and used it without extraneous API calls. The second is   
   that the program is invoking file operations, and those operations   
   interact with files managed by DOS.   
      
   And here is where the really cool stuff begins. On the one hand, we   
   have DOS as a real mode operating system. On the other hand, we have   
   programs that want to interoperate with DOS but they also want to   
   take advantage of protected mode to leverage the larger address space   
   and virtual memory. Unfortunately, protected mode cannot call DOS   
   services because those require real mode.   
      
   The accepted solution to this issue is the use of a DOS Extender, as
   we already saw in the previous article, but such technology was in
   its infancy at the time. DJ actually went through three different
   iterations to fully resolve this problem in DJGPP:
      
      
      
   1. The first prototype used Phar Lap's DOS Extender but it didn't get   
      very far because it didn't support virtual memory.   
      
   2. Then, the first real version of DJGPP used DJ's own DOS Extender   
      called go32, a big hack that I'm not going to talk about here.   
      
   3. And then, the second major version of DJGPP--almost a full rewrite   
      of the first one--switched to using the DOS Protected Mode   
      Interface (DPMI).   
      
   At this point, DJGPP was able to run inside existing DPMI hosts such   
   as Windows or the many memory managers that already existed for DOS   
   and it didn't have to carry the hacks that previously existed in go32   
   (although the go32 code went on to live inside CWSDPMI). The
   remainder of this article only talks about the last of these
   iterations.
      
   # Large buffers   
      
   One thing you may have noticed in the code of the headself.c example   
   above is that I'm using a buffer for the file read that's 1 MB long.
   That's intentional: for such a large buffer to even exist (no
   matter our attempts to push it above 2 MB), the buffer must be
   allocated in extended memory. But if it is allocated in extended   
   memory, how can the file read operations that we send to DOS actually   
   address such memory? After all, even if we used unreal mode, the DOS   
   APIs wouldn't understand it.   
      
   The answer is the transfer buffer. The transfer buffer is a small and   
   static piece of memory that DJGPP-built programs allocate at startup   
   time below the 1 MB mark. With that in mind, and taking a file read   
   as an example, DJGPP's C library does something akin to the   
   following:   
      
   1. The protected-mode read stub starts executing.   
      
   2. The stub issues a DPMI read call (which is to say, it executes the   
      DOS read file API but uses the DPMI trampoline) onto the transfer   
      buffer.   
      
   3. The DPMI host switches to real mode and calls the DOS read file API.   
      
   4. The real-mode DOS read places the data in the transfer buffer.   
      
   5. The real-mode DPMI host switches back to protected mode and   
      returns control to the protected-mode stub.   
      
   6. The protected-mode read stub copies the data from the transfer   
      buffer into the user-supplied buffer.   
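
   The steps above can be sketched in portable C. This is a simulation
   and not DJGPP's code: the real-mode DOS call is replaced by a
   stand-in that reads from an in-memory "file", and every name here
   (TB_SIZE, fake_file, fake_dos_read, read_via_transfer_buffer) is
   hypothetical. The 4 KB buffer size is likewise just an assumption
   for the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical transfer buffer size, chosen only for this sketch. */
#define TB_SIZE 4096

/* Stands in for the small static buffer that lives below the 1 MB mark. */
static unsigned char transfer_buffer[TB_SIZE];

/* Stand-in for a file managed by DOS: just bytes in memory plus a cursor. */
struct fake_file {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

/* Stand-in for steps 2-5: the "real-mode" read can only fill the transfer
 * buffer and takes a 16-bit byte count, like CX in the DOS API. */
static uint16_t fake_dos_read(struct fake_file *f, uint16_t count) {
    size_t left = f->size - f->pos;
    size_t n = count < left ? count : left;
    if (n > TB_SIZE)
        n = TB_SIZE;
    memcpy(transfer_buffer, f->data + f->pos, n);
    f->pos += n;
    return (uint16_t)n;
}

/* The "protected-mode" side: loops over the transfer buffer until the
 * caller's (possibly huge) buffer is full or the file ends.  Also counts
 * how many simulated mode-switch round trips were needed. */
static size_t read_via_transfer_buffer(struct fake_file *f, void *dst,
                                       size_t len, unsigned *round_trips) {
    unsigned char *out = dst;
    size_t total = 0;
    *round_trips = 0;
    while (total < len) {
        size_t want = len - total;
        if (want > TB_SIZE)
            want = TB_SIZE;
        uint16_t got = fake_dos_read(f, (uint16_t)want);
        (*round_trips)++;
        if (got == 0)
            break;                                   /* end of file */
        memcpy(out + total, transfer_buffer, got);   /* step 6 */
        total += got;
    }
    return total;
}
```

   With these assumptions, reading a 10000-byte file into a 10000-byte
   destination takes three round trips (4096 + 4096 + 1808 bytes), and
   every one of them would imply a switch to real mode and back on a
   real system.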
      
   This is all good and dandy but... take a close look at DOS's file   
   read API:   
      
       Request:   
       INT 21h   
       AH    -> 3Fh   
       BX    -> file handle   
       CX    -> number of bytes to read   
       DS:DX -> buffer for data   
          
       Return:   
       CF    -> clear if successful   
       AX    -> number of bytes actually read (0 if at EOF before call)   
       CF    -> set on error   
       AX    -> error code (05h,06h) (see #01680 at AH=59h/BX=0000h)   
      
   That's right: file read and write operations are restricted to 64 KB   
   at a time because the number of bytes to process is specified in the   
   16-bit CX register. Which means that, in order to perform large file   
   operations, we need to go through the dance above multiple times in a   
   loop. And that's why DJGPP is slow: if the DPMI host has to switch to   
   real mode and back for every system call, the overhead of each system   
   call is significant.   
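
   To put a rough number on that overhead, here is the arithmetic as a
   tiny helper. The function name is made up, and the 16 KB chunk size
   used in the second example below is only an assumption for
   illustration, not a figure taken from this article.

```c
#include <stdint.h>

/* Hypothetical helper: number of DOS read calls needed to move `total`
 * bytes when each call transfers at most `per_call` bytes (ceiling
 * division).  Assumes per_call > 0 and total + per_call does not overflow
 * 32 bits. */
static uint32_t dos_calls_needed(uint32_t total, uint32_t per_call) {
    return (total + per_call - 1) / per_call;
}
```

   Filling the 1 MB buffer from headself.c at the 16-bit CX maximum of
   65535 bytes per call takes dos_calls_needed(1048576, 65535) = 17
   calls. If each call were instead capped at a hypothetical 16 KB, it
   would take 64 calls, each with its own trip to real mode and back.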
      
   Now is a good time to take a short break and peek into DJGPP's read   
   implementation. It's succinct and clearly illustrates what I   
   described just above. And with that done, let's switch gears.   
      
      
      
   # Globs without a Unix shell   
      
   Leveraging protected mode and a large memory address space are just   
   two important but small parts of the DJGPP puzzle. The other   
   interesting pieces of DJGPP are those that make Unix programs run   
   semi-seamlessly on DOS, and there are many such pieces. I won't cover   
   them all here because Eli Zaretskii's presentation did an excellent
   job of that. So what I want to do instead is look at a subset of them
   and show them in action.
      
      
      
   To begin, let's try to answer this question: how do you interact with   
   a program originally designed for Unix on a DOS system? The Unix   
   shell is a big part of such interaction and COMMAND.COM is no Unix   
   shell. To summarize the linked article: the API to invoke an   
   executable on Unix takes a list of arguments while on DOS and Windows   
   it takes a flat string. Partially because of this, the Unix shell is   
   responsible for expanding globs and dealing with quotation   
   characters, while on DOS and Windows each program is responsible for   
   tokenizing the command line.   
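
   To make "each program is responsible for tokenizing the command
   line" concrete, here is a minimal sketch of the kind of splitting a
   DOS program has to do on its own. It is not DJGPP's startup code:
   the function name is hypothetical, the quoting rules are simplified
   to double quotes only, and it does no glob expansion at all.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: splits a flat command line, in place, into at most
 * max_args NUL-terminated arguments.  Whitespace separates arguments and
 * double quotes group text (the quotes themselves are dropped).  Returns
 * the number of arguments stored in argv. */
static int split_command_line(char *cmdline, char **argv, int max_args) {
    int argc = 0;
    char *in = cmdline;
    while (*in != '\0' && argc < max_args) {
        while (isspace((unsigned char)*in))
            in++;                       /* skip separators */
        if (*in == '\0')
            break;
        char *out = in;                 /* compact the token in place */
        argv[argc++] = out;
        int quoted = 0;
        while (*in != '\0' && (quoted || !isspace((unsigned char)*in))) {
            if (*in == '"')
                quoted = !quoted;       /* toggle grouping, drop the quote */
            else
                *out++ = *in;
            in++;
        }
        if (*in != '\0')
            in++;                       /* step past the separator first */
        *out = '\0';                    /* then terminate the token */
    }
    return argc;
}
```

   Note that a pattern such as *.c comes out of this function
   untouched: without a shell to expand it, the program receives the
   literal text and has to decide for itself what to do with it.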
      
      
      
   Leaving aside the fact that the DOS API is... ahem... bad, this
   fundamental difference means that any Unix program ported to DOS has   
   a usability problem: you cannot use globs anymore when invoking it!   
   Something as simple and common as gcc -o program.exe *.c would just   
   not work. So then... how can we explain the following output from the   
   showargs.c program, a little piece of code that prints argv?   
      
       D:\>gcc -o showargs.exe showargs.c   
          
       D:\>.\showargs.exe *.c   
       argv[1] = headself.c   
       argv[2] = longcmd1.c   
   --- SBBSecho 3.20-Linux   
    * Origin: End Of The Line BBS - endofthelinebbs.com (1:124/5016)   
      



(c) 1994,  bbs@darkrealms.ca