July 25th, 2011, 11:44 Posted By: wraggster
Pate has updated his DOS emulator for the Nintendo DS. Here's the news:
Sorry it took me over a month to get a new version released. However, this version has two noteworthy improvements: FPU support, and the fact that it is now built with version 0.13beta of the SDK. That SDK version was already released at the end of last year, but I only recently noticed that I was still using the even older 0.12beta version to build DS2x86. This might have caused at least some of the audio problems, as the audio features were somewhat improved in SDK v0.13beta. I also decided to jump the version number to 0.20 with this version, to show that it has some major internal changes.
The biggest enhancement is the addition of FPU support. For now this only works in 32-bit protected mode (meaning those DOS4GW games), as those are the games that mostly expect the existence of an FPU. My two test games, X-COM UFO and Destruction Derby, now seem to run the FPU parts fine. The problems I mentioned in the previous blog post were actually both caused by the FPU opcodes, though in some rather interesting ways. It took me almost three days to track down and fix the problems. There is still a rather serious problem in the texture mapping in Destruction Derby, but I am not sure if that has anything to do with the FPU opcodes. In any case, Destruction Derby runs much too slowly to be properly playable, so fixing the texture mapping is not a high priority. Fixing it might fix some other games as well, so I do plan to look into it at some point.
Anyways, the FPU problem in Destruction Derby was in the FCOMP (compare) opcode. When I first tested that opcode it seemed to return a correct result, so I spent a lot of time looking elsewhere for the problem. In the end it turned out that I had a small typo in the opcode handler: I had written "sll t4, t3" when I meant "sll t4, t3, 1", and the assembler had interpreted it as "sllv t4, t4, t3". In this case the assembler tried to be a bit too smart when it replaced the sll opcode (which can only have an immediate shift value) with the sllv opcode (shift by a register value). Since I had forgotten to type the immediate value, it would have been nicer if the assembler had given an error instead. That problem meant that the comparison result was mostly random: it sometimes returned the correct result and sometimes not, depending on the lowest 5 bits of the register I wanted to shift left by one bit before the comparison.
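To make the difference between the two opcodes concrete, here is a minimal Python model of the MIPS sll and sllv semantics. It is only a sketch: the register values are hypothetical and chosen for illustration, and the function names are not taken from DS2x86.

```python
MASK32 = 0xFFFFFFFF

def sll(rt, sa):
    """MIPS 'sll rd, rt, sa': shift rt left by the immediate sa (0..31)."""
    return (rt << (sa & 31)) & MASK32

def sllv(rt, rs):
    """MIPS 'sllv rd, rt, rs': shift rt left by the low 5 bits of register rs."""
    return (rt << (rs & 31)) & MASK32

# Hypothetical register contents, chosen only for illustration.
t3 = 0x00004005   # value the handler meant to shift left by one bit
t4 = 0x00000123   # whatever happened to be in the destination register

intended = sll(t3, 1)    # sll  t4, t3, 1   (what was meant)
actual = sllv(t4, t3)    # sllv t4, t4, t3  (what the assembler emitted)

print(f"intended: {intended:#010x}")
print(f"actual:   {actual:#010x}")
```

In this model the emitted instruction shifts the old contents of t4 by the low 5 bits of t3, so the result only happens to look plausible for some register values, which fits the "mostly random" comparison results described above.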
A more interesting problem was the one that affected X-COM UFO. It behaved very erratically, dropping to the debugger with an unsupported opcode at random locations, where the opcode should have been supported. That pointed to the interrupt routine failing to execute properly and dropping back to where the interrupt should have happened. It took me a while to notice that when it dropped into the debugger, the Virtual 86 mode flag in the flags register was set! I had not coded support for interrupts in virtual 86 mode yet, so that was the reason for the weird unsupported opcodes. However, I was pretty sure that the game does not set this flag on purpose, so I added a check into DS2x86 to drop to the debugger immediately after this bit gets set in the CPU flags. After a few test runs, the results were quite interesting. In one run the bit did not get set, but DS2x86 dropped into the debugger anyways. In the other runs, the opcode that set the bit was fsin twice and fcos once, two of my new FPU opcodes.
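The kind of check described above can be sketched as follows. This is not the DS2x86 code (which is MIPS assembly); the function and exception names are hypothetical. The only fixed fact used is that the Virtual-8086 mode flag is bit 17 of the x86 EFLAGS register.

```python
VM_FLAG = 1 << 17   # x86 EFLAGS bit 17: Virtual-8086 mode

class DebuggerTrap(Exception):
    """Hypothetical stand-in for dropping into the emulator's debugger."""

def check_flags_after_opcode(opcode_name, old_flags, new_flags):
    """Trap as soon as an opcode handler turns the VM bit on."""
    if not (old_flags & VM_FLAG) and (new_flags & VM_FLAG):
        raise DebuggerTrap(f"VM flag set by {opcode_name}: {new_flags:#010x}")
    return new_flags

# Hypothetical flag values for illustration.
flags = 0x00000202
flags = check_flags_after_opcode("fadd", flags, 0x00000246)   # passes quietly
try:
    check_flags_after_opcode("fsin", flags, 0x00020246)       # VM bit appears
except DebuggerTrap as trap:
    print(trap)
```

Trapping on the transition, rather than on the later unsupported opcode, is what points at the handler that actually corrupted the flags.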
The interesting part was that I did not touch the CPU flags in the fsin and fcos opcode handlers at all! I did store the register that contained the flags on the stack, along with various other registers that calling the GCC math library sin() and cos() functions will clobber, and then restored the registers after the call. I could not immediately see anything wrong with my code, so just for fun I copied the flags before the GCC library call to the emulated EAX register and the same flags after restoring them from the stack to the emulated ECX register, and then added a forced drop to the debugger after the call. And, much to my surprise, the value restored from the stack was completely different from the value I had pushed to the stack!
I looked into the dump file for what happens in the GCC library sin() and cos() functions, and noticed that indeed, they destroy the two topmost words of the caller's stack! I have a very hard time believing this behaviour is by design; this might actually be a bug in the GCC compiler itself. Below is the dump from the start of the math library sin() function. The input is a 64-bit double value, in the 32-bit registers a0 (low word) and a1 (high word).
80250fd0: 3c027fff lui v0,0x7fff
80250fd4: 3442ffff ori v0,v0,0xffff
80250fd8: 00a23024 and a2,a1,v0
80250fdc: 3c033e50 lui v1,0x3e50
80250fe0: 27bdfda8 addiu sp,sp,-600 // sp -= 600, make room for local variables
80250fe4: 00c3182a slt v1,a2,v1
80250fe8: afbe0250 sw s8,592(sp)
80250fec: afbf0254 sw ra,596(sp)
80250ff0: afb7024c sw s7,588(sp)
80250ff4: afb60248 sw s6,584(sp)
80250ff8: afb50244 sw s5,580(sp)
80250ffc: afb40240 sw s4,576(sp)
80251000: afb3023c sw s3,572(sp)
80251004: afb20238 sw s2,568(sp)
80251008: afb10234 sw s1,564(sp)
8025100c: afb00230 sw s0,560(sp)
80251010: afa40258 sw a0,600(sp) // Store the low word of input to sp+600
80251014: afa5025c sw a1,604(sp) // Store the high word of input to sp+604
You can see that the routine first reserves 600 (0x258) bytes of stack space for local variables, but then stores the input value to offsets 600 and 604 from the start of the reserved space! This effectively destroys the two last words that the caller had pushed into the stack, which in DS2x86's case were the emulated CPU flags and a stack size mask register. So, after returning from fsin or fcos routines, the flags were randomly set, and also the stack-relative addressing might have addressed the wrong memory area.
This problem was easy to overcome by leaving two extra words on the top of the stack when calling these functions, but if you are writing software for DSTwo yourself, beware of this problem! I spent some time googling whether this was a known problem and if a fix is available, but did not immediately find anything. Let me know if you know the correct place to look for this information!
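The clobbering and the workaround can be reproduced with a toy model of the stack. The sketch below is an illustration only: the saved values, the stack size and the function names are hypothetical, and only the "sp -= 600, then store a0 and a1 at sp+600 and sp+604" pattern is taken from the dump above. For what it is worth, that store pattern matches the MIPS o32 calling convention, in which the caller is expected to reserve a 16-byte argument area that the callee may use to store a0 to a3, so reserving four words rather than two would cover the general case under that convention.

```python
import struct

def sin_prologue(stack, sp, a0, a1):
    """Mimic the dumped prologue: addiu sp,sp,-600; sw a0,600(sp); sw a1,604(sp)."""
    sp -= 600
    struct.pack_into("<I", stack, sp + 600, a0)
    struct.pack_into("<I", stack, sp + 604, a1)
    return sp

def saved_words_survive(extra_words):
    """Push two words, optionally leave spare words on top, call, then compare."""
    stack = bytearray(1024)                 # toy stack that grows downward
    sp = len(stack)
    flags, mask = 0x00000246, 0x0000FFFF    # hypothetical saved values

    sp -= 8                                 # caller pushes its two saved words
    struct.pack_into("<II", stack, sp, flags, mask)

    sp -= 4 * extra_words                   # workaround: spare words on top
    # Argument is pi as a 64-bit double, split into low and high words.
    sin_prologue(stack, sp, a0=0x54442D18, a1=0x400921FB)
    sp += 4 * extra_words                   # drop the spare words again

    return struct.unpack_from("<II", stack, sp) == (flags, mask)

print("no spare words: ", saved_words_survive(0))
print("two spare words:", saved_words_survive(2))
```

With no spare words the callee's two stores land exactly on the caller's last two pushed words; with two spare words they land in the reserved gap and the saved values come back intact.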
I also made some other fixes, based on the debug logs you have sent. Another change is that I changed the Makefile to compile the C modules with the -mno-long-calls option, which makes the code smaller and faster. I don't know why the SDK defaults to long calls, as they are only needed if the code does not fit into a single 256MB block, and the DSTwo only has 32MB of RAM! Anyways, let me know if you run into any new problems with this version, and feel free to test the games that used to report unsupported FPU opcodes! Thanks again for your interest in DSx86 and DS2x86!