Heeeeex!

Tools, assembly, and file formats.
thehackercat
Posts: 69
Joined: Sat Sep 26, 2009 10:49 pm
Location: Mississippi, USA

Post by thehackercat »

$48 $69 $20 $54 $68 $65 $72 $65 $21
(Hi There!)
I think I've got hex figured out now!
What I need to figure out now is how exactly Keen:Vorticons is built in hex.
What language is it written in? BASIC? C? Or one of those dedicated languages that's built from source code and takes ages and ages to learn and- [/panic]

Could someone shed some light on this?

It should be noted that I possess binary dumps of Keens 1-6, courtesy of Levellass.
Draik
Posts: 117
Joined: Sat Jul 26, 2008 8:52 am

Post by Draik »

Weeeell, according to the various developers of Keen, it was originally programmed in a combination of C and assembly. The dumps you have are machine code, which is normally read as assembly (converting programs back to high-level source code such as C is nigh impossible: you can translate it backwards, but the effect is rather like translating from English to Japanese and back again, i.e. not pretty). It doesn't matter what language the program was originally in: once it is compiled down to an .exe file, it's machine code, and assembly is simply the human-readable way of writing that machine code out. I have no clue as to what the assembly commands are in hex (not a huge assembly fan), but it also seems you are slightly misunderstanding what hex is.

Hex (short for hexadecimal) is merely a numbering system, specifically base 16. (Decimal, our standard numbering system, is base 10; binary, a computer's native numbering system, is base 2.) Because a computer can only work in ones and zeros (zeroes?), all of its numbers are stored in base 2. People, having been taught base 10 from as soon as they could count, have a little difficulty reading binary. Since 16 is a power of two, hex converts easily to and from binary (each hex digit stands for exactly four binary digits), and it is slightly easier on the human eye, too.
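As a rough illustration of that last point (Python, purely a sketch; the function name is made up for this example), converting hex to binary can be done one digit at a time:

```python
def hex_to_binary(hex_text):
    """Convert a hex string such as '5A' to binary, four bits per hex digit."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_text)

print(hex_to_binary("5A"))   # each hex digit becomes one group of four bits
print(int("5A", 16))         # the same number in decimal
print(format(90, "X"))       # and decimal 90 written back out in hex
```

The same value can be shown in any of the three bases; only the notation changes, not the number.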

All computers can understand is numbers. Therefore, in a computer, everything is represented by numbers. Letters are numbers (as demonstrated by the handful of ASCII codes you displayed at the top of your post), graphics are numbers, program instructions are numbers. What you want to learn is not so much the numbering system (although you seem to have done so quite well, and it is useful anyway) as the program instructions, i.e. assembly.
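To make the "letters are numbers" point concrete, here is a small sketch in Python (the function name is just for illustration) that turns the hex values from the first post back into text:

```python
def decode_hex_ascii(text):
    """Turn '$48 $69 ...' into the ASCII characters those byte values stand for."""
    values = [int(token.lstrip("$"), 16) for token in text.split()]
    return bytes(values).decode("ascii")

print(decode_hex_ascii("$48 $69 $20 $54 $68 $65 $72 $65 $21"))
```

Each `$xx` is read as a base-16 number, and each number is looked up as an ASCII character.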

You may or may not have known most of that already, but anyways. I will now hand over to someone a bit more qualified in this area, like lemm or levellass.
levellass
Posts: 3001
Joined: Wed Oct 11, 2006 12:03 pm
Location: Ngaruawahia New Zealand

Post by levellass »

Hello there. I'm slightly more qualified.


Keen itself says it was written in C+ (at the start of its text segment), but as has been said, in the executable it's in assembly. Assembly is actually pretty easy to learn, once you get used to it. The problem is getting used to it, as it's all numbers.

First up, we'll need to be practical here: there is NO way you're going to be able to write out the whole executable with things like '$3DEC: Here's where Keen checks for door tiles' and whatnot. There's just too much data and it's too complicated.

There are programs that can 'decompile' the whole thing automatically, but what comes out is little better than what goes in. You may have seen some of this stuff where Spleen was trying to work out the robot shot. It'll tell you what each byte of assembly is doing (and helpfully work out where jumps, calls, etc. go), but it doesn't do much to help you.

As I've said before, you're better off asking for something, then seeing how people solve the problem. (Many a patcher has started off trying to make a Keen 2/3 patch from a Keen 1 one) As an example, here is how I dissected the sprite spawning code of the Butler Bot found at $1777 in Keen 1:

Code: Select all

55 #Start something big
8B #Start
EC 56 
E8 B7 11 #Jump to 'spawn something' code
8B #Start
F0 
C7 #Set...
04 #Variable 0... (04 means no offset byte follows)
0F 00 #To 15
8B #Start
46 04 99 B1 0C 
E8 C4 C9 #Jump to something or other
89 44 04 89 54 06 
8B #Start
46 06 99 B1 0C 
E8 B5 C9 #Jump to same thing as before
89 44 08 89 54 0A 
C7 #Set...
44 
20 #Variable 32... (Initial speed)
5A 00 #To 90
8B #Start
44 06 
8B #Start
54 04 3B 06 E0 6E 7F 10 7C 06 3B 16 DE 6E 73 08 
8B #Start
44 20 F7 D8 89 44 20 
C7 #Set...
44 
32 #Variable 50...
C7 1D #to $1DC7 (this C7 is part of the value, not another 'Set')
C7 #Set...
44 
34 #Variable 52... (Sprite behavior)
94 1E #to Butler Bot walk (At $1E94)
C7 #Set...
44 
28 #Variable 40... (Sprite to show)
60 00 #to 96
5E #Close 1
5D #Close 2
C3 #Close big

Now as you can see, this is total gibberish, mostly because I didn't fill everything in, but also because I didn't use the proper wording. But *I* can understand it; and by comparing it to other bits of code, can make patches I might want. I am sure even you can see a few things to look for (For example C7 44 28 will be just before the initial sprite to show for all spawned sprites) and you can gradually work your way up from there.
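The "look for C7 44 28" trick can be sketched in a few lines of Python. This is only an illustration: the function name is invented here, and the sample bytes are the tail of the Butler Bot routine quoted above rather than a whole dump.

```python
def find_initial_frames(data, pattern=bytes([0xC7, 0x44, 0x28])):
    """Yield (offset, value) for each 16-bit little-endian word following the pattern."""
    start = 0
    while True:
        pos = data.find(pattern, start)
        # Stop when the pattern is absent or too close to the end to have a value.
        if pos == -1 or pos + len(pattern) + 2 > len(data):
            return
        value_bytes = data[pos + len(pattern):pos + len(pattern) + 2]
        yield pos, int.from_bytes(value_bytes, "little")
        start = pos + 1

# Tail end of the Butler Bot spawn routine from the listing above.
sample = bytes.fromhex("C7 44 34 94 1E C7 44 28 60 00 5E 5D C3")
for offset, frame in find_initial_frames(sample):
    print(f"pattern at +{offset}: initial sprite {frame}")
```

To run it over a whole dump you would read the file in binary mode and pass its bytes in; the file name is whatever your dump happens to be called. Note that the value is stored low byte first, which is why 60 00 reads as $0060.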


I now turn you over to Lemm, who has actually studied the proper mouth words for things rather than bumbling along.
lemm
Posts: 554
Joined: Sun Jul 05, 2009 12:32 pm

Post by lemm »

Below is the output from the disassembler, which takes the .exe and turns the machine instructions (numbers) into readable instruction names (mnemonics). For example, in levellass' post, the first number is $55. This is the opcode for "push bp", the first instruction in the disassembly.
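At its very simplest, a disassembler is a lookup from numbers to names. The sketch below (Python, with an invented table and function name) covers only the one-byte opcodes that show up in the Butler Bot routine; a real disassembler also has to decode the multi-byte instructions such as mov and call.

```python
# One-byte 8086 opcodes that appear in the Butler Bot routine.
ONE_BYTE_OPCODES = {
    0x55: "push bp",
    0x56: "push si",
    0x5E: "pop si",
    0x5D: "pop bp",
    0x99: "cwd",
    0xC3: "retn",
}

def name_opcode(byte):
    """Return the mnemonic for a known one-byte opcode, or a placeholder."""
    return ONE_BYTE_OPCODES.get(byte, f"?? (${byte:02X})")

for byte in (0x55, 0x5E, 0x5D, 0xC3):
    print(f"${byte:02X} -> {name_opcode(byte)}")
```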

I have done some commenting to name the variables. The seg000:xxxx at the start of each line is the location in the executable (in bytes) where the instruction resides. The CPU goes through the instructions one after another, moving numbers back and forth between memory and the CPU, and sometimes performing operations on them inside the CPU. The two-letter words like ax, dx, etc. are called registers. These are 16-bit locations in the CPU that hold numbers temporarily so that the CPU can do operations on them. The bracketed expressions ([bp+4], [si+sprite...], etc.) refer to locations in memory.

You can see that most of the operations (in the second column) are:

mov (copy a value between registers and memory),
call (call a subroutine),
cmp, jl/jg (compare two values, then jump to another instruction if less, or if greater)
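As a rough sketch of how cmp and the conditional jumps work together (Python, not anything from Keen's source; the function names are made up): the Butler Bot listing compares 32-bit positions 16 bits at a time, because the registers only hold 16 bits. The high words are compared as signed numbers (jg/jl) and, only if those are equal, the low words are compared as unsigned numbers (jnb).

```python
def to_signed16(word):
    """Interpret a 16-bit value as a signed number, the way jg/jl see it."""
    return word - 0x10000 if word & 0x8000 else word

def first_is_less(a, b):
    """Decide a < b for 32-bit signed values using only 16-bit pieces."""
    a_high, a_low = (a >> 16) & 0xFFFF, a & 0xFFFF
    b_high, b_low = (b >> 16) & 0xFFFF, b & 0xFFFF
    if to_signed16(a_high) > to_signed16(b_high):   # jg: a is definitely greater
        return False
    if to_signed16(a_high) < to_signed16(b_high):   # jl: a is definitely less
        return True
    return a_low < b_low   # high words equal: unsigned compare of the low words

print(first_is_less(0x00012000, 0x00020000))
print(first_is_less(0x00018000, 0x00010001))
```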


Code: Select all

seg000:1777 ; =============== S U B R O U T I N E =======================================
seg000:1777
seg000:1777 ; Attributes: bp-based frame
seg000:1777
seg000:1777 add_monster_2_butler proc near          ; CODE XREF: sub_115EB+8Dp
seg000:1777
seg000:1777 tile_x_spawn    = word ptr  4
seg000:1777 tile_y_spawn    = word ptr  6
seg000:1777
seg000:1777                 push    bp              ; preserve callers stack frame
seg000:1778                 mov     bp, sp
seg000:177A                 push    si
seg000:177B                 call    add_monster     ; allocate memory for new sprite
seg000:177E                 mov     si, ax          ; add_monster returns pointer to memory in ax
seg000:177E                                         ; save this pointer (memory location) in si
seg000:1780                 mov     [si+sprite.type], 5 ; store 5 in memory there, because butler bots are type = 5 in game
seg000:1784                 mov     ax, [bp+tile_x_spawn] ; load x spawn coord in TILES to ax
seg000:1787                 cwd                     ; sign-extend ax into dx:ax (16 -> 32 bits)
seg000:1788                 mov     cl, 0Ch         ; shift count = 12
seg000:178A                 call    near ptr H_LLSH ; presumably shifts dx:ax left by cl (tiles * 4096)
seg000:178D                 mov     word ptr [si+sprite.pos_x], ax ; and store the resulting coordinate in 256ths of a pixel in sprite memory
seg000:1790                 mov     word ptr [si+(sprite.pos_x+2)], dx
seg000:1793                 mov     ax, [bp+tile_y_spawn] ; do the same for y
seg000:1796                 cwd
seg000:1797                 mov     cl, 0Ch
seg000:1799                 call    near ptr H_LLSH
seg000:179C                 mov     word ptr [si+sprite.pos_y], ax
seg000:179F                 mov     word ptr [si+(sprite.pos_y+2)], dx
seg000:17A2                 mov     [si+sprite.vel_x], 5Ah ; 'Z' ; store velocity of $5A in memory
seg000:17A7                 mov     ax, [si+6]      ; next few instructions compare butler bot position to keen position
seg000:17AA                 mov     dx, [si+4]
seg000:17AD                 cmp     ax, word ptr sprite_array.pos_x+2
seg000:17B1                 jg      short functions
seg000:17B3                 jl      short startsright
seg000:17B5                 cmp     dx, word ptr sprite_array.pos_x
seg000:17B9                 jnb     short functions
seg000:17BB
seg000:17BB startsright:                            ; CODE XREF: add_monster_2_butler+3Cj
seg000:17BB                 mov     ax, [si+20h]    ; if keen starts to the left, then NEGate velocity
seg000:17BB                                         ; so the butler bot heads left
seg000:17BE                 neg     ax
seg000:17C0                 mov     [si+20h], ax
seg000:17C3
seg000:17C3 functions:                              ; CODE XREF: add_monster_2_butler+3Aj
seg000:17C3                                         ; add_monster_2_butler+42j
seg000:17C3                 mov     word ptr [si+32h], offset think_2_butler ; store pointers to behaviour...
seg000:17C8                 mov     word ptr [si+34h], offset contact_2_butler ; ... and to contact function
seg000:17CD                 mov     word ptr [si+28h], 60h ; '`' ; animation frame $60 (96 in decimal)
seg000:17D2                 pop     si
seg000:17D3                 pop     bp              ; restore stack frame
seg000:17D4                 retn
seg000:17D4 add_monster_2_butler endp ; sp =  4
seg000:17D4
After you learn the syntax, you can write your own little routines to make new sprite behaviours and things. There are only about 20 commands that you will use 95% of the time, so it isn't hard to learn. "The Art of Assembly Language Programming" is what I read. It's free online.
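For anyone who wants to check the coordinate arithmetic in the listing by hand, here is a small sketch in Python. It assumes H_LLSH is a 32-bit shift left by cl, which is how the listing appears to use it; the function and constant names are invented for this example.

```python
TILE_SHIFT = 12   # the 0Ch loaded into cl before each H_LLSH call

def tile_to_position(tile):
    """Shift a tile coordinate left by 12 bits and split the 32-bit result
    into the low word (stored from ax) and the high word (stored from dx)."""
    position = (tile << TILE_SHIFT) & 0xFFFFFFFF
    low_word = position & 0xFFFF
    high_word = position >> 16
    return position, low_word, high_word

position, ax, dx = tile_to_position(20)
print(position, hex(ax), hex(dx))
```

A shift of 12 bits multiplies by 4096, so one tile is 4096 position units. If those units are 256ths of a pixel, as the comment in the listing says, that works out to 16 pixels per tile.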