;
; play3.z - dual task audio/video streaming from diskette
;
; Author: George Phillips
;
; This program can play back a short movie from several floppy diskettes.
; It outputs both video and audio and is known to handle 30 second clips.
; Playback time is theoretically unlimited as long as you're OK with
; constantly swapping floppies.
;
; Requires a TRS-80 Model 4 with 64 KB of no wait state RAM and dual floppies.
; These limitations could be checked but are not.
;
; The program will wait for a key to be pressed before beginning.  The user
; must have the first two data disks in the floppy drives and ready.  Once
; the light on the first drive goes off the 3rd disk should be inserted.
; And the 4th disk when the second drive light goes off.  And so on if more
; floppies are needed.
;
; On any error the program jumps to 'err' which displays a single '!' in
; the top left corner.  Anyone modifying the program would do well to add
; register dumps at that point to assist debugging.
;
; There are two threads of control.  The disk thread reads data from
; the floppies and puts it into a ring buffer.  The output thread reads
; data from the ring buffer and outputs the video and audio.  The two
; threads are rate matched so only a small amount of buffering is needed
; to cover the times when the disk thread is switching between tracks or
; across floppies.  The rate matching is not perfect so that will ultimately
; limit how long the streaming can continue without problem.
;
; The threads are run in lockstep with each getting a fixed portion of a
; 128 cycle step.  A pre-built stack allows the output thread to pass
; control to the disk thread using the 'RET' instruction.  The disk thread
; passes control to the output thread using 'JP (HL)'.  The disk thread
; moves to its next step automatically but can jump to other steps by loading
; the stack pointer.  The output thread must load HL with a new value to
; choose a different step.
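;
; Schematically, one 128 cycle step looks like this.  This is only an
; illustration: "diskA", "outX" and "outY" are made-up names, not labels
; in this program.
;
;	diskA:	...		; disk work, 54 cycles including the jump
;		jp	(hl)	; hand over to the current output step
;	outX:	...		; output work, 74 cycles including the return
;		ld	hl,outY	; choose the next output step
;		ret		; pop the next disk step off the "ret" stack
;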
; Both use unrolling to save time and keep the
; coding simpler.  The program itself is small but uses up considerable
; space as it unpacks this unrolled program code.
;
; The output thread has BC, DE and AF' for its use.  The disk thread can use
; BC', DE' and HL'.  Both may use AF but only within a step as the other
; thread may change it.  IX and IY are unused.
;
; The 128 cycle step was chosen because floppy disk bytes arrive every
; 129.76 cycles.  When reading bytes from the floppy the timing will be
; fixed by the floppy data rate as the Z-80 will be forced to wait until
; the data is ready.  During disk seeks and such the timing is maintained
; by ensuring the disk and output threads always use exactly their allotted
; time quanta.  Each step is padded with otherwise useless instructions
; in order to meet this restriction.  Macros and assert statements are used
; extensively to enforce these rules and ease the programming burden.
; The latest version of zmac is recommended to assemble this file:
;	http://members.shaw.ca/gp2000/zmac.html
;
; The disk thread is given a 54 cycle quantum which leaves a 74 cycle
; quantum for the output thread.
;
; Approximate memory map:
;
; $0800 - $37ff	12 KB "ret" stack for disk thread
; $4000 - $7aff	14.5 KB unrolled code for output thread
; $7b00 - $7fff	Main program startup code
; $8000 - $ffff	32 KB ring buffer for disk data
;
; 1 byte audio, 7 bytes video - basic movie unit
;

stack	equ	$3800	; end of ~ 12 K for ret-controlled disk thread
audvid	equ	$4000	; unrolled audio/video display code (output thread)
ring	equ	$8000	; start of 32 KB disk input ring buffer

; Hard coded parameters from movie generator.

audblk	equ	4	; 64 bit (8 byte) audio blocks per frame
tdatln	equ	5156	; data bytes per track
framcnt	equ	684	; frames in movie
numdsk	equ	4	; diskettes to read

	org	$7b00
proglow:

; -------------------- Disk Thread --------------------------

; Various macros for construction of each step in the thread.
; Ultimately the "ret" stack is an array of addresses of each step to use
; in sequence.  To speed loading this "ret" stack is assembled as
; run-length encoded (RLE) data.  A control word with the high bit ($8000)
; set uses the lower 15 bits to record twice the number of times the
; following word is repeated.  All the other control words indicate the
; length of literal data to copy onto the "ret" stack.
;
; The RLE data for the "ret" stack is appended on to the end of the program
; assembly.  The macros will ORG to the program end, add some RLE data
; and then ORG back to where assembly is happening.

dskqnt	equ	54	; Disk thread quantum

; Fail assembly if the current cycle count is not exactly the disk quantum

dq_check macro
quant	defl	t($)
	assert	quant == dskqnt
	endm

; Working variables to track the construction of the disk thread stack.

stpnum	defl	0		; size of "ret" stack
stpbase	defl	stack_init	; pointer to literal data size count
stporg	defl	stack_init+2	; where to store next step address in block
stpcnt	defl	0		; number of steps in literal block

; Macro for starting the next step in the disk thread.  The step is called
; <name> and stpoff_<name> is defined to record the step number.  It adds
; the step to the current block of literal data under construction.

step	macro	name
stpoff_`name defl stpnum	; remember step number for goto
stpnum	defl	stpnum+2	; "ret" stack has another entry
stpcnt	defl	stpcnt+1	; one more step in the current literal block
	sett	0		; reset zmac's cycle counter
name:				; label the step
	org	stporg		; record address of step in RLE data
	dw	name
stporg	defl	$		; update RLE data pointer
	org	name		; go back to where we were assembling
	endm

; End the current block of literal data.
endlit	macro
	assert	stpcnt > 0	; fail if no steps in literal block
tmp	defl	$		; remember where we are
	org	stpbase
	dw	stpcnt*2	; record size of literal block
	org	tmp		; return to assembling where we were
stpcnt	defl	0		; no steps in literal data
stpbase	defl	stporg		; get ready for next
stporg	defl	stporg+2	; RLE data record
	endm

; Reuse a step.  The current step in the disk thread does not require new
; code but simply repeats a previously generated step.  No extra code is
; assembled, but the "ret" stack still grows by "count" words.

reuse	macro	name,count
	endlit
tmp	defl	$
	org	stpbase			; directly emit
	dw	$8000|((count)*2)	; 1 or more repeats
	dw	name			; of step 'name'
stpnum	defl	stpnum+(count)*2	; record "ret" stack size growth
stpbase	defl	$			; get ready for next
stporg	defl	$+2			; RLE data record
	org	tmp			; as you were
	endm

; Load the stack pointer so that the next step will be the one given.
; "goto" is a little misleading as it doesn't immediately transfer control
; but instead means control will transfer to "name" when the output thread
; returns back to the disk thread.

goto	macro	name
	ld	sp,stack_top+stpoff_`name
	endm

; Helper macros to get the timing right for conditional jumps.  The tricky bit
; is that each branch of a conditional jump must end up using the same number
; of cycles.  "baljp" records the number of cycles used in the current step.
; "tail" is used to record the cycles used if the conditional jump is not
; taken.  Then "tail_check" is called after the code at "label" to ensure that
; the cycle count is the same as the not taken case.

baljp	macro	cond,label
	jp	cond,label
bjt0	defl	t($)
	endm

tail	macro	name
bjt1	defl	t($)
case1	defl	bjt1 - bjt0
name:
	endm

tail_check macro
case2	defl	t($) - bjt1
	assert	case1 == case2
	endm

; -------------------- Disk Thread --------------------------

; Using the helper macros we can program the disk thread in a straight
; line fashion.
; Instructions added for the purpose of padding timing
; are generally commented as "balance" instructions.
;
; The disk thread reads all the tracks from 0 to 39 off each drive in
; turn until "numdsk" floppies have been read.  The tracks themselves
; are raw data without sector markings save for a few sync bytes at the
; start followed by a 0 to indicate the start of the data.
;
; The entire track is not read, only the first "tdatln" data bytes.  This
; means I don't need (and don't get to) the NMI that signals the end of
; the track (though I do set it up as a vestige of original testing code).
; Moreover, it is critical to data throughput.  Track reads all start at
; the same physical rotation of the diskette (at the index hole).  Reading
; an entire track would mean waiting a full rotation before the next track
; could be read.  Instead we read most of the track and seek to the next
; one leaving enough time for the seek to happen (6 milliseconds says the
; manual) and for the head to settle and be able to read data.  I don't
; recall any recommended time to wait for settling.  Instead, the limits
; were determined by experiment and even a mere 6 ms or less is enough.
; I think mechanically the seeks are faster than required by the controller.
;
; The main program sets "drive" to 0, "track" to 0, loads HL' with the
; start of the ring buffer ("ring") and starts us at step "stream0".

; Prepare to read a track from drive 0.
; stream0 and strm0a modify the track reading code to
; select drive 0 access with and without wait states as necessary.
	step	stream0
	jp	$+3		; balance
	ld	a,$81		; balance
	ld	a,$81		; select drive 0 with no wait states
	ld	(dr0),a		; modify track
	ld	(dr1),a		; reading code
	jp	(hl)		; run output thread
	dq_check
	; remember that we fall through to the next step

	step	strm0a
	ld	a,$c1		; balance
	ld	a,$c1		; select drive 0 with wait states
	ld	(drw0),a	; modify track
	ld	(drw1),a	; reading code
	goto	stream		; continue on at "step stream"
	jp	(hl)		; run output thread
	dq_check

; Prepare to read a track from drive 1.

	step	stream1
	jp	$+3		; balance
	ld	a,$82		; balance
	ld	a,$82		; select drive 1 with no wait states
	ld	(dr0),a		; modify track
	ld	(dr1),a		; reading code
	jp	(hl)		; run output thread
	dq_check
	; remember that we fall through to the next step

	step	strm1a
	ld	a,$c2		; balance
	ld	a,$c2		; select drive 1 with wait states
	ld	(drw0),a	; modify track
	ld	(drw1),a	; reading code
	goto	stream		; continue on at "step stream"
	jp	(hl)		; run output thread
	dq_check

; Begin streaming data from the drive selected via self-modification.

	step	stream
	ld	a,$81
dr1	equ	$-1
	out	($f4),a		; select drive with no wait states
	ld	a,$08		; restore to track 0 without verify, 6 ms step
	out	($f0),a		; start the disk seek to track 0
	jp	$+3		; balance
	nop			; balance
	jp	(hl)		; run output thread (I won't say it again)
	dq_check

; Disk commands require a short amount of time to pass before the
; processor will get reliable status data from the controller.  This step
; does nothing but waste time and we'll be reusing it quite a bit.

	step	rest
	jp	$+3
	rept	10
	nop
	endm
	jp	(hl)
	dq_check

	reuse	rest,1		; Repeat that step, so 256 cycles pass

; Some of the most painful series of steps as we wait for the seek to
; track 0 to complete and check for any errors.  It'd be easier if we
; had more time to check multiple conditions at once but instead must
; break the process down into multiple steps.
	step	cw1
	in	a,($f0)		; get status of "restore" command
	exx			; switch to our working registers
	ld	b,a		; save status in B'
	exx
	and	1		; look at bit 0 of status
	baljp	nz,cont		; if bit 0 (busy) set then not done yet
	goto	cwechk		; restore complete, check for error next
	jp	(hl)
	dq_check
	tail	cont
	goto	cw2		; check for timeout
	jp	(hl)
	tail_check

	step	cwechk
	ld	a,0		; balance
	nop			; balance
	exx
	ld	a,b		; get the status we saved
	exx
	and	~($20|4|2)	; track 0, head loaded and one other bit
	call	nz,err		; are expected/don't care, otherwise error!
	goto	nxttrk		; All OK, then start reading the track
	jp	(hl)
	dq_check

	step	cw2
	exx
	bit	7,b		; Did the restore operation time out?
	call	nz,err		; Yes, too bad.
	ld	bc,$f3		; Set data port ahead of time (and balance!)
	exx
	nop			; balance
	goto	cw1		; No timeout, keep polling the status
	jp	(hl)
	dq_check

; At this point either a restore or a track seek has completed.
; Therefore we issue the track read command.

	step	nxttrk
	jp	$+3		; balance
	nop			; balance
	ld	a,$e8		; read track, no settle
	out	($f0),a
	ld	a,$80
	out	($e4),a		; allow NMI
	jp	(hl)
	dq_check

; Wait for the track read to start sending data.
; No checking for errors when waiting for data.  Bit hard to
; get it fitting without using AF' and it occurs to me that
; I can't really afford missing a second step.

	step	wtdat
	in	a,($f0)		; get status
	bit	1,a		; any data yet?
	baljp	nz,havdat	; yes, go read it
	ld	a,0		; balance
	nop			; balance
	goto	wtdat		; no, keep waiting
	jp	(hl)
	dq_check
	tail	havdat
	in	a,($f3)		; get data (balances 'ld a,0; nop')
	goto	sync
	jp	(hl)
	tail_check

; Since we're pulling data, we don't have to balance, just hit the minimum.

	step	sync
	ld	a,$c1		; drive select, with waits
drw0	equ	$-1
	out	($f4),a
	in	a,($f3)		; Waits until track data is ready
	or	a
	jr	z,synced	; if taken we lose 7, gain 10 so will be OK
	goto	sync		; No zero byte?  Keep looking for it.
synced:	jp	(hl)
	dq_check

; Now we have the track data to read and place into the ring buffer.
; Disk data is coming so fast we must process a byte every step.
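;
; As a rough hand tally, assuming the standard Z-80 T-state counts and
; no wait states, the byte-reading step that follows costs:
;
;	exx		 4
;	ld a,n		 7
;	out (n),a	11
;	ini		16
;	set 7,h		 8
;	exx		 4
;	jp (hl)		 4
;			--
;			54
;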
; Thus
; the cycle count for this step defines the disk thread quantum.  The faster
; the better to give more time for processing graphics and audio data in
; the output thread.

	step	dskbyt
	exx
	ld	a,$c1
drw1	equ	$-1
	out	($f4),a		; Keep telling the drive we want to wait.
	ini			; Put byte from disk into ring buffer
	set	7,h		; Keep HL in the ring (i.e., $8000 - $ffff)
	exx
	jp	(hl)
	dq_check		; doesn't have to be equal, but it is our basis

	reuse	dskbyt,tdatln-1	; repeat byte reads for the rest of the track

; We've read enough of the data.  Stop NMI and wait states.

	step	datend
	ld	a,(0)		; balance
	nop			; balance
	xor	a
	out	($e4),a		; turn off NMI
	ld	a,$81
dr0	equ	$-1
	out	($f4),a		; turn off wait states (prob. not needed)
	jp	(hl)
	dq_check

; The track read is still active so cancel it.

	step	cancmd
	jp	$+3		; balance
	jp	$+3		; balance
	nop			; balance
	nop			; balance
	nop			; balance
	ld	a,$d0
	out	($f0),a		; terminate any commands in progress
	jp	(hl)
	dq_check

; The disk controller requires quite a long time before it will be
; willing to accept new commands.

	reuse	rest,26		; long wait (wasting stack)

; Now some pretty straightforward code to increment the track and determine
; if we should seek to a new track or move on to the next diskette.

	step	trk1
	ld	a,(0)		; balance
	ld	a,(0)		; balance
	ld	a,0		; cute trick to save a few cycles
track	equ	$-1		; "track" is stored in the code.
	inc	a
	ld	(track),a	; track++ and, oops, out of time!
	jp	(hl)
	dq_check

	step	trk2
	jp	$+3		; balance
	ld	a,(track)
	cp	40		; at the end of the disk?
	baljp	c,trkok
	goto	dsknxt		; yes? move to the next drive
	jp	(hl)
	dq_check
	tail	trkok
	goto	seek		; no? head off to seek to next track
	jp	(hl)
	tail_check

; We've just finished reading an entire diskette.  Reset some variables
; and figure out which drive to start reading next (or stop).

	step	dsknxt
	ld	a,r		; balance
	xor	a
	ld	(track),a	; set track counter back to 0
	ld	a,0
drive	equ	$-1
	inc	a
	ld	(drive),a	; drive++ and, oops, out of time!
	jp	(hl)
	dq_check

	step	dn2
	jp	$+3		; balance
	ld	a,(drive)
	cp	numdsk		; Have we read all the diskettes?
	baljp	c,dskok
	goto	drain		; Yes?  We can rest easy.
	jp	(hl)
	dq_check
	tail	dskok
	goto	seldsk		; No, figure out which drive next
	jp	(hl)
	tail_check

	step	seldsk
	jp	$+3		; balance
	ld	a,(drive)	; disk number mod 2 is the drive to read
	and	1
	baljp	z,sel0
	goto	stream1		; == 1 then go read drive 1
	jp	(hl)
	dq_check
	tail	sel0
	goto	stream0		; == 0 then go read drive 0
	jp	(hl)
	tail_check

; Issue command to seek to the next track.  If you've read the rest then
; the drill should be well known by now.  Send command, wait, wait for
; ready and check for errors.
;
; In other words, the "restore" sequence was so commented that I'll not
; say too much here.

	step	seek
	nop			; balance
	nop			; balance
	ld	a,(track)
	out	($f3),a		; set track to seek to
	ld	a,$18		; seek, motor on, no verify
	out	($f0),a
	jp	(hl)
	dq_check

	reuse	rest,2		; short delay

	step	cs1
	in	a,($f0)
	exx
	ld	b,a
	exx
	and	1
	baljp	nz,cont1
	goto	csechk		; Ready?  Then go check for errors.
	jp	(hl)
	dq_check
	tail	cont1		; Not Ready?  Then go check for timeout.
	goto	cs2
	jp	(hl)
	tail_check

	step	csechk
	ld	a,0		; balance
	nop			; balance
	exx
	ld	a,b
	exx
	and	~$20		; Ignore "head loaded" bit.
	call	nz,err		; Anything else set is an error.
	goto	nxttrk		; Otherwise we're ready to read the track
	jp	(hl)
	dq_check

	step	cs2
	jp	$+3		; balance
	nop			; balance
	exx
	bit	7,b
	exx
	call	nz,err		; Error if command timed out.
	goto	cs1
	jp	(hl)
	dq_check

; We've read all the data required.  Nothing for us to do but use up
; our time quantum.  The output thread is responsible for deciding when
; all the data has been processed.

	step	drain
	jp	$+3		; balance
	jp	$+3		; balance
	jp	$+3		; balance
	jp	$+3		; balance
	goto	drain
	jp	(hl)
	dq_check

; Now a bit of bookkeeping.  A call to "endlit" to finish the last block
; of literal RLE data.  Then lay down a 0 control record to mark the
; end of the RLE data stream.  And do an assert to ensure we are not
; assembling code into the ring buffer.
; Which, actually, could be
; survivable but we unpack the data on each run to keep things simple.
; Besides, if an error occurs it uses "call" which will wipe out a bit
; of the "ret" stack.

	endlit			; finish last set of literals
tmp	defl	$
	org	stpbase
	dw	0		; emit 0 length to end run-length encoding
	assert	$ <= ring
	org	tmp

; -------------------- Output Thread --------------------------

; The output thread must first wait for the ring buffer to fill up so
; that it does not overrun the disk data.  Then it just displays frames
; until it has done "framcnt" and that's it.  The biggest requirement is
; sending an audio bit each step.  With approximately 128 cycles per step
; that gives an audio bit rate of 31250 Hz.  Given the quality of one bit
; audio there wouldn't be terrible harm in missing the odd step.  On the
; other hand, it is very easy to generate audio artifacts and a systematic
; skip could easily generate a nasty 25 Hz buzz.
;
; The main program initializes BC to the top left of the display area,
; DE to the start of the ring buffer and "frame" to "framcnt" and starts
; at "fillwt".
;
; The threading overhead isn't as complicated as for the disk thread.  We
; just "RET" to pass control to the disk thread and "LD HL,code" to set
; the next output thread step to execute.  The main complication comes
; in the unrolling of the code.  There isn't enough time to maintain
; loop counters.  Instead we make copies of each step and link them
; together by modifying "LD HL,nn" instructions.  By following the
; convention of putting "LD HL,nn; ret" as the last instructions in each
; step the unrolling code always knows where to place the links.
;
; The movie data format is organized into two types of 8 byte blocks.
; A graphics block has 7 graphics characters followed by one byte
; (8 samples) of audio.  An audio block is 8 bytes (64 samples)
; of audio.  A frame consists of 128 graphics blocks (16 rows of 8)
; followed by "audblk" audio blocks (as determined by the data converter).
; The graphics
; characters are arranged into a 56 x 16 array centered in 64 x 16 text
; mode for a resolution of 112 x 48.  Timing is driven by the program
; and works out to about 24.4 frames/second.
;
; A block size of 8 divides our ring buffer size.  And the blocks all
; end in an audio byte.  This means that the ring buffer pointer DE only
; needs to be forced into the ring when we read an audio byte.  And 8
; also divides 256 so when reading a graphics byte we only have to
; increment E and not DE saving 2 more cycles.

shwqnt	equ	128-dskqnt	; Output thread quantum

; Use this macro at the end of a step to ensure the time taken is
; exactly that allocated to output thread steps.

sq_check macro
quant	defl	t($)
	assert	quant == shwqnt
	endm

; When we write to video we must leave 4 cycles free as a video write
; in 64x16 mode may incur as many as 4 wait states.  Any output thread
; step that writes to the screen uses this macro to ensure the time
; taken is the output thread quantum less the 4 cycles.

vwq_check macro
quant	defl	t($)
	assert	quant == shwqnt - 4
	endm

; Wait until the ring buffer is mostly full ($6000 bytes, currently).

	sett	0
fillwt:
	ld	a,(0)		; balance
	jr	$+2		; balance
	exx
	ld	a,h		; peek at disk streaming pointer (HL')
	exx
	cp	high(ring + $6000)
	baljp	c,notyet
	ld	hl,audvid	; move on to audio/video output
	ret			; let the disk thread run
	sq_check
	tail	notyet		; not enough bytes, continue waiting
	ld	hl,fillwt	; unnecessary, but for balance
	ret			; let the disk thread run
	tail_check

; Code to read and display a graphics character.
; Now that we have finished waiting each step must output a sound sample.
; And these bits of code are copied to create unrolled code linked together
; by changing the address loaded by the "LD HL," instruction near the end.
; This is the most repeated step (56 * 16 == 896 times) so its size
; largely determines the size of the unrolled output thread code.
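;
; A rough size estimate for the unrolled buffer.  The byte counts are hand
; tallies that assume the usual Z-80 instruction encodings; they are not
; values reported by the assembler.
;
;	vidbyt	14 bytes x 896 (56 x 16 characters)	12544
;	audbyt	13 bytes x 112 (7 per row x 16 rows)	 1456
;	audlbc	16 bytes x  16 (1 per row x 16 rows)	  256
;							-----
;							14256 ($37b0)
;
; On those assumptions the unrolled code ends near $77b0, comfortably
; below "proglow" at $7b00.  The same counts suggest the frame rate:
; 896 + 112 + 16 = 1024 steps for the graphics blocks plus roughly 256
; for the audio blocks is about 1280 steps per frame, and 31250 steps
; per second / 1280 is about 24.4 frames per second.
;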
	sett	0
vidbyt:
	ex	af,af'		; get audio samples
	rrca			; move to next 1 bit sample
	out	($90),a		; output audio bit
	ex	af,af'		; save audio samples
	ld	a,(de)		; get graphics byte
	inc	e		; we know E is never == 255
	ld	(bc),a		; write graphics to screen
	inc	c		; also C is never == 255
	ret	z		; balance (C cannot be 0)
	ld	hl,0
	ret
	vwq_check
vidbyt_len equ	$ - vidbyt

; Code for reading the audio byte at the end of a graphics block.
; With no graphics to update we have plenty of time to do a fully
; general increment and modulus of the ring buffer pointer.

	sett	0
audbyt:
	ld	a,(de)		; get data byte
	inc	de		; move to next position in ring
	set	7,d		; keep DE in ring ($8000 - $ffff)
	out	($90),a		; output audio bit
	ex	af,af'		; save it in AF'
	ld	l,(hl)		; balance
	add	hl,hl		; balance
	ld	hl,0		; link to next step in unrolling
	ret			; switch to disk thread
	sq_check
audbyt_len equ	$ - audbyt

; The last byte in a block is always audio.  In a few cases special
; processing is required.  At the end of a line (after 8 graphics blocks)
; we must load BC with the start of the next line.  The very last
; audio byte of the audio blocks must load BC with the top of the
; screen.  And the last audio byte after the end of graphics data for
; the frame must load BC with loop counters so that the audio blocks
; can be output efficiently without having to unroll code for each
; audio bit.
;
; In short, sometimes we need to load a new audio byte and load BC
; with something.  This macro provides for that case.

m_audlbc macro	bcval,next
	ld	a,(de)
	inc	de
	set	7,d
	out	($90),a
	ex	af,af'
	bit	0,a		; balance
	ld	bc,bcval
	ld	hl,next
	ret
	endm

; Instantiate the BC loading variant of reading an audio byte for
; use by the unrolling code.

	sett	0
audlbc:
	m_audlbc 0,0
	sq_check
audlbc_len equ	$ - audlbc

; Instead of unrolling code for each audio bit in the audio blocks
; we go to the trouble of running a loop over 6 of the bits in each
; audio byte and over the audio bytes themselves.  B is used as the
; bit counter, C as the byte counter.
; The branches involved are
; always a bit painful to get right but the savings in unrolled code
; size are worth it.

; Output the first bit in an audio byte and load B with 6 so the
; next 6 bits of the byte are output with a loop.

audb7	macro
	sett	0
	ex	af,af'
	rrca
	out	($90),a
	ex	af,af'
	ld	b,6		; remaining bits
	ld	l,(hl)		; balance
	inc	hl		; balance
	add	hl,hl		; balance
	ld	hl,$+4
	ret
	sq_check
	endm

; Output an audio sample and loop on B.  If B != 0 then we repeat this
; step otherwise we move on to the next.

audblp	macro
	local	tobyt,me
	sett	0
me:
	ex	af,af'
	rrca
	out	($90),a
	ex	af,af'
	add	hl,hl		; balance
	inc	hl		; balance
	dec	b
	baljp	z,tobyt		; any more bits in this byte
	ld	hl,me		; yes?  Then repeat this step.
	ret
	sq_check
	tail	tobyt
	ld	hl,$+4		; no?  Move to the step just after us.
	ret
	tail_check
	endm

; Load an audio byte and loop on C for more bytes.  This step is used
; for audio bytes both in the middle of a block and at the end so it
; must do a fully general increment and modulus of the ring buffer
; pointer to guarantee it stays in the $8000 - $ffff range.

audonl	macro	back
	local	aodn
	sett	0
	ld	a,(de)
	inc	de
	set	7,d
	out	($90),a
	ex	af,af'
	dec	c
	baljp	z,aodn		; more audio bytes?
	nop			; balance
	ld	hl,back		; yes? loop back as directed
	ret
	sq_check
	tail	aodn
	nop			; balance
	ld	hl,$+4		; no? continue to the step after us
	ret
	tail_check
	endm

; Output of the last 7 bits of the last audio byte of the audio blocks
; is done specially so we can update the "frame" countdown and jump to
; the program end when it reaches 0.

; A few of the bits can be output simply.

audbit	macro
	sett	0
	ex	af,af'
	rrca
	out	($90),a
	scf			; balance (and helping audbf0)
	ret	nc		; balance
	ex	af,af'
	add	hl,hl		; balance
	add	hl,hl		; balance
	ld	hl,$+4
	ret
	sq_check
	endm

; For three of the bits we use the time to test the "frame" counter.
; As usual, this would be trivial if cycle balancing and budgets were
; not a factor.

; First step, load frame counter into BC and decrement it.
audbf0	macro
	sett	0
	ex	af,af'
	ret	nc		; balance (works because of scf in audbit)
	rrca
	out	($90),a
	ex	af,af'
	ld	bc,(frame)
	dec	bc
	ld	hl,$+4
	ret
	sq_check
	endm

; Second step.  Save the frame counter.

audbf1	macro
	sett	0
	ex	af,af'
	rrca
	out	($90),a
	ex	af,af'
	ld	(frame),bc
	add	hl,hl		; balance
	ld	hl,$+4
	ret
	sq_check
	endm

; Third step.  Test the frame counter for zero and exit the thread if so.

audbf3	macro
	sett	0
	ex	af,af'
	rrca
	out	($90),a
	ex	af,af'
	ld	a,b
	or	c
	jp	z,done		; We're done if frame count == 0
	ld	l,(hl)		; balance
	inc	hl		; balance
	ld	hl,$+4
	ret
	sq_check
	endm

; To process the audio blocks in a loop we need to calculate the number
; of audio bytes which is simply the number of blocks times 8 and is assigned
; to "extra".  I've left the original code in place, but you can see that
; the first two asserts can never fail and the calculation is rather
; convoluted.

extraT	equ	audblk*64
	assert	(extraT % 8) == 0
extra	equ	extraT / 8
	assert	extra % 8 == 0
	assert	extra > 1	; We insist on having at least 2 audio blocks

; The code to handle the audio blocks.  The unrolling code will figure out
; the number of audio bytes less 1 and load that into C register in the
; step before us.  This chunk of code handles looping over C bytes.  If
; there are more than 256 bytes then we unroll this loop as many times as
; necessary to handle the 256 bytes.  To be honest, I don't think I've
; tested that case.

aud_only:
	rept	(extra-1+255)/256
	local	loop
loop:
	audb7			; Handle first bit, loading B with 6
	audblp			; loop over the 6 bits using B
	audonl	loop		; loop over the bytes using C
	endm

; Now we have that very last audio byte of the frame where we check
; the frame count.

	rept	3
	audbit			; 3 easy bits.
	endm

	audbf0			; decrement "frame"
	audbf1			; save it
	audbf3			; if "frame" == 0 then we're done.

; Output last audio bit, load BC with the top of the screen and
; go back to the first step of a frame (the audvid unrolled code buffer).
	m_audlbc 15360+4,audvid

frame:	dw	0		; frame number countdown to detect end of movie

; -------------- Main Program Subroutines -------------------------

; Restore drive selected by D (e.g., $81 for drive 0, $82 for drive 1)
; to track 0.  The straightforward code here that commands the disk
; controller might help make sense of the Disk Thread which does pretty
; much the same things but in a much more convoluted style.

restore:
	ld	a,$d0
	out	($f0),a		; terminate any commands in progress
	ld	b,0
	djnz	$
	ld	a,d
	out	($f4),a		; select drive
	ld	a,$08		; restore, no verify, 6 ms step
	out	($f0),a
	call	stat		; get command status result.
	; Seems that '4' may be OK (TR00 indication)
	; As perhaps '2'
	; And head loaded ($20) is fine
	and	~($20|4|2)
	ret

; Wait for and return disk controller command result status.

stat:	ld	b,$12		; wait for disk controller
	djnz	$		; to be ready for answer
.wst	in	a,($f0)		; read disk command status
	bit	0,a
	ret	z		; return if not busy
	bit	7,a
	ret	nz		; return if not ready
	jr	.wst		; wait until not busy or not ready

; Non-maskable interrupt is vectored here.  Because we never read a track
; to completion it should never happen and is treated as an error condition.

nmi:	ld	a,'N'
	ld	(15360),a
	xor	a
	out	($e4),a		; turn off NMI
	in	a,($f0)
	or	a
	jp	err		; say where it happened

; Copy C bytes from HL to DE and put the resulting DE value into the
; copied block as a link to the next block.  In other words, a subroutine
; perfect for copying output thread steps into the unrolled buffer.

copy:	ld	b,0
	ldir			; copy audio/video program step
	push	de
	pop	ix
	ld	(ix-2),d	; link to next step following immediately
	ld	(ix-3),e
	ret

; Unroll 7 graphics byte output steps into the unroll buffer.
vid7:	ld	a,7
block:	ld	hl,vidbyt
	ld	c,vidbyt_len
	call	copy
	dec	a
	jr	nz,block
	ret

rowcnt:	defb	0
colcnt:	defb	0

; ------------------ Main Program -----------------------------

; Besides a minor bit of hardware setup, the main program must RLE
; uncompress the "ret" stack for the disk thread and unroll the
; audio and video output code for the output thread.

start:
	di
	im	1

; Choose memory map 1 which has 64K of RAM with keyboard and video
; mapped to the customary Model I and III locations.

	ld	a,1
	out	($84),a

; Switch out the Model 4P boot ROM.

	ld	a,0
	out	($9c),a

	ld	a,$48
	out	($ec),a		; fast CPU + wingdings

; Vector Non Maskable Interrupt to our own handler.  We should never get
; an NMI and shouldn't really ask for it, but it can act as a minor error
; check for the programmer.

	ld	a,$c3		; Z-80 "JP" instruction opcode.
	ld	($66),a
	ld	hl,nmi
	ld	($67),hl	; operand of the "JP": address of our handler

; We loop back here when the movie display is done.

done:
	di			; No interrupts, please!
	ld	sp,stack	; A little stack space while we setup

	ld	hl,15360
	ld	de,15360+1
	ld	bc,1024-1
	ld	(hl),128
	ldir			; Clears the screen.

; Wait for any key to be pressed

wk:	ld	a,($38ff)
	or	a
	jr	z,wk

; Unroll the output thread loop

	ld	hl,15360+4	; Start address of first line of graphics
	ld	de,64		; Offset to next line of text
	exx
	ld	de,audvid	; Output thread unroll buffer
	ld	a,16
row:	ld	(rowcnt),a
	ld	a,7
col:	ld	(colcnt),a
	call	vid7		; Unroll 7 steps for graphics bytes
	ld	hl,audbyt
	ld	c,audbyt_len
	call	copy		; Unroll audio byte step
	ld	a,(colcnt)
	dec	a
	jr	nz,col		; Do 7 of the 8 graphics blocks in a line

	call	vid7		; Graphics bytes for last block
	ld	hl,audlbc
	ld	c,audlbc_len
	call	copy		; Unroll special BC loading audio byte step
	exx
	add	hl,de		; next line address
	ld	(ix-5),h	; B: modify "LD BC," instruction to
	ld	(ix-6),l	; C: have BC get address of next line
	exx

	ld	a,(rowcnt)
	dec	a
	jr	nz,row		; Do all 16 rows.

; We don't need the address of the next line.  Instead we're linking to
; the code that outputs the audio bytes en-masse.
; The number of audio
; bytes was computed previously as "extra".  We set up C with that less
; one as the very last byte is unrolled in "aud_only" and handles
; frame count checking and moving back to the top of the screen.

	;ld	(ix-5),6		; B (not needed)
	ld	(ix-6),low(extra-1)	; C
	ld	(ix-2),high aud_only
	ld	(ix-3),low aud_only

; This is a debugging check that could be removed.  It guarantees that
; the unrolled output thread code does not run into the start of the
; program.

	ld	hl,proglow
	or	a
	sbc	hl,de
	call	c,err
;
	ld	d,$81
	call	restore		; Restore drive 0 to track 0
	call	nz,err

	ld	d,$82
	call	restore		; Restore drive 1 to track 0
	call	nz,err

; Unpack the RLE data of the "ret" stack for disk thread
; We're going to overwrite our stack so no subroutine calls
; from here on out, please.  Or PUSH!  POP could be OK.

	ld	hl,stack_init	; pointer to RLE data
stack_top equ	stack-stpnum
	ld	de,stack_top	; destination pointer is the "ret" stack
silp:	ld	c,(hl)
	inc	hl
	ld	b,(hl)
	inc	hl
	ld	a,b
	or	c
	jr	z,sidn		; RLE code 0 means end of data
	bit	7,b
	jr	nz,sirun	; High bit set means a run.
	ldir			; Otherwise, literal data, copy it
	jr	silp		; and keep uncompressing
sirun:	res	7,b		; Clear repeat bit to get count in BC
	ldi			; must have two bytes to copy
	ldi
	jp	po,silp		; only 2 bytes?  Then we're done.
	ld	(hlsv+1),hl	; save HL (note, PUSH is not safe right now)
	ld	hl,-2		; set up overlapping LDIR
	add	hl,de
	ldir			; to copy out the repeats
hlsv:	ld	hl,0		; restore our RLE data pointer
	jr	silp
sidn:

; BTW, that "jp po," has got to be one of my favorite bits of Z-80 programming.
; It is not often that you get to use the flags set by LDI!

; Wait for all the keys to be released.

wtup:	ld	a,($38ff)
	or	a
	jr	nz,wtup

	xor	a
	ld	(drive),a	; Disk thread to start on disk 0 in drive 0

	ld	hl,framcnt	; Number of frames in movie
	ld	(frame),hl	; for the output thread to count down.

; Initialize registers for output thread

	ld	bc,15360+4	; Top left of display for centering 56 wide.
	ld	de,ring		; Ring buffer read pointer
	ld	hl,fillwt	; first step in output thread
	xor	a
	ex	af,af'		; clear first audio output

; Initialize registers for disk thread

	exx
	ld	hl,ring		; Ring buffer write pointer
	exx

	goto	stream0		; first step in disk thread
	ret			; and send it off and running

; Called when an error occurs.  A jump would be fine but I do a call so
; that the place where the error happened can be reported.  But that
; diagnostic code isn't necessary for a demo.

err:	pop	hl
	ld	a,'!'
	ld	(15361),a
	jp	wk

stack_init:

	end	start