If it works, it works; I just want to know where these cycle counts and requirements come from.

For BUSY (that is, the delay after writing DATA), I looked at Nuked-OPM and jt51, and they both agree that BUSY lasts 32 YM2151 cycles. The YM2151 application manual says the master clock is halved internally, so the chip effectively runs at about 1.79 MHz.
If you allow up to 1 extra cycle for the YM2151 to poll its async address/data interface, that makes 33 cycles, which is about 147.5 CPU clocks at 8 MHz, so that matches the "wait 150 cycles" advice.
For the delay after writing ADDRESS, I don't have hard data. But if I assume it takes one cycle to set the address, plus up to one cycle to poll the interface (as with the DATA write), 2 cycles is about 8.94 CPU clocks, which matches our documentation recommending a 10-cycle delay.
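
For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 3.579545 MHz master clock is my assumption (it is what gives 1.79 MHz after halving), the 1-cycle ADDRESS figure is the guess described above and not measured data, and the function name is made up for illustration.

```python
import math

# Assumption: 3.579545 MHz master clock, halved internally per the
# application manual, giving the ~1.79 MHz internal rate quoted above.
YM_MASTER_HZ = 3_579_545
YM_INTERNAL_HZ = YM_MASTER_HZ / 2
CPU_HZ = 8_000_000  # 8 MHz CPU clock


def ym_cycles_to_cpu_clocks(ym_cycles):
    """Convert a number of internal YM2151 cycles to CPU clocks."""
    return ym_cycles * CPU_HZ / YM_INTERNAL_HZ


# BUSY: 32 cycles (per Nuked-OPM and jt51) plus up to 1 cycle of polling slack.
busy_clocks = ym_cycles_to_cpu_clocks(32 + 1)

# ADDRESS: assumed 1 cycle to set the address plus up to 1 cycle of polling slack.
addr_clocks = ym_cycles_to_cpu_clocks(1 + 1)

print(f"after DATA write:    {busy_clocks:.1f} CPU clocks (min {math.ceil(busy_clocks)})")
print(f"after ADDRESS write: {addr_clocks:.2f} CPU clocks (min {math.ceil(addr_clocks)})")
```

Both results land just under the documented 150-cycle and 10-cycle delays, which is at least consistent with those numbers being rounded-up versions of this calculation.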