Update SPI.transfer(buf, cnt) to use the FIFO queue to speed things up.
Packing the data into 16-bit transfers instead of 8-bit ones speeds
things up further.
As this code is more complicated, it no longer makes sense to inline it,
so the functions were moved from the .h to the .cpp files.
The SPI1 and SPI2 transfer functions were updated as well for T3.5 and T3.6.
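
For illustration only, the 16-bit packing idea can be sketched on the host
as below. This is not the code in this commit: pack_frames and PackedFrames
are hypothetical names, and the sketch assumes MSB-first bit order, so the
first byte of each pair goes in the high half of the frame and byte order on
the wire is unchanged. An odd trailing byte is kept aside for a single
8-bit frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the SPI library: groups a byte buffer
// into 16-bit frames so that two bytes go out per frame.
struct PackedFrames {
    std::vector<uint16_t> words;  // full 16-bit frames, first byte in the high half
    bool has_tail = false;        // true when the byte count is odd
    uint8_t tail = 0;             // leftover byte to send as one 8-bit frame
};

PackedFrames pack_frames(const uint8_t *buf, size_t count) {
    PackedFrames out;
    out.words.reserve(count / 2);

    size_t i = 0;
    // Assumes MSB-first transfers: buf[i] must leave the shifter before buf[i + 1].
    for (; i + 1 < count; i += 2) {
        out.words.push_back(static_cast<uint16_t>((buf[i] << 8) | buf[i + 1]));
    }

    // An odd count leaves one byte that cannot be paired.
    if (i < count) {
        out.has_tail = true;
        out.tail = buf[i];
    }
    return out;
}
```

With this grouping a buffer of cnt bytes needs about cnt / 2 frames instead
of cnt, which is where the saving comes from once the frames are fed through
the FIFO.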