Experiment with treating the whole buffer transfer as one continuous
transfer, as if CS were logically held for its full duration. This
removes the gaps between successive 16-bit transfers.
Update SPI.transfer(buf, cnt) to use the FIFO queue to speed things up.
Packing the data into 16-bit transfers instead of 8-bit ones speeds
things up further.
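
For illustration only, a minimal sketch of the byte-to-16-bit packing
idea. This is not the code in this commit: PackedFrames and pack_bytes
are hypothetical names, and the sketch assumes MSB-first ordering with
an odd trailing byte sent as a single 8-bit frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the library's actual code: packs a byte buffer
// into 16-bit frames, first byte in the high half (MSB-first on the wire).
struct PackedFrames {
    std::vector<uint16_t> words;  // full 16-bit frames
    bool has_tail = false;        // true when the byte count is odd
    uint8_t tail = 0;             // leftover byte, to be sent as one 8-bit frame
};

PackedFrames pack_bytes(const uint8_t *buf, size_t count) {
    assert(buf != nullptr || count == 0);
    PackedFrames out;
    out.words.reserve(count / 2);
    size_t i = 0;
    // Combine each pair of bytes into one 16-bit frame.
    for (; i + 1 < count; i += 2) {
        out.words.push_back(static_cast<uint16_t>((buf[i] << 8) | buf[i + 1]));
    }
    // An odd count leaves one byte that cannot be paired.
    if (i < count) {
        out.has_tail = true;
        out.tail = buf[i];
    }
    return out;
}
```

Halving the number of frames halves the number of FIFO pushes and
inter-frame gaps, which is where the speedup is expected to come from.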
As this code is more complicated, it no longer makes sense to inline
it, so the functions were moved from the .h file to the .cpp file.
The SPI1 and SPI2 transfer functions were updated as well for T3.5 and
T3.6.
These changes, plus some changes in core (added register defines and a
logical SPCR1 register), allowed me to do a quick hack to the
serialFlash library and to test and initialize flash memory using a
T3.5 test board connected to a prop shield using pins 0, 1, 20 and 6.