The buffer transfer now uses the FIFO queues. Currently it outputs only 8 bits per entry, unlike the T3.x version, which packed the bytes into words to gain some extra speed.
http://www.pjrc.com/teensy/td_libs_SPI.html