We just ran a first successful test of the closed-loop latencies achievable on a regular PC when interfacing to existing Intan headstages via our new PCIe acquisition system prototype. The system is based on a Kintex 7 evaluation board that connects to the headstages through an interface board designed by Jon Newman and Jakob Voigts, using a port of the Intan firmware written by Aaron Cuevas Lopez. Goncalo Lopes and Adam Kampff are working on doing the same thing for the Bonsai software package.
Currently, the Open Ephys system supports up to 512 channels via USB 3.0, but pushing that limit higher requires higher-throughput interfaces such as Ethernet or direct PCIe access to the host PC's memory.
Beyond the maximum channel count, many interesting experiments will require closed-loop systems that can react to neural activity on fast timescales. Current USB interfaces lead to latencies of ~5-10ms, which is good enough for many behavioural experiments but not enough to intervene right after or during spike decoding. Moving to streamlined Ethernet interfaces can push the latency down to ~1ms, but requires careful management of network traffic. By moving directly to PCIe, where the FPGA that controls the headstages can write data directly to the host PC's memory, we cut out almost all of the intervening protocols that otherwise add latency.
As a first test, we sent pulses to the headstage with one of our tester boards and wrote a simple feedback module in C++, integrated into the PCIe interface plugin in the Open Ephys GUI, that sends a response output whenever the input voltage at the headstage is over a threshold. We ran these tests on a stock Ubuntu 16.04 system with an Intel Core i7-3770K @ 3.5GHz.
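To make the feedback logic concrete, here is a minimal sketch of a threshold comparison of this kind. It is not the actual plugin code: the function name `runThresholdFeedback`, the `setOutput` callback, and the assumption that samples arrive as a plain buffer of floats are all hypothetical stand-ins for whatever the PCIe interface plugin really uses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the Open Ephys plugin code.
// Walks a buffer of input samples and calls setOutput(true) for every sample
// above the threshold and setOutput(false) otherwise.
// Returns the number of samples that were above the threshold.
template <typename SetOutput>
std::size_t runThresholdFeedback(const float* samples,
                                 std::size_t count,
                                 float threshold,
                                 SetOutput setOutput)
{
    assert(samples != nullptr || count == 0);

    std::size_t samplesOverThreshold = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const bool over = samples[i] > threshold;
        setOutput(over);  // stand-in for driving the real response output
        if (over) {
            ++samplesOverThreshold;
        }
    }
    return samplesOverThreshold;
}
```

In a real closed-loop setup the callback would drive the response output as soon as each new sample lands in host memory; here it is left as a parameter so the comparison itself can be tested in isolation.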
The delay between the input and the response had a mean of 69us and a maximum of 85us. A single sample period at 30kHz is already 33us, so we're fairly close to the theoretical minimum here. This system should scale to over 1000 channels without an increase in this latency.
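The comparison with the sample period can be worked out with a couple of lines of arithmetic. The helper names below (`samplePeriodUs`, `latencyInSamples`) are ours, purely for illustration:

```cpp
#include <cassert>

// Duration of one sample, in microseconds, at the given sample rate.
inline double samplePeriodUs(double sampleRateHz)
{
    assert(sampleRateHz > 0.0);
    return 1.0e6 / sampleRateHz;
}

// Express a latency as a number of sample periods at the given sample rate.
inline double latencyInSamples(double latencyUs, double sampleRateHz)
{
    return latencyUs / samplePeriodUs(sampleRateHz);
}
```

At 30kHz the sample period comes out at about 33.3us, so the measured mean of 69us corresponds to about 2.07 sample periods and the maximum of 85us to 2.55.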
As far as we're aware, the PCIe hardware is the fastest possible solution for closed-loop experiments on standard PCs that is still accessible to user-space programs. Anything beyond this requires moving the processing to the FPGA or to specialized buses. There is still room for improvement by switching to a real-time OS, as RTXI has demonstrated.
In sum, the approximate minimum achievable round-trip latencies for interventions by a program running on a PC are, by interface:
USB: ~5-10ms
Ethernet: ~1ms
PCIe: ~100us on a standard OS (demonstrated here); <10us is possible on an RTOS (see RTXI)