The DNK7_F5_8_Cluster is a complete, 5U rack-mount FPGA acceleration cluster. The standard configuration contains the following:
• Trenton TSB7053 Xeon processor card (other options available)
• 8 DNK7_F5PCIe Kintex-7 FPGA cards with 5 7K410T-1 FPGAs per card
• 1.5 TB SATA II hard drive
This system contains the maximum number of cost-effective FPGAs that can be reasonably integrated into a 5U chassis. Power and cooling are the constraining variables. High-performance data paths between FPGA boards enable data movement under algorithmic control that is wholly separate from the host processor, enabling FPGA-based acceleration of whole new classes of data-intensive algorithms.
In short, the DNK7_F5_8_Cluster is a massive number of large, low-cost FPGAs integrated with an excellent single/dual Xeon-based processor host.
A partial list of possible applications includes:
• Genomic search
• Financial analytics
  ➢ Low-latency analysis
  ➢ Derivative calculations
• Image processing
• Signal processing
• Scientific computing
• Video compression
• Encryption/decryption (cryptography)
1. The Processor Card - Intel Xeon
Central to the DNK7_F5_8_Cluster is the Trenton TSB7053 host processor card (other boards may be substituted). This single-board computer has an Intel Xeon processor clocked at 3.4 GHz. The processor card has 4 DIMM slots that can be stuffed with up to 32 GB of DDR3 RAM (8 GB maximum per slot). It also has two 10/100/1000BASE-T Ethernet ports and 4 USB 2.0 ports. The chassis can host up to 2 SATA drives. Power and cooling are provided for up to 8 DNK7_F5PCIe cards. Power is cabled to the FPGA cards separately rather than drawn from the motherboard, allowing us to exceed the 25 W PCIe slot limit. The power budget is TBD W per board. Note that this requires a lot of airflow, and the fans are noisy. Fully populated, the system is perhaps too noisy to be in close quarters with an engineer.
2. The DNK7_F5PCIe - 5 Xilinx Kintex-7 FPGAs
The DNK7_F5PCIe is a Xilinx Kintex-7 based FPGA board optimized for algorithmic acceleration applications requiring FPGAs with high-performance local memory. Data movement to/from the FPGA grid is accomplished via a fixed 4-lane, GEN1/GEN2 PCIe bridge. Each field Kintex-7 FPGA (FPGAs 1-4 in the block diagram) has five separate 256M x 16 DDR3 (4 Gb) memories. The Dataflow Manager FPGA (FPGA 0 in the block diagram) has six 256M x 16 DDR3 memories. The DNK7_F5_8_Cluster can host 8 of these cards.
3. Dedicated PCIe, 4-lane controller (GEN1 or GEN2)
We ship the DNK7_F5PCIe with a fixed, full-function, 4-lane master/target PCIe controller. The PCIe controller has four mastering DMA engines: 2 for transmit (board -> host) and 2 for receive (host -> board). Drivers with 'C' source for several operating systems are included at no cost.
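As a rough guide to what the 4-lane link can carry, the sketch below computes the theoretical per-direction bandwidth from the standard PCIe GEN1/GEN2 line rates (2.5 and 5.0 GT/s per lane, 8b/10b encoded). The function and constant names are illustrative and are not part of the shipped drivers. The results are an upper bound: packet overhead and DMA efficiency reduce what an application actually sees.

```python
# Theoretical PCIe bandwidth per direction (illustrative names, not a driver API).
LINE_RATE_MT_S = {"GEN1": 2500, "GEN2": 5000}  # mega-transfers/s per lane

def pcie_bandwidth_mb_s(gen: str, lanes: int = 4) -> int:
    """Return the peak bandwidth in MB/s for one direction of the link."""
    # 8b/10b encoding carries 8 data bits in every 10 line bits.
    data_mbit_s = LINE_RATE_MT_S[gen] * lanes * 8 // 10
    return data_mbit_s // 8  # bits -> bytes

for gen in ("GEN1", "GEN2"):
    print(f"{gen} x4: {pcie_bandwidth_mb_s(gen)} MB/s per direction")
```

With the card's fixed 4-lane interface this gives 1000 MB/s (GEN1) and 2000 MB/s (GEN2) in each direction before protocol overhead.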
4. Kintex-7 FPGAs from Xilinx - Performance and Low Power
Xilinx Kintex-7 28 nm FPGAs are utilized. We use one of the largest members of this cost-effective (read: CHEAP) family. The Kintex-7 FPGA family has an impressive price/performance ratio for hardware-in-the-loop accelerators, with excellent device power consumption properties. Operating frequency is approximately twice that of the previous low-cost Xilinx FPGA family, Spartan-6.
Features of Kintex-7 include efficient, dual-register 6-input look-up table (LUT) logic, 36 Kb block RAMs, and second generation DSP slices which contain 25 x 18 multipliers along with a 48-bit accumulator.
We use the 7K410T, one of the largest devices in this family, in the FFG900 and FFG676 packages. 100% of the FPGA resources are dedicated to your application. All FPGAs, excluding the PCIe controller, are configured via PCIe. The PCIe FPGA can be updated in the field.
5. Memory - DDR3
The availability of large amounts of local high-speed memory is pivotal to FPGA-based algorithmic acceleration applications. The DNK7_F5PCIe is optimized accordingly. Each of the four field FPGAs (FPGAs 1 through 4) has a total of five 4 Gb DDR3 memories. Each memory is 256M x 16 with separate data, address and control. Three of these DDR3 memories are connected to FPGA pins capable of 800 MHz (1600 Mb/s per data pin) and the remaining two are connected to FPGA pins capable of 400 MHz (800 Mb/s per data pin). The Xilinx Memory Interface Generator (MIG) works fine. The five memories can be used independently or grouped in any manner that best fits your application. The Dataflow Manager FPGA (FPGA 0) has a total of six 4 Gb DDR3 memories. Three of these memories are connected to FPGA pins capable of 800 MHz (1600 Mb/s per data pin) and three are connected to FPGA pins capable of 400 MHz (800 Mb/s per data pin).
As always, we provide examples and reference designs to help you with all of your memory interface issues. Please check with us to make sure that what we ship for no charge meets your requirements.
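The memory figures above can be turned into capacity and peak-bandwidth numbers with some simple arithmetic. The sketch below does this per FPGA, per card, and for a standard 8-card cluster. All names are illustrative. The bandwidth is the raw pin rate times the 16-bit width, so it is a ceiling that ignores refresh, bank conflicts, and controller efficiency.

```python
# DDR3 capacity and peak bandwidth from the figures in the text.
# Names are illustrative; bandwidth is a raw ceiling, not a measured value.
DEVICE_WIDTH_BITS = 16            # each memory is 256M x 16
DEVICE_DEPTH = 256 * 2**20        # 256M locations
FIELD_FPGAS_PER_CARD = 4
CARDS_PER_CLUSTER = 8

def summarize(rates_mbit_per_pin):
    """Return (capacity in GiB, peak bandwidth in GB/s) for one FPGA."""
    capacity_bits = len(rates_mbit_per_pin) * DEVICE_DEPTH * DEVICE_WIDTH_BITS
    peak_mbit_s = sum(rate * DEVICE_WIDTH_BITS for rate in rates_mbit_per_pin)
    return capacity_bits / 8 / 2**30, peak_mbit_s / 8 / 1000

field = summarize([1600, 1600, 1600, 800, 800])         # FPGAs 1 through 4
manager = summarize([1600, 1600, 1600, 800, 800, 800])  # FPGA 0
card_gib = FIELD_FPGAS_PER_CARD * field[0] + manager[0]
cluster_gib = CARDS_PER_CLUSTER * card_gib

print(f"field FPGA:       {field[0]} GiB, {field[1]} GB/s peak")
print(f"Dataflow Manager: {manager[0]} GiB, {manager[1]} GB/s peak")
print(f"per card: {card_gib} GiB, per cluster: {cluster_gib} GiB")
```

Under these assumptions each field FPGA sees 2.5 GiB at up to 12.8 GB/s, the Dataflow Manager FPGA sees 3 GiB at up to 14.4 GB/s, and a full cluster holds 104 GiB of FPGA-local DDR3.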
6. Power Consumption
The PCI Express specification limits slot power to 25 watts. The DNK7_F5PCIe is capable of consuming power significantly beyond that. In addition to the PCIe fingers, a separate connector adds a second path for power. This product is shipped with heat sinks adequate to dissipate TBD watts, but airflow is required in the chassis to remove the heat. Contact the factory if you require high-reliability, no-fan heatsinks.
7. Status LEDs, Debug
Although no specific testing was performed, sophisticated statistical finite element models and back-of-the-envelope calculations show the status LEDs to be numerous and bright enough to work as a flashlight. These LEDs are user controllable from the FPGAs, so they can be used as visual feedback in addition to emergency lighting. A JTAG connector provides an interface to ChipScope and other third-party debug tools.
List of available FPGAs for DNK7_F5PCIe in the cluster
• 5U rackmount chassis containing
  ➢ 1 Intel Xeon® E3-1275 processor card or better/similar
  ➢ 8 DNK7_F5PCIe FPGA cards, each with 5 of the largest Xilinx Kintex-7 FPGAs (7K410T)
    • PCIe 4-lane (GEN1/GEN2)
    • 40 FPGAs in total, 100% dedicated to application
  ➢ Other configurations with different CPU-to-FPGA ratios are available
  ➢ 2 bays for SATA-2 hard drives
• Processor card
  ➢ Intel Xeon® E3-1200 series or better/similar processors (Sandy Bridge, if available), 3.4 GHz
    • Quad-core, 8 MB shared L3 cache
    • 4 GB DDR3 memory (4 GB total)
    • Options up to 32 GB (32 GB max)
    • VGA with standard D-Sub connector
    • 10/100/1000BASE-T Ethernet (2 ports)
    • USB 2.0 (4 ports total)
      • 2 ports on front panel
      • 2 ports on back bracket
    • Supports virtually all Linux distributions
• DNK7_F5PCIe FPGA HPC acceleration card
  ➢ PCI Express (4-lane) FPGA-based algorithm acceleration peripheral with 5 Kintex-7 FPGAs
  ➢ 4 Xilinx Kintex-7 FPGAs: 7K410T-1 (FFG676)
  ➢ 1 Xilinx Kintex-7 FPGA: 7K410T-1 (FFG900)
  ➢ Fixed 4-lane PCIe interface and controller
    • PCIe GEN1/GEN2
    • Full mastering DMA
      • 2 transmit (host memory -> card)
      • 2 receive (card -> host memory)
  ➢ FPGA Kintex-7 7K410T-1 - 5 total user FPGAs
    • 508,400 flip-flops per FPGA
    • 254K 6-input LUTs per FPGA, each paired with two flip-flops
    • 1540 25x18 multipliers + 48-bit accumulator per FPGA
    • 1590 18 Kbit block RAMs (about 3.5 Mbytes) per FPGA (or 795 36 Kbit blocks)
    • Fully dual-ported
    • Each block RAM configurable as:
      ◆ 32K x 1, 16K x 2, 8K x 4, 4K x 9 (or 8),
      ◆ 2K x 18 (or 16), 1K x 36 (or 32), or 512 x 72 (or 64)
• 5 separate 256M x 16 DDR3 memories for each field FPGA
  ➢ 3 memories DDR3-1600
  ➢ 2 memories DDR3-800
  ➢ Each memory has separate address, data, and control
• 6 separate 256M x 16 DDR3 memories for Dataflow Manager FPGA
  ➢ 3 memories DDR3-1600
  ➢ 3 memories DDR3-800
  ➢ Each memory has separate address, data, and control
• Two independent low-skew global clock networks
  ➢ Distributed differentially and balanced
• Fast and painless FPGA configuration via PCIe
  ➢ On-board battery for AES bitstream encryption
• Full support for embedded logic analyzers via JTAG interface
  ➢ ChipScope and other third-party debug solutions
• FPGA-controlled LEDs
  ➢ Enough light to use as an LED-based flashlight
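To get a feel for the scale of the standard configuration, the per-FPGA counts in the list above can be multiplied out across 8 cards of 5 FPGAs each. The sketch below is simple arithmetic on those listed figures; the dictionary keys and function name are illustrative only.

```python
# Aggregate user-FPGA resources for the standard 8-card cluster.
# Per-FPGA counts are the ones listed above; names are illustrative.
FPGAS_PER_CARD = 5
CARDS = 8
PER_FPGA = {
    "flip-flops": 508_400,
    "25x18 multipliers": 1_540,
    "18 Kbit block RAMs": 1_590,
}

def cluster_totals(cards=CARDS, fpgas_per_card=FPGAS_PER_CARD):
    """Return (number of FPGAs, dict of resource totals across the cluster)."""
    n = cards * fpgas_per_card
    return n, {name: count * n for name, count in PER_FPGA.items()}

n_fpgas, totals = cluster_totals()
print(f"{n_fpgas} FPGAs")
for name, total in totals.items():
    print(f"{name}: {total:,}")
```

For other CPU-to-FPGA ratios, pass a different card count, for example `cluster_totals(cards=4)`.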