Abstract: Blue Gene is a massively parallel computer being developed at the IBM Thomas J. Watson Research Center. It represents a hundred-fold improvement in performance over the fastest supercomputers of today, and will achieve 1 petaFLOP/s through unprecedented levels of parallelism, in excess of 4,000,000 threads of execution. The Blue Gene project has two important goals: to advance the understanding of biologically important processes, and to advance knowledge of cellular architectures (massively parallel systems built of single-chip cells that integrate processors, memory and communication) and of the software needed to exploit them effectively. This massively parallel system of 65,536 nodes is based on a new architecture that exploits system-on-a-chip technology to deliver a target peak processing power of 360 teraFLOPS (trillion floating-point operations per second). The machine is scheduled to be operational in the 2004-2005 time frame, at price/performance and power consumption/performance targets unobtainable with conventional architectures.
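As a rough sanity check on the figures above, the target peak can be divided evenly across the nodes to see what each node would have to contribute. The sketch below uses only the two numbers quoted in the abstract (65,536 nodes and 360 teraFLOPS); the variable names are illustrative, and the even split is an assumption, since the abstract does not say how the peak figure was derived.

```python
# Figures quoted in the abstract.
NODES = 65_536                 # number of nodes in the system
TARGET_PEAK_FLOPS = 360e12     # 360 teraFLOPS target peak

# Assumption: the peak is spread evenly over all nodes.
per_node_flops = TARGET_PEAK_FLOPS / NODES
per_node_gflops = per_node_flops / 1e9

print(f"Implied peak per node: {per_node_gflops:.2f} GFLOPS")
# prints "Implied peak per node: 5.49 GFLOPS"
```

The result, roughly 5.5 GFLOPS per node, shows that the design relies on the number of nodes rather than on the speed of any single node.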
The Blue Gene/L supercomputer was unique in the following aspects:
- Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve a better performance-to-energy ratio for applications that could use larger numbers of nodes.
- Dual processors per node with two working modes: co-processor mode, where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code, but the processors share both the computation and the communication load.
- System-on-a-chip design. All node components were embedded on one chip, with the exception of 512 MB of external DRAM.
- A large number of nodes (scalable in increments of 1024 up to at least 65,536).
- Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management.
- Lightweight OS per node for minimum system overhead (system noise).
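The three-dimensional torus interconnect listed above can be illustrated with a short sketch. In a 3D torus each node has six nearest neighbors (one in each direction along each axis), and the links wrap around at the edges. The function names and the 64 x 32 x 32 shape below are assumptions chosen for illustration: the shape is simply one factorization of 65,536 nodes, and the text does not state the actual dimensions or routing scheme.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

def torus_neighbors(coord: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbors of `coord` in a 3D torus of shape `dims`."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo implements the wraparound link at the torus edge.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimum hop count between `a` and `b`, going the shorter way around each ring."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

# Hypothetical shape: 64 * 32 * 32 = 65,536 nodes (an assumption, not from the text).
DIMS: Coord = (64, 32, 32)

print(torus_neighbors((0, 0, 0), DIMS))
print(torus_hops((0, 0, 0), (63, 31, 31), DIMS))
```

Because of the wraparound links, the node at the opposite corner of the grid, (63, 31, 31), is only three hops from (0, 0, 0), and under this assumed shape the farthest any two nodes can be apart is 32 + 16 + 16 = 64 hops.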