Psykorgasm wrote:Makes sense.
All right, I'll go
real slow.
Rockstar: makers of the GTA series, and great devs.
The term 'heads down, bums up' implies lots of work.
Japanese kernel developers are very eccentric folk, who by nature consider their lives one with the kernel. They are brutally good at their work, and are behind some of the great performances of Japanese games on the PS2, next to, say, EA.
'Developers' is plural, meaning more than one. 'Developers' is used because the ability to import a Japanese kernel dev depends largely on supply, and having supply implies more than one. (!) You also don't get to be a kernel developer by being lazy, inefficient, or sloppy. (!)
http://www.ddj.com/dept/64bit/197801624?pgno=1 is a link to Dr. Dobb's, where they explain some of the pitfalls of Cell programming, and exactly how much can be gained through optimisation.
Consider this bit:
The PPE is a 64-bit processor with a PowerPC instruction set, 64 KB of L1 cache memory, and 512K L2. Like Intel's HyperThreading, it supports simultaneous multithreading, but is remarkably simpler than Pentiums or Opterons.
SPEs are different. They have 128-bit registers and SIMD (single instruction, multiple data) instructions that can simultaneously process the four 32-bit words inside each register. Plus, there are so many registers (128) that you can unroll loops many times before running out of them. This is ideal for dataflow-based applications.
But the most radical peculiarity for programmers is that SPEs have no cache memory. Rather, they have a 256-KB-scratchpad memory called "local store" (LS). This makes SPEs small and efficient because caches cost silicon area and electrical power. Still, it complicates things for programmers. All the variables you declare are allocated in the LS and must fit there. Larger data structures in main memory can be accessed one block at a time; it is your responsibility to load/store blocks from/to main memory via explicit DMA transfers. You have to design your algorithms to operate on a small block of data at a time, fitting in the LS. When they are finished with a block, they commit the results to main memory, and fetch the next block. In a way, this feels like the old DOS days, when everything had to fit in the (in)famous 640 KB. On the other hand, an SPE's local storage (256 KB) is so much larger than most L1 data caches (a Xeon has just 32 KB). This is one of the reasons why a single SPE outperforms the highest-clocked Pentium Xeon core by a factor of three on many benchmarks.
Bolded are some points that should make you sit up a bit straighter.
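To make the local store point from the quote a bit more concrete, here's a rough sketch of the "one block at a time" pattern. This is plain Python simulating the idea, not Cell SDK code: `dma_get`, `dma_put`, and `process_in_blocks` are names I made up, the "DMA" is just a list copy, and the size check ignores the code and stack that also have to share a real local store.

```python
LOCAL_STORE_BYTES = 256 * 1024   # SPE local store size from the quoted article
WORD_BYTES = 4                   # assuming 32-bit words


def dma_get(main_memory, start, count):
    """Hypothetical stand-in for a DMA read: copy a block out of 'main memory'."""
    return main_memory[start:start + count]


def dma_put(main_memory, start, block):
    """Hypothetical stand-in for a DMA write: copy a finished block back."""
    main_memory[start:start + len(block)] = block


def process_in_blocks(main_memory, kernel, block_words):
    """Apply kernel to every element in place, one local-store-sized block at a time."""
    # The input block and the result block both sit in the local store at once.
    if 2 * block_words * WORD_BYTES > LOCAL_STORE_BYTES:
        raise ValueError("block does not fit in the local store")
    for start in range(0, len(main_memory), block_words):
        local = dma_get(main_memory, start, block_words)  # fetch the next block
        result = [kernel(x) for x in local]                # compute on local data only
        dma_put(main_memory, start, result)                # commit results to main memory


data = list(range(10))
process_in_blocks(data, lambda x: x * 2, block_words=4)
print(data)  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

The point is that the loop body never touches `main_memory` directly; everything it computes on has been copied in first, which is the discipline the article says the SPE forces on you.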
Or, if you want a simpler read:
Thanks to nine processors on a single silicon die, the Cell Broadband Engine—a processor jointly designed by IBM, Sony, and Toshiba and used in the PlayStation 3—promises lots of power. The good news is that the Cell is really fast: It provides enough computational power to replace a small high-performance cluster. The bad news is that it's difficult to program: Software that exploits the Cell's potential requires a development effort significantly greater than traditional platforms. If you expect to port your application efficiently to the Cell via recompilation or threads, think again.
In this article, we present strategies we've used to make a Breadth-First Search on graphs as fast as possible on the Cell, reaching a performance that's 22 times higher than Intel's Woodcrest, comparable to a 256-processor BlueGene/L supercomputer—and all this with just with a single Cell processor! Some techniques (loop unrolling, function inlining, SIMDization) are familiar; others (bulk synchronous parallelization, DMA traffic scheduling, overlapping of computation and transfers) are less so.
I don't know about you, but being able to match a 256-processor supercomputer in -any- benchmark is quite impressive. The fact that it's actually something useful makes it kick arse. A Japanese kernel developer should be able to construct a kernel that automates a lot of these tasks, thus reducing your production costs greatly for the next few years.
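For what "overlapping of computation and transfers" from the quote looks like in structure, here's a minimal double-buffering sketch. Again this is plain Python, not Cell code: `fetch_block` and `sum_double_buffered` are made-up names, a one-worker thread pool stands in for the DMA engine, and in CPython you won't see a real speedup from this. It only shows the shape: ask for block i+1, then work on block i while that request is in flight.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_block(main_memory, start, count):
    """Hypothetical stand-in for an asynchronous DMA read of one block."""
    return main_memory[start:start + count]


def sum_double_buffered(main_memory, block_words):
    """Sum all elements, prefetching the next block while computing on the current one."""
    starts = list(range(0, len(main_memory), block_words))
    if not starts:
        return 0
    total = 0
    # One worker thread plays the role of the DMA engine.
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(fetch_block, main_memory, starts[0], block_words)
        for i in range(len(starts)):
            current = pending.result()  # wait until block i has arrived
            if i + 1 < len(starts):
                # Kick off the transfer of block i+1 before touching block i.
                pending = dma.submit(fetch_block, main_memory, starts[i + 1], block_words)
            total += sum(current)       # compute while the next fetch is in flight
    return total


print(sum_double_buffered(list(range(100)), 16))  # 4950
```

This is the sort of bookkeeping a good runtime or kernel layer could hide from the game programmer, which is the whole argument above.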
An example of good Cell code: the F@H client made by Sony, which doubles F@H's computing power with only a handful of PS3s actually bothering to run it.
C:Enter:£££ wrote:Shit happens when only 1 person can do it.
The post would be quite insightful, if only it weren't drastically out of context and lacking a complete understanding of what's being said.
C:Enter:£££ wrote:Yeh, the one that is really expensive to program for.
Yeah, that's why, if you're on a budget, you license Unreal Engine 3.

±MC Chedda'± wrote:Almost impossible aswell. All this for "realtime weapon change?!" and "giant enemy crabs from Japanese history".
http://www.terrasoftsolutions.com/news/ ... -10b.shtml
Right... 'impossible'...
Edit: Urgh... don't type up two posts at once... -_-