Thursday 8 May 2008

Nice Threads

Last month I saw this post on codinghorror.  I wanted to wait and see what sort of discussion it kicked off among developers before picking it up, and it has gathered a fair trail of comments since.

Firstly, let me just say I think it is good (and long overdue) to see software engineers getting this interested in hardware.  Abstract all you want, but the systems we build all eventually run on hardware.  A fundamental understanding of IO, of how computers store and retrieve data, perform logical operations and access memory will make the difference between good software and great software.

Secondly, I'm not sure I agree with the statement "dual-core CPUs protect you from badly written software", but I think the sentiment is right if you qualify it with "under certain failure conditions".  There is a central topic this post skirts around without really nailing, and that is how vital resource management is to a system.

What we're talking about is good threading behavior.  I was hoping to see a lot more discussion about limiting the lifetime of threads, dynamically managing thread count according to capacity, creating affinity (making threads sticky to certain cores) and assumptions about the environment (how much of a CPU or core can you safely say is yours?).  This kind of discussion usually spills over into good memory and disk management, which is excellent: these are the two other key physical constraints your systems are bound by.
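As a rough illustration of sizing threads to capacity and bounding their lifetime, here is a minimal Python sketch. It assumes the standard-library `concurrent.futures.ThreadPoolExecutor`; the helper names and the `share` parameter (the fraction of the machine you assume is yours) are hypothetical, chosen only for this example. Note that in CPython the GIL stops CPU-bound threads running in parallel, so for CPU-heavy work the same sizing logic would usually feed a `ProcessPoolExecutor` instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def available_cores() -> int:
    """Count the cores this process may actually run on.

    On Linux, sched_getaffinity reflects any affinity mask already applied
    (e.g. by taskset or a container); elsewhere fall back to the machine count.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def worker_count(share: float = 0.5) -> int:
    """Size a pool from the fraction of cores we assume is ours.

    `share` is a hypothetical tuning knob: 0.5 means "assume we may use half
    the cores", leaving the rest for other tenants. Never returns less than 1.
    """
    if not 0.0 < share <= 1.0:
        raise ValueError("share must be in (0, 1]")
    return max(1, int(available_cores() * share))


def run_bounded(tasks, share: float = 0.5):
    """Run zero-argument callables on a pool sized by worker_count().

    The `with` block bounds thread lifetime: on exit the executor shuts down
    and joins every worker, so no thread outlives the batch. Results come
    back in submission order.
    """
    with ThreadPoolExecutor(max_workers=worker_count(share)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]
```

For example, `run_bounded([lambda i=i: i * i for i in range(8)])` squares eight numbers on however many workers the current share of the machine allows. Re-evaluating `worker_count()` per batch, rather than once at startup, is one simple way to let the thread count follow capacity as it changes.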

As a final point, I think it's worth noting the difference between a core and a CPU.  Most people will tell you there is none, but there is one key difference: pins.  In most architectures a core is essentially a complete CPU, with all the right components present and correct (control unit, ALU, FPU, registers, etc.), but a core always shares pins (the little copper legs that connect the CPU to the motherboard) with all the other cores on that die.  Those pins are how data gets on and off the die, so any time a core reaches out to memory, non-integrated L2 cache, disk, network and so on, it does so in contention with all the other cores resident on the same die.  This is subtle but critical once the bus becomes a bottleneck, and you may need to think harder about affinity when your application reaches that scale.
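If you do end up thinking about affinity, Linux exposes it through the `sched_setaffinity` system call, which Python wraps as `os.sched_setaffinity` (not available on macOS or Windows). The sketch below assumes Linux; `pin_caller` is a hypothetical helper name. Per the Linux man page, a pid of 0 means the calling thread, and threads created afterwards inherit its mask, so calling this at the start of a worker makes that worker sticky to the chosen cores.

```python
import os


def pin_caller(cores):
    """Restrict the caller to the given core IDs (Linux only).

    With pid 0, Linux applies the mask to the calling thread. Returns the
    mask actually in effect afterwards, or None where the call is unsupported.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, set(cores))
    return os.sched_getaffinity(0)
```

Pinning only helps if the cores you choose are ones the process is already allowed to use, so a reasonable pattern is to pick targets from `os.sched_getaffinity(0)` rather than hard-coding core numbers. It also won't remove the shared-pin contention described above; it just makes which threads contend with which more predictable.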
