
SMPLOCKS

At any given point, the CPU can be thought of as being at a 'level'. The level is used to prevent another CPU or I/O device from interfering with what a CPU is doing. These are referred to in the source as smplock levels. Here is a list of the levels that the kernel uses (in order from least interruptible to most interruptible):

high priority software smp lock levels
hardware interrupt smp lock levels
low priority software smp lock levels
softint level
standard kernel ast mode
normal kernel mode
user ast mode (express)
user ast mode (standard)
normal user mode - this is where applications generally execute their code

Notes about the levels:

If a CPU is above the softint level, it cannot be interrupted by the scheduler, so it will keep executing on the same stack.
If the CPU is at softint level, it will not be interrupted by the scheduler, but it can voluntarily yield to the scheduler by entering a wait state (for example, to wait for an I/O or a pagefault).
If the CPU is below softint level, it is subject to interruption by the scheduler at any point (to execute a higher priority thread).
The CPU must be in one of the kernel levels to acquire a smplock.
When a CPU acquires a smplock, that moves it to the level of the smplock. When it releases the smplock, it returns to its previous level.
A CPU cannot acquire a smplock that is at a lower level than what the CPU is currently at. This is how the system prevents a deadlock.
A CPU can re-acquire a smplock it already has only if it is currently already at that exact level, again to avoid deadlock. It cannot acquire a different smplock that is at its same level. For example, each thread has a state smplock, and a CPU cannot acquire more than one thread state smplock at a time (to avoid the A-then-B / B-then-A deadlock).
Acquiring any of the high-priority smplocks inhibits all interrupt delivery.
Acquiring an interrupt level smplock inhibits delivery of that interrupt and all lower priority interrupts. This is done to eliminate any possibility of deadlocking (like if CPU1 got IRQ0 then IRQ1 whilst simultaneously CPU2 got IRQ1 then IRQ0, that would cause a deadlock).
An interrupt causes the CPU to acquire the corresponding smplock, thus inhibiting itself and all lower priority interrupts.

I wrote my smplock acquire/release routines to enforce the above rules so I can't do something stupid. If I try to break a rule, the acquire/release routine crashes on the spot, even in the uniprocessor version.
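As a rough illustration of the rules above, here is a minimal sketch in C of a level-checked acquire/release pair. It is not the kernel's actual code: the types `cpu_t` and `smplock_t`, the function names, and the convention that level 0 means "no smplock held" are all assumptions made for the example, and the spin-wait a real multiprocessor build would need is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
typedef struct {
    uint8_t  level;   /* fixed level of this smplock */
    int      owner;   /* id of the owning CPU, or -1 when free */
    unsigned depth;   /* how many times the owner has acquired it */
} smplock_t;

typedef struct {
    int     id;
    uint8_t level;    /* current smplock level; 0 = no smplock held (assumption) */
} cpu_t;

/* True when the level rules allow 'cpu' to acquire 'lk'. */
static bool smplock_may_acquire(const cpu_t *cpu, const smplock_t *lk)
{
    if (lk->level > cpu->level)
        return true;                      /* moving up to a higher level */
    if (lk->level == cpu->level && lk->owner == cpu->id)
        return true;                      /* re-acquiring the same lock at its exact level */
    return false;                         /* lower level, or a different lock at the same level */
}

/* Acquire 'lk'; returns the previous level, to be handed back on release.
   Crashes on the spot if a rule is broken. */
static uint8_t smplock_acquire(cpu_t *cpu, smplock_t *lk)
{
    if (!smplock_may_acquire(cpu, lk)) {
        fprintf(stderr, "cpu %d at level %02X cannot acquire level %02X\n",
                cpu->id, (unsigned) cpu->level, (unsigned) lk->level);
        abort();
    }
    /* An SMP build would spin here until the lock is free; omitted in this sketch. */
    uint8_t previous = cpu->level;
    lk->owner = cpu->id;
    lk->depth++;
    cpu->level = lk->level;
    return previous;
}

/* Release 'lk' and return the CPU to the level it had before the acquire. */
static void smplock_release(cpu_t *cpu, smplock_t *lk, uint8_t previous)
{
    if (lk->owner != cpu->id || cpu->level != lk->level || previous > lk->level)
        abort();                          /* releasing out of order or a lock not held */
    if (--lk->depth == 0)
        lk->owner = -1;
    cpu->level = previous;
}
```

Returning the previous level from the acquire call keeps the bookkeeping on the caller's stack, so nested acquires unwind naturally in reverse order.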

Theoretically, one could simply use one smplock for everything. Whenever you enter kernel mode, acquire the smplock and release it before returning to user mode. The problem with that is that there will be lots of unnecessary contention for the smplock. For example, the thread executing on CPU-1 might be doing some TCP/IP stuff while the thread on CPU-2 might be doing some disk I/O. The two touch unrelated data, yet each would have to wait for the other. This would lead to disastrous performance.

So now I needed to determine just how many smplocks were enough. Too few smplocks will work; it is just bad for performance. So is there such a thing as too many smplocks? Well, I think there is. I know I have too many smplocks when either of these conditions holds:

If there is a case in the code where I must acquire smplock A then smplock B, and another place where I must acquire smplock B then smplock A, the two must be combined or it would deadlock.
Whenever I acquire smplock B, I always already have smplock A. This means I can eliminate smplock B, as I will never wait for it: any other CPU that wants B is already being blocked by smplock A.
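The second condition can be checked mechanically. The following is only a sketch of one way to do it, not something taken from the kernel: a hypothetical `nest_tracker_t` records, for each lock, the set of locks that were held at every acquisition seen so far. If that set is still non-empty after a good workout, the lock is a candidate for elimination. The sketch assumes at most 32 lock ids, one tracker per CPU, and no recursive acquires.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LOCKS 32   /* assumption: lock ids are 0..31 so a set fits in one word */

/* Hypothetical per-CPU instrumentation, for illustration only. */
typedef struct {
    uint32_t held;                     /* bitmask of locks currently held */
    uint32_t seen;                     /* bit i set once lock i has been acquired */
    uint32_t always_under[MAX_LOCKS];  /* locks held at every acquisition of lock i */
} nest_tracker_t;

/* Record that lock 'id' is being acquired while the locks in t->held are held. */
static void tracker_acquire(nest_tracker_t *t, unsigned id)
{
    uint32_t bit = UINT32_C(1) << id;
    if (t->seen & bit) {
        t->always_under[id] &= t->held;   /* keep only locks held every time */
    } else {
        t->always_under[id] = t->held;    /* first sighting: start from the current set */
        t->seen |= bit;
    }
    t->held |= bit;
}

/* Record that lock 'id' has been released. */
static void tracker_release(nest_tracker_t *t, unsigned id)
{
    t->held &= ~(UINT32_C(1) << id);
}

/* Non-zero result: lock 'id' has only ever been taken while holding these locks. */
static uint32_t tracker_always_under(const nest_tracker_t *t, unsigned id)
{
    return (t->seen & (UINT32_C(1) << id)) ? t->always_under[id] : 0;
}
```

A non-zero result is only a hint. For per-object locks (such as one lock per thread) the outer lock must be the same instance each time for the elimination argument to hold.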

Initially, I came up with a list of what I thought I would need, and it was a fairly short list. Then I got the system working with those after adding a couple that I didn't think about originally. Then I put counters on the smplocks to find which ones had the most contention. Using the above rules, I split up the most contended smplocks as much as I could conceive. Now all the contention counts are low (less than 1% cpu time) when I pound on it, so I think I have a minimal set or very close to it.
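To give an idea of what putting counters on a lock can look like, here is a small sketch using C11 atomics. It is an assumption-laden stand-in, not the kernel's counters: the `counted_lock_t` type and its functions are hypothetical, and it counts contended acquisitions and spin iterations rather than measuring CPU time directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spinlock with contention counters, for illustration only. */
typedef struct {
    atomic_flag held;
    uint64_t    acquires;   /* total acquisitions */
    uint64_t    contended;  /* acquisitions that had to spin at least once */
    uint64_t    spins;      /* total failed attempts while spinning */
} counted_lock_t;

#define COUNTED_LOCK_INIT { ATOMIC_FLAG_INIT, 0, 0, 0 }

static void counted_lock_acquire(counted_lock_t *lk)
{
    uint64_t spins = 0;
    while (atomic_flag_test_and_set_explicit(&lk->held, memory_order_acquire))
        spins++;                        /* someone else has it: count the wait */

    /* The lock is ours now, so plain (non-atomic) updates are safe. */
    lk->acquires++;
    if (spins != 0) {
        lk->contended++;
        lk->spins += spins;
    }
}

static void counted_lock_release(counted_lock_t *lk)
{
    atomic_flag_clear_explicit(&lk->held, memory_order_release);
}

/* Percentage of acquisitions that found the lock busy. */
static double contention_percent(const counted_lock_t *lk)
{
    if (lk->acquires == 0)
        return 0.0;
    return 100.0 * (double) lk->contended / (double) lk->acquires;
}
```

Updating the counters only while the lock is held keeps the instrumentation itself from adding another point of contention.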

The worst offender originally was the non-paged pool allocation smplock. For that one, I rewrote the 'malloc' routine so that it wouldn't even use a smplock for the common cases. It really sped things up quite a bit. The next bad one was that I just had a single smplock for all thread state related stuff. Now it is split up into PS,TF,EV,TP,TC, and I squeezed a couple percentage points of performance out of that. The most contended smplock I have now is for the state of physical memory (PM). I don't think I can split it up, but even if I could, the best performance gain would be less than 1%.

Update. I had nothing to do once and I found out that the contention on the PM lock was due to the disk cache routines using it for their own internal tables. So I split it up into PM and CP, where CP is used for per-cache block lists. Now there is very little contention on either of those locks! So a fairly painless change got rid of a little nag. Not a noticeable improvement, but getting rid of lock contention can only help. Now there is no one lock that really stands out for contention.

So here is a list of the actual smplock levels I ended up with:

	#define OZ_SMPLOCK_LEVEL_SH 0x10		/* shutdown handler table */
	#define OZ_SMPLOCK_LEVEL_HT 0x18		/* per-process handle tables */
	#define OZ_SMPLOCK_LEVEL_VL 0x20		/* per-volume filesystem volume locks */
	#define OZ_SMPLOCK_LEVEL_DV 0x28		/* devices */
	#define OZ_SMPLOCK_LEVEL_PT 0x30		/* per-process page table */
	#define OZ_SMPLOCK_LEVEL_GP 0x40		/* global sections page table */
	#define OZ_SMPLOCK_LEVEL_PM 0x48		/* physical memory tables */
	#define OZ_SMPLOCK_LEVEL_CP 0x49		/* cache private lock */
	#define OZ_SMPLOCK_LEVEL_PS 0x4A		/* per-process state */
	#define OZ_SMPLOCK_LEVEL_TF 0x4C		/* thread family list */
	#define OZ_SMPLOCK_LEVEL_EV 0x4E		/* individual event flag state */
	#define OZ_SMPLOCK_LEVEL_TP 0x50		/* individual thread private lock */
	#define OZ_SMPLOCK_LEVEL_TC 0x52		/* thread COM queue lock */
	#define OZ_SMPLOCK_LEVEL_PR 0x54		/* process lists */
	#define OZ_SMPLOCK_LEVEL_SE 0x60		/* security structs */
	#define OZ_SMPLOCK_LEVEL_NP 0x68		/* non-paged pool */
	#define OZ_SMPLOCK_LEVEL_QU 0x70		/* quota */

	#define OZ_SMPLOCK_LEVEL_IRQS 0xE0		/* irq's use 0xE0..0xEF */
							/* the lowest priority, IRQ 7, uses level 0xE0 */
							/* the highest priority, IRQ 0, uses level 0xEF */

	#define OZ_SMPLOCK_LEVEL_KT 0xF8		/* kthread routines */
	#define OZ_SMPLOCK_LEVEL_HI 0xFC		/* lowipl routines */

Notice the non-paged pool lock, NP, is below the IRQ locks. This is what prevents OZ_KNL_NPPMALLOC from being called from an interrupt routine. If it really becomes necessary, the NP lock could be changed to be above the IRQ levels, but that means IRQ delivery would be inhibited during any smplocked non-paged pool allocations/deallocations. Fortunately, most operations are performed atomically and don't need the smplock, so it actually might not be that bad, but I haven't needed it yet.
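The effect of that ordering can be shown with a few lines of C. This is just a sketch: the three level values are copied from the list above, while `level_allows` and `np_lock_allowed` are hypothetical helpers that model only the "never acquire a lower level" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Level values copied from the list above. */
#define OZ_SMPLOCK_LEVEL_PM   0x48
#define OZ_SMPLOCK_LEVEL_NP   0x68
#define OZ_SMPLOCK_LEVEL_IRQS 0xE0

/* The whole argument rests on NP sitting below every IRQ level. */
_Static_assert(OZ_SMPLOCK_LEVEL_NP < OZ_SMPLOCK_LEVEL_IRQS,
               "NP must be below the IRQ levels for this example");

/* Hypothetical helper: a CPU may only move up to a strictly higher level. */
static bool level_allows(unsigned cpu_level, unsigned lock_level)
{
    return lock_level > cpu_level;
}

/* Could a CPU currently at 'cpu_level' take the non-paged pool lock? */
static bool np_lock_allowed(unsigned cpu_level)
{
    return level_allows(cpu_level, OZ_SMPLOCK_LEVEL_NP);
}
```

With these definitions, `np_lock_allowed(OZ_SMPLOCK_LEVEL_PM)` is true, while `np_lock_allowed(OZ_SMPLOCK_LEVEL_IRQS)` is false, which is the interrupt-routine case described above.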