I found this today: I had a loop that scans an image line by line:

for il in range(ystart, yend+1):

    data = N.ravel(fid.GetRasterBand(band+1).ReadAsArray(xstart, il, ns,1))
    wts = (data != 0)

    if wts.any():

        index = N.arange(ns*nl)
        jpos  = index / ns
        ipos  = index - jpos * ns
        jmin = jpos.min()
        jmax = jpos.max()
        imin = ipos.min()
        imax = ipos.max()
        etc...

The loop was pretty slow. I thought the time was going into the array computations inside the 'if' statement. Actually, by commenting out those lines, I realized it was due to the creation of the index array:
    index = N.arange(ns*nl)
To speed up the loop significantly, I simply had to create a reference array, indexRef, once before the loop, so that it is no longer rebuilt on every iteration.

indexRef = N.arange(ns*nl)
for il in range(ystart, yend+1):

    data = N.ravel(fid.GetRasterBand(band+1).ReadAsArray(xstart, il, ns,1))
    wts = (data != 0)

    if wts.any():

        index = indexRef[wts]
        jpos  = index / ns
        etc...
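
For anyone who wants to try the idea without a raster file, here is a minimal sketch of both variants. It is not my actual script: the in-memory 2-D array `image` stands in for the GDAL reads, the function names `scan_rebuild` and `scan_hoisted` are made up for illustration, and the reference array has only `ns` entries (one per sample of a line) so that it matches the length of the mask. That sizing is an assumption of this sketch.

```python
import numpy as np


def scan_rebuild(image):
    """Slow pattern: a fresh index array is allocated for every non-empty line."""
    nl, ns = image.shape
    hits = []
    for il in range(nl):
        wts = image[il] != 0               # mask of non-zero samples on this line
        if wts.any():
            index = np.arange(ns)[wts]     # rebuilt inside the loop
            hits.append((il, int(index.min()), int(index.max())))
    return hits


def scan_hoisted(image):
    """Faster pattern: the index array is built once, before the loop."""
    nl, ns = image.shape
    index_ref = np.arange(ns)              # loop-invariant, created a single time
    hits = []
    for il in range(nl):
        wts = image[il] != 0
        if wts.any():
            index = index_ref[wts]         # only the masking happens per line
            hits.append((il, int(index.min()), int(index.max())))
    return hits
```

Both functions return the same list of (line, first non-zero sample, last non-zero sample) tuples; only the place where the index array is allocated differs. Timing the two calls with the standard `timeit` module on a large array should show how much the repeated allocation costs on a given machine.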