%pylab inline
Populating the interactive namespace from numpy and matplotlib
data = load('/home/bhardin2/ece458/2015.01.08.14.02.24.10min.npz')
print data.keys()
['timezone_hr', 'ISRdata', 'epochtime', 'heights']
E = data['ISRdata']
print shape(E)
(38400, 130)
E[0,0]
(-5761464-4611693j)
A = abs(E)
subplot(2,1,1)
plot(A[10,:])
title('10th row')
ylabel('Amplitude')
xlabel('Column')
subplot(2,1,2)
plot(A[:,10])
title('10th col')
ylabel('Amplitude')
xlabel('Row')
tight_layout()
The amplitude varies more rapidly along each row: it is much larger in the first few columns than in the later ones. Along each column, the variation looks mostly like noise of roughly constant magnitude.
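A quick numerical check of this (a sketch, reusing the amplitude array A and the pylab namespace from above) is to compare how much the mean amplitude spreads across columns versus across rows:
# Spread of the mean amplitude across columns vs. across rows
col_means = mean(A, axis=0)  # mean amplitude of each column (varies as you move along a row)
row_means = mean(A, axis=1)  # mean amplitude of each row (varies as you move along a column)
print 'std of per-column means: %f' % std(col_means)
print 'std of per-row means:    %f' % std(row_means)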
rows = [30,60]
figure(figsize=(6,3))
for i in range(len(rows)):
    subplot(1,2,i+1)
    dat = A[rows[i],:]
    hist(dat,bins=linspace(0,2000,15))
    locs, labels = plt.xticks()
    plt.setp(labels, rotation=90)
    title('row %i' % rows[i])
    xlabel('Amplitude')
    ylabel('Counts')
    xlim((0,2000))
tight_layout()
Row 60 seems to have a slightly higher average power than row 30, though it is hard to tell from these plots: the peak of the histogram sits slightly farther to the right for row 60.
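To make this comparison less subjective, the mean power of each row could be computed directly (a quick check, assuming A as defined above):
# Mean power (amplitude squared) of rows 30 and 60
for r in [30, 60]:
    print 'row %i mean power: %e' % (r, mean(A[r,:]**2))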
Instead of fitting curves to find $\sigma$, we use the maximum likelihood estimate given the data (http://en.wikipedia.org/wiki/Rayleigh_distribution):
$$\hat{\sigma} = \sqrt{\frac{1}{2N} \sum_{i=1}^N x_i^2}$$
We'll plot this along with the empirical cdf and compare:
rows = [30,60]
figure(figsize=(6,3))
for i in range(len(rows)):
    subplot(1,2,i+1)
    dat = A[rows[i],:]
    sigma = sqrt(0.5 * median(dat**2))
    x = linspace(0,2000,100)
    cdf = 1 - exp(-x**2/(2*sigma**2))
    plot(x,cdf,'k-',label='Fitted')
    # Calculate empirical cdf
    cdf_emp = zeros(len(x))
    for k in range(len(x)):
        cdf_emp[k] = 1.0*sum(dat < x[k])/len(dat)
    plot(x,cdf_emp,'r-',label='Empirical')
    title('Sigma = %e' % sigma)
    xlabel('Amplitude')
    ylabel('CDF')
    legend(loc='best')
tight_layout()
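As an aside, the empirical-cdf loop above could also be written without an explicit Python loop; here is one possible vectorized equivalent using searchsorted (a sketch, not part of the original analysis):
# Vectorized empirical CDF: fraction of samples strictly less than each value in x
def empirical_cdf(dat, x):
    return searchsorted(sort(dat), x, side='left') / float(len(dat))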
The distributions do not conform to a Rayleigh distribution. This is expected because of the large variation in amplitude along each row of the data, as discovered in the previous section. The empirical distribution is much more heavy-tailed than a Rayleigh distribution.
In fact, I cheated somewhat: I used the median of $x_i^2$ rather than the mean when calculating $\sigma$, so this is not strictly the maximum likelihood estimate. When I used the mean, the value of $\sigma$ was heavily affected by the heavy tail, giving an unreasonable estimate. The median works better.
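To illustrate how much the two choices differ, both estimates can be printed side by side (a quick check, reusing A):
# Compare the mean-based MLE with the median-based variant used above
for r in [30, 60]:
    dat_r = A[r,:]
    print 'row %i: sigma(mean) = %e, sigma(median) = %e' % \
        (r, sqrt(0.5*mean(dat_r**2)), sqrt(0.5*median(dat_r**2)))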
cols = [[1,100],[100,200]]
figure(figsize=(6,3))
for i in range(len(cols)):
    subplot(1,2,i+1)
    dat = A[:,cols[i][0]:cols[i][1]].flatten()
    hist(dat,bins=linspace(0,2000,15))
    locs, labels = plt.xticks()
    plt.setp(labels, rotation=90)
    title('cols %s' % cols[i])
    xlabel('Amplitude')
    ylabel('Counts')
    xlim((0,2000))
tight_layout()
The first 100 columns seem to have more average power than the second hundred, as seen from the heavier tail of the histogram.
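A direct comparison of the mean power in the two blocks of columns (a quick check, reusing A):
# Mean power in each block of columns
for c0, c1 in [[1,100],[100,200]]:
    print 'cols %i-%i mean power: %e' % (c0, c1, mean(A[:,c0:c1]**2))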
cols = [[1,100],[100,200]]
figure(figsize=(6,3))
for i in range(len(cols)):
    subplot(1,2,i+1)
    dat = A[:,cols[i][0]:cols[i][1]].flatten()
    sigma = sqrt(0.5 * median(dat**2))
    x = linspace(0,2000,100)
    cdf = 1 - exp(-x**2/(2*sigma**2))
    plot(x,cdf,'k-',label='Fitted')
    # Calculate empirical cdf
    cdf_emp = zeros(len(x))
    for k in range(len(x)):
        cdf_emp[k] = 1.0*sum(dat < x[k])/len(dat)
    plot(x,cdf_emp,'r-',label='Empirical')
    title('Sigma = %e' % sigma)
    xlabel('Amplitude')
    ylabel('CDF')
    legend(loc='best')
tight_layout()
The first 100 columns do not follow a Rayleigh distribution. This makes sense, since the amplitude varies significantly over the first few columns, as we discovered in the previous section. The second hundred columns (which are actually just columns 100-130, since there aren't 200 columns in the dataset) fit much better, although the empirical distribution is more heavy-tailed (i.e., there are outliers). This suggests that the distribution is not stationary with respect to column number.
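One way to visualize this non-stationarity (a sketch; the block width of 10 columns is an arbitrary choice) is to track the median-based sigma estimate as a function of column:
# Median-based sigma estimate in consecutive blocks of 10 columns
block = 10
ncols = shape(A)[1]
starts = range(0, ncols, block)
sigmas = [sqrt(0.5*median(A[:,c:c+block]**2)) for c in starts]
figure(figsize=(5,3))
plot(starts, sigmas, 'k.-')
xlabel('Starting column of block')
ylabel('Estimated sigma')
tight_layout()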
I will assume throughout this problem that "row" means "column". The usual convention is:
$$ \textrm{E[row, column]} $$
but I will assume in this problem that it is:
$$ \textrm{E[column, row]} $$
rows = [50,70]
figure(figsize=(8,3))
for i in range(len(rows)):
    row = rows[i]
    subplot(1,2,i+1)
    Ei = E[:,row]
    n = 16
    M = len(Ei)/n
    F2 = zeros(n)
    for j in range(M):
        Estub = Ei[n*j:n*(j+1)]
        F = fft.fft(Estub)/n
        F2 += abs(F)**2
    F2 = F2/M
    F2s = fft.fftshift(F2)
    x = arange(n)
    xs = fftshift(x)
    xs[xs>=n/2] = xs[xs>=n/2] - n
    plot(xs,F2s,'k')
    xlabel('FFT bin')
    ylabel('Magnitude-squared')
    title('Row %i' % row)
    print 'Row %i average power according to FFT: %f' % (row,sum(F2))
tight_layout()
Row 50 average power according to FFT: 728232.444974
Row 70 average power according to FFT: 224185.505807
The row 50 average power is calculated using the FFT above. The average power calculated from the raw samples below is almost exactly the same, differing by only a few parts per million. I'm not sure what's causing the small difference; it's probably numerical round-off, or I missed an index somewhere.
print 'Row 50 average power according to raw samples: %f' % (mean(abs(E[:,50])**2))
Row 50 average power according to raw samples: 728237.000000
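For reference, the two numbers should agree because of Parseval's relation for the DFT: with the normalization $F_k = \frac{1}{n}\sum_{m=0}^{n-1} x_m e^{-j 2\pi k m / n}$ used in the code above,
$$\sum_{k=0}^{n-1} |F_k|^2 = \frac{1}{n}\sum_{m=0}^{n-1} |x_m|^2,$$
so summing the averaged periodogram over all bins recovers the mean power per sample. The remaining difference of a few parts per million is consistent with floating-point round-off in the averaging.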
data = load('/home/bhardin2/ece458/2015.01.08.14.02.24.MST_10min.npz')
data.keys()
['timezone_hr', 'heights', 'epochtime', 'MSTdata']
t = data['epochtime']
print len(t)
h = data['heights']
print len(h)
dat = data['MSTdata']
print shape(dat)
plot(diff(t)[0:300])
xlabel('Index')
ylabel('Sampling Interval')
4800
1287
(4800, 1287)
The data are not uniformly sampled. There is a tiny gap every so often. This gap is probably negligible, however.
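To quantify the gaps (a quick check on the time stamps loaded above):
# Compare the nominal sample interval to the largest gap in the time stamps
dt = diff(t)
print 'median sample interval: %f s' % median(dt)
print 'largest gap:            %f s' % dt.max()
print 'samples with dt > 1.5x nominal: %i' % sum(dt > 1.5*median(dt))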
T = t[1]-t[0]
fnyq = 1/(2*T)
print 'f_nyq is %f Hz' % fnyq
N = 64
df = 1/(T*N)
print 'Delta f is %f Hz' % df
f_nyq is 3.703704 Hz
Delta f is 0.115741 Hz
As found above, the number of samples in time is 4800, and in height is 1287.
hv = 146.1
hi = argmin(abs(hv - h)) # Extract index closest to hv
M = 100
n = 32
F2 = zeros(n)
for j in range(M):
    Estub = dat[n*j:n*(j+1),hi]
    F = fft.fft(Estub)/n
    F2 += abs(F)**2
F2 = F2/M
F2s = fft.fftshift(F2)
x = arange(n)
xs = fftshift(x)
xs[xs>=n/2] = xs[xs>=n/2] - n
plot(xs,F2s,'k')
xlabel('FFT bin')
ylabel('Magnitude-squared')
axis('tight')
mi = argmax(F2s)
m = xs[mi]
print 'index of maximum is m=%i' % m
print 'frequency at maximum is f=%f Hz' % (m*df)
index of maximum is m=-4
frequency at maximum is f=-0.462963 Hz
E = dat[:32,hi]
Q = imag(E)
I = real(E)
plot(I,Q,'k.-')
for n in range(len(Q)):
    text(I[n],Q[n],str(n))
grid()
xlabel('I')
ylabel('Q')
The (I,Q) points appear to circulate clockwise around the origin, which is to be expected from the negative Doppler shift found above. They complete roughly one revolution every 9 samples, which corresponds to a frequency of:
print 'f = %f Hz' % (-1/(9*T))
f = -0.823045 Hz
which is within about a factor of 2 of the answer found using the fft.
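As a cross-check (not part of the original analysis), the Doppler frequency can also be estimated from the phase of the lag-one autocorrelation of the same 32 samples, i.e., the standard pulse-pair estimator:
# Pulse-pair estimate: frequency from the phase of the lag-1 autocorrelation
R1 = mean(E[1:] * conj(E[:-1]))   # lag-1 autocorrelation of the 32 samples above
f_pp = angle(R1) / (2*pi*T)       # phase per sample interval -> Hz
print 'pulse-pair frequency estimate: f = %f Hz' % f_pp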
I'm not entirely sure what Part g is asking, but the peak at negative $m$ indicates a negative Doppler shift, which corresponds to motion of the scatterers away from the radar. The data are complex-valued, so we should not expect a symmetric spectrum. (I wasn't in class on Thursday to hear Billy's question, so I may be missing the point.)
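To illustrate the last point (a sketch, reusing dat, hi, and the averaging approach above), repeating the averaged periodogram on just the real part of the voltages yields a magnitude spectrum that is symmetric about zero frequency, unlike the spectrum of the complex data:
# Averaged periodogram of the real (I) channel only;
# a real-valued signal has a symmetric magnitude spectrum
n_r = 32
M_r = 100
F2_real = zeros(n_r)
for j in range(M_r):
    stub = real(dat[n_r*j:n_r*(j+1), hi])
    F2_real += abs(fft.fft(stub)/n_r)**2
F2_real = fft.fftshift(F2_real/M_r)
bins = fft.fftshift(arange(n_r))
bins[bins >= n_r/2] -= n_r
figure(figsize=(5,3))
plot(bins, F2_real, 'k')
xlabel('FFT bin')
ylabel('Magnitude-squared (real part only)')
tight_layout()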
import subprocess
import shutil
nb_name = 'ECE458_HW6.ipynb' # There's probably a way to get this programmatically
cmd = ['ipython', 'nbconvert', '--to', 'slides', nb_name, \
'--reveal-prefix', '"http://cdn.jsdelivr.net/reveal.js/2.6.2"',]
# '--template','output_toggle']
# The --template command points to a custom template which opens and closes the input code cells with a click
subprocess.call(cmd)
html_name = '%s.slides.html' % (nb_name.split('.')[0])
shutil.copy(html_name, '/home/bhardin2/public_html/slideshows/')