import numpy as np
import matplotlib.pyplot as plt
import scipy as sc
import scipy.special
from math import lgamma
Template searching using BITS
Introduction
It is often unclear how to identify features of different scales in a signal, especially when the signal is distorted by noise. Here, we show that this can be achieved very simply by comparing the local shape of the signal to the previously known, ideal shape of the feature, represented here by a template. At each point of the data we compute the Bayes factor for the presence of the shape defined by a localised template (evidence_shape) versus the absence of any shape (evidence_flat). The process outlined here is a simplified, proof-of-principle example of a generalised algorithm which we call Bayesian Inference-based Template Search (BITS).
def evidence_shape(y, x): # corresponds to Eq. 2.2.1 in SI for shape calculations
    N = float(x.size)
    ex = np.mean(x)
    exx = np.mean(x*x)
    ey = np.mean(y)
    eyy = np.mean(y*y)
    exy = np.mean(x*y)
    vx = exx - ex*ex
    vy = eyy - ey*ey
    vxy = exy - ex*ey
    v2xy = vxy*vxy
    r = vxy/np.sqrt(vx*vy)
    r2 = r*r
    m = (N-2.)/2.

    log_l = 0
    log_delm = np.log(1e20) # this is the prior term for the scale
    log_l += -log_delm
    log_l += -m*np.log(np.pi)
    log_l += -N/2*np.log(N)
    log_l += lgamma(m)
    log_l += -0.5*np.log(vx)
    log_l += -m*np.log(vy)
    log_l += -m*np.log(1.-r2)
    log_l += -np.log(2)
    log_l += np.log(1.+(r/np.abs(r))*sc.special.betainc(0.5,m,r2))
    return log_l
print(evidence_shape(np.random.rand(4), np.random.rand(4))) #checking it works
-46.613923638849904
def evidence_flat(y): # corresponds to Eq. 2.2.4 in SI for shape calculations
    N = float(y.size)
    ey = np.mean(y)
    eyy = np.mean(y*y)
    vy = eyy - ey*ey
    m = (N-2.)/2.

    log_l = -(m + .5)*np.log(np.pi)
    log_l += -(N/2)*np.log(N)
    log_l += lgamma(m + .5)
    log_l += -(m + 0.5)*np.log(vy)
    return log_l
print(evidence_flat(np.random.rand(10)))
-4.865221143158362
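The quantity used in the rest of this notebook is simply the difference of these two log evidences. As a small convenience (not part of the original cells), that difference could be wrapped into a helper; a positive value favours the template-shaped model and a negative value favours the flat, noise-only model.
def log_bayes_factor(y, template):
    # log Bayes factor for "y is shaped like the template" vs "y is flat, i.e. just noise"
    return evidence_shape(y, template) - evidence_flat(y)
print(log_bayes_factor(np.random.rand(10), np.random.rand(10))) # quick sanity check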
def make_peak(N, m, b):
    x = np.arange(N)
    y = m*np.exp(-((x - N/2)**2)/2/(N**1)) + b
    return y
First, we compute the Bayes factor for a noisy signal with a Gaussian peak (red). The template used, which is normalised to a height of 1, is shown in black (multiplied by 100 for scale). The log_Bayes value is printed below. The Bayes factor shows that there is overwhelming evidence that the data is shaped like the template as opposed to being flat (i.e., just noise).
width = 150
peak = make_peak(width, 100., 0)*1 + np.random.normal(0., 10, width)
template = make_peak(width, 1., 0)

plt.figure(1, figsize=(3,4))
plt.xlim(0,150)
plt.ylabel('Intensity')
plt.plot(peak, 'r')
plt.plot(template*100, 'k--', lw = 2)
plt.show()

print('log_bayes =', evidence_shape(peak, template) - evidence_flat(peak))
log_bayes = 139.65598721558422
Next we compare the same template (x100, in black) to a dataset which we know contains no peaks, i.e., it is just noise (in red). Here, the Bayes factor overwhelmingly shows that there is no peak, and that the variation in the data is caused only by noise.
flat = np.random.normal(0., 10, width)

plt.figure(1, figsize=(3,4))
plt.xlim(0,150)
plt.ylabel('Intensity')
plt.plot(flat, 'r')
plt.plot(template*100, 'k--', lw = 2)
plt.show()

print('log_bayes =', evidence_shape(flat, template) - evidence_flat(flat))
log_bayes = -44.08800743753579
With the above results in mind, we next construct a dataset composed of multiple such Gaussian peaks. To remove any effect of scale, we randomise the amplitude of each peak. The simulated dataset is plotted below.
x = np.zeros(1000)
width = 150
start = np.array([20, 176, 432, 593, 765])

np.random.seed(666)
for i in start:
    x[i:i+width] = x[i:i+width] + (make_peak(width, np.abs(np.random.randn())*100, 0))
x = x + np.random.normal(0, 10, 1000)

plt.figure(1, figsize=(10,4))
plt.xlim(0,1000)
plt.plot(x, 'r')
plt.ylim(-35,135)
plt.show()
To search for our template in this dataset, we make use of the fact that our template is supposed to be localised (i.e., it quickly drops to zero far enough from the maximum). Therefore, we segment the full dataset into slices of the same size as the template and, for each slice, compute the Bayes factor for the evidence of the shape versus the evidence of no shape. In effect, this is equivalent to performing a linear search for our localised template through the full dataset.
The log_Bayes for each location (after allowing for truncation at the edges) is plotted below (in blue). The previously known locations of the centres of our simulated peaks are plotted as well (in black).
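As a side note on the edge truncation: each scored position i uses the slice x[i - width//2 : i + width//2], so the first and last width//2 points are never the centre of a full window and do not receive a score. A minimal sketch of this indexing, restating values already defined above:
width = 150                                  # template length, as defined above
n = 1000                                     # length of the simulated trace
positions = range(width//2, n - width//2)    # scored centres run from 75 to 924
print(len(positions))                        # 850 windows of length `width` are evaluated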
bayes = []
template = make_peak(width, 1, 0)
for i in range(width//2, x.shape[0] - width//2):
    temp_shape = x[i-width//2:i+width//2]
    bayes.append(-evidence_flat(temp_shape) + evidence_shape(temp_shape, template))

plt.figure(1, figsize=(10,4))
plt.xlim(0,1000)
plt.plot(x, 'r')
plt.ylim(-35,135)
plt.ylabel('Intensity')
plt.show()

plt.figure(2, figsize=(10,4))
plt.xlim(0,1000)
plt.ylim(-50,175)
plt.plot(range(width//2, x.shape[0] - width//2), bayes, 'b')
plt.plot(range(x.shape[0]), np.zeros_like(x), 'k', alpha = 0.3)
plt.vlines(start + width//2, -55, 200, 'k', alpha = 0.3)
plt.ylabel('log_Bayes')
plt.show()
The log_Bayes factor is positive where there is more evidence for the presence of the shape, and negative where there is more evidence for its absence. We see that the Bayes factor is locally maximised at the previously known centres of our peaks, showing that this approach can identify and localise features, in this case Gaussian peaks, present in noisy data.
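As a possible extension beyond the analysis above, the peak locations could also be read off automatically from the log_Bayes trace, for example by keeping only local maxima with a positive log Bayes factor. A minimal sketch using scipy.signal.find_peaks follows; the height and distance choices here are illustrative assumptions, not part of the original analysis.
from scipy.signal import find_peaks

bayes_arr = np.array(bayes)
# local maxima where the evidence favours the shape (log_Bayes > 0),
# separated by at least one template width (an illustrative choice)
maxima, _ = find_peaks(bayes_arr, height=0., distance=width)
detected = maxima + width//2            # shift indices back to positions in the full trace
print('detected centres :', detected)
print('simulated centres:', start + width//2)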