Kalman Filter

Using the Kalman Filter on IOT-sensor temperature data in Python

I started looking into the Kalman Filter for a separate project on cleaning sensor data and found that there were limited Python resources online, so I’d like to use this page to add to what already exists and share my code in case someone out there finds it useful. I’m also happy to hear from others if the way I ended up using the filter was incorrect.

Since I am not well versed in the Kalman Filter, I will not be going into the mathematical definition, but there is a wealth of information online. I will be using the Kalman Filter in its one-dimensional form (i.e. tracking only the temperature, not calculating velocity) on temperature readings from an IOT sensor. The csv file was downloaded from Kaggle and can be found here.
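For readers who want to see what the one-dimensional case amounts to, here is a minimal sketch of a scalar Kalman filter in plain Python. It assumes a random-walk model (the temperature is expected to stay where it was, plus some noise) and takes the first reading as the starting estimate. The function name `kalman_1d` and the parameters `q` (process variance), `r` (measurement variance) and `p0` (initial variance) are hypothetical names chosen for this illustration, and the readings are made-up numbers, not the Kaggle data.

```python
def kalman_1d(measurements, q=1.0, r=5.0, p0=1.0):
    """Scalar Kalman filter for a random-walk state.

    q  : process variance (how much the true value may drift per step)
    r  : measurement variance (how noisy each reading is)
    p0 : variance of the initial estimate
    Returns (filtered means, filtered variances).
    """
    x = measurements[0]  # assumption: start from the first reading
    p = p0
    means, variances = [], []
    for i, z in enumerate(measurements):
        if i > 0:
            # Predict: the estimate carries over, its uncertainty grows by q.
            # (Assumption of this sketch: no predict step before the first reading.)
            p = p + q
        # Update: blend the prediction and the new reading z
        k = p / (p + r)          # Kalman gain, between 0 and 1
        x = x + k * (z - x)      # move the estimate part of the way toward z
        p = (1.0 - k) * p        # uncertainty shrinks after seeing z
        means.append(x)
        variances.append(p)
    return means, variances


# Made-up readings with an occasional spike at 35
readings = [30.0, 31.0, 29.0, 35.0, 30.0, 30.0, 31.0, 29.0] * 5
means, variances = kalman_1d(readings)
print([round(m, 2) for m in means[:5]])
```

Because the gain `k` is always between 0 and 1, each filtered value is a weighted average of the previous estimate and the new reading, which is why a single spike is damped rather than followed exactly.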


Importing libraries

import pandas as pd
import plotly as py
import plotly.express as px
import plotly.figure_factory as ff
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
import statsmodels

from itertools import compress
from math import *
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
from sklearn.model_selection import train_test_split
from pykalman import KalmanFilter

from scipy import interpolate
from sklearn import datasets, linear_model
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
from plotly.offline import init_notebook_mode, plot, iplot
init_notebook_mode(connected=True)

Data description

The data provided had readings from both an inside and an outside sensor, so I decided to split the original dataset into two based on the In/Out value in the out/in column.

Summary:

  1. Total data points = 97606, 8733 inside, 22605 outside

  2. Inside mean = 30.67, outside mean = 39.51

  3. Inside min = 21, outside min = 24

  4. Inside max = 41, outside max = 51

df = pd.read_csv(r'C:\Users....')  # enter your file path
df_out = df[(df['out/in']=='Out')]
df_in = df[(df['out/in']=='In')]

# Inside sensor data set: average the readings that share a timestamp
temp_in = df_in[['noted_date', 'temp']].groupby('noted_date', as_index=False).mean()
temp_in = temp_in.rename(columns={'temp': 'temp_inside'})
print(temp_in.describe())

# Outside sensor data set: same steps as for the inside sensor
temp_out = df_out[['noted_date', 'temp']].groupby('noted_date', as_index=False).mean()
temp_out = temp_out.rename(columns={'temp': 'temp_outside'})
print(temp_out.describe())

Figure: Inside temp over time

Figure: Outside temp over time


Kalman Filter Python code

Below is the Kalman Filter model I used on the inside temperature data from above. I did split this table into train and test sets, but I will not be covering that portion of the code here.
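Since the split itself is not shown, here is a minimal sketch of one common way to split time-ordered readings: keep the earliest part for training and the latest part for testing, without shuffling. This is an assumption for illustration and not necessarily the split used for the results on this page. The names `chronological_split` and `test_fraction` are hypothetical, and the readings are made-up numbers.

```python
def chronological_split(rows, test_fraction=0.2):
    """Split time-ordered rows into (train, test) without shuffling.

    The last `test_fraction` of the rows becomes the test set, so the
    model is never evaluated on readings older than its training data.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = round(len(rows) * test_fraction)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


# Made-up, already time-ordered readings
readings = [30.0, 30.5, 31.0, 30.8, 31.2, 32.0, 31.5, 31.0, 30.7, 30.9]
train, test = chronological_split(readings, test_fraction=0.2)
print(len(train), len(test))
```

With a pandas DataFrame sorted by date, the same idea can be written with `.iloc[:cut]` and `.iloc[cut:]`.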

# Kalman Filter model
# Using the Kalman Filter to clean the inside IOT sensor measurements
measurements = np.asarray(temp_in)  # column 0: noted_date, column 1: temp_inside
temps = measurements[:, 1].astype(float)  # numeric temperatures only

# One-dimensional model: the state is the temperature itself
kf = KalmanFilter(transition_matrices=[1],
                  observation_matrices=[1],
                  initial_state_mean=temps[0],
                  initial_state_covariance=1,
                  observation_covariance=5,
                  transition_covariance=1)
state_means, state_covariances = kf.filter(temps)
state_std = np.sqrt(state_covariances[:, 0])  # filtered standard deviation (not used below)

# Plot the raw and filtered readings, skipping the first 10 points
plt.figure(figsize=(12, 8))
plt.plot(measurements[10:, 0], temps[10:], '-b', label='Data')
plt.plot(measurements[10:, 0], state_means[10:, 0], '-r', label='Kalman-filter 5')
plt.legend(loc='upper left')
plt.show()

# Attach the filtered values to the original table
kalmandf = pd.DataFrame(state_means, columns=['temp_inside_KF'], index=temp_in.index)
temp_kalman = pd.concat([temp_in, kalmandf], axis=1)
print(temp_kalman.head(10))

# Drop the first 10 rows before evaluating
temp_kalman.drop(temp_kalman.head(10).index, inplace=True)

print('')
print('--Kalman Filter--' * 5)
# model_eval is an evaluation helper whose definition is not shown on this page
print(model_eval(temp_kalman['temp_inside'], temp_kalman['temp_inside_KF']))

Figure: Inside temp, data vs. Kalman filter
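To give a feel for what the observation_covariance setting does, here is a small sketch. In a scalar filter with transition and observation both equal to 1, each update moves the estimate toward the new reading by a fraction K (the Kalman gain). After a few steps K settles to a constant that depends only on the process variance q and the measurement variance r: the filtered variance p satisfies p² + q·p − q·r = 0, and K = p / r. The function name `steady_state_gain` is a hypothetical name for this illustration.

```python
from math import sqrt


def steady_state_gain(q, r):
    """Long-run Kalman gain of a scalar random-walk filter.

    q : process (transition) variance
    r : measurement (observation) variance
    """
    # Positive root of p^2 + q*p - q*r = 0 (steady-state filtered variance)
    p = (-q + sqrt(q * q + 4.0 * q * r)) / 2.0
    return p / r


# Same process variance as above (1), with three measurement variances
for r in (1.0, 5.0, 50.0):
    print(r, round(steady_state_gain(1.0, r), 3))
```

A larger measurement variance gives a smaller gain, so the filtered line reacts more slowly to each reading and looks smoother. A smaller one makes the filter follow the raw data more closely.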

Model Performance

Mean Absolute Error: 0.24
Mean Squared Error: 30.375
R2 Score: 0.953
Root Mean Squared Error: 0.504
Mean absolute percentage error: 0.806
Scaled Mean absolute percentage error: 0.802
Mean forecast error: 30.368
Normalised mean squared error: 0.047
Theil_u_statistic: 0.001
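The evaluation helper used above is not shown on this page, so the following is only a sketch of how a few of these metrics are commonly computed, using their standard definitions. It is not the original `model_eval`. The name `model_eval_sketch` is hypothetical and the numbers are made up. Note that with these definitions the root mean squared error is always the square root of the mean squared error.

```python
from math import sqrt


def model_eval_sketch(actual, predicted):
    """Return a dict of common error metrics for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n      # mean absolute error
    mse = sum(e * e for e in errors) / n       # mean squared error
    rmse = sqrt(mse)                           # root mean squared error

    # R2: share of the variance in `actual` explained by `predicted`
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_tot

    # Mean absolute percentage error (assumes no actual value is zero)
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "mape": mape}


# Made-up values, not the sensor data
actual = [30.0, 31.0, 29.0, 32.0]
predicted = [30.5, 30.5, 29.5, 31.0]
metrics = model_eval_sketch(actual, predicted)
for name, value in metrics.items():
    print(name, round(value, 3))
```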