
Deploying Anomaly Detection Models on Raspberry Pi

Tags: raspberry-pi, edge-ai, anomaly-detection

So at Myelin, we've been working on industrial IoT monitoring. The idea is simple: stick sensors on factory equipment, run anomaly detection on-device, and alert when something's off. The device of choice? A Raspberry Pi 3B+. One gig of RAM. A quad-core ARM CPU that sounds impressive until you actually try to do inference on it.

Honestly, I love this little board. I also want to throw it out the window sometimes.

[Figure] The edge anomaly detection pipeline: sensor data flows through preprocessing and on-device inference, with alerts sent when anomalies are detected.

Picking the Right Model

The first instinct is to go with a deep autoencoder. Train it on "normal" sensor readings, then flag anything with high reconstruction error. And look, autoencoders work great on your laptop. On a Pi with 1GB RAM where the OS itself eats 400MB? Not so much.

Here's what I ended up comparing:

| Model | Size | Inference Time (Pi 3B+) | RAM Usage |
|---|---|---|---|
| Conv Autoencoder (FP32) | 12MB | 180ms | 340MB |
| Conv Autoencoder (INT8 TFLite) | 3MB | 45ms | 95MB |
| Isolation Forest (sklearn) | 2MB | 8ms | 60MB |
| Simple Dense Autoencoder (INT8) | 800KB | 12ms | 40MB |

The thing is, for sensor data with 10-20 features, you don't need a massive model. A 3-layer dense autoencoder with 64-32-64 neurons, quantized to INT8, runs beautifully. The isolation forest is even faster but doesn't generalize as well to new patterns.
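For concreteness, here's roughly what that architecture looks like in Keras. This is a sketch, not the exact model we shipped: `NUM_FEATURES` and the activation choices are my assumptions.

```python
import tensorflow as tf

NUM_FEATURES = 16  # illustrative; our sensors produced 10-20 features

# 3-layer dense autoencoder: 64 -> 32 -> 64, trained to reconstruct
# its own input. Train on "normal" readings only, so anomalies show
# up as high reconstruction error.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(NUM_FEATURES, activation='linear'),
])
model.compile(optimizer='adam', loss='mse')
```

At this size the whole network is a few thousand parameters, which is why it quantizes down to well under a megabyte.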

TFLite Conversion

The conversion itself is straightforward:

import tensorflow as tf
 
# Convert the trained model with full INT8 quantization
converter = tf.lite.TFLiteConverter.from_saved_model('anomaly_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Restrict the converter to integer-only ops
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()
 
with open('anomaly_model.tflite', 'wb') as f:
    f.write(tflite_model)

The representative_dataset part is key. You need to feed it ~200 samples of real sensor data so the quantizer can calibrate the INT8 ranges properly. Skip this and your model outputs garbage.
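For reference, a minimal `representative_data_gen` can look like the sketch below. The random `normal_samples` array is a stand-in; in practice you'd load ~200 real readings from a calibration capture.

```python
import numpy as np

# Stand-in for ~200 real "normal" sensor readings; replace with a
# saved calibration capture in production
normal_samples = np.random.default_rng(0).normal(size=(200, 16)).astype(np.float32)

def representative_data_gen():
    # The quantizer calls this generator to observe realistic value
    # ranges; each yield is a list holding one input batched to
    # shape (1, num_features)
    for reading in normal_samples:
        yield [reading.reshape(1, -1)]
```

The dtype matters: the converter expects float32 inputs here even though the resulting model runs in INT8.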

The 2am Debugging Sessions

Here's something nobody warns you about. The Pi's SD card is slow. Really slow. If your inference script loads the model from disk on every prediction cycle, you'll bottleneck on I/O, not compute. Load the model once into memory and keep it there.

I learned this the hard way. SSH-ed into the Pi from my PG at like 2am, trying to figure out why inference was taking 500ms when my benchmarks said 45ms. Three cups of chai later, I realized the model was being reloaded every cycle because of a bug in the data pipeline restart logic.

Also, thermal throttling is real. The Pi 3B+ will downclock itself if it gets too hot. We ended up sticking a tiny heatsink on the CPU and running the inference loop at 2Hz instead of 10Hz. Good enough for industrial monitoring where anomalies develop over minutes, not milliseconds.
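The 2Hz loop itself is worth pacing against a monotonic clock rather than a fixed sleep, so a slow inference cycle doesn't push every later cycle off schedule. A sketch, where `read_sensor` and `handle_alert` are placeholder callables, not functions from our actual codebase:

```python
import time

def run_loop(read_sensor, detect_anomaly, handle_alert, cycles=None, period=0.5):
    # period=0.5 gives the 2Hz rate; pace against an absolute
    # monotonic deadline so one slow cycle doesn't accumulate drift
    next_tick = time.monotonic()
    done = 0
    while cycles is None or done < cycles:
        reading = read_sensor()
        if detect_anomaly(reading):
            handle_alert(reading)
        done += 1
        next_tick += period
        # Sleep only for whatever time is left in this cycle
        time.sleep(max(0.0, next_tick - time.monotonic()))
```

With `cycles=None` it runs forever, which is what you want under a process supervisor like systemd.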

Running Inference in Production

The production setup ended up looking like this:

import numpy as np
import tflite_runtime.interpreter as tflite
 
# Load the model ONCE at startup and keep it in memory --
# reloading from the SD card every cycle kills latency
interpreter = tflite.Interpreter(model_path='anomaly_model.tflite')
interpreter.allocate_tensors()
 
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
 
def detect_anomaly(sensor_reading):
    # Batch of one, shaped (1, num_features) to match the model input
    input_data = np.array([sensor_reading], dtype=np.float32)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    reconstruction = interpreter.get_tensor(output_details[0]['index'])[0]
    # Mean squared reconstruction error; THRESHOLD is calibrated
    # offline on held-out normal data
    error = np.mean((sensor_reading - reconstruction) ** 2)
    return error > THRESHOLD

Pro tip: use tflite_runtime (the pip package is tflite-runtime) instead of full TensorFlow on the Pi. Full TF is over 400MB and takes forever to import. The runtime is under 5MB and loads in seconds.
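As for where THRESHOLD comes from: one straightforward approach, sketched below rather than lifted from our codebase, is to take a high percentile of reconstruction errors measured on held-out normal data.

```python
import numpy as np

def calibrate_threshold(normal_errors, percentile=99.0):
    # normal_errors: reconstruction errors from held-out NORMAL data
    # only. Flagging everything above the 99th percentile caps the
    # false-positive rate at roughly 1% on normal traffic.
    return float(np.percentile(normal_errors, percentile))

# Usage: THRESHOLD = calibrate_threshold(validation_errors)
```

The percentile you pick is a direct trade-off between missed faults and nuisance alerts, so it's worth revisiting once the system has run on real equipment for a while.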

What I'd Do Differently

If I were starting fresh, I'd look at the Raspberry Pi 4 with 4GB RAM, or even dedicated accelerators like the Coral Edge TPU or Jetson Nano that we later evaluated at Myelin. The extra headroom means you can run a proper conv autoencoder without worrying about OOM kills. But constraints breed creativity, and the 3B+ forced us to build something genuinely efficient. That model is still running on a factory floor somewhere in Pune, quietly catching equipment faults at 2 predictions per second. I later wrote about the full end-to-end IoT pipeline we built around this model, including the MQTT plumbing and adaptive thresholds that made it production-ready.