# Deploying Anomaly Detection Models on Raspberry Pi
So at Myelin, we've been working on industrial IoT monitoring. The idea is simple: stick sensors on factory equipment, run anomaly detection on-device, and alert when something's off. The device of choice? A Raspberry Pi 3B+. One gig of RAM. A quad-core ARM CPU that sounds impressive until you actually try to do inference on it.
Honestly, I love this little board. I also want to throw it out the window sometimes.
## Picking the Right Model
The first instinct is to go with a deep autoencoder. Train it on "normal" sensor readings, then flag anything with high reconstruction error. And look, autoencoders work great on your laptop. On a Pi with 1GB RAM where the OS itself eats 400MB? Not so much.
Here's what I ended up comparing:
| Model | Size | Inference Time (Pi 3B+) | RAM Usage |
|---|---|---|---|
| Conv Autoencoder (FP32) | 12MB | 180ms | 340MB |
| Conv Autoencoder (INT8 TFLite) | 3MB | 45ms | 95MB |
| Isolation Forest (sklearn) | 2MB | 8ms | 60MB |
| Simple Dense Autoencoder (INT8) | 800KB | 12ms | 40MB |
The thing is, for sensor data with 10-20 features, you don't need a massive model. A 3-layer dense autoencoder with 64-32-64 neurons, quantized to INT8, runs beautifully. The isolation forest is even faster but doesn't generalize as well to new patterns.
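For concreteness, here's a minimal sketch of what that 64-32-64 dense autoencoder might look like in Keras. The layer sizes come from the table above; the input width (16 features), activations, and optimizer are placeholder assumptions, not the exact production config:

```python
import tensorflow as tf

NUM_FEATURES = 16  # placeholder: our sensor data had 10-20 features

# 3-layer dense autoencoder: 64 -> 32 -> 64, reconstructing its input
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(32, activation='relu'),   # bottleneck
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(NUM_FEATURES),            # linear reconstruction
])
model.compile(optimizer='adam', loss='mse')
```

Train it with `model.fit(x_normal, x_normal, ...)` — the target is the input itself, so the network only learns to reconstruct "normal" readings well.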
## TFLite Conversion
The conversion itself is straightforward:
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model('anomaly_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Restrict the converter to INT8 ops for full-integer quantization
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.representative_dataset = representative_data_gen
tflite_model = converter.convert()

with open('anomaly_model.tflite', 'wb') as f:
    f.write(tflite_model)
```

The `representative_dataset` part is key. You need to feed it ~200 samples of real sensor data so the quantizer can calibrate the INT8 ranges properly. Skip this and your model outputs garbage.
## The 2am Debugging Sessions
Here's something nobody warns you about. The Pi's SD card is slow. Really slow. If your inference script loads the model from disk on every prediction cycle, you'll bottleneck on I/O, not compute. Load the model once into memory and keep it there.
I learned this the hard way. SSH-ed into the Pi from my PG at like 2am, trying to figure out why inference was taking 500ms when my benchmarks said 45ms. Three cups of chai later, I realized the model was being reloaded every cycle because of a bug in the data pipeline restart logic.
Also, thermal throttling is real. The Pi 3B+ will downclock itself if it gets too hot. We ended up sticking a tiny heatsink on the CPU and running the inference loop at 2Hz instead of 10Hz. Good enough for industrial monitoring where anomalies develop over minutes, not milliseconds.
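That throttled loop can be sketched like this — `run_loop`, `read_sensors`, and the cycle cap are illustrative names, not the actual production code. The point is that sleeping out the rest of each period lets the CPU idle (and cool) between predictions:

```python
import time

def run_loop(read_sensors, detect_anomaly, hz=2.0, max_cycles=None):
    """Paced inference loop: do one prediction, then sleep out the
    remainder of the period so the CPU cools between cycles."""
    period = 1.0 / hz
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        start = time.monotonic()
        reading = read_sensors()
        if detect_anomaly(reading):
            print('anomaly detected:', reading)
        cycles += 1
        # Sleep whatever is left of the period (500ms at 2Hz)
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, period - elapsed))
    return cycles
```

On the Pi itself, `vcgencmd get_throttled` will tell you whether the firmware has throttled the CPU, which is worth checking before blaming your code for slow inference.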
## Running Inference in Production
The production setup ended up looking like this:
```python
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the model once at startup -- never inside the prediction loop
interpreter = tflite.Interpreter(model_path='anomaly_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def detect_anomaly(sensor_reading):
    # The model expects a batch dimension: shape (1, num_features)
    input_data = np.array([sensor_reading], dtype=np.float32)
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    reconstruction = interpreter.get_tensor(output_details[0]['index'])
    # High reconstruction error means the reading doesn't look "normal"
    error = np.mean((sensor_reading - reconstruction[0]) ** 2)
    return error > THRESHOLD
```

Pro tip: use `tflite_runtime` instead of full TensorFlow on the Pi. Full TF is over 400MB and takes forever to import. The runtime is under 5MB and loads in seconds.
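`THRESHOLD` itself has to be calibrated. One common approach — an assumption here, not necessarily what we shipped — is to take a high percentile of reconstruction errors on held-out normal data:

```python
import numpy as np

def calibrate_threshold(errors, percentile=99.5):
    """Pick the anomaly threshold as a high percentile of the
    reconstruction errors seen on known-normal validation data."""
    return float(np.percentile(errors, percentile))

# Synthetic example: exponentially distributed errors, mostly small,
# so the threshold lands above the bulk of normal readings
rng = np.random.default_rng(0)
normal_errors = rng.exponential(scale=0.01, size=5000)
THRESHOLD = calibrate_threshold(normal_errors)
```

With the 99.5th percentile, roughly 0.5% of genuinely normal readings will still trip the alarm, so pick the percentile to match how many false alerts the factory team will tolerate.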
## What I'd Do Differently
If I were starting fresh, I'd look at the Raspberry Pi 4 with 4GB RAM, or even dedicated accelerators like the Coral Edge TPU or Jetson Nano that we later evaluated at Myelin. The extra headroom means you can run a proper conv autoencoder without worrying about OOM kills. But constraints breed creativity, and the 3B+ forced us to build something genuinely efficient. That model is still running on a factory floor somewhere in Pune, quietly catching equipment faults at 2 predictions per second. I later wrote about the full end-to-end IoT pipeline we built around this model, including the MQTT plumbing and adaptive thresholds that made it production-ready.