Building Cross-Platform ML Inference with KMP
A deep dive into abstracting TFLite and CoreML behind a unified Kotlin Multiplatform API.
Running ML models on mobile devices is not new. What's still painful is doing it consistently across Android and iOS without duplicating everything.
This post walks through how I built a unified inference layer using Kotlin Multiplatform — one API surface, two platform backends.
The Problem
Every mobile ML project starts the same way. You train a model, export it, then realize you need to write inference code twice:
- Android: TensorFlow Lite, loading the .tflite model, managing interpreters, handling input/output buffers
- iOS: Core ML, converting to .mlmodel, dealing with MLMultiArray, managing predictions
The model is the same. The preprocessing is the same. The postprocessing is the same. But the code? Completely different.
Why KMP
Kotlin Multiplatform lets you share business logic while keeping platform-specific implementations where they need to be. For ML inference, this means:
- Shared: Model configuration, preprocessing, postprocessing, result types
- Platform-specific: The actual inference call (TFLite on Android, CoreML on iOS)
The key insight is that the inference call itself is a tiny part of the pipeline. Most of the work — image resizing, normalization, NMS, label mapping — is pure logic that doesn't need platform APIs.
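For example, normalization is ordinary Kotlin that can live entirely in the shared module and be unit-tested once. A minimal sketch (the `normalize` name and the mean/std defaults are illustrative, not from the original project):

```kotlin
// Hypothetical shared preprocessing step: scale raw RGB bytes into the
// [-1, 1] range many vision models expect. No platform APIs involved.
fun normalize(pixels: ByteArray, mean: Float = 127.5f, std: Float = 127.5f): FloatArray =
    FloatArray(pixels.size) { i ->
        // Bytes are signed in Kotlin; mask to get the unsigned 0..255 value
        ((pixels[i].toInt() and 0xFF) - mean) / std
    }
```

Because it touches no platform types, this function runs identically on Android, iOS, and the JVM test runner.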
Architecture
The design uses Kotlin's expect/actual mechanism:
// Shared
expect class ModelInterpreter {
fun predict(input: FloatArray): FloatArray
}
// Android
actual class ModelInterpreter {
private val interpreter = Interpreter(modelBuffer)
actual fun predict(input: FloatArray): FloatArray { ... }
}
// iOS
actual class ModelInterpreter {
private val model = MLModel.modelWithContentsOfURL(modelUrl, null) // Kotlin/Native binding for MLModel(contentsOf:)
actual fun predict(input: FloatArray): FloatArray { ... }
}
The shared module defines the full pipeline — from raw camera frame to structured detection results. Platform modules only implement the thin interpreter wrapper.
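To make that split concrete, here is a sketch of what such a shared pipeline might look like. `DetectionPipeline`, `Detection`, and the inline preprocessing and thresholding are illustrative stand-ins rather than the original code, and `predict` is taken as a function so the shared logic can be exercised without either platform backend:

```kotlin
// Illustrative shared pipeline: everything below is pure Kotlin.
data class Detection(val label: String, val score: Float)

class DetectionPipeline(
    private val labels: List<String>,
    private val predict: (FloatArray) -> FloatArray, // backed by TFLite or Core ML in practice
) {
    fun run(frame: ByteArray): List<Detection> {
        // Shared preprocessing: scale raw bytes into [-1, 1]
        val input = FloatArray(frame.size) { i ->
            ((frame[i].toInt() and 0xFF) - 127.5f) / 127.5f
        }
        // The only platform-specific step in the whole pipeline
        val scores = predict(input)
        // Shared postprocessing: keep class scores above a confidence threshold
        return scores.withIndex()
            .filter { it.value > 0.5f }
            .map { (i, score) -> Detection(labels[i], score) }
    }
}
```

In tests, `predict` can be a stub returning canned scores, which is what makes a single shared test suite for preprocessing and postprocessing possible.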
Results
After extracting the shared logic:
- ~70% code sharing across platforms
- Single test suite for preprocessing/postprocessing
- New model integration takes hours, not days
- Bug fixes apply everywhere at once
Takeaways
KMP isn't a silver bullet. The platform-specific parts still need attention — memory management on iOS, thread handling on Android. But for the parts that are genuinely the same across platforms? Write them once.
The best abstraction is the one that matches reality. ML inference pipelines are 80% math and 20% platform plumbing. KMP lets you treat them that way.