
[Mobile] Memory crash after repeated inference with dynamic shape input #22520

Open
laurenspriem opened this issue Oct 21, 2024 · 1 comment
Labels
api:Java (issues related to the Java API) · platform:mobile (issues related to ONNX Runtime mobile; typically submitted using template)

Comments

@laurenspriem

Describe the issue

We recently changed the ONNX model used in production in our mobile app so that it includes the preprocessing steps, which were previously performed separately before inference. Because it is an image model, the model now takes as input an array of raw RGBA bytes of an image, which tends to be a lot of data. Since this change, memory consumption climbs steadily as the app performs more inference runs, eventually resulting in a crash.

Is there anything we can do in our Java/Kotlin code to make sure memory is properly released, aside from the outputs.close() and inputTensor.close() calls that we already have? It seems the GC cannot keep up with continued inference runs right now.
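For context, here is a simplified sketch of our per-inference path (illustrative names, not the exact OnnxDartPlugin.kt code):

```kotlin
import ai.onnxruntime.OnnxJavaType
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import ai.onnxruntime.OrtSession
import java.nio.ByteBuffer

// Simplified sketch of our per-inference path. rgba holds the raw image bytes.
fun runInference(env: OrtEnvironment, session: OrtSession, rgba: ByteArray, shape: LongArray) {
    // Wrapping a heap array yields a non-direct buffer, so createTensor
    // copies it into a freshly allocated direct ByteBuffer on each call
    // (the OrtUtil.prepareBuffer frame in the stack trace below).
    val inputTensor = OnnxTensor.createTensor(env, ByteBuffer.wrap(rgba), shape, OnnxJavaType.UINT8)
    try {
        val outputs = session.run(mapOf(session.inputNames.first() to inputTensor))
        try {
            // ... read the results ...
        } finally {
            outputs.close() // we already close the outputs
        }
    } finally {
        inputTensor.close() // and the input tensor
    }
}
```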

Please see below for the crash logs. Thank you in advance for any and all help!

I/dependent.debug(16354): Background concurrent mark compact GC freed 638KB AllocSpace bytes, 30(193MB) LOS objects, 29% free, 230MB/326MB, paused 762us,6.342ms total 308.671ms
I/dependent.debug(16354): Waiting for a blocking GC Alloc
I/dependent.debug(16354): Background concurrent mark compact GC freed 200KB AllocSpace bytes, 2(104KB) LOS objects, 21% free, 342MB/438MB, paused 453us,7.482ms total 133.554ms
I/dependent.debug(16354): WaitForGcToComplete blocked Alloc on Background for 53.726ms
I/dependent.debug(16354): Starting a blocking GC Alloc
I/dependent.debug(16354): Forcing collection of SoftReferences for 111MB allocation
I/dependent.debug(16354): Starting a blocking GC Alloc
I/dependent.debug(16354): Alloc concurrent mark compact GC freed 52KB AllocSpace bytes, 0(0B) LOS objects, 21% free, 342MB/438MB, paused 454us,3.273ms total 36.865ms
W/dependent.debug(16354): Throwing OutOfMemoryError "Failed to allocate a 117235219 byte allocation with 100630528 free bytes and 169MB until OOM, target footprint 460065312, growth limit 536870912" (VmSize 26713268 kB)
I/dependent.debug(16354): Starting a blocking GC Alloc
I/dependent.debug(16354): Alloc concurrent mark compact GC freed 64KB AllocSpace bytes, 0(0B) LOS objects, 21% free, 342MB/438MB, paused 501us,5.569ms total 47.679ms
I/dependent.debug(16354): Forcing collection of SoftReferences for 111MB allocation
I/dependent.debug(16354): Starting a blocking GC Alloc
I/dependent.debug(16354): Alloc concurrent mark compact GC freed 32KB AllocSpace bytes, 0(0B) LOS objects, 21% free, 342MB/438MB, paused 480us,3.199ms total 40.527ms
W/dependent.debug(16354): Throwing OutOfMemoryError "Failed to allocate a 117235219 byte allocation with 100663296 free bytes and 169MB until OOM, target footprint 460065312, growth limit 536870912" (VmSize 26713268 kB)
E/AndroidRuntime(16354): FATAL EXCEPTION: DefaultDispatcher-worker-1
E/AndroidRuntime(16354): Process: io.ente.photos.independent.debug, PID: 16354
E/AndroidRuntime(16354): java.lang.OutOfMemoryError: Failed to allocate a 117235219 byte allocation with 100663296 free bytes and 169MB until OOM, target footprint 460065312, growth limit 536870912
E/AndroidRuntime(16354): at dalvik.system.VMRuntime.newNonMovableArray(Native Method)
E/AndroidRuntime(16354): at java.nio.DirectByteBuffer$MemoryRef.<init>(DirectByteBuffer.java:73)
E/AndroidRuntime(16354): at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:347)
E/AndroidRuntime(16354): at ai.onnxruntime.OrtUtil.prepareBuffer(OrtUtil.java:507)
E/AndroidRuntime(16354): at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:754)
E/AndroidRuntime(16354): at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:610)
E/AndroidRuntime(16354): at ai.onnxruntime.OnnxTensor.createTensor(OnnxTensor.java:589)
E/AndroidRuntime(16354): at io.ente.photos.onnx_dart.OnnxDartPlugin$predict$2.invokeSuspend(OnnxDartPlugin.kt:205)
E/AndroidRuntime(16354): at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
E/AndroidRuntime(16354): at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
E/AndroidRuntime(16354): at kotlinx.coroutines.internal.LimitedDispatcher$Worker.run(LimitedDispatcher.kt:115)
E/AndroidRuntime(16354): at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:100)
E/AndroidRuntime(16354): at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:584)
E/AndroidRuntime(16354): at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:793)
E/AndroidRuntime(16354): at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:697)
E/AndroidRuntime(16354): at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:684)
E/AndroidRuntime(16354): Suppressed: kotlinx.coroutines.internal.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@4fa0031, Dispatchers.IO]
I/Process (16354): Sending signal. PID: 16354 SIG: 9
Lost connection to device.

To reproduce

  1. Take any model that has a dynamically shaped input (let me know if I should share ours, can do that)
  2. Continuously run inference on the model with different data using the Java API on Android (sketched below)
  3. Watch memory consumption climb until the app crashes
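To make step 2 concrete, a hypothetical driver using the runInference sketch from the description above (image sizes vary per run, hence the dynamic shape):

```kotlin
// Hypothetical repro driver: every iteration passes a differently sized
// image, so each createTensor call allocates a fresh large direct buffer.
fun reproduce(env: OrtEnvironment, session: OrtSession, images: List<Pair<ByteArray, LongArray>>) {
    for ((rgba, shape) in images) {
        runInference(env, session, rgba, shape) // closes its tensors, yet memory still climbs
    }
}
```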

Urgency

Urgent, as this issue is happening in production, causing crashes and inconvenience for our mobile customers.

Platform

Android

OS Version

Android 14

ONNX Runtime Installation

Released Package

Compiler Version (if 'Built from Source')

No response

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.18

ONNX Runtime API

Java/Kotlin

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@laurenspriem laurenspriem added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Oct 21, 2024
@github-actions github-actions bot added the api:Java issues related to the Java API label Oct 21, 2024
@Craigacp
Contributor

How are you constructing the tensors? For best performance you should use a cache of direct ByteBuffers that you manage yourself, rather than letting the JVM create and garbage-collect them, as the GC algorithm can get overwhelmed.
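A minimal sketch of that pattern, assuming you know an upper bound on the input size and a uint8 model input (names are illustrative):

```kotlin
import ai.onnxruntime.OnnxJavaType
import ai.onnxruntime.OnnxTensor
import ai.onnxruntime.OrtEnvironment
import java.nio.ByteBuffer
import java.nio.ByteOrder

// One direct buffer, allocated once and sized for the largest expected
// input, then reused across inference calls instead of allocated per call.
class InputBufferCache(maxBytes: Int) {
    private val buffer: ByteBuffer =
        ByteBuffer.allocateDirect(maxBytes).order(ByteOrder.nativeOrder())

    fun tensorFor(env: OrtEnvironment, rgba: ByteArray, shape: LongArray): OnnxTensor {
        buffer.clear()
        buffer.put(rgba)
        buffer.flip() // remaining() must match the element count implied by shape
        // Direct buffers are used in place by ORT, so no new allocation
        // happens here. Close the returned tensor before the next call,
        // as it reads from this shared buffer.
        return OnnxTensor.createTensor(env, buffer, shape, OnnxJavaType.UINT8)
    }
}
```

This keeps the large allocation out of the per-run path entirely and makes its lifetime deterministic instead of depending on the GC.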
