-
Notifications
You must be signed in to change notification settings - Fork 60
Description
Describe the bug, including details regarding any error messages, version, and platform.
Hi team,
I've been playing recently with the Go SQL Driver For DuckDB which uses the arrow code here for some functionalities. Specifically the RegisterView method here is making use of the arrow.cdata.ExportRecordReader()
function.
The issue is that in my testing I found that this method is causing a pretty big memory leak that cannot be prevented without changes in arrow-go
.
I was able to fix it by forking and doing changes both in this repo and go-duckdb
, however as I'm new to arrow-go
could you please help me understand if I'm missing something and my fixes (described below) are actually unnecessary or they are OK and I should contribute them!
Here is the sequence of events I observed when the memory leak is occurring:
- In a function...
- I create a new
array.RecordReader
(it is created withrefCount = 1
) - I pass that record reader to the
go-duckdb
RegisterView()
method and therefore tocdata.ExportRecordReader()
cdata.ExportRecordReader()
internally does aRetain()
on thearray.RecordReader
(thus makingrefCount = 2
)- some work is done
- The
release()
function returned by the DuckDB method is called. It only frees the memory forcdata.CArrowArrayStream
(I don't observe therefCount
going down, soRelease()
is not called to compensate for the internalRetain()
) - I call
Release()
on thearray.RecordReader
to release its memory, but because therefCount
was internally increased to 2, now it is 1 and its memory is not released - then the function is called again and again and on every call a new
array.RecordReader
is created whose memory is not released ... and so on and so on until I have millions of record readers in memory.
My initial solution was just to call Release()
on the array.RecordReader
one more time and that fixed the memory issue right away. However, this doesn't seem like a proper solution given that not my code did the extra Retain()
. Therefore I attempted to do the change in go-duckdb
but initially wasn't able to figure out how.
The only example of using the cdata.ExportRecordReader()
where the memory is explicitly released is this test TestExportRecordReaderStreamLifetime here. It is using the non-exported C.ArrowArrayStreamRelease()
function which I guess is somehow calling the streamRelease function . Therefore I went on to see what would happen if I added in to cdata
a new ReleaseCArrowArrayStream()
function only calling C.ArrowArrayStreamRelease()
(similarly to the ReleaseCArrowArray here) and then calling that in the release()
function for RegisterView()
in go-duckdb
.
This worked as I expected (no visible memory leak) however I'm still not sure if I and/or go-duckdb
is using the cdata.ExportRecordReader()
function incorrectly and all those code changes are unnecessary.
Please advise!
Thanks,
Kalin
Component(s)
Other