Skip to content

Memory leak in arrow.cdata.ExportRecordReader() ? #371

@karsov

Description

@karsov

Describe the bug, including details regarding any error messages, version, and platform.

Hi team,
I've been playing recently with the Go SQL Driver For DuckDB which uses the arrow code here for some functionalities. Specifically the RegisterView method here is making use of the arrow.cdata.ExportRecordReader() function.

The issue is that in my testing I found that this method is causing a pretty big memory leak that cannot be prevented without changes in arrow-go.
I was able to fix it by forking and doing changes both in this repo and go-duckdb, however as I'm new to arrow-go could you please help me understand if I'm missing something and my fixes (described below) are actually unnecessary or they are OK and I should contribute them!

Here is the sequence of events I observed when the memory leak is occurring:

  • In a function...
  • I create a new array.RecordReader (it is created with refCount = 1)
  • I pass that record reader to the go-duckdb RegisterView() method and therefore to cdata.ExportRecordReader()
  • cdata.ExportRecordReader() internally does a Retain() on the array.RecordReader (thus making refCount = 2)
  • some work is done
  • The release() function returned by the DuckDB method is called. It only frees the memory for cdata.CArrowArrayStream (I don't observe the refCount going down, so Release() is not called to compensate for the internal Retain())
  • I call Release() on the array.RecordReader to release its memory, but because the refCount was internally increased to 2, now it is 1 and its memory is not released
  • then the function is called again and again and on every call a new array.RecordReader is created whose memory is not released ... and so on and so on until I have millions of record readers in memory.

My initial solution was just to call Release() on the array.RecordReader one more time and that fixed the memory issue right away. However, this doesn't seem like a proper solution given that not my code did the extra Retain(). Therefore I attempted to do the change in go-duckdb but initially wasn't able to figure out how.
The only example of using the cdata.ExportRecordReader() where the memory is explicitly released is this test TestExportRecordReaderStreamLifetime here. It is using the non-exported C.ArrowArrayStreamRelease() function which I guess is somehow calling the streamRelease function . Therefore I went on to see what would happen if I added in to cdata a new ReleaseCArrowArrayStream() function only calling C.ArrowArrayStreamRelease() (similarly to the ReleaseCArrowArray here) and then calling that in the release() function for RegisterView() in go-duckdb.
This worked as I expected (no visible memory leak) however I'm still not sure if I and/or go-duckdb is using the cdata.ExportRecordReader() function incorrectly and all those code changes are unnecessary.

Please advise!
Thanks,
Kalin

Component(s)

Other

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type: bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions