[Docling](https://github.com/DS4SD/docling) looks like a promising text extraction library that could augment or replace Apache Tika. **Update**: Docling added Python 3.9 support, so this is a go! ~The main integration issue is that [it only supports Python 3.10+](https://github.com/DS4SD/docling/issues/385).~