.xls support? And a weird header issue

Firstly, thanks for roapi! I've been playing around with it for a little while and really like it.

I've been trying to use it to proxy some datafiles containing information on currency codes as published by the body behind the ISO 4217 standard. The [data is available](https://www.six-group.com/en/products-services/financial-information/data-standards.html) in .xls or .xml format, but I can't get roapi to parse the .xls file. Opening it in Excel shows that the file has a couple of leading rows of dud data, in a non-tabular format that's not relevant to the data to be loaded.

When running it with roapi I get the following errors:
```
$ RUST_LOG=debug roapi --table "https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xls,format=xlsx,sheet_name=Active"
[2023-02-13T11:26:06Z DEBUG datafusion::execution::memory_manager] Creating memory manager with initial size 11744051.2 TB
[2023-02-13T11:26:06Z INFO  roapi::context] loading `uri(https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xls)` as table `list-one`
Error: Error loading Xlsx table: Failed to load .xlsx file
```

Downloading the file locally gives the same error:
```
$ RUST_LOG=debug roapi --table "list-one.xls,format=xlsx,sheet_name=Active"
[2023-02-13T11:27:03Z DEBUG datafusion::execution::memory_manager] Creating memory manager with initial size 11744051.2 TB
[2023-02-13T11:27:03Z INFO  roapi::context] loading `uri(list-one.xls)` as table `list-one`
Error: Error loading Xlsx table: Failed to load .xlsx file
```

If I save it as an .xlsx file I get an error about unparseable data:
```
$ RUST_LOG=debug RUST_BACKTRACE=1 roapi --table "list-one.xlsx,format=xlsx,sheet_name=Active"
[2023-02-13T11:28:07Z DEBUG datafusion::execution::memory_manager] Creating memory manager with initial size 11744051.2 TB
[2023-02-13T11:28:07Z INFO  roapi::context] loading `uri(list-one.xlsx)` as table `list-one`
Error: Arrow error: Invalid argument error: column types must match schema types, expected Null but found Utf8 at column index 2

Caused by:
    Invalid argument error: column types must match schema types, expected Null but found Utf8 at column index 2
```

Do you think there's anything that might be done to support the loading of this file? Something like `skiprows` or `header` in Pandas would work. I've worked around it by processing the data with Pandas and saving it as a .parquet file, but I would like the ability to reference the current upstream file without having to process it.

Thanks for any input you can provide. I look forward to any suggestions you might have.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

.xls support? And a weird header issue #259

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

.xls support? And a weird header issue #259

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions