Issue
I am attempting to parse a sas7bdat file with a data set page size of 2097152 bytes (2 MiB). When attempting to parse this file I get the error:
Stopping with error: Error when attempting to parse sas7bdat: READSTAT_ERROR_PARSE
SAS file
As noted, the data set page size of the file is 2097152. The SAS dataset is a randomly created dataset with 2,000 rows and 110 created columns and is ~6MB in size.
I can regenerate the same file, reducing the page size from 2097152 (2 MiB) down to 1048576 (1 MiB), and the file parses without issue.
The files I used for testing are in the following location:
- rand_ds_largepage_err.sas7bdat
  - Throws error when attempting to parse
  - Data set page size 2 MiB
- rand_ds_largepage_ok.sas7bdat
  - Able to parse successfully
  - Data set page size 1 MiB
To generate the files listed above, I set the page size manually with the BUFSIZE data set option in SAS:
/* Unable to parse */
data rand_ds_largepage_err(bufsize=2M);
/* elided */
run;
/* Able to parse */
data rand_ds_largepage_ok(bufsize=1M);
/* elided */
run;
SAS BUFSIZE option
According to the SAS documentation on the BUFSIZE option, the page size may be adjusted by altering the system or data set BUFSIZE option. Again from the documentation, the maximum data set page size that may be set is 2147483647.
MAX → sets the page size to the maximum possible number in your operating environment, up to the largest 4-byte, signed integer, which is 2^31–1, or approximately 2 billion bytes.
Troubleshooting / Potential Fix
On line 257 of the readstat_sas.c file I note that hinfo->header_size is checked against 1<<20 (1048576 in decimal). If I alter this line to check against 1<<21 (2097152 in decimal), the file parses without issue.
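To illustrate the behavior I am seeing, here is a minimal sketch of an upper-bound size check. This is not the actual ReadStat code: the function check_page_size, the enum size_check_t, and its values are hypothetical names I made up for this example, and I am assuming the real check is a simple comparison against a fixed limit.

```c
#include <stdint.h>

/* Hypothetical result codes for this sketch; not ReadStat's error enum. */
typedef enum { SIZE_OK = 0, SIZE_ERROR_PARSE = 1 } size_check_t;

/*
 * Hypothetical stand-in for the check described above: reject any page
 * size that is non-positive or larger than the given limit.
 * A 64-bit type is used so the comparison itself cannot overflow.
 */
size_check_t check_page_size(int64_t page_size, int64_t max_page_size) {
    if (page_size <= 0 || page_size > max_page_size) {
        return SIZE_ERROR_PARSE;
    }
    return SIZE_OK;
}
```

With a limit of 1<<20, check_page_size(2097152, 1 << 20) rejects the 2 MiB file while check_page_size(1048576, 1 << 20) accepts the 1 MiB one, which matches what I observe. Raising the limit to 1<<21 or to INT32_MAX accepts both.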
Because the SAS documentation notes that users can set the data set page size to as much as ~ 2 GiB (2^31–1 bytes), I wonder if the line should be adjusted to check against INT32_MAX. Obviously making the adjustment may have ramifications that I don't immediately observe, as I'm not extremely familiar with the repository.
Finally, I am glad to submit a PR with the change I noted above. Or if you have suggestions on a set of other changes I would need to trace through, I am glad to put in the work. All in all, I am glad to help in any way! Thanks so much for all the effort you (and others) have put into the library!