FIX: reduce dataset size in compressor example #735
Conversation
Another thing was also changed in the ReadTheDocs configuration: it now builds with CPython 3.5. This is required by the LZMA compression used in the compressor example, since the lzma module was only introduced in Python 3.3.
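For context, here is a minimal sketch of the LZMA code path in joblib (using a toy NumPy array rather than the example's real dataset); it illustrates why the build needs a Python version that ships the standard-library lzma module (3.3+):

```python
import sys

import numpy as np
import joblib

# The 'lzma' compressor relies on the standard-library lzma module,
# which only exists from Python 3.3 onwards (hence CPython 3.5 on RTD).
assert sys.version_info >= (3, 3)

data = np.random.random((1000, 10))  # stand-in for the example's dataset

# Write an LZMA-compressed pickle; compression level 3 is an arbitrary
# illustrative choice, not the value used in the example.
joblib.dump(data, "data.pkl.lzma", compress=("lzma", 3))

# The compressor is detected automatically from the file when loading.
restored = joblib.load("data.pkl.lzma")
```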
Codecov Report
@@            Coverage Diff             @@
##           master     #735      +/-   ##
==========================================
- Coverage   95.31%   95.28%   -0.04%
==========================================
  Files          42       42
  Lines        6128     6128
==========================================
- Hits         5841     5839       -2
- Misses        287      289       +2
Continue to review full report at Codecov.
That's great! Thank you very much. Merging!
That worked, but the timings in the generated documentation are really strange and do not convey the ideas we were hoping for :(
Thanks for merging @GaelVaroquaux !
Indeed, it's ugly.
Maybe try twice as many rows? Would that fit the RTD constraints?
I did some benchmarks of the impact of the dataset size (number of rows) on the measured durations.
Here are some plots:
I also tested the builds on RTD and they all pass up to 2e6 lines. The duration results only become really meaningful starting from 500k lines in the dataset. I ran a couple of builds on RTD with 1e6 lines (800MB of RAM) and they passed. Conclusion: a dataset of about 1e6 lines should give meaningful durations while still fitting the RTD constraints.
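For reference, the kind of size-vs-duration benchmark described above could look roughly like the sketch below; the synthetic random dataset, the row counts, and the compressor settings are illustrative assumptions, not the exact script or data used here:

```python
import time

import numpy as np
import joblib


def bench(n_rows, compress):
    """Time one dump/load round trip for a synthetic dataset of n_rows rows."""
    data = np.random.random((n_rows, 10))
    t0 = time.perf_counter()
    joblib.dump(data, "/tmp/bench.pkl", compress=compress)
    dump_duration = time.perf_counter() - t0
    t0 = time.perf_counter()
    joblib.load("/tmp/bench.pkl")
    load_duration = time.perf_counter() - t0
    return dump_duration, load_duration


# Note: the 'lz4' compressor requires the optional lz4 package.
for n_rows in (int(1e5), int(5e5), int(1e6)):
    for compress in (0, ("zlib", 3), ("lzma", 3), ("lz4", 3)):
        dump_duration, load_duration = bench(n_rows, compress)
        print("{} rows, compress={}: dump {:.2f}s, load {:.2f}s".format(
            n_rows, compress, dump_duration, load_duration))
```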
This PR is an attempt to fix #724.
The idea is simply to use fewer rows in the input dataset. Other changes are related to data displays.
This fixes the documentation build on ReadTheDocs, see the generated documentation here.
Some problems: the dataset is now very small, so there is a lot of noise in the measured durations. This induces a lot of variation in the measurements, and sometimes LZ4 compression/decompression comes out slightly slower than raw serialization, which contradicts the comments in the example.
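One generic way to reduce that kind of noise (not what this PR does, just a hedged sketch with illustrative file names and repeat counts) is to repeat each measurement several times and keep the fastest run, as timeit does:

```python
import time

import numpy as np
import joblib

data = np.random.random((10000, 10))  # small dataset -> noisy single-shot timings


def best_of(n_repeats, func, *args, **kwargs):
    """Return the fastest duration observed over n_repeats calls of func."""
    durations = []
    for _ in range(n_repeats):
        t0 = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - t0)
    return min(durations)


# Note: the 'lz4' compressor requires the optional lz4 package.
raw_dump = best_of(5, joblib.dump, data, "/tmp/raw.pkl")
lz4_dump = best_of(5, joblib.dump, data, "/tmp/lz4.pkl", compress=("lz4", 3))
print("raw dump: {:.4f}s, lz4 dump: {:.4f}s".format(raw_dump, lz4_dump))
```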