Deprecate `use_auth_token` in favor of `token` #5996

mariosasko · 2023-06-28T16:26:38Z

... to be consistent with transformers and huggingface_hub.

HuggingFaceDocBuilderDev · 2023-06-28T16:32:06Z

The documentation is not available anymore as the PR was closed or merged.

github-actions · 2023-06-28T16:32:26Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006134 / 0.011353 (-0.005219)	0.003816 / 0.011008 (-0.007193)	0.098226 / 0.038508 (0.059718)	0.036830 / 0.023109 (0.013721)	0.314551 / 0.275898 (0.038653)	0.372251 / 0.323480 (0.048771)	0.004762 / 0.007986 (-0.003224)	0.003041 / 0.004328 (-0.001287)	0.077651 / 0.004250 (0.073401)	0.052445 / 0.037052 (0.015393)	0.324632 / 0.258489 (0.066143)	0.365724 / 0.293841 (0.071883)	0.028069 / 0.128546 (-0.100477)	0.008444 / 0.075646 (-0.067203)	0.312767 / 0.419271 (-0.106505)	0.047773 / 0.043533 (0.004240)	0.305317 / 0.255139 (0.050178)	0.332007 / 0.283200 (0.048807)	0.018985 / 0.141683 (-0.122698)	1.538022 / 1.452155 (0.085868)	1.575898 / 1.492716 (0.083182)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.204780 / 0.018006 (0.186774)	0.428125 / 0.000490 (0.427635)	0.003454 / 0.000200 (0.003254)	0.000078 / 0.000054 (0.000024)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025064 / 0.037411 (-0.012348)	0.099419 / 0.014526 (0.084893)	0.111068 / 0.176557 (-0.065489)	0.169775 / 0.737135 (-0.567361)	0.112067 / 0.296338 (-0.184271)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.429642 / 0.215209 (0.214433)	4.275556 / 2.077655 (2.197901)	1.914658 / 1.504120 (0.410539)	1.706556 / 1.541195 (0.165361)	1.754228 / 1.468490 (0.285738)	0.563669 / 4.584777 (-4.021108)	3.391501 / 3.745712 (-0.354211)	1.791517 / 5.269862 (-3.478345)	1.030704 / 4.565676 (-3.534973)	0.070882 / 0.424275 (-0.353393)	0.011351 / 0.007607 (0.003744)	0.529438 / 0.226044 (0.303394)	5.294316 / 2.268929 (3.025387)	2.344653 / 55.444624 (-53.099972)	1.997468 / 6.876477 (-4.879009)	2.108932 / 2.142072 (-0.033140)	0.676794 / 4.805227 (-4.128433)	0.135058 / 6.500664 (-6.365607)	0.065857 / 0.075469 (-0.009612)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.231864 / 1.841788 (-0.609924)	13.986694 / 8.074308 (5.912386)	13.306600 / 10.191392 (3.115208)	0.145520 / 0.680424 (-0.534904)	0.016717 / 0.534201 (-0.517484)	0.366303 / 0.579283 (-0.212980)	0.391637 / 0.434364 (-0.042727)	0.425445 / 0.540337 (-0.114892)	0.507719 / 1.386936 (-0.879217)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006236 / 0.011353 (-0.005116)	0.003766 / 0.011008 (-0.007242)	0.076794 / 0.038508 (0.038286)	0.037210 / 0.023109 (0.014101)	0.378387 / 0.275898 (0.102489)	0.425456 / 0.323480 (0.101977)	0.004694 / 0.007986 (-0.003291)	0.002921 / 0.004328 (-0.001407)	0.076985 / 0.004250 (0.072735)	0.052188 / 0.037052 (0.015136)	0.394385 / 0.258489 (0.135896)	0.432527 / 0.293841 (0.138686)	0.029091 / 0.128546 (-0.099455)	0.008364 / 0.075646 (-0.067282)	0.082583 / 0.419271 (-0.336689)	0.042928 / 0.043533 (-0.000605)	0.375321 / 0.255139 (0.120182)	0.391719 / 0.283200 (0.108519)	0.019388 / 0.141683 (-0.122295)	1.550644 / 1.452155 (0.098489)	1.604882 / 1.492716 (0.112166)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.236859 / 0.018006 (0.218853)	0.418528 / 0.000490 (0.418039)	0.000388 / 0.000200 (0.000188)	0.000059 / 0.000054 (0.000004)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025548 / 0.037411 (-0.011863)	0.100644 / 0.014526 (0.086118)	0.109102 / 0.176557 (-0.067455)	0.161694 / 0.737135 (-0.575441)	0.112088 / 0.296338 (-0.184250)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.484128 / 0.215209 (0.268919)	4.849952 / 2.077655 (2.772297)	2.512769 / 1.504120 (1.008649)	2.303295 / 1.541195 (0.762100)	2.356699 / 1.468490 (0.888209)	0.564181 / 4.584777 (-4.020596)	3.421393 / 3.745712 (-0.324319)	2.570875 / 5.269862 (-2.698987)	1.474307 / 4.565676 (-3.091370)	0.068035 / 0.424275 (-0.356240)	0.011300 / 0.007607 (0.003693)	0.587867 / 0.226044 (0.361823)	5.862447 / 2.268929 (3.593519)	3.004017 / 55.444624 (-52.440607)	2.664989 / 6.876477 (-4.211488)	2.740020 / 2.142072 (0.597948)	0.680840 / 4.805227 (-4.124387)	0.137001 / 6.500664 (-6.363663)	0.068098 / 0.075469 (-0.007371)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.297362 / 1.841788 (-0.544426)	14.207891 / 8.074308 (6.133583)	14.087562 / 10.191392 (3.896170)	0.149514 / 0.680424 (-0.530910)	0.016566 / 0.534201 (-0.517635)	0.367602 / 0.579283 (-0.211681)	0.400692 / 0.434364 (-0.033671)	0.432907 / 0.540337 (-0.107431)	0.525924 / 1.386936 (-0.861012)
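A note on reading the benchmark tables: each cell follows the `new / old (diff)` pattern given in the row label, and the figures are consistent with `diff = new - old` up to rounding. The sketch below parses one tab-separated header/value pair into a dictionary. The helper names `parse_cell` and `parse_row` are hypothetical and not part of the benchmark tooling, and the units of the timings are not stated in this thread.

```python
import re

# One cell looks like "0.025064 / 0.037411 (-0.012348)".
CELL = re.compile(r"^\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)\s*\((-?[\d.]+)\)\s*$")


def parse_cell(cell):
    """Split a 'new / old (diff)' cell into three floats."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized benchmark cell: {cell!r}")
    new, old, diff = (float(group) for group in match.groups())
    return new, old, diff


def parse_row(header_line, value_line):
    """Map each metric name to its (new, old, diff) triple.

    The first tab-separated field of each line is a label
    ('metric' / 'new / old (diff)') and is skipped.
    """
    names = header_line.split("\t")[1:]
    cells = value_line.split("\t")[1:]
    return {name: parse_cell(cell) for name, cell in zip(names, cells)}


# Values copied from the first benchmark_indices_mapping.json table above.
header = "metric\tselect\tshard\tshuffle\tsort\ttrain_test_split"
values = (
    "new / old (diff)\t0.025064 / 0.037411 (-0.012348)\t"
    "0.099419 / 0.014526 (0.084893)\t0.111068 / 0.176557 (-0.065489)\t"
    "0.169775 / 0.737135 (-0.567361)\t0.112067 / 0.296338 (-0.184271)"
)
row = parse_row(header, values)
```

A negative `diff` means the `new` figure is smaller than the `old` one for that metric.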

github-actions · 2023-06-30T13:16:07Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006223 / 0.011353 (-0.005130)	0.003672 / 0.011008 (-0.007336)	0.097451 / 0.038508 (0.058943)	0.036243 / 0.023109 (0.013133)	0.375650 / 0.275898 (0.099752)	0.431652 / 0.323480 (0.108172)	0.004758 / 0.007986 (-0.003227)	0.002941 / 0.004328 (-0.001387)	0.077383 / 0.004250 (0.073132)	0.055342 / 0.037052 (0.018289)	0.390335 / 0.258489 (0.131846)	0.427867 / 0.293841 (0.134026)	0.027619 / 0.128546 (-0.100927)	0.008244 / 0.075646 (-0.067402)	0.313499 / 0.419271 (-0.105773)	0.054987 / 0.043533 (0.011454)	0.394044 / 0.255139 (0.138905)	0.398784 / 0.283200 (0.115584)	0.026499 / 0.141683 (-0.115184)	1.496907 / 1.452155 (0.044753)	1.554465 / 1.492716 (0.061749)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.241197 / 0.018006 (0.223190)	0.427856 / 0.000490 (0.427366)	0.006264 / 0.000200 (0.006065)	0.000218 / 0.000054 (0.000164)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025550 / 0.037411 (-0.011862)	0.104426 / 0.014526 (0.089901)	0.110310 / 0.176557 (-0.066246)	0.173813 / 0.737135 (-0.563322)	0.112129 / 0.296338 (-0.184209)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.458806 / 0.215209 (0.243597)	4.576351 / 2.077655 (2.498697)	2.265670 / 1.504120 (0.761550)	2.073230 / 1.541195 (0.532035)	2.135283 / 1.468490 (0.666793)	0.562506 / 4.584777 (-4.022271)	3.375101 / 3.745712 (-0.370611)	1.734393 / 5.269862 (-3.535469)	1.026622 / 4.565676 (-3.539054)	0.068144 / 0.424275 (-0.356131)	0.011092 / 0.007607 (0.003485)	0.562779 / 0.226044 (0.336734)	5.608256 / 2.268929 (3.339328)	2.706468 / 55.444624 (-52.738157)	2.381607 / 6.876477 (-4.494869)	2.451027 / 2.142072 (0.308954)	0.671590 / 4.805227 (-4.133637)	0.135749 / 6.500664 (-6.364915)	0.065389 / 0.075469 (-0.010080)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.244806 / 1.841788 (-0.596981)	14.042150 / 8.074308 (5.967841)	14.246612 / 10.191392 (4.055220)	0.134309 / 0.680424 (-0.546114)	0.017082 / 0.534201 (-0.517119)	0.366043 / 0.579283 (-0.213240)	0.400748 / 0.434364 (-0.033616)	0.425695 / 0.540337 (-0.114643)	0.509355 / 1.386936 (-0.877581)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006134 / 0.011353 (-0.005219)	0.003980 / 0.011008 (-0.007028)	0.078353 / 0.038508 (0.039845)	0.038011 / 0.023109 (0.014902)	0.375784 / 0.275898 (0.099886)	0.433619 / 0.323480 (0.110139)	0.004897 / 0.007986 (-0.003088)	0.002981 / 0.004328 (-0.001347)	0.077362 / 0.004250 (0.073112)	0.056108 / 0.037052 (0.019056)	0.395984 / 0.258489 (0.137495)	0.427397 / 0.293841 (0.133556)	0.029325 / 0.128546 (-0.099221)	0.008498 / 0.075646 (-0.067148)	0.082478 / 0.419271 (-0.336794)	0.044085 / 0.043533 (0.000552)	0.389923 / 0.255139 (0.134784)	0.391180 / 0.283200 (0.107980)	0.022452 / 0.141683 (-0.119231)	1.507758 / 1.452155 (0.055603)	1.530459 / 1.492716 (0.037743)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.230928 / 0.018006 (0.212922)	0.408484 / 0.000490 (0.407995)	0.000806 / 0.000200 (0.000606)	0.000067 / 0.000054 (0.000012)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.025183 / 0.037411 (-0.012228)	0.102292 / 0.014526 (0.087766)	0.108142 / 0.176557 (-0.068415)	0.161172 / 0.737135 (-0.575963)	0.114476 / 0.296338 (-0.181862)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.482978 / 0.215209 (0.267769)	4.816103 / 2.077655 (2.738448)	2.505567 / 1.504120 (1.001447)	2.302598 / 1.541195 (0.761404)	2.371238 / 1.468490 (0.902748)	0.567467 / 4.584777 (-4.017310)	3.363407 / 3.745712 (-0.382306)	1.746213 / 5.269862 (-3.523649)	1.035468 / 4.565676 (-3.530208)	0.068431 / 0.424275 (-0.355844)	0.011069 / 0.007607 (0.003462)	0.598241 / 0.226044 (0.372196)	5.953927 / 2.268929 (3.684999)	3.007493 / 55.444624 (-52.437132)	2.629399 / 6.876477 (-4.247078)	2.737201 / 2.142072 (0.595129)	0.682456 / 4.805227 (-4.122771)	0.137613 / 6.500664 (-6.363051)	0.067941 / 0.075469 (-0.007528)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.306015 / 1.841788 (-0.535772)	14.359240 / 8.074308 (6.284932)	14.187601 / 10.191392 (3.996209)	0.138612 / 0.680424 (-0.541812)	0.016708 / 0.534201 (-0.517493)	0.366365 / 0.579283 (-0.212918)	0.396982 / 0.434364 (-0.037382)	0.426939 / 0.540337 (-0.113398)	0.520064 / 1.386936 (-0.866872)

lhoestq

Cool! AFAIK transformers still uses use_auth_token for models, no? cc @LysandreJik

There is also the option of not having a deprecation message until we update all the scripts in transformers.

mariosasko · 2023-06-30T14:08:49Z

They use token and emit a deprecation warning if use_auth_token is passed instead (see https://github.com/huggingface/transformers/blob/78a2b19fc84ed55c65f4bf20a901edb7ceb73c5f/src/transformers/modeling_utils.py#L1933).

I think we can update the examples scripts after merging this PR.
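The behavior described in this comment (accept `token`, and warn when `use_auth_token` is passed instead) can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR or from transformers: the function name `resolve_token`, the sentinel `_UNSET`, the warning wording, and the choice to raise when both arguments are given are all assumptions made for the example.

```python
import warnings

# Hypothetical sentinel: lets us tell "argument not passed" apart from
# an explicit None/True/False, which are all plausible legacy values.
_UNSET = object()


def resolve_token(token=None, use_auth_token=_UNSET):
    """Return the effective token, warning if the legacy name is used."""
    if use_auth_token is _UNSET:
        # New-style call: nothing to translate, no warning.
        return token

    warnings.warn(
        "'use_auth_token' is deprecated; pass 'token' instead.",
        FutureWarning,
        stacklevel=2,  # point the warning at the caller's line
    )
    if token is not None:
        # Assumed policy for this sketch: refuse ambiguous calls.
        raise ValueError("Pass either 'token' or 'use_auth_token', not both.")
    return use_auth_token
```

Callers that already pass `token=...` see no warning, while legacy callers keep working and are told what to change, which is why removing the warning later is a small edit.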

lhoestq · 2023-06-30T14:15:04Z

> I think we can update the examples scripts after merging this PR.

We should do a release before updating the example scripts, no? That's why it's an option to not have a deprecation warning until transformers and co are updated with the token arg.

mariosasko · 2023-06-30T14:37:33Z

> We should do a release before updating the example scripts, no? That's why it's an option to not have a deprecation warning until transformers and co are updated with the token arg.

This would avoid the warning only for the latest datasets release. TBH, I don't think this is worth the hassle, considering how simple it is to remove it.

lhoestq

Looks all good then, let us know @LysandreJik if it sounds good to you

github-actions · 2023-07-03T16:12:14Z

Show benchmarks

PyArrow==8.0.0

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.007644 / 0.011353 (-0.003709)	0.004667 / 0.011008 (-0.006341)	0.117347 / 0.038508 (0.078839)	0.050620 / 0.023109 (0.027510)	0.415402 / 0.275898 (0.139504)	0.485898 / 0.323480 (0.162418)	0.005848 / 0.007986 (-0.002138)	0.003736 / 0.004328 (-0.000592)	0.089798 / 0.004250 (0.085547)	0.069344 / 0.037052 (0.032292)	0.441684 / 0.258489 (0.183195)	0.468972 / 0.293841 (0.175131)	0.036637 / 0.128546 (-0.091909)	0.010219 / 0.075646 (-0.065427)	0.394293 / 0.419271 (-0.024978)	0.061462 / 0.043533 (0.017929)	0.409448 / 0.255139 (0.154309)	0.431557 / 0.283200 (0.148358)	0.027795 / 0.141683 (-0.113888)	1.837844 / 1.452155 (0.385690)	1.862683 / 1.492716 (0.369967)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.230500 / 0.018006 (0.212494)	0.483139 / 0.000490 (0.482649)	0.006517 / 0.000200 (0.006317)	0.000143 / 0.000054 (0.000088)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.033152 / 0.037411 (-0.004259)	0.133673 / 0.014526 (0.119147)	0.143853 / 0.176557 (-0.032704)	0.215254 / 0.737135 (-0.521882)	0.150676 / 0.296338 (-0.145662)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.503796 / 0.215209 (0.288587)	5.049981 / 2.077655 (2.972326)	2.399427 / 1.504120 (0.895307)	2.167635 / 1.541195 (0.626441)	2.257448 / 1.468490 (0.788958)	0.641298 / 4.584777 (-3.943479)	4.828676 / 3.745712 (1.082964)	4.346069 / 5.269862 (-0.923793)	2.103890 / 4.565676 (-2.461786)	0.079115 / 0.424275 (-0.345160)	0.013377 / 0.007607 (0.005770)	0.621207 / 0.226044 (0.395162)	6.190939 / 2.268929 (3.922011)	2.920129 / 55.444624 (-52.524495)	2.549225 / 6.876477 (-4.327252)	2.719221 / 2.142072 (0.577149)	0.790949 / 4.805227 (-4.014278)	0.172032 / 6.500664 (-6.328632)	0.077779 / 0.075469 (0.002310)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.432572 / 1.841788 (-0.409216)	21.000031 / 8.074308 (12.925723)	17.555093 / 10.191392 (7.363701)	0.166646 / 0.680424 (-0.513778)	0.020451 / 0.534201 (-0.513750)	0.488767 / 0.579283 (-0.090516)	0.737036 / 0.434364 (0.302672)	0.621694 / 0.540337 (0.081356)	0.732074 / 1.386936 (-0.654862)

PyArrow==latest

Show updated benchmarks!

Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.008198 / 0.011353 (-0.003155)	0.004987 / 0.011008 (-0.006021)	0.090714 / 0.038508 (0.052206)	0.053379 / 0.023109 (0.030270)	0.425199 / 0.275898 (0.149301)	0.514036 / 0.323480 (0.190556)	0.006043 / 0.007986 (-0.001943)	0.003888 / 0.004328 (-0.000441)	0.088294 / 0.004250 (0.084043)	0.073024 / 0.037052 (0.035971)	0.435983 / 0.258489 (0.177494)	0.514293 / 0.293841 (0.220452)	0.039451 / 0.128546 (-0.089095)	0.010439 / 0.075646 (-0.065207)	0.096885 / 0.419271 (-0.322387)	0.060165 / 0.043533 (0.016632)	0.421053 / 0.255139 (0.165914)	0.455545 / 0.283200 (0.172345)	0.027234 / 0.141683 (-0.114449)	1.768975 / 1.452155 (0.316820)	1.842853 / 1.492716 (0.350137)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.278940 / 0.018006 (0.260933)	0.480709 / 0.000490 (0.480219)	0.000436 / 0.000200 (0.000236)	0.000070 / 0.000054 (0.000016)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.034900 / 0.037411 (-0.002511)	0.144893 / 0.014526 (0.130368)	0.149567 / 0.176557 (-0.026989)	0.213200 / 0.737135 (-0.523935)	0.156735 / 0.296338 (-0.139604)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.535897 / 0.215209 (0.320687)	5.336998 / 2.077655 (3.259343)	2.685854 / 1.504120 (1.181734)	2.470177 / 1.541195 (0.928983)	2.547495 / 1.468490 (1.079004)	0.642830 / 4.584777 (-3.941947)	4.595866 / 3.745712 (0.850154)	2.186696 / 5.269862 (-3.083165)	1.317969 / 4.565676 (-3.247708)	0.079268 / 0.424275 (-0.345007)	0.013792 / 0.007607 (0.006185)	0.662236 / 0.226044 (0.436192)	6.604775 / 2.268929 (4.335847)	3.355888 / 55.444624 (-52.088736)	2.968911 / 6.876477 (-3.907565)	3.121862 / 2.142072 (0.979790)	0.794752 / 4.805227 (-4.010475)	0.170800 / 6.500664 (-6.329864)	0.078393 / 0.075469 (0.002924)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.601605 / 1.841788 (-0.240183)	20.743553 / 8.074308 (12.669245)	17.543968 / 10.191392 (7.352576)	0.221884 / 0.680424 (-0.458540)	0.020779 / 0.534201 (-0.513422)	0.479677 / 0.579283 (-0.099606)	0.516207 / 0.434364 (0.081843)	0.564046 / 0.540337 (0.023709)	0.711336 / 1.386936 (-0.675600)

LysandreJik · 2023-07-05T14:25:05Z

Yes, sounds great! Thanks

julien-c

Note that anyway, like transformers, datasets already picks up the user's token automatically, no?

lhoestq · 2023-07-05T15:22:19Z

yup

Deprecate use_auth_token in favor of token

1ec069f

Merge branch 'main' of github.com:huggingface/datasets into deprecate…

21d0fd0

…-use_auth_token

mariosasko marked this pull request as ready for review June 30, 2023 13:26

mariosasko requested a review from lhoestq June 30, 2023 13:26

lhoestq reviewed Jun 30, 2023

lhoestq approved these changes Jun 30, 2023

mariosasko merged commit 819bb43 into main Jul 3, 2023

mariosasko deleted the deprecate-use_auth_token branch July 3, 2023 16:03

julien-c reviewed Jul 5, 2023

This was referenced Jul 28, 2023

Fix deprecation of use_auth_token in DownloadConfig #6094

Merged

Fix deprecation of use_auth_token in file_utils #6107

Merged

albertvillanova mentioned this pull request Sep 11, 2024

Evaluate uses deprecated use_auth_token and will break with datasets-3.0 huggingface/evaluate#620

Closed
