Skip to content

Conversation

collijk
Copy link
Contributor

@collijk collijk commented Aug 21, 2018

Fix #751.

Anytime unique args were treated as kwargs, joblib was failing. For instance

from joblib import Memory

m = Memory(location='joblib_cache')

@m.cache
def plus_one(a):
    return a + 1

>>> plus_one(1)
[Memory] Calling __main__--home-james-<stdin>.plus_one...
plus_one(1)
_________________________________________________________plus_one - 1.3s, 0.0min
2
>>>plus_one(a=1)
2
>>>plus_one(a=2)
[Memory] Calling __main__--home-james-<stdin>.plus_one...
plus_one(a=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 511, in __call__
    return self._cached_call(args, kwargs)[0]
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 455, in _cached_call
    out, metadata = self.call(*args, **kwargs)
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 675, in call
    func_id, args_id = self._get_output_identifiers(*args, **kwargs)
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 717, in _persist_input
    func_id, args_id = self._get_output_identifiers(*args, **kwargs)
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 532, in _get_output_identifiers
    argument_hash = self._get_argument_hash(*args, **kwargs)
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/memory.py", line 526, in _get_argument_hash
    return hashing.hash(filter_args(self.func, self.ignore, args, kwargs),
  File "/home/james/miniconda3/envs/vivarium/lib/python3.6/site-packages/joblib/func_inspect.py", line 288, in filter_args
    _function_called_str(name, args, kwargs))
ValueError: Wrong number of arguments for plus_one(a):
     plus_one(, ) was called.
Wrong number of arguments for plus_one(a):
     plus_one(, ) was called.

This fixes the error by not consuming kwargs during func_inspect.filter_args.

@codecov
Copy link

codecov bot commented Aug 21, 2018

Codecov Report

Merging #750 into master will increase coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #750     +/-   ##
=========================================
+ Coverage   95.25%   95.36%   +0.1%     
=========================================
  Files          43       43             
  Lines        6197     6208     +11     
=========================================
+ Hits         5903     5920     +17     
+ Misses        294      288      -6
Impacted Files Coverage Δ
joblib/func_inspect.py 95.45% <100%> (+0.56%) ⬆️
joblib/test/test_memory.py 98.08% <100%> (+0.02%) ⬆️
joblib/test/test_func_inspect.py 91.86% <100%> (+0.27%) ⬆️
joblib/_parallel_backends.py 96% <0%> (-1.21%) ⬇️
joblib/test/test_numpy_pickle.py 98.35% <0%> (+0.65%) ⬆️
joblib/test/test_dask.py 98.11% <0%> (+2.51%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c492cf3...61f407f. Read the comment docs.

@lesteve
Copy link
Member

lesteve commented Aug 21, 2018

Good point, I ran a git bisect and it is another regression in 0.12 following the Memory store backend refactoring in #397. Would you be kind enough to add a test in joblib/test/test_memory.py and maybe one in joblib/test/test_func_inspect.py (to make sure that filter_args does not have a side effect on its kwargs parameter?

I have to say I don't really understand why filter_args was modifying its kwargs parameter but this may require a closer look at the changes that went into #397.

@collijk
Copy link
Contributor Author

collijk commented Aug 21, 2018

Yep. I'll take a look later today at the tests.

@collijk
Copy link
Contributor Author

collijk commented Aug 23, 2018

Now with regression tests against the end-to-end Memory usage and one particular use case of filter_args.

pytest output before fix:

======================================================================================= test session starts ========================================================================================
platform linux -- Python 3.6.6, pytest-3.7.2, py-1.5.4, pluggy-0.7.1
rootdir: /home/james/code/external_tools/joblib, inifile: setup.cfg
collected 1070 items / 2 skipped                                                                                                                                                                   

joblib/__init__.py s                                                                                                                                                                         [  0%]
joblib/parallel.py ss                                                                                                                                                                        [  0%]
joblib/test/test_backports.py s....                                                                                                                                                          [  0%]
joblib/test/test_disk.py .........                                                                                                                                                           [  1%]
joblib/test/test_format_stack.py s...                                                                                                                                                        [  1%]
joblib/test/test_func_inspect.py s.............................F......                                                                                                                       [  5%]
joblib/test/test_hashing.py sssssssssss..................................................................................................................................................... [ 20%]
............................................................................................................................................................................................ [ 37%]
....................................................................................................................................................ss.............                          [ 53%]
joblib/test/test_logger.py .                                                                                                                                                                 [ 53%]
joblib/test/test_memmapping.py ssssssssssssss..                                                                                                                                              [ 54%]
joblib/test/test_memory.py ss............F............................                                                                                                                       [ 58%]
joblib/test/test_my_exceptions.py ...                                                                                                                                                        [ 59%]
joblib/test/test_numpy_pickle.py sssssssssssssssssssss................................................................................s.                                                     [ 68%]
joblib/test/test_numpy_pickle_compat.py .                                                                                                                                                    [ 68%]
joblib/test/test_numpy_pickle_utils.py ..                                                                                                                                                    [ 68%]
joblib/test/test_parallel.py sssss.......................................................................................................................................................... [ 83%]
..................................................................................................................................................................ss                         [ 99%]
joblib/test/test_store_backends.py ...                                                                                                                                                       [ 99%]
joblib/test/test_testing.py .....                                                                                                                                                            [ 99%]
joblib/test/data/create_numpy_pickle.py s                                                                                                                                                    [100%]

============================================================================================= FAILURES =============================================================================================
_______________________________________________________________________________ test_filter_args_kwargs_consumption ________________________________________________________________________________

    def test_filter_args_kwargs_consumption():
        """Regression test against 0.12.0 changes.
    
        Make sure filter args doesn't consume the kwargs dict that gets passed to it.
        """
        kwargs = {'x': 0}
        filter_args(g, [], [], kwargs)
>       assert kwargs == {'x': 0}
E       AssertionError: assert {} == {'x': 0}
E         Right contains more items:
E         {'x': 0}
E         Use -v to get the full diff

joblib/test/test_func_inspect.py:233: AssertionError
____________________________________________________________________________________ test_memory_args_as_kwargs ____________________________________________________________________________________

func = <function test_memory_args_as_kwargs.<locals>.plus_one at 0x7f0d9fdd67b8>, ignore_lst = [], args = [], kwargs = {}

    def filter_args(func, ignore_lst, args=(), kwargs=dict()):
        """ Filters the given args and kwargs using a list of arguments to
            ignore, and a function specification.
    
            Parameters
            ----------
            func: callable
                Function giving the argument specification
            ignore_lst: list of strings
                List of arguments to ignore (either a name of an argument
                in the function spec, or '*', or '**')
            *args: list
                Positional arguments passed to the function.
            **kwargs: dict
                Keyword arguments passed to the function
    
            Returns
            -------
            filtered_args: list
                List of filtered positional and keyword arguments.
        """
        args = list(args)
        if isinstance(ignore_lst, _basestring):
            # Catch a common mistake
            raise ValueError(
                'ignore_lst must be a list of parameters to ignore '
                '%s (type %s) was given' % (ignore_lst, type(ignore_lst)))
        # Special case for functools.partial objects
        if (not inspect.ismethod(func) and not inspect.isfunction(func)):
            if ignore_lst:
                warnings.warn('Cannot inspect object %s, ignore list will '
                              'not work.' % func, stacklevel=2)
            return {'*': args, '**': kwargs}
        arg_spec = getfullargspec(func)
        arg_names = arg_spec.args + arg_spec.kwonlyargs
        arg_defaults = arg_spec.defaults or ()
        if arg_spec.kwonlydefaults:
            arg_defaults = arg_defaults + tuple(arg_spec.kwonlydefaults[k]
                                                for k in arg_spec.kwonlyargs
                                                if k in arg_spec.kwonlydefaults)
        arg_varargs = arg_spec.varargs
        arg_varkw = arg_spec.varkw
    
        if inspect.ismethod(func):
            # First argument is 'self', it has been removed by Python
            # we need to add it back:
            args = [func.__self__, ] + args
        # XXX: Maybe I need an inspect.isbuiltin to detect C-level methods, such
        # as on ndarrays.
    
        _, name = get_func_name(func, resolv_alias=False)
        arg_dict = dict()
        arg_position = -1
        for arg_position, arg_name in enumerate(arg_names):
            if arg_position < len(args):
                # Positional argument or keyword argument given as positional
                if arg_name not in arg_spec.kwonlyargs:
                    arg_dict[arg_name] = args[arg_position]
                else:
                    raise ValueError(
                        "Keyword-only parameter '%s' was passed as "
                        'positional parameter for %s:\n'
                        '     %s was called.'
                        % (arg_name,
                           _signature_str(name, arg_spec),
                           _function_called_str(name, args, kwargs))
                    )
    
            else:
                position = arg_position - len(arg_names)
                if arg_name in kwargs:
                    arg_dict[arg_name] = kwargs.pop(arg_name)
                else:
                    try:
>                       arg_dict[arg_name] = arg_defaults[position]
E                       IndexError: tuple index out of range

joblib/func_inspect.py:281: IndexError

During handling of the above exception, another exception occurred:

tmpdir = local('/tmp/pytest-of-james/pytest-13/test_memory_args_as_kwargs0')

    def test_memory_args_as_kwargs(tmpdir):
        """Regression test against 0.12.0 changes."""
        memory = Memory(location=tmpdir.strpath, verbose=0)
    
        @memory.cache
        def plus_one(a):
            return a + 1
    
        # This would (and should) pass before the patch.
        assert plus_one(1) == 2
        assert plus_one(a=1) == 2
    
        # However, a positional argument that joblib hadn't seen
        # before would cause a failure if it was passed as a kwarg
>       assert plus_one(a=2) == 3

joblib/test/test_memory.py:362: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
joblib/memory.py:514: in __call__
    return self._cached_call(args, kwargs)[0]
joblib/memory.py:458: in _cached_call
    out, metadata = self.call(*args, **kwargs)
joblib/memory.py:685: in call
    metadata = self._persist_input(duration, args, kwargs)
joblib/memory.py:719: in _persist_input
    func_id, args_id = self._get_output_identifiers(*args, **kwargs)
joblib/memory.py:535: in _get_output_identifiers
    argument_hash = self._get_argument_hash(*args, **kwargs)
joblib/memory.py:529: in _get_argument_hash
    return hashing.hash(filter_args(self.func, self.ignore, args, kwargs),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

func = <function test_memory_args_as_kwargs.<locals>.plus_one at 0x7f0d9fdd67b8>, ignore_lst = [], args = [], kwargs = {}

    def filter_args(func, ignore_lst, args=(), kwargs=dict()):
        """ Filters the given args and kwargs using a list of arguments to
            ignore, and a function specification.
    
            Parameters
            ----------
            func: callable
                Function giving the argument specification
            ignore_lst: list of strings
                List of arguments to ignore (either a name of an argument
                in the function spec, or '*', or '**')
            *args: list
                Positional arguments passed to the function.
            **kwargs: dict
                Keyword arguments passed to the function
    
            Returns
            -------
            filtered_args: list
                List of filtered positional and keyword arguments.
        """
        args = list(args)
        if isinstance(ignore_lst, _basestring):
            # Catch a common mistake
            raise ValueError(
                'ignore_lst must be a list of parameters to ignore '
                '%s (type %s) was given' % (ignore_lst, type(ignore_lst)))
        # Special case for functools.partial objects
        if (not inspect.ismethod(func) and not inspect.isfunction(func)):
            if ignore_lst:
                warnings.warn('Cannot inspect object %s, ignore list will '
                              'not work.' % func, stacklevel=2)
            return {'*': args, '**': kwargs}
        arg_spec = getfullargspec(func)
        arg_names = arg_spec.args + arg_spec.kwonlyargs
        arg_defaults = arg_spec.defaults or ()
        if arg_spec.kwonlydefaults:
            arg_defaults = arg_defaults + tuple(arg_spec.kwonlydefaults[k]
                                                for k in arg_spec.kwonlyargs
                                                if k in arg_spec.kwonlydefaults)
        arg_varargs = arg_spec.varargs
        arg_varkw = arg_spec.varkw
    
        if inspect.ismethod(func):
            # First argument is 'self', it has been removed by Python
            # we need to add it back:
            args = [func.__self__, ] + args
        # XXX: Maybe I need an inspect.isbuiltin to detect C-level methods, such
        # as on ndarrays.
    
        _, name = get_func_name(func, resolv_alias=False)
        arg_dict = dict()
        arg_position = -1
        for arg_position, arg_name in enumerate(arg_names):
            if arg_position < len(args):
                # Positional argument or keyword argument given as positional
                if arg_name not in arg_spec.kwonlyargs:
                    arg_dict[arg_name] = args[arg_position]
                else:
                    raise ValueError(
                        "Keyword-only parameter '%s' was passed as "
                        'positional parameter for %s:\n'
                        '     %s was called.'
                        % (arg_name,
                           _signature_str(name, arg_spec),
                           _function_called_str(name, args, kwargs))
                    )
    
            else:
                position = arg_position - len(arg_names)
                if arg_name in kwargs:
                    arg_dict[arg_name] = kwargs.pop(arg_name)
                else:
                    try:
                        arg_dict[arg_name] = arg_defaults[position]
                    except (IndexError, KeyError):
                        # Missing argument
                        raise ValueError(
                            'Wrong number of arguments for %s:\n'
                            '     %s was called.'
                            % (_signature_str(name, arg_spec),
>                              _function_called_str(name, args, kwargs))
                        )
E                       ValueError: Wrong number of arguments for plus_one(a):
E                            plus_one(, ) was called.

joblib/func_inspect.py:288: ValueError
======================================================================== 2 failed, 1003 passed, 67 skipped in 54.76 seconds ========================================================================
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up queue_manager_threads [(<Thread(QueueManagerThread, started daemon 139695514167040)>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7f0d9cf0c6a0>)]
[DEBUG:MainProcess:QueueManagerThread] queue management thread shutting down
[DEBUG:MainProcess:QueueManagerThread] closing call_queue
[DEBUG:MainProcess:QueueManagerThread] telling queue thread to quit
[DEBUG:MainProcess:QueueManagerThread] joining processes
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:QueueManagerThread] queue management thread clean shutdown of worker processes: []
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers

pytest output after fix:

======================================================================================= test session starts ========================================================================================
platform linux -- Python 3.6.6, pytest-3.7.2, py-1.5.4, pluggy-0.7.1
rootdir: /home/james/code/external_tools/joblib, inifile: setup.cfg
collected 1070 items / 2 skipped                                                                                                                                                                   

joblib/__init__.py s                                                                                                                                                                         [  0%]
joblib/parallel.py ss                                                                                                                                                                        [  0%]
joblib/test/test_backports.py s....                                                                                                                                                          [  0%]
joblib/test/test_disk.py .........                                                                                                                                                           [  1%]
joblib/test/test_format_stack.py s...                                                                                                                                                        [  1%]
joblib/test/test_func_inspect.py s....................................                                                                                                                       [  5%]
joblib/test/test_hashing.py sssssssssss..................................................................................................................................................... [ 20%]
............................................................................................................................................................................................ [ 37%]
....................................................................................................................................................ss.............                          [ 53%]
joblib/test/test_logger.py .                                                                                                                                                                 [ 53%]
joblib/test/test_memmapping.py ssssssssssssss..                                                                                                                                              [ 54%]
joblib/test/test_memory.py ss.........................................                                                                                                                       [ 58%]
joblib/test/test_my_exceptions.py ...                                                                                                                                                        [ 59%]
joblib/test/test_numpy_pickle.py sssssssssssssssssssss................................................................................s.                                                     [ 68%]
joblib/test/test_numpy_pickle_compat.py .                                                                                                                                                    [ 68%]
joblib/test/test_numpy_pickle_utils.py ..                                                                                                                                                    [ 68%]
joblib/test/test_parallel.py sssss.......................................................................................................................................................... [ 83%]
..................................................................................................................................................................ss                         [ 99%]
joblib/test/test_store_backends.py ...                                                                                                                                                       [ 99%]
joblib/test/test_testing.py .....                                                                                                                                                            [ 99%]
joblib/test/data/create_numpy_pickle.py s                                                                                                                                                    [100%]

============================================================================= 1005 passed, 67 skipped in 52.61 seconds =============================================================================
[INFO:MainProcess:MainThread] process shutting down
[DEBUG:MainProcess:MainThread] running all "atexit" finalizers with priority >= 0
[DEBUG:MainProcess:MainThread] Interpreter shutting down. Waking up queue_manager_threads [(<Thread(QueueManagerThread, started daemon 139697284175616)>, <joblib.externals.loky.process_executor._ThreadWakeup object at 0x7f0df8e796d8>)]
[DEBUG:MainProcess:QueueManagerThread] queue management thread shutting down
[DEBUG:MainProcess:QueueManagerThread] closing call_queue
[DEBUG:MainProcess:QueueManagerThread] telling queue thread to quit
[DEBUG:MainProcess:QueueManagerThread] joining processes
[DEBUG:MainProcess:QueueFeederThread] feeder thread got sentinel -- exiting
[DEBUG:MainProcess:QueueManagerThread] queue management thread clean shutdown of worker processes: []
[DEBUG:MainProcess:MainThread] running the remaining "atexit" finalizers

@collijk
Copy link
Contributor Author

collijk commented Aug 23, 2018

I'd have written some more extensive tests, but filter_args is a pretty hairy and hard to test function.

Copy link
Contributor

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the non-regression tests. I have reworded them a bit and added an entry to the changelog. I will merge as soon as CI is green.

@ogrisel ogrisel merged commit 370dca7 into joblib:master Aug 23, 2018
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 4, 2018
* tag '0.12.3': (23 commits)
  Release 0.12.3
  Loky 2.2.1 (joblib#760)
  FIX: FileSystemStoreBackend string representation only returning location (path) (joblib#765)
  Add optional dependency on psutil
  MAINT remove brittle time based assertion in test (joblib#761)
  Fix a bug in nesting level computation with FallbackToBackend(SequentialBackend()) (joblib#759)
  Make docstring more consistent with project style
  Improved performance of call_and_shelve (joblib#757)
  Better test name
  Fix default context handling (joblib#754)
  cloudpickle 0.5.5 (joblib#756)
  Fixed error where filter args was consuming kwargs (joblib#750)
  FIX pickle roundtrip for Memory and related classes. (joblib#746)
  test that passes but timeout appveyor because of unclosed semaphore tracker (joblib#676)
  DOC: fine tune compressor example dataset size for ReadTheDocs (joblib#753)
  FIX: MemorizedResult not picklable (joblib#752)
  [MRG] Better message with py27 when lz4 is not installed (joblib#740)
  MNT remove mutable default value for backend_options parameter (joblib#748)
  MNT create test file for _store_backends module (joblib#749)
  DOC consistently use memory rather than mem in memory.rst (joblib#744)
  ...
yarikoptic added a commit to yarikoptic/joblib that referenced this pull request Sep 4, 2018
* releases: (23 commits)
  Release 0.12.3
  Loky 2.2.1 (joblib#760)
  FIX: FileSystemStoreBackend string representation only returning location (path) (joblib#765)
  Add optional dependency on psutil
  MAINT remove brittle time based assertion in test (joblib#761)
  Fix a bug in nesting level computation with FallbackToBackend(SequentialBackend()) (joblib#759)
  Make docstring more consistent with project style
  Improved performance of call_and_shelve (joblib#757)
  Better test name
  Fix default context handling (joblib#754)
  cloudpickle 0.5.5 (joblib#756)
  Fixed error where filter args was consuming kwargs (joblib#750)
  FIX pickle roundtrip for Memory and related classes. (joblib#746)
  test that passes but timeout appveyor because of unclosed semaphore tracker (joblib#676)
  DOC: fine tune compressor example dataset size for ReadTheDocs (joblib#753)
  FIX: MemorizedResult not picklable (joblib#752)
  [MRG] Better message with py27 when lz4 is not installed (joblib#740)
  MNT remove mutable default value for backend_options parameter (joblib#748)
  MNT create test file for _store_backends module (joblib#749)
  DOC consistently use memory rather than mem in memory.rst (joblib#744)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Caching breaks when new (uncached) args are treated as kwargs.
3 participants