Skip to content

Conversation

rabidaudio
Copy link
Contributor

@rabidaudio rabidaudio commented Jul 13, 2022

Hello folks!

I really wanted to set up Dagster with a test command that runs pytest on the project:

utilities:
  - name: dagster
    namespace: dagster
    pip_url: -e orchestrate
    executable: dagster
    settings:
    - name: home
      env: DAGSTER_HOME
    config:
      home: $MELTANO_PROJECT_ROOT/orchestrate
    commands:
      scheduler:
        args: run -w $DAGSTER_HOME/workspace.yaml
        executable: dagster-daemon
      ui:
        args: -w $DAGSTER_HOME/workspace.yaml -h 0.0.0.0
        executable: dagit
      test:
        args: --rootdir=$DAGSTER_HOME
        executable: pytest

However, despite setting --rootdir in the pytest args, pytest insisted on picking up tests and conftest.py files from elsewhere in our project, like transforms/dbt_modules.

I figured it made sense to allow plugins and commands to define a working directory to be invoked in, instead of the current working directory:

utilities:
  - name: dagster
    namespace: dagster
    pip_url: -e orchestrate
    executable: dagster
    cwd: orchestrate # also supports variable expansion, e.g. $MELTANO_PROJECT_ROOT/orchestrate although that's redundant
    commands:
      test:
        executable: pytest
        cwd: orchestrate/tests # commands can override the plugin config

This simplified DbtPlugin, as it no longer needs a custom PluginInvoker, it just needs to set the cwd to the project directory. This could be further simplified if we defined it in the discovery configuration, but to avoid backward compatibility issues I explicitly implemented the method.

This could use some docs, but wanted to get y'alls thoughts first.

@netlify
Copy link

netlify bot commented Jul 13, 2022

👷 Deploy request for meltano pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 3c5066e

@rabidaudio rabidaudio changed the title Allow setting the working directory for plugin invocation feat: Allow setting the working directory for plugin invocation Jul 13, 2022
@pandemicsyn
Copy link
Contributor

👍 I like the idea, a cwd option for plugins makes a lot of sense. Especially in the context of enabling a robust external plugin architecture (e.g. #6130 which leads to #6434 , #6398). Feels like this could end up being pretty useful.

@aaronsteers curious how you feel about this ?

root = alternate_cwd_plugin_invoker.project.root
assert alternate_cwd_plugin_invoker.cwd() == root / "path"
assert alternate_cwd_plugin_invoker.cwd(command="relative") == root / "path2"
assert alternate_cwd_plugin_invoker.cwd(command="absolute") == Path("/root")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up - we run tests on windows now as well, which are currently failing:

https://github.com/meltano/meltano/runs/7313447720?check_suite_focus=true#step:9:648

_________________________ TestPluginInvoker.test_cwd __________________________
self = <test_plugin_invoker.TestPluginInvoker object at 0x000001DAA9868310>
alternate_cwd_plugin_invoker = <meltano.core.plugin_invoker.PluginInvoker object at 0x000001DAADEFCBB0>
    def test_cwd(self, alternate_cwd_plugin_invoker):
        root = alternate_cwd_plugin_invoker.project.root
        assert alternate_cwd_plugin_invoker.cwd() == root / "path"
        assert alternate_cwd_plugin_invoker.cwd(command="relative") == root / "path2"
>       assert alternate_cwd_plugin_invoker.cwd(command="absolute") == Path("/root")
E       AssertionError: assert WindowsPath('C:/root') == WindowsPath('/root')
E        +  where WindowsPath('C:/root') = <bound method PluginInvoker.cwd of <meltano.core.plugin_invoker.PluginInvoker object at 0x000001DAADEFCBB0>>(command='absolute')
E        +    where <bound method PluginInvoker.cwd of <meltano.core.plugin_invoker.PluginInvoker object at 0x000001DAADEFCBB0>> = <meltano.core.plugin_invoker.PluginInvoker object at 0x000001DAADEFCBB0>.cwd
E        +  and   WindowsPath('/root') = Path('/root')
D:\a\meltano\meltano\tests\meltano\core\test_plugin_invoker.py:184: AssertionError

@aaronsteers
Copy link
Contributor

👍 I like the idea, a cwd option for plugins makes a lot of sense. Especially in the context of enabling a robust external plugin architecture (e.g. #6130 which leads to #6434 , #6398). Feels like this could end up being pretty useful.

@aaronsteers curious how you feel about this ?

@rabidaudio - Love this direction and I agree the feature can be very helpful. I may want to hold while we settle the spec here but hopefully not leaving you in a holding pattern for long.

A few thoughts come to mind:

  1. cwd is not an obvious acronym for non-*nix folks out there. I wonder if something like workdir: __ would be more readable for a broader audience. Inspired by Docker's naming of a similar concept: https://www.educative.io/answers/what-is-the-workdir-command-in-docker
  2. dbt plugins are doing something similar today with the project_dir config option, which I think at least in theory could be replaced by this new feature. Thoughts? Example here: https://github.com/meltano/hub/blob/main/_data/meltano/transformers/dbt-snowflake/dbt-labs.yml#L17-L19
  3. We should plan ahead for users to override this and make sure we implement in a way that's as easy as possible to debug and not feel too magical. Like pip_url, we may want to this to be one of the settings that auto-registers itself into meltano.yml during the add process, and is therefor easy and intuitive for users to override if/when they start moving things around.

@pandemicsyn, @rabidaudio, @tayloramurphy, @pandemicsyn - Penny for your thoughts. Thanks! 😄

Comment on lines 19 to 24
def cwd(
self, command: Optional[str] = None, env: Optional[dict] = None
) -> Optional[Path]:
return (
super().cwd(command=command, env=env) or self.plugin_config["project_dir"]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superceding my prior comment about cwd replacing the project_dir setting of dbt, I see here that yes, absolutely this appears to replace that functionality - and I like that we're doing it in a generic way for all plugins, which would not require special per-plugin logic internally within Meltano.

Also - Thanks for keeping this backwards compatible. As much as I like the generic approach, we also don't want this to break existing dbt capabilities/behaviors.

@rabidaudio
Copy link
Contributor Author

cwd is not an obvious acronym for non-*nix folks out there. I wonder if something like workdir: __ would be more readable for a broader audience.

I agree, I think I like workdir better 👍

@tayloramurphy
Copy link
Collaborator

tayloramurphy commented Jul 19, 2022

@aaronsteers @rabidaudio I like this proposal and excited about the PR. Responding to AJ's points:

  1. Agreed, workdir is better than cwd.
  2. this makes me think about Python-based Plugin Architecture #6130 as well - seems like the plugin specific behavior should be there and not in Meltano
  3. Agreed also that users should be able to override.

I don't agree that it should auto-register itself in meltano.yml during the add process. Likely the default behavior should stay the same but we should document in multiple places about the availability of this plugin-level property.

Side note, our documentation on plugin properties versus config versus settings versus extras is a bit confusing. The addition of a workdir property can actually be thought of as a plugin extra that happens to apply to all plugins (aka a property!). While testing this, I also realized it's not possible to know what environment variables Meltano will be injecting from the plugin-level based on the property names https://docs.meltano.com/guide/configuration#available-environment-variables I'll make an issue around this. Edit: Issue made #6450

All that said, I agree with the spec that users should be able to set workdir at the plugin level and override it at the command level. This value should also be available to users at MELTANO_<PLUGIN_TYPE>_WORKDIR based on the documentation in https://docs.meltano.com/guide/configuration#available-environment-variables.

Question: Would env var expansion in the commands work as well such that:

utilities:
  - name: dagster
    namespace: dagster
    pip_url: -e orchestrate
    executable: dagster
    workdir: orchestrate # also supports variable expansion, e.g. $MELTANO_PROJECT_ROOT/orchestrate although that's redundant
    commands:
      test:
        executable: pytest
        workdir: $MELTANO_UTILITY_WORKDIR/tests

would evaluate to $MELTANO_PROJECT_ROOT/orchestrate/tests for the test command?

Perhaps that's a separate, larger issue on if and how commands are dealing with environment variables.

@aaronsteers
Copy link
Contributor

aaronsteers commented Jul 19, 2022

@tayloramurphy - Thanks! Yeah, sounds like we are agreed on most points. A couple thoughts on the points you raised:

this makes me think about #6130 as well - seems like the plugin specific behavior should be there and not in Meltano

Agreed. 👍 From the diff here, I think we are moving in the right direction, which will helps us towards #6130 because it removes/de-emphasizes the custom dbt behavior in favor of this more generic approach.

I don't agree that it should auto-register itself in meltano.yml during the add process.

To confirm, all I meant here is that if the remote plugin definition (on the hub, for instance) is defining a custom workdir, we should try to make that custom workdir as transparent and easy to override as possible for an end user, so they are not confused when move a directory breaks the functionality. If the plugin does override its workdir then I think it is a good idea to bring that into the meltano.yml file so they can see+modify what was provided to them by the plugin developer. (They otherwise would find this info by browsing the respective lock file.)

For plugins which do not require a custom workdir by default, there's no reason to bring that into the meltano.yml file by default. Also, in a first iteration, I'd be fine with either behavior. I do think bringing it into meltano.yml is probably the better long-term path because it is makes everything obvious and easy-to-debug.

This value should also be available to users at MELTANO_<PLUGIN_TYPE>_WORKDIR based on the documentation in docs.meltano.com/guide/configuration#available-environment-variables.

In that case (and if resolution order permits it), dbt's project_dir setting definition might actually just be declared with a default value: $MELTANO_UTILITY_WORKDIR. I had not considered this previously, and you bring up a very good point here and in your comment re env var expansion.

Optional:

For extra stability to the plugin developer, we might consider also populating workdir (and it's env var) with the value of MELTANO_PROJECT_ROOT if no explicit workdir is provided. This would allow the plugin to 'not care' how it is getting its workdir context, but trust it is there for referencing regardless regardless of how/if it was overridden.

@rabidaudio - No need to tackle everything in the above in a first iteration but I'm curious your thoughts on these points. For my part, the only things I see as strictly required are (1) agreement on the field name (workdir vs cwd or alternatives) and (2) enough exploration on behaviors like env var expansion that we can provide docs that tell users what to expect.

Have you seen the new process regarding integration tests? Adding this functionality to an existing integration test, or writing a new one would be super helpful. Let us know if you feel comfortable adding yourself or if you'd like assist from a team member on that.

@tayloramurphy
Copy link
Collaborator

To confirm, all I meant here is that if the remote plugin definition (on the hub, for instance) is defining a custom workdir, we should try to make that custom workdir as transparent and easy to override as possible for an end user, so they are not confused when move a directory breaks the functionality.

Ah ok, makes sense 👍

For extra stability to the plugin developer, we might consider also populating workdir (and it's env var) with the value of MELTANO_PROJECT_ROOT if no explicit workdir is provided.

I quite like that! Basically just have it as a default property of every plugin than can be overridden. Nice idea @aaronsteers 😄

@rabidaudio
Copy link
Contributor Author

rabidaudio commented Jul 19, 2022

This value should also be available to users at MELTANO_<PLUGIN_TYPE>_WORKDIR based on the documentation in https://docs.meltano.com/guide/configuration#available-environment-variables.

In that case (and if resolution order permits it), dbt's project_dir setting definition might actually just be declared with a default value: $MELTANO_UTILITY_WORKDIR.

@tayloramurphy @aaronsteers

MELTANO_<PLUGIN_TYPE>__WORKDIR (note the double underscores) works out of the box, although currently it's not expanded to an absolute path. Expanding it and changing the name should be easy to include. After that the pytest example should work as written!

My thought was doing things like workdir: $MELTANO_TRANSFORM_PROJECT_DIR but I like this way better, where the config is derived from the workdir setting rather than the other way around.

@rabidaudio
Copy link
Contributor Author

rabidaudio commented Jul 19, 2022

For extra stability to the plugin developer, we might consider also populating workdir (and it's env var) with the value of MELTANO_PROJECT_ROOT if no explicit workdir is provided.

I quite like that! Basically just have it as a default property of every plugin than can be overridden.

I like this too. Two concerns:

  1. It's probably a breaking change, as currently all plugins execute in the shell working directory by default, not the project root directory
  2. It complicates the override for DbtInvoker, as it's hard to tell if the value has been overridden

@codecov
Copy link

codecov bot commented Jul 19, 2022

Codecov Report

Merging #6409 (3c5066e) into main (33ea409) will increase coverage by 0.11%.
The diff coverage is 88.88%.

@@            Coverage Diff             @@
##             main    #6409      +/-   ##
==========================================
+ Coverage   88.97%   89.08%   +0.11%     
==========================================
  Files         276      277       +1     
  Lines       19559    19753     +194     
  Branches     1927     1947      +20     
==========================================
+ Hits        17402    17597     +195     
+ Misses       1829     1826       -3     
- Partials      328      330       +2     
Impacted Files Coverage Δ
src/meltano/core/plugin/base.py 97.24% <ø> (ø)
src/meltano/core/plugin/command.py 100.00% <ø> (ø)
src/meltano/core/plugin/dbt/base.py 77.77% <66.66%> (+0.50%) ⬆️
tests/fixtures/core.py 89.96% <71.42%> (-0.48%) ⬇️
src/meltano/core/plugin_invoker.py 94.56% <87.50%> (-0.76%) ⬇️
tests/meltano/core/plugin/test_command.py 100.00% <100.00%> (ø)
tests/meltano/core/test_plugin_invoker.py 96.00% <100.00%> (+0.70%) ⬆️
tests/meltano/core/job/test_job.py 92.80% <0.00%> (-3.82%) ⬇️
src/meltano/core/project_plugins_service.py 89.78% <0.00%> (-1.08%) ⬇️
src/meltano/core/behavior/canonical.py 95.34% <0.00%> (-0.81%) ⬇️
... and 18 more

@aaronsteers
Copy link
Contributor

For extra stability to the plugin developer, we might consider also populating workdir (and it's env var) with the value of MELTANO_PROJECT_ROOT if no explicit workdir is provided.

I quite like that! Basically just have it as a default property of every plugin than can be overridden.

I like this too. Two concerns:

  1. It's probably a breaking change, as currently all plugins execute in the shell working directory by default, not the project root directory
  2. It complicates the override for DbtInvoker, as it's hard to tell if the value has been overridden

Thanks for calling this out! It's a really good point and based on your points above, I withdraw my suggestion :) - specifically on the basis that the working directory will often need to match the shell directory. For example, if invoking sqlfluff on a specific file in the shell's directory (e.g. cd transforms/marketing && meltano invoke sqlfluff:fix ./my_table.sql) we would not want the shell's own working directory to get lost in that case.

I'll think on this a bit more, but per the above, it makes sense to only have an explicit workdir if the plugin explicitly defines it.

@tayloramurphy
Copy link
Collaborator

@aaronsteers @rabidaudio what's the status on this? Keen to get it in 😄

@rabidaudio
Copy link
Contributor Author

@aaronsteers @rabidaudio what's the status on this? Keen to get it in 😄

@tayloramurphy AFAIK this ready to go. We may want to add some documentation, not sure where is best for that as it's more for plugin creators than end users.

@edgarrmondragon
Copy link
Collaborator

@rabidaudio it might be good to update the JSON schema in schema/meltano.schema.json to know about this field.

@aaronsteers
Copy link
Contributor

aaronsteers commented Oct 10, 2022

@rabidaudio, @tayloramurphy - It looks like we still need a docs entry and some CI tests are not passing.

  • Docs entry for new workdir config. (Perhaps in the plugin reference page?)
  • CI tests passing or signoff. (I see dependency review, which we can ignore, but pre-commit tests should pass and we should make a call on if the coverage alarm can be ignored or if we should add additional tests.)
  • Merge conflicts to resolve. (Optionally we can wait for the above to resolve so as not to have to go through multiple merge conflict passes.)
  • Updating JSON Schema (schema/meltano.schema.json?) with the new field, per @edgarrmondragon's comment above.

@cjohnhanson - If any of the above need assist, would you be able to jump in?

@aaronsteers
Copy link
Contributor

@cjohnhanson - As discussed, feel free to jump in on the above to get this over the line - as time permits.

@tayloramurphy
Copy link
Collaborator

Hey @rabidaudio - I'm going to close this as we're not going to have time to get this over the finish line ourselves. AJ's comment here has a good summary of what it would take to get this over the line. We'd still love to have this in but seeing as it isn't a priority for Cloud we're going to make the tough call to close it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants