Provide convenience for establishing public url for local clones of otherwise public submodules #5950

@yarikoptic

What is the problem?

A common scenario for creating a derived dataset is to use a local clone (with data, possibly a reckless clone of some kind) of a publicly available dataset, with the intention of then publishing the derived dataset publicly. When we do a local clone first, it records the local "url" within .gitmodules, requiring the user to change it to a public one before publishing. Not a big deal I guess, but still: the user needs to know to do that and how to do that (in an editor, or via the not immediately obvious git config --file .gitmodules --replace-all submodule.{name}.url {public_url}, followed by save|commit, etc.).
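For illustration, the manual fix described above could be scripted roughly like this. It is only a sketch of the current workaround, not an existing DataLad helper: the function names, the commit message, and the example URL are made up here, and it assumes `git` is on the PATH and that it runs against the superdataset containing .gitmodules.

```python
import subprocess


def gitmodules_set_url_cmd(name, public_url, gitmodules=".gitmodules"):
    """Build the git-config call that rewrites submodule.<name>.url.

    Hypothetical helper; it only assembles the command quoted in the issue.
    """
    return [
        "git", "config", "--file", gitmodules,
        "--replace-all", f"submodule.{name}.url", public_url,
    ]


def set_public_url(name, public_url, repo_dir="."):
    """Rewrite the recorded URL in .gitmodules and commit the change."""
    subprocess.run(gitmodules_set_url_cmd(name, public_url),
                   cwd=repo_dir, check=True)
    subprocess.run(["git", "add", ".gitmodules"], cwd=repo_dir, check=True)
    subprocess.run(
        ["git", "commit", "-m", f"Point submodule {name} to its public URL"],
        cwd=repo_dir, check=True,
    )


if __name__ == "__main__":
    # Placeholder values; this only prints the command, it changes nothing.
    print(" ".join(gitmodules_set_url_cmd(
        "sourcedata", "https://example.com/public/dataset.git")))
```

The final commit could just as well be a datalad save; plain git is used here only to keep the sketch free of extra dependencies.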

I think we should provide some convenience to streamline such use cases.

Candidate approaches requiring development in DataLad:

  • approach 1: add a new option, clone [--use-sibling-url [NAME]], which would work only for clones from local repos and would take their remote's URL (the tracked remote's if no NAME is provided) as the URL to use for submodule.{}.url

I guess the original url would be recorded in submodule.{}.datalad-url, in a similar fashion to what is done (only for RIA remotes, AFAIK) in

ds.repo.call_git(

Then it would be

datalad clone --use-sibling-url --reckless ephemeral ../orig sourcedata
... do whatever
datalad push|publish
  • approach 2: make it possible to "add" a sibling in a way that gives the current annex a reckless/ephemeral .git/annex/objects, thus possibly mutating what it already has. Then it would be
datalad clone -d . public_url sourcedata
datalad siblings add -n local --reckless ephemeral ...
... do whatever
datalad push|publish
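To make approach 1 a bit more concrete, here is a rough sketch of how the URL lookup behind the proposed --use-sibling-url could work. Nothing here exists in DataLad: `sibling_url`, `tracked_remote`, and `run_git` are hypothetical names, and the sketch assumes the local source is a regular git repository whose current branch has a tracked remote configured.

```python
import subprocess


def run_git(repo_dir, *args):
    """Run a git command in repo_dir and return its stripped stdout."""
    out = subprocess.run(["git", *args], cwd=repo_dir, check=True,
                         capture_output=True, text=True)
    return out.stdout.strip()


def tracked_remote(repo_dir, run=run_git):
    """Name of the remote tracked by the currently checked-out branch."""
    branch = run(repo_dir, "symbolic-ref", "--short", "HEAD")
    return run(repo_dir, "config", "--get", f"branch.{branch}.remote")


def sibling_url(repo_dir, name=None, run=run_git):
    """URL of remote `name` in the local source repo.

    Falls back to the tracked remote when no name is given, mirroring
    the optional NAME argument of the proposed option.
    """
    remote = name if name is not None else tracked_remote(repo_dir, run)
    return run(repo_dir, "remote", "get-url", remote)
```

With something like this, clone could record sibling_url("../orig") as submodule.{}.url while still fetching from the local path. The `run` parameter is only there so the lookup can be exercised without a real repository.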

We also have ways to extend candidate urls via datalad-specific configuration in .datalad/config (e.g., as described in the handbook), but I think that is not directly relevant: the question here is one of fixing the submodule.{}.url entries in .gitmodules, which aren't usable anywhere but locally in such local reckless clone cases.

Maybe there are better approaches, which might not even require extending our API?

Labels: UX (user experience)
