Skip to content

Support Wildcards for all integrations in UrlReaderProcessor #22504

@benjdlambert

Description

@benjdlambert

🔖 Feature description

Currently we detect wether a URL for reading is a search query by detecting if there's a wildcard in the URL.

const { filepath } = parseGitUrl(location);
if (filepath?.match(/[*?]/)) {

This is not a great approach for many reasons, but one being that it doesn't work for non Git URL's of which there are many. One use case is #21955, when reading files from Google Cloud Storage URLs, which of course don't parse into a Git URL.

We attempted to fix that, by parsing that URL as is, and checking the pathname:

https://github.com/backstage/backstage/blob/c3249d6/plugins/catalog-backend/src/modules/core/UrlReaderProcessor.ts#L127-L128

But this caused issues with the AzureURLReader #22465, as the actual wildcard is in the querystring not the pathname e.g https://dev.azure.com/my-org/my-project/_git/my-repo?path=/templates/*/*.yaml

🎤 Context

We should support reading from sources consistently, and support wildcards for all URLReaders.

✌️ Possible Implementation

I think that there could be an option to flip this around a little bit, and have a isSearchUrl on the URLReader implementations which can be used to detect if the URL which has been provided is a wildcard URL before it is then used here:

const response = await this.options.reader.search(location, { etag });

Something like this:

if (this.options.reader.isSearchUrl(location))) {
  const limiter = limiterFactory(5);
  const response = await this.options.reader.search(location, { etag });
  const output = response.files.map(async file => ({
    url: file.url,
    data: await limiter(file.content),
  }));
  return { response: await Promise.all(output), etag: response.etag };
}

Where the implementation for all the Git based URL readers could be something like:

public isSearchUrl(location: string) {
  const { filepath } = parseGitUrl(location);
  return filepath?.match(/[*?]/)
 }

But more specialized URLReaders like the Google Cloud Storage one can be something like this:

public isSearchUrl(location: string) {
   const { pathname: filepath } = new URL(location);
   return filepath?.match(/[*?]/)
 }

👀 Have you spent some time to check if this feature request has been raised before?

  • I checked and didn't find similar issue

🏢 Have you read the Code of Conduct?

Are you willing to submit PR?

None

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:coreRelated to the Core Backstage Frameworktype:suggestionNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions