-
Notifications
You must be signed in to change notification settings - Fork 344
CML contexts #311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CML contexts #311
Conversation
* token * clean env before tests * print env * yaml * yaml * yaml * yaml * check * fix repo env * fix repo env * fix repo env * fix repo env * check name * env github tests * env github tests * log check * log check * no check tessts * enviromental tests * workflow eenv * gitlab uploads and github_token * snapshots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DavidGOrtega looks pretty nice- everything makes sense. Two big comments to discuss:
-
I don't think I fully understand where we would fit in the code for, say, Jenkins as my CI server- right now, the client code looks for environmental variables that are special to their runners. But I'm trying to imagine what the logic would be for Bitbucket with a Jenkins CI integration- since it looks like registering runners, creating comments, and calling the API all still happens in the same module as runner environmental variables are read in (
GitlabClient.js
andGithubClient.js
). How would I do it? -
Because we define properties and methods of the clients in
GitlabClient.js
andGithubClient.js
, and then again define those same methods and clients incml.js
(although in a very high-level way), it almost seems like there's duplicated code. It makes very clear sense why this pattern is used, but I'm concerned about maintenance with two extremely similar "levels". Do you think this is a concern?
Nice work, and looking forward to chatting at our meeting!
@@ -4,6 +4,12 @@ on: [push, pull_request] | |||
|
|||
env: | |||
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} | |||
TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, does this mean:
in order to test cml, we'll need to supply a repo url, sha, and access token as secrets for each provider (gitlab and github)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in order to test it independently we need to setup env variables
TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }}
TEST_GITHUB_REPO: https://github.com/iterative/cml_qa_tests_dummy/
TEST_GITHUB_SHA: 62edc8b3f46a60b3fe1e5c08fd3e0046d350ee29
TEST_GITLAB_TOKEN: ${{ secrets.TEST_GITLAB_TOKEN }}
TEST_GITLAB_REPO: https://gitlab.com/iterative.ai/cml_qa_tests_dummy/
TEST_GITLAB_SHA: c4c13286e78dc252dd2611f31a755f10d343fbd4
repo: RUNNER_REPO, | ||
token: repo_token | ||
}); | ||
const IS_GITHUB = cml.driver === 'github'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you point me to where IS_GITHUB
is used? Not sure I see it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines 68, 115 and 150 in the same file
src/GithubClient.test.js
Outdated
@@ -0,0 +1,95 @@ | |||
jest.setTimeout(20000); | |||
|
|||
const GithubClient = require('./GithubClient'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very very minor, but should we use GitHub
& GitLab
stylizations for variables throughout( instead of Github
and Gitlab
?
src/cml.js
Outdated
const { repo } = opts; | ||
const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env; | ||
|
||
if (repo && repo.startsWith('https://github.com')) return 'github'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, this will make it easy to add as many clients as we want to the logic!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can actually infer the driver automatically in the environment or even setup the driver, repo and token to be pointing whatever we want
src/cml.js
Outdated
}; | ||
|
||
const { driver = env_driver(), repo, token } = opts; | ||
this.driver = driver; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
js question: in this context, what variable will this
be set to? I know it's to do with global, just trying to understand logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Points to the object itself, it's the classic OOP you know. JS supports class notation.
RUNNER_IMAGE = 'dvcorg/cml:latest' | ||
RUNNER_IMAGE = 'dvcorg/cml:latest', | ||
|
||
RUNNER_DRIVER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the flags and some js modules use driver
, but others use client
- if I understand correctly, they're referring to the same thing. Is this intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont remember to have set any client as a parameter, however you are right the driver
points to one of the clients GitlabClient or GitlabClient.
Im changing them for Gitlab and GitHub and are going to be inside the drivers folder. To be more clear
src/cml.js
Outdated
|
||
let mime, uri; | ||
|
||
if (gitlab_uploads) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to move the gitlab_uploads logic out of CML.publish()
? The only reason I ask is that if we expand to more clients with their own unique methods for uploading artifacts, I could imagine this logic getting hard to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is where it should go actually, the real logic is in the client already which is calling the gitlab api but here we define the business logic somehow you have to decide which upload is going to be used cml's native one or one provide by the client/driver
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @elleobrien. This should be gitlab's client business on how to publish. Otherwise, all the logic is spread around CML, github and gitlab and it will grow exponentially with new clients.
A more abstract interface might be required like publich_or_upload()
(a better name is needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmpetrov it is. The logic to publish in Gitlab is inside the Gitlab client. In CML there is only a decision. Take into account that even of you work in Gitlab you can still publish with CML storage. So its just an if
if (gitlab_uploads) {
const client = get_client({ ...this, driver: 'gitlab' });
({ mime, uri } = await client.publish(opts));
} else {
({ mime, uri } = await upload(opts));
}
If you see all the specific publish code is inside the Gitlab client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we were setting that business logic we would be adding the CML upload in every client
} else {
({ mime, uri } = await upload(opts));
}
* token * clean env before tests * print env * yaml * yaml * yaml * yaml * check * fix repo env * fix repo env * fix repo env * fix repo env * check name * env github tests * env github tests * log check * log check * no check tessts * enviromental tests * workflow eenv * gitlab uploads and github_token * snapshots * log env * log env * update also github.context * No log
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes the existing API cleaner. But I still see some opportunities in making API even better.
In the change. it tries to abstract out the providers (GH/GL) from logic but keep the separation on the public API level (CML class). It looks like we need to separate the public class (CML), the providers/drivers logic (it is mostly done), and the installation logic (this is part of CML for some reason).
Now: CML API (with some logic) --> Providers (with logic)
Should be: CML API (no logic, dummy proxy)--> Providers Factory (no logic, just abstracting out the providers) --> Providers(all logic is here)
Also, it is not clear from the change on how to use different systems:
- how to work with GitHub provider but GitLab CI/CD?
- how we suppose to integrate Jenkins (CI/CD) with GitLab/GitHub provider?
src/cml.js
Outdated
} | ||
|
||
env_repo() { | ||
return get_client(this).env_repo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These one-line methods seems a bit strange - clear logic duplication. You have already abstracted out the provider type by a factory method. Now you suppose to call the methods directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not a factory but a facade. CML api its defining some methods and then its calling the proper underlying code. That code is wrapped inside every client.
I dont see code duplication but expose the methods that it should expose. Users that would use the CML SDK should not dig into the drivers/clients. Just only know that several operations are available in the CML.
src/cml.js
Outdated
|
||
async comment_create(opts = {}) { | ||
const client = get_client(this); | ||
return await client.comment_create(opts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here but with two lines, not one.
src/cml.js
Outdated
|
||
let mime, uri; | ||
|
||
if (gitlab_uploads) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @elleobrien. This should be gitlab's client business on how to publish. Otherwise, all the logic is spread around CML, github and gitlab and it will grow exponentially with new clients.
A more abstract interface might be required like publich_or_upload()
(a better name is needed).
src/cml.js
Outdated
throw new Error('driver unknown!'); | ||
}; | ||
|
||
class CML { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is CML (and cml.js
) the right name? The code ends up using CML for different objects like:
const CML = require('../src/cml');
…
const cml = new CML(opts);
One of the big ideas behind this redesign is to separate different concepts - GitHub/GitLab and CI/CDs executors. But it seems like we still using a monolithic CML concept.
I'd suggest renaming this one to something like ScmHostingDriver
with the factory method and abstract (empty) methods (or without methods since it is JS).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this CML is a general, user-facing API - it makes sense to keep it here and keep the name. But all the client's specific logic needs to be extracted. Including the abstract *Driver
class with the factory method.
const client = get_client({ ...this, driver: 'gitlab' }); | ||
({ mime, uri } = await client.publish(opts)); | ||
} else { | ||
({ mime, uri } = await upload(opts)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there is one case is not covered - publish(gitlab_uploads=false)
for GitLab client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
publish(gitlab_uploads=false) goes to default upload the else
Its a facade. All the business logic (input data check, output data, etc...) should be done in the Facade Class, also its the one that exposes and defines all the methods available. It could be the class to be exposed in a microservice in example.
Its not. We are defining a common API to all the vendors. with equivalent inputs and outputs. I dont see monolithic to have a class
Now every command is totally independent from the environment we can setup a repo and a token and input variables. The missing input variables has to be:
Before we were not able to comment on a GH repo from an GL runner, simply because we were not making the commands independents from the ENV. As I say the SHA needed could be automatically set by the CI in its own SHA or we can actually extract it from the ENV or we will have to even do calls to APIs, its going to deepend on the integration, but its possible because we have a non ENV dependant code anymore. |
A code example would be really useful. Like GH repo with GL runner. |
* git sha * commit_sha * log * log gh sha * log gh sha * show origin * remote as repo * remote as repo * tests * remove one test not working on gitlab * space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great change! I especially like the SHA/ENV part.
A few comments are inline - please address before emerging.
bin/cml-publish.test.js
Outdated
[boolean] | ||
[deprecated: Use native instead] [boolean] | ||
--native Uses driver's native capabilities to upload assets instead | ||
of CML's backend. [boolean] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend
might be not the best term in the context of CI/CD. how about ... instead of external CML's assets storage
bin/cml-publish.js
Outdated
.boolean('native') | ||
.describe( | ||
'native', | ||
"Uses driver's native capabilities to upload assets instead of CML's backend." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend
might be the best word - please see the comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Im confused here... not sure if you are asking to change native
to backend
or you are pointing is not the best word... A few iterations before I called it backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I mean "backend might be NOT the best word" (even in the text). and I suggested changing the description, not the option name.
@@ -17,9 +17,17 @@ describe('CML e2e', () => { | |||
--gitlab-uploads Uses GitLab uploads instead of CML storage. Use GitLab |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expect(output).toMatchInlineSnapshot(...)
- it might trigger false alarms if library will change.
It is not a problem now, but it worth changing in the future in a more reliable way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by the library I mean yargs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, every time that we change the specs we have to update the template. It's a common practice in web dev to control that everything is looking as it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that it can change even without text changes - from the lib change (they might decide to change the default formatting).
const infer_driver = (opts = {}) => { | ||
const { repo } = opts; | ||
if (repo && repo.includes('github.com')) return 'github'; | ||
if (repo && repo.includes('gitlab.com')) return 'gitlab'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these checks be just a bit more rigorous?
For example, I might have a local repo in my file system like ~/src/my-gitlab.com-repo/
which might be no GitLab repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repo has to be an URL. Its the url to be used by the API. can not be anything else and when its going to be used it will fail if not.
|
||
const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env; | ||
if (GITHUB_REPOSITORY) return 'github'; | ||
if (CI_PROJECT_URL) return 'gitlab'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need similar variables for GL? Why not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by CI_PROJECT_URL
:) Why it is not just GitLab like GITLAB_REPOSITORY
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI_PROJECT_URL is an env variable that GL CI generates. Its not our choose to name it that way. What we are doing here is try to guess if the driver is not set nor the repo either where are we being run.
This is useful for on premise instances also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we should do if none is defined? Should we throw a meaningful exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing! later on the exception will be raised when creating the Driver
if (repo && repo.includes('github.com')) return 'github'; | ||
if (repo && repo.includes('gitlab.com')) return 'gitlab'; | ||
|
||
const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is none of the variables are defined?
Is it ok if it is defined but empty or wrong address (does it make sense to check URL pattern)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are specific env variables for specific vendors. We are not even looking at the value we just need to 'see' them to 'know' which kind driver it is in case its not specified by the user
src/cml.js
Outdated
if (driver === 'github') return new Github({ repo, token }); | ||
if (driver === 'gitlab') return new Gitlab({ repo, token }); | ||
|
||
throw new Error('driver unknown!'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be more informative to include the wrong name in the error message
const { upload, exec } = require('./utils'); | ||
|
||
const uri_no_trailing_slash = (uri) => { | ||
return uri.endsWith('/') ? uri.substr(0, uri.length - 1) : uri; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there triml/trimr
or rstrip/lstrip
in JS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hahaha no! I made one a few iterations before but I removed it. We could use underscore or a library like that
This PR introduces contexts, this is the ability to be able to setup a repo and token to the command, so you can run under gitlab CI and comment in GH scm setting the repo and token in the command.
Additionally the repo and sha is taken from the repo itself making future integrations faster if the CI do a full clone of the repo instead of a mirror like GL does with gitlab.
Benefits:
Example of a GL runner commenting under a GH repo setting repo_token to GH.
Note that token and commit-sha is not needed since the first can be extracted from the env variables (secret) and the second is inferred from the repo itself.:
example of your own github on premise: