CML contexts #311

DavidGOrtega · 2020-10-14T17:03:47Z

This PR introduces contexts, this is the ability to be able to setup a repo and token to the command, so you can run under gitlab CI and comment in GH scm setting the repo and token in the command.
Additionally the repo and sha is taken from the repo itself making future integrations faster if the CI do a full clone of the repo instead of a mirror like GL does with gitlab.

Benefits:

Integrations are easier since we are not looking for specific CI ENV variables but extracting the repo, sha, etc... from the repo.
Commands are independent to be run locally or in a non CI environment.
We could add another kind of drivers, slack, confluence, etc...
Very easy to work with you on premise instances.

Example of a GL runner commenting under a GH repo setting repo_token to GH.

Note that token and commit-sha is not needed since the first can be extracted from the env variables (secret) and the second is inferred from the repo itself.:

train:
  script:
    - echo "Training..."
    - echo 'Hi from cml!' >> report.md
    - cml-send-comment --repo=https://github.com/iterative/3_tensorboard  report.md
    - echo "Trained!"

example of your own github on premise:

train:
  script:
    - echo "Training..."
    - echo 'Hi from cml!' >> report.md
    - cml-send-comment --repo=https://myentreprisegithub.com/iterative/3_tensorboard  --driver=github report.md
    - echo "Trained!"

* token * clean env before tests * print env * yaml * yaml * yaml * yaml * check * fix repo env * fix repo env * fix repo env * fix repo env * check name * env github tests * env github tests * log check * log check * no check tessts * enviromental tests * workflow eenv * gitlab uploads and github_token * snapshots

…cml-contexts

elleobrien

@DavidGOrtega looks pretty nice- everything makes sense. Two big comments to discuss:

I don't think I fully understand where we would fit in the code for, say, Jenkins as my CI server- right now, the client code looks for environmental variables that are special to their runners. But I'm trying to imagine what the logic would be for Bitbucket with a Jenkins CI integration- since it looks like registering runners, creating comments, and calling the API all still happens in the same module as runner environmental variables are read in (GitlabClient.js and GithubClient.js). How would I do it?
Because we define properties and methods of the clients in GitlabClient.js and GithubClient.js, and then again define those same methods and clients in cml.js (although in a very high-level way), it almost seems like there's duplicated code. It makes very clear sense why this pattern is used, but I'm concerned about maintenance with two extremely similar "levels". Do you think this is a concern?

Nice work, and looking forward to chatting at our meeting!

elleobrien · 2020-10-20T20:59:26Z

.github/workflows/publish.yml

@@ -4,6 +4,12 @@ on: [push, pull_request]

 env:
  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} 
+  TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }} 


To be clear, does this mean:

in order to test cml, we'll need to supply a repo url, sha, and access token as secrets for each provider (gitlab and github)?

Yes, in order to test it independently we need to setup env variables

TEST_GITHUB_TOKEN: ${{ secrets.TEST_GITHUB_TOKEN }} TEST_GITHUB_REPO: https://github.com/iterative/cml_qa_tests_dummy/ TEST_GITHUB_SHA: 62edc8b3f46a60b3fe1e5c08fd3e0046d350ee29 TEST_GITLAB_TOKEN: ${{ secrets.TEST_GITLAB_TOKEN }} TEST_GITLAB_REPO: https://gitlab.com/iterative.ai/cml_qa_tests_dummy/ TEST_GITLAB_SHA: c4c13286e78dc252dd2611f31a755f10d343fbd4

elleobrien · 2020-10-20T23:00:33Z

bin/cml-cloud-runner-entrypoint.js

+  repo: RUNNER_REPO,
+  token: repo_token
+});
+const IS_GITHUB = cml.driver === 'github';


Can you point me to where IS_GITHUB is used? Not sure I see it

lines 68, 115 and 150 in the same file

elleobrien · 2020-10-21T00:40:19Z

src/GithubClient.test.js

@@ -0,0 +1,95 @@
+jest.setTimeout(20000);
+
+const GithubClient = require('./GithubClient');


This is very very minor, but should we use GitHub & GitLab stylizations for variables throughout( instead of Github and Gitlab?

elleobrien · 2020-10-21T00:41:50Z

src/cml.js

+      const { repo } = opts;
+      const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env;
+
+      if (repo && repo.startsWith('https://github.com')) return 'github';


Great, this will make it easy to add as many clients as we want to the logic!

Yes, we can actually infer the driver automatically in the environment or even setup the driver, repo and token to be pointing whatever we want

elleobrien · 2020-10-21T00:45:42Z

src/cml.js

+    };
+
+    const { driver = env_driver(), repo, token } = opts;
+    this.driver = driver;


js question: in this context, what variable will this be set to? I know it's to do with global, just trying to understand logic

Points to the object itself, it's the classic OOP you know. JS supports class notation.

elleobrien · 2020-10-21T01:07:43Z

bin/cml-cloud-runner-entrypoint.js

-  RUNNER_IMAGE = 'dvcorg/cml:latest'
+  RUNNER_IMAGE = 'dvcorg/cml:latest',
+
+  RUNNER_DRIVER,


It seems like the flags and some js modules use driver, but others use client- if I understand correctly, they're referring to the same thing. Is this intentional?

I dont remember to have set any client as a parameter, however you are right the driver points to one of the clients GitlabClient or GitlabClient.
Im changing them for Gitlab and GitHub and are going to be inside the drivers folder. To be more clear

elleobrien · 2020-10-21T01:09:39Z

src/cml.js

+
+    let mime, uri;
+
+    if (gitlab_uploads) {


Is there any way to move the gitlab_uploads logic out of CML.publish()? The only reason I ask is that if we expand to more clients with their own unique methods for uploading artifacts, I could imagine this logic getting hard to maintain.

I think this is where it should go actually, the real logic is in the client already which is calling the gitlab api but here we define the business logic somehow you have to decide which upload is going to be used cml's native one or one provide by the client/driver

I agree with @elleobrien. This should be gitlab's client business on how to publish. Otherwise, all the logic is spread around CML, github and gitlab and it will grow exponentially with new clients.

A more abstract interface might be required like publich_or_upload() (a better name is needed).

@dmpetrov it is. The logic to publish in Gitlab is inside the Gitlab client. In CML there is only a decision. Take into account that even of you work in Gitlab you can still publish with CML storage. So its just an if

if (gitlab_uploads) { const client = get_client({ ...this, driver: 'gitlab' }); ({ mime, uri } = await client.publish(opts)); } else { ({ mime, uri } = await upload(opts)); }

If you see all the specific publish code is inside the Gitlab client

If we were setting that business logic we would be adding the CML upload in every client

} else { ({ mime, uri } = await upload(opts)); }

* token * clean env before tests * print env * yaml * yaml * yaml * yaml * check * fix repo env * fix repo env * fix repo env * fix repo env * check name * env github tests * env github tests * log check * log check * no check tessts * enviromental tests * workflow eenv * gitlab uploads and github_token * snapshots * log env * log env * update also github.context * No log

…cml-contexts

dmpetrov

The change makes the existing API cleaner. But I still see some opportunities in making API even better.

In the change. it tries to abstract out the providers (GH/GL) from logic but keep the separation on the public API level (CML class). It looks like we need to separate the public class (CML), the providers/drivers logic (it is mostly done), and the installation logic (this is part of CML for some reason).

Now: CML API (with some logic) --> Providers (with logic)
Should be: CML API (no logic, dummy proxy)--> Providers Factory (no logic, just abstracting out the providers) --> Providers(all logic is here)

Also, it is not clear from the change on how to use different systems:

how to work with GitHub provider but GitLab CI/CD?
how we suppose to integrate Jenkins (CI/CD) with GitLab/GitHub provider?

dmpetrov · 2020-10-25T21:49:54Z

src/cml.js

+  }
+
+  env_repo() {
+    return get_client(this).env_repo();


These one-line methods seems a bit strange - clear logic duplication. You have already abstracted out the provider type by a factory method. Now you suppose to call the methods directly.

Its not a factory but a facade. CML api its defining some methods and then its calling the proper underlying code. That code is wrapped inside every client.

I dont see code duplication but expose the methods that it should expose. Users that would use the CML SDK should not dig into the drivers/clients. Just only know that several operations are available in the CML.

dmpetrov · 2020-10-25T21:51:34Z

src/cml.js

+
+  async comment_create(opts = {}) {
+    const client = get_client(this);
+    return await client.comment_create(opts);


Same here but with two lines, not one.

dmpetrov · 2020-10-25T21:59:01Z

src/cml.js

+
+    let mime, uri;
+
+    if (gitlab_uploads) {


I agree with @elleobrien. This should be gitlab's client business on how to publish. Otherwise, all the logic is spread around CML, github and gitlab and it will grow exponentially with new clients.

A more abstract interface might be required like publich_or_upload() (a better name is needed).

dmpetrov · 2020-10-25T22:10:59Z

src/cml.js

+  throw new Error('driver unknown!');
+};
+
+class CML {


Is CML (and cml.js) the right name? The code ends up using CML for different objects like:

const CML = require('../src/cml'); … const cml = new CML(opts);

One of the big ideas behind this redesign is to separate different concepts - GitHub/GitLab and CI/CDs executors. But it seems like we still using a monolithic CML concept.

I'd suggest renaming this one to something like ScmHostingDriver with the factory method and abstract (empty) methods (or without methods since it is JS).

If this CML is a general, user-facing API - it makes sense to keep it here and keep the name. But all the client's specific logic needs to be extracted. Including the abstract *Driver class with the factory method.

dmpetrov · 2020-10-25T22:23:42Z

src/cml.js

+      const client = get_client({ ...this, driver: 'gitlab' });
+      ({ mime, uri } = await client.publish(opts));
+    } else {
+      ({ mime, uri } = await upload(opts));


It seems like there is one case is not covered - publish(gitlab_uploads=false) for GitLab client.

publish(gitlab_uploads=false) goes to default upload the else

DavidGOrtega · 2020-10-26T09:30:58Z

@dmpetrov

Now: CML API (with some logic) --> Providers (with logic)
Should be: CML API (no logic, dummy proxy)--> Providers Factory (no logic, just abstracting out the providers) --> Providers(all logic is here)

Its a facade. All the business logic (input data check, output data, etc...) should be done in the Facade Class, also its the one that exposes and defines all the methods available. It could be the class to be exposed in a microservice in example.

One of the big ideas behind this redesign is to separate different concepts - GitHub/GitLab and CI/CDs executors. But it seems like we still using a monolithic CML concept.

Its not. We are defining a common API to all the vendors. with equivalent inputs and outputs. I dont see monolithic to have a class

Also, it is not clear from the change on how to use different systems:
how to work with GitHub provider but GitLab CI/CD?
how we suppose to integrate Jenkins (CI/CD) with GitLab/GitHub provider?

Now every command is totally independent from the environment we can setup a repo and a token and input variables. The missing input variables has to be:

Aligned by the CI integration
Aligned by us

Before we were not able to comment on a GH repo from an GL runner, simply because we were not making the commands independents from the ENV.

As I say the SHA needed could be automatically set by the CI in its own SHA or we can actually extract it from the ENV or we will have to even do calls to APIs, its going to deepend on the integration, but its possible because we have a non ENV dependant code anymore.

dmpetrov · 2020-10-26T13:14:49Z

Before we were not able to comment on a GH repo from an GL runner, simply because we were not making the commands independents from the ENV.

A code example would be really useful. Like GH repo with GL runner.

* git sha * commit_sha * log * log gh sha * log gh sha * show origin * remote as repo * remote as repo * tests * remove one test not working on gitlab * space

dmpetrov

Great change! I especially like the SHA/ENV part.

A few comments are inline - please address before emerging.

dmpetrov · 2020-10-28T20:08:07Z

bin/cml-publish.test.js

-                                                                             [boolean]
+                                            [deprecated: Use native instead] [boolean]
+        --native          Uses driver's native capabilities to upload assets instead
+                          of CML's backend.                                  [boolean]


backend might be not the best term in the context of CI/CD. how about ... instead of external CML's assets storage

dmpetrov · 2020-10-28T20:09:20Z

bin/cml-publish.js

+  .boolean('native')
+  .describe(
+    'native',
+    "Uses driver's native capabilities to upload assets instead of CML's backend."


backend might be the best word - please see the comment above

Im confused here... not sure if you are asking to change native to backend or you are pointing is not the best word... A few iterations before I called it backend

sorry, I mean "backend might be NOT the best word" (even in the text). and I suggested changing the description, not the option name.

dmpetrov · 2020-10-28T20:13:27Z

bin/cml-publish.test.js

@@ -17,9 +17,17 @@ describe('CML e2e', () => {
        --gitlab-uploads  Uses GitLab uploads instead of CML storage. Use GitLab


expect(output).toMatchInlineSnapshot(...) - it might trigger false alarms if library will change.
It is not a problem now, but it worth changing in the future in a more reliable way.

by the library I mean yargs

Yep, every time that we change the specs we have to update the template. It's a common practice in web dev to control that everything is looking as it should.

I mean that it can change even without text changes - from the lib change (they might decide to change the default formatting).

dmpetrov · 2020-10-28T20:23:28Z

src/cml.js

+const infer_driver = (opts = {}) => {
+  const { repo } = opts;
+  if (repo && repo.includes('github.com')) return 'github';
+  if (repo && repo.includes('gitlab.com')) return 'gitlab';


Could these checks be just a bit more rigorous?
For example, I might have a local repo in my file system like ~/src/my-gitlab.com-repo/ which might be no GitLab repo.

repo has to be an URL. Its the url to be used by the API. can not be anything else and when its going to be used it will fail if not.

dmpetrov · 2020-10-28T20:30:55Z

src/cml.js

+
+  const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env;
+  if (GITHUB_REPOSITORY) return 'github';
+  if (CI_PROJECT_URL) return 'gitlab';


Do we need similar variables for GL? Why not?

I'm a bit confused by CI_PROJECT_URL :) Why it is not just GitLab like GITLAB_REPOSITORY ?

CI_PROJECT_URL is an env variable that GL CI generates. Its not our choose to name it that way. What we are doing here is try to guess if the driver is not set nor the repo either where are we being run.
This is useful for on premise instances also.

What we should do if none is defined? Should we throw a meaningful exception?

nothing! later on the exception will be raised when creating the Driver

dmpetrov · 2020-10-28T20:39:35Z

src/cml.js

+  if (repo && repo.includes('github.com')) return 'github';
+  if (repo && repo.includes('gitlab.com')) return 'gitlab';
+
+  const { GITHUB_REPOSITORY, CI_PROJECT_URL } = process.env;


What is none of the variables are defined?
Is it ok if it is defined but empty or wrong address (does it make sense to check URL pattern)?

They are specific env variables for specific vendors. We are not even looking at the value we just need to 'see' them to 'know' which kind driver it is in case its not specified by the user

dmpetrov · 2020-10-28T20:40:21Z

src/cml.js

+  if (driver === 'github') return new Github({ repo, token });
+  if (driver === 'gitlab') return new Gitlab({ repo, token });
+
+  throw new Error('driver unknown!');


it might be more informative to include the wrong name in the error message

dmpetrov · 2020-10-28T20:42:39Z

src/cml.js

+const { upload, exec } = require('./utils');
+
+const uri_no_trailing_slash = (uri) => {
+  return uri.endsWith('/') ? uri.substr(0, uri.length - 1) : uri;


Are there triml/trimr or rstrip/lstrip in JS?

hahaha no! I made one a few iterations before but I removed it. We could use underscore or a library like that

DavidGOrtega added 10 commits October 14, 2020 19:01

[wip] CML contexts

72106b7

octokit on demand

e234780

docker entrypooint

164376a

Merge branch 'cml-contexts' of https://github.com/iterative/cml into …

64e102b

…cml-contexts

docker entrypoint register_runner

9d2cc12

update templates

c444438

token repo

f4b2ea4

token repo

c7f901c

specify driver

a03bd10

DavidGOrtega changed the title ~~[wip] CML contexts~~ CML contexts Oct 19, 2020

DavidGOrtega marked this pull request as draft October 19, 2020 16:30

elleobrien reviewed Oct 21, 2020

View reviewed changes

DavidGOrtega added 11 commits October 21, 2020 17:18

point to iterative repos

20fa7e2

Merge branch 'cml-contexts' of https://github.com/iterative/cml into …

b8ff436

…cml-contexts

point to iterative repos without slash

1f33b73

point to iterative ai

eedd09d

test slash

a0c121a

test slash

d0e810a

fix strip

4adfb1c

removal of strip

d21e710

refactor

2cc0ac6

slash

f454f10

dmpetrov requested changes Oct 25, 2020

View reviewed changes

DavidGOrtega added 3 commits October 26, 2020 15:30

minor refactor

9d0f22f

no console

2738dc9

Cml contexts sha (#326)

dd042e2

* git sha * commit_sha * log * log gh sha * log gh sha * show origin * remote as repo * remote as repo * tests * remove one test not working on gitlab * space

DavidGOrtega added 2 commits October 28, 2020 13:29

publish backend

ccc487e

no banckend in publish

f2ec83b

DavidGOrtega changed the base branch from master to release/contexts October 28, 2020 16:14

DavidGOrtega changed the base branch from release/contexts to master October 28, 2020 16:16

DavidGOrtega marked this pull request as ready for review October 28, 2020 17:20

dmpetrov approved these changes Oct 28, 2020

View reviewed changes

DavidGOrtega added 7 commits October 29, 2020 10:18

fix small review comments

91bf3d6

test template

77d0ccd

test template

08b81a3

adjust cml-cloud-runner

bb9b735

bump version

b099ac4

snapshot

a65bf87

snapshot

3bc9fc2

DavidGOrtega merged commit 6f9c79b into master Oct 29, 2020

DavidGOrtega deleted the cml-contexts branch October 29, 2020 14:48

This was referenced Nov 4, 2020

CML commands should run also without a CI environment #290

Closed

Consider splitting CI part from the comment publishing part. #264

Closed

		@@ -0,0 +1,95 @@
		jest.setTimeout(20000);

		const GithubClient = require('./GithubClient');

		@@ -17,9 +17,17 @@ describe('CML e2e', () => {
		--gitlab-uploads Uses GitLab uploads instead of CML storage. Use GitLab

CML contexts #311

CML contexts #311

Uh oh!

Conversation

DavidGOrtega commented Oct 14, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Example of a GL runner commenting under a GH repo setting repo_token to GH.

example of your own github on premise:

Uh oh!

elleobrien left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DavidGOrtega Oct 22, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DavidGOrtega Oct 26, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmpetrov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DavidGOrtega commented Oct 26, 2020

Uh oh!

dmpetrov commented Oct 26, 2020

Uh oh!

dmpetrov left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

DavidGOrtega commented Oct 14, 2020 •

edited

Loading

elleobrien left a comment •

edited

Loading

DavidGOrtega Oct 22, 2020 •

edited

Loading

DavidGOrtega Oct 26, 2020 •

edited

Loading