Skip to content

Conversation

Skarlso
Copy link
Member

@Skarlso Skarlso commented Jul 15, 2018

I'm also thinking of using some ref magic, but comparing pulls seem to be the easiest for now.

https://github.com/src-d/go-git/blob/master/_examples/revision/main.go for ref magic.

Deals with #31 .

@codecov-io
Copy link

codecov-io commented Jul 15, 2018

Codecov Report

Merging #35 into master will increase coverage by 0.48%.
The diff coverage is 58.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #35      +/-   ##
==========================================
+ Coverage   49.94%   50.43%   +0.48%     
==========================================
  Files          13       13              
  Lines         979     1039      +60     
==========================================
+ Hits          489      524      +35     
- Misses        439      464      +25     
  Partials       51       51
Impacted Files Coverage Δ
pipeline/pipeline.go 87.14% <ø> (ø) ⬆️
pipeline/create_pipeline.go 0% <0%> (ø) ⬆️
pipeline/ticker.go 0% <0%> (ø) ⬆️
pipeline/build_golang.go 94.44% <100%> (+0.69%) ⬆️
pipeline/git.go 44.57% <79.41%> (+24.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 60d338d...9c8f63c. Read the comment docs.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Current limitations:

  • This is sequential. If there are many pipelines this would be problematic. However git doesn't allow parallel access, it will lock.
  • No error handling yet
  • No tests yet
  • If the pull / build takes a long time it could run into the same pipeline again trying with the same action

these are things that need to be iron out.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

This is why a poll is hard. Imagine you have a hundred pipelines... The minute interval will reach around too fast.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Another way could be to register a poll for each pipeline of it's own.

Also, I stand corrected. It's apparently possible to use git pull in parallel.

Which means this could be a parallel operation with a semaphore of 5 maybe so it's restricted in case there are a 100 pipelines to update.

@michelvocks
Copy link
Member

That sounds for me like the best solution. Having 3-5 (maybe configureable?) parallel workers which are responsible for the update stuff. We should also pass the pipeline build stuff to the scheduler so that he can overtake the build scheduling. 🤗

Anyway, awesome that you are already working on this. 👍

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

@michelvocks thanks, okay. that's some good ideas. :)

Also, this fails to update right now with reasons unknown to me. :O It says that the pipeline is non-fast-forwardable. But it is. :D

error2 : : non-fast-forward update=<unknown>

❯ git status
On branch master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.

Weird.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

@michelvocks uh, I might be misunderstanding here something. The scheduler runs a pipeline, doesn't it? 🤔 I just want to build a current checked out repo and update the binary. But not trigger a build as well. But I should? After all, it could be used as some kind of checker I guess if the current pipeline is okay.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

I tried everything to no avail. It should work, so I submitted an issue to go-git: src-d/go-git#889

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

My fault. Path is actually =<unknown>. The pipeline isn't selected properly apparently.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Ahh I see what the problem is. :D The repo isn't actually filled out... The Repo part of the pipeline is empty.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

So, this is a problem for the updater.

@michelvocks Question. :) Either I update the Repo to also contain the UUID so later on I can figure out where the location of the Repo is and fill out that part of the struct. Or I simply walk all the directories in the clone folder and update everything in there, but than I can't tie them to pipeline builds and can't trigger a build later on.

Either one of these is fine. I'm opting for walking the directory as I'm still unsure if I would like to trigger a bunch of builds if there is a change, BUT I can see how that would useful, like a Jenkins commit trigger.

EDIT: Actually, I can't just walk the folder because I need pipeline information such as Type in order to trigger a build process to avoid re-writing everything. So.. I will need to store the UUID to be able to Iter through the pipelines and build them. :/

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

So help me understand here something @michelvocks. Pipeline isn't stored until here:

			if shouldStore {
				log.Println("STORING :", pipeline.UUID)
				storeService.PipelinePut(pipeline)
			}

At which point you are creating a completely blank one, and thus when I'm trying to iter over GlobalActivePipelines the active pipelines actually have no information on them and nothing is saved that we set up previously. Is there another way of doing this?

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Like, I'm not criticising the design here just wondering how far I am from the turht or how deep I should go into the rabbit hole. Because at this point I'm thinking of saving the pipeline details earlier, namely once building it is finished. So I could get all the repo info and there would be no need to store the uuid.

@michelvocks
Copy link
Member

@Skarlso Yeah this is a problem. We cannot fully expect that all fields like the source Git-Repository are set in GlobalActivePipelines. When, for example, a pipeline is build locally and the binary is manually put into the pipelines folder then there is no data.

I think we can only support continuous automatic builds when the pipeline has been created via gaia. Currently, we do not save any information at all: https://github.com/gaia-pipeline/gaia/blob/master/pipeline/create_pipeline.go#L23

In this function you can see that we just simply clone the repo, build the pipeline and copy the binary to the pipelines folder. We do store the gaia.CreatePipeline object but this is basically just for history.
So I think we somehow have to create and save the gaia.Pipeline object at the end of this function.

@michelvocks
Copy link
Member

uh, I might be misunderstanding here something. The scheduler runs a pipeline, doesn't it? 🤔 I just want to build a current checked out repo and update the binary. But not trigger a build as well. But I should? After all, it could be used as some kind of checker I guess if the current pipeline is okay.

You are right. I forgot that I didn't implement a scheduler for pipeline build. I think in the future we might need this but I guess that's too much for now. So I think we can implement that later.

@@ -18,7 +18,8 @@ import (
const (
// tickerIntervalSeconds defines how often the ticker will tick.
// Definition in seconds.
tickerIntervalSeconds = 5
tickerIntervalSeconds = 5
pollTicketIntervalMinutes = 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be made configurable?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely.

@rmb938
Copy link
Contributor

rmb938 commented Jul 15, 2018

This may be out of scope for this PR however what do you think about using webhooks up to update pipelines and have a toggle or something to switch between webhooks and active polling? I know at least at my company constantly polling our git server is discouraged.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

@rmb938 I agree with you. But that would mean that we'd have to expose a public port and endpoint that the hook can reach. And not everyone would want to do that. Although I'm up for making it optionally selectable.

@rmb938
Copy link
Contributor

rmb938 commented Jul 15, 2018

Well it wouldn't have to be public, it could have some sort of token that doesn't expire (but can be regenerated). I know Github at least allows webhooks to have a secret that is injected into a X-Hub-Signature header. Gitlab is different and uses the X-Gitlab-Token header so that would have to be configurable somehow. It would be more complex so probably shouldn't be included in this pr.

Also keep in mind the endpoint doesn't have to be internet reachable since companies do host their own Github, Gitlab, ect... internally.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Am I missing something here? :-) yes it can be secure but the hook still needs to call a callback endpoint which must be reachable by github and we need to listen to those event and setup an endpoint for that. The header and tokens are fine ofc but otherwise since we aren't running our own git we would need something they can ping.

Or, I'm completely missing something here and are not thinking of something you are. :-)

@rmb938
Copy link
Contributor

rmb938 commented Jul 15, 2018

Correct but you don't actually need github to test it with. The payload and headers that github sends is all documented and can easily be mocked out via curl.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Yes. But for the hook to work you'll need an endpoint. And I was saying that some people don't prefer that. But I'm completely fine with making it an optional thing. I wanted a hook in the first place. :-):-)

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Gentleman. I think we have a liftoff for phase one of this:

2018-07-15T22:43:50.114+0200 [DEBUG] Gaia: selected branch : : refs/heads/master=<unknown>
2018-07-15T22:43:51.421+0200 [DEBUG] Gaia: no need to update pipeline: : test1=<unknown>
// commit happens here.
2018-07-15T22:43:51.422+0200 [DEBUG] Gaia: updating pipeline: : test1=<unknown>
2018-07-15T22:43:52.627+0200 [DEBUG] Gaia: successfully updated: : test1=<unknown>

Before image of the pipeline:

screen shot 2018-07-15 at 22 47 06

Did a pull and changed it to a simple run then push. Did nothing just waited and after the poll saw there are changes, it fetched them and the new pipeline was available to be executed:

screen shot 2018-07-15 at 22 51 11

Everything worked as expected.

@Skarlso Skarlso changed the title [POC] Automatically update pipelines every minute. [WIP] Automatically update pipelines every minute. Jul 15, 2018
@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Uh, this is bad. Now I'm clobbering over the pipeline instead of updating the record, which deletes the previous entries after restart they won't be visible any longer. Also, name is not saved this way. So need the facility to update a record after it comes in.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 15, 2018

Fixed all of it.

2018-07-15T23:50:22.808+0200 [DEBUG] Gaia: starting updating of pipelines...
2018-07-15T23:50:22.808+0200 [DEBUG] Gaia: selected branch : : refs/heads/master=<unknown>
2018-07-15T23:50:22.808+0200 [DEBUG] Gaia: selected branch : : refs/heads/master=<unknown>
2018-07-15T23:50:23.370+0200 [ERROR] Gaia: error while doing a pull request : : already up-to-date=<unknown>
2018-07-15T23:50:23.370+0200 [ERROR] Gaia: error while doing a pull request : : already up-to-date=<unknown>

After push:

2018-07-15T23:53:33.541+0200 [DEBUG] Gaia: starting updating of pipelines...
2018-07-15T23:53:33.541+0200 [DEBUG] Gaia: checking pipeline: : test2=<unknown>
2018-07-15T23:53:33.541+0200 [DEBUG] Gaia: selected branch : : refs/heads/master=<unknown>
2018-07-15T23:53:33.541+0200 [DEBUG] Gaia: checking pipeline: : test1=<unknown>
2018-07-15T23:53:33.541+0200 [DEBUG] Gaia: selected branch : : refs/heads/master=<unknown>
2018-07-15T23:53:34.751+0200 [DEBUG] Gaia: updating pipeline: : test1=<unknown>
2018-07-15T23:53:34.845+0200 [DEBUG] Gaia: updating pipeline: : test2=<unknown>
2018-07-15T23:53:36.056+0200 [DEBUG] Gaia: successfully updated: : test1=<unknown>
2018-07-15T23:53:36.102+0200 [DEBUG] Gaia: successfully updated: : test2=<unknown>

It also works with multiple pipelines and builds.

After pushing new code the pipeline changed and was visible. This time the history was retained. Multiple pipelines did not overwrote each other and history run contained the right pipeline run.

@michelvocks So... I had to update the interface to include a save, because this way it's cleaner to save the pipeline and I could actually throw out some code. :) I hope you don't mind... These facilities will be needed once webhooks are also implemented.

@Skarlso
Copy link
Member Author

Skarlso commented Jul 16, 2018

@rmb938 I have to split out the webhook into a different PR. This is getting rather huge. But the facilities will be there for the webhook story.

@michelvocks Let's focus on cleaning this up and writing some tests around it.

All in favour? Say aye.

@michelvocks
Copy link
Member

Wow! That's awesome! 🤗
This improves Gaia a lot!

@Skarlso
Copy link
Member Author

Skarlso commented Jul 19, 2018

Hundred pipelines test:

❯ GAIA_RUN_HUNDRED_PIPELINE_TEST=1 go test -run=TestUpdateAllPipelinesHundredPipelines
PASS
ok  	github.com/gaia-pipeline/gaia/pipeline	7.729s

Racer enabled:

❯ go test --race
2018-07-19T23:20:54.547+0200 [DEBUG] Gaia: cannot build pipeline: error="exit status 2" output="# _/Users/hannibal/golang/src/github.com/gaia-pipeline/gaia/pipeline/tmp
./main.go:4:13: syntax error: unexpected newline, expecting comma or )
"
2018-07-19T23:20:54.548+0200 [DEBUG] Gaia: cannot get dependencies: error="context deadline exceeded" output=
2018-07-19T23:20:54.549+0200 [DEBUG] Gaia: cannot find go executeable: error="exec: "go": executable file not found in $PATH"
PASS
ok  	github.com/gaia-pipeline/gaia/pipeline	6.980s

The output error messages are expected. All seems to be okay.

Also, releasing the semaphore with a proper defer now.

@Skarlso Skarlso changed the title [WIP] Automatically update pipelines every minute. Automatically update pipelines every minute. Jul 19, 2018
@Skarlso
Copy link
Member Author

Skarlso commented Jul 20, 2018

@michelvocks @rmb938 This should be ready for review now. :) Thanks guys!

Copy link
Member

@michelvocks michelvocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks already really good! 🤗

cmd/gaia/main.go Outdated
@@ -43,7 +43,8 @@ func init() {
flag.StringVar(&gaia.Cfg.JwtPrivateKeyPath, "jwtPrivateKeyPath", "", "A RSA private key used to sign JWT tokens")
flag.BoolVar(&gaia.Cfg.DevMode, "dev", false, "If true, gaia will be started in development mode. Don't use this in production!")
flag.BoolVar(&gaia.Cfg.VersionSwitch, "version", false, "If true, will print the version and immediately exit")

flag.BoolVar(&gaia.Cfg.Poll, "poll", false, "Instead of using a Webhook, keep polling git for changes on pipelines")
flag.IntVar(&gaia.Cfg.PVal, "pval", 1, "The interval in minutes in which to poll Github for changes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not limited to Github, right? 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely!

@@ -116,21 +135,6 @@ func checkActivePipelines() {
continue
}

// We couldn't find the pipeline. Create a new one.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when someone locally builds a pipeline and puts it into the pipelines folder?
I know, many people will use the "create pipeline" functionality but I would still like to support that use-case. 🤗

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh! That's something I didn't think about. :D Thanks! I'll add that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed. Now I understand Gaia a bit better. :) Thanks for that.

@Skarlso Skarlso force-pushed the automatically_update_pipelines branch from 92b5c0c to 94e26d6 Compare July 20, 2018 08:07
@Skarlso Skarlso force-pushed the automatically_update_pipelines branch from 94e26d6 to 242997d Compare July 20, 2018 08:09
@Skarlso Skarlso force-pushed the automatically_update_pipelines branch from 3b7f70d to dec4407 Compare July 20, 2018 08:13
@Skarlso
Copy link
Member Author

Skarlso commented Jul 20, 2018

Huh. How did I just break that. :O

@Skarlso
Copy link
Member Author

Skarlso commented Jul 20, 2018

Created #45 for the flaky scheduler test.

Copy link
Member

@michelvocks michelvocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤗 Thanks a lot @Skarlso ❤️

@michelvocks michelvocks merged commit 7ad299e into gaia-pipeline:master Jul 20, 2018
@Skarlso
Copy link
Member Author

Skarlso commented Jul 20, 2018

Wooot \o/

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants