Added tests for plugin package. Implemented job results feature. #101
Conversation
Codecov Report
@@ Coverage Diff @@
## master #101 +/- ##
==========================================
- Coverage 64.46% 64.23% -0.24%
==========================================
Files 22 24 +2
Lines 1911 2044 +133
==========================================
+ Hits 1232 1313 +81
- Misses 541 580 +39
- Partials 138 151 +13
Continue to review full report at Codecov.
Ping @Skarlso
How is this a coverage decrease? :D
Oh! Job results feature. :P
\o/ for plugin tests.
@Skarlso Yeah, it seems like when you have no tests at all then it doesn't count at all. 🤷♂️
Hahaha. Going to pick this up for a manual test later today and merge it when everything is okay. :)
Also, you misspelled
}

// Connect to plugin(pipeline)
if err := pS.Validate(); err != nil {
The error message that follows this is no longer valid. It didn't fail because it couldn't connect; it failed because the validation failed, whatever that means in the plugin?
scheduler/scheduler.go (Outdated)
// Connect to plugin(pipeline)
if err := pS.Connect(c, nil); err != nil {
if err := pS.Validate(); err != nil {
	gaia.Cfg.Logger.Debug("cannot connect to pipeline", "error", err.Error(), "pipeline", p)
This also isn't valid any longer since this is a validation error now.
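The fix the review is asking for is presumably just making the log message match the new call. A minimal sketch of the corrected shape, with a stub standing in for the real `pS` scheduler object and Gaia's logger (both assumptions, not Gaia's actual API):

```go
package main

import (
	"errors"
	"fmt"
)

// pipelineService is a stub standing in for the real pS object in the diff.
type pipelineService struct{ valid bool }

func (p *pipelineService) Validate() error {
	if !p.valid {
		return errors.New("binary checksum mismatch")
	}
	return nil
}

// validatePipeline returns the debug line the scheduler would emit. The point
// of the review comment: since the call is now Validate, not Connect, the
// message should say "cannot validate pipeline", not the stale
// "cannot connect to pipeline".
func validatePipeline(pS *pipelineService, pipeline string) string {
	if err := pS.Validate(); err != nil {
		return fmt.Sprintf("cannot validate pipeline: error=%q pipeline=%s",
			err.Error(), pipeline)
	}
	return ""
}

func main() {
	fmt.Println(validatePipeline(&pipelineService{valid: false}, "test-pipeline"))
}
```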
@Skarlso PHAL 🤗
@michelvocks What's PHAL? :D Either one of these?
Please have a look 😆
LGTM! :)
Dude! A failing pipeline just killed Gaia:

2018-09-06T06:37:49.007+0200 [DEBUG] plugin.failing_golang: 2018/09/06 06:37:49 CreateDeployment has been finished!
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x4 pc=0x1501680]
goroutine 1309 [running]:
github.com/gaia-pipeline/gaia/plugin.(*Plugin).Execute(0xc42063c6c0, 0xc42055c150, 0xc4204aad20, 0xc42055c150)
/Users/hannibal/golang/src/github.com/gaia-pipeline/gaia/plugin/plugin.go:184 +0x290
github.com/gaia-pipeline/gaia/scheduler.executeJob(0x69192d8a, 0xc4200e4400, 0x12, 0xc4205de900, 0x3e, 0xc420287820, 0x1, 0x4, 0xc4200e44a0, 0x15, ...)
/Users/hannibal/golang/src/github.com/gaia-pipeline/gaia/scheduler/scheduler.go:334 +0x151
created by github.com/gaia-pipeline/gaia/scheduler.(*Scheduler).executeScheduler
/Users/hannibal/golang/src/github.com/gaia-pipeline/gaia/scheduler/scheduler.go:581 +0xb29
exit status 2
make: *** [dev] Error 1

The repo I used: https://github.com/Skarlso/go-example
Failing pipeline job resulted in Gaia crashing.
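The crash pattern above (nil pointer dereference at plugin.go:184) is consistent with dereferencing a nil result struct returned for a failed job. A hypothetical sketch of the defensive check — none of these type names are Gaia's real API, they only model the failure mode:

```go
package main

import (
	"errors"
	"fmt"
)

// jobResult stands in for whatever the plugin RPC returns; nil models a
// failed job that produced no result at all (hypothetical type, not Gaia's).
type jobResult struct {
	Failed  bool
	Message string
}

// describeResult would crash on a nil result unless we guard first — the
// shape of fix suggested by the SIGSEGV in plugin.Execute above.
func describeResult(r *jobResult) (string, error) {
	if r == nil {
		return "", errors.New("job returned no result")
	}
	if r.Failed {
		return "", errors.New(r.Message)
	}
	return "job finished: " + r.Message, nil
}

func main() {
	if _, err := describeResult(nil); err != nil {
		fmt.Println("guarded nil result:", err)
	}
}
```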
Thanks @Skarlso. I also found some other issues which I need to fix before we can merge that 🤦♂️
@Skarlso please have a look again. When a job throws
4257072 to c63d0bc
LGTM now my man. Good to merge at your leisure @michelvocks.
@Skarlso We cannot add set
@michelvocks wait. I don't get it. Why shouldn't the pipeline fail if there is an error? It shouldn't pass. If there is an error, make it fail. I don't get that reasoning? :)
Oh, like, for example, if you want a cleanup at the end to be executed no matter if a job failed, right? Yeah, in that case we don't want to mark the pipeline as failed. :/ Sorry. :)
@Skarlso The main idea was to safely exit a pipeline. So for example: you got a pipeline with three jobs which should be executed one after the other. Anyway, let me rework that. 😄
But wait. I just realised what you are writing. But in my case it wasn't safely exiting. I returned an error in the job. Isn't that supposed to fail the pipeline? Or cleanup still needs to run
So, I still don't understand. I found the bug, which was a timing issue with closing finish in case of a job execution failure. But this will still exit the pipeline and make it fail, because the job failed.

Also, I found that the forever-running loops are pretty fragile. We need to add timeouts to those, or else we get into the trouble of an unfinished, stuck pipeline which you can only kill by killing gaia. That is unacceptable. I'll add some timeout selects here and there to fail the loop if nothing is happening after a certain time. We might also want to add user interaction, like cancelling the whole thing mid-run.

Sample run result: a job failing stops the pipeline and cleanup no longer runs, but the status of the pipeline will also be failed. Is this what you want? :D
BTW, another interesting one: I was printing out the status from the workflow after it supposedly replaced the items, but every time, on every loop, it was the same output:

2018-09-19T21:57:59.149+0200 [DEBUG] plugin.def-failing_golang: 2018/09/19 21:57:59 MigrateDB has been finished!
job: DB Migration success
job Create DB User is in state waiting for execution
job DB Migration is in state waiting for execution
job Create K8S Namespace is in state waiting for execution
job Create K8S Deployment is in state waiting for execution
job Create K8S Service is in state waiting for execution
job Create K8S Ingress is in state waiting for execution
job Clean up is in state waiting for execution
looping forever
looping forever

Waiting for execution. Isn't that supposed to be updating? This is the part:

for wl := range mw.Iter() {
	fmt.Printf("job %s is in state %s\n", wl.job.Title, wl.job.Status)
	if !wl.done {
		allWLDone = false
	}
}

Where this part should have updated these:

// Job is done
wl := mw.GetByID(j.ID)
wl.done = true
mw.Replace(*wl)

Oh, this is the workload status, pfff, not the job status... Silly me. Still, should those guys say "waiting for execution" even after they finished? :)
Damn it. :( No, I didn't fix it. :( It made it a tad bit better, but it's still hanging in an infinite loop without any channels being notified of anything.
Alright, to summarise: I know it's a timing issue with the channels and I know that it's happening in
I found the problem. Fixing it..
@michelvocks So the problem is the following... :) In here

Which is fine: if a job failed, we don't want to continue execution, since that would only lead to more failures if the next job depended on the previous one doing something. However, then you go: if there are parallel jobs still running, we want them to finish and not abort, if I understand correctly. Which is, again, reasonable and fine.

However... Cleanup, which is the next job in line, is also in started state according to the workload, which only has two states, started and done. This hangs forever, because the channel is closed and the job won't run; it's just waiting for things to happen, but no channel is ever called again, thus this guy:

// wait until finished
<-notFinishedWL.finishedSig

blocks forever. What I will do is introduce another state called inProgress, so that we know a job is currently running and we'll wait for it to finish. But if it's just started, we will ignore it and not run it, since the pipeline failed. EDIT: OR! I just simply add
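The deadlock described above — blocking on <-notFinishedWL.finishedSig for a job that will never run — can be modeled with the proposed third state: only wait on workloads that are actually executing. State names and types here are invented for illustration, not Gaia's real scheduler:

```go
package main

import (
	"fmt"
	"sync"
)

type state int

const (
	started    state = iota // scheduled but not yet running
	inProgress              // currently executing, worth waiting for
	done
)

type workload struct {
	state       state
	finishedSig chan struct{}
}

// drainRunning waits only for workloads that are actually executing.
// Workloads still merely "started" are skipped: after a failure their
// finishedSig will never fire, which is the hang described above.
// Returns how many workloads were waited on.
func drainRunning(wls []*workload) int {
	var wg sync.WaitGroup
	waited := 0
	for _, wl := range wls {
		if wl.state != inProgress {
			continue
		}
		waited++
		wg.Add(1)
		go func(wl *workload) {
			defer wg.Done()
			<-wl.finishedSig
		}(wl)
	}
	wg.Wait()
	return waited
}

func main() {
	running := &workload{state: inProgress, finishedSig: make(chan struct{})}
	close(running.finishedSig) // simulate the running job finishing
	pending := &workload{state: started, finishedSig: make(chan struct{})}

	// Only the in-progress workload is waited on; the pending one is
	// skipped, so this returns instead of deadlocking.
	fmt.Println(drainRunning([]*workload{running, pending})) // 1
}
```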
Yiss. Fixed it. Interesting point: executeJob was sending one last triggerSave call, which failed because we close the channel beforehand, so that's a send on a closed channel. I put the close in a defer; that way it should happen after that, but still prevent the select from triggering again.
Nope... damn it. Still fails.
Ideally, the writer should close the channel. But since multiple goroutines are using it, I'm opting to have this particular part of the process handling simply leave this channel alone. There is no performance cost to leaving a channel open, and it will be garbage-collected anyway.
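The "send on closed channel" panic mentioned above, and why leaving the channel open is the pragmatic fix, can be demonstrated in isolation (generic Go semantics, not Gaia's code):

```go
package main

import "fmt"

// trySend reports whether a send on ch succeeded, recovering from the panic
// that a send on a closed channel raises — the exact failure mode of the
// stray triggerSave described above.
func trySend(ch chan int, v int) (ok bool) {
	defer func() {
		if recover() != nil {
			ok = false
		}
	}()
	select {
	case ch <- v:
		return true
	default:
		return false // buffer full, no receiver ready
	}
}

func main() {
	open := make(chan int, 1)
	fmt.Println(trySend(open, 1)) // true: an open channel accepts the send

	closedCh := make(chan int, 1)
	close(closedCh)
	fmt.Println(trySend(closedCh, 1)) // false: send on closed channel panics
}
```

Leaving the channel open is safe in Go: an unreferenced channel is garbage-collected whether or not it was ever closed, so close is only needed to signal receivers, never to free memory.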
Hey @Skarlso. I've recovered from my illness and am now trying to get back on track. I just tried to follow all your comments, but I'm a bit lost. Give me some time to get into the code again 🤗
@Skarlso I found the problem. Like you said, when a job fails in a pipeline I have to wait until parallel jobs are finished.
You are absolutely right but maybe we do it in a separate (following) PR? What do you think? 🤗
Ah, by the way. Here is a pipeline which returns a specific error. This error tells gaia that the job failed and the pipeline should be aborted but not marked as failed: https://github.com/michelvocks/gaia-test-pipeline
@michelvocks Cool stuff dude. Yeah, do the timeouts in a separate PR definitely. :)
I would also try to clean up the channels later on. :)
Fixes #100 and also #2