Skip to content

Conversation

vadasambar
Copy link
Contributor

@vadasambar vadasambar commented Sep 3, 2019

details

  • uses secret to store keys and certificates

Fix for #1293


This change is Reviewable

Suraj Banakar added 2 commits September 3, 2019 17:25
details
----------
- uses secret to store keys and certificates
details
----------
- this will avoid confusion if the user assumes that the correct certificate files are already present
- disable kafka (it is disabled by default)
@life1347 life1347 added this to the 1.6 milestone Sep 4, 2019
Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @vadasambar)


charts/fission-all/auth/kafka/ca.crt, line 1 at r1 (raw file):

-----BEGIN CERTIFICATE-----

I think we can ask users to provide their own crt & key before installation like what we do in

{{- if .Files.Get "config/fluentbit.conf" }}

so that we don't need to push crt & key to repo.
The helm installation should be stopped if a user doesn't provide necessary files.


charts/fission-all/templates/deployment.yaml, line 684 at r1 (raw file):

          value: {{ .Values.debugEnv | quote }}
        - name: MESSAGE_QUEUE_SECRETS
          value: mqtrigger-kafka-secrets

It's better to mount secrets as a volume instead of passing it as environment variable.


pkg/mqtrigger/mqtrigger.go, line 52 at r1 (raw file):

	var secrets *v1.Secret
	if mqSecretName != "" {
		secrets, _ = kubeClient.CoreV1().Secrets(getCurrentNamespace()).Get(mqSecretName, metav1.GetOptions{})

Mount secrets as volume and read crt/key from the it instead of calling kubernetes API service.


pkg/mqtrigger/messageQueue/kafka.go, line 154 at r1 (raw file):

	cert, err := tls.X509KeyPair(kafka.certificates["userCert"], kafka.certificates["userKey"])
	if err != nil {
		panic(err)

Return the error and let the caller handles the error instead of doing panic here.

@life1347
Copy link
Member

life1347 commented Sep 4, 2019

Also, please check the error log in CI build: https://travis-ci.org/fission/fission/builds/580226550#L1172

Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will do this.

Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @life1347 and @vadasambar)


charts/fission-all/auth/kafka/ca.crt, line 1 at r1 (raw file):
The cert files I have added are samples/examples. We can remove them if we want.

The helm installation should be stopped if a user doesn't provide necessary files.

I will do this. Thank you :)


charts/fission-all/templates/deployment.yaml, line 684 at r1 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

It's better to mount secrets as a volume instead of passing it as environment variable.

Some advantages I can think of are:

  1. I can access it in the pod itself. So no need for an API call to the apiserver(a roundtrip which would probably include validations, decoding etc.,).
  2. No need for an environment variable. Environment variable can add another point of failure. i.e., if the environment variable is tampered with, kafka might not work anymore because it might not be able to find the secret.

Please let me know if there's anything that I have not considered.

If I am understanding this correctly, (1) would be a one-time thing since the secret is accessed only during initialization of mqtrigger pod.
As for (2), I agree. I think it is better not to pollute the environment variable pool (we might sacrifice some simplicity here because I would no longer be able to link some other secret instead of the one created by default, unless I edit the default secret).

I think overall, volume approach sounds better because it has less moving parts. I just wanted to know if I was understanding this correctly.


pkg/mqtrigger/messageQueue/kafka.go, line 154 at r1 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Return the error and let the caller handles the error instead of doing panic here.

Makes sense. I will fix this. Thank you!

Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @life1347 and @vadasambar)


charts/fission-all/templates/deployment.yaml, line 684 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Some advantages I can think of are:

  1. I can access it in the pod itself. So no need for an API call to the apiserver(a roundtrip which would probably include validations, decoding etc.,).
  2. No need for an environment variable. Environment variable can add another point of failure. i.e., if the environment variable is tampered with, kafka might not work anymore because it might not be able to find the secret.

Please let me know if there's anything that I have not considered.

If I am understanding this correctly, (1) would be a one-time thing since the secret is accessed only during initialization of mqtrigger pod.
As for (2), I agree. I think it is better not to pollute the environment variable pool (we might sacrifice some simplicity here because I would no longer be able to link some other secret instead of the one created by default, unless I edit the default secret).

I think overall, volume approach sounds better because it has less moving parts. I just wanted to know if I was understanding this correctly.

Another advantage I did not add.
3. Volume will be available across pod restarts.

Also, we would have to write the code to load all the secrets from the directory ourselves (this would be handled by kubernetes if we do api call).

Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 8 files reviewed, 4 unresolved discussions (waiting on @life1347 and @vadasambar)


charts/fission-all/auth/kafka/ca.crt, line 1 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

The cert files I have added are samples/examples. We can remove them if we want.

The helm installation should be stopped if a user doesn't provide necessary files.

I will do this. Thank you :)

We can add steps to fission doc and charts to teach users how to generate their own.


charts/fission-all/templates/deployment.yaml, line 684 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Another advantage I did not add.
3. Volume will be available across pod restarts.

Also, we would have to write the code to load all the secrets from the directory ourselves (this would be handled by kubernetes if we do api call).

The points you list are correct. And here are some extra benefits I can think of

  1. Lower barrier
    • User can easily change the volume part of the YAML file with their own secret without knowing how fission works internally.
  2. Conventional secret management:
    • Services like Vault are able to rotate the secret periocally for security concerns. If we call API directly, we have to change it again for such cases.
  3. Don't reinvent the wheel if we already have a good one.
    • Normally, we don't call API server directly unless we have no other convenient ways to get resource information like deployment, pod information.

Suraj Banakar added 3 commits September 5, 2019 14:08
details/others
---------------------
- remove environment variable for kafka secrets. It's easier to tamper with
- read secrets from volume instead of making an apiserver call
- read secrets to a []byte map
- fix build failing because of mutex copy error (tlsConfig)
other
--------
- remove default path for TLS kafka keys
Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @life1347)


charts/fission-all/auth/kafka/ca.crt, line 1 at r1 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

We can add steps to fission doc and charts to teach users how to generate their own.

👍 I have removed the sample files.


charts/fission-all/templates/deployment.yaml, line 684 at r1 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

The points you list are correct. And here are some extra benefits I can think of

  1. Lower barrier
    • User can easily change the volume part of the YAML file with their own secret without knowing how fission works internally.
  2. Conventional secret management:
    • Services like Vault are able to rotate the secret periocally for security concerns. If we call API directly, we have to change it again for such cases.
  3. Don't reinvent the wheel again if we already have a good one.
    • Normally, we don't call API server directly unless we have no other convenient ways to get resource information like deployment, pod information.

👍 I have updated the code with volumes approach


pkg/mqtrigger/mqtrigger.go, line 52 at r1 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Mount secrets as volume and read crt/key from the it instead of calling kubernetes API service.

Done

Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 4 unresolved discussions (waiting on @life1347)


pkg/mqtrigger/messageQueue/kafka.go, line 154 at r1 (raw file):

Previously, vadasambar (Suraj Banakar) wrote…

Makes sense. I will fix this. Thank you!

Done.

- name: kafka-secrets
mountPath: /etc/secrets
{{- end}}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final YAML file generated by helm will contain blank lines if they exist in the template, so please remove unnecessary blank lines here and other places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me fix that. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# userCert: 'auth/kafka/user.crt'
# userKey: 'auth/kafka/user.key'


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: one blank line here should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me fix that. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you missed it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}
messageQueue.MakeMessageQueueTriggerManager(logger, fissionClient, routerUrl, mqCfg)
return nil
}

func getCurrentNamespace() string {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can remove this function since we don't need it anymore.

Copy link
Contributor Author

@vadasambar vadasambar Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove this. Thanks for pointing it out :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

func readSecrets(logger *zap.Logger) (map[string][]byte, error) {
secretsPath := "/etc/secrets"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard coding the secretsPath, I prefer to read it from the environment variable.
So that user can change the path later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea. Will make the change. Thank you :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

secretFiles, err := ioutil.ReadDir(secretsPath)

if err != nil {
return secrets, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

secret, fileReadErr := ioutil.ReadFile(filePath)

if fileReadErr != nil {
return secrets, fileReadErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return nil, fileReadErr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

value: "true"
volumeMounts:
- name: kafka-secrets
mountPath: /etc/secrets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/etc/fission/secrets would be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


func readSecrets(logger *zap.Logger) (map[string][]byte, error) {
secretsPath := "/etc/secrets"
secrets := make(map[string][]byte)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving this line just right before the place where it's first called, like

secrets := make(map[string][]byte)
for _, secretFile := range secretFiles {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Contributor Author

@vadasambar vadasambar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 5 files reviewed, 17 unresolved discussions (waiting on @life1347 and @vadasambar)


pkg/mqtrigger/mqtrigger.go, line 29 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

You may need to do it manually. The order of imports follow the rule

built-in pkg

3rd pkg

fission pkg

Thank you 👍


pkg/mqtrigger/mqtrigger.go, line 49 at r2 (raw file):

Previously, life1347 (Ta-Ching Chen) wrote…

Oops, sorry, I forgot to remove this comment last night. You're right for this.

No problem. :)

other
--------
- return error instead of panic'ing
- inject secrets path as an environment variable
@@ -680,7 +680,50 @@ spec:
value: {{ .Values.traceSamplingRate | default "0.5" | quote }}
- name: DEBUG_ENV
value: {{ .Values.debugEnv | quote }}
# TLS authentication is TLS with authentication (2 way)
# More info: https://docs.confluent.io/current/kafka/authentication_ssl.html#ssl-overview
{{- if .Values.kafka.authentication.tls.enabled}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Leave empty before }} L685, L693, L695, L700

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Thank you for noticing it. Will fix it. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

# userCert: 'auth/kafka/user.crt'
# userKey: 'auth/kafka/user.key'


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you missed it.


// return if no secrets exist
if _, err := os.Stat(secretsPath); os.IsNotExist(err) {
return nil, nil
Copy link
Member

@life1347 life1347 Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here should return an error since we are not able to find the secrets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use readSecrets as a generic function (i.e., use them for other messaging queues as well), it would make more sense to log that the secretsPath was not found and let it return nil.

There are two cases which are of importance here:

  1. When no secrets are present and they are not required. i.e., there is no need for secrets e.g. connecting to a non TLS secured Kafka cluster would not need secrets.
  2. When no secrets are present but they should be present (secrets are required). This would be the case when the Kafka cluster is to be connected using TLS but the secrets were not mounted properly.

Returning nil should handle the first scenario just fine because there will be no error and throwing an error is not really required here. Problem arises in the second case. If the secrets cannot be found at the path but they should be present there, an error should be raised so that we know what went wrong. Returning nil here would be just ignoring this case and it would make it difficult to debug the problem.

I believe returning error is what will handle the second case. But returning error when secrets will not be required (connecting to non TLS Kafka cluster) and distinguishing this harmless thing with an actual error would be difficult.

One way to fix this can be, we log the information (maybe as a warning) so that we know that the secrets were not found and hope that the messaging queue would throw an error which together with the warning would help debug the problem. We can word the warning to convey the intensity of the problem. I am not very happy with this fix but this is something which can solve the problem for now.

What do you think?

Copy link
Member

@life1347 life1347 Oct 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get it. If the secrets are not necessary, why calling this function?

It's the caller's responsibility to decide whether it needs to call the function, instead of calling a function that will silently ignore the error and pretend nothing happen. A generic function should always return the error instead of hiding it (unless the function itself can handle it) and let the caller to decide what to do to handle the error.

In this case, I think we can change the function signature

// caller checks whether it need to read secrets
secretsPath := strings.TrimSpace(os.Getenv("MESSAGE_QUEUE_SECRETS"))
if secretsPath > 0 {
    secrets, err := readSecrets(secretsPath, logger) (map[string][]byte, error)
}


// change the function signature to
func readSecrets(path string, logger *zap.Logger) (map[string][]byte, error) {

    // return if no secrets exist
	if _, err := os.Stat(path); os.IsNotExist(err) {
        return nil, err
    }
....

It brings a couple of benefits,

  1. A more generic function that can accept any path instead of reading the path from the environment variable.
  2. It's easier to write unit tests for the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. 👍

consumer, err := cluster.NewConsumer(kafka.brokers, string(trigger.Metadata.UID), []string{trigger.Spec.Topic}, consumerConfig)
kafka.logger.Info("created a new consumer", zap.Any("consumer", consumer))
if err != nil {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Will fix this. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

tlsConfig, err := kafka.getTLSConfig()

if err != nil {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. Will fix this. Thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

other
--------
- add spaces before trailing }}
@life1347
Copy link
Member

@vadasambar any updates for this?

@vadasambar
Copy link
Contributor Author

@vadasambar any updates for this?

Hi, sorry I was a little busy this week. I will check this as soon as I get some time. Thanks for the reminder 👍

@vadasambar
Copy link
Contributor Author

@life1347 , I did not get a reply on the review comment. I thought I would just bring it to notice because it is buried in other review comments.
Link to review comment: #1300 (comment)

I think checking for TLS_ENABLED makes sense but I was thinking of readSecrets as a generic way to read any kind of secrets for a message queue (these secrets can be unrelated to TLS). What do you think?

@life1347
Copy link
Member

life1347 commented Oct 2, 2019

sorry.. i missed that comment. that make sense to me.

@vadasambar
Copy link
Contributor Author

I think I have made all the changes requested from my side. If there's anything left, please let me know. :)

@life1347
Copy link
Member

life1347 commented Oct 3, 2019

I will review the PR again before weekend, before that please fix the code conflicts. Thanks!

@life1347
Copy link
Member

life1347 commented Oct 4, 2019

There are still some unresolved comments, please fix them.

@vadasambar
Copy link
Contributor Author

Sorry, I thought I had fixed them. I have 3 pending changes. Thank you for pointing them out. I think I will fix them and then rebase/merge my branch with master.

Copy link
Member

@life1347 life1347 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, after resolving the conflict with master branch. Then we are good to go.

@codecov
Copy link

codecov bot commented Oct 9, 2019

Codecov Report

Merging #1300 into master will decrease coverage by 10.11%.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #1300       +/-   ##
===========================================
- Coverage    27.5%   17.39%   -10.12%     
===========================================
  Files          70       50       -20     
  Lines        7348     5819     -1529     
===========================================
- Hits         2021     1012     -1009     
+ Misses       5093     4733      -360     
+ Partials      234       74      -160
Impacted Files Coverage Δ
pkg/mqtrigger/messageQueue/messageQueue.go 0% <ø> (ø) ⬆️
pkg/mqtrigger/messageQueue/kafka.go 0% <0%> (ø) ⬆️
pkg/crd/crd.go 0% <0%> (-93.84%) ⬇️
pkg/crd/function.go 0% <0%> (-86.89%) ⬇️
pkg/crd/environment.go 0% <0%> (-86.89%) ⬇️
pkg/crd/httptrigger.go 0% <0%> (-86.89%) ⬇️
pkg/crd/kubernetesWatchTrigger.go 0% <0%> (-86.89%) ⬇️
pkg/crd/client.go 0% <0%> (-66.67%) ⬇️
pkg/controller/mqTriggerApi.go
pkg/controller/packageApi.go
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d358a29...2614c0f. Read the comment docs.

@life1347
Copy link
Member

life1347 commented Oct 9, 2019

@vadasambar thanks for all your effort! Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants