Skip to content

Conversation

mfvitale
Copy link
Contributor

I have created a Dockerfile based on the debezium-connect image.
This will simply add all the necessary stuff to build Kafka from the trunk branch.

Just did some tests and seems that it is working.

[mvitale@mvitale connect-config]$ kcctl apply -f postgres-connector-reg.json 
Created connector inventory-connector
[mvitale@mvitale connect-config]$ curl -X GET -H "Accept:application/json" localhost:8083/connectors/inventory-connector/offsets
{"offsets":[{"partition":{"server":"dbserver1"},"offset":{"last_snapshot_record":true,"lsn":34487416,"txId":768,"ts_usec":1691678351957450,"snapshot":true}}]}
[mvitale@mvitale connect-config]$ curl -X DELETE -H "Accept:application/json" localhost:8083/connectors/inventory-connector/offsets
{"error_code":400,"message":"Connectors must be in the STOPPED state before their offsets can be modified. This can be done for the specified connector by issuing a 'PUT' request to the '/connectors/inventory-connector/stop' endpoint"}
[mvitale@mvitale connect-config]$ curl -X PUT -H "Accept:application/json" localhost:8083/connectors/inventory-connector/stop
[mvitale@mvitale connect-config]$ curl -X DELETE -H "Accept:application/json" localhost:8083/connectors/inventory-connector/offsets
{"message":"The Connect framework-managed offsets for this connector have been reset successfully. However, if this connector manages offsets externally, they will need to be manually reset in the system that the connector uses."}
[mvitale@mvitale connect-config]$ curl -X GET -H "Accept:application/json" localhost:8083/connectors/inventory-connector/offsets
{"offsets":[]}

Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mfvitale! I built the image locally and it looks good for the most part.

If we plan on using this image in our integration tests, we'll need a few extra tweaks:

  1. Debezium customizes the Log4j config file used by their Connect workers, and we have integration tests that rely on that. The config file is stored in $KAFKA_HOME/config/log4j.properties and right now it's getting overwritten when we build this image; can we preserve that file somehow?
  2. Debezium exposes a KAFKA_VERSION environment variable that contains which version of Kafka the image is built against. We also use this in our integration tests; can we update this build file and/or our integration tests to read the correct Kafka version from this image? AFAICT we can't overwrite the environment variable itself without hardcoding a value (which seems distasteful), but we might consider reading it from the Kafka repo's gradle.properties file either during the build or in our integration tests.

@mfvitale
Copy link
Contributor Author

  1. Debezium exposes a KAFKA_VERSION environment variable that contains which version of Kafka the image is built against. We also use this in our integration tests; can we update this build file and/or our integration tests to read the correct Kafka version from this image? AFAICT we can't overwrite the environment variable itself without hardcoding a value (which seems distasteful), but we might consider reading it from the Kafka repo's gradle.properties file either during the build or in our integration tests.

Seems there is no way to set an ENV with command output. I saved it in the $KAFKA_HOME/kafka_version file so it can be read during tests.

Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, sorry--it looks like license checking doesn't work on raw Dockerfiles (see mathieucarbou/license-maven-plugin#204).

Can we add an exclude for the Dockerfile? Should look something like this (in pom.xml):

<exclude>**/Dockerfile</exclude>

Also, just a thought--we may want to rename the Dockerfile or at least add some detail to the path for it, since people may reasonably see a file in docker/Dockerfile and think that that has to do with running kcctl instead of running Kafka Connect. Do you think it'd make sense to put it in a path like docker/connect/trunk/Dockerfile?

@mfvitale
Copy link
Contributor Author

Ugh, sorry--it looks like license checking doesn't work on raw Dockerfiles (see mathieucarbou/license-maven-plugin#204).

Can we add an exclude for the Dockerfile? Should look something like this (in pom.xml):

<exclude>**/Dockerfile</exclude>

Sure.

Also, just a thought--we may want to rename the Dockerfile or at least add some detail to the path for it, since people may reasonably see a file in docker/Dockerfile and think that that has to do with running kcctl instead of running Kafka Connect. Do you think it'd make sense to put it in a path like docker/connect/trunk/Dockerfile?

What do you think to put it in the test resource folder?

@C0urante
Copy link
Contributor

What do you think to put it in the test resource folder?

This is interesting... I think it depends on how we plan on using the Dockerfile in the future. If we need to access it using Java code at test runtime, then src/test/resources would be a great fit. If we access it manually before running integration tests with it locally, then it probably doesn't belong anywhere in the src directory. If we access it programmatically using, e.g., the dockerfile-maven-plugin, then that falls into a bit of a grey area, but IMO it's still better to leave out of the src directory.

Did you have thoughts on how to leverage this in the future? I have a draft commit ready that adds support for running integration tests against this image, but not for building it. If it's fast enough, we could add trunk to our CI matrix and start building the image as part of that.

@mfvitale
Copy link
Contributor Author

mfvitale commented Aug 21, 2023

Did you have thoughts on how to leverage this in the future?

I saw that there are test containers that uses the connect image from Debezium. Do you think we can use also this one here?

I have a draft commit ready that adds support for running integration tests against this image, but not for building it.

If you plan to test it using test containers maybe we can think to build it on the fly . WDYT?

If it's fast enough, we could add trunk to our CI matrix and start building the image as part of that.

It requires more or less 6 minutes.

@C0urante
Copy link
Contributor

I saw that there are test containers that uses the connect image from Debezium. Do you think we can use also this one here?

Yes, that's exactly what my draft commit does 😄

If you plan to test it using test containers maybe we can think to build it on the fly . WDYT?

Sounds great! Not sure how easy this would be though--we use the DebeziumContainer class for our testing, which subclasses GenericContainer and may make it more complicated to use the ImageFromDockerfile class from testcontainers. However, if we adopt this approach, it'd definitely make sense to place the Dockerfile in src/test/resources.

We don't have to block merging this PR on implementing build-on-the-fly logic; if you're confident that it can be done, we can merge this now (after moving the Dockerfile to a different directory) and tackle that later.

@mfvitale
Copy link
Contributor Author

mfvitale commented Aug 23, 2023

I saw that there are test containers that uses the connect image from Debezium. Do you think we can use also this one here?

Yes, that's exactly what my draft commit does 😄

If you plan to test it using test containers maybe we can think to build it on the fly . WDYT?

Sounds great! Not sure how easy this would be though--we use the DebeziumContainer class for our testing, which subclasses GenericContainer and may make it more complicated to use the ImageFromDockerfile class from testcontainers. However, if we adopt this approach, it'd definitely make sense to place the Dockerfile in src/test/resources.

We don't have to block merging this PR on implementing build-on-the-fly logic; if you're confident that it can be done, we can merge this now (after moving the Dockerfile to a different directory) and tackle that later.

I have found a way to build the image on the fly and use it as a DebeziumContainer. I did some tests and seems working, the first time it requires about 6 minutes to build the image but next tests execution will be more fast.

Not sure if the Dockerfile exclusion works after moving it on test resource folder. Let's see if CI pass.

Copy link
Contributor

@C0urante C0urante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mfvitale, this is awesome!

Aside from the nit, I have two more thoughts about this. I don't think either has to be a blocker, but wanted to get your input before merging:

  1. Since the image build time is okay, we can start doing tests against trunk on our CI builds by adding TRUNK to the connect_version list.
  2. If I want to test against trunk locally, the image will only be built once and then reused for subsequent test runs. This is great for cutting down test runtime, but may lead to using a stale image the next day, month, etc. We may want to add a cleaner way to force a rebuild of the image aside from manually deleting the debezium-connect:trunk (or debezium/connect:trunk) image.

Let me know what you think about these; happy to classify them both as follow-up items (and file issues for them) if you'd prefer not to address them in this PR.

Co-authored-by: Chris Egerton <fearthecellos@gmail.com>
@mfvitale
Copy link
Contributor Author

Thanks @mfvitale, this is awesome!

Thank you!

  1. Since the image build time is okay, we can start doing tests against trunk on our CI builds by adding TRUNK to the connect_version list.

I think it's reasonable.

  1. If I want to test against trunk locally, the image will only be built once and then reused for subsequent test runs. This is great for cutting down test runtime, but may lead to using a stale image the next day, month, etc. We may want to add a cleaner way to force a rebuild of the image aside from manually deleting the debezium-connect:trunk (or debezium/connect:trunk) image.

Good point. I think that there isn't a solution that will safely save you from having a stale image. Two solution comes to my mind that can mitigate the risk:

  • Put the deletion of the image by default and then add a property to force the reuse of the image (so the dev is conscious of what is doing).
  • Leave the image deletion active by default and then delete it on Maven clean using docker-maven-plugin

Let me know what you think about these; happy to classify them both as follow-up items (and file issues for them) if you'd prefer not to address them in this PR.

Yes, better to have them on separate issues.

@C0urante C0urante merged commit be19345 into kcctl:main Aug 29, 2023
@C0urante
Copy link
Contributor

Great, thanks for all the excellent work @mfvitale!

@C0urante
Copy link
Contributor

Filed #381 and #382 as follow-ups.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants