chore: fix shading problems and slightly update CI #615
Conversation
- Refactor `scala-publish.yml` to clarify release and snapshot publishing conditions.
- Adjust `docs.yml` trigger to specifically include the `main` branch.
- Remove unused Sonatype import from `build.sbt`.
- Enhance developer metadata and maintainers list in `build.sbt`.
- Update dependencies and assembly configuration to address shading and exclude non-connect classes from the Uber JAR (see the sketch below this list).
- Introduce custom POM post-processing for correct dependency scope adjustments.
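For the "exclude non-connect classes" part, an sbt-assembly merge strategy along these lines could do it; the package paths here are illustrative assumptions, not the PR's actual rules:

```scala
// build.sbt: requires the sbt-assembly plugin (its keys are auto-imported).
// Package paths are assumptions for illustration only.
ThisBuild / assemblyMergeStrategy := {
  // keep the connect classes this artifact is meant to ship
  case PathList("org", "graphframes", "connect", _*) => MergeStrategy.deduplicate
  // discard core graphframes classes; users get them from the core artifact
  case PathList("org", "graphframes", _*) => MergeStrategy.discard
  case other =>
    val oldStrategy = (ThisBuild / assemblyMergeStrategy).value
    oldStrategy(other)
}
```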
@Kimahriman sorry for tagging, but can I ask you to take a look? You helped a lot in identifying all the problems this PR aims to fix. Thanks in advance! |
Can you describe exactly what you're trying to achieve with the assembly/shading? |
@Kimahriman Only one thing: renaming. Last time you found a problem that I ran into.
|
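For reference, the kind of rename being discussed, assuming the target is protobuf and using a made-up shaded prefix, looks like this with sbt-assembly's shade rules:

```scala
// build.sbt: requires sbt-assembly; the shaded prefix below is hypothetical
assembly / assemblyShadeRules := Seq(
  // rewrite com.google.protobuf.* into a private namespace so the uber jar
  // cannot clash with the protobuf classes already bundled inside Spark
  ShadeRule.rename("com.google.protobuf.**" -> "org.graphframes.shaded.protobuf.@1").inAll
)
```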
Ok, that's what I thought; just wanted to make sure. What was your experience with sbt-shading? Another option that would simplify a lot of things is to only support the connect stuff in Spark 4; that's what Delta is doing with their connect support. It seems like there should be a simpler way to do this, but I don't know what it would be. |
I tried it. Actually, I spent some time today trying to understand how it works under the hood. Based on what I found in different resources, it looks like most projects use |
Ok, I think I figured out the graphframes shading issue. If you print out the classpath for jars to exclude (and use the assembly classpath), you get the root project's classes directory rather than a jar.
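A minimal sketch of that inspection, assuming sbt-assembly's standard `assemblyExcludedJars` and `fullClasspath` keys:

```scala
// build.sbt: print every entry on the assembly classpath while working out
// which jars to exclude from the uber jar
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp.foreach(entry => println(entry.data))
  // the actual exclusion filter is project-specific; nothing is excluded here
  Seq.empty
}
```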
The root project is included directly via classes instead of a jar, so you can't exclude the jar. But you can set a flag on the root project and it will use the jar instead in the connect project, which lets you exclude it in assembly.
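Presumably the flag in question is sbt's `exportJars` (an assumption, since the original setting name is not quoted above); a minimal sketch:

```scala
// build.sbt: with exportJars enabled, downstream projects (here, connect)
// see root's packaged jar on their classpath instead of its classes
// directory, so assembly can exclude it like any other jar
lazy val root = (project in file("."))
  .settings(
    exportJars := true
  )
```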
So that should fix/remove the need for the custom POM handling. |
Actually, it's still not perfect, because the sbt-protoc plugin adds protobuf-java as a compile dependency, so you end up with that in addition to the shaded jars in the graphframes connect package.
|
After running the build, the generated POM contains:

```xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.18</version>
  </dependency>
  <dependency>
    <groupId>org.graphframes</groupId>
    <artifactId>graphframes-spark3_2.12</artifactId>
    <version>0.9.0-SNAPSHOT</version>
    <scope>runtime</scope>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.24.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-graphx_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>2.0.16</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_2.12</artifactId>
    <version>3.0.8</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.github.zafarkhaja</groupId>
    <artifactId>java-semver</artifactId>
    <version>0.10.2</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-connect_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

I think that by the
|
Tbh I can't figure out why it put slf4j as a provided dependency.
|
Yeah not sure why that is. Turns out Delta's use of assembly isn't actually right either. They use it to shade jackson, but the Kernel modules that do the shading still have it as a runtime dependency as well:
I found this old post that I think describes the correct way to use the assembly plugin for shading: basically, build an intermediate project just for the assembly jar, and then a separate project to actually publish with the right dependencies. I think I have it working if you want me to make a PR to compare. |
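The pattern described there, an internal project that only builds the shaded assembly plus a thin project that publishes that jar with hand-curated dependencies, might look roughly like this (project names and the shaded prefix are hypothetical, not this repo's actual layout):

```scala
// build.sbt: a sketch of the two-project shading pattern
lazy val connectShaded = (project in file("connect"))
  .settings(
    publish / skip := true, // internal: exists only to produce the uber jar
    // shading rules (e.g. the protobuf rename sketched earlier) go here
    assembly / assemblyShadeRules := Seq(
      ShadeRule.rename("com.google.protobuf.**" -> "org.graphframes.shaded.protobuf.@1").inAll
    )
  )

lazy val connectPublished = (project in file("connect-publish"))
  .settings(
    // ship the shaded assembly as this module's main artifact
    Compile / packageBin := (connectShaded / assembly).value,
    // declare dependencies by hand so the published POM lists only what
    // consumers genuinely need
    libraryDependencies := Seq.empty
  )
```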
Added a post-processing step to mark the protobuf scope as "provided", because it is part of Apache Spark itself (a sketch of that post-processing follows the POM below).
Meanwhile, I fixed the scope of the protobuf-java dependency:

```xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.18</version>
  </dependency>
  <dependency>
    <groupId>org.graphframes</groupId>
    <artifactId>graphframes-spark3_2.12</artifactId>
    <version>0.9.0-SNAPSHOT</version>
    <scope>runtime</scope>
  </dependency>
  <dependency>
    <groupId>com.google.protobuf</groupId>
    <artifactId>protobuf-java</artifactId>
    <version>3.24.4</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-graphx_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>2.0.16</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.scalatest</groupId>
    <artifactId>scalatest_2.12</artifactId>
    <version>3.0.8</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.github.zafarkhaja</groupId>
    <artifactId>java-semver</artifactId>
    <version>0.10.2</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-connect_2.12</artifactId>
    <version>3.5.5</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```
|
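That kind of post-processing can be done with sbt's `pomPostProcess`; a minimal sketch, not necessarily the PR's exact code:

```scala
// build.sbt: rewrite the generated POM so protobuf-java is scoped provided,
// since Spark already ships protobuf
import scala.xml.{Elem, Node}
import scala.xml.transform.{RewriteRule, RuleTransformer}

pomPostProcess := { node =>
  new RuleTransformer(new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case dep: Elem
          if dep.label == "dependency" &&
            (dep \ "artifactId").text == "protobuf-java" =>
        // drop any existing <scope> and append <scope>provided</scope>
        val children = dep.child.filterNot(_.label == "scope") :+ <scope>provided</scope>
        dep.copy(child = children: _*)
      case other => other
    }
  }).apply(node)
}
```

Maven treats `provided` dependencies as available at compile time but neither bundled nor pulled in transitively, which matches protobuf already being on Spark's classpath.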
I always forget that GF uses master as the default branch...
I will merge this to enable SNAPSHOT publishing. The root problem was fixed in #617. |
What changes were proposed in this pull request?
- Refactor `scala-publish.yml` to clarify release and snapshot publishing conditions.
- Adjust `docs.yml` trigger to specifically include the `main` branch.
- Remove unused Sonatype import from `build.sbt`.
- Enhance developer metadata and maintainers list in `build.sbt`.

Why are the changes needed?
Close #614