chore: Cleanup assembly and shading #617
// Assembly settings
assembly / test := {}, // No tests in assembly
assemblyPackageScala / assembleArtifact := false,
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x if x.endsWith("module-info.class") => MergeStrategy.discard
  case x =>
    val oldStrategy = (assembly / assemblyMergeStrategy).value
    oldStrategy(x)
},
I removed this because I don't think there's any need to run assembly on the root project, unless you want to keep the ability to manually build a fat JAR?
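For readers less familiar with sbt-assembly, the routing done by the removed `assemblyMergeStrategy` can be modeled in plain Scala. This is a simplified stand-in, not sbt-assembly itself: the `PathList` extractor below is a hypothetical reimplementation that splits entry paths on `/`, and strategies are plain strings rather than `MergeStrategy` values.

```scala
// Simplified, standalone model of the merge-strategy routing above.
// PathList is a hypothetical stand-in for sbt-assembly's extractor, and
// strategies are plain strings instead of MergeStrategy values.
object MergeRouting {
  object PathList {
    // Split a JAR entry path into its segments, e.g. "META-INF/MANIFEST.MF".
    def unapplySeq(path: String): Option[Seq[String]] = Some(path.split("/").toSeq)
  }

  def strategyFor(path: String): String = path match {
    case PathList("META-INF", _*)             => "discard" // anything under META-INF
    case x if x.endsWith("module-info.class") => "discard" // JPMS descriptors elsewhere
    case _                                    => "default" // defer to the previous strategy
  }

  def main(args: Array[String]): Unit =
    Seq("META-INF/MANIFEST.MF", "module-info.class", "org/graphframes/GraphFrame.class")
      .foreach(p => println(s"$p -> ${strategyFor(p)}"))
}
```

Cases are tried top to bottom, so a multi-release descriptor such as `META-INF/versions/9/module-info.class` is already caught by the first case.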
POM from
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-connect-spark4_2.13</artifactId>
<packaging>jar</packaging>
<description>graphframes-connect</description>
<url>https://graphframes.io/</url>
<version>0.9.0-SNAPSHOT</version>
<licenses>
<license>
<name>Apache-2.0</name>
<url>https://opensource.org/licenses/Apache-2.0</url>
<distribution>repo</distribution>
</license>
</licenses>
<name>graphframes-connect</name>
<organization>
<name>org.graphframes</name>
<url>https://graphframes.io/</url>
</organization>
<scm>
<url>https://github.com/graphframes/graphframes</url>
<connection>scm:git@github.com:graphframes/graphframes.git</connection>
</scm>
<developers>
<developer>
<id>rjurney</id>
<name>Russell Jurney</name>
<url>https://github.com/rjurney</url>
<email>russell.jurney@gmail.com</email>
</developer>
<developer>
<id>SemyonSinchenko</id>
<name>Sem</name>
<url>https://github.com/SemyonSinchenko</url>
<email>ssinchenko@apache.org</email>
</developer>
</developers>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.13.12</version>
</dependency>
<dependency>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-spark4_2.13</artifactId>
<version>0.9.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.16</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.13</artifactId>
<version>3.0.8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.zafarkhaja</groupId>
<artifactId>java-semver</artifactId>
<version>0.10.2</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
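The metadata blocks in this POM (`<url>`, `<licenses>`, `<scm>`, `<developers>`, `<organization>`) are generated by sbt from build settings rather than written by hand. A sketch of sbt 1.x settings that would produce roughly this output; the values are copied from the POM above, while where the project actually defines them in `build.sbt` is an assumption.

```scala
// Sketch: sbt 1.x publishing metadata that maps onto the POM sections above.
// Values are copied from the generated POM; placement in the real build.sbt is assumed.
ThisBuild / organization := "org.graphframes"                        // <groupId>
ThisBuild / homepage := Some(url("https://graphframes.io/"))         // <url>
ThisBuild / organizationHomepage := Some(url("https://graphframes.io/"))
ThisBuild / licenses := List(
  "Apache-2.0" -> url("https://opensource.org/licenses/Apache-2.0")  // <licenses>
)
ThisBuild / scmInfo := Some(
  ScmInfo(
    url("https://github.com/graphframes/graphframes"),               // <scm><url>
    "scm:git@github.com:graphframes/graphframes.git"                 // <scm><connection>
  )
)
ThisBuild / developers := List(                                      // <developers>
  Developer("rjurney", "Russell Jurney", "russell.jurney@gmail.com", url("https://github.com/rjurney")),
  Developer("SemyonSinchenko", "Sem", "ssinchenko@apache.org", url("https://github.com/SemyonSinchenko"))
)
```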
Oh yeah, I was able to get that working by just excluding all JARs. It's annoying that Spark shades this, since normally you could just directly use
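A sketch of what "excluding all JARs" plus protobuf relocation can look like with sbt-assembly. It assumes that Spark 4 relocates protobuf under `org.sparkproject.connect.protobuf` (verify against the Spark jars you target) and that the project's own references to `com.google.protobuf` should be rewritten to point at Spark's relocated copy. The name `connectShadingSettings` is hypothetical, and this is not necessarily the exact configuration used in the PR.

```scala
// Sketch only: assumes sbt-assembly is enabled and that Spark relocates
// protobuf to org.sparkproject.connect.protobuf. connectShadingSettings is a hypothetical name.
lazy val connectShadingSettings = Seq(
  // Rewrite this project's bytecode references to protobuf so they resolve
  // against the copy already relocated inside Spark, instead of bundling protobuf.
  assembly / assemblyShadeRules := Seq(
    ShadeRule.rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1").inAll
  ),
  // Exclude every dependency JAR, so the assembled JAR holds only this project's classes.
  assembly / assemblyExcludedJars := (assembly / fullClasspath).value
)
```

Because no dependency JARs are bundled, the rename rule only affects references inside the project's own classes, which is what lets protobuf stay a `provided` dependency.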
Actually, I figured out how to simplify this even more: we don't need the extra project.
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-connect-spark4_2.13</artifactId>
<packaging>jar</packaging>
<description>graphframes-connect</description>
<url>https://graphframes.io/</url>
<version>0.9.0-SNAPSHOT</version>
<licenses>
<license>
<name>Apache-2.0</name>
<url>https://opensource.org/licenses/Apache-2.0</url>
<distribution>repo</distribution>
</license>
</licenses>
<name>graphframes-connect</name>
<organization>
<name>org.graphframes</name>
<url>https://graphframes.io/</url>
</organization>
<scm>
<url>https://github.com/graphframes/graphframes</url>
<connection>scm:git@github.com:graphframes/graphframes.git</connection>
</scm>
<developers>
<developer>
<id>rjurney</id>
<name>Russell Jurney</name>
<url>https://github.com/rjurney</url>
<email>russell.jurney@gmail.com</email>
</developer>
<developer>
<id>SemyonSinchenko</id>
<name>Sem</name>
<url>https://github.com/SemyonSinchenko</url>
<email>ssinchenko@apache.org</email>
</developer>
</developers>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.13.12</version>
</dependency>
<dependency>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-spark4_2.13</artifactId>
<version>0.9.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.16</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.13</artifactId>
<version>3.0.8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.zafarkhaja</groupId>
<artifactId>java-semver</artifactId>
<version>0.10.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-connect_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>
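Compared with the first POM, the only change is the added `spark-connect_2.13` dependency in `provided` scope. In sbt, dependencies of this shape are usually declared with `%%`, which appends the Scala binary suffix (`_2.13`), and a `"provided"` configuration. A sketch, with the Spark version taken from the POM and the `sparkVer` name assumed:

```scala
// Sketch: Spark modules declared as "provided" so they appear in the POM with
// <scope>provided</scope> and are supplied by the Spark runtime. sparkVer is a hypothetical name.
val sparkVer = "4.0.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-graphx"  % sparkVer % "provided",
  "org.apache.spark" %% "spark-sql"     % sparkVer % "provided",
  "org.apache.spark" %% "spark-mllib"   % sparkVer % "provided",
  // The one addition in the second POM: spark-connect, also provided at runtime.
  "org.apache.spark" %% "spark-connect" % sparkVer % "provided"
)
```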
Fantastic! Thanks a lot @Kimahriman !!!
* **Update Scala CI workflows and build configurations**
  - Refactor `scala-publish.yml` to clarify release and snapshot publishing conditions.
  - Adjust `docs.yml` trigger to specifically include the `main` branch.
  - Remove unused Sonatype import from `build.sbt`.
  - Enhance developer metadata and maintainers list in `build.sbt`.
  - Update dependencies and assembly configuration to address shading and exclude non-connect classes from the Uber JAR.
  - Introduce custom POM post-processing for correct dependency scope adjustments.
* Add missing developer email
* Specify the scope for protobuf-java
  Added post-processing to set the protobuf scope to "provided", because protobuf is part of Apache Spark itself.
* Take everything from #617
* main -> master
  I always forget that GF uses master as the default branch...
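The commit list mentions POM post-processing that sets `protobuf-java` to `provided`. sbt exposes a `pomPostProcess` hook (a `Node => Node` function over the generated POM) for this kind of rewrite. A minimal sketch using scala-xml's `RewriteRule` and `RuleTransformer`, assuming the dependency is matched by its `artifactId`; the PR's actual implementation may differ.

```scala
// Sketch for build.sbt: force <scope>provided</scope> on the protobuf-java dependency.
// Matching on artifactId "protobuf-java" is an assumption about the generated POM.
import scala.xml.{Elem, Node}
import scala.xml.transform.{RewriteRule, RuleTransformer}

pomPostProcess := { (pom: Node) =>
  val markProtobufProvided = new RewriteRule {
    override def transform(n: Node): Seq[Node] = n match {
      case dep: Elem if dep.label == "dependency" && (dep \ "artifactId").text == "protobuf-java" =>
        // Drop any existing <scope> and append scope "provided".
        val withoutScope = dep.child.filterNot(_.label == "scope")
        dep.copy(child = withoutScope :+ <scope>provided</scope>)
      case other => other
    }
  }
  new RuleTransformer(markProtobufProvided).transform(pom).head
}
```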
What changes were proposed in this pull request?
Resolves #614
Since the sbt-assembly plugin is meant for creating fat/uber JARs, it does nothing to adjust the POMs of published libraries to account for what has been shaded. So this PR creates an intermediate project that performs the connect shading, and then a final project that publishes the shaded JAR with the correct dependencies.
Why are the changes needed?
Fix the connect artifact so that only protobuf is shaded.