Closed
Description
Describe the bug
Importing the new Python package requires all Spark Connect dependencies to be installed, even if you are not using Spark Connect.
To Reproduce
Steps to reproduce the behavior:
% pip install pyspark graphframes-py
% pyspark
Python 3.11.13 (main, Jun 3 2025, 18:38:25) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/07/17 13:57:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 4.0.0
      /_/
Using Python version 3.11.13 (main, Jun 3 2025 18:38:25)
Spark context available as 'sc' (master = local[*], app id = local-1752775048865).
SparkSession available as 'spark'.
>>> from graphframes import GraphFrame
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../test-venv/lib/python3.11/site-packages/graphframes/__init__.py", line 1, in <module>
    from .graphframe import GraphFrame
  File ".../test-venv/lib/python3.11/site-packages/graphframes/graphframe.py", line 39, in <module>
    from graphframes.connect.graphframe_client import GraphFrameConnect
  File ".../test-venv/lib/python3.11/site-packages/graphframes/connect/graphframe_client.py", line 4, in <module>
    from pyspark.sql.connect import proto
  File ".../test-venv/lib/python3.11/site-packages/pyspark/sql/connect/proto/__init__.py", line 18, in <module>
    from pyspark.sql.connect.proto.base_pb2_grpc import *
  File ".../test-venv/lib/python3.11/site-packages/pyspark/sql/connect/proto/base_pb2_grpc.py", line 19, in <module>
    import grpc
ModuleNotFoundError: No module named 'grpc'
Expected behavior
`from graphframes import GraphFrame` should succeed in a classic PySpark session without the Spark Connect dependencies (such as `grpc`) installed.
System [please complete the following information]:
- OS: Mac
- Python Version (if applied): Python 3.11
- Spark / PySpark version: 4.0.0
- GraphFrames version: 0.9.0
Component
- Scala Core Internal
- Scala API
- Spark Connect Plugin
- PySpark Classic
- PySpark Connect
Additional context
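One possible direction, offered only as a sketch and not as the actual GraphFrames code: the traceback shows `graphframe.py` importing `GraphFrameConnect` at module load time, which pulls in `pyspark.sql.connect` and then `grpc`. Deferring that import until Connect is actually needed, and guarding it, would keep the classic path usable. The helper names `connect_dependencies_available` and `load_connect_client` are hypothetical; the module path and class name are taken from the traceback, and `grpc` is the only missing dependency it confirms, so the real list of required modules may be longer.

```python
import importlib
import importlib.util


def connect_dependencies_available(required=("grpc",)):
    """Return True if every top-level module in `required` can be located.

    Assumption: "grpc" is the only module the traceback shows as missing;
    other Spark Connect dependencies may also need to be listed here.
    """
    return all(importlib.util.find_spec(name) is not None for name in required)


def load_connect_client():
    """Import the Connect client lazily (hypothetical helper).

    Returns the GraphFrameConnect class when it can be imported,
    or None when the Connect dependencies (or the module itself) are missing.
    """
    if not connect_dependencies_available():
        return None
    try:
        # Module path as it appears in the traceback.
        module = importlib.import_module("graphframes.connect.graphframe_client")
    except ImportError:
        return None
    return getattr(module, "GraphFrameConnect", None)
```

With a guard like this, the classic code path would call `load_connect_client()` only when a Spark Connect session is detected, and could raise a clear error naming the missing package if it returns `None` in that case.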
Are you planning on creating a PR?
- I'm willing to make a pull-request