Releases: kedro-org/kedro

1.0.0

22 Jul 15:32
f18cc35

Major features and improvements

Data Catalog

  • The previously experimental KedroDataCatalog has been renamed to DataCatalog and is now the default catalog implementation.
  • It retains the dict-like interface, supports lazy dataset initialisation, and delivers improved performance.
  • While this change is seamless for users following standard Kedro workflows, it introduces a richer API for programmatic use:
    • New pipeline-aware commands, available via both the CLI and interactive environments.
    • Simplified handling of dataset factories.
    • Centralised pattern resolution via the CatalogConfigResolver property.
    • Ability to serialise the catalog to configuration and reconstruct it from that configuration.
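
As a rough illustration of what dataset-factory pattern resolution involves, the sketch below matches a dataset name against "{placeholder}" patterns using only the standard library. It is not Kedro's implementation: pattern_to_regex, resolve_pattern, the pattern string, and the config shape are all hypothetical names chosen for this example.

```python
import re


def pattern_to_regex(pattern):
    """Turn a pattern such as '{name}_csv' into a compiled regex.

    Each placeholder becomes a named group. This sketch assumes every
    placeholder name appears only once in the pattern.
    """
    # re.split with a capturing group alternates literal text and placeholder names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)      # literal text
        else:
            regex += f"(?P<{part}>.+)"    # placeholder
    return re.compile(f"^{regex}$")


def resolve_pattern(name, patterns):
    """Return the config of the first matching pattern with placeholders filled in."""
    for pattern, template in patterns.items():
        match = pattern_to_regex(pattern).match(name)
        if match:
            return {
                key: value.format(**match.groupdict()) if isinstance(value, str) else value
                for key, value in template.items()
            }
    return None


# Hypothetical pattern and dataset config, for illustration only.
patterns = {
    "{name}_csv": {"type": "pandas.CSVDataset", "filepath": "data/{name}.csv"},
}
resolved = resolve_pattern("reviews_csv", patterns)
```

Here resolved holds the config for "reviews_csv" with the filepath filled in, while a name that matches no pattern yields None.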

Read more in the Kedro documentation.

Namespaces

  • Added support for running multiple namespaces within a single session via the --namespaces CLI option and the namespaces argument of the KedroSession.run() method.
  • Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.
  • Added stricter validation to dataset names in the Node class, ensuring . characters are reserved to be used as part of a namespace.
  • Added a prefix_datasets_with_namespace argument to the Pipeline class which allows users to turn on or off the prefixing of the namespace to the node inputs, outputs, and parameters.
  • Changed pipeline filtering for namespace to return exact namespace matches instead of partial matches.
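
To make the difference between partial and exact namespace matching concrete, here is a small standalone sketch. It does not use Kedro; the two helper functions are hypothetical, and the choice to still include nested namespaces (for example "data.raw" when filtering on "data") is an assumption of this example rather than something stated in these notes.

```python
def matches_partial(node_namespace, requested):
    # Prefix check: "data" also matches the unrelated "data_science".
    return node_namespace.startswith(requested)


def matches_exact(node_namespace, requested):
    # Match the namespace itself, or (assumption) a namespace nested below it.
    return node_namespace == requested or node_namespace.startswith(requested + ".")


namespaces = ["data", "data.raw", "data_science"]

partial = [ns for ns in namespaces if matches_partial(ns, "data")]
exact = [ns for ns in namespaces if matches_exact(ns, "data")]
```

With the prefix check all three namespaces are selected; with the exact check "data_science" is no longer picked up when filtering on "data".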

Other features and improvements

  • Changed the default node name to be formed of the function name used in the node suffixed by a secure hash (SHA-256) based on the function, inputs, and outputs, ensuring uniqueness and improved readability.
  • Added an option to select which multiprocessing start method is going to be used on ParallelRunner via the KEDRO_MP_CONTEXT environment variable.
  • Added --only-missing-outputs CLI flag to kedro run. This flag skips nodes when all their persistent outputs exist.
  • Updated kedro registry describe to return the node name property instead of creating its own name for the node.
  • Removed pre-commit-hooks dependency for new project creation.
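
The new default node name can be pictured with the following sketch. It only illustrates the idea of suffixing the function name with a SHA-256 digest of the function, inputs, and outputs; the payload layout, the "__" separator, and the 8-character truncation are assumptions of this example, not Kedro's actual format.

```python
import hashlib


def default_node_name(func_name, inputs, outputs, digest_chars=8):
    """Build '<function name>__<short hash>' (format is hypothetical)."""
    # The payload layout is an assumption; any stable encoding would do.
    payload = f"{func_name}|{','.join(inputs)}|{','.join(outputs)}"
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{func_name}__{digest[:digest_chars]}"


name_a = default_node_name("split_data", ["model_input"], ["train", "test"])
name_b = default_node_name("split_data", ["other_input"], ["train", "test"])
```

The same function, inputs, and outputs always give the same name, while changing any of them gives a different suffix, so two nodes wrapping the same function stay distinguishable.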

Breaking changes to the API

CLI

  • kedro catalog create command has been removed.
  • kedro catalog list, kedro catalog rank, and kedro catalog resolve commands have been replaced with kedro catalog describe-datasets, kedro catalog list-patterns and kedro catalog resolve-patterns commands, respectively.
  • The kedro run option --namespace has been removed and replaced with --namespaces.
  • The kedro micropkg CLI command has been removed as part of the micro-packaging feature deprecation.
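
For scripts that still call the old commands, the renames above can be captured in a small lookup. The helper below is a hypothetical migration aid, not part of Kedro; it only rewrites command names and the --namespace option name, and it assumes the option value can be carried over unchanged, which is worth confirming against the CLI help.

```python
import shlex

# Old command -> new command, taken from the list above.
RENAMED_COMMANDS = {
    ("kedro", "catalog", "list"): ("kedro", "catalog", "describe-datasets"),
    ("kedro", "catalog", "rank"): ("kedro", "catalog", "list-patterns"),
    ("kedro", "catalog", "resolve"): ("kedro", "catalog", "resolve-patterns"),
}
REMOVED_COMMANDS = [("kedro", "catalog", "create"), ("kedro", "micropkg")]


def migrate_command(command):
    """Rewrite a 0.19-style command line to its 1.0 name (hypothetical helper)."""
    tokens = shlex.split(command)
    for removed in REMOVED_COMMANDS:
        if tokens[: len(removed)] == list(removed):
            raise ValueError(f"'{' '.join(removed)}' has been removed in 1.0.0")
    for old, new in RENAMED_COMMANDS.items():
        if tokens[: len(old)] == list(old):
            tokens[: len(old)] = list(new)
            break
    migrated = []
    for token in tokens:
        if token == "--namespace":
            token = "--namespaces"
        elif token.startswith("--namespace="):
            token = "--namespaces=" + token[len("--namespace="):]
        migrated.append(token)
    return shlex.join(migrated)


new_list = migrate_command("kedro catalog list")
new_run = migrate_command("kedro run --namespace=reporting")
```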

API

  • The private methods _is_project and _find_kedro_project have been renamed to the public is_kedro_project and find_kedro_project.
  • Renamed instances of extra_params and _extra_params to runtime_params.
  • Removed the modular_pipeline module and moved functionality to the pipeline module instead.
  • Renamed ModularPipelineError to PipelineError.
  • Pipeline.grouped_nodes_by_namespace() was replaced with group_nodes_by(group_by), which supports multiple strategies and returns a list of GroupedNodes, improving type safety and consistency for deployment plugin integrations.
  • Renamed session_id parameter to run_id in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
  • Removed the following DataCatalog methods: _get_dataset(), add_all(), add_feed_dict(), list(), and shallow_copy().
  • Changed the output of runner.run() and session.run() — it now always returns all pipeline outputs, regardless of catalog configuration.
  • Removed the AbstractRunner.run_only_missing() method, an older and underused API for partial runs. Please use --only-missing-outputs CLI instead.
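
The move from a dictionary of nodes to a list of typed groups can be sketched as follows. This is a standalone illustration, not Kedro's API: NodeGroup, group_nodes_sketch, the "namespace" and "none" strategy names, and the first-level grouping rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """Hypothetical stand-in for a typed group of nodes."""
    name: str
    nodes: List[str] = field(default_factory=list)


def group_nodes_sketch(nodes, group_by):
    """Group (node_name, namespace) pairs using the chosen strategy."""
    if group_by not in ("namespace", "none"):
        raise ValueError(f"Unknown grouping strategy: {group_by!r}")
    groups = {}
    for node_name, namespace in nodes:
        if group_by == "namespace" and namespace:
            key = namespace.split(".")[0]  # assumption: group on the first level only
        else:
            key = node_name                # ungrouped nodes form their own group
        groups.setdefault(key, NodeGroup(name=key)).nodes.append(node_name)
    return list(groups.values())


nodes = [("clean", "data.raw"), ("join", "data"), ("train", None)]
grouped = group_nodes_sketch(nodes, "namespace")
```

Returning a list of small typed objects rather than a plain dictionary gives a deployment plugin one shape to handle regardless of the strategy.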

Documentation changes

  • Revamped the look and feel of the Kedro documentation, including a new theme and improved navigation with mkdocs as the documentation engine.
  • Updated the DataCatalog documentation with improved structure and detailed description of new features. Read the DataCatalog documentation here.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

Migration guide from Kedro 0.19.* to 1.*

See the migration guide for 1.0.0 in the Kedro documentation.

1.0.0rc3

21 Jul 13:54
e82e026
Pre-release

Major features and improvements

  • Changed DataCatalog.__getitem__ to raise DatasetNotFoundError for missing datasets, aligning with expected dictionary behavior.
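
The behaviour can be pictured with a minimal stand-in class. Nothing below is imported from Kedro: CatalogSketch and the locally defined DatasetNotFoundError are hypothetical and only mirror the idea that item access fails loudly for a missing name, while get() remains available for an optional lookup.

```python
class DatasetNotFoundError(Exception):
    """Local stand-in for the error named above (not Kedro's class)."""


class CatalogSketch:
    def __init__(self, datasets):
        self._datasets = dict(datasets)

    def __getitem__(self, name):
        try:
            return self._datasets[name]
        except KeyError:
            raise DatasetNotFoundError(f"Dataset '{name}' not found in the catalog") from None

    def get(self, name, default=None):
        # Optional lookup that never raises.
        return self._datasets.get(name, default)


catalog = CatalogSketch({"cars": [1, 2, 3]})

try:
    catalog["boats"]
    missing_raised = False
except DatasetNotFoundError:
    missing_raised = True
```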

Bug fixes and other changes

Breaking changes to the API

Upcoming deprecations for Kedro 1.0.0

Documentation changes

Community contributions

1.0.0rc2

18 Jul 22:31
c278651
Pre-release

Major features and improvements

  • Added --only-missing-outputs CLI flag to kedro run. This flag skips nodes when all their persistent outputs exist.
  • Removed the AbstractRunner.run_only_missing() method, an older and underused API for partial runs. Please use --only-missing-outputs CLI instead.
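
The skipping rule described for --only-missing-outputs can be sketched in a few lines. This is a simplified model, not Kedro's implementation: nodes_to_run is a hypothetical helper, nodes without any persistent output are assumed to always run, and the sketch ignores the question of whether a skipped node's in-memory outputs are needed downstream.

```python
def nodes_to_run(nodes, persistent, exists):
    """Select nodes whose persistent outputs are not all present.

    nodes: list of (node_name, output_names)
    persistent: set of dataset names that are saved to storage
    exists: callable returning True if a dataset already exists
    """
    selected = []
    for name, outputs in nodes:
        persisted = [out for out in outputs if out in persistent]
        # Skip only when there is at least one persistent output and all of them exist.
        if persisted and all(exists(out) for out in persisted):
            continue
        selected.append(name)
    return selected


nodes = [
    ("preprocess", ["clean_data"]),
    ("train", ["model"]),
    ("report", ["summary_in_memory"]),
]
already_saved = {"clean_data"}
to_run = nodes_to_run(nodes, {"clean_data", "model"}, lambda out: out in already_saved)
```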

Bug fixes and other changes

  • Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.

Breaking changes to the API

Upcoming deprecations for Kedro 1.0.0

Documentation changes

Community contributions

1.0.0rc1

20 Jun 13:33
edcddc4
Pre-release

Major features and improvements

  • Added stricter validation to dataset names in the Node class, ensuring . characters are reserved to be used as part of a namespace.
  • Added a prefix_datasets_with_namespace argument to the Pipeline class which allows users to turn on or off the prefixing of the namespace to the node inputs, outputs, and parameters.
  • Changed the default node name to be formed of the function name used in the node suffixed by a secure hash (SHA-256) based on the function, inputs, and outputs, ensuring uniqueness and improved readability.
  • Added an option to select which multiprocessing start method is going to be used on ParallelRunner via the KEDRO_MP_CONTEXT environment variable.
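
Selecting a multiprocessing start method from an environment variable can be sketched with the standard library alone. How ParallelRunner actually reads KEDRO_MP_CONTEXT and which values it accepts are not stated here, so get_mp_context below is a hypothetical helper that simply passes the value to multiprocessing.get_context.

```python
import multiprocessing
import os


def get_mp_context(default=None):
    """Return a multiprocessing context chosen via KEDRO_MP_CONTEXT.

    Falls back to the platform default when the variable is unset.
    multiprocessing.get_context raises ValueError for an unknown method.
    """
    method = os.environ.get("KEDRO_MP_CONTEXT", default)
    return multiprocessing.get_context(method)


# Example: request the "spawn" start method, which is available on all platforms.
os.environ["KEDRO_MP_CONTEXT"] = "spawn"
context = get_mp_context()
```

In a shell the variable would normally be exported before the run is launched rather than set from Python.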

Bug fixes and other changes

  • Changed pipeline filtering for namespace to return exact namespace matches instead of partial matches.
  • Added support for running multiple namespaces within a single session.
  • Updated kedro registry describe to return the node name property instead of creating its own name for the node.

Documentation changes

  • Updated the DataCatalog documentation with improved structure and detailed description of new features.

Community contributions

Breaking changes to the API

  • The private methods _is_project and _find_kedro_project have been renamed to the public is_kedro_project and find_kedro_project.
  • Renamed instances of extra_params and _extra_params to runtime_params.
  • Removed the modular_pipeline module and moved functionality to the pipeline module instead.
  • Renamed ModularPipelineError to PipelineError.
  • Pipeline.grouped_nodes_by_namespace() was replaced with group_nodes_by(group_by), which supports multiple strategies and returns a list of GroupedNodes, improving type safety and consistency for deployment plugin integrations.
  • The micro-packaging feature and the corresponding micropkg CLI command have been removed.
  • Renamed session_id parameter to run_id in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
  • Removed the following DataCatalog methods: _get_dataset(), add_all(), add_feed_dict(), list(), and shallow_copy().
  • Removed the CLI command kedro catalog create.
  • Changed the output of runner.run() — it now always returns all pipeline outputs, regardless of catalog configuration.

Migration guide from Kedro 0.19.* to 1.*

See the migration guide for 1.0.0 in the Kedro documentation.

0.19.14

17 Jun 10:01
0da4ec6

Major features and improvements

  • Added execution time to pipeline completion log.

Bug fixes and other changes

  • Fixed a recursion error in custom datasets when _describe() accessed self.__dict__.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.13

22 May 13:51
fa2c1e4

Major features and improvements

  • Unified pipeline() and Pipeline into a single module (kedro.pipeline), aligning with the node()/Node design pattern and improving namespace handling.

Bug fixes and other changes

  • Fixed bug where project creation workflow would use the main branch version of kedro-starters instead of the respective release version.
  • Fixed namespacing for confirms during pipeline creation to support IncrementalDataset.
  • Fixed bug where OmegaConf caused an error during config resolution with runtime parameters.
  • Cached inputs in Node when created from dictionary for better performance.
  • Enabled pluggy tracing only when logging level is set to DEBUG to speed up the execution of project runs.

Upcoming deprecations for Kedro 1.0.0

  • Added a deprecation warning for catalog CLI commands: kedro catalog rank, kedro catalog list, and kedro catalog resolve will be replaced with alternatives, and the kedro catalog create command will be removed.
  • Added a deprecation warning for the KedroDataCatalog name: KedroDataCatalog will replace DataCatalog and adopt the original DataCatalog name.
  • Added a deprecation warning for the --namespace option of kedro run. It will be replaced with the --namespaces option, which will allow running multiple namespaces together.
  • The modular_pipeline module is deprecated and will be removed in Kedro 1.0.0. Use the pipeline module instead.

Note: On March 20th, a security vulnerability, CVE-2024-12215, was identified in Kedro. This issue stems from the deprecated micropackaging functionality, which is scheduled for removal in the upcoming Kedro 1.0 release. While we agree with the CVE assigned, this vulnerability only poses a risk if you pull a malicious micropackage from an untrusted source. If you're concerned, we recommend avoiding the micropackaging feature for now and upgrading to Kedro 1.0 once it's released.

Documentation changes

  • Updated Dask deployment docs.
  • Added a page on integrating with non-Jupyter environments (for example Marimo) with dynamic Kedro session loading.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.12

20 Mar 09:14
6696353

Major features and improvements

  • Added KedroDataCatalog.filter() to filter datasets by name and type.
  • Added Pipeline.grouped_nodes_by_namespace property which returns a dictionary of nodes grouped by namespace, intended to be used by plugins to facilitate deployment of namespaced nodes together.
  • Added support for cloud storage protocols in --conf-source, allowing configuration to be loaded from remote locations such as S3.

Bug fixes and other changes

  • Added DataCatalog deprecation warning.
  • Updated _LazyDataset representation when printing KedroDataCatalog.
  • Fixed MemoryDataset to infer assign copy mode for Ibis Tables, which previously would be inferred as deepcopy.
  • Fixed pipeline packaging issue by ensuring pipelines/__init__.py exists when creating new pipelines.
  • Changed the execution of SequentialRunner to not use an executor pool to ensure it's single threaded.
  • Fixed %load_node magic command to work with Jupyter Notebook >=7.2.0.
  • Removed 7: Kedro Viz from Kedro tools.
  • Updated node grouping API to only group on first level of namespace.

Documentation changes

  • Added documentation for Kedro's support for Delta Lake versioning.
  • Added documentation for Kedro's support for Iceberg versioning.
  • Added documentation for Kedro's nodes grouping in deployment.
  • Fixed a minor grammatical error in Kedro-Viz installation instructions to improve documentation clarity.
  • Improved the Kedro VSCode extension documentation.
  • Updated the recommendations for nesting namespaces.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.11

29 Jan 15:01
74c640a

Major features and improvements

  • Implemented KedroDataCatalog.to_config() method that converts the catalog instance into a configuration format suitable for serialization.
  • Improved OmegaConfigLoader performance.
  • Replaced trufflehog with detect-secrets for detecting secrets within a code base.
  • Added support for %load_ext kedro.

Bug fixes and other changes

  • Added validation to ensure dataset version consistency across the catalog.
  • Fixed a bug in project creation when using a custom starter template offline.
  • Added node import to the pipeline template.
  • Updated the error message shown when executing kedro run without a pipeline.
  • Safeguarded hooks against a user incorrectly registering a hook class in settings.py.
  • Fixed parsing paths with query and fragment.
  • Removed lowercase transformation in regex validation.
  • Moved kedro-catalog JSON schema to kedro-datasets.
  • Updated Partitioned dataset lazy saving docs page.
  • Fixed KedroDataCatalog mutation after pipeline run.
  • Made KedroDataCatalog._datasets compatible with DataCatalog._datasets.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.10

26 Nov 17:44
e2c241b

Major features and improvements

  • Added official support for Python 3.13.
  • Implemented dict-like interface for KedroDataCatalog.
  • Implemented lazy dataset initializing for KedroDataCatalog.
  • Project dependencies on both the default template and on starter templates are now explicitly declared on the pyproject.toml file, allowing Kedro projects to work with project management tools like uv, pdm, and rye.
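
The combination of a dict-like interface and lazy dataset initialisation can be illustrated with a small standalone class. LazyCatalogSketch is hypothetical and far simpler than KedroDataCatalog; it only shows the idea that a dataset object is built on first access rather than when the catalog is created.

```python
from collections.abc import Mapping


class LazyCatalogSketch(Mapping):
    """Read-only mapping that builds each dataset on first access."""

    def __init__(self, factories):
        self._factories = dict(factories)  # name -> zero-argument callable
        self._materialised = {}

    def __getitem__(self, name):
        if name not in self._materialised:
            self._materialised[name] = self._factories[name]()
        return self._materialised[name]

    def __contains__(self, name):
        # Overridden so that membership checks do not trigger initialisation.
        return name in self._factories

    def __iter__(self):
        return iter(self._factories)

    def __len__(self):
        return len(self._factories)


created = []  # records which datasets were actually built


def make_factory(name):
    def factory():
        created.append(name)
        return {"dataset": name}
    return factory


catalog = LazyCatalogSketch({"cars": make_factory("cars"), "boats": make_factory("boats")})
first = catalog["cars"]  # only "cars" is initialised at this point
```

Listing names, checking membership, or taking the length never builds a dataset; only item access does, and a second access reuses the object created the first time.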

Note: KedroDataCatalog is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the KedroDataCatalog or ideas for new features.

Bug fixes and other changes

  • Added I/O support for Oracle Cloud Infrastructure (OCI) Object Storage filesystem.
  • Fixed DatasetAlreadyExistsError raised for ThreadRunner when running a Kedro project and using the runner separately.

Breaking changes to the API

Documentation changes

  • Added Databricks Asset Bundles deployment guide.
  • Added a new minimal Kedro project creation guide.
  • Added example to explain how dataset factories work.
  • Updated CLI autocompletion docs with new Click syntax.
  • Standardised .parquet suffix in docs and tests.

Community contributions

Many thanks to the following Kedroids for contributing PRs to this release:

0.19.9

10 Oct 19:13
91468e6

Major features and improvements

  • Dropped Python 3.8 support.
  • Implemented KedroDataCatalog repeating DataCatalog functionality with a few API enhancements:
    • Removed _FrozenDatasets and access datasets as properties;
    • Added get dataset by name feature;
    • add_feed_dict() was simplified to only add raw data;
    • Datasets' initialisation was moved out from from_config() method to the constructor.
  • Moved development requirements from requirements.txt to the dedicated section in pyproject.toml for project template.
  • Implemented a Protocol abstraction for the current DataCatalog and for adding new catalog implementations.
  • Refactored kedro run and kedro catalog commands.
  • Moved pattern resolution logic from DataCatalog to a separate component - CatalogConfigResolver. Updated DataCatalog to use CatalogConfigResolver internally.
  • Made packaged Kedro projects return session.run() output to be used when running it in the interactive environment.
  • Enhanced OmegaConfigLoader configuration validation to detect duplicate keys at all parameter levels, ensuring comprehensive nested key checking.
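
Checking for duplicate keys at every nesting level can be sketched as a recursive comparison of two configuration dictionaries. This is not OmegaConfigLoader's code: find_duplicate_keys is a hypothetical helper, and treating two nested dictionaries under the same key as mergeable (so only their shared non-dictionary keys count as duplicates) is an assumption of the example.

```python
def find_duplicate_keys(first, second, prefix=""):
    """Return dotted paths of keys defined in both configs."""
    duplicates = []
    for key in sorted(first.keys() & second.keys()):
        path = f"{prefix}{key}"
        left, right = first[key], second[key]
        if isinstance(left, dict) and isinstance(right, dict):
            # Both sides are sections: look for clashes inside them.
            duplicates.extend(find_duplicate_keys(left, right, prefix=f"{path}."))
        else:
            duplicates.append(path)
    return duplicates


params_a = {"model": {"learning_rate": 0.1, "depth": 3}, "seed": 1}
params_b = {"model": {"learning_rate": 0.2}, "paths": {"raw": "data/raw"}}
clashes = find_duplicate_keys(params_a, params_b)
```

A check limited to top-level keys would either miss this clash or flag the whole "model" section; descending into shared sections pinpoints the single conflicting parameter.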

Note: KedroDataCatalog is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the KedroDataCatalog or ideas for new features.

Bug fixes and other changes

  • Fixed bug where using dataset factories breaks with ThreadRunner.
  • Fixed a bug where SharedMemoryDataset.exists would not call the underlying MemoryDataset.
  • Fixed template projects example tests.
  • Made credentials loading consistent between KedroContext._get_catalog() and resolve_patterns so that both use _get_config_credentials().

Breaking changes to the API

  • Removed ShelveStore to address a security vulnerability.

Documentation changes

  • Fixed logo on PyPI page.
  • Minor language/styling updates.

Community contributions