Releases: kedro-org/kedro
1.0.0
Major features and improvements
Data Catalog
- The previously experimental `KedroDataCatalog` has been renamed to `DataCatalog` and is now the default catalog implementation.
- It retains the dict-like interface, supports lazy dataset initialisation, and delivers improved performance.
- While this change is seamless for users following standard Kedro workflows, it introduces a richer API for programmatic use (sketched below):
  - New pipeline-aware commands, available via both the CLI and interactive environments.
  - Simplified handling of dataset factories.
  - Centralised pattern resolution via the `CatalogConfigResolver` property.
  - Ability to serialise the catalog to configuration and reconstruct it from that configuration.
Read more in the Kedro documentation.
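As a rough illustration, the dict-style access and the configuration round-trip can be combined as in the minimal sketch below; the dataset names and paths are illustrative, `pandas.CSVDataset` assumes the `kedro-datasets` package is installed, and `to_config()` is assumed to mirror the `KedroDataCatalog.to_config()` introduced in 0.19.11.

```python
# Minimal sketch of the programmatic DataCatalog API described above.
# Dataset names/paths are illustrative; "pandas.CSVDataset" assumes kedro-datasets
# is installed, and to_config() is assumed to mirror KedroDataCatalog.to_config().
from kedro.io import DataCatalog, MemoryDataset

catalog = DataCatalog.from_config(
    {
        "reviews": {
            "type": "pandas.CSVDataset",
            "filepath": "data/01_raw/reviews.csv",
        },
        "{name}_cached": {"type": "MemoryDataset"},  # dataset factory pattern
    }
)

catalog["scratch"] = MemoryDataset([1, 2, 3])  # dict-style registration
assert "scratch" in catalog                    # dict-style membership check
print(catalog["scratch"].load())               # __getitem__ returns the dataset

config = catalog.to_config()  # serialise back to configuration; an equivalent
                              # catalog can be reconstructed from this output
```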
Namespaces
- Added support for running multiple namespaces within a single session via the `--namespaces` CLI option and the `namespaces` argument of `KedroSession.run()` (see the sketch after this list).
- Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.
- Added stricter validation of dataset names in the `Node` class, ensuring `.` characters are reserved for use as part of a namespace.
- Added a `prefix_datasets_with_namespace` argument to the `Pipeline` class, which lets users turn the prefixing of node inputs, outputs, and parameters with the namespace on or off.
- Changed pipeline filtering by namespace to return exact namespace matches instead of partial matches.
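A hedged sketch of running selected namespaces programmatically, mirroring `kedro run --namespaces=data_processing,data_science`; the namespace names are illustrative and the project is assumed to live in the current working directory.

```python
# Hedged sketch: running selected namespaces in one session, mirroring
# `kedro run --namespaces=data_processing,data_science`.
# Namespace names are illustrative; the argument name follows the notes above.
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

bootstrap_project(Path.cwd())  # assumes the CWD is a Kedro project
with KedroSession.create(project_path=Path.cwd()) as session:
    session.run(namespaces=["data_processing", "data_science"])
```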
Other features and improvements
- Changed the default node name to the node's function name suffixed with a secure hash (SHA-256) of the function, inputs, and outputs, ensuring uniqueness and improving readability.
- Added an option to select which multiprocessing start method `ParallelRunner` uses via the `KEDRO_MP_CONTEXT` environment variable (see the sketch after this list).
- Added the `--only-missing-outputs` CLI flag to `kedro run`. This flag skips nodes when all their persistent outputs already exist.
- Updated `kedro registry describe` to return the node name property instead of creating its own name for the node.
- Removed the `pre-commit-hooks` dependency for new project creation.
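For example, the start method could be pinned before the runner creates its worker processes; this is a sketch under the assumption that the variable accepts the standard `multiprocessing` start-method names.

```python
# Sketch: choosing the multiprocessing start method for ParallelRunner via
# KEDRO_MP_CONTEXT. "spawn"/"fork" are the standard multiprocessing start
# methods; the exact set of accepted values is assumed, not confirmed here.
import os

os.environ["KEDRO_MP_CONTEXT"] = "spawn"  # set before the runner is created

from kedro.runner import ParallelRunner

runner = ParallelRunner(max_workers=2)
# runner.run(pipeline, catalog) would now start its workers with "spawn".
```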
Breaking changes to the API
CLI
- The `kedro catalog create` command has been removed.
- The `kedro catalog list`, `kedro catalog rank`, and `kedro catalog resolve` commands have been replaced with `kedro catalog describe-datasets`, `kedro catalog list-patterns`, and `kedro catalog resolve-patterns`, respectively.
- The `kedro run` option `--namespace` has been removed and replaced with `--namespaces`.
- The `kedro micropkg` CLI command has been removed as part of the micro-packaging feature deprecation.
API
- The private methods `_is_project` and `_find_kedro_project` have been changed to the public `is_kedro_project` and `find_kedro_project`.
- Renamed instances of `extra_params` and `_extra_params` to `runtime_params`.
- Removed the `modular_pipeline` module and moved its functionality to the `pipeline` module.
- Renamed `ModularPipelineError` to `PipelineError`.
- `Pipeline.grouped_nodes_by_namespace()` was replaced with `group_nodes_by(group_by)`, which supports multiple grouping strategies and returns a list of `GroupedNodes`, improving type safety and consistency for deployment plugin integrations (see the sketch after this list).
- Renamed the `session_id` parameter to `run_id` in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
- Removed the following `DataCatalog` methods: `_get_dataset()`, `add_all()`, `add_feed_dict()`, `list()`, and `shallow_copy()`.
- Changed the output of `runner.run()` and `session.run()`: they now always return all pipeline outputs, regardless of catalog configuration.
- Removed the `AbstractRunner.run_only_missing()` method, an older and underused API for partial runs. Please use the `--only-missing-outputs` CLI flag instead.
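For plugin authors migrating from `grouped_nodes_by_namespace`, the replacement could look roughly like the sketch below; the `"namespace"` strategy name and the `GroupedNodes` attributes printed here are assumptions inferred from the property this API replaces.

```python
# Hedged sketch of Pipeline.group_nodes_by(group_by). The "namespace" strategy
# name and the GroupedNodes attributes printed below are assumptions inferred
# from the grouped_nodes_by_namespace property this API replaces.
from kedro.pipeline import node, pipeline

def identity(x):
    return x

pipe = pipeline(
    [node(identity, inputs="raw", outputs="clean", name="clean_node")],
    namespace="data_processing",
)

for group in pipe.group_nodes_by("namespace"):  # returns a list of GroupedNodes
    print(group.name, group.nodes)              # attribute names assumed
```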
Documentation changes
- Revamped the look and feel of the Kedro documentation, including a new theme and improved navigation, with `mkdocs` as the documentation engine.
- Updated the `DataCatalog` documentation with an improved structure and a detailed description of the new features. Read more in the `DataCatalog` documentation.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
Migration guide from Kedro 0.19.* to 1.*
See the migration guide for 1.0.0 in the Kedro documentation.
1.0.0rc3
Major features and improvements
- Changed `DataCatalog.__getitem__` to raise `DatasetNotFoundError` for missing datasets, aligning with expected dictionary behavior.
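A minimal sketch of the new lookup behaviour; the import path for `DatasetNotFoundError` is assumed to be `kedro.io`, where the other catalog errors live.

```python
# Minimal sketch of the dict-aligned lookup: unknown names now raise
# DatasetNotFoundError. The import path for the error is assumed.
from kedro.io import DataCatalog, DatasetNotFoundError

catalog = DataCatalog()
try:
    catalog["does_not_exist"]
except DatasetNotFoundError as exc:
    print(f"Missing dataset: {exc}")
```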
Bug fixes and other changes
Breaking changes to the API
Upcoming deprecations for Kedro 1.0.0
Documentation changes
Community contributions
1.0.0rc2
Major features and improvements
- Added the `--only-missing-outputs` CLI flag to `kedro run`. This flag skips nodes when all their persistent outputs already exist.
- Removed the `AbstractRunner.run_only_missing()` method, an older and underused API for partial runs. Please use the `--only-missing-outputs` CLI flag instead.
Bug fixes and other changes
- Improved namespace validation efficiency to prevent significant slowdowns when creating large pipelines.
Breaking changes to the API
Upcoming deprecations for Kedro 1.0.0
Documentation changes
Community contributions
1.0.0rc1
Major features and improvements
- Added stricter validation of dataset names in the `Node` class, ensuring `.` characters are reserved for use as part of a namespace.
- Added a `prefix_datasets_with_namespace` argument to the `Pipeline` class, which lets users turn the prefixing of node inputs, outputs, and parameters with the namespace on or off.
- Changed the default node name to the node's function name suffixed with a secure hash (SHA-256) of the function, inputs, and outputs, ensuring uniqueness and improving readability.
- Added an option to select which multiprocessing start method `ParallelRunner` uses via the `KEDRO_MP_CONTEXT` environment variable.
Bug fixes and other changes
- Changed pipeline filtering for namespace to return exact namespace matches instead of partial matches.
- Added support for running multiple namespaces within a single session.
- Updated `kedro registry describe` to return the node name property instead of creating its own name for the node.
Documentation changes
- Updated the `DataCatalog` documentation with an improved structure and a detailed description of the new features.
Community contributions
Breaking changes to the API
- The private methods `_is_project` and `_find_kedro_project` have been changed to the public `is_kedro_project` and `find_kedro_project`.
- Renamed instances of `extra_params` and `_extra_params` to `runtime_params`.
- Removed the `modular_pipeline` module and moved its functionality to the `pipeline` module.
- Renamed `ModularPipelineError` to `PipelineError`.
- `Pipeline.grouped_nodes_by_namespace()` was replaced with `group_nodes_by(group_by)`, which supports multiple grouping strategies and returns a list of `GroupedNodes`, improving type safety and consistency for deployment plugin integrations.
- The micro-packaging feature and the corresponding `micropkg` CLI command have been removed.
- Renamed the `session_id` parameter to `run_id` in all runner methods and hooks to improve API clarity and prepare for future multi-run session support.
- Removed the following `DataCatalog` methods: `_get_dataset()`, `add_all()`, `add_feed_dict()`, `list()`, and `shallow_copy()`.
- Removed the CLI command `kedro catalog create`.
- Changed the output of `runner.run()`: it now always returns all pipeline outputs, regardless of catalog configuration.
Migration guide from Kedro 0.19.* to 1.*
See the migration guide for 1.0.0 in the Kedro documentation.
0.19.14
Major features and improvements
- Added execution time to pipeline completion log.
Bug fixes and other changes
- Fixed a recursion error in custom datasets when `_describe()` accessed `self.__dict__`.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.13
Major features and improvements
- Unified `pipeline()` and `Pipeline` into a single module (`kedro.pipeline`), aligning with the `node()`/`Node` design pattern and improving namespace handling.
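In practice this means the class and its factory helper are imported from the same place; a tiny sketch with illustrative names:

```python
# Tiny sketch of the unified module: pipeline() and Pipeline now sit together
# in kedro.pipeline, mirroring the node()/Node pairing. Names are illustrative.
from kedro.pipeline import Pipeline, node, pipeline

def preprocess(raw):
    return raw

pipe: Pipeline = pipeline(
    [node(preprocess, inputs="raw_data", outputs="clean_data")],
    namespace="data_processing",
)
print(pipe.describe())
```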
Bug fixes and other changes
- Fixed a bug where the project creation workflow would use the `main` branch version of `kedro-starters` instead of the respective release version.
- Fixed namespacing for `confirms` during pipeline creation to support `IncrementalDataset`.
- Fixed a bug where `OmegaConf` caused an error during config resolution with runtime parameters.
- Cached `inputs` in `Node` when created from a dictionary, for better performance.
- Enabled pluggy tracing only when the logging level is set to `DEBUG`, to speed up the execution of project runs.
Upcoming deprecations for Kedro 1.0.0
- Added a deprecation warning for the catalog CLI commands: `kedro catalog rank`, `kedro catalog list`, and `kedro catalog resolve` will be replaced with their alternatives, and the `kedro catalog create` command will be removed.
- Added a deprecation warning for `KedroDataCatalog`, which will replace `DataCatalog` while adopting the original `DataCatalog` name.
- Added a deprecation warning for the `--namespace` option of `kedro run`. It will be replaced with the `--namespaces` option, which will allow running multiple namespaces together.
- The `modular_pipeline` module is deprecated and will be removed in Kedro 1.0.0. Use the `pipeline` module instead.
Note: On March 20th, a security vulnerability, CVE-2024-12215, was identified in Kedro. This issue stems from the deprecated micropackaging functionality, which is scheduled for removal in the upcoming Kedro 1.0 release. While we agree with the CVE assigned, this vulnerability only poses a risk if you pull a malicious micropackage from an untrusted source. If you're concerned, we recommend avoiding the micropackaging feature for now and upgrading to Kedro 1.0 once it's released.
Documentation changes
- Updated Dask deployment docs.
- Added a page on integration with non-Jupyter environments (for example, Marimo), with dynamic Kedro session loading.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.12
Major features and improvements
- Added `KedroDataCatalog.filter()` to filter datasets by name and type (see the sketch after this list).
- Added the `Pipeline.grouped_nodes_by_namespace` property, which returns a dictionary of nodes grouped by namespace and is intended to be used by plugins to facilitate deployment of namespaced nodes together.
- Added support for cloud storage protocols in `--conf-source`, allowing configuration to be loaded from remote locations such as S3.
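A hedged sketch of the two helpers above; the `filter()` keyword names (`name_regex`, `type_regex`) are assumptions based on "filter datasets by name and type", and all dataset and namespace names are illustrative.

```python
# Hedged sketch of the helpers above. The filter() keyword names are
# assumptions based on "filter datasets by name and type".
from kedro.io import KedroDataCatalog, MemoryDataset
from kedro.pipeline import node, pipeline

# Filter registered dataset names by a regex (keyword name assumed).
catalog = KedroDataCatalog({"model_input": MemoryDataset(), "metrics": MemoryDataset()})
print(catalog.filter(name_regex="model_.*"))

def fit(features):
    return features

# Dictionary of nodes grouped by (first-level) namespace, for deployment plugins.
pipe = pipeline([node(fit, inputs="features", outputs="model")], namespace="training")
print(pipe.grouped_nodes_by_namespace)  # e.g. {"training": [<Node ...>]}
```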
Bug fixes and other changes
- Added a `DataCatalog` deprecation warning.
- Updated the `_LazyDataset` representation when printing `KedroDataCatalog`.
- Fixed `MemoryDataset` to infer the `assign` copy mode for Ibis Tables, which previously would be inferred as `deepcopy`.
- Fixed a pipeline packaging issue by ensuring `pipelines/__init__.py` exists when creating new pipelines.
- Changed the execution of `SequentialRunner` to not use an executor pool, ensuring it is single-threaded.
- Fixed the `%load_node` magic command to work with Jupyter Notebook `>=7.2.0`.
- Removed `7: Kedro Viz` from Kedro tools.
- Updated the node grouping API to only group on the first level of namespace.
Documentation changes
- Added documentation for Kedro's support for Delta Lake versioning.
- Added documentation for Kedro's support for Iceberg versioning.
- Added documentation for Kedro's nodes grouping in deployment.
- Fixed a minor grammatical error in Kedro-Viz installation instructions to improve documentation clarity.
- Improved the Kedro VSCode extension documentation.
- Updated the recommendations for nesting namespaces.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.11
Major features and improvements
- Implemented the `KedroDataCatalog.to_config()` method, which converts the catalog instance into a configuration format suitable for serialisation.
- Improved `OmegaConfigLoader` performance.
- Replaced `trufflehog` with `detect-secrets` for detecting secrets within a code base.
- Added support for `%load_ext kedro`.
Bug fixes and other changes
- Added validation to ensure dataset version consistency across the catalog.
- Fixed a bug in project creation when using a custom starter template offline.
- Added the `node` import to the pipeline template.
- Updated the error message shown when executing `kedro run` without a pipeline.
- Safeguarded hooks when a user incorrectly registers a hook class in `settings.py`.
- Fixed parsing of paths with query and fragment.
- Removed the lowercase transformation in regex validation.
- Moved the `kedro-catalog` JSON schema to `kedro-datasets`.
- Updated the "Partitioned dataset lazy saving" docs page.
- Fixed `KedroDataCatalog` mutation after a pipeline run.
- Made `KedroDataCatalog._datasets` compatible with `DataCatalog._datasets`.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.10
Major features and improvements
- Added official support for Python 3.13.
- Implemented a dict-like interface for `KedroDataCatalog`.
- Implemented lazy dataset initialisation for `KedroDataCatalog`.
- Project dependencies, on both the default template and on starter templates, are now explicitly declared in the `pyproject.toml` file, allowing Kedro projects to work with project management tools like `uv`, `pdm`, and `rye`.
Note: `KedroDataCatalog` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the `KedroDataCatalog` or ideas for new features.
Bug fixes and other changes
- Added I/O support for Oracle Cloud Infrastructure (OCI) Object Storage filesystem.
- Fixed `DatasetAlreadyExistsError` for `ThreadRunner` when running a Kedro project and using the runner separately.
Breaking changes to the API
Documentation changes
- Added Databricks Asset Bundles deployment guide.
- Added a new minimal Kedro project creation guide.
- Added example to explain how dataset factories work.
- Updated CLI autocompletion docs with new Click syntax.
- Standardised the `.parquet` suffix in docs and tests.
Community contributions
Many thanks to the following Kedroids for contributing PRs to this release:
0.19.9
Major features and improvements
- Dropped Python 3.8 support.
- Implemented `KedroDataCatalog`, repeating `DataCatalog` functionality with a few API enhancements:
  - Removed `_FrozenDatasets`; datasets are now accessed as properties.
  - Added a get-dataset-by-name feature.
  - `add_feed_dict()` was simplified to only add raw data.
  - Datasets' initialisation was moved out of the `from_config()` method into the constructor.
- Moved development requirements from `requirements.txt` to the dedicated section in `pyproject.toml` for the project template.
- Implemented a `Protocol` abstraction for the current `DataCatalog` and for adding new catalog implementations.
- Refactored the `kedro run` and `kedro catalog` commands.
- Moved pattern resolution logic from `DataCatalog` to a separate component, `CatalogConfigResolver`. Updated `DataCatalog` to use `CatalogConfigResolver` internally.
- Made packaged Kedro projects return the `session.run()` output so it can be used when running them in an interactive environment.
- Enhanced `OmegaConfigLoader` configuration validation to detect duplicate keys at all parameter levels, ensuring comprehensive nested key checking.
Note: `KedroDataCatalog` is an experimental feature and is under active development. Therefore, it is possible we'll introduce breaking changes to this class, so be mindful of that if you decide to use it already. Let us know if you have any feedback about the `KedroDataCatalog` or ideas for new features.
Bug fixes and other changes
- Fixed a bug where using dataset factories broke with `ThreadRunner`.
- Fixed a bug where `SharedMemoryDataset.exists` would not call the underlying `MemoryDataset`.
- Fixed template projects example tests.
- Made credentials loading consistent between `KedroContext._get_catalog()` and `resolve_patterns` so that both use `_get_config_credentials()`.
Breaking changes to the API
- Removed `ShelveStore` to address a security vulnerability.
Documentation changes
- Fixed the logo on the PyPI page.
- Minor language/styling updates.