@Tishj Tishj commented Mar 18, 2023

This PR implements the feature request in #6706

project on list of column types

This allows you to provide a list of types to project on instead of column names, selecting all of the columns of the relation that match any of the provided types.

The implementation of this is likely temporary, as discussed in the linked request.
We aim to replace it once core supports this functionality.
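
To make the selection semantics concrete, here is a minimal plain-Python sketch. It does not use the DuckDB API: the `select_by_types` helper, the `(name, type)` schema representation, and the example columns are all hypothetical, and types are modeled as plain strings rather than DuckDBPyType objects.

```python
from typing import Iterable, List, Tuple

# Hypothetical stand-in for a relation's schema: (column name, type name) pairs.
Schema = List[Tuple[str, str]]


def select_by_types(schema: Schema, types: Iterable[str]) -> List[str]:
    """Return the names of the columns whose type matches any of `types`.

    Columns keep the order they have in the schema (an assumption of this sketch).
    """
    wanted = {t.upper() for t in types}
    return [name for name, col_type in schema if col_type.upper() in wanted]


schema = [
    ("id", "INTEGER"),
    ("name", "VARCHAR"),
    ("score", "DOUBLE"),
    ("city", "VARCHAR"),
]
selected = select_by_types(schema, ["VARCHAR", "DOUBLE"])
print(selected)  # ['name', 'score', 'city']
```

A column is kept when its type matches any entry in the list, so the type list acts as a union of filters rather than an intersection.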

DuckDBPyType

I initially thought it was overkill to add a Python class to represent our LogicalType, but we have thought of another use for it, so I have opted to add it anyway.

Primitives (defined on duckdb.typing)

  • SQLNULL
  • BOOLEAN
  • TINYINT
  • UTINYINT
  • SMALLINT
  • USMALLINT
  • INTEGER
  • UINTEGER
  • BIGINT
  • UBIGINT
  • HUGEINT
  • UUID
  • FLOAT
  • DOUBLE
  • DATE
  • TIMESTAMP
  • TIMESTAMP_MS
  • TIMESTAMP_NS
  • TIMESTAMP_S
  • TIME
  • TIME_TZ
  • TIMESTAMP_TZ
  • VARCHAR
  • BLOB
  • BIT
  • INTERVAL

Creation methods

  • sqltype(type_str: str)
    alias: type, dtype
    create a type by parsing type_str; this can also be used for user- or extension-defined types
  • string_type(collation: str = "")
    create VARCHAR type with optional collation
  • decimal_type(width: int, scale: int)
    create a DECIMAL type of the given width + scale
  • enum_type(name: str, type: DuckDBPyType, values: list)
    create an ENUM type from the 'values' list, with the values cast to 'type' as the underlying type
  • array_type(type: DuckDBPyType)
    alias: list_type
    create a LIST type of the 'type' as child type
  • struct_type(fields: List[DuckDBPyType] | Dict[str, DuckDBPyType])
    alias: row_type
    create a STRUCT type of the given field types (uses default names if given as List)
  • map_type(key: DuckDBPyType, value: DuckDBPyType)
    create a MAP type out of the 'key' and 'value' types
  • union_type(members: List[DuckDBPyType] | Dict[str, DuckDBPyType])
    create a UNION type of the given member types (similar to STRUCT)
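
As a rough illustration of how these creation methods compose into nested types, the sketch below models a few of them in plain Python. It is not the implementation from this PR: types are represented as SQL type strings instead of DuckDBPyType objects, and the default field names `v1`, `v2`, ... used for the list form are an assumption, since the description does not state what the defaults are.

```python
from typing import Dict, List, Union

# Types are modeled as SQL type strings; the PR itself uses DuckDBPyType objects.
Fields = Union[List[str], Dict[str, str]]


def decimal_type(width: int, scale: int) -> str:
    return f"DECIMAL({width}, {scale})"


def list_type(child: str) -> str:
    return f"{child}[]"


def map_type(key: str, value: str) -> str:
    return f"MAP({key}, {value})"


def _render_fields(fields: Fields) -> str:
    if not isinstance(fields, dict):
        # Hypothetical default names; the real defaults are not stated in the PR.
        fields = {f"v{i}": t for i, t in enumerate(fields, start=1)}
    return ", ".join(f"{name} {type_}" for name, type_ in fields.items())


def struct_type(fields: Fields) -> str:
    return f"STRUCT({_render_fields(fields)})"


def union_type(members: Fields) -> str:
    return f"UNION({_render_fields(members)})"


nested = struct_type({"id": "INTEGER", "tags": list_type("VARCHAR")})
unnamed = struct_type(["INTEGER", "VARCHAR"])
prices = map_type("VARCHAR", decimal_type(18, 3))
print(nested)   # STRUCT(id INTEGER, tags VARCHAR[])
print(unnamed)  # STRUCT(v1 INTEGER, v2 VARCHAR)
print(prices)   # MAP(VARCHAR, DECIMAL(18, 3))
```

Because every constructor returns the same kind of value it accepts, nested types fall out of ordinary function composition. The object-based methods listed above can be combined in the same way.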

Misc

Can be compared against strings
Converts implicitly from: str, builtin types (str, bool, float, etc.), list, dict, a {'name': type, ...} dictionary (to STRUCT), and numpy builtin types (int64, bool_, float32, etc.).
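
The two behaviours above (comparison against strings and implicit conversion) can be sketched with a small stand-in class. `SketchType` is hypothetical and is not the DuckDBPyType class from this PR. The builtin-to-type mapping in `_FROM_BUILTIN` is an assumption made for illustration, and list, dict and numpy conversions are left out to keep the sketch short.

```python
class SketchType:
    """Hypothetical stand-in for DuckDBPyType, holding only a type name."""

    # Assumed mapping for this sketch; the PR text does not spell out the targets.
    _FROM_BUILTIN = {str: "VARCHAR", bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE"}

    def __init__(self, value):
        if isinstance(value, SketchType):
            self.name = value.name
        elif isinstance(value, str):
            # Strings are treated as type names, compared case-insensitively.
            self.name = value.strip().upper()
        elif isinstance(value, type) and value in self._FROM_BUILTIN:
            self.name = self._FROM_BUILTIN[value]
        else:
            raise TypeError(f"cannot convert {value!r} to a type")

    def __eq__(self, other):
        # Convert the other operand first, so strings and builtins compare equal.
        try:
            return self.name == SketchType(other).name
        except TypeError:
            return NotImplemented

    def __hash__(self):
        return hash(self.name)

    def __repr__(self):
        return self.name


print(SketchType("integer") == "INTEGER")        # True
print(SketchType(float) == "DOUBLE")             # True
print(SketchType(str) == SketchType("VARCHAR"))  # True
```

Routing `__eq__` through the same conversion as the constructor keeps the two features consistent: anything that converts implicitly can also be compared.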

@Tishj
Contributor Author

Tishj commented Mar 18, 2023

The implementation of the EnumType creation method is left as an exercise for the reader ;)
No, but in all seriousness, I'll probably remove it for now.

Contributor Author

@Tishj Tishj left a comment

Some further ideas

Collaborator

@Mytherin Mytherin left a comment

Thanks for the PR! Looks great. I think adding support for types is a great idea - some comments below:

@Tishj Tishj requested a review from Mytherin March 27, 2023 12:34
@Mytherin
Collaborator

Could you have a look at fixing the merge conflicts?

@Tishj
Contributor Author

Tishj commented Mar 30, 2023

The failure seems unrelated?
Could we merge this, or first rerun the failing test?

@Tishj
Contributor Author

Tishj commented Apr 3, 2023

Pandas provides this method as select_dtypes, which accepts both an include list and an exclude list.

Do we maybe also want to add the exclude option and provide a select_dtypes alias for this method?
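
For reference, the include/exclude semantics being discussed could look roughly like the plain-Python sketch below. `select_dtypes_sketch` and the `(name, type)` schema are hypothetical and only loosely follow the pandas behaviour. This is not the API added in this PR, which does not implement exclude.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical stand-in for a relation's schema: (column name, type name) pairs.
Schema = List[Tuple[str, str]]


def select_dtypes_sketch(
    schema: Schema,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep columns whose type is in `include` (if given) and not in `exclude`."""
    if include is None and exclude is None:
        raise ValueError("at least one of include or exclude must be given")
    included = None if include is None else {t.upper() for t in include}
    excluded = set() if exclude is None else {t.upper() for t in exclude}
    return [
        name
        for name, col_type in schema
        if (included is None or col_type.upper() in included)
        and col_type.upper() not in excluded
    ]


schema = [
    ("id", "INTEGER"),
    ("name", "VARCHAR"),
    ("score", "DOUBLE"),
    ("city", "VARCHAR"),
]
without_text = select_dtypes_sketch(schema, exclude=["VARCHAR"])
numeric_only = select_dtypes_sketch(schema, include=["INTEGER", "DOUBLE"])
print(without_text)  # ['id', 'score']
print(numeric_only)  # ['id', 'score']
```

With only exclude given, every column not on the list is kept, which is the case the include-only form cannot express without enumerating all other types.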

@Mytherin
Collaborator

Mytherin commented Apr 3, 2023

I think providing both options is sensible. Perhaps we should just rename it to be fully compatible? Or have both select_types and select_dtypes (considering our types aren't named dtypes).

@Tishj
Contributor Author

Tishj commented Apr 4, 2023

I think I'll add exclude later; I'm currently working on the scalar Python UDF, which depends on this PR.

@Mytherin Mytherin merged commit 236e580 into duckdb:master Apr 13, 2023
@Mytherin
Collaborator

Thanks! LGTM
