
@c-herrewijn (Member) commented Apr 15, 2025

This PR adds functionality to the documentation generation script introduced in PR #5017.

Functionality added to the script `generate_sql_function_docs.py`:

  • By default, docs are generated in the directory `docs/preview`.
  • URL conversions distinguish between intra-page links, links to other doc pages, and external links.
  • Support for multiple function blocks per doc page (e.g. `char.md` has a section with regular functions and a section for `text_similarity`).
  • Support for multiple categories per function block (e.g. the first function block in `char.md` contains both regular string functions and regex functions).
  • Support for multiple examples per function (e.g. `array_slice` has multiple examples).
  • Basic support for multi-line example results (e.g. `regexp_split_to_table`).
  • All function aliases are added as individual entries (e.g. `regexp_split_to_array`, `string_split_regex`, etc.).
  • All function signatures are added as individual entries (e.g. `ltrim(string)` and `ltrim(string, characters)`).
  • Support for variadic functions: `...` is appended to the arguments (e.g. for `least`, `greatest`, etc.).
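The three-way URL conversion can be illustrated with a minimal sketch. It assumes that intra-page links start with `#`, that external links start with `http://` or `https://`, and that everything else is a doc-page path relative to the docs root (e.g. `sql/data_types/blob`). The names `classify_link` and `convert_link` are hypothetical and are not taken from `generate_sql_function_docs.py`.

```python
DOCS_ROOT = "docs/preview"  # assumed link root, matching the default output directory

def classify_link(url: str) -> str:
    """Return 'intra-page', 'external', or 'doc-page' for a URL found in a description."""
    if url.startswith("#"):
        return "intra-page"
    if url.startswith(("http://", "https://")):
        return "external"
    return "doc-page"  # assumption: anything else is a path relative to the docs root

def convert_link(url: str, docs_root: str = DOCS_ROOT) -> str:
    """Rewrite doc-page links as Jekyll {% link %} tags; keep other links unchanged."""
    if classify_link(url) != "doc-page":
        return url
    path, sep, anchor = url.partition("#")
    path = path.strip("/")
    if not path.endswith(".md"):
        path += ".md"  # {% link %} tags need the .md extension
    tag = f"{{% link {docs_root}/{path} %}}"
    return tag + (f"#{anchor}" if sep else "")  # anchors go after the tag

print(convert_link("sql/data_types/blob"))
print(convert_link("https://en.wikipedia.org/wiki/Similarity_measure"))
```

Keeping classification separate from rewriting makes it easy to add further link kinds without touching the conversion of the existing ones.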

Usage examples

The function table is generated or updated between the section start and end lines.
All data (e.g. parameter names, descriptions, examples) comes from the output of `duckdb_functions()`. Any deviations (exclusions, additions, or overrides) need to be hardcoded in `generate_sql_function_docs.py` via the variables `OVERRIDES` and `EXCLUDES`.
The first time, the section start and end lines need to be added manually so the script can determine where to generate the function table. In this PR, this has been done for `char.md` and `blob.md`.
The function categories to include can be set at the end of the start line, e.g. `categories: [blob]`.
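A minimal sketch of how a script could find each start/end marker pair, read the `categories: [...]` list, and replace whatever sits between the markers. The marker text is copied from the `blob.md` and `char.md` examples. `regenerate_sections` and the `render_table` callback are hypothetical names, and well-formed, non-nested marker pairs are assumed.

```python
import re

START_RE = re.compile(
    r"<!-- Start of section generated by scripts/generate_sql_function_docs\.py;"
    r" categories: \[(?P<cats>[^\]]*)\] -->"
)
END_MARKER = "<!-- End of section generated by scripts/generate_sql_function_docs.py -->"

def regenerate_sections(page: str, render_table) -> str:
    """Replace the content between each start/end marker pair with a freshly rendered table."""
    out = []
    pos = 0
    for m in START_RE.finditer(page):
        end = page.find(END_MARKER, m.end())
        if end == -1:
            raise ValueError("start marker without a matching end marker")
        categories = [c.strip() for c in m.group("cats").split(",") if c.strip()]
        out.append(page[pos:m.end()])                 # text up to and including the start marker
        out.append("\n" + render_table(categories) + "\n")
        pos = end                                     # drop the old table, keep the end marker
    out.append(page[pos:])
    return "".join(out)
```

Because the region between the markers is fully replaced, rerunning the script leaves the page unchanged as long as the rendered table is deterministic.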

File `blob.md`:

---
layout: docu
title: Blob Functions
---

<!-- markdownlint-disable MD001 -->

This section describes functions and operators for examining and manipulating [`BLOB` values]({% link docs/preview/sql/data_types/blob.md %}).

<!-- Start of section generated by scripts/generate_sql_function_docs.py; categories: [blob] -->
<!-- End of section generated by scripts/generate_sql_function_docs.py -->

File `char.md` has two function tables that will be populated:

---
layout: docu
title: Text Functions
---

<!-- markdownlint-disable MD001 -->

## Text Functions and Operators

This section describes functions and operators for examining and manipulating [`STRING` values]({% link docs/preview/sql/data_types/text.md %}).

<!-- Start of section generated by scripts/generate_sql_function_docs.py; categories: [string, regex] -->
<!-- End of section generated by scripts/generate_sql_function_docs.py -->

## Text Similarity Functions

These functions are used to measure the similarity of two strings using various [similarity measures](https://en.wikipedia.org/wiki/Similarity_measure).

<!-- Start of section generated by scripts/generate_sql_function_docs.py; categories: [text_similarity] -->
<!-- End of section generated by scripts/generate_sql_function_docs.py -->

## Formatters

### `fmt` Syntax

...

Notes

  • The addition of macros and table functions is hardcoded in the script, since this information cannot currently be retrieved via `duckdb_functions()`.
  • The examples, parameter names, and function descriptions have also been updated in the catalog (repo duckdb/duckdb) in a separate PR. The function tables in this PR are generated from these updated function definitions.
  • I largely updated the catalog to match the current function descriptions in the docs. However, considerable deviation remains, mainly for the following reasons:
    • The catalog has a separate entry, with a matching description, per function overload. This differs from the current docs, which describe 'optional' arguments in a single entry, e.g. `regexp_extract_all(string, regex[, group = 0])`.
    • Sometimes `duckdb_functions()` does NOT distinguish between function overloads: they are listed only once, with `parameter_type = ANY`. As a consequence, the descriptions are more general. E.g. the description of `concat` now starts with "Concatenates multiple strings, lists, or blobs", since it needs to make sense on all of these doc pages.
    • The script is more rigid: the function descriptions are always the same in the header table and the details table.
    • All overloads and all aliases are listed separately. This is currently not always the case in the docs, so the number of entries in the function tables increased considerably. For example, `regexp_split_to_array` exists with 3 different aliases and 2 different signatures, so it is listed 3 * 2 = 6 times.
    • If, for these reasons, the information from `duckdb_functions()` is not suitable for the docs, it can be adjusted via a hardcoded entry in `OVERRIDES`.
    • I fixed some inconsistent formatting in the catalog, which also adds to the diff of this PR.
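To illustrate why the entry count grows, here is a sketch that expands every alias against every signature, appends `...` for variadic signatures, and then applies exclusions and overrides. The input records are simplified and hypothetical (not the actual `duckdb_functions()` column layout), the parameter lists are illustrative, and the shapes assumed for `OVERRIDES` (name mapped to field replacements) and `EXCLUDES` (set of names) are guesses, not the script's real data structures.

```python
from itertools import product

# Hypothetical, simplified records; real duckdb_functions() rows have a different layout.
FUNCTIONS = [
    {
        "names": ["regexp_split_to_array", "string_split_regex", "str_split_regex"],
        # (parameter names, is_variadic); illustrative parameter lists
        "signatures": [(["string", "regex"], False), (["string", "regex", "options"], False)],
        "description": "<catalog description>",
    },
    {
        "names": ["least"],
        "signatures": [(["arg1", "arg2"], True)],
        "description": "<catalog description>",
    },
]

def expand_entries(functions, overrides=None, excludes=frozenset()):
    """Create one doc entry per (name, signature) pair, then apply excludes and overrides."""
    overrides = overrides or {}
    entries = []
    for fn in functions:
        for name, (params, variadic) in product(fn["names"], fn["signatures"]):
            if name in excludes:
                continue
            shown = params + ["..."] if variadic else params  # variadic: append "..."
            entry = {
                "name": name,
                "signature": f"{name}({', '.join(shown)})",
                "description": fn["description"],
            }
            entry.update(overrides.get(name, {}))  # hardcoded fields win over the catalog
            entries.append(entry)
    return entries

entries = expand_entries(FUNCTIONS)
print(len(entries))  # 3 names x 2 signatures for the regexp family, plus 1 for least
```

Applying overrides per entry, after the expansion, lets a single alias or signature be corrected without duplicating the whole catalog record.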

Related PR: duckdb/duckdb#17132

@szarnyasg (Collaborator)

@c-herrewijn Thanks for the PR and the detailed explanation!

Users quite often send improvements to function documentation. Can you please update the CONTRIBUTING.md file (maybe by lifting some text from your post above) with information for people who wish to improve function docs?

@szarnyasg (Collaborator) commented Apr 15, 2025

The CI failure seems to be caused by the .md extension missing in the {% link %} tags.
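A hypothetical check for this class of error, assuming each `{% link %}` tag takes a single path argument: it lists every tag whose path does not end in `.md`. This is a sketch, not the repository's actual CI step.

```python
import re

# Matches tags such as {% link docs/preview/sql/data_types/blob.md %}
LINK_TAG_RE = re.compile(r"\{%\s*link\s+(?P<path>\S+?)\s*%\}")

def links_missing_md(text: str) -> list[str]:
    """Return the paths of all {% link %} tags that lack the .md extension."""
    return [
        m.group("path")
        for m in LINK_TAG_RE.finditer(text)
        if not m.group("path").endswith(".md")
    ]

page = "[`BLOB` values]({% link docs/preview/sql/data_types/blob %})"
print(links_missing_md(page))
```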

@szarnyasg szarnyasg merged commit 4b29685 into duckdb:main Apr 16, 2025
4 checks passed
@c-herrewijn c-herrewijn deleted the update-text-functions branch April 22, 2025 08:58
Mytherin added a commit to duckdb/duckdb that referenced this pull request May 22, 2025
- updated a large number of function descriptions and examples, so they
can serve as input for documentation generation
- function examples in the catalog now match the function alias name
- fixed a small bug: `\001` and `\002` are now used as separators in the
function headers, instead of `\1` and `\2` (3 octal digits prevent
interference from consecutive numeric characters)

related PR: duckdb/duckdb-web#5238 and
duckdb/duckdb-web#5396
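Why three octal digits matter: in C++ (and in Python, which follows the same rule) an octal escape consumes up to three octal digits, so `\1` directly followed by a digit in the same literal turns into a different character. The snippet below only demonstrates the escape rule in Python; it does not reproduce DuckDB's actual function-header format.

```python
# Octal escapes consume up to three octal digits (C++ and Python alike).
split_literals = "\1" + "2"  # separate literals: separator \x01, then the digit "2"
merged = "\12"               # one literal: read as octal 12, i.e. a newline
padded = "\0012"             # three-digit escape \001, then the digit "2"

print(repr(split_literals), repr(merged), repr(padded))
```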