Skip to content

Conversation

mariusandra
Copy link
Collaborator

@mariusandra mariusandra commented Feb 27, 2023

Problem

The word "symbol" means something else in language parsers. Think of Ruby, which has Symbol as a separate class along String (symbols get stored in a global hash table, strings don't), or ECMAScript's Symbol object.

Changes

Renames all symbol-s to ref-s to see if that makes anything clearer.

Remember, in our case, a "Symbol" is a data structure, which has the following role. In the query select a.timestamp as b from events a, a "symbol" assigned to each part of the query will refer back to the value's source. So in this case "b" will link to "a.timestamp", which will link to the database field of that name, in a linked list of "symbols" or "refs" in this case.

There's also an alternative proposal out there to rename symbols to "pointers".

How did you test this code?

Didn't.

@alessiostalla
Copy link

As noted in the other PR, "ref" or "reference" is a common name for this sort of thing. A "symbol" instead usually refers to a unique name in a certain scope (or namespace, package, or similar; terminology varies across languages). So, I think this should be merged, because it brings more common terminology and doesn't introduce any other changes.

However, looking at the broader picture, some of the uses of references in this context seem awkward, in that the direction of the relationship between nodes and references is sometimes inverted or mixed. An AST is predominantly a tree in that there's a privileged containment relationship between nodes (often referred to as "parent" and "child" nodes). However, often an AST is actually a graph because nodes can have references to other nodes besides the parent-child relationship. For example, in a programming language, each use of a variable could have a reference to its declaration. Or, in SQL, each column could have a reference to its definition. Confusingly enough, the process of establishing these references is commonly called symbol resolution, because typically it involves resolving the node each symbol (name) refers to.

Anyway, the point is that, usually, references are employed to connect nodes that are otherwise unrelated in terms of the parent-child relationship. Here, though, we have cases such as SelectQueryRef, whose name suggests that it's a reference to a select query, and should therefore have a field of type SelectQuery; instead, it's the other way round – SelectQuery has a field of type SelectQueryRef, that holds some of the data that makes up the select query. I would have imagined that a SelectQueryRef would be the object that an alias resolves to; for example, in the query

select a.x from (select ...) a

the node for a.x could have a field called table_reference that at some point will have been resolved to a SelectQueryRef (or SubqueryRef, for example, if one were to stress the subquery concept). Similarly, that node could have a field called column_reference that points to column x in the subquery. Or, with a different design, the two references could be coalesced into a single ColumnInTableRef.

It's not like the current design is wrong, but certainly, it can be puzzling.

Base automatically changed from person-table-joins to master February 28, 2023 07:09
@mariusandra
Copy link
Collaborator Author

Hey @alessiostalla , thank you for the review and the explanation. We'll get this PR merged to get the house a bit more in order, and figure out the path forward from there.

When working on lazy/virtual tables and field traversal, I did too find the current system with "symbols" somewhat confusing or "upside down". I'll likely embark on a refactoring project once I'm done with a few of the current loose ends, and probably ping you again for some thoughts then. :)

Thanks again for taking a look!

Copy link
Contributor

@thmsobrmlr thmsobrmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mariusandra mariusandra enabled auto-merge (squash) March 7, 2023 14:04
@mariusandra mariusandra merged commit e3908fe into master Mar 7, 2023
@mariusandra mariusandra deleted the rename-symbols-to-refs branch March 7, 2023 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants