This repository was archived by the owner on Sep 8, 2021. It is now read-only.

Path to search in a specific case is incorrect #1139

@tesshucom

This issue is a spin-off from #1130.
Its purpose is to observe and discuss the current specification rather than to report a defect.

  • How folder paths should be handled is a gray area in the current search specification.
  • Lucene upgrades have some impact on search accuracy.

Problem description

This issue reports an example of a false match in the shuffle (random) search.
It also describes how folder paths are used when searching.

System information

  • Airsonic version: 10.3.1-RELEASE – May 21, 2019
  • Operating system: Apache Tomcat/8.5.40 (embedded), Windows 10
  • Java version: 1.8.0_201
  • Proxy server: None
  • Client: Google Chrome 75.0.3770.90 (Official Build)(64bit)

Steps to reproduce

Create music folders with the following names.

image

Register them in the settings.

image

They appear to be read correctly when scanned.

image

Select Home > Random.

image

Select Music1.
For some reason, all directories are displayed.

image

Music2 is empty.

image

Music3 is empty.

image

The reason for this unexpected behavior is as follows.


Airsonic's Lucene search restricts results by folder with SpanOr rather than with an Or of exact path matches.
SpanOr behaves like a phrase: the path is divided into words, which are evaluated in order.

The Analyzer interprets the directory names as follows:

  • accessible -> accessible (as it is)
  • accessible︴s -> accessible and s
  • accessible's -> accessible (everything from the apostrophe onward is discarded)

With Lucene's SpanOr, all three names in the example above appear to be treated as equivalent to accessible.
This would not happen with an exact-match comparison combined by Or.
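
The difference between the two comparisons can be illustrated with a minimal, stdlib-only sketch. It does not use Lucene: `tokenize` is a hypothetical, simplified stand-in that only mimics the three analyzer results listed above, and `tokenMatch` is an assumed phrase-like comparison, not SpanOr itself. It models only the Music1 case, where selecting accessible returns every directory, and does not try to explain why Music2 and Music3 come back empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class FolderMatchSketch {

    // Simplified stand-in for the analyzer results listed above (NOT Lucene's tokenizer):
    // lower-case, split on anything that is not a letter, digit or apostrophe,
    // then drop a trailing "'s".
    static List<String> tokenize(String name) {
        List<String> tokens = new ArrayList<>();
        for (String raw : name.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            String token = raw.endsWith("'s") ? raw.substring(0, raw.length() - 2) : raw;
            token = token.replace("'", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed phrase-like comparison: the selected folder's tokens must occur,
    // in order, somewhere in the candidate's tokens.
    static boolean tokenMatch(String selected, String candidate) {
        return Collections.indexOfSubList(tokenize(candidate), tokenize(selected)) >= 0;
    }

    // Returns the candidates accepted for the selected folder under either comparison.
    static List<String> accepted(String selected, List<String> candidates, boolean exact) {
        List<String> result = new ArrayList<>();
        for (String candidate : candidates) {
            if (exact ? candidate.equals(selected) : tokenMatch(selected, candidate)) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // \uFE34 is the wavy low line character in the second directory name.
        List<String> dirs = Arrays.asList("accessible", "accessible\uFE34s", "accessible's");
        System.out.println(accepted("accessible", dirs, false).size()); // token comparison: 3
        System.out.println(accepted("accessible", dirs, true).size());  // exact comparison: 1
    }
}
```

Under these assumptions the token comparison accepts all three directories for accessible, while the exact comparison accepts only one.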


As described above, paths are compared with SpanOr rather than with Or and exact matching.
False matches like this can therefore occur with certain character patterns.

There are other places where the current search logic is somewhat problematic.

When the input string is ABC DEF,
Airsonic's Lucene implementation converts it to the following query.

+((artist:abc* folder:abc*) (artist:def* folder:def*)) +spanOr([folder: <PATH> ])

The folder path, split into words, is also evaluated in the main query of the logical expression.

  • This may be useful if the file layout of the music library closely mirrors the music metadata. Otherwise it may reduce search accuracy.
  • It depends strongly on the tokenizer's output, that is, on how the Tokenizer breaks the input into words.
  • Updating Lucene changes StandardTokenizer to the UAX#29 rules, which break words more often than the current version does. If the main query contains the folder path, search accuracy may be affected even for English, because finer word division changes the scoring.
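
To make the shape of the query quoted above easier to see, here is a small sketch that rebuilds the same text from an input string. It is only a string-level illustration: `QueryShapeSketch` and `build` are hypothetical names, the split on whitespace is an assumption standing in for the analyzer, and no Lucene query objects are created.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class QueryShapeSketch {

    // Builds the textual form of the query: one prefix clause per field for each
    // input word, plus a mandatory spanOr clause on the folder path.
    static String build(String input, List<String> fields, String folderPath) {
        List<String> groups = new ArrayList<>();
        // Assumption: words are separated by whitespace; the real analyzer may split differently.
        for (String word : input.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<String> clauses = new ArrayList<>();
            for (String field : fields) {
                clauses.add(field + ":" + word + "*");
            }
            groups.add("(" + String.join(" ", clauses) + ")");
        }
        return "+(" + String.join(" ", groups) + ") +spanOr([folder: " + folderPath + " ])";
    }

    public static void main(String[] args) {
        // "<PATH>" is kept as a placeholder, exactly as in the query quoted above.
        System.out.println(build("ABC DEF", Arrays.asList("artist", "folder"), "<PATH>"));
    }
}
```

Every additional word the tokenizer produces adds another `folder:` prefix clause to the main query, which is why finer word division can change the scoring.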

Additional notes

There are several options.

  • This problem remains unaddressed even in the latest version of Subsonic (at least when searching from the web interface).
  • The folder is also included in the main query (change it to a phrase search?).
  • If search accuracy becomes an issue in the future, there is room for improvement.

If those who decide the specification reach a consensus, I could, for example, create a PR with a modified query or index for comparison testing.
It is hard to make many of them, though 😊
