Path to search in a specific case is incorrect #1139
Description
This issue has spun off from #1130.
The purpose is to examine the current search specification rather than to file a defect report.
- How folder paths should be handled is a gray area in the current search specification.
- Lucene upgrades have some impact on search accuracy.
Problem description
This issue reports an example of a false match in shuffle (random) search.
It also describes how folder paths are used when searching.
System information
- Airsonic version: 10.3.1-RELEASE – May 21, 2019
- Operating system: Apache Tomcat/8.5.40 (embedded), Windows 10
- Java version: java 1.8.0_201
- Proxy server: None
- Client: Google Chrome 75.0.3770.90 (Official Build)(64bit)
Steps to reproduce
Create a MusicFolder with the following name.
Set.
It looks like it can be read properly when scanned.
Select Home > Random.
Select Music1.
For some reason, all directories are displayed.
Music2 is empty.
Music3 is empty.
The reason for this unexpected behavior is as follows.
The Airsonic Lucene search uses SpanOr instead of an exact path match combined with Or.
SpanOr is similar to a phrase query: the path is divided into words, which are evaluated in order.
The directory names are interpreted by Analyzer as follows:
- accessible -> accessible (as it is)
- accessible︴s -> accessible and s
- accessible's -> accessible (discard after the apostrophe)
With Lucene's SpanOr, all of the names in the example above appear to be treated as equivalent to accessible.
This is not the case when doing an exact match comparison with Or.
As mentioned above, path comparison uses SpanOr rather than Or with an exact match.
Such false matches can occur with certain character patterns.
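The difference between the two comparison styles can be sketched with a toy model. This is not Lucene and not Airsonic code: `toy_analyze`, `token_match`, and `exact_match` are hypothetical helpers that only imitate the analyzer behavior described above (split on non-word characters, drop a trailing `'s`) and a phrase-like match over the resulting words.

```python
import re

def toy_analyze(name):
    """Toy stand-in for the analyzer described above (an assumption, not Lucene)."""
    tokens = []
    # Split on anything that is not an ASCII letter, digit, or apostrophe.
    for raw in re.split(r"[^0-9A-Za-z']+", name):
        word = raw.lower()
        # Drop a trailing possessive "'s", as in the third example above.
        if word.endswith("'s"):
            word = word[:-2]
        word = word.replace("'", "")
        if word:
            tokens.append(word)
    return tokens

def token_match(requested, indexed):
    """Phrase-like match: the requested words appear in order in the indexed words."""
    want, have = toy_analyze(requested), toy_analyze(indexed)
    n = len(want)
    return n > 0 and any(have[i:i + n] == want for i in range(len(have) - n + 1))

def exact_match(requested, indexed):
    """Exact comparison of the raw, unanalyzed names."""
    return requested == indexed

names = ["accessible", "accessible\ufe34s", "accessible's"]  # \ufe34 is the ︴ character

# Word-based matching on "accessible" hits every directory name.
word_hits = [n for n in names if token_match("accessible", n)]
# Exact matching hits only the directory that was actually requested.
exact_hits = [n for n in names if exact_match("accessible", n)]
```

In this model `word_hits` contains all three names while `exact_hits` contains only the first, which mirrors the case where selecting one folder returns directories from all of them.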
There are other places where the current search logic is somewhat problematic.
When the input string is ABC DEF, Airsonic's Lucene implementation converts it to the following query.
+((artist:abc* folder:abc*) (artist:def* folder:def*)) +spanOr([folder: <PATH> ])
The folder path, split into words, is also evaluated in the main query of the logical expression (the folder: clauses).
- This may be useful if the file layout rules used to manage the music are closely related to the music metadata. Otherwise it may reduce search accuracy.
- It depends strongly on the tokenizer's analysis result, that is, on how the tokenizer breaks input into words.
- Updating Lucene changes StandardTokenizer to the UAX#29 word-break rules, which split words more often than the current version does. If the query body contains the folder path, search accuracy may be affected even for English (because words are divided more finely, the scoring results change).
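To make the shape of that query easier to see, here is a minimal sketch that rebuilds the string shown above for ABC DEF. `build_query` is a hypothetical helper written for illustration only; it is not how Airsonic constructs the query, and it produces text rather than Lucene query objects. The point is that every input word is expanded against the folder field as well as the artist field.

```python
def build_query(user_input, folder_path, fields=("artist", "folder")):
    """Rebuild the textual form of the query shown above (illustration only)."""
    groups = []
    for word in user_input.lower().split():
        # Each input word becomes a prefix clause on every field, including folder.
        clauses = " ".join(f"{field}:{word}*" for field in fields)
        groups.append(f"({clauses})")
    main_query = " ".join(groups)
    # The selected music folder is appended as a separate required clause.
    return f"+({main_query}) +spanOr([folder: {folder_path} ])"

query = build_query("ABC DEF", "<PATH>")
# Dropping "folder" from the main query is one way to picture the alternative.
artist_only = build_query("ABC DEF", "<PATH>", fields=("artist",))
```

Comparing `query` with `artist_only` shows which clauses would go away if the folder field were taken out of the main query.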
Additional notes
There are several solutions.
- This problem remains unaddressed even in the latest version of Subsonic (at least when searching from the web screen).
- Folder is also included in the main query (change to phrase search?).
- If search accuracy becomes an issue in the future, there is room for improvement.
If there is a consensus among those who control the specification ... for example, it is also possible to create a PR with a modified query or index (for comparison testing).
It is hard to make many of them 😊