Improve search accuracy #1235
Force-pushed from aec5e75 to d069947.
A few questions about how this works:
Otherwise I like what your screenshots show; it sounds reasonable to boost the priority of the artist field in the Artist column. Great job!
We could create logic that lowers the boost for specific words, but I have never seen that done. Even if the boost value is lowered, it will still be difficult to display results in the order the user wants.
The stop words are statically defined in the following file: stopwords.txt. They could be changed dynamically, but the following points need to be considered.
Generally, stop words are managed as a file, as stopwords.txt is in this PR.
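A minimal sketch of how a one-word-per-line stop word file such as stopwords.txt might be parsed into a lookup set, using only the JDK. It assumes that blank lines and lines starting with `#` are skipped and that words are compared in lower case; the actual format of the file and the way AnalyzerFactory loads it are not shown in this thread, and the class name `StopWordFile` is hypothetical. (In Lucene itself, `WordlistLoader.getWordSet` serves a similar purpose.)

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: turns the lines of a stop word file into a lookup set. */
final class StopWordFile {

    private StopWordFile() {
    }

    /**
     * Parses one stop word per line. Assumption: blank lines and lines
     * starting with '#' are ignored, and words are stored in lower case.
     */
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            words.add(word.toLowerCase(Locale.ROOT));
        }
        return words;
    }
}
```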
I'll put it in. I'd be happy if you could collect feedback in the following form:
Candidates to remove from the default stop words:
- los -> to be deleted, because it is not good for German.
- la -> otherwise you can't find an artist called "la la" or a song called "la la la".
Candidates to add:
- None for now.
Tried to add, but didn't:
- x -> "X" means a collaboration. (I can't find an artist named X.)
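To illustrate the `la` entry above, here is a simplified model of what a stop word filter does to a query. It is not Lucene's analyzer chain (no `StandardTokenizer`, no stemming); it only lower-cases the text, splits it on non-alphanumeric characters, and drops stop words. The class name `StopWordDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Simplified stand-in for an analyzer: lower-cases the input, splits it on
 * anything that is not a letter or digit, and drops stop words.
 */
final class StopWordDemo {

    private StopWordDemo() {
    }

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // split() can yield a leading empty token; ignore it and any stop word
            if (!token.isEmpty() && !stopWords.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }
}
```

With `la` in the stop word set, the query "La La" analyzes to no terms at all, so nothing can match; without it, both terms survive. The same mechanism explains why a legacy list containing `will` makes "Will" unsearchable.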
That's correct, although we could automatically start a rescan if that happens.
You're right on this. If we want to implement this, we should probably add a Tokenizer setting in addition to the stop words, but not in this PR. I will pull & test and let you know if everything looks good for me. EDIT: A couple of searches are already much, much better. I like the improvements! Thinking about this, we should probably remove the "articles to ignore" field from the UI if it's not used anymore.
Currently, "articles to ignore" is a setting that corresponds to the index in the domain model. It is part of the domain design and is also an output item of the REST API. The search stop words are not directly related to the domain design.
Ah, I missed that it was never affecting Lucene at all (the "index" the field was mentioning was the left sidebar index, not the Lucene index). I don't have an issue with the changes if that is the case. I found no issues so far on my server (was expecting none, but still it's nice to know).
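The distinction above can be made concrete: "articles to ignore" only affects how names are filed in the sidebar index, while stop words affect what the search analyzer indexes and queries. Below is a minimal sketch of the sidebar side, assuming the setting is a list of words such as "The" and that a single leading article is stripped. It is not Airsonic's implementation; `SidebarIndexKey` and its methods are hypothetical.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: derives the sidebar index key for an artist name. */
final class SidebarIndexKey {

    private SidebarIndexKey() {
    }

    /**
     * Strips one leading ignored article (followed by a space) so that
     * "The Beatles" is filed under "B". Matching is case-insensitive.
     */
    static String sortName(String name, List<String> ignoredArticles) {
        String trimmed = name.trim();
        for (String article : ignoredArticles) {
            String prefix = article + " ";
            if (trimmed.length() > prefix.length()
                    && trimmed.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return trimmed.substring(prefix.length()).trim();
            }
        }
        return trimmed;
    }

    /** First character of the sort name, upper-cased, used as the index letter. */
    static String indexLetter(String name, List<String> ignoredArticles) {
        String sort = sortName(name, ignoredArticles);
        return sort.isEmpty() ? "#" : sort.substring(0, 1).toUpperCase(Locale.ROOT);
    }
}
```

Nothing here touches the search analyzer, which is why dropping the field from the UI and changing the stop words are separate decisions.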
Force-pushed from e762b06 to 4788992.
Thank you very much. For languages similar to English, language switching can be achieved by replacing the Analyzer and the stop words. Implementing a simple switch may be possible but impractical: I saw in previous issues that "plugins have little merit to hard work". (Depending on the language, the additional logic and dictionaries can increase the WAR size by 30 MB.)
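A minimal sketch of the "replace the stop words per language" idea, assuming the tokenizing logic stays the same and only the stop word set is chosen by the user's locale. The map contents are illustrative examples, not a proposed list, and `StopWordsByLanguage` is hypothetical. As the comment notes, this only covers languages that split words the way English does.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of switching language by swapping stop words:
 * the tokenizing logic stays the same, only the stop word set changes.
 * This only suits languages tokenized like English (words separated by
 * spaces); other languages may need extra logic and dictionaries.
 */
final class StopWordsByLanguage {

    // Example entries only; real lists would be loaded from files.
    private static final Map<String, Set<String>> STOP_WORDS = Map.of(
            "en", Set.of("a", "an", "the"),
            "es", Set.of("el", "la", "los", "las"));

    private StopWordsByLanguage() {
    }

    /** Returns the stop words for the locale's language, falling back to English. */
    static Set<String> forLocale(Locale locale) {
        return STOP_WORDS.getOrDefault(locale.getLanguage(), STOP_WORDS.get("en"));
    }
}
```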
One small nitpick, but otherwise looks fine to me. I agree with @fxthomas that there is future work that we can do here especially with regards to hardcoded stop words.
airsonic-main/src/main/java/org/airsonic/player/service/search/AnalyzerFactory.java (outdated, resolved)
For example, if there is an additional stop word file in a specific directory, it might be useful to load it. Adding a setting to the settings screen may be a little strange: English alone has only a few stop words, but depending on the language the list can be huge. In any case, I agree that the stop words should be changeable in some way.
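One way to read the suggestion above, sketched with `java.nio` only: use the bundled stop words unless an override file exists in a given directory. The file name `stopwords.txt` inside that directory, the one-word-per-line format, and the class `StopWordOverride` are all assumptions for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: use the built-in stop words unless an override file
 * exists in the given directory. The override replaces the defaults.
 */
final class StopWordOverride {

    private StopWordOverride() {
    }

    static Set<String> load(Path directory, Set<String> defaults) throws IOException {
        Path file = directory.resolve("stopwords.txt"); // assumed file name
        if (!Files.isRegularFile(file)) {
            return defaults; // no override: keep the bundled list
        }
        Set<String> words = new HashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String word = line.trim();
            if (!word.isEmpty() && !word.startsWith("#")) {
                words.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }
}
```

Replacing rather than merging is a design choice: it lets a user remove an entry such as `la`, which a purely additive file could not do.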
Force-pushed from bf08f3c to 63d39e2.
You can combine the two commits. In terms of content, though, I think this is a slightly different modification.
I do believe that this one is ready to merge as soon as conflicts are resolved?
Please note the following points:
At the time this PR was created, it was impossible to predict in which release this PR would be included.
Good point. If we release both Lucene upgrades and this PR at the same time, things should be fine without additional upgrades, am I right?
Ah shucks, this should have gotten pulled into the previous release. I'm sorry I forgot about this one. Well, this will need to bump the index version now.
We just need to update
Force-pushed from b5439f9 to 858e7c3.
- Iterate index version.
Force-pushed from cd312ac to dba8610.
After the rebase, the INDEX_VERSION update was merged with the stop word update.
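Why the index version matters: changing the analyzer or its stop words changes which terms end up in the index, so an index written with the old settings cannot be queried reliably with the new ones. A minimal sketch of a version check that triggers a rescan (the automatic rescan mentioned earlier in the thread). `INDEX_VERSION` is the name used in this PR, but the directory-naming scheme, the value below, and the class `IndexVersionCheck` are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of an index version check. Keeping the version in
 * the index directory name means a bump leaves no usable index behind,
 * which can be detected and answered with a full rescan.
 */
final class IndexVersionCheck {

    /** Bump whenever the analyzer or stop words change (illustrative value). */
    static final int INDEX_VERSION = 2;

    private IndexVersionCheck() {
    }

    static Path indexDirectory(Path root) {
        return root.resolve("index" + INDEX_VERSION);
    }

    /** True if no index exists for the current version, i.e. a rescan is needed. */
    static boolean needsRescan(Path root) {
        return !Files.isDirectory(indexDirectory(root));
    }
}
```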
Thank you so much for the help, merging now! |
Related to #1142.
This PR proposes the following two improvements.
In a search system, these are usually considered at design time.
The current search doesn't take into account the convenience of users who type short phrases.
Here is a simple example.
Stop word change example
"Will" cannot be used in legacy searches.
Example of boost value adjustment
It is reasonable to give priority to the leftmost item in the search results.
Because the assigned boost value is very small, the priority is reversed when the cost of the Artist name is high.
Screenshots: before / after.
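The reversal described above can be reproduced with a toy scorer. This is a deliberately simplified model, not Lucene's BM25 and not Airsonic's actual boost values: each field adds boost × (fraction of its terms that match the query). Under this model, one plausible reading of a "high cost" is a long artist name, which dilutes the artist match, so a very small artist boost lets a title match win. Field names, boosts, and sample data are invented for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Toy field-boost scorer (NOT Lucene's BM25): each field contributes
 * boost * (fraction of its terms that match the query).
 */
final class BoostDemo {

    private BoostDemo() {
    }

    static double score(Map<String, String> fields, Map<String, Double> boosts, String query) {
        Set<String> queryTerms = new HashSet<>(terms(query));
        double total = 0.0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            List<String> fieldTerms = terms(field.getValue());
            if (fieldTerms.isEmpty()) {
                continue;
            }
            long matches = fieldTerms.stream().filter(queryTerms::contains).count();
            double boost = boosts.getOrDefault(field.getKey(), 1.0);
            // longer fields dilute each match, like a crude length normalization
            total += boost * matches / fieldTerms.size();
        }
        return total;
    }

    private static List<String> terms(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }
}
```

With an artist boost of 1.1, a song by "Alice and the Moon Band" scores 0.22 for the query "alice", below a song titled "Alice" (1.0); raising the artist boost to 6.0 lifts it to 1.2 and restores the expected order.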
The values in the stop word list need some brainstorming.
(Opinions from English speakers in particular are needed, because I am Japanese.)
"feat" and "with" are a little different. @muff1nman