This repository was archived by the owner on Sep 8, 2021. It is now read-only.

WIP: Lucene update #1113

Closed
wants to merge 2 commits into from

Conversation

jvoisin
Contributor

@jvoisin jvoisin commented Jun 9, 2019

This PR is heavily based on #847, some might even call it a complete rip-off.

@tesshucom
Contributor

tesshucom commented Jun 10, 2019

In my clone, Lucene has already been updated to 7.7.1, and it is in use by some Japanese users.
It might be of partial help.

(Because the Japanese language processing is lengthy, I had to change the class structure to allow unit testing.
It also includes processing that differs from Airsonic, such as multi-genre support and purging of unnecessary data.
So it is only a partial help...)

main
test

Because of breaking changes in Lucene, there are some syntactic differences.
For example, boolean queries are now built differently:

        BooleanQuery.Builder subMusicFoldersQuery = new BooleanQuery.Builder();
        musicFolders.forEach(musicFolder -> {
            if (indexType == IndexType.ALBUM_ID3 || indexType == IndexType.ARTIST_ID3) {
                subMusicFoldersQuery.add(new TermQuery(new Term(FieldNames.FOLDER_ID, musicFolder.getId().toString())), Occur.SHOULD);
            } else {
                subMusicFoldersQuery.add(new TermQuery(new Term(FieldNames.FOLDER, musicFolder.getPath().getPath())), Occur.SHOULD);
            }
        });
        mainQuery.add(subMusicFoldersQuery.build(), Occur.MUST);

null is no longer allowed as a range bound, so an open end is expressed with Integer.MIN_VALUE or Integer.MAX_VALUE:

        if (!(isEmpty(criteria.getFromYear()) && isEmpty(criteria.getToYear()))) {
            query.add(IntPoint.newRangeQuery(FieldNames.YEAR,
                isEmpty(criteria.getFromYear())
                    ? Integer.MIN_VALUE
                    : criteria.getFromYear(),
                isEmpty(criteria.getToYear())
                    ? Integer.MAX_VALUE
                    : criteria.getToYear()),
                Occur.MUST);
        }
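
The snippet above substitutes a sentinel for a missing bound because IntPoint.newRangeQuery takes primitive int bounds. A minimal sketch of the same idea in isolation, without any Lucene dependency: YearBounds is a hypothetical helper (it is not part of Airsonic or Lucene), and it assumes an unset year is represented as null.

```java
// Hypothetical helper for illustration only; not part of Airsonic or Lucene.
final class YearBounds {
    private YearBounds() {}

    /** Lower bound for a range query: a null year means "no lower limit". */
    static int lower(Integer fromYear) {
        return fromYear == null ? Integer.MIN_VALUE : fromYear;
    }

    /** Upper bound for a range query: a null year means "no upper limit". */
    static int upper(Integer toYear) {
        return toYear == null ? Integer.MAX_VALUE : toYear;
    }

    /** True when at least one bound is set, i.e. a year clause is worth adding. */
    static boolean hasAny(Integer fromYear, Integer toYear) {
        return fromYear != null || toYear != null;
    }

    public static void main(String[] args) {
        // Open lower end, closed upper end.
        System.out.println(lower(null) + " .. " + upper(1999)); // prints "-2147483648 .. 1999"
    }
}
```

Keeping the null handling in one place like this also makes the open-ended cases easy to cover with a unit test.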

Also, the field types used in this PR's document definition may differ from mine.

Migration is possible.
But I do not know whether the motivation is commensurate with the effort...
In my case I needed Japanese support (Lucene 4.0 or higher).

@jvoisin
Contributor Author

jvoisin commented Jun 10, 2019

This is a bit off-topic, but why aren't you trying to upstream more of your changes in airsonic?

@tesshucom
Contributor

I apologize if I am at fault.
However, it is not as though I could estimate the time and risk in advance.
I simply went through a lot of trial and error.

The topic of Lucene seems to have come up often, but the response has always been negative.
So I tried it on my own.
As a result, I feel it contains some areas that are likely to be controversial.

@fxthomas
Contributor

I, for one, welcome better support for complex scripts, it's currently a bit painful to search for Japanese stuff... I'd be glad to test and include this if you can make a PR! :)

@jvoisin
Contributor Author

jvoisin commented Jun 10, 2019

Me too !

@tesshucom
Contributor

tesshucom commented Jun 10, 2019

I think so too.
It may also be necessary to introduce some elements at an appropriate time to get there.

  • Split the code into several classes, one per function
  • Improve the index version control (generation rotation mechanism), which is currently dead

How about this as a suggestion?
Splitting the class seems useful.
At the very least, the current single-class structure was a barrier to the Lucene update.

  • I want to avoid breaking changes.
  • There are no tests, so I want to add some.
  • Tests are hard to add because the features are too tightly coupled.

That is the current state.

Although generation rotation may seem unrelated, it reduces the burden on users at release time.
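
To make the generation idea concrete, here is a minimal sketch under stated assumptions: each index generation lives in its own directory named "index" plus a version number, and directories from other generations are deleted at startup so users never have to clean up by hand. The class name, the directory naming, and the version number are all hypothetical and not taken from Airsonic.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; the "index<N>" naming scheme is an assumption.
final class IndexGenerations {
    static final String PREFIX = "index";

    private IndexGenerations() {}

    /** Directory holding the given generation, e.g. <home>/index18. */
    static Path currentDir(Path home, int version) {
        return home.resolve(PREFIX + version);
    }

    /** Deletes every "index<N>" directory under home whose N is not the current version. */
    static void deleteOldGenerations(Path home, int version) throws IOException {
        String keep = PREFIX + version;
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(home, PREFIX + "*")) {
            for (Path dir : dirs) {
                String name = dir.getFileName().toString();
                boolean isGeneration = name.substring(PREFIX.length()).matches("\\d+");
                if (Files.isDirectory(dir) && isGeneration && !name.equals(keep)) {
                    stale.add(dir);
                }
            }
        }
        for (Path dir : stale) {
            deleteRecursively(dir);
        }
    }

    private static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order so that children are deleted before their parents.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

With a scheme like this, a release that changes the index format only needs to bump the version number; the next startup writes to a fresh directory and removes the stale ones.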


I found this link in past issues.
Subsonic utilise Elasticsearch comme base de données orientée documents ("Subsonic uses Elasticsearch as a document-oriented database")
(The author is French and writes in French.)
People in some non-English-speaking countries have had such ideas.

I do not want Airsonic to become complicated at all, and multilingual support is an example of an anti-pattern that serves a specific purpose.
Making Airsonic multilingual is difficult and should not be the immediate goal.
However, design improvements can also reduce the burden of localization.

Sadly, the latest version of Subsonic has evolved so cleverly that searching in Japanese is impossible...

@jvoisin
Contributor Author

jvoisin commented Jun 11, 2019

Splitting the file is a good start, I do agree :)

@tesshucom
Contributor

I will now create a PR as a proposal.
Because the Analyzer is different, it may take some time.

I mainly use JapaneseAnalyzer.
What Airsonic needs is StandardAnalyzer.
The Analyzer currently used by Airsonic and the updated StandardAnalyzer should parse text slightly differently.

This is because UAX #29 (Unicode text segmentation) was introduced.
The handling of underscores and single quotes changes slightly, but it is quicker to capture the results in a test case than to describe them.
French is a little sensitive to this.
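
One way to capture such differences is to record the tokens produced for a few tricky inputs. The sketch below uses java.text.BreakIterator from the JDK purely as a stand-in for a Lucene Analyzer so that it runs without extra dependencies. Its word rules are not guaranteed to match those of StandardAnalyzer, and the WordSplitter class and the sample strings are illustrative assumptions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Stand-in tokenizer for illustration; this is NOT Lucene's StandardAnalyzer.
final class WordSplitter {
    private WordSplitter() {}

    /** Splits text at word boundaries and keeps only pieces containing a letter or digit. */
    static List<String> words(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Drop pieces made up only of whitespace or punctuation.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Inputs with an underscore and with single quotes, the cases mentioned above.
        for (String s : new String[] {"snake_case title", "rock'n'roll", "l'amour"}) {
            System.out.println(s + " -> " + words(s));
        }
    }
}
```

In a real test the expected token lists would be asserted against the Analyzer under test, so that any change in underscore or apostrophe handling shows up as a failing case.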

@fxthomas
Contributor

If I understand it correctly, if we start using StandardAnalyzer, replacing it with JapaneseAnalyzer is easy, right?

@tesshucom
Contributor

tesshucom commented Jun 13, 2019

That's right, but my clone contains minor spec changes as well as the Lucene update and localization.
(Because the Lucene document design differs in order to resolve Japanese readings and eliminate misanalysis, it is not fully portable.)

I will create a minimal PR in accordance with the Guidelines for Contributing.

  • Split SearchService
  • Lucene update

Beyond that, I think each item needs to be judged individually.
The split and the update will make other improvements much easier and safer.
Later I will suggest things that may be useful, such as reuse of the Reader and purging of unnecessary documents.

@tesshucom tesshucom mentioned this pull request Jun 15, 2019
@jvoisin jvoisin closed this Jun 15, 2019
@jvoisin jvoisin deleted the update_lucene branch June 15, 2019 21:51

3 participants