Split SearchService #1130

tesshucom · 2019-06-15T08:45:10Z

This is a draft of SearchService's split refactoring related to #1113.
I divided the class to withstand Lucene update and added the test cases to the high importance class.

The relationship between the divided classes is as follows.
(Filter and Tokenizer may be added after updating)

The granularity of the division prioritizes that differences in test results can be easily seen during Lucene updates, and that improvements can easily be made individually after the update.

Additional point of view

SearchService is interfaced and holds the previous methods completely.
The following is written in the SearchService comment: Performs Lucene-based searching and indexing.
Now, startIndexing index stopIndex can be deleted from SearchService. MediaScannerService refers to SearchService, but can be replaced with IndexManager (now, IndexManager is a deligate in SearchService).
This means that the SearchService design is completely split up in search and indexing.

It is not included in PR because it is not related to Lucene's update. If it is desirable to make a change, will commit at the same time.

Is there a cool name for SearchServiceTermination ...?

Test cases for class division and version upgrade.

Extract interface of SearchService. And extract 2 classes from SearchService - Extract enum class of IndexType - Differentiate DocumentFactory from IndexType As the package of IndexType changes, fixed the class that referred to IndexType. And correct the service reference setting of SonosHelper. Because SonosHelper refers to the service entity, not auto inject. DocumentFactory is heavily influenced by Lucene version. These two classes have a major impact on index file management. It is desirable to rebuild the index file at the time of design change due to requirements or implementation change of Lucene version upgrade.

Create a factory class of Analyzer. Processing did not change, and test cases for current status observation were added.

Create a factory class of queries. Processing did not change, and test cases for current status observation were added.

The lock mechanism is left in SearchService to cope with the transition period. After the update, it is unnecessary because the framework manages locks. The present condition is only extraction of processing. There are many improvements in this class after the update.

Moved a lot of redundant scripts into one class. Judgment processing specific to SearchService, conversion of Object, refilling processing to list, etc. The random function is refactored. The list copy has been changed to a partial copy due to overprocessing. This will eliminate the dependency on guava. On the other hand, explicit declarations were added to pom to use CollectionUtils for addIgnoreNull.

jvoisin · 2019-06-15T15:40:39Z

airsonic-main/src/main/java/org/airsonic/player/service/search/AnalyzerFactory.java

+        analyzer = new CustomAnalyzer();
+      } catch (Exception e) {
+        /*
+         * It is a transient description, and it is appropriate to detect an IOException at Lucene


I think you meant "temporary" :)

That's right.
Analyzer initialization process after update needs to catch IOException.
I removed it because it is confusing.

airsonic-main/src/main/java/org/airsonic/player/service/search/DocumentFactory.java

jvoisin · 2019-06-15T15:50:41Z

airsonic-main/src/main/java/org/airsonic/player/service/search/IndexType.java

+
+  public static final class FieldNames {
+
+    private FieldNames() {


Is this necessary?

FieldNames can be package access.
It moved to another new class and deleted from enum.

Because the Id3 method was overloaded with createDocument. Fixed in createAlbumId3Document and createArtistId3Document.

Change index field name definition to package access.

Remove exception processing that is currently unnecessary.

jvoisin

There is no need to add a PDF file :)

tesshucom · 2019-06-16T09:47:08Z

I erased.

tesshucom · 2019-06-16T23:20:25Z

Currently Airsonic has a reproducible Path search bug.
The content is very related to this PR ... what next?
Do you want to handle here or issue ?

jvoisin · 2019-06-17T08:35:37Z

I would prefer to have it in a new issue :)

tesshucom · 2019-06-17T17:50:33Z

If you scan with a specific pass pattern, the data is corrupted and incorrect search.
(I thought it was related to this PR, but it might not be related ...)
I'll Investigate a little more the playback procedure and then write a new issue.

tesshucom · 2019-06-18T14:41:17Z

As for the scan, it seems that my confirmation was missing.
It used to be confused with the problem of nesting of directories with reported examples and children not being updated.

I wrote a new issue for shuffle false search.

tesshucom · 2019-06-20T02:46:16Z

Earlier, I wrote that my clones contained some spec changes.
That is the problem above.
Folders are excluded from queries, and path comparisons exact-match.
This is a simpler solution as Japanese are traditionally plagued by native language pass strings.
There is a bug in path comparison, and folder specification is only a factor of false search.

Doing the same with Airsonic means that removes folder condition from query.
If legacy should be respected, There is a alternative proposal such as :

update as it is
change the analysis of the FOLDER field to PathHierarchyTokenizer or a new dedicated PathTokenizer.

The most reliable and robust is to modify the logic so that the path comparison will be done by exact-match.
In my clone, I deleted the folder from the search condition, but can keep this condition by using another index.

In such a design, boost value(current dead code) allows for score adjustments.
I think the first design idea was like that.

Currently legacy code is implemented to reduce index size, in exchange for inaccuracies.
This is a tricky approach that is difficult to understand without careful reading.

In any case there is a better solution.
The benefits of Lucene's version upgrade outweigh the slight increase in processing.
(Memory improvement, index compression rate, search speed improvement, etc.)
If that is determined to be necessary, it is good to change to a better process.

jvoisin · 2019-06-24T20:30:38Z

I like the idea of changing the analysis of the FOLDER field to PathHierarchyTokenizer or a new dedicated PathTokenizer.

Worse case, increasing the size of the index is completely acceptable in my opinion.

tesshucom · 2019-06-29T08:07:32Z

It solved by doing some experiments.

In my clone, I wrote that the folder contains specification changes.
Not SpanOr but Or.
When I verified this accurately, the current SpanOr implementation can not be migrated to 7.x due to technical limitations.
(Available with the expert option. . .)

Legacy saves the path as follows:
In 7.x, SpanOr can not be used for NO_NORMS.

doc.add(new Field(FieldNames.FOLDER, mediaFile.getFolder(), Field.Store.NO,　Field.Index.NOT_ANALYZED_NO_NORMS

When Path is processed with Single Token by KeywordAnalyzer instead of NO_NORMS, Spanor can be used.
(Of course it is an undesirable implementation. For verification.)
In this case, if there are Music Folders such as Music, Music2 and Music3, misanalysis occurs.
For example, shuffle search will pick up the mess.
The result that can be observed by the user is a bug close to #1139.
But the cause is a potential defect different from #1139.

There is another legacy mysterious behavior.

"+((art:abc* f:abc*) (art:def* f:def*)) +spanOr([f:" + PATH1 + "])",

The path of Music Folder is specified in the main query, and comparison with the input value is performed.

I think this is not a meaningful specification.
(I do not understand unless I ask the author)

Folder is specified as NO_NORMS at the time of document generation. At least not related to scoring
However, since Folder is explicitly added to IndexType, it is analyzed by MultiFieldQueryParser. Due to the input is affected evil by analysis by MultiFieldQueryParser and there is no substantial meaning as a search of Folder

The correspondence method is either of the following:

The intention is unknown but to be implement as it is (It does not function as a proper search condition).
To be omit it because it causes false search.
Direct search of music folders is considered as a requirement for user, so to be fix.

In the case of the third policy, search will be realized using PathHierarchyTokenizer, KeywordTokenizer and PathTokenizer etc...

Besides replacing the simple API syntax, rewriting SpanOr to Or is a condition for updating.
UAX#29 needs some confirmation about the handling of the underbar.

I can create a PR keeping this in mind.

jvoisin · 2019-06-30T10:10:11Z

It's ok to break retro-compatibility: nobody is going to complain that the search is working way better :D

tesshucom · 2019-06-30T13:54:57Z

I will create a PR from this.

If you want to increase the variation of the test and apply it after and before the update, it is useful if the PR so far is merged into the master or the verification branch.
If you want to make a decision on discard including push after this, I will push on this PR.
(Which one is better?)

If you want to postpone the decision to discard , there are also ideas to improve the Reader and include it in the evaluation..

The overall performance by the upgrade tends to improve, but the search overhead time increases by about 1 ms.
It will be improved by reusing the reader.

Reader is open every search and not closed 😣
it is recommended to fix.
I'll push in succession if necessary.

MediaScannerService#
deleteOldIndexFiles() probably does not work.

It is a mechanism to change the generation of an existing index.
Database schemas also have generation management, but they can be thought of as similar roles.

This does not have to be immediate, but a fix in the future is desirable.

jvoisin · 2019-06-30T14:06:19Z

I'm ok with merging this PR so you can work on top of the master, since it's making the code a bit more clear: everyone prefers to review small PR, instead of massive ones.

But I would like that at least someone else review this PR, since it's moving a lot of things around.

Please close the Reader and fix deleteOldIndexFiles method is possible :)

tesshucom · 2019-06-30T14:14:06Z

For the time being, i will proceed locally.
Once the policy is decided, I'll PR it to the desired location.

I hope the tests are done as thoroughly as possible.
I will do my best as much as possible, but it would be better if different people would be involved.

muff1nman · 2019-07-01T17:19:43Z

airsonic-main/src/main/java/org/airsonic/player/service/search/FieldNames.java

+ * Enum that symbolizes the field name used for lucene index.
+ * This class is a division of what was once part of SearchService and added functionality.
+ */
+class FieldNames {


Any particular reason for the changing of the field names?

The value has been changed simply to shorten the query.
(I just want to reduce the line breaks in assert code in QueryFactoryTestCase.
I'll restore it.

I revised that according to your advice.

muff1nman · 2019-07-01T17:33:49Z

airsonic-main/src/main/java/org/airsonic/player/service/search/SearchServiceImpl.java

+   * @return result
+   * @throws IOException
+   */
+  private final <D> List<D> createRandomFiles(


I think this method needs to be renamed. How about createRandomDocsList?

I revised that according to your advice.

muff1nman · 2019-07-01T17:41:09Z

airsonic-main/src/main/java/org/airsonic/player/service/search/SearchServiceTermination.java

+ * so do not include exception handling in this class.
+ */
+@Component
+public class SearchServiceTermination {


This definitely needs a different name. It seems this class is mainly a holder for various utility functions related to searching. A generic name therefore might be SearchServiceUtilities?

I revised that according to your advice.

muff1nman · 2019-07-01T17:43:34Z

airsonic-main/src/test/java/org/airsonic/player/service/search/AnalyzerFactoryTestCase.java

@@ -0,0 +1,574 @@
+


Please use 4 space indentation. @jvoisin please add this to the list in #1121

I revised that according to your advice.

muff1nman

Overall this seems quite clean and is mainly just moving things around which is a great first step here. In fact the only functional change I can find is the field names. Can we keep the field names the same for this PR so that testing wise we can be certain there should be no functional changes? We can do the field name changes in a separate PR and would help us test that this PR keeps identical behavior.

Other than that I added some comments about naming.

tesshucom · 2019-07-01T18:39:59Z

I`ll correct the point you pointed out.

There are currently inevitable CVE issues in Airsonic.
I may not be able to respond immediately.

spring-security-core-4.2.12.RELEASE.jar
(cpe:/a:pivotal_software:spring_security:4.2.12,
 org.springframework.security:spring-security-core:4.2.12.RELEASE) : CVE-2019-11272

spring-security-ldap-4.2.12.RELEASE.jar
(org.springframework.security:spring-security-ldap:4.2.12.RELEASE,
 cpe:/a:pivotal_software:spring_security:4.2.12,
 cpe:/a:pivotal_software:spring-ldap:4.2.12) : CVE-2019-11272

tomcat-embed-core-8.5.40.jar
(org.apache.tomcat.embed:tomcat-embed-core:8.5.40,
 cpe:/a:apache_software_foundation:tomcat:8.5.40,
 cpe:/a:apache_tomcat:apache_tomcat:8.5.40,
 cpe:/a:apache:tomcat:8.5.40) : CVE-2019-10072

jackson-databind-2.9.9.jar
 (cpe:/a:fasterxml:jackson:2.9.9,
  cpe:/a:fasterxml:jackson-databind:2.9.9,
  com.fasterxml.jackson.core:jackson-databind:2.9.9) : CVE-2019-12814

Jdom2.x exists in Airsonic classpath.

$ mvn dependency:tree | grep -i jdom
[INFO] +- org.jdom:jdom:jar:2.0.2:compile

In this case, jackson-databind causes a security issue.
There is no effective countermeasure at present.

The 2.9.9.1 or 2.10.0 release of jackson-databind is planned and is pending for several days.
It may be easier if this is released.

muff1nman · 2019-07-01T19:08:12Z

@tesshucom id rebase with master. those cve issues should be fixed there.

tesshucom · 2019-07-02T01:19:46Z

I understand.
It was already solved in upstream!

muff1nman · 2019-07-02T02:38:24Z

Looks good. Can you rebase? (I can too if you want).

tesshucom · 2019-07-02T07:46:11Z

I'll leave it to you which granularity of rebase is appropriate!

muff1nman · 2019-07-02T15:21:05Z

Rebased in 767b39e

tesshucom added 6 commits June 11, 2019 22:11

Add Test Case for SearchService

cfa224e

Test cases for class division and version upgrade.

Extract Analyzer from SearchService

1c4a6aa

Create a factory class of Analyzer. Processing did not change, and test cases for current status observation were added.

Extract queries from SearchService

53698e0

Create a factory class of queries. Processing did not change, and test cases for current status observation were added.

jvoisin reviewed Jun 15, 2019

View reviewed changes

Misleading method name correction

36b704f

Because the Id3 method was overloaded with createDocument. Fixed in createAlbumId3Document and createArtistId3Document.

tesshucom force-pushed the separate-search-serv branch from 82ff245 to 36b704f Compare June 15, 2019 17:06

Exclude FieldNames from public enum.

9672421

Change index field name definition to package access.

tesshucom force-pushed the separate-search-serv branch from 2943597 to 9672421 Compare June 15, 2019 17:36

Remove temporary code

ee64127

Remove exception processing that is currently unnecessary.

jvoisin reviewed Jun 16, 2019

View reviewed changes

Remove PDF file for data pattern

fcf238f

tesshucom mentioned this pull request Jun 18, 2019

Path to search in a specific case is incorrect #1139

Closed

tesshucom changed the title ~~WIP：Split SearchService~~ Split SearchService Jun 20, 2019

tesshucom mentioned this pull request Jun 23, 2019

Search result quality #1142

Closed

jvoisin requested review from muff1nman and jooola June 30, 2019 14:03

muff1nman reviewed Jul 1, 2019

View reviewed changes

muff1nman suggested changes Jul 1, 2019

View reviewed changes

muff1nman removed the request for review from jooola July 1, 2019 17:48

Merge remote-tracking branch 'upstream/master' into separate-search-serv

6f4899f

tesshucom added 4 commits July 2, 2019 10:23

Rename createRandomFiles -> createRandomDocsList

d07599b

Rename SearchServiceTermination -> SearchServiceUtilities

5266d0d

correct indentation

4ba836a

Revert field names

e376dfd

muff1nman approved these changes Jul 2, 2019

View reviewed changes

muff1nman closed this Jul 2, 2019

tesshucom deleted the separate-search-serv branch July 3, 2019 20:05

tesshucom mentioned this pull request Jul 21, 2019

Update to Lucene 8.2.0 #1166

Merged

eharris mentioned this pull request Jul 28, 2019

Do we want to add more checks in checkstyle? #1121

Open


		public static final class FieldNames {

		private FieldNames() {

Split SearchService #1130

Split SearchService #1130

Uh oh!

Conversation

tesshucom commented Jun 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Additional point of view

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tesshucom Jun 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tesshucom Jun 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jvoisin left a comment

Choose a reason for hiding this comment

Uh oh!

tesshucom commented Jun 16, 2019

Uh oh!

tesshucom commented Jun 16, 2019

Uh oh!

jvoisin commented Jun 17, 2019

Uh oh!

tesshucom commented Jun 17, 2019

Uh oh!

tesshucom commented Jun 18, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tesshucom commented Jun 20, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jvoisin commented Jun 24, 2019

Uh oh!

tesshucom commented Jun 29, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jvoisin commented Jun 30, 2019

Uh oh!

tesshucom commented Jun 30, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jvoisin commented Jun 30, 2019

Uh oh!

tesshucom commented Jun 30, 2019

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tesshucom Jul 1, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

muff1nman left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tesshucom commented Jul 1, 2019

Uh oh!

muff1nman commented Jul 1, 2019

Uh oh!

tesshucom commented Jun 15, 2019 •

edited

Loading

tesshucom Jun 15, 2019 •

edited

Loading

tesshucom Jun 15, 2019 •

edited

Loading

tesshucom commented Jun 18, 2019 •

edited

Loading

tesshucom commented Jun 20, 2019 •

edited

Loading

tesshucom commented Jun 29, 2019 •

edited

Loading

tesshucom commented Jun 30, 2019 •

edited

Loading

tesshucom Jul 1, 2019 •

edited

Loading

muff1nman left a comment •

edited

Loading