-
Notifications
You must be signed in to change notification settings - Fork 238
Split SearchService #1130
Split SearchService #1130
Conversation
Test cases for class division and version upgrade.
Extract interface of SearchService. And extract 2 classes from SearchService - Extract enum class of IndexType - Differentiate DocumentFactory from IndexType As the package of IndexType changes, fixed the class that referred to IndexType. And correct the service reference setting of SonosHelper. Because SonosHelper refers to the service entity, not auto inject. DocumentFactory is heavily influenced by Lucene version. These two classes have a major impact on index file management. It is desirable to rebuild the index file at the time of design change due to requirements or implementation change of Lucene version upgrade.
Create a factory class of Analyzer. Processing did not change, and test cases for current status observation were added.
Create a factory class of queries. Processing did not change, and test cases for current status observation were added.
The lock mechanism is left in SearchService to cope with the transition period. After the update, it is unnecessary because the framework manages locks. The present condition is only extraction of processing. There are many improvements in this class after the update.
Moved a lot of redundant scripts into one class. Judgment processing specific to SearchService, conversion of Object, refilling processing to list, etc. The random function is refactored. The list copy has been changed to a partial copy due to overprocessing. This will eliminate the dependency on guava. On the other hand, explicit declarations were added to pom to use CollectionUtils for addIgnoreNull.
analyzer = new CustomAnalyzer(); | ||
} catch (Exception e) { | ||
/* | ||
* It is a transient description, and it is appropriate to detect an IOException at Lucene |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you meant "temporary" :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right.
Analyzer initialization process after update needs to catch IOException.
I removed it because it is confusing.
airsonic-main/src/main/java/org/airsonic/player/service/search/DocumentFactory.java
Outdated
Show resolved
Hide resolved
|
||
public static final class FieldNames { | ||
|
||
private FieldNames() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FieldNames can be package access.
It moved to another new class and deleted from enum.
Because the Id3 method was overloaded with createDocument. Fixed in createAlbumId3Document and createArtistId3Document.
82ff245
to
36b704f
Compare
Change index field name definition to package access.
2943597
to
9672421
Compare
Remove exception processing that is currently unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no need to add a PDF file :)
I erased. |
Currently Airsonic has a reproducible Path search bug. |
I would prefer to have it in a new issue :) |
If you scan with a specific pass pattern, the data is corrupted and incorrect search. |
As for the scan, it seems that my confirmation was missing. I wrote a new issue for shuffle false search. |
Earlier, I wrote that my clones contained some spec changes. Doing the same with Airsonic means that removes folder condition from query.
The most reliable and robust is to modify the logic so that the path comparison will be done by exact-match. In such a design, boost value(current dead code) allows for score adjustments. Currently legacy code is implemented to reduce index size, in exchange for inaccuracies. In any case there is a better solution. |
I like the idea of changing the analysis of the Worse case, increasing the size of the index is completely acceptable in my opinion. |
It solved by doing some experiments. In my clone, I wrote that the folder contains specification changes. Legacy saves the path as follows:
When Path is processed with Single Token by KeywordAnalyzer instead of NO_NORMS, Spanor can be used. There is another legacy mysterious behavior.
The path of Music Folder is specified in the main query, and comparison with the input value is performed. I think this is not a meaningful specification.
The correspondence method is either of the following:
In the case of the third policy, search will be realized using PathHierarchyTokenizer, KeywordTokenizer and PathTokenizer etc... Besides replacing the simple API syntax, rewriting SpanOr to Or is a condition for updating. I can create a PR keeping this in mind. |
It's ok to break retro-compatibility: nobody is going to complain that the search is working way better :D |
I will create a PR from this. If you want to increase the variation of the test and apply it after and before the update, it is useful if the PR so far is merged into the master or the verification branch. If you want to postpone the decision to discard , there are also ideas to improve the Reader and include it in the evaluation.. The overall performance by the upgrade tends to improve, but the search overhead time increases by about 1 ms. Reader is open every search and not closed 😣 MediaScannerService# It is a mechanism to change the generation of an existing index. This does not have to be immediate, but a fix in the future is desirable. |
I'm ok with merging this PR so you can work on top of the master, since it's making the code a bit more clear: everyone prefers to review small PR, instead of massive ones. But I would like that at least someone else review this PR, since it's moving a lot of things around. Please close the |
For the time being, i will proceed locally. I hope the tests are done as thoroughly as possible. |
* Enum that symbolizes the field name used for lucene index. | ||
* This class is a division of what was once part of SearchService and added functionality. | ||
*/ | ||
class FieldNames { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for the changing of the field names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value has been changed simply to shorten the query.
(I just want to reduce the line breaks in assert code in QueryFactoryTestCase.
I'll restore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised that according to your advice.
* @return result | ||
* @throws IOException | ||
*/ | ||
private final <D> List<D> createRandomFiles( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this method needs to be renamed. How about createRandomDocsList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised that according to your advice.
* so do not include exception handling in this class. | ||
*/ | ||
@Component | ||
public class SearchServiceTermination { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely needs a different name. It seems this class is mainly a holder for various utility functions related to searching. A generic name therefore might be SearchServiceUtilities?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised that according to your advice.
@@ -0,0 +1,574 @@ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I revised that according to your advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this seems quite clean and is mainly just moving things around which is a great first step here. In fact the only functional change I can find is the field names. Can we keep the field names the same for this PR so that testing wise we can be certain there should be no functional changes? We can do the field name changes in a separate PR and would help us test that this PR keeps identical behavior.
Other than that I added some comments about naming.
I`ll correct the point you pointed out. There are currently inevitable CVE issues in Airsonic.
Jdom2.x exists in Airsonic classpath.
In this case, jackson-databind causes a security issue. The 2.9.9.1 or 2.10.0 release of jackson-databind is planned and is pending for several days. |
@tesshucom id rebase with master. those cve issues should be fixed there. |
I understand. |
Looks good. Can you rebase? (I can too if you want). |
I'll leave it to you which granularity of rebase is appropriate! |
Rebased in 767b39e |
This is a draft of SearchService's split refactoring related to #1113.
I divided the class to withstand Lucene update and added the test cases to the high importance class.
The relationship between the divided classes is as follows.
(Filter and Tokenizer may be added after updating)
The granularity of the division prioritizes that differences in test results can be easily seen during Lucene updates, and that improvements can easily be made individually after the update.
Additional point of view
Performs Lucene-based searching and indexing.
startIndexing
index
stopIndex
can be deleted from SearchService. MediaScannerService refers to SearchService, but can be replaced with IndexManager (now, IndexManager is a deligate in SearchService).It is not included in PR because it is not related to Lucene's update. If it is desirable to make a change, will commit at the same time.
Is there a cool name for SearchServiceTermination ...?