Skip to content

Conversation

ThinkChaos
Copy link
Collaborator

@ThinkChaos ThinkChaos commented Feb 18, 2023

This is a first step towards #867.
In example config from the issue, @adb76 also uses multiple hosts files, which this does not implement. I'll do that as a follow up so this PR doesn't get too huge.

I ended up doing a bit more than "just re-using our existing code" as I said in the issue:

  • Had a bit of fun with the list parsing implementation. It's streaming now, so we don't need to have the full file in memory in addition to the storage where we keep the parsed data. We do still have 2 copies of the parsed data while we refresh everything.
    There aren't tests for those new types/functions yet, I first wanted some feedback.
    All commits pass all existing tests though.
  • I changed the HostsFileResolver so it uses a map internally so lookups are faster (actually multiple to also speedup reserve lookups)

EDIT: force-pushed to remove a FDescribe I accidentally committed.

@codecov
Copy link

codecov bot commented Feb 18, 2023

Codecov Report

Patch coverage: 95.44% and project coverage change: -0.15 ⚠️

Comparison is base (6f79af7) 93.15% compared to head (7d91165) 93.00%.

❗ Current head 7d91165 differs from pull request most recent head 9bca5d0. Consider uploading reports for the commit 9bca5d0 to get more accurate results

Additional details and impacted files
@@               Coverage Diff               @@
##           development     #884      +/-   ##
===============================================
- Coverage        93.15%   93.00%   -0.15%     
===============================================
  Files               42       48       +6     
  Lines             4950     5332     +382     
===============================================
+ Hits              4611     4959     +348     
- Misses             268      291      +23     
- Partials            71       82      +11     
Impacted Files Coverage Δ
lists/list_cache.go 92.71% <85.22%> (-5.49%) ⬇️
resolver/hosts_file_resolver.go 93.15% <90.67%> (-6.85%) ⬇️
lists/parsers/adapt.go 100.00% <100.00%> (ø)
lists/parsers/filtererrors.go 100.00% <100.00%> (ø)
lists/parsers/hosts.go 100.00% <100.00%> (ø)
lists/parsers/lines.go 100.00% <100.00%> (ø)
lists/parsers/parser.go 100.00% <100.00%> (ø)
util/arpa.go 100.00% <100.00%> (ø)
resolver/query_logging_resolver.go 95.62% <0.00%> (-2.92%) ⬇️
redis/redis.go 91.78% <0.00%> (-1.45%) ⬇️
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -147,6 +147,8 @@ func logger() *logrus.Entry {
}

// downloads and reads files with domain names and creates cache for them
//
//nolint:funlen // will refactor in a later commit
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll refactor this in the follow-up.
I already have something locally, but think it'll be better in the PR that allows the HostsFileResolver to use multiple sources since that is going to look a lot like this function.

return factory.Create(), err
cache := factory.Create()
if cache.ElementCount() == 0 && err != nil {
cache = nil // don't replace existing cache
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change from the previous behavior (I also changed a test). I think makes more sense to keep the current cache if the parsing completely failed.
I can easily revert it if you prefer the current behavior as it's isolated to 8caf7c5.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should replace cache content if file was parsed without errors, but contains no entries. Example: user has a local blacklist with 2 entries and he/she comments both entries out. In this case I would expect that both entries aren't in the cache anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that's covered by err != nil: this only ignores a new empty parse result if there was also an error.

@ThinkChaos ThinkChaos changed the title feat: allow hosts file resolver to a HTTP(S) line or inline block feat: allow hosts file resolver to use a HTTP(S) link or inline block Feb 18, 2023
Copy link
Owner

@0xERR0R 0xERR0R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed over your code. I leaved some comments, some minor points. The parsers package is difficult to understand, unit tests would be really helpful here.

return factory.Create(), err
cache := factory.Create()
if cache.ElementCount() == 0 && err != nil {
cache = nil // don't replace existing cache
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should replace cache content if file was parsed without errors, but contains no entries. Example: user has a local blacklist with 2 entries and he/she comments both entries out. In this case I would expect that both entries aren't in the cache anymore

}

return err
}

func (b *ListCache) groupElementCount(group string, isInit bool) int {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In "Configuration()" we do already count elements per group. Maybe we can reuse it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked this, but I think it's better kept as is: in Configuration we're iterating on the groups, so no need to check which ones exist like this func does. Also we can acquire the lock only once for the whole loop with the way it is.

@@ -131,7 +131,7 @@ var _ = Describe("ListCache", func() {
})
})
When("non transient err occurs on download", func() {
It("should delete existing elements from group cache", func() {
It("should keep existing elements from group cache", func() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replied on the other comment too.

This test has an error occur on second parse, so it's not the case you mentioned (no entries and no error).
I'll add a test for that case.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test with 08b17e2

@ThinkChaos
Copy link
Collaborator Author

Wrote tests to get 100% coverage on parsers and util/arpa.go.
Will write a couple more to get missing coverage in the resolvers.

@ThinkChaos
Copy link
Collaborator Author

Got test coverage to a point I'm happy with.
For some reason, Codecov didn't update with the latest push and only shows results from 7d91165.

I'll squash the commits and see if that unblocks it.

Unify the hosts file parsing between the hosts resolver and lists so
the resolver supports more data sources than local files.
Instead of iterating through all hosts+aliases for each A/AAAA query,
we can do a single lookup.
For PTR we search through only the hosts with an IP version that matches
the question. And compare IPs instead of building the reverse DNS name
for each IP in the hosts database.
The new `MockCallSequence` generalizes creating a function that can
return `(T, error)` multiple times.
@ThinkChaos ThinkChaos force-pushed the feat/hosts-file-http branch from a984eb2 to 9bca5d0 Compare March 2, 2023 22:25
@ThinkChaos
Copy link
Collaborator Author

Seems like CodeCov isn't working properly with rebase+force push.
You can see the coverage by manually going to the last commit's tree coverage page.

I think everything is ready on my side :)
Tests cover everything new 100%, except for MockCallSequence which is only used in tests. The hosts file resolver is also at 100% now.

@0xERR0R
Copy link
Owner

0xERR0R commented Mar 6, 2023

CodeCov results are something "strange". I wanted to check it out, but as usual, no time. :)

I skimmed over your commit, it looks good. Is it now a stand what can be checked in? Or do you want to continue to develop? I think, threre was a TODO in your code (something with golint).

If you want to merge it, please squash your commits to one together.

@ThinkChaos
Copy link
Collaborator Author

Yeah the nolint will be solved by the follow-up.
I'll try the "squash and merge" feature.

Thanks!

@ThinkChaos ThinkChaos merged commit a2ab7c3 into 0xERR0R:development Mar 7, 2023
@ThinkChaos ThinkChaos deleted the feat/hosts-file-http branch March 7, 2023 00:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants