Fix a bug when int publish date lead to TypeError #157
I wonder if just forcing it to a string would "fix" the error, or at least make it not fail with a TypeError. That would mean changing line 175 in the crawler.py file to: self.article._publish_date = str(self.publishdate_extractor.extract())
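A minimal sketch of what the string-coercion suggestion above implies. The helper name coerce_publish_date is hypothetical and not part of goose3, and it assumes the extractor can also return None when no date is found. A bare str() call would turn such a missing date into the text 'None', so the sketch guards against that case.

```python
from typing import Optional, Union

# Assumption: the extractor may return a string, an int, or None.
RawDate = Union[str, int, None]


def coerce_publish_date(raw: RawDate) -> Optional[str]:
    """Hypothetical helper: stringify a raw publish date, keeping None as None."""
    if raw is None:
        return None
    return str(raw)


# A bare str() hides a missing date behind the literal text 'None'.
print(repr(str(None)))
print(repr(coerce_publish_date(None)))
print(repr(coerce_publish_date(1676551869000)))
```

Coercion alone only changes the type. The resulting digit string still carries no hint that it is a UNIX timestamp in milliseconds, so a date parser may reject or misread it.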
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #157      +/-   ##
==========================================
- Coverage   91.03%   91.03%   -0.01%
==========================================
  Files          30       30
  Lines        2409     2420      +11
==========================================
+ Hits         2193     2203      +10
- Misses        216      217       +1
…_date_unix_milliseconds
Maybe we could parse this UNIX time into the expected ISO 8601 format?
And keep this
If you want, you can move the entire code block that translates UNIX time into ISO 8601 time into a separate function: https://github.com/goose3/goose3/pull/157/files#diff-fe6e242d7cae1fa6f728979e6467729f9ad50d16afbb0e50386a40c2547669beR269 That way, in the future we will be able to add support for UNIX time not only for
I thought about forcing it to a string. But translating UNIX time into ISO 8601 time looks like a more correct solution because it will allow us to initialize
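The conversion discussed in these comments could look roughly like the sketch below. It is not the code from this PR: the function name unix_ms_to_iso8601 is hypothetical, and it assumes the extracted integer is a UNIX timestamp in milliseconds, as in the 1676551869000 example from the description.

```python
from datetime import datetime, timedelta, timezone

# Start of the UNIX epoch as a timezone-aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_ms_to_iso8601(timestamp_ms: int) -> str:
    """Hypothetical helper: convert UNIX milliseconds to an ISO 8601 UTC string."""
    # Adding a timedelta avoids the float rounding of dividing by 1000.
    return (EPOCH + timedelta(milliseconds=timestamp_ms)).isoformat()


print(unix_ms_to_iso8601(1676551869000))  # 2023-02-16T12:51:09+00:00
```

Keeping the conversion in its own function means any extractor that finds a UNIX timestamp can reuse it, and the downstream date parser only ever receives strings.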
This PR looks great! Thanks!
See https://www.linkedin.com/pulse/you-getting-raise-year-cnbc/

self.publishdate_extractor.extract() extracts an int publish date: 1676551869000. Because of that, dateutil fails to parse it and raises an exception: TypeError: Parser must be a string or character stream, not int. This leads to an exception raised by self._publish_date_to_utc(), which causes the entire parse to fail.

I'm not sure that catching this TypeError is the best way to fix it. Perhaps it is better to fix self.publishdate_extractor.extract() so that it never returns an int result at all.
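For comparison, the catch-the-TypeError approach mentioned above can be sketched as follows. This is an illustration only: datetime.fromisoformat from the standard library stands in for the dateutil parser (it also raises TypeError for non-string input), the function name publish_date_to_utc_or_none is hypothetical, and treating dates without an offset as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def publish_date_to_utc_or_none(raw: Union[str, int, None]) -> Optional[datetime]:
    """Hypothetical wrapper: return a UTC datetime, or None if parsing fails."""
    try:
        # Stand-in for the dateutil parse call; raises TypeError for an int.
        parsed = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        # The article parse can continue, but the date information is lost.
        return None
    if parsed.tzinfo is None:
        # Assumption: dates without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


print(publish_date_to_utc_or_none(1676551869000))
print(publish_date_to_utc_or_none("2023-02-16T13:51:09+01:00"))
```

Catching the exception keeps the parse from failing, but the integer date is silently dropped, which is the trade-off the description raises.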