location: add all.py, using takeout/gpslogger/ip #237
Conversation
I think that's everything I could think of; let me know if I should change anything.
Also, not using it otherwise feels like it's polluting all the other sources, and it would be hard to cache/keep efficient.
Ah, I guess this section assumed that the dates were sorted, but that's not really possible here. It works when you just have the one google source, but since this combines a bunch of (possibly unknown and unsorted) sources, it's dropping a bunch of data there... Guess the check for 'local time goes backwards' should just be removed -- the way I'm 'solving' this is to sort the locations first.
```diff
  continue
  tz = pytz.timezone(zone)
  # TODO this is probably a bit expensive... test & benchmark
- ldt = l.dt.astimezone(tz)
+ ldt = dt.astimezone(tz)
  ndate = ldt.date()
```
I think this section comparing to the previous date received from locations may need to be removed, since we can't control the order of `my.location`. So this will be slower, but that's the trade-off for allowing any sort of location data as input. It's not trivial to cache the `my.location.all` function either, since it depends on any number of inputs, so creating an accurate cachew `depends_on` would be difficult.
Hmm yeah, in principle we could either demand that the upstream providers emit it sorted (and for many it'll already be, like google takeout?), or alternatively sort before merging them in `all` (via a helper `merge` function); then the whole thing would be much more iterative. But I guess we can think about it later.
Or we could even do some magic, e.g. a function which takes multiple generator factories (functions `() -> Iterator`), assumes that they are emitting sorted values, and tries merging with https://docs.python.org/3/library/heapq.html#heapq.merge. If something out of order is detected, it starts from scratch with the offending factory wrapped in `sorted` (hence merging factories, not plain `Iterator`s). I guess realistically it would very quickly detect which iterables are already sorted and which aren't; usually the order is either increasing, decreasing, or just completely random.
Just leaving it here, but it might potentially be useful for other data sources too -- e.g. it's often kinda nice to get things ordered by timestamp.
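That "magic" idea could be sketched like this (all names here are hypothetical, with plain integers standing in for timestamped values): each factory's stream is wrapped so it raises the moment a value goes backwards, and on failure the whole merge restarts with the offending factory wrapped in an eager sort. Note the result is materialized into a list, since a restart-from-scratch isn't possible once values have been yielded to the caller.

```python
import heapq
from typing import Callable, Iterator, List, TypeVar

T = TypeVar('T')
Factory = Callable[[], Iterator[T]]


class _Unsorted(Exception):
    def __init__(self, factory: Factory) -> None:
        self.factory = factory


def _checked(factory: Factory, key) -> Factory:
    # wrap a factory so its stream raises the moment it goes backwards
    def gen():
        last = None
        for item in factory():
            k = key(item)
            if last is not None and k < last:
                raise _Unsorted(factory)
            last = k
            yield item
    return gen


def merge_factories(factories: List[Factory], key) -> list:
    # optimistic pass: assume every factory is sorted; on failure, wrap the
    # offending factory in an eager sort and restart from scratch
    factories = list(factories)
    while True:
        try:
            streams = [_checked(f, key)() for f in factories]
            return list(heapq.merge(*streams, key=key))
        except _Unsorted as e:
            i = factories.index(e.factory)
            offender = factories[i]
            factories[i] = lambda offender=offender: iter(sorted(offender(), key=key))
```

Each failure permanently sorts exactly one factory, so the loop terminates after at most one retry per unsorted source; already-sorted sources never pay the sorting cost.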
Yeah, using a heap merge would make sense in situations like these, but I still think it might be prone to errors, because a user might just overwrite `all.py` with something else (like I've done here) and then the magic doesn't work anymore. As usual, the magic might make it nice for us, but it's not super extensible -- I feel like changing an `all.py` file shouldn't break a separate module.
For now I think I'll go with the iterative approach of sorting the `_locations` in `_iter_tzs`, since it's cached when it's used repeatedly.
Could possibly add a `sorted` or `geo` location module in the future which sorts and uses geopy to return more info about the locations.
Another idea could be for `all.py` modules to specify their sources as a `SOURCES = [ ]` variable at the top level; then we could actually do some sort of magic and heap-merge the iterators by querying that, instead of doing it in `all.py` itself. Otherwise, `my.location.all.locations()` is just a function with no way to get at the underlying iterators. If the `SOURCES` var doesn't exist, fall back to sorting the giant list of locations iteratively.
`locations` in `my.location.all` would then just be:

```python
for provider in SOURCES:
    yield from provider()
```
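A runnable sketch of that `SOURCES` introspection idea (the `query_sorted` helper and the `SimpleNamespace` stand-ins for `all.py`-style modules are hypothetical, with integers in place of locations):

```python
import heapq
from types import SimpleNamespace


def _src_a():
    yield from [3, 1]


def _src_b():
    yield from [2]


# hypothetical all.py-like modules: one exposing SOURCES, one not
with_sources = SimpleNamespace(SOURCES=[_src_a, _src_b])
without_sources = SimpleNamespace(locations=lambda: iter([5, 4]))


def query_sorted(module, key=lambda x: x):
    sources = getattr(module, 'SOURCES', None)
    if sources is not None:
        # sort each provider individually, then heap-merge them lazily
        yield from heapq.merge(*(sorted(s(), key=key) for s in sources), key=key)
    else:
        # no introspection possible: fall back to one giant sort
        yield from sorted(module.locations(), key=key)
```

The fallback branch is what keeps a user-overridden `all.py` working: modules that don't declare `SOURCES` just cost one big sort instead of breaking.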
Looks good, thanks! A few discussion points:
I believe that's all the changes -- would probably be best to squash this whole thing when merging.
nice! thanks
Should resolve #154
Zulip convo: https://memex.zulipchat.com/#narrow/stream/279601-hpi/topic/locations
Still need to update the docs, but the code has been updated -- take a look at this whenever you're free.
CI may fail; I've made quite a few updates.
Still TODO: `my.config.time.tz.via_location.require_accuracy`