Skip to content

Conversation

purarue
Copy link
Contributor

@purarue purarue commented Apr 26, 2022

Should resolve #154

Zulip convo: https://memex.zulipchat.com/#narrow/stream/279601-hpi/topic/locations

Still need to update the docs, but code has been updated -- can take a look at this whenever you're free

CI may fail, made quite a few updates

Still TODO:

  • filter locations in my.location.via_location._locations based on my.config.time.tz.via_location.require_accuracy

@purarue
Copy link
Contributor Author

purarue commented Apr 26, 2022

I think thats everything I could think of, let me know if I should change anything

@purarue
Copy link
Contributor Author

purarue commented Apr 26, 2022

Also, not using geopy anywhere, that was used in my.location.google but didnt want to copy that to everywhere, perhaps a separate my.location.geo could import from my.location.all to enrich the data, if you want elevation/additional geolocation info?

otherwise feels like its polluting all the other sources and would be hard to cache/keep efficient

@purarue
Copy link
Contributor Author

purarue commented Apr 26, 2022

Ah, I guess this section assumed that the dates were sorted, but thats not really possible since my.location.all could have any order of dates, from multiple sources.

It works when you just have the one google source, but since this combines a bunch of (possibly unknown and unsorted), its dropping a bunch of data there....

Guess the check for 'local time goes backwards' should just be removed -- the way Im 'solving' this is to sort the DayWithZones, in _iter_tzs, so at least we can group by sorted locations (as thats required)

continue
tz = pytz.timezone(zone)
# TODO this is probably a bit expensive... test & benchmark
ldt = l.dt.astimezone(tz)
ldt = dt.astimezone(tz)
ndate = ldt.date()
Copy link
Contributor Author

@purarue purarue Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this section comparing to the previous date received from locations may need to be removed? Since we cant control the order of my.location

So, this will be slower, but thats the trade off for allowing any sort of location data as input

Its not trivial to cache the my.location.all function either, since it depends on any amount of inputs, so creating an accuracte cachew depends_on would be difficult

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, I mean in principle we could either demand the upstream providers to provide it as sorted (and for many it'll already be, like google takeout?).
Or alternatively sort before merging them in all (via a helper merge function), then the whole thing would be much more iterative. But I guess can think about it later

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or could even do some magic, e.g. function which takes multiple generator factories (functions () -> Iterator), assumes that they are emitting sorted values, and tries merging with https://docs.python.org/3/library/heapq.html#heapq.merge

If something out of order is detected, it starts from scratch with the offending factory wrapped in sorted (hence merging factories, not plain Iterators). I guess realistically it would very quickly lead to detecting which iterables are already sorted and which aren't, usually the order is either increasing, decreasing or just completely random?

Just leaving it here, but potentially might be useful for other data sources too, e.g. often it's kinda nice to get things according to timestamps

Copy link
Contributor Author

@purarue purarue Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, use a heapmerge would make sense in situations like these, but I still think it might be prone to errors because a user might just overwrite all.py with something else (like I've done here) and then the magic doesn't work anymore. As usual, the magic might make it nice for us, but its not super extendible -- I feel like changing an all.py file shouldnt break a separate module

For now I think I'll go with the iterative approach of sorting the _locations in _iter_tzs -- since its cached when its used repeatedly

Could possibly add a sorted or geo location modules in the future which sort and use geopy to return more info about the locations

Copy link
Contributor Author

@purarue purarue Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another idea could be in all.py modules to specify the sources as a SOURCES = [ ] key at the toplevel, so then we could actually do some sort of magic and heapsort the iterators by querying that, instead of doing it in all.py itself. Otherwise, my.location.all.locations() is just a function and no way to get what the underlying iterators are. If the SOURCES var doesn't exist, then fallback to sorting the giant list of locations iteratively

And then locations in my.location.all would just be:

for provider in SOURCES:
    yield from provider()

Copy link
Owner

@karlicoss karlicoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks! A few discussion points

continue
tz = pytz.timezone(zone)
# TODO this is probably a bit expensive... test & benchmark
ldt = l.dt.astimezone(tz)
ldt = dt.astimezone(tz)
ndate = ldt.date()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah, I mean in principle we could either demand the upstream providers to provide it as sorted (and for many it'll already be, like google takeout?).
Or alternatively sort before merging them in all (via a helper merge function), then the whole thing would be much more iterative. But I guess can think about it later

@purarue
Copy link
Contributor Author

purarue commented Apr 26, 2022

believe thats all the changes -- _iter_tz now works for me properly (previously was dropping a bunch of locations since it was out of order)

Would probably be best to squash this whole thing when merging

@karlicoss
Copy link
Owner

nice! thanks

@karlicoss karlicoss merged commit 2cb8361 into karlicoss:master Apr 26, 2022
@purarue purarue deleted the location-all-provider branch April 26, 2022 21:09
@purarue purarue mentioned this pull request Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow additional sources for my.location
2 participants