Ideas for more accurate categorization, saving open files / projects, getting metadata without implementing new watchers #504

@phiresky

  • I have searched the issues of this repo and believe that this is not a duplicate. [ searched for filename, accuracy, title etc]
  • I have searched the documentation and believe that my question is not covered.

I worked on a project similar to ActivityWatch a while ago. I've more or less abandoned it for now, but I implemented some ideas that I think would also improve the accuracy of ActivityWatch's tracking:

  1. Save the command line the program was run with. If you open a file from a file manager or similar, the program is usually started with the filename to open as its first argument. This means that by storing the cmdline, you can accurately get the actual filename of the PDF in a PDF viewer, the image in an image viewer, the video in a video player, and likewise for audio players, text editors (LibreOffice), GIMP etc. This is even more useful when you use the directory name of the file for further "bucketing", since the directory structure says a lot about what the file is about.

    Example: Currently, if I have my PDF viewer open, ActivityWatch only logs something like "Evince - Annual Report". It only records the title of the PDF and the fact that it was a PDF. If you knew the filename, you could tell that the path was /home/x/projects/university/2020/some-topic-name/report.pdf, which is much more valuable. Similarly, if all movies are in a separate folder from TV shows, you can categorize whether you spend more time on movies or shows.

    Implementation: On X11, you can easily use _NET_WM_PID to get the PID (this works on all window managers I know of, e.g. Gnome, KDE, i3), then read /proc/<pid>/cmdline to get the command line of that PID. Example code here.

    On Windows, this should also be possible I think: https://github.com/phiresky/track-pc-usage-rs/blob/master/src/capture/winwins.rs#L115

  2. Save the lsof output of processes, i.e. the list of files each process has open. This is separate from and more janky than (1), and I see it's already been discussed here: https://forum.activitywatch.net/t/log-the-path-of-open-files-for-categorization/487

  3. Parse structured data from window titles. In many programs, you can adjust what the displayed title is. This is a really easy trick to squeeze more useful data out of programs.

    1. For browsers, there are simple add-ons that add the full URL to the title: firefox, chrome. This could replace aw-watcher-web in its current form by just matching a URL regex against the window title, although aw-watcher-web could also collect other useful information in the future (e.g. the creator of a video on YouTube).
    2. For shells, you can set the title using precmd / preexec to include the working directory, the user (e.g. root), and the command currently being run. In my case I simply add this as a JSON object to the title, which is then matched and parsed on the watcher side. The exact code is here: https://github.com/phiresky/track-pc-usage-rs#data-sources-setup This way I can track which project I was working on (via the cwd) and also retrieve the full history of only that shell session, since I store the session ID.
    3. For IDEs such as VSCode, you can add the project name and file name to the title, so you can tell which project the user was working on. This is a much easier, though somewhat less flexible, alternative to e.g. https://forum.activitywatch.net/t/bucket-and-event-design-fo-vs-code-extension/120 . You set the config window.title to ${dirty}${activeEditorShort}${separator}${rootName}${separator}🛤sd🠚proj=${rootPath}🙰file=${activeEditorMedium}🠘 VSCode. This appends a machine-parseable token to the end of the title. The reason I didn't just add a JSON object here and instead used the format 🛤[sd for software development🠚key=value<🙰>key=value<🙰>key=value🠘] is that VSCode, like some other programs, doesn't have a "JSON-Escape" functionality, so if the project name or filename included a " or a , the JSON would be broken, while filenames containing unusual Unicode symbols like 🛤, 🙰 and 🠚 are much less likely. Hacky, I know, and even plain JSON should be fine for 99% of cases.
  4. Currently it looks like aw-watcher-window really only saves the window title and "appname" to the DB. IMO, saving this little metadata is a bad idea, since if the app-name matchers change over time, all previously collected data becomes "worthless". I see there have been multiple issues here in the past about adding more browsers etc., which with the current model cannot be applied retroactively.

  5. External data sources could be used to get the software category etc. For example, you can detect the Debian package that contains a program using dpkg -S /usr/bin/firefox-developer-edition. That will give you a unique name such as firefox. You can then look this up in Wikidata using a query along the lines of select ?software where archlinux_package = "vlc". Then you can use the instance_of relation to find which software categories that software belongs to: according to https://www.wikidata.org/wiki/Q171477, VLC is a "video player". This way you would reuse existing open data and also enable users to improve it easily. Note that this also requires storing more metadata than just your "appname".
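As a rough illustration of idea 1, here is a minimal Python sketch of the /proc part on Linux. It assumes the PID has already been obtained (for example from _NET_WM_PID, which is not shown). The helper names split_cmdline, read_cmdline and guess_opened_file are made up for this example, and the "first non-option argument" rule is only a heuristic that can be fooled by options that take a value.

```python
from pathlib import Path
from typing import List, Optional


def split_cmdline(raw: bytes) -> List[str]:
    """Split the NUL-separated contents of /proc/<pid>/cmdline into arguments."""
    parts = raw.split(b"\0")
    # Each argument is NUL-terminated, so the last piece is an empty leftover.
    if parts and parts[-1] == b"":
        parts.pop()
    return [p.decode("utf-8", errors="replace") for p in parts]


def read_cmdline(pid: int) -> Optional[List[str]]:
    """Return the argument vector of `pid`, or None if it cannot be read."""
    try:
        return split_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
    except OSError:
        # Process is gone, access is denied, or /proc is unavailable.
        return None


def guess_opened_file(argv: List[str]) -> Optional[str]:
    """Heuristic (an assumption): first argument that does not look like an option."""
    for arg in argv[1:]:
        if not arg.startswith("-"):
            return arg
    return None


if __name__ == "__main__":
    example = split_cmdline(b"evince\0/home/x/report.pdf\0")
    print(example, guess_opened_file(example))
```

Storing the full argument list rather than only the guessed filename keeps the raw data available, so the heuristic can be improved later without losing old events.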
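The title token from idea 3.3 could be parsed on the watcher side roughly like this. This is only a sketch: the function name parse_title_token and the choice to return a (kind, fields) tuple are assumptions for illustration, not what track-pc-usage-rs actually does, and it only handles the first token found in a title.

```python
import re
from typing import Dict, Optional, Tuple

# 🛤<kind>🠚key=value🙰key=value🠘, using the delimiters from the window.title example.
TOKEN_RE = re.compile("🛤(?P<kind>[^🠚🠘]*)🠚(?P<body>[^🠘]*)🠘")


def parse_title_token(title: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Extract the token kind (e.g. "sd") and its key/value fields from a title."""
    match = TOKEN_RE.search(title)
    if match is None:
        return None
    fields: Dict[str, str] = {}
    for pair in match.group("body").split("🙰"):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if sep:
            fields[key] = value
    return match.group("kind"), fields


if __name__ == "__main__":
    title = "main.rs - myproj - 🛤sd🠚proj=/home/x/myproj🙰file=src/main.rs🠘 VSCode"
    print(parse_title_token(title))
```

Titles without a token simply return None, so the same watcher can fall back to storing the raw title for programs that were never configured.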
