
feat: Only OCR active app window #40


Merged
merged 19 commits, Jan 3, 2024

Conversation

seletz
Contributor

@seletz seletz commented Dec 31, 2023

This PR implements feature #38. The feature is only active if the new setting is enabled:

[screenshot of the new setting]

Implementation Details

It turns out to be surprisingly difficult to get the bounds of the active window. After much experimentation and googling around, I finally adapted code from Stack Overflow. I added `WindowHelper` to encapsulate that code.

The signature of `performOCR` now takes an `NSImage` instead of a `CGImage`. `processScreenshot` now passes a cropped `NSImage` based on the bounds of the active window. Cropping is done in `ImageUtils`.

- Added `WindowHelper` to get the bounds of the active window
- Added `ImageUtils` to create a cropped NSImage based on the
  bounds of the active window
- Changed `remApp` to use the above code to OCR only the active window
  if the `onlyOCRFrontmostWindow` setting is `true`
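For reference, window-bounds helpers of this kind are typically built on the CGWindowList API; the following is a hedged sketch of what such a `WindowHelper` could look like (the type name, function name, and filtering choices are assumptions, not necessarily the PR's actual code):

```swift
import AppKit
import CoreGraphics

// Sketch: find the frontmost app's window bounds via the CGWindowList API.
// This mirrors the common Stack Overflow approach; not necessarily the PR's exact code.
enum WindowHelper {
    static func frontmostWindowBounds() -> CGRect? {
        guard let frontPid = NSWorkspace.shared.frontmostApplication?.processIdentifier,
              let info = CGWindowListCopyWindowInfo(
                  [.optionOnScreenOnly, .excludeDesktopElements],
                  kCGNullWindowID) as? [[String: Any]] else {
            return nil
        }
        for entry in info {
            guard let pid = entry[kCGWindowOwnerPID as String] as? pid_t, pid == frontPid,
                  let boundsDict = entry[kCGWindowBounds as String] as? NSDictionary,
                  let bounds = CGRect(dictionaryRepresentation: boundsDict) else { continue }
            // First on-screen window owned by the frontmost app.
            return bounds
        }
        return nil
    }
}
```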
Owner

@jasonjmcghee jasonjmcghee left a comment

This is looking good - out of curiosity, could we do something like this instead?

(with your settings flag too - sorry, just wrote it quick)

guard let display = shareableContent.displays.first else { return }
let activeApplicationName = NSWorkspace.shared.frontmostApplication?.localizedName

let window = shareableContent.windows.first {
    $0.owningApplication?.processID == NSWorkspace.shared.frontmostApplication?.processIdentifier
}
let frame = window?.frame

// ...

guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }

if let region = frame {
    let cropped = image.cropping(to: region)
    // OCR
} else {
    // OCR
}

@seletz
Contributor Author

seletz commented Dec 31, 2023

BTW -- I also added changes to the project.pbxproj file -- I hope this is OK. I have additional changes locally here, mostly related to code signing.

@seletz
Contributor Author

seletz commented Dec 31, 2023

@jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

@seletz
Contributor Author

seletz commented Dec 31, 2023

@jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

I "lifted" the OCR generation up one function, and removed the now unused frame argument.

For the filter work you suggested in some other thread, I think we'd need to refactor this a bit so that adding/removing steps in the processing pipeline is more straightforward. It works pretty well now, however.

@jasonjmcghee
Owner

> @jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

fwiw - seems to work (tried it real quick)

let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
let frame = window?.frame

logger.debug("Active Frame: \(frame?.debugDescription ?? "<undefined>")")
logger.debug("Active Application: \(activeApplicationName ?? "<undefined>")")
[screenshot]

@jasonjmcghee
Owner

> @jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

😅 the one thing i didn't try - lemme see, maybe it's a pixel scaling issue?

@jasonjmcghee
Owner

jasonjmcghee commented Dec 31, 2023

hm... seems to work for me somehow:

    ...
    let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
    await processScreenshot(frameId: frameId, image: image, frame: window?.frame)
    ...

later

    performOCR(..., frame: CGRect?) {
        ...
        // Crop if a frame was provided
        let img = (frame != nil ? image.cropping(to: frame!) : nil) ?? image
        self.logger.debug("Active Frame: \(frame.debugDescription)")

        let configuration = ImageAnalyzer.Configuration([.text])
        let nsImage = NSImage(cgImage: img, size: NSSize(width: img.width, height: img.height))
        ...

[screenshot]

@jasonjmcghee
Owner

jasonjmcghee commented Dec 31, 2023

To be clear, it's likely better to pass the frame as `CGRect?` and only crop if it's present - and then we could avoid even looking up the window unless the setting is enabled.

(updated above)

@jasonjmcghee
Owner

> but using this to crop the window renders the OCR output incomplete

Maybe I'm not understanding...

Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.

I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API which we'll need to use for filtering anyway

@seletz
Contributor Author

seletz commented Dec 31, 2023

> Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.
>
> I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API which we'll need to use for filtering anyway

Hah -- no worries. I've been doing SW development for a very long time. I'm also old enough. I really would also like to have a simple solution. The WindowHelper code seems awkward and surprisingly complex. I'm a Swift and UIKit noob; I used to do a little Objective-C when iOS was new and hot. So all in all I contribute (a) because I like the idea of the rem app, and (b) I take it as an opportunity to learn something new.

@seletz
Contributor Author

seletz commented Dec 31, 2023

> Maybe I'm not understanding...

Well. When I use the frame of the window found via the ScreenCaptureKit framework, I found that the OCR result does not contain what I expect. I tested this by having an active terminal window open with some text in it; I was unable to find that text using the Search function.
When I use the frame from WindowHelper instead, the text is found.

I did the same test with an open Xcode window -- it does find some text, but not all. I believe the cropped image is not wide enough somehow -- but I need more testing. I'll test this again later tomorrow.

@jasonjmcghee
Owner

Happy to merge as-is if we remove ImageUtils, which I think is not being used. We can always iterate toward a more ScreenCaptureKit-based approach!

@jasonjmcghee
Owner

If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

@seletz
Contributor Author

seletz commented Jan 2, 2024

> If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

Sure! Sorry for not reacting, RL keeps interfering :)

@jasonjmcghee
Owner

@seletz all good! Is ImageUtils being used? (Is it needed?)

@seletz
Contributor Author

seletz commented Jan 2, 2024

FWIW, updated this branch to the latest mainline. I believe that the cropping is wrong:

[screenshot]

So what I said above is wrong: your window-finding method is not the culprit, the cropping is. I believe the frame of the window found might use different units or a different coordinate system. I'll need to read the Apple docs to get to the bottom of the issue.

We might also save the cropped image to disk in some readable format so we can confirm this visually.

@jasonjmcghee
Owner

ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.

static func saveNSImage(image: NSImage, path: String) {
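For context, a minimal PNG-writing helper of that shape might look like the following sketch (an assumption for illustration, not necessarily the actual `ImageHelper` implementation):

```swift
import AppKit

enum ImageHelperSketch {
    // Write an NSImage to disk as PNG for visual debugging.
    static func saveNSImage(image: NSImage, path: String) {
        guard let tiff = image.tiffRepresentation,
              let rep = NSBitmapImageRep(data: tiff),
              let png = rep.representation(using: .png, properties: [:]) else { return }
        try? png.write(to: URL(fileURLWithPath: path))
    }
}
```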

@seletz
Contributor Author

seletz commented Jan 2, 2024

> ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.
>
> static func saveNSImage(image: NSImage, path: String) {

Oh damn, I removed the wrong file -- I'm sorry. Let me add it again.

@jasonjmcghee
Owner

NSScreen has scale factor info fwiw. If that's the issue!
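For example, the points-to-pixels ratio can be read off `NSScreen` (a sketch; which screen to query is an assumption):

```swift
import AppKit

// Points-to-pixels ratio of the main screen; 2.0 on typical Retina displays.
let scale = NSScreen.main?.backingScaleFactor ?? 1.0
```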

@seletz
Contributor Author

seletz commented Jan 2, 2024

So the cropped images look all wrong, verified with your PNG save function. I'm now looking at the docs and the frame info.

@seletz
Contributor Author

seletz commented Jan 2, 2024

Ah. `CGImage` measures in pixels, window frames in points. We need to scale.
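Concretely, the point-based window frame has to be multiplied up into the captured image's pixel space. A hedged sketch of that conversion (the helper name, and deriving the scale from image vs. display size, are assumptions, not the PR's exact code):

```swift
import CoreGraphics
import ScreenCaptureKit

// Convert a window frame given in points into the pixel space of the captured CGImage.
// Deriving the scale from the image size vs. the display's point size avoids
// hardcoding a 2x Retina factor.
func pixelCropRect(for frame: CGRect, image: CGImage, display: SCDisplay) -> CGRect {
    let scaleX = CGFloat(image.width) / CGFloat(display.width)
    let scaleY = CGFloat(image.height) / CGFloat(display.height)
    return CGRect(x: frame.origin.x * scaleX,
                  y: frame.origin.y * scaleY,
                  width: frame.size.width * scaleX,
                  height: frame.size.height * scaleY)
}
```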

seletz added 2 commits January 2, 2024 23:37
Fix a bug where images would be cropped wrongly because
`CGImage` sizes are measured in pixels while `SCDisplay`
sizes are in points. We now determine the scale factor and
update the crop rectangle.

Fixed another bug where the active window would not be
correctly found because off-screen windows were not filtered
out. The active window selection is better done using the
ScreenCaptureKit results.
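The off-screen filtering described in the commit above can be sketched against the ScreenCaptureKit types like this (a sketch, not necessarily the commit's exact code):

```swift
import AppKit
import ScreenCaptureKit

// Pick the frontmost application's window, skipping off-screen windows
// (e.g. minimized windows or hidden panels) that would otherwise match first.
func activeWindow(in shareableContent: SCShareableContent) -> SCWindow? {
    let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    return shareableContent.windows.first {
        $0.isOnScreen && $0.owningApplication?.processID == activePid
    }
}
```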
@seletz
Contributor Author

seletz commented Jan 2, 2024

@jasonjmcghee ok this should work now. Could you please have a look?

Owner

@jasonjmcghee jasonjmcghee left a comment

rem.xcodeproj/project.pbxproj looks out of date, but the code looks awesome and thank you for doing this and getting to this implementation!

@jasonjmcghee jasonjmcghee merged commit 6d225ad into jasonjmcghee:main Jan 3, 2024