
feat: Only OCR active app window #40


Merged
merged 19 commits, Jan 3, 2024

Conversation

seletz
Contributor

@seletz seletz commented Dec 31, 2023

This PR implements feature #38. The feature is only active if the new setting is enabled:

[screenshot of the new setting]

Implementation Details

It turns out to be surprisingly difficult to get the bounds of the active window. After much experimentation and googling around, I finally adapted code from Stack Overflow. I added `WindowHelper` to encapsulate that code.

The signature of `performOCR` now takes an `NSImage` instead of a `CGImage`. `processScreenshot` now passes a cropped `NSImage` based on the bounds of the active window. Cropping is done in `ImageUtils`.

- Added `WindowHelper` to get the bounds of the active window
- Added `ImageUtils` to create a cropped NSImage based on the
  bounds of the active window
- Changed `remApp` to use the above code to OCR only the active window
  if the `onlyOCRFrontmostWindow` setting is `true`
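For reference, window-bounds helpers of this kind are typically built on the CGWindowList API; the following is a hedged sketch of what such a `WindowHelper` could look like (the type name, function name, and filtering choices are assumptions, not necessarily the PR's actual code):

```swift
import AppKit
import CoreGraphics

// Sketch: find the frontmost app's window bounds via the CGWindowList API.
// This mirrors the common Stack Overflow approach; not necessarily the PR's exact code.
enum WindowHelper {
    static func frontmostWindowBounds() -> CGRect? {
        guard let frontPid = NSWorkspace.shared.frontmostApplication?.processIdentifier,
              let info = CGWindowListCopyWindowInfo(
                  [.optionOnScreenOnly, .excludeDesktopElements],
                  kCGNullWindowID) as? [[String: Any]] else {
            return nil
        }
        for entry in info {
            guard let pid = entry[kCGWindowOwnerPID as String] as? pid_t, pid == frontPid,
                  let boundsDict = entry[kCGWindowBounds as String] as? NSDictionary,
                  let bounds = CGRect(dictionaryRepresentation: boundsDict) else { continue }
            // First on-screen window owned by the frontmost app.
            return bounds
        }
        return nil
    }
}
```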
Owner

@jasonjmcghee jasonjmcghee left a comment

This is looking good - out of curiosity, could we do something like this instead?

(with your settings flag too - sorry, just wrote it quick)

guard let display = shareableContent.displays.first else { return }
let activeApplicationName = NSWorkspace.shared.frontmostApplication?.localizedName

let window = shareableContent.windows.first {
    $0.owningApplication?.processID == NSWorkspace.shared.frontmostApplication?.processIdentifier
}
let frame = window?.frame

// ...

guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }

if let region = frame {
    let cropped = image.cropping(to: region)
    // OCR
} else {
    // OCR
}

@seletz
Contributor Author

seletz commented Dec 31, 2023

BTW -- I also added changes to the project.pbxproj file -- I hope this is OK. I have additional changes locally here, mostly related to code signing.

@seletz
Contributor Author

seletz commented Dec 31, 2023

@jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

@seletz
Contributor Author

seletz commented Dec 31, 2023

@jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

I "lifted" the OCR generation up one function, and removed the now unused frame argument.

For the filter work you suggested in some other thread, I think we'd need to refactor this a bit so that adding/removing steps in the processing pipeline is more straightforward. It works pretty well now, however.

@jasonjmcghee
Owner

> @jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.

fwiw - seems to work (tried it real quick)

let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
let frame = window?.frame

logger.debug("Active Frame: \(frame?.debugDescription ?? "<undefined>")")
logger.debug("Active Application: \(activeApplicationName ?? "<undefined>")")
[screenshot]

@jasonjmcghee
Owner

> @jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the WindowHelper this is not the case. Perhaps I am missing something?

😅 the one thing i didn't try - lemme see, maybe it's a pixel scaling issue?

@jasonjmcghee
Owner

jasonjmcghee commented Dec 31, 2023

hm... seems to work for me somehow:

    ...
    let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
    await processScreenshot(frameId: frameId, image: image, frame: window?.frame)
    ...

later

    performOCR(..., frame: CGRect?) {
        ...
        // Crop if a frame was provided
        let img = (frame != nil ? image.cropping(to: frame!) : nil) ?? image
        self.logger.debug("Active Frame: \(frame.debugDescription)")

        let configuration = ImageAnalyzer.Configuration([.text])
        let nsImage = NSImage(cgImage: img, size: NSSize(width: img.width, height: img.height))
        ...

[screenshot]

@jasonjmcghee
Owner

jasonjmcghee commented Dec 31, 2023

To be clear, it's likely better to pass the frame as `CGRect?` and only crop if it's present - and then we could avoid even looking up the window unless the setting is enabled.

(updated above)

@jasonjmcghee
Owner

> but using this to crop the window renders the OCR output incomplete

Maybe I'm not understanding...

Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.

I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API which we'll need to use for filtering anyway

@seletz
Contributor Author

seletz commented Dec 31, 2023

> Also, I hope it goes without saying: I really appreciate you joining as a contributor. It's awesome.
>
> I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the ScreenCaptureKit API which we'll need to use for filtering anyway

Hah -- no worries. I've been doing SW development for a very long time. I'm also old enough. I really would also like to have a simple solution. The WindowHelper code seems awkward and surprisingly complex. I'm a Swift and UIKit noob; I used to do a little Objective-C when iOS was new and hot. So all in all I contribute (a) because I like the idea of the rem app, and (b) I take it as an opportunity to learn something new.

@seletz
Contributor Author

seletz commented Dec 31, 2023

> Maybe I'm not understanding...

Well. When I use the frame of the window found via the ScreenCaptureKit framework, I found that the OCR result does not contain what I expect. I tested this by having an active terminal window open with some text in it; I was unable to find that text using the Search function.
When I use the frame from WindowHelper instead, the text is found.

I did the same test with an open Xcode window -- it does find some text, but not all. I believe the cropped image is not wide enough somehow -- but I need more testing. I'll test this again later tomorrow.

@jasonjmcghee
Owner

Happy to merge as-is if we remove ImageUtils, which I think is not being used. We can always iterate toward a more ScreenCaptureKit-based approach!

@jasonjmcghee
Owner

If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

@seletz
Contributor Author

seletz commented Jan 2, 2024

> If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!

Sure! Sorry for not reacting, RL keeps interfering :)

@jasonjmcghee
Owner

@seletz all good! Is ImageUtils being used? (Is it needed?)

@seletz
Contributor Author

seletz commented Jan 2, 2024

FWIW, updated this branch to the latest mainline. I believe that the cropping is wrong:

[screenshot]

So what I said above is wrong: your window-finding method is not the culprit, the cropping is. I believe the frame of the window found might use different units or a different coordinate system. I'll need to read the Apple docs to get to the bottom of the issue.

We might also save the cropped image to disk in some readable format so we can confirm this visually.

@jasonjmcghee
Owner

ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.

static func saveNSImage(image: NSImage, path: String) {
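For context, a minimal PNG-writing helper of that shape might look like the following sketch (an assumption for illustration, not necessarily the actual `ImageHelper` implementation):

```swift
import AppKit

enum ImageHelperSketch {
    // Write an NSImage to disk as PNG for visual debugging.
    static func saveNSImage(image: NSImage, path: String) {
        guard let tiff = image.tiffRepresentation,
              let rep = NSBitmapImageRep(data: tiff),
              let png = rep.representation(using: .png, properties: [:]) else { return }
        try? png.write(to: URL(fileURLWithPath: path))
    }
}
```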

@seletz
Contributor Author

seletz commented Jan 2, 2024

> ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo.
>
> static func saveNSImage(image: NSImage, path: String) {

Oh damn, I removed the wrong file -- I'm sorry. Let me add it again.

@jasonjmcghee
Owner

NSScreen has scale factor info fwiw. If that's the issue!
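For example, the points-to-pixels ratio can be read off `NSScreen` (a sketch; which screen to query is an assumption):

```swift
import AppKit

// Points-to-pixels ratio of the main screen; 2.0 on typical Retina displays.
let scale = NSScreen.main?.backingScaleFactor ?? 1.0
```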

@seletz
Contributor Author

seletz commented Jan 2, 2024

So the cropped images look all wrong, verified with your PNG save function. I'm now looking at the docs and the frame info.

@seletz
Contributor Author

seletz commented Jan 2, 2024

Ah. `CGImage` measures in pixels, window frames in points. We need to scale.
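Concretely, the point-based window frame has to be multiplied up into the captured image's pixel space. A hedged sketch of that conversion (the helper name, and deriving the scale from image vs. display size, are assumptions, not the PR's exact code):

```swift
import CoreGraphics
import ScreenCaptureKit

// Convert a window frame given in points into the pixel space of the captured CGImage.
// Deriving the scale from the image size vs. the display's point size avoids
// hardcoding a 2x Retina factor.
func pixelCropRect(for frame: CGRect, image: CGImage, display: SCDisplay) -> CGRect {
    let scaleX = CGFloat(image.width) / CGFloat(display.width)
    let scaleY = CGFloat(image.height) / CGFloat(display.height)
    return CGRect(x: frame.origin.x * scaleX,
                  y: frame.origin.y * scaleY,
                  width: frame.size.width * scaleX,
                  height: frame.size.height * scaleY)
}
```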

seletz added 2 commits January 2, 2024 23:37
Fix a bug where images would be cropped wrongly because
`CGImage` sizes are measured in pixels while `SCDisplay`
sizes are in points. We now determine the scale factor and
update the crop rectangle.

Fixed another bug where the active window would not be
correctly found because off-screen windows were not filtered
out. The active window selection is better done using the
ScreenCaptureKit results.
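The off-screen filtering described in the commit above can be sketched against the ScreenCaptureKit types like this (a sketch, not necessarily the commit's exact code):

```swift
import AppKit
import ScreenCaptureKit

// Pick the frontmost application's window, skipping off-screen windows
// (e.g. minimized windows or hidden panels) that would otherwise match first.
func activeWindow(in shareableContent: SCShareableContent) -> SCWindow? {
    let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    return shareableContent.windows.first {
        $0.isOnScreen && $0.owningApplication?.processID == activePid
    }
}
```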
@seletz
Contributor Author

seletz commented Jan 2, 2024

@jasonjmcghee ok this should work now. Could you please have a look?

Owner

@jasonjmcghee jasonjmcghee left a comment

rem.xcodeproj/project.pbxproj looks out of date, but the code looks awesome and thank you for doing this and getting to this implementation!

@jasonjmcghee jasonjmcghee merged commit 6d225ad into jasonjmcghee:main Jan 3, 2024