feat: Only OCR active app window #40
Conversation
- Added `WindowHelper` to get the bounds of the active window
- Added `ImageUtil` to create a cropped `NSImage` based on the bounds of the active window
- Changed `remApp` to use the above code to only OCR the active window iff the `onlyOCRFrontmostWindow` setting is `true`
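The gating described in the last bullet might be sketched as follows (the `onlyOCRFrontmostWindow` name comes from the PR; the `Settings` shape and `cropRegion` helper are assumptions for illustration, not the actual rem code):

```swift
import Foundation

// Assumed settings shape: a simple struct holding the new flag.
struct Settings {
    var onlyOCRFrontmostWindow: Bool = false
}

// Decide whether a crop rectangle should be applied before OCR.
// Returns nil when the feature is off or no active-window frame is known,
// in which case the full screenshot is OCR'd as before.
func cropRegion(settings: Settings, activeWindowFrame: CGRect?) -> CGRect? {
    guard settings.onlyOCRFrontmostWindow else { return nil }
    return activeWindowFrame
}
```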
This is looking good - out of curiosity, could we do something like this instead?
(with your settings flag too - sorry, just wrote it quick)
```swift
guard let display = shareableContent.displays.first else { return }
let activeApplicationName = NSWorkspace.shared.frontmostApplication?.localizedName
let window = shareableContent.windows.first {
    $0.owningApplication?.processID == NSWorkspace.shared.frontmostApplication?.processIdentifier
}
let frame = window?.frame
// ...
guard let image = CGDisplayCreateImage(display.displayID, rect: display.frame) else { return }
if let region = frame {
    let cropped = image.cropping(to: region)
    // OCR
} else {
    // OCR
}
```
BTW -- I also added changes to the
@jasonjmcghee Hmm -- will have a look, I believe I tried that and it didn't work. I'll try again.
As suggested by @jasonjmcghee. The cropping can be done using CGImage-based methods.
@jasonjmcghee OK so I removed the ImageHelper as per your suggestion. However, it seems the window frame returned by the method you suggested does not work properly. It does return a frame, but using this to crop the window renders the OCR output incomplete -- when using the I "lifted" the OCR generation up one function, and removed the now unused For the filter work you suggested in some other thread I think we'd need to refactor this a bit so that adding/removing steps in the processing pipeline is more straightforward. It works pretty well now, however.
fwiw - seems to work (tried it real quick)

```swift
let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
let window = shareableContent.windows.first { $0.owningApplication?.processID == activePid }
let frame = window?.frame
logger.debug("Active Frame: \(frame?.debugDescription ?? "<undefined>")")
logger.debug("Active Application: \(activeApplicationName ?? "<undefined>")")
```
😅 the one thing i didn't try - lemme see, maybe it's a pixel scaling issue?
to be clear, likely better to pass the frame as (updated above)
Maybe I'm not understanding... Also, I hope it goes without saying I really appreciate you joining as a contributor. It's awesome. I just want to make sure that if we can manage to do this in like 5 lines of code, we do - also this is using the
Hah -- no worries. I've been doing SW development for a very long time. I'm also old enough. I really would also like to have a simple solution. The
Well. When I use the frame of the window found using the ScreenCaptureKit framework, I found that the OCR result does not contain what I expect. I tested this by having a terminal window open and active with some text in it. I was unable to find the text using the I did the same test with an open Xcode window -- it does find some text, but not all. I believe the cropped image is not wide enough somehow -- but I need more testing. I'll test this again later tomorrow.
Happy to merge as-is if we remove
If you give me permission, happy to co-author by removing ImageUtils and merging your current code @seletz. It's such a great feature and I don't want to halt progress!
Sure! Sorry for not reacting, RL keeps interfering :)
Removing contributors section from README for now as `contrib.rocks` is not updating and github exposes this on the right bar anyway
@seletz all good! Is ImageUtils being used? (Is it needed?)
ImageHelper (removed in this PR) has a function to save images. It's for debugging and shouldn't be deleted imo. (Line 34 in 3431d2a)
Oh damn, I removed the wrong file -- I'm sorry. Let me add it again.
NSScreen has scale factor info fwiw. If that's the issue!
So the cropped images look all wrong, verified by your png save function. I'm now looking at the docs and the frame info.
Ah. CGImage measures in pixels, window frames in points. We need to scale.
Fix a bug where images would be cropped wrongly because `CGImage` and `SCDisplay` sizes use different units (pixels vs. points). We now determine the scale factor and update the crop rectangle accordingly. Also fixed a bug where the active window would not be found correctly because off-screen windows were not filtered out.
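The unit fix described in this commit boils down to scaling the point-based window frame by the display's backing scale factor before cropping. A minimal sketch (the helper name `pixelRect` is mine, not from the PR):

```swift
import Foundation

// Window frames are reported in points; CGImage cropping expects pixels.
// Scale every component of the rect by the display's backing scale factor
// (e.g. 2.0 on Retina displays) to get a pixel-space crop rectangle.
func pixelRect(for frameInPoints: CGRect, scaleFactor: CGFloat) -> CGRect {
    CGRect(x: frameInPoints.origin.x * scaleFactor,
           y: frameInPoints.origin.y * scaleFactor,
           width: frameInPoints.size.width * scaleFactor,
           height: frameInPoints.size.height * scaleFactor)
}
```

On macOS the scale factor would typically come from `NSScreen.backingScaleFactor`.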
The active window selection is better done using the ScreenCaptureKit results.
@jasonjmcghee ok this should work now. Could you please have a look?
rem.xcodeproj/project.pbxproj looks out of date, but the code looks awesome and thank you for doing this and getting to this implementation!
This PR implements feature #38. This feature is only active if the new `onlyOCRFrontmostWindow` setting is enabled.
Implementation Details
It turns out to be surprisingly difficult to get the bounds of the active window. After much trying and googling around, I finally adopted code from SO. I added `WindowHelper` to hide that code. The signature of `performOCR` now takes an `NSImage` instead of a `CGImage`. `processScreenshot` now passes a cropped `NSImage` based on the bounds of the active window. Cropping is done in `ImageUtils`.
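Putting the thread's pieces together, the overall flow the PR describes might look roughly like this. This is a sketch under assumptions: the function signatures, the use of `NSScreen.main` for the scale factor, and the fallback-to-full-image behavior are my guesses, not the PR's exact code.

```swift
import AppKit
import ScreenCaptureKit

// Sketch: crop the full-display CGImage to the frontmost window's frame
// (converted from points to pixels) and hand the result to OCR as an NSImage.
func processScreenshot(_ fullImage: CGImage, windows: [SCWindow]) {
    let activePid = NSWorkspace.shared.frontmostApplication?.processIdentifier
    // Filter out off-screen windows, then match on the frontmost app's PID.
    let window = windows.first {
        $0.isOnScreen && $0.owningApplication?.processID == activePid
    }

    var imageToOCR = fullImage
    if let frame = window?.frame, let scale = NSScreen.main?.backingScaleFactor {
        // The frame is in points; CGImage.cropping(to:) expects pixels.
        let cropRect = CGRect(x: frame.origin.x * scale,
                              y: frame.origin.y * scale,
                              width: frame.width * scale,
                              height: frame.height * scale)
        if let cropped = fullImage.cropping(to: cropRect) {
            imageToOCR = cropped
        }
    }
    performOCR(on: NSImage(cgImage: imageToOCR, size: .zero))
}

// Assumed signature per the PR description: OCR now consumes an NSImage.
func performOCR(on image: NSImage) { /* Vision-based OCR lives elsewhere */ }
```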