FEATURE: Versatile Video class #1924

Open
wants to merge 28 commits into base: develop

Ashp116
Contributor

@Ashp116 Ashp116 commented Jul 30, 2025

Description

This PR allows the process_video function to include the audio stream from the source video in the final annotated output. Previously, the function only rendered video frames and discarded the audio, resulting in silent output videos. This change ensures that the output video maintains both visual and audio components, addressing issue #1923.

This PR requires the imageio-ffmpeg dependency, which enables audio stream handling during video writing. You can find it here: imageio-ffmpeg on PyPI
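For readers unfamiliar with the approach, one way to carry the source audio over is to remux it into the annotated file with ffmpeg. The sketch below only builds the command line and is not the code in this PR. The helper name build_mux_command and the file names are hypothetical; the executable path could come from imageio_ffmpeg.get_ffmpeg_exe().

```python
from typing import List


def build_mux_command(
    annotated_path: str,
    source_path: str,
    output_path: str,
    ffmpeg_exe: str = "ffmpeg",  # assumption: ffmpeg is on PATH unless a path is given
) -> List[str]:
    """Build an ffmpeg command that copies video from the annotated file
    and audio from the original source, without re-encoding either stream."""
    return [
        ffmpeg_exe,
        "-y",                  # overwrite the output file if it exists
        "-i", annotated_path,  # input 0: annotated, silent video
        "-i", source_path,     # input 1: original video with audio
        "-map", "0:v:0",       # take the first video stream from input 0
        "-map", "1:a:0?",      # take the first audio stream from input 1, if any
        "-c", "copy",          # stream copy, no re-encoding
        "-shortest",           # stop at the end of the shorter stream
        output_path,
    ]
```

The resulting list can be passed to subprocess.run(cmd, check=True). The trailing ? on the audio map keeps the command valid for sources that have no audio track.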

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How has this change been tested, please provide a testcase or example of how you tested the change?

Please refer to #1923

Any specific deployment considerations

Ensure that imageio-ffmpeg is installed in the environment.

Docs

  • Docs updated? What were the changes

Ashp116 requested a review from SkalskiP as a code owner on July 30, 2025 19:29
Ashp116 changed the title from "ADD: Added audio stream for process_video" to "BUG: Added audio stream for process_video" on Jul 30, 2025
@SkalskiP
Collaborator

Hi @Ashp116 👋🏻 Another great idea! Video processing is probably the oldest part of supervision, written over two years ago, and I’ve been wanting to update its API for a while. Would you be open to not only adding audio support but also helping me with the update?

@Ashp116
Contributor Author

Ashp116 commented Jul 31, 2025

Hi @SkalskiP, yes, I would like to help update the API. I was thinking of changing how videos are written in process_video: the original compression is lost when annotations are added and the file is written to target_path.

@SkalskiP
Collaborator

SkalskiP commented Aug 1, 2025

Hi @Ashp116 I'm really glad you want to help me! Let's goooo! 🔥 🔥 🔥

I want the functionalities currently found in supervision.utils.video to be reorganized around a new Video class. Importantly, all features previously available in the old API must still be supported in the new one. Ideally, the new API should be more consistent and expressive.

  • get video info (works for files, RTSP, webcams)

    import supervision as sv
     
    # static video
    sv.Video("source.mp4").info
    
    # video stream
    sv.Video("rtsp://...").info
    
    # webcam
    sv.Video(0).info
  • simple frame iteration (object is iterable)

    import supervision as sv
    
    video = sv.Video("source.mp4")
    for frame in video:
        ...
  • advanced frame iteration (stride, sub-clip, on-the-fly resize)

    import supervision as sv
    
    for frame in sv.Video("source.mp4").frames(stride=5, start=100, end=500, resolution_wh=(1280, 720)):
        ...
  • process the video

    import cv2
    import supervision as sv
    
    def blur(frame, i):
        return cv2.GaussianBlur(frame, (11, 11), 0)
    
    sv.Video("source.mp4").save(
        "blurred.mp4",
        callback=blur,
        show_progress=True
    )
  • overwrite target video parameters

    import supervision as sv
    
    sv.Video("source.mp4").save(
        "timelapse.mp4",
        fps=60,
        callback=lambda f, i: f,
        show_progress=True
    )
  • complete manual control with explicit VideoInfo

    from supervision import Video, VideoInfo
    
    source = Video("source.mp4")
    target_info = VideoInfo(width=800, height=800, fps=24)
    
    with src.sink("square.mp4", info=target_info) as sink:
        for f in src.frames():
            f = cv2.resize(f, target_info.resolution_wh)
            sink.write(f)
  • multi-backend support decode/encode

    import supervision as sv
    
    video = sv.Video("source.mkv", backend="pyav")
    
    video = sv.Video("source.mkv", backend="opencv")

    Suggested minimal protocol:

    from typing import Any, Protocol

    import numpy as np
    from supervision import VideoInfo

    # Writer is declared first so that Backend.writer can name it in its annotation.
    class Writer(Protocol):
        def write(self, frame: np.ndarray) -> None: ...
        def close(self) -> None: ...

    class Backend(Protocol):
        def open(self, path: str) -> Any: ...
        def info(self, handle: Any) -> VideoInfo: ...

        def read(self, handle: Any) -> tuple[bool, np.ndarray]: ...
        def grab(self, handle: Any) -> bool: ...
        def seek(self, handle: Any, frame_idx: int) -> None: ...

        def writer(self, path: str, info: VideoInfo, codec: str) -> Writer: ...
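To make the read side of the proposal concrete, here is a minimal, dependency-free sketch of how stride and sub-clip iteration could sit on top of such a backend. Everything in it is an assumption for illustration: Info stands in for VideoInfo, InMemoryBackend treats plain Python objects as frames, iter_frames is a hypothetical helper, and end is taken to be exclusive, which the thread does not specify.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional, Protocol


@dataclass
class Info:
    """Hypothetical stand-in for VideoInfo."""
    fps: int
    total_frames: Optional[int] = None


class ReadBackend(Protocol):
    """Read-side subset of the suggested Backend protocol."""
    def open(self, path: str) -> Any: ...
    def info(self, handle: Any) -> Info: ...
    def read(self, handle: Any) -> tuple[bool, Any]: ...
    def grab(self, handle: Any) -> bool: ...
    def seek(self, handle: Any, frame_idx: int) -> None: ...


class InMemoryBackend:
    """Toy backend: frames are arbitrary objects kept in a list."""

    def __init__(self, frames: list[Any], fps: int = 30) -> None:
        self._frames = frames
        self._fps = fps

    def open(self, path: str) -> dict:
        # The handle only tracks the current read position; path is ignored.
        return {"pos": 0}

    def info(self, handle: dict) -> Info:
        return Info(fps=self._fps, total_frames=len(self._frames))

    def read(self, handle: dict) -> tuple[bool, Any]:
        if handle["pos"] >= len(self._frames):
            return False, None
        frame = self._frames[handle["pos"]]
        handle["pos"] += 1
        return True, frame

    def grab(self, handle: dict) -> bool:
        # Advance one frame without returning it (no "decode" cost).
        if handle["pos"] >= len(self._frames):
            return False
        handle["pos"] += 1
        return True

    def seek(self, handle: dict, frame_idx: int) -> None:
        handle["pos"] = frame_idx


def iter_frames(
    backend: ReadBackend,
    path: str,
    stride: int = 1,
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[Any]:
    """Yield every stride-th frame in [start, end), skipping with grab()."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    handle = backend.open(path)
    backend.seek(handle, start)
    idx = start
    while end is None or idx < end:
        ok, frame = backend.read(handle)
        if not ok:
            break
        yield frame
        # Skip the stride - 1 frames between two yielded frames.
        for _ in range(stride - 1):
            if not backend.grab(handle):
                return
        idx += stride
```

Separating grab from read is what lets a strided iteration skip frames without paying for a full decode on backends where the two differ.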

@Ashp116
Contributor Author

Ashp116 commented Aug 2, 2025

Hi @SkalskiP,

I’ve addressed most of the features you mentioned, but I have some thoughts on a few aspects of the implementation:

  • .save Functionality
    How would you handle .save for a video feed coming from a webcam or an RTSP stream? Currently, I have it where only video files can be saved.

  • Writer and Backend Classes
    This is just my personal opinion, but should these classes be moved to separate scripts/modules? If we add more writers and backends in the future, keeping everything inside the main video script might become cluttered.

  • “Complete manual control with explicit VideoInfo” Functionality

    from supervision import Video, VideoInfo
    
    source = Video("source.mp4")
    target_info = VideoInfo(width=800, height=800, fps=24)
    
    with src.sink("square.mp4", info=target_info) as sink:
        for f in src.frames():
            f = cv2.resize(f, target_info.resolution_wh)
            sink.write(f)

    I’m not fully clear on what this feature is intended to do. In this snippet, the Video instance source is created but never used afterward. Is src supposed to be source? Also, is the goal to create sinks for each backend? Could you please clarify the purpose and expected usage here?

Ashp116 changed the title from "BUG: Added audio stream for process_video" to "FEATURE: Versatile Video class" on Aug 2, 2025
@@ -46,7 +46,8 @@ dependencies = [
     "pillow>=9.4",
     "requests>=2.26.0",
     "tqdm>=4.62.3",
-    "opencv-python>=4.5.5.64"
+    "opencv-python>=4.5.5.64",
+    "imageio-ffmpeg (>=0.6.0,<0.7.0)"
Collaborator

Let's not use imageio-ffmpeg and use PyAV instead.

Collaborator

When adding PyAV as a dependency, make sure to make it optional. We don't want to require everyone to install it.
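One common way to keep a dependency optional is to import it lazily and fail with an actionable message only when the feature is actually used. A minimal sketch, assuming a hypothetical helper require_optional and a hypothetical extra named video (neither is taken from the supervision codebase):

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency, or explain how to install it.

    `extra` is the name of a hypothetical pip extra that would pull it in.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "supervision[{extra}]"'
        ) from exc
```

A PyAV backend could then call av = require_optional("av", "video") inside its constructor, so users of the OpenCV backend never need PyAV installed.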

@@ -141,6 +673,7 @@ def _validate_and_setup_video(
     return video, start, end


+@DeprecationWarning
Collaborator

Let's use deprecated from supervision.utils.internal. It is currently used in the codebase to mark several deprecations.
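Some context on why this matters: DeprecationWarning is an exception class, so using it directly as a decorator replaces the function with a warning instance that can no longer be called. The sketch below shows the failure and a stand-in decorator built on the standard warnings module. deprecated_sketch is a hypothetical illustration, not the helper in supervision.utils.internal, whose exact signature should be checked in the codebase.

```python
import functools
import warnings


def deprecated_sketch(reason: str):
    """Stand-in decorator: warn on each call, then run the function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(reason, category=DeprecationWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Equivalent to: broken = DeprecationWarning(broken).
# The name now refers to an exception instance, not a function.
@DeprecationWarning
def broken():
    return 1


@deprecated_sketch("legacy() is deprecated; use the new API instead.")
def legacy():
    return 1
```

Calling broken() raises a TypeError because exception instances are not callable, while legacy() still returns its value and emits a DeprecationWarning.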

@@ -192,6 +725,7 @@ def get_video_frames_generator(
     video.release()


+@DeprecationWarning
Collaborator

Let's use deprecated from supervision.utils.internal. It is currently used in the codebase to mark several deprecations.

@@ -117,6 +648,7 @@ def __exit__(self, exc_type, exc_value, exc_traceback):
         self.__writer.release()


+@DeprecationWarning
Collaborator

No need to mark private classes / functions as deprecated. We do it only for classes / functions in the public supervision API.

)


@DeprecationWarning
Collaborator

Let's use deprecated from supervision.utils.internal. It is currently used in the codebase to mark several deprecations.

Collaborator

The internals of this deprecated class should now be reimplemented using the new Video API.

@@ -141,6 +673,7 @@ def _validate_and_setup_video(
     return video, start, end


+@DeprecationWarning
 def get_video_frames_generator(
Collaborator

The internals of this deprecated function should now be reimplemented using the new Video API.
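A rough sketch of what such a shim might look like, so that the old entry point keeps working while the logic lives in one place. The Video class here is a hypothetical stub standing in for the proposed sv.Video (it yields frame indices instead of decoded frames), and the deprecation decorator is left out to keep the example focused on delegation.

```python
from typing import Any, Iterator, Optional


class Video:
    """Hypothetical stub for the proposed Video class."""

    def __init__(self, source: str, total_frames: int = 10) -> None:
        self.source = source
        self._total_frames = total_frames  # assumption: known frame count

    def frames(
        self, stride: int = 1, start: int = 0, end: Optional[int] = None
    ) -> Iterator[Any]:
        # A real implementation would decode frames; the stub yields indices.
        stop = self._total_frames if end is None else end
        yield from range(start, stop, stride)


def get_video_frames_generator(
    source_path: str,
    stride: int = 1,
    start: int = 0,
    end: Optional[int] = None,
) -> Iterator[Any]:
    """Legacy entry point kept for compatibility; delegates to Video.frames."""
    return Video(source_path).frames(stride=stride, start=start, end=end)
```

With this shape, behavior changes in Video.frames automatically reach callers of the old function, and the old function can be removed later without touching any decoding logic.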

2 participants