rsy56640

update upstream

@KernelPRBot

Hi @rsy56640!

Thanks for your contribution to the Linux kernel!

Linux kernel development happens on mailing lists, rather than on GitHub - this GitHub repository is a read-only mirror that isn't used for accepting contributions. So that your change can become part of Linux, please email it to us as a patch.

Sending patches isn't quite as simple as sending a pull request, but fortunately it is a well documented process.

Here's what to do:

  • Format your contribution according to kernel requirements
  • Decide who to send your contribution to
  • Set up your system to send your contribution as an email
  • Send your contribution and wait for feedback

How do I format my contribution?

The Linux kernel community is notoriously picky about how contributions are formatted and sent. Fortunately, they have documented their expectations.

Firstly, all contributions need to be formatted as patches. A patch is a plain text document showing the change you want to make to the code, and documenting why it is a good idea.

You can create patches with git format-patch.

Secondly, patches need 'commit messages': human-friendly documentation explaining what the change is and why it's necessary.

Thirdly, changes have some technical requirements. There is a Linux kernel coding style, and there are licensing requirements you need to comply with.

Both of these are documented in the Submitting Patches documentation that is part of the kernel.

Note that you will almost certainly have to modify your existing git commits to satisfy these requirements. Don't worry: there are many guides on the internet for doing this.

Who do I send my contribution to?

The Linux kernel is composed of a number of subsystems. These subsystems are maintained by different people, and have different mailing lists where they discuss proposed changes.

If you don't already know what subsystem your change belongs to, the get_maintainer.pl script in the kernel source can help you.

get_maintainer.pl will take the patch or patches you created in the previous step, and tell you who is responsible for them, and what mailing lists are used. You can also take a look at the MAINTAINERS file by hand.

Make sure that your list of recipients includes a mailing list. If you can't find a more specific mailing list, then LKML - the Linux Kernel Mailing List - is the place to send your patches.

It's not usually necessary to subscribe to the mailing list before you send the patches, but if you're interested in kernel development, subscribing to a subsystem mailing list is a good idea. (At this point, you probably don't need to subscribe to LKML - it is a very high traffic list with about a thousand messages per day, which is often not useful for beginners.)

How do I send my contribution?

Use git send-email, which will ensure that your patches are formatted in the standard manner. In order to use git send-email, you'll need to configure git to use your SMTP email server.

For more information about using git send-email, look at the Git documentation or type git help send-email. There are a number of useful guides and tutorials about git send-email that can be found on the internet.

How do I get help if I'm stuck?

Firstly, don't get discouraged! There are an enormous number of resources on the internet, and many kernel developers who would like to see you succeed.

Many issues - especially about how to use certain tools - can be resolved by using your favourite internet search engine.

If you can't find an answer, there are a few places you can turn:

If you get really, really stuck, you could try the owners of this bot, @daxtens and @ajdlinux. Please be aware that we do have full-time jobs, so we are almost certainly the slowest way to get answers!

I sent my patch - now what?

You wait.

You can check that your email has been received by checking the mailing list archives for the mailing list you sent your patch to. Messages may not be received instantly, so be patient. Kernel developers are generally very busy people, so it may take a few weeks before your patch is looked at.

Then, you keep waiting. Three things may happen:

  • You might get a response to your email. Often these will be comments, which may require you to make changes to your patch, or explain why your way is the best way. You should respond to these comments, and you may need to submit another revision of your patch to address the issues raised.
  • Your patch might be merged into the subsystem tree. Code that becomes part of Linux isn't merged into the main repository straight away - it first goes into the subsystem tree, which is managed by the subsystem maintainer. It is then batched up with a number of other changes and sent to Linus for inclusion. (This process is described in some detail in the kernel development process guide.)
  • Your patch might be ignored completely. This happens sometimes - don't take it personally. Here's what to do:
    • Wait a bit more - patches often take several weeks to get a response; more if they were sent at a busy time.
    • Kernel developers often silently ignore patches that break the rules. Check for obvious violations of the Submitting Patches guidelines, the style guidelines, and any other documentation you can find about your subsystem. Check that you're sending your patch to the right place.
    • Try again later. When you resend it, don't add angry commentary, as that will get your patch ignored. It might also get you silently blacklisted.

Further information

Happy hacking!

This message was posted by a bot - if you have any questions or suggestions, please talk to my owners, @ajdlinux and @daxtens, or raise an issue at https://github.com/ajdlinux/KernelPRBot.

rsy56640 closed this May 30, 2018
torvalds pushed a commit that referenced this pull request Nov 14, 2021
To pick the changes in this cset:

  db8268d ("x86/arch_prctl: Add controls for dynamic XSTATE components")

This picks these new prctls:

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  --- /tmp/before	2021-11-13 10:42:52.787308809 -0300
  +++ /tmp/after	2021-11-13 10:43:02.295558837 -0300
  @@ -6,6 +6,9 @@
   	[0x1004 - 0x1001]= "GET_GS",
   	[0x1011 - 0x1001]= "GET_CPUID",
   	[0x1012 - 0x1001]= "SET_CPUID",
  +	[0x1021 - 0x1001]= "GET_XCOMP_SUPP",
  +	[0x1022 - 0x1001]= "GET_XCOMP_PERM",
  +	[0x1023 - 0x1001]= "REQ_XCOMP_PERM",
   };

   #define x86_arch_prctl_codes_2_offset 0x2001
  $

With this, 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#
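
The table in the diff above is a sparse array of strings indexed by the prctl code minus a base offset. As a rough illustration only, the following sketch shows how such a table could be used for lookups in both directions. The array name, the offset macro, and both helper functions are hypothetical names chosen for this example; they are not taken from the perf sources, and only the entries visible in the diff are included.

```c
#include <stddef.h>
#include <string.h>

/* Assumed base offset: the diff shows every index written as "code - 0x1001". */
#define MODEL_CODES_1_OFFSET 0x1001

/* Sparse table in the style of the generated output; gaps are NULL. */
static const char *model_codes_1[] = {
	[0x1004 - 0x1001] = "GET_GS",
	[0x1011 - 0x1001] = "GET_CPUID",
	[0x1012 - 0x1001] = "SET_CPUID",
	[0x1021 - 0x1001] = "GET_XCOMP_SUPP",
	[0x1022 - 0x1001] = "GET_XCOMP_PERM",
	[0x1023 - 0x1001] = "REQ_XCOMP_PERM",
};

#define MODEL_NR_CODES (sizeof(model_codes_1) / sizeof(model_codes_1[0]))

/* Hypothetical helper: number -> string, or NULL if the code is unknown. */
const char *model_code_name(unsigned long code)
{
	if (code < MODEL_CODES_1_OFFSET)
		return NULL;
	code -= MODEL_CODES_1_OFFSET;
	if (code >= MODEL_NR_CODES)
		return NULL;
	return model_codes_1[code];
}

/* Hypothetical helper: string -> number, or 0 if the name is unknown.
 * A reverse lookup like this is what makes names usable in filters. */
unsigned long model_code_from_name(const char *name)
{
	size_t i;

	for (i = 0; i < MODEL_NR_CODES; i++) {
		if (model_codes_1[i] && strcmp(model_codes_1[i], name) == 0)
			return i + MODEL_CODES_1_OFFSET;
	}
	return 0;
}
```

In this model, model_code_name(0x1021) yields "GET_XCOMP_SUPP", while a code that falls in a gap of the table, such as 0x1005, yields NULL.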

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Borislav Petkov <bp@suse.de>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/YY%2FER104k852WOTK@kernel.org/T/#u
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ojeda pushed a commit to ojeda/linux that referenced this pull request Nov 20, 2021
Use generic associated types (GATs) to implement `PointerWrapper`.
ammarfaizi2 pushed a commit to ammarfaizi2/linux-fork that referenced this pull request Jan 19, 2022
To pick the changes in this cset:

  980fe2f ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions")

This picks these new prctls:

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  --- /tmp/before	2022-01-19 14:40:05.049394977 -0300
  +++ /tmp/after	2022-01-19 14:40:35.628154565 -0300
  @@ -9,6 +9,8 @@
   	[0x1021 - 0x1001]= "GET_XCOMP_SUPP",
   	[0x1022 - 0x1001]= "GET_XCOMP_PERM",
   	[0x1023 - 0x1001]= "REQ_XCOMP_PERM",
  +	[0x1024 - 0x1001]= "GET_XCOMP_GUEST_PERM",
  +	[0x1025 - 0x1001]= "REQ_XCOMP_GUEST_PERM",
   };

   #define x86_arch_prctl_codes_2_offset 0x2001
  $

With this, 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#

This addresses these perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Apr 17, 2023
…ed()

The vsp1 driver uses the vb2_is_streaming() function in its .buf_queue()
handler to check if the .start_streaming() operation has been called,
and decide whether to just add the buffer to an internal queue, or also
trigger a hardware run. vb2_is_streaming() relies on the vb2_queue
structure's streaming field, which used to be set only after calling the
.start_streaming() operation.

Commit a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
changed this, setting the .streaming field in vb2_core_streamon() before
enqueuing buffers to the driver and calling .start_streaming(). This
broke the vsp1 driver which now believes that .start_streaming() has
been called when it hasn't, leading to a crash:

[  881.058705] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[  881.067495] Mem abort info:
[  881.070290]   ESR = 0x0000000096000006
[  881.074042]   EC = 0x25: DABT (current EL), IL = 32 bits
[  881.079358]   SET = 0, FnV = 0
[  881.082414]   EA = 0, S1PTW = 0
[  881.085558]   FSC = 0x06: level 2 translation fault
[  881.090439] Data abort info:
[  881.093320]   ISV = 0, ISS = 0x00000006
[  881.097157]   CM = 0, WnR = 0
[  881.100126] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004fa51000
[  881.106573] [0000000000000020] pgd=080000004f36e003, p4d=080000004f36e003, pud=080000004f7ec003, pmd=0000000000000000
[  881.117217] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[  881.123494] Modules linked in: rcar_fdp1 v4l2_mem2mem
[  881.128572] CPU: 0 PID: 1271 Comm: yavta Tainted: G    B              6.2.0-rc1-00023-g6c94e2e99343 #556
[  881.138061] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[  881.145981] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  881.152951] pc : vsp1_dl_list_add_body+0xa8/0xe0
[  881.157580] lr : vsp1_dl_list_add_body+0x34/0xe0
[  881.162206] sp : ffff80000c267710
[  881.165522] x29: ffff80000c267710 x28: ffff000010938ae8 x27: ffff000013a8dd98
[  881.172683] x26: ffff000010938098 x25: ffff000013a8dc00 x24: ffff000010ed6ba8
[  881.179841] x23: ffff00000faa4000 x22: 0000000000000000 x21: 0000000000000020
[  881.186998] x20: ffff00000faa4000 x19: 0000000000000000 x18: 0000000000000000
[  881.194154] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  881.201309] x14: 0000000000000000 x13: 746e696174206c65 x12: ffff70000157043d
[  881.208465] x11: 1ffff0000157043c x10: ffff70000157043c x9 : dfff800000000000
[  881.215622] x8 : ffff80000ab821e7 x7 : 00008ffffea8fbc4 x6 : 0000000000000001
[  881.222779] x5 : ffff80000ab821e0 x4 : ffff70000157043d x3 : 0000000000000020
[  881.229936] x2 : 0000000000000020 x1 : ffff00000e4f6400 x0 : 0000000000000000
[  881.237092] Call trace:
[  881.239542]  vsp1_dl_list_add_body+0xa8/0xe0
[  881.243822]  vsp1_video_pipeline_run+0x270/0x2a0
[  881.248449]  vsp1_video_buffer_queue+0x1c0/0x1d0
[  881.253076]  __enqueue_in_driver+0xbc/0x260
[  881.257269]  vb2_start_streaming+0x48/0x200
[  881.261461]  vb2_core_streamon+0x13c/0x280
[  881.265565]  vb2_streamon+0x3c/0x90
[  881.269064]  vsp1_video_streamon+0x2fc/0x3e0
[  881.273344]  v4l_streamon+0x50/0x70
[  881.276844]  __video_do_ioctl+0x2bc/0x5d0
[  881.280861]  video_usercopy+0x2a8/0xc80
[  881.284704]  video_ioctl2+0x20/0x40
[  881.288201]  v4l2_ioctl+0xa4/0xc0
[  881.291525]  __arm64_sys_ioctl+0xe8/0x110
[  881.295543]  invoke_syscall+0x68/0x190
[  881.299303]  el0_svc_common.constprop.0+0x88/0x170
[  881.304105]  do_el0_svc+0x4c/0xf0
[  881.307430]  el0_svc+0x4c/0xa0
[  881.310494]  el0t_64_sync_handler+0xbc/0x140
[  881.314773]  el0t_64_sync+0x190/0x194
[  881.318450] Code: d50323bf d65f03c0 91008263 f9800071 (885f7c60)
[  881.324551] ---[ end trace 0000000000000000 ]---
[  881.329173] note: yavta[1271] exited with preempt_count 1

A different regression report sent to the linux-media mailing list ([1])
was answered with a claim that the vb2_is_streaming() function has never
been meant for this purpose. The documentation of the function, as well as of
the struct vb2_queue streaming field, is sparse, so this claim may be
hard to verify.

The information needed by the vsp1 driver to decide how to process
queued buffers is also available from the vb2_start_streaming_called()
function. Use it instead of vb2_is_streaming() to fix the problem.
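
The ordering problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the struct, its fields, and all model_* functions are hypothetical stand-ins, not the vb2 or vsp1 code, and the point at which start_streaming_called is set is simplified to "after the pre-queued buffers have been handed to the driver".

```c
#include <stdbool.h>

/* Hypothetical stand-in for the queue state; only the two flags matter. */
struct model_queue {
	bool streaming;              /* set early, before buffers are enqueued */
	bool start_streaming_called; /* set only once streaming has been started */
	int internal_queue_len;      /* buffers parked in the driver's own queue */
	int hw_runs;                 /* hardware runs triggered from buf_queue */
};

/* Model of a .buf_queue() handler. use_fix selects which flag is checked:
 * false mimics the vb2_is_streaming() check, true the replacement check. */
void model_buf_queue(struct model_queue *q, bool use_fix)
{
	bool started = use_fix ? q->start_streaming_called : q->streaming;

	q->internal_queue_len++;
	if (started)
		q->hw_runs++; /* in the real driver this is where the crash hit */
}

/* Model of streamon after the behaviour change described in the commit. */
void model_streamon(struct model_queue *q, int nbufs, bool use_fix)
{
	int i;

	q->streaming = true;              /* now set before enqueuing */
	for (i = 0; i < nbufs; i++)
		model_buf_queue(q, use_fix);
	q->start_streaming_called = true; /* simplified: streaming started */
}
```

With the old check, every pre-queued buffer triggers a premature hardware run during streamon; with the replacement check, runs are triggered only for buffers queued after streaming has actually started.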

[1] https://lore.kernel.org/linux-media/545610e7-3446-2b82-60dc-7385fea3774f@redhat.com/

Fixes: a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request May 8, 2023
In the process of adding lockdep annotation for GPU job_run() path to
catch potential deadlocks against the shrinker/reclaim path, I turned
up this lockdep splat:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #556 Not tainted
   ------------------------------------------------------
   ring0/123 is trying to acquire lock:
   ffffff8087219078 (&devfreq->lock){+.+.}-{3:3}, at: devfreq_monitor_resume+0x3c/0xf0

   but task is already holding lock:
   ffffffd6f64e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (dma_fence_map){++++}-{0:0}:
          __dma_fence_might_wait+0x74/0xc0
          dma_resv_lockdep+0x1f4/0x2f4
          do_one_initcall+0x104/0x2bc
          kernel_init_freeable+0x344/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
          fs_reclaim_acquire+0x80/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc+0xd8/0x100
          topology_parse_cpu_capacity+0x8c/0x178
          get_cpu_for_node+0x88/0xc4
          parse_cluster+0x1b0/0x28c
          parse_cluster+0x8c/0x28c
          init_cpu_topology+0x168/0x188
          smp_prepare_cpus+0x24/0xf8
          kernel_init_freeable+0x18c/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #1 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire+0x3c/0x48
          fs_reclaim_acquire+0x54/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc_node_track_caller+0xb8/0xe0
          kstrdup+0x70/0x90
          kstrdup_const+0x38/0x48
          kvasprintf_const+0x48/0xbc
          kobject_set_name_vargs+0x40/0xb0
          dev_set_name+0x64/0x8c
          devfreq_add_device+0x31c/0x55c
          devm_devfreq_add_device+0x6c/0xb8
          msm_devfreq_init+0xa8/0x16c
          msm_gpu_init+0x38c/0x570
          adreno_gpu_init+0x1b4/0x2b4
          a6xx_gpu_init+0x15c/0x3e4
          adreno_bind+0x218/0x254
          component_bind_all+0x114/0x1ec
          msm_drm_bind+0x2b8/0x608
          try_to_bring_up_aggregate_device+0x88/0x1a4
          __component_add+0xec/0x13c
          component_add+0x1c/0x28
          dsi_dev_attach+0x28/0x34
          dsi_host_attach+0xdc/0x124
          mipi_dsi_attach+0x30/0x44
          devm_mipi_dsi_attach+0x2c/0x70
          ti_sn_bridge_probe+0x298/0x2c4
          auxiliary_bus_probe+0x7c/0x94
          really_probe+0x158/0x290
          __driver_probe_device+0xc8/0xe0
          driver_probe_device+0x44/0x100
          __device_attach_driver+0x64/0xdc
          bus_for_each_drv+0xa0/0xc8
          __device_attach+0xd8/0x168
          device_initial_probe+0x1c/0x28
          bus_probe_device+0x38/0xa0
          deferred_probe_work_func+0xc8/0xe0
          process_one_work+0x2d8/0x478
          process_scheduled_works+0x4c/0x50
          worker_thread+0x218/0x274
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -> #0 (&devfreq->lock){+.+.}-{3:3}:
          __lock_acquire+0xe00/0x1060
          lock_acquire+0x1e0/0x2f8
          __mutex_lock+0xcc/0x3c8
          mutex_lock_nested+0x30/0x44
          devfreq_monitor_resume+0x3c/0xf0
          devfreq_simple_ondemand_handler+0x54/0x7c
          devfreq_resume_device+0xa4/0xe8
          msm_devfreq_resume+0x78/0xa8
          a6xx_pm_resume+0x110/0x234
          adreno_runtime_resume+0x2c/0x38
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x15c/0x174
          rpm_callback+0x78/0x7c
          rpm_resume+0x318/0x524
          __pm_runtime_resume+0x78/0xbc
          pm_runtime_get_sync.isra.0+0x14/0x20
          msm_gpu_submit+0x58/0x178
          msm_job_run+0x78/0x150
          drm_sched_main+0x290/0x370
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &devfreq->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(dma_fence_map);
                                  lock(mmu_notifier_invalidate_range_start);
                                  lock(dma_fence_map);
     lock(&devfreq->lock);

    *** DEADLOCK ***

   2 locks held by ring0/123:
    #0: ffffff8087201170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
    #1: ffffffd6f64e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   stack backtrace:
   CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #556
   Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
   Call trace:
    dump_backtrace.part.0+0xb4/0xf8
    show_stack+0x20/0x38
    dump_stack_lvl+0x9c/0xd0
    dump_stack+0x18/0x34
    print_circular_bug+0x1b4/0x1f0
    check_noncircular+0x78/0xac
    __lock_acquire+0xe00/0x1060
    lock_acquire+0x1e0/0x2f8
    __mutex_lock+0xcc/0x3c8
    mutex_lock_nested+0x30/0x44
    devfreq_monitor_resume+0x3c/0xf0
    devfreq_simple_ondemand_handler+0x54/0x7c
    devfreq_resume_device+0xa4/0xe8
    msm_devfreq_resume+0x78/0xa8
    a6xx_pm_resume+0x110/0x234
    adreno_runtime_resume+0x2c/0x38
    pm_generic_runtime_resume+0x30/0x44
    __rpm_callback+0x15c/0x174
    rpm_callback+0x78/0x7c
    rpm_resume+0x318/0x524
    __pm_runtime_resume+0x78/0xbc
    pm_runtime_get_sync.isra.0+0x14/0x20
    msm_gpu_submit+0x58/0x178
    msm_job_run+0x78/0x150
    drm_sched_main+0x290/0x370
    kthread+0xf0/0x100
    ret_from_fork+0x10/0x20

The issue is that, while doing memory allocations, we cannot hold any
lock that is also needed in the job_run path (and in the case of
devfreq, this means runpm_resume()), because lockdep sees this as a
potential dependency.

Fortunately there is really no reason to hold the devfreq lock when
we are creating the devfreq device, as it is not yet visible to any
other task.  The only reason it was needed was for a lockdep assert
in devfreq_get_freq_range().  Instead, split this up into an internal
function that is used in devfreq_add_device() (where the lock is not
required).
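
The split described above follows a common pattern: an internal helper with no locking requirement, plus a public wrapper that asserts the lock is held. The sketch below models that pattern in single-threaded userspace C. Every name in it is hypothetical, a plain boolean stands in for the mutex, and assert() stands in for the lockdep assertion; it is not the devfreq code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device object; lock_held models a mutex in one thread. */
struct model_devfreq {
	bool lock_held;
	unsigned long min_freq;
	unsigned long max_freq;
};

void model_lock(struct model_devfreq *d)
{
	assert(!d->lock_held);
	d->lock_held = true;
}

void model_unlock(struct model_devfreq *d)
{
	assert(d->lock_held);
	d->lock_held = false;
}

/* Internal helper: does the work and makes no claim about locking. */
static void model_get_freq_range_unlocked(const struct model_devfreq *d,
					  unsigned long *min,
					  unsigned long *max)
{
	*min = d->min_freq;
	*max = d->max_freq;
}

/* Public variant: callers must hold the lock (models the lockdep assert). */
void model_get_freq_range(const struct model_devfreq *d,
			  unsigned long *min, unsigned long *max)
{
	assert(d->lock_held);
	model_get_freq_range_unlocked(d, min, max);
}

/* Creation path: the object is not yet visible to any other task, so it
 * uses the internal helper and never takes the lock. */
void model_add_device(struct model_devfreq *d, unsigned long lo,
		      unsigned long hi, unsigned long *min,
		      unsigned long *max)
{
	d->lock_held = false;
	d->min_freq = lo;
	d->max_freq = hi;
	model_get_freq_range_unlocked(d, min, max);
}
```

The creation path never touches the lock, so no lock is held across anything that might allocate there, while later callers still get the locking check through the public wrapper.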

Signed-off-by: Rob Clark <robdclark@chromium.org>
intersectRaven pushed a commit to intersectRaven/linux that referenced this pull request May 11, 2023
…ed()

[ Upstream commit 52d8cac ]

The vsp1 driver uses the vb2_is_streaming() function in its .buf_queue()
handler to check if the .start_streaming() operation has been called,
and decide whether to just add the buffer to an internal queue, or also
trigger a hardware run. vb2_is_streaming() relies on the vb2_queue
structure's streaming field, which used to be set only after calling the
.start_streaming() operation.

Commit a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
changed this, setting the .streaming field in vb2_core_streamon() before
enqueuing buffers to the driver and calling .start_streaming(). This
broke the vsp1 driver which now believes that .start_streaming() has
been called when it hasn't, leading to a crash:

[  881.058705] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[  881.067495] Mem abort info:
[  881.070290]   ESR = 0x0000000096000006
[  881.074042]   EC = 0x25: DABT (current EL), IL = 32 bits
[  881.079358]   SET = 0, FnV = 0
[  881.082414]   EA = 0, S1PTW = 0
[  881.085558]   FSC = 0x06: level 2 translation fault
[  881.090439] Data abort info:
[  881.093320]   ISV = 0, ISS = 0x00000006
[  881.097157]   CM = 0, WnR = 0
[  881.100126] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004fa51000
[  881.106573] [0000000000000020] pgd=080000004f36e003, p4d=080000004f36e003, pud=080000004f7ec003, pmd=0000000000000000
[  881.117217] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[  881.123494] Modules linked in: rcar_fdp1 v4l2_mem2mem
[  881.128572] CPU: 0 PID: 1271 Comm: yavta Tainted: G    B              6.2.0-rc1-00023-g6c94e2e99343 #556
[  881.138061] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[  881.145981] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  881.152951] pc : vsp1_dl_list_add_body+0xa8/0xe0
[  881.157580] lr : vsp1_dl_list_add_body+0x34/0xe0
[  881.162206] sp : ffff80000c267710
[  881.165522] x29: ffff80000c267710 x28: ffff000010938ae8 x27: ffff000013a8dd98
[  881.172683] x26: ffff000010938098 x25: ffff000013a8dc00 x24: ffff000010ed6ba8
[  881.179841] x23: ffff00000faa4000 x22: 0000000000000000 x21: 0000000000000020
[  881.186998] x20: ffff00000faa4000 x19: 0000000000000000 x18: 0000000000000000
[  881.194154] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  881.201309] x14: 0000000000000000 x13: 746e696174206c65 x12: ffff70000157043d
[  881.208465] x11: 1ffff0000157043c x10: ffff70000157043c x9 : dfff800000000000
[  881.215622] x8 : ffff80000ab821e7 x7 : 00008ffffea8fbc4 x6 : 0000000000000001
[  881.222779] x5 : ffff80000ab821e0 x4 : ffff70000157043d x3 : 0000000000000020
[  881.229936] x2 : 0000000000000020 x1 : ffff00000e4f6400 x0 : 0000000000000000
[  881.237092] Call trace:
[  881.239542]  vsp1_dl_list_add_body+0xa8/0xe0
[  881.243822]  vsp1_video_pipeline_run+0x270/0x2a0
[  881.248449]  vsp1_video_buffer_queue+0x1c0/0x1d0
[  881.253076]  __enqueue_in_driver+0xbc/0x260
[  881.257269]  vb2_start_streaming+0x48/0x200
[  881.261461]  vb2_core_streamon+0x13c/0x280
[  881.265565]  vb2_streamon+0x3c/0x90
[  881.269064]  vsp1_video_streamon+0x2fc/0x3e0
[  881.273344]  v4l_streamon+0x50/0x70
[  881.276844]  __video_do_ioctl+0x2bc/0x5d0
[  881.280861]  video_usercopy+0x2a8/0xc80
[  881.284704]  video_ioctl2+0x20/0x40
[  881.288201]  v4l2_ioctl+0xa4/0xc0
[  881.291525]  __arm64_sys_ioctl+0xe8/0x110
[  881.295543]  invoke_syscall+0x68/0x190
[  881.299303]  el0_svc_common.constprop.0+0x88/0x170
[  881.304105]  do_el0_svc+0x4c/0xf0
[  881.307430]  el0_svc+0x4c/0xa0
[  881.310494]  el0t_64_sync_handler+0xbc/0x140
[  881.314773]  el0t_64_sync+0x190/0x194
[  881.318450] Code: d50323bf d65f03c0 91008263 f9800071 (885f7c60)
[  881.324551] ---[ end trace 0000000000000000 ]---
[  881.329173] note: yavta[1271] exited with preempt_count 1

A different regression report sent to the linux-media mailing list ([1])
was answered with a claim that the vb2_is_streaming() function has never
been meant for this purpose. The documentation of the function, as well as of
the struct vb2_queue streaming field, is sparse, so this claim may be
hard to verify.

The information needed by the vsp1 driver to decide how to process
queued buffers is also available from the vb2_start_streaming_called()
function. Use it instead of vb2_is_streaming() to fix the problem.

[1] https://lore.kernel.org/linux-media/545610e7-3446-2b82-60dc-7385fea3774f@redhat.com/

Fixes: a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
damentz pushed a commit to zen-kernel/zen-kernel that referenced this pull request May 11, 2023
charwliu pushed a commit to charwliu/linux that referenced this pull request May 12, 2023
…ed()

[ Upstream commit 52d8cac ]

The vsp1 driver uses the vb2_is_streaming() function in its .buf_queue()
handler to check if the .start_streaming() operation has been called,
and decide whether to just add the buffer to an internal queue, or also
trigger a hardware run. vb2_is_streaming() relies on the vb2_queue
structure's streaming field, which used to be set only after calling the
.start_streaming() operation.

Commit a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
changed this, setting the .streaming field in vb2_core_streamon() before
enqueuing buffers to the driver and calling .start_streaming(). This
broke the vsp1 driver which now believes that .start_streaming() has
been called when it hasn't, leading to a crash:

[  881.058705] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
[  881.067495] Mem abort info:
[  881.070290]   ESR = 0x0000000096000006
[  881.074042]   EC = 0x25: DABT (current EL), IL = 32 bits
[  881.079358]   SET = 0, FnV = 0
[  881.082414]   EA = 0, S1PTW = 0
[  881.085558]   FSC = 0x06: level 2 translation fault
[  881.090439] Data abort info:
[  881.093320]   ISV = 0, ISS = 0x00000006
[  881.097157]   CM = 0, WnR = 0
[  881.100126] user pgtable: 4k pages, 48-bit VAs, pgdp=000000004fa51000
[  881.106573] [0000000000000020] pgd=080000004f36e003, p4d=080000004f36e003, pud=080000004f7ec003, pmd=0000000000000000
[  881.117217] Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
[  881.123494] Modules linked in: rcar_fdp1 v4l2_mem2mem
[  881.128572] CPU: 0 PID: 1271 Comm: yavta Tainted: G    B              6.2.0-rc1-00023-g6c94e2e99343 #556
[  881.138061] Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT)
[  881.145981] pstate: 400000c5 (nZcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  881.152951] pc : vsp1_dl_list_add_body+0xa8/0xe0
[  881.157580] lr : vsp1_dl_list_add_body+0x34/0xe0
[  881.162206] sp : ffff80000c267710
[  881.165522] x29: ffff80000c267710 x28: ffff000010938ae8 x27: ffff000013a8dd98
[  881.172683] x26: ffff000010938098 x25: ffff000013a8dc00 x24: ffff000010ed6ba8
[  881.179841] x23: ffff00000faa4000 x22: 0000000000000000 x21: 0000000000000020
[  881.186998] x20: ffff00000faa4000 x19: 0000000000000000 x18: 0000000000000000
[  881.194154] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[  881.201309] x14: 0000000000000000 x13: 746e696174206c65 x12: ffff70000157043d
[  881.208465] x11: 1ffff0000157043c x10: ffff70000157043c x9 : dfff800000000000
[  881.215622] x8 : ffff80000ab821e7 x7 : 00008ffffea8fbc4 x6 : 0000000000000001
[  881.222779] x5 : ffff80000ab821e0 x4 : ffff70000157043d x3 : 0000000000000020
[  881.229936] x2 : 0000000000000020 x1 : ffff00000e4f6400 x0 : 0000000000000000
[  881.237092] Call trace:
[  881.239542]  vsp1_dl_list_add_body+0xa8/0xe0
[  881.243822]  vsp1_video_pipeline_run+0x270/0x2a0
[  881.248449]  vsp1_video_buffer_queue+0x1c0/0x1d0
[  881.253076]  __enqueue_in_driver+0xbc/0x260
[  881.257269]  vb2_start_streaming+0x48/0x200
[  881.261461]  vb2_core_streamon+0x13c/0x280
[  881.265565]  vb2_streamon+0x3c/0x90
[  881.269064]  vsp1_video_streamon+0x2fc/0x3e0
[  881.273344]  v4l_streamon+0x50/0x70
[  881.276844]  __video_do_ioctl+0x2bc/0x5d0
[  881.280861]  video_usercopy+0x2a8/0xc80
[  881.284704]  video_ioctl2+0x20/0x40
[  881.288201]  v4l2_ioctl+0xa4/0xc0
[  881.291525]  __arm64_sys_ioctl+0xe8/0x110
[  881.295543]  invoke_syscall+0x68/0x190
[  881.299303]  el0_svc_common.constprop.0+0x88/0x170
[  881.304105]  do_el0_svc+0x4c/0xf0
[  881.307430]  el0_svc+0x4c/0xa0
[  881.310494]  el0t_64_sync_handler+0xbc/0x140
[  881.314773]  el0t_64_sync+0x190/0x194
[  881.318450] Code: d50323bf d65f03c0 91008263 f9800071 (885f7c60)
[  881.324551] ---[ end trace 0000000000000000 ]---
[  881.329173] note: yavta[1271] exited with preempt_count 1

A different regression report sent to the linux-media mailing list ([1])
was answered with a claim that the vb2_is_streaming() function has never
been meant for this purpose. The documentation of the function, as well
as of the struct vb2_queue streaming field, is sparse, so this claim may
be hard to verify.

The information needed by the vsp1 driver to decide how to process
queued buffers is also available from the vb2_start_streaming_called()
function. Use it instead of vb2_is_streaming() to fix the problem.

[1] https://lore.kernel.org/linux-media/545610e7-3446-2b82-60dc-7385fea3774f@redhat.com/
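
The ordering problem described above can be sketched as a small userspace
model. This is not the vsp1 or vb2 code: struct model_queue and the model_*
functions are hypothetical stand-ins, and the point at which
start_streaming_called is set is an assumption based on the description in
this commit message. It only illustrates why a check on the "streaming" flag
fires too early while a check on "start streaming called" does not.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct vb2_queue: only the flags that matter here. */
struct model_queue {
	bool streaming;              /* set early in streamon (post-a10b215 ordering) */
	bool start_streaming_called; /* set when the core reaches .start_streaming() */
	int queued;                  /* buffers parked on the driver's internal list */
	int hw_runs;                 /* simulated hardware runs */
};

/* Model of a driver .buf_queue() handler that uses the later flag. */
static void model_buf_queue(struct model_queue *q)
{
	q->queued++;
	/* Only trigger the hardware once .start_streaming() has been reached. */
	if (q->start_streaming_called)
		q->hw_runs++;
}

/* Model of the streamon ordering described in the commit message. */
static void model_streamon(struct model_queue *q, int pending)
{
	q->streaming = true;               /* flag is set first */
	for (int i = 0; i < pending; i++)  /* then buffers are handed to the driver */
		model_buf_queue(q);
	q->start_streaming_called = true;  /* .start_streaming() is invoked last */
}
```

In this model, buffers handed over during model_streamon() are only queued,
and the first hardware run happens for a buffer queued afterwards. Had the
handler tested q->streaming instead, it would have started the hardware on
the first pre-queued buffer.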

Fixes: a10b215 ("media: vb2: add (un)prepare_streaming queue ops")
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Tested-by: Duy Nguyen <duy.nguyen.rh@renesas.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Signed-off-by: Sasha Levin <sashal@kernel.org>
akiyks pushed a commit to akiyks/linux that referenced this pull request May 18, 2023
To pick the changes in this cset:

  a03c376 ("x86/arch_prctl: Add AMX feature numbers as ABI constants")
  23e5d9e ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
  2f8794b ("x86/mm: Provide arch_prctl() interface for LAM")

This picks these new prctls in a third range, that was also added to the
tools/perf/trace/beauty/arch_prctl.c beautifier.

  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
  $ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
  $ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
  $ diff -u /tmp/before /tmp/after
  @@ -20,3 +20,11 @@
   	[0x2003 - 0x2001]= "MAP_VDSO_64",
   };

  +#define x86_arch_prctl_codes_3_offset 0x4001
  +static const char *x86_arch_prctl_codes_3[] = {
  +	[0x4001 - 0x4001]= "GET_UNTAG_MASK",
  +	[0x4002 - 0x4001]= "ENABLE_TAGGED_ADDR",
  +	[0x4003 - 0x4001]= "GET_MAX_TAG_BITS",
  +	[0x4004 - 0x4001]= "FORCE_TAGGED_SVA",
  +};
  +
  $

With this 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

  # perf trace -e prctl
       0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5)     = 0
       0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580)     = 0
       5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
       5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
      24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
      24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
     670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
     670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
  ^C#
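
As a rough illustration of how an offset-indexed string table like the one
in the diff above can be consulted, here is a minimal sketch. The table
contents are copied from the diff; the lookup helper
arch_prctl_code_3_name() is a hypothetical name introduced for this example
and is not claimed to be the function perf uses.

```c
#include <stddef.h>

#define x86_arch_prctl_codes_3_offset 0x4001
static const char *x86_arch_prctl_codes_3[] = {
	[0x4001 - 0x4001] = "GET_UNTAG_MASK",
	[0x4002 - 0x4001] = "ENABLE_TAGGED_ADDR",
	[0x4003 - 0x4001] = "GET_MAX_TAG_BITS",
	[0x4004 - 0x4001] = "FORCE_TAGGED_SVA",
};

/* Hypothetical helper: map a code in the third range to its name, or NULL. */
static const char *arch_prctl_code_3_name(unsigned long code)
{
	size_t n = sizeof(x86_arch_prctl_codes_3) /
		   sizeof(x86_arch_prctl_codes_3[0]);

	if (code < x86_arch_prctl_codes_3_offset)
		return NULL;                    /* below this range */
	code -= x86_arch_prctl_codes_3_offset;  /* rebase to a table index */
	return code < n ? x86_arch_prctl_codes_3[code] : NULL;
}
```

Subtracting the range offset keeps each table dense, so a separate small
array per range avoids one sparse array spanning 0x1001 through 0x4004.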

This addresses this perf build warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
  diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTjNPpD3FOWfetM@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Aug 3, 2023
In the process of adding lockdep annotation for GPU job_run() path to
catch potential deadlocks against the shrinker/reclaim path, I turned
up this lockdep splat:

   ======================================================
   WARNING: possible circular locking dependency detected
   6.2.0-rc8-debug+ #556 Not tainted
   ------------------------------------------------------
   ring0/123 is trying to acquire lock:
   ffffff8087219078 (&devfreq->lock){+.+.}-{3:3}, at: devfreq_monitor_resume+0x3c/0xf0

   but task is already holding lock:
   ffffffd6f64e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:

   -> #3 (dma_fence_map){++++}-{0:0}:
          __dma_fence_might_wait+0x74/0xc0
          dma_resv_lockdep+0x1f4/0x2f4
          do_one_initcall+0x104/0x2bc
          kernel_init_freeable+0x344/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
          fs_reclaim_acquire+0x80/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc+0xd8/0x100
          topology_parse_cpu_capacity+0x8c/0x178
          get_cpu_for_node+0x88/0xc4
          parse_cluster+0x1b0/0x28c
          parse_cluster+0x8c/0x28c
          init_cpu_topology+0x168/0x188
          smp_prepare_cpus+0x24/0xf8
          kernel_init_freeable+0x18c/0x34c
          kernel_init+0x30/0x134
          ret_from_fork+0x10/0x20

   -> #1 (fs_reclaim){+.+.}-{0:0}:
          __fs_reclaim_acquire+0x3c/0x48
          fs_reclaim_acquire+0x54/0xa8
          slab_pre_alloc_hook.constprop.0+0x40/0x25c
          __kmem_cache_alloc_node+0x60/0x1cc
          __kmalloc_node_track_caller+0xb8/0xe0
          kstrdup+0x70/0x90
          kstrdup_const+0x38/0x48
          kvasprintf_const+0x48/0xbc
          kobject_set_name_vargs+0x40/0xb0
          dev_set_name+0x64/0x8c
          devfreq_add_device+0x31c/0x55c
          devm_devfreq_add_device+0x6c/0xb8
          msm_devfreq_init+0xa8/0x16c
          msm_gpu_init+0x38c/0x570
          adreno_gpu_init+0x1b4/0x2b4
          a6xx_gpu_init+0x15c/0x3e4
          adreno_bind+0x218/0x254
          component_bind_all+0x114/0x1ec
          msm_drm_bind+0x2b8/0x608
          try_to_bring_up_aggregate_device+0x88/0x1a4
          __component_add+0xec/0x13c
          component_add+0x1c/0x28
          dsi_dev_attach+0x28/0x34
          dsi_host_attach+0xdc/0x124
          mipi_dsi_attach+0x30/0x44
          devm_mipi_dsi_attach+0x2c/0x70
          ti_sn_bridge_probe+0x298/0x2c4
          auxiliary_bus_probe+0x7c/0x94
          really_probe+0x158/0x290
          __driver_probe_device+0xc8/0xe0
          driver_probe_device+0x44/0x100
          __device_attach_driver+0x64/0xdc
          bus_for_each_drv+0xa0/0xc8
          __device_attach+0xd8/0x168
          device_initial_probe+0x1c/0x28
          bus_probe_device+0x38/0xa0
          deferred_probe_work_func+0xc8/0xe0
          process_one_work+0x2d8/0x478
          process_scheduled_works+0x4c/0x50
          worker_thread+0x218/0x274
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   -> #0 (&devfreq->lock){+.+.}-{3:3}:
          __lock_acquire+0xe00/0x1060
          lock_acquire+0x1e0/0x2f8
          __mutex_lock+0xcc/0x3c8
          mutex_lock_nested+0x30/0x44
          devfreq_monitor_resume+0x3c/0xf0
          devfreq_simple_ondemand_handler+0x54/0x7c
          devfreq_resume_device+0xa4/0xe8
          msm_devfreq_resume+0x78/0xa8
          a6xx_pm_resume+0x110/0x234
          adreno_runtime_resume+0x2c/0x38
          pm_generic_runtime_resume+0x30/0x44
          __rpm_callback+0x15c/0x174
          rpm_callback+0x78/0x7c
          rpm_resume+0x318/0x524
          __pm_runtime_resume+0x78/0xbc
          pm_runtime_get_sync.isra.0+0x14/0x20
          msm_gpu_submit+0x58/0x178
          msm_job_run+0x78/0x150
          drm_sched_main+0x290/0x370
          kthread+0xf0/0x100
          ret_from_fork+0x10/0x20

   other info that might help us debug this:

   Chain exists of:
     &devfreq->lock --> mmu_notifier_invalidate_range_start --> dma_fence_map

    Possible unsafe locking scenario:

          CPU0                    CPU1
          ----                    ----
     lock(dma_fence_map);
                                  lock(mmu_notifier_invalidate_range_start);
                                  lock(dma_fence_map);
     lock(&devfreq->lock);

    *** DEADLOCK ***

   2 locks held by ring0/123:
    #0: ffffff8087201170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
    #1: ffffffd6f64e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150

   stack backtrace:
   CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #556
   Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
   Call trace:
    dump_backtrace.part.0+0xb4/0xf8
    show_stack+0x20/0x38
    dump_stack_lvl+0x9c/0xd0
    dump_stack+0x18/0x34
    print_circular_bug+0x1b4/0x1f0
    check_noncircular+0x78/0xac
    __lock_acquire+0xe00/0x1060
    lock_acquire+0x1e0/0x2f8
    __mutex_lock+0xcc/0x3c8
    mutex_lock_nested+0x30/0x44
    devfreq_monitor_resume+0x3c/0xf0
    devfreq_simple_ondemand_handler+0x54/0x7c
    devfreq_resume_device+0xa4/0xe8
    msm_devfreq_resume+0x78/0xa8
    a6xx_pm_resume+0x110/0x234
    adreno_runtime_resume+0x2c/0x38
    pm_generic_runtime_resume+0x30/0x44
    __rpm_callback+0x15c/0x174
    rpm_callback+0x78/0x7c
    rpm_resume+0x318/0x524
    __pm_runtime_resume+0x78/0xbc
    pm_runtime_get_sync.isra.0+0x14/0x20
    msm_gpu_submit+0x58/0x178
    msm_job_run+0x78/0x150
    drm_sched_main+0x290/0x370
    kthread+0xf0/0x100
    ret_from_fork+0x10/0x20

The issue is that we cannot hold any lock while doing memory
allocations if that lock is also needed in the job_run() path (and in
the case of devfreq, this means runpm_resume()), because lockdep sees
this as a potential dependency.

Fortunately there is really no reason to hold the devfreq lock when
we are creating the devfreq device, as it is not yet visible to any
other task.  The only reason it was needed was for a lockdep assert
in devfreq_get_freq_range().  Instead, split this up into an internal
function that is used in devfreq_add_device() (where the lock is not
required).
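
The split described above can be sketched in userspace as follows. This is
not the devfreq code: struct model_devfreq, its fields, and both function
names are hypothetical, and a plain flag stands in for what lockdep knows
about devfreq->lock. The sketch only shows the shape of the change, an
assert-free internal helper plus a public wrapper that keeps the assertion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device; lock_held models lockdep's view. */
struct model_devfreq {
	bool lock_held;
	unsigned long hw_min, hw_max;     /* limits of the hardware */
	unsigned long user_min, user_max; /* limits requested by users */
};

/* Internal helper: no lock assertion, usable before the device is visible. */
static void model_get_freq_range_unlocked(const struct model_devfreq *df,
					  unsigned long *min,
					  unsigned long *max)
{
	*min = df->user_min > df->hw_min ? df->user_min : df->hw_min;
	*max = df->user_max < df->hw_max ? df->user_max : df->hw_max;
	if (*min > *max)	/* keep the range well-formed */
		*min = *max;
}

/* Public variant: callers must hold the lock, as before the split. */
static void model_get_freq_range(const struct model_devfreq *df,
				 unsigned long *min, unsigned long *max)
{
	assert(df->lock_held);
	model_get_freq_range_unlocked(df, min, max);
}
```

The device-creation path would call the unlocked helper directly, so it no
longer needs to take the lock around code that allocates memory.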

Signed-off-by: Rob Clark <robdclark@chromium.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Aug 7, 2023
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Aug 22, 2023
ddiss pushed a commit to ddiss/linux that referenced this pull request Feb 14, 2025
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jul 17, 2025
Add JIT support for the load_acquire and store_release instructions. The
implementation is similar to the kernel where:

        load_acquire  => plain load -> lwsync
        store_release => lwsync -> plain store
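
The same ordering can be written portably with C11 atomics, shown here as a
sketch rather than as the JIT's actual output. The function names
model_load_acquire() and model_store_release() are hypothetical. Whether a
given compiler lowers these fences to lwsync on powerpc is an assumption
about the toolchain, not something stated in this commit.

```c
#include <stdatomic.h>
#include <stdint.h>

/* load_acquire: plain (relaxed) load, then an acquire barrier. */
static inline uint64_t model_load_acquire(_Atomic uint64_t *p)
{
	uint64_t v = atomic_load_explicit(p, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return v;
}

/* store_release: release barrier, then a plain (relaxed) store. */
static inline void model_store_release(_Atomic uint64_t *p, uint64_t v)
{
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(p, v, memory_order_relaxed);
}
```

Placing the barrier after the load and before the store is what gives the
pair its acquire and release semantics: later accesses cannot move above
the load, and earlier accesses cannot move below the store.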

To test the correctness of the implementation, the following selftests
were run:

  [fedora@linux-kernel bpf]$ sudo ./test_progs -a \
  verifier_load_acquire,verifier_store_release,atomics
  #11/1    atomics/add:OK
  #11/2    atomics/sub:OK
  #11/3    atomics/and:OK
  #11/4    atomics/or:OK
  #11/5    atomics/xor:OK
  #11/6    atomics/cmpxchg:OK
  #11/7    atomics/xchg:OK
  #11      atomics:OK
  #519/1   verifier_load_acquire/load-acquire, 8-bit:OK
  #519/2   verifier_load_acquire/load-acquire, 8-bit @unpriv:OK
  #519/3   verifier_load_acquire/load-acquire, 16-bit:OK
  #519/4   verifier_load_acquire/load-acquire, 16-bit @unpriv:OK
  #519/5   verifier_load_acquire/load-acquire, 32-bit:OK
  #519/6   verifier_load_acquire/load-acquire, 32-bit @unpriv:OK
  #519/7   verifier_load_acquire/load-acquire, 64-bit:OK
  #519/8   verifier_load_acquire/load-acquire, 64-bit @unpriv:OK
  #519/9   verifier_load_acquire/load-acquire with uninitialized
  src_reg:OK
  #519/10  verifier_load_acquire/load-acquire with uninitialized src_reg
  @unpriv:OK
  #519/11  verifier_load_acquire/load-acquire with non-pointer src_reg:OK
  #519/12  verifier_load_acquire/load-acquire with non-pointer src_reg
  @unpriv:OK
  #519/13  verifier_load_acquire/misaligned load-acquire:OK
  #519/14  verifier_load_acquire/misaligned load-acquire @unpriv:OK
  #519/15  verifier_load_acquire/load-acquire from ctx pointer:OK
  #519/16  verifier_load_acquire/load-acquire from ctx pointer @unpriv:OK
  #519/17  verifier_load_acquire/load-acquire with invalid register R15:OK
  #519/18  verifier_load_acquire/load-acquire with invalid register R15
  @unpriv:OK
  #519/19  verifier_load_acquire/load-acquire from pkt pointer:OK
  #519/20  verifier_load_acquire/load-acquire from flow_keys pointer:OK
  #519/21  verifier_load_acquire/load-acquire from sock pointer:OK
  #519     verifier_load_acquire:OK
  #556/1   verifier_store_release/store-release, 8-bit:OK
  #556/2   verifier_store_release/store-release, 8-bit @unpriv:OK
  #556/3   verifier_store_release/store-release, 16-bit:OK
  #556/4   verifier_store_release/store-release, 16-bit @unpriv:OK
  #556/5   verifier_store_release/store-release, 32-bit:OK
  #556/6   verifier_store_release/store-release, 32-bit @unpriv:OK
  #556/7   verifier_store_release/store-release, 64-bit:OK
  #556/8   verifier_store_release/store-release, 64-bit @unpriv:OK
  #556/9   verifier_store_release/store-release with uninitialized
  src_reg:OK
  #556/10  verifier_store_release/store-release with uninitialized src_reg
  @unpriv:OK
  #556/11  verifier_store_release/store-release with uninitialized
  dst_reg:OK
  #556/12  verifier_store_release/store-release with uninitialized dst_reg
  @unpriv:OK
  #556/13  verifier_store_release/store-release with non-pointer
  dst_reg:OK
  #556/14  verifier_store_release/store-release with non-pointer dst_reg
  @unpriv:OK
  #556/15  verifier_store_release/misaligned store-release:OK
  #556/16  verifier_store_release/misaligned store-release @unpriv:OK
  #556/17  verifier_store_release/store-release to ctx pointer:OK
  #556/18  verifier_store_release/store-release to ctx pointer @unpriv:OK
  #556/19  verifier_store_release/store-release, leak pointer to stack:OK
  #556/20  verifier_store_release/store-release, leak pointer to stack
  @unpriv:OK
  #556/21  verifier_store_release/store-release, leak pointer to map:OK
  #556/22  verifier_store_release/store-release, leak pointer to map
  @unpriv:OK
  #556/23  verifier_store_release/store-release with invalid register
  R15:OK
  #556/24  verifier_store_release/store-release with invalid register R15
  @unpriv:OK
  #556/25  verifier_store_release/store-release to pkt pointer:OK
  #556/26  verifier_store_release/store-release to flow_keys pointer:OK
  #556/27  verifier_store_release/store-release to sock pointer:OK
  #556     verifier_store_release:OK
  Summary: 3/55 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
intel-lab-lkp pushed a commit to intel-lab-lkp/linux that referenced this pull request Jul 17, 2025
Add JIT support for the load_acquire and store_release instructions. The
implementation is similar to the kernel where:

        load_acquire  => plain load -> lwsync
        store_release => lwsync -> plain store

To test the correctness of the implementation, the following selftests
were run:

  [fedora@linux-kernel bpf]$ sudo ./test_progs -a \
  verifier_load_acquire,verifier_store_release,atomics
  #11/1    atomics/add:OK
  #11/2    atomics/sub:OK
  #11/3    atomics/and:OK
  #11/4    atomics/or:OK
  #11/5    atomics/xor:OK
  #11/6    atomics/cmpxchg:OK
  #11/7    atomics/xchg:OK
  #11      atomics:OK
  #519/1   verifier_load_acquire/load-acquire, 8-bit:OK
  #519/2   verifier_load_acquire/load-acquire, 8-bit @unpriv:OK
  #519/3   verifier_load_acquire/load-acquire, 16-bit:OK
  #519/4   verifier_load_acquire/load-acquire, 16-bit @unpriv:OK
  #519/5   verifier_load_acquire/load-acquire, 32-bit:OK
  #519/6   verifier_load_acquire/load-acquire, 32-bit @unpriv:OK
  #519/7   verifier_load_acquire/load-acquire, 64-bit:OK
  #519/8   verifier_load_acquire/load-acquire, 64-bit @unpriv:OK
  #519/9   verifier_load_acquire/load-acquire with uninitialized
  src_reg:OK
  #519/10  verifier_load_acquire/load-acquire with uninitialized src_reg
  @unpriv:OK
  #519/11  verifier_load_acquire/load-acquire with non-pointer src_reg:OK
  #519/12  verifier_load_acquire/load-acquire with non-pointer src_reg
  @unpriv:OK
  #519/13  verifier_load_acquire/misaligned load-acquire:OK
  #519/14  verifier_load_acquire/misaligned load-acquire @unpriv:OK
  #519/15  verifier_load_acquire/load-acquire from ctx pointer:OK
  #519/16  verifier_load_acquire/load-acquire from ctx pointer @unpriv:OK
  #519/17  verifier_load_acquire/load-acquire with invalid register R15:OK
  #519/18  verifier_load_acquire/load-acquire with invalid register R15
  @unpriv:OK
  #519/19  verifier_load_acquire/load-acquire from pkt pointer:OK
  #519/20  verifier_load_acquire/load-acquire from flow_keys pointer:OK
  #519/21  verifier_load_acquire/load-acquire from sock pointer:OK
  #519     verifier_load_acquire:OK
  #556/1   verifier_store_release/store-release, 8-bit:OK
  #556/2   verifier_store_release/store-release, 8-bit @unpriv:OK
  #556/3   verifier_store_release/store-release, 16-bit:OK
  #556/4   verifier_store_release/store-release, 16-bit @unpriv:OK
  #556/5   verifier_store_release/store-release, 32-bit:OK
  #556/6   verifier_store_release/store-release, 32-bit @unpriv:OK
  #556/7   verifier_store_release/store-release, 64-bit:OK
  #556/8   verifier_store_release/store-release, 64-bit @unpriv:OK
  #556/9   verifier_store_release/store-release with uninitialized
  src_reg:OK
  #556/10  verifier_store_release/store-release with uninitialized src_reg
  @unpriv:OK
  #556/11  verifier_store_release/store-release with uninitialized
  dst_reg:OK
  #556/12  verifier_store_release/store-release with uninitialized dst_reg
  @unpriv:OK
  #556/13  verifier_store_release/store-release with non-pointer
  dst_reg:OK
  #556/14  verifier_store_release/store-release with non-pointer dst_reg
  @unpriv:OK
  #556/15  verifier_store_release/misaligned store-release:OK
  #556/16  verifier_store_release/misaligned store-release @unpriv:OK
  #556/17  verifier_store_release/store-release to ctx pointer:OK
  #556/18  verifier_store_release/store-release to ctx pointer @unpriv:OK
  #556/19  verifier_store_release/store-release, leak pointer to stack:OK
  #556/20  verifier_store_release/store-release, leak pointer to stack
  @unpriv:OK
  #556/21  verifier_store_release/store-release, leak pointer to map:OK
  #556/22  verifier_store_release/store-release, leak pointer to map
  @unpriv:OK
  #556/23  verifier_store_release/store-release with invalid register
  R15:OK
  #556/24  verifier_store_release/store-release with invalid register R15
  @unpriv:OK
  #556/25  verifier_store_release/store-release to pkt pointer:OK
  #556/26  verifier_store_release/store-release to flow_keys pointer:OK
  #556/27  verifier_store_release/store-release to sock pointer:OK
  #556     verifier_store_release:OK
  Summary: 3/55 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
maddy-kerneldev pushed a commit to linuxppc/linux that referenced this pull request Jul 28, 2025
Add JIT support for the load_acquire and store_release instructions. The
implementation is similar to the kernel where:

        load_acquire  => plain load -> lwsync
        store_release => lwsync -> plain store

To test the correctness of the implementation, the following selftests
were run:

  [fedora@linux-kernel bpf]$ sudo ./test_progs -a \
  verifier_load_acquire,verifier_store_release,atomics
  #11/1    atomics/add:OK
  #11/2    atomics/sub:OK
  #11/3    atomics/and:OK
  #11/4    atomics/or:OK
  #11/5    atomics/xor:OK
  #11/6    atomics/cmpxchg:OK
  #11/7    atomics/xchg:OK
  #11      atomics:OK
  #519/1   verifier_load_acquire/load-acquire, 8-bit:OK
  #519/2   verifier_load_acquire/load-acquire, 8-bit @unpriv:OK
  #519/3   verifier_load_acquire/load-acquire, 16-bit:OK
  #519/4   verifier_load_acquire/load-acquire, 16-bit @unpriv:OK
  #519/5   verifier_load_acquire/load-acquire, 32-bit:OK
  #519/6   verifier_load_acquire/load-acquire, 32-bit @unpriv:OK
  #519/7   verifier_load_acquire/load-acquire, 64-bit:OK
  #519/8   verifier_load_acquire/load-acquire, 64-bit @unpriv:OK
  #519/9   verifier_load_acquire/load-acquire with uninitialized
  src_reg:OK
  #519/10  verifier_load_acquire/load-acquire with uninitialized src_reg
  @unpriv:OK
  #519/11  verifier_load_acquire/load-acquire with non-pointer src_reg:OK
  #519/12  verifier_load_acquire/load-acquire with non-pointer src_reg
  @unpriv:OK
  #519/13  verifier_load_acquire/misaligned load-acquire:OK
  #519/14  verifier_load_acquire/misaligned load-acquire @unpriv:OK
  #519/15  verifier_load_acquire/load-acquire from ctx pointer:OK
  #519/16  verifier_load_acquire/load-acquire from ctx pointer @unpriv:OK
  #519/17  verifier_load_acquire/load-acquire with invalid register R15:OK
  #519/18  verifier_load_acquire/load-acquire with invalid register R15
  @unpriv:OK
  #519/19  verifier_load_acquire/load-acquire from pkt pointer:OK
  #519/20  verifier_load_acquire/load-acquire from flow_keys pointer:OK
  #519/21  verifier_load_acquire/load-acquire from sock pointer:OK
  #519     verifier_load_acquire:OK
  #556/1   verifier_store_release/store-release, 8-bit:OK
  #556/2   verifier_store_release/store-release, 8-bit @unpriv:OK
  #556/3   verifier_store_release/store-release, 16-bit:OK
  #556/4   verifier_store_release/store-release, 16-bit @unpriv:OK
  #556/5   verifier_store_release/store-release, 32-bit:OK
  #556/6   verifier_store_release/store-release, 32-bit @unpriv:OK
  #556/7   verifier_store_release/store-release, 64-bit:OK
  #556/8   verifier_store_release/store-release, 64-bit @unpriv:OK
  #556/9   verifier_store_release/store-release with uninitialized
  src_reg:OK
  #556/10  verifier_store_release/store-release with uninitialized src_reg
  @unpriv:OK
  #556/11  verifier_store_release/store-release with uninitialized
  dst_reg:OK
  #556/12  verifier_store_release/store-release with uninitialized dst_reg
  @unpriv:OK
  #556/13  verifier_store_release/store-release with non-pointer
  dst_reg:OK
  #556/14  verifier_store_release/store-release with non-pointer dst_reg
  @unpriv:OK
  #556/15  verifier_store_release/misaligned store-release:OK
  #556/16  verifier_store_release/misaligned store-release @unpriv:OK
  #556/17  verifier_store_release/store-release to ctx pointer:OK
  #556/18  verifier_store_release/store-release to ctx pointer @unpriv:OK
  #556/19  verifier_store_release/store-release, leak pointer to stack:OK
  #556/20  verifier_store_release/store-release, leak pointer to stack
  @unpriv:OK
  #556/21  verifier_store_release/store-release, leak pointer to map:OK
  #556/22  verifier_store_release/store-release, leak pointer to map
  @unpriv:OK
  #556/23  verifier_store_release/store-release with invalid register
  R15:OK
  #556/24  verifier_store_release/store-release with invalid register R15
  @unpriv:OK
  #556/25  verifier_store_release/store-release to pkt pointer:OK
  #556/26  verifier_store_release/store-release to flow_keys pointer:OK
  #556/27  verifier_store_release/store-release to sock pointer:OK
  #556     verifier_store_release:OK
  Summary: 3/55 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Reviewed-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Link: https://patch.msgid.link/20250717202935.29018-2-puranjay@kernel.org