Failure to create namespaces; pfn failed to enable + kernel oops #76

@patr-geary-smci

This is all against Intel DCP DIMMs that were provisioned via ipmctl. I've tried mainline ipmctl builds to no avail, and I've also tried the latest Fedora kernel, 4.18.18-200.fc28.x86_64.

[root@redacted ~]# ndctl create-namespace -r region0
{
  "dev":"namespace0.0",
  "mode":"fsdax",
  "map":"dev",
  "size":"248.00 GiB (266.29 GB)",
  "uuid":"f0ba1c10-0cf2-4572-bfec-7f8e5e4098f7",
  "raw_uuid":"f9c4a0a8-df1c-4389-89d4-6d5c8bac80d7",
  "sector_size":512,
  "blockdev":"pmem0",
  "numa_node":0
}

[root@redacted ~]# ndctl create-namespace -r region1
libndctl: ndctl_pfn_enable: pfn1.0: failed to enable
Error: namespace1.0: failed to enable

failed to create namespace: No such device or address

I'm seeing oopses spit out by the kernel (this may not be the exact one; I just grabbed the first one out of messages with a matching pfn):

Nov 16 11:16:53 localhost kernel: nd_pmem pfn1.0: namespace1.0 alignment collision, truncate 67108864 bytes
Nov 16 11:17:12 localhost kernel: pmem1: detected capacity change from 0 to 266285875200
Nov 16 11:17:18 localhost kernel: WARNING: CPU: 55 PID: 2942 at arch/x86/mm/init_64.c:792 add_pages+0x5a/0x60
Nov 16 11:17:18 localhost kernel: Modules linked in: ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6t$
$er acpi_pad xfs libcrc32c ast i2c_algo_bit drm_kms_helper mlx5_core ttm drm i40e crc32c_intel mlxfw devlink
Nov 16 11:17:18 localhost kernel: CPU: 55 PID: 2942 Comm: ndctl Not tainted 4.17.19-200.fc28.x86_64 #1
Nov 16 11:17:18 localhost kernel: Hardware name: Supermicro Super Server/X11DPU, BIOS 3.0 10/20/2018
Nov 16 11:17:18 localhost kernel: RIP: 0010:add_pages+0x5a/0x60
Nov 16 11:17:18 localhost kernel: RSP: 0018:ffffaeb44ed43c80 EFLAGS: 00010282
Nov 16 11:17:18 localhost kernel: RAX: 00000000fffffff4 RBX: 000000000ed78000 RCX: 0000000000000200
Nov 16 11:17:18 localhost kernel: RDX: 0000000000000200 RSI: 00000000000fc1fe RDI: 0000000000000000
Nov 16 11:17:18 localhost kernel: RBP: 0000000003f08000 R08: ffffa0c63c200000 R09: 00000000000001fe
Nov 16 11:17:18 localhost kernel: R10: ffff9ff071301700 R11: ffff9ff071301d60 R12: ffffa086b6e2e0b0
Nov 16 11:17:18 localhost kernel: R13: ffffa086b6e2e0c0 R14: 0000000003f00000 R15: 0000003f08000000
Nov 16 11:17:18 localhost kernel: FS:  00007fca295e3d00(0000) GS:ffff9ff0dbdc0000(0000) knlGS:0000000000000000
Nov 16 11:17:18 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 16 11:17:18 localhost kernel: CR2: 00007f3416c43f44 CR3: 00000017b08c8001 CR4: 00000000007606e0
Nov 16 11:17:18 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov 16 11:17:18 localhost kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov 16 11:17:18 localhost kernel: PKRU: 55555554
Nov 16 11:17:18 localhost kernel: Call Trace:
Nov 16 11:17:18 localhost kernel: devm_memremap_pages+0x2e4/0x440
Nov 16 11:17:18 localhost kernel: pmem_attach_disk+0x1c6/0x5e0 [nd_pmem]
Nov 16 11:17:18 localhost kernel: ? devm_nsio_enable+0xb8/0x100
Nov 16 11:17:18 localhost kernel: nvdimm_bus_probe+0x64/0x120
Nov 16 11:17:18 localhost kernel: driver_probe_device+0x2da/0x450
Nov 16 11:17:18 localhost kernel: bind_store+0xed/0x160
Nov 16 11:17:18 localhost kernel: kernfs_fop_write+0x116/0x190
Nov 16 11:17:18 localhost kernel: __vfs_write+0x36/0x170
Nov 16 11:17:18 localhost kernel: ? selinux_file_permission+0xf0/0x130
Nov 16 11:17:18 localhost kernel: vfs_write+0xa5/0x1a0
Nov 16 11:17:18 localhost kernel: ksys_write+0x4f/0xb0
Nov 16 11:17:18 localhost kernel: do_syscall_64+0x5b/0x160
Nov 16 11:17:18 localhost kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 16 11:17:18 localhost kernel: RIP: 0033:0x7fca288c7ef4
Nov 16 11:17:18 localhost kernel: RSP: 002b:00007ffcaf9f2d08 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
Nov 16 11:17:18 localhost kernel: RAX: ffffffffffffffda RBX: 0000000001ad7870 RCX: 00007fca288c7ef4
Nov 16 11:17:18 localhost kernel: RDX: 0000000000000007 RSI: 0000000001ad7870 RDI: 0000000000000008
Nov 16 11:17:18 localhost kernel: RBP: 0000000000000007 R08: 0000000000000006 R09: 0000000000000005
Nov 16 11:17:18 localhost kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
Nov 16 11:17:18 localhost kernel: R13: 00007fca295e3c28 R14: 0000000000000001 R15: 0000000001ad7870
Nov 16 11:17:18 localhost kernel: Code: 3b 15 0b 5e 99 01 76 20 48 89 15 02 5e 99 01 48 89 15 0b 5e 99 01 48 c1 e2 0c 48 03 15 70 8b 11 01 48 89 15 61 5d 99 01 5b 5d c3 <0f> 0b eb bc 66 90 0f 1f 44 00 00 41 56 45 89 c6 41 55 49 $
Nov 16 11:17:18 localhost kernel: ---[ end trace b470acdc7eea493e ]---
Nov 16 11:17:18 localhost kernel: nd_pmem: probe of pfn3.0 failed with error -12
Nov 16 11:17:19 localhost kernel: nd_pmem pfn0.0: namespace0.0 alignment collision, truncate 67108864 bytes
Nov 16 11:17:19 localhost kernel: ------------[ cut here ]------------
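
For reference, the raw numbers in the log above can be decoded with a little shell arithmetic. This is only a minimal sketch, assuming a POSIX shell with 64-bit arithmetic; the helper names `bytes_to_mib` and `to_s32` are hypothetical, and reading RAX as the return value of the failing call is an assumption, although it does line up with the "failed with error -12" line.

```shell
#!/bin/sh
# Hypothetical helpers for reading the numbers in the kernel log.

# Convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Interpret the low 32 bits of a register value as a signed integer.
to_s32() {
    v=$(( $1 & 0xffffffff ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

bytes_to_mib 67108864        # "alignment collision, truncate" size -> 64
bytes_to_mib 266285875200    # pmem1 capacity -> 253950 (248 GiB minus 2 MiB)
to_s32 0x00000000fffffff4    # RAX at the WARNING -> -12
```

So each namespace is being truncated by 64 MiB, and -12 is -ENOMEM on Linux (ENOMEM is errno 12), which matches the probe failure reported right after the trace.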

This is very similar to issue 39; I will retry with the alignment forced once I swap all the hardware back in. I'm posting this now, even though I'm lacking data, because issue 39 was closed with "We have alignment fixes."

Interestingly, it does not do this for all DIMMs; it seems to depend entirely on how I lay out the physical DIMMs.
This is all Intel Optane DCP. The 2-2-2 configurations I have don't seem to have this issue, but the 2-1-1 symmetric population does. I've seen it happen on either IMC, but it seems to be more frequent on IMC0. Asymmetric population does not appear to have the issue either. Additionally, I've only seen this when the DIMMs are in 0% Memory Mode (all storage); 50% ratios do not have this problem.

I'll bump the issue once I swap all the hardware back in and try with forced alignment.