Skip to content

Conversation

chilli13
Copy link
Contributor

The delay in ebpf events processing in user mode may cause tuple not found, new AddConn with same pid will overwrite the tuple which will be deleted by DestroyConn. Add sock consistency check before tuple cleaning.

The delay in ebpf events processing in user mode may cause tuple
not found, new AddConn with same pid will overwrite the tuple
which will be deleted by DestroyConn. Add sock consistency check
before tuple cleaning.
@chilli13
Copy link
Contributor Author

@Asphaltt please help to review, thks
我还尝试过添加 sync.Cond的方式,addCon时发现fd已存在tuple就cond.Wait(), destroyCon时cond.Signal() 通知可以继续addConn,但编码并不简洁,并且因为destroy当前已经添加了sleep 3s,可能对性能也不太友好。destroyCon时发现sock不一致直接不删除tuple更简单

Copy link
Member

@cfc4n cfc4n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

感谢您的贡献。

#744 (comment) 所说,这话问题是产生在FD的短期快速复用,当前PR是解决了错误删除的问题,但并没有解决根因。

你觉得是吗?

对于五元组无法准确关联的问题,我认为最好的解决方案是改从pcapng模式中,还原解析网络包,再读取五元组。

#744 (comment)

The text mode has many flaws. In the future, I am considering enhancing the functionality of the pcapng mode to parse network packets in real time, restore them based on the key, and then convert them into text format.

你有兴趣实现这个方案吗?

@chilli13
Copy link
Contributor Author

chilli13 commented Mar 21, 2025

根据测试,避免错误删除tuple应该可以大概率解决 tuple为0问题,我们的https server运行了12h 没有再出现这个问题,修改前这个问题在http2 模式下非常频繁。但是如果用户态按照add1-add2-data1-destory1的时间线处理,data1会获取到add2的错误tuple,按照分析虽然影响范围有限但依然会影响数据准确性。当前我们的业务需要text模式,我们会先使用这个修改。
请问您提到的 “从pcapng模式中,还原解析网络包,再读取五元组” 方案是结合keylog从pcapng中解析出tuple+http data,再以text 当前的输出格式显示数据吗?还是有其他的实现效果? 从pcap文件还原数据,是否支持分片重组? 或者按照字节流方式交给调用者去重组

@cfc4n
Copy link
Member

cfc4n commented Mar 21, 2025

如果这个PR仅是为了防止错误删除conn的问题,我觉得是可以合并的。

请问您提到的 “从pcapng模式中,还原解析网络包,再读取五元组” 方案是结合keylog从pcapng中解析出tuple+http data,再以text 当前的输出格式显示数据吗?

是的,期待您的贡献。

Copy link
Member

@cfc4n cfc4n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. However, it is limited to fixing the issue of mistakenly deleting the sock.

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Mar 21, 2025
@cfc4n cfc4n merged commit c240984 into gojue:master Mar 21, 2025
7 checks passed
@chilli13 chilli13 deleted the bf/tuple-unreachable branch March 27, 2025 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix bug fix PR lgtm This PR has been approved by a maintainer
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants