-
Notifications
You must be signed in to change notification settings - Fork 172
Adding InferenceAggregator #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hi , These are some minor changes undoing #113 , making analyzer work for long texts . |
However, the processing time with the current text splitter and analyzer is large for long documents, need to look for ways to optimize |
@akar5h Thanks for the PR. |
@akar5h Until we have inference aggregator, user will never get original output. Can you please let me know what is your thought here. I thought we are doing this -
|
Hi , yes that was the way to go , However since we are working with Pipeline and not directly dealing with tokenizer output and models . We cannot implement the this as it is .
So there are modules in tokenizers called pretokenizer There was no difference model processing time in both this proposed way and the cureent TextSplitter |
However if you follow this link : https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/question_answering.ipynb#scrollTo=n9qywopnIrJH&uniqifier=1 |
Cool thanks Akarsh, I will have a look. |
np , the last part |
@akar5h I had looked into your PR. I feel #113 okay for now. If use do not want truncation they will use TextSplitter anyway. What we need is merge module or way to merge inference after analyzer make inference. My idea is to have multiple configuration of TextSplitter: honour sentence boundary, honour passage boundary, add title to each chunk, etc etc. Which will add value to user's pro processing capabilities. There are few use cases some of Obsei users are asking. Major one is to determine sentiment of customer call/chat conversation. Let me know when you are available next week we can schedule a call and discuss more. |
Hi @lalitpagaria , I am finalizing the Interface Aggregator module, its work in progress , I took the liberty to create a PostProcessing classes just like Preprocessors for InterfaceAggregator, will post a PR by tomorrow on this. Yes, both sentence boundary and paragraph boundaries can be implemented using
Sure , let's lets discuss over call this Saturday or coming Monday/Tuesday after 6 ?
|
@lalitpagaria , Hi , can you have a look at this merge functions . The typecheck is failing because of TextPayload.meta or BasePayload,meta . Kindly suggest changes if any , and some suggestion on how to handle the type check (mypy ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akar5h Thank you very much for this PR.
I liked your design for Aggregator.
I have few superficial comments, we can merge this PR after that.
The sugested changes :
Todos / Feedback required : @lalitpagaria pls comment
|
@akar5h I am doing few changes to your PR. I have some uncommitted changes. I will push them tomorrow. |
@akar5h I have made some refactoring and added them to your PR. @shahrukhx01 Would you like to take a look into changes as well? |
61ee92e
to
b6f383d
Compare
LGTM |
the refactor makes sense. LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Minor Changes to make Analyzer work with TextSplitter for long texts.