Few tokens missing after extraction.

I am using goose3 to extract articles from news websites. I have noticed that letters or words that are bolded or otherwise styled in the page go missing after extraction. To reproduce, run the following:

```
from goose3 import Goose

g = Goose()
article = g.extract(url='https://www.economist.com/united-states/2023/10/04/the-sacking-of-kevin-mccarthy-will-make-supporting-ukraine-harder')
print(article.cleaned_text)
```

### The expected output:

Kevin mccarthy’s stint as speaker of America’s House of Representatives ended the way it had begun.........

### Actual output:

K stint as speaker of America’s House of Representatives ended the way it had begun...........

This happens because the text "evin mccarthy’s" is wrapped in a `<small>` tag.


I believe the problem stems from this line: [Link](https://github.com/goose3/goose3/blob/d3c404a79e0e15b7957355083bd5a7590d4103ba/goose3/outputformatters.py#L65C14-L65C40)

If I remove this function call, extraction works fine. I am willing to fix this myself and would like some input from the maintainers first. Should I add a boolean to the configuration, for example `remove_fewwords_paragraphs`? If it is true the function is executed; otherwise it is skipped.
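To make the proposal concrete, here is a rough, self-contained sketch of how such a flag could gate the cleanup step. It does not use goose3 itself: `SketchConfig`, `format_text`, `drop_keep_tail`, and the `min_words` threshold are hypothetical names for illustration, and the plain word count is only a stand-in for whatever heuristic the real function applies.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class SketchConfig:
    # Hypothetical flag named after the proposal; not an existing goose3 option.
    remove_fewwords_paragraphs: bool = True


def drop_keep_tail(parent, child):
    """Remove child from parent but keep the text that follows it."""
    tail = child.tail or ""
    idx = list(parent).index(child)
    if idx == 0:
        parent.text = (parent.text or "") + tail
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + tail
    parent.remove(child)


def format_text(markup, config, min_words=3):
    """Toy formatter: optionally drop elements that contain few words."""
    root = ET.fromstring(markup)
    if config.remove_fewwords_paragraphs:
        for parent in list(root.iter()):
            for child in list(parent):
                # Assumption: a simple word count stands in for the real check.
                if len("".join(child.itertext()).split()) < min_words:
                    drop_keep_tail(parent, child)
    # Collapse whitespace so the result is a single line of text.
    return " ".join("".join(root.itertext()).split())


SAMPLE = "<p>K<small>evin mccarthy’s</small> stint as speaker of America’s House</p>"

with_cleanup = format_text(SAMPLE, SketchConfig(remove_fewwords_paragraphs=True))
without_cleanup = format_text(SAMPLE, SketchConfig(remove_fewwords_paragraphs=False))
print(with_cleanup)
print(without_cleanup)
```

With the flag on, the two-word `<small>` element is dropped and only the leading "K" survives, which mirrors the behaviour reported above; with the flag off, the full name is kept. Keeping the default at true would preserve the current behaviour for existing users.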
