This is a microservice API of Content Lense, a project that aims to help publishers easily gain insights into their content. This API calculates the complexity, reading time, and other basic statistics of a given article.
Please note that this repository is part of the Content Lense Project and depends on the Content Lense API.
Build the Docker image by running:
docker build -f Docker/Dockerfile -t content-lense-text-complexity:latest .
Start the container by running:
docker run -it --rm -p 5001:5001 content-lense-text-complexity
To analyse an article, send a POST request to the /articles endpoint with Content-Type: application/json and the following structure:
{
  "heading": "The Headline of the Article",
  "summary": "A short summary / abstract of the article",
  "body": "The entire fulltext"
}
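As an illustration, the request described above could be sent from Python roughly as follows. This is a minimal sketch using only the standard library. It assumes the container is reachable at http://localhost:5001 (the port mapped in the docker run command); the names BASE_URL, build_article_request and analyse_article are hypothetical helpers, not part of this repository.

```python
import json
import urllib.request

# Assumption: the container started above is reachable on localhost:5001.
BASE_URL = "http://localhost:5001"


def build_article_request(heading, summary, body, base_url=BASE_URL):
    """Build a POST request for the /articles endpoint (hypothetical helper)."""
    payload = {"heading": heading, "summary": summary, "body": body}
    return urllib.request.Request(
        url=f"{base_url}/articles",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyse_article(heading, summary, body, base_url=BASE_URL):
    """Send the article to the service and return the decoded JSON response."""
    request = build_article_request(heading, summary, body, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the container running, calling analyse_article("The Headline of the Article", "A short summary", "The entire fulltext") should return the parsed result as a Python dictionary.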
The response looks like the following:
{
  "body": {
    "descriptives": {
      "averageWordsPerSentence": 8.2,
      "meanCharsPerWord": 4.439024390243903,
      "meanWordsPerSentence": 6.833333333333333,
      "medianCharsPerWord": 4,
      "medianWordsPerSentence": 5,
      "totalChars": 190,
      "totalLetters": 182,
      "totalSentences": 5,
      "totalSyllables": 52,
      "totalUniqueWords": 37,
      "totalWords": 41,
      "totalWordsLongerThanThreeSyllables": 3,
      "totalSingleSyllableWords": 33
    },
    "scores": {
      "readingTimeInMinutes": 2.79,
      "wienerSachtextIndex": 1.2 // see https://de.wikipedia.org/wiki/Lesbarkeitsindex
    }
  },
  "heading": {/*... same result keys as for body ... */},
  "summary": {/*... same result keys as for body ... */}
}
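Once the response is parsed into a dictionary, the scores of each analysed part can be pulled out with a few lines. The following is only a sketch: collect_scores is a hypothetical helper, and the example dictionary is a shortened excerpt of the sample response above.

```python
def collect_scores(result):
    """Map each analysed part of a parsed response to its "scores" object."""
    parts = ("heading", "summary", "body")
    # Parts missing from the response are skipped instead of raising KeyError.
    return {part: result[part]["scores"] for part in parts if part in result}


# Shortened excerpt of the sample response (body only).
example = {
    "body": {
        "descriptives": {"totalWords": 41, "totalSentences": 5},
        "scores": {"readingTimeInMinutes": 2.79, "wienerSachtextIndex": 1.2},
    }
}

scores = collect_scores(example)
print(scores["body"]["wienerSachtextIndex"])
```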
We assume a reading speed of 200 words per minute to calculate the estimated reading time.
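The reading-time assumption amounts to dividing the word count by 200. A minimal sketch of that arithmetic is shown below; the function name is hypothetical, and the service itself may count words or round the result differently.

```python
# Reading speed stated in this README.
WORDS_PER_MINUTE = 200


def estimate_reading_time_minutes(total_words, words_per_minute=WORDS_PER_MINUTE):
    """Estimate the reading time in minutes for a given word count (hypothetical helper)."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return total_words / words_per_minute
```

For example, a 400-word article would take an estimated 2 minutes to read at this speed.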
wienerSachtextIndex (https://de.wikipedia.org/wiki/Lesbarkeitsindex)
Used libraries:
- TextStat (https://github.com/textstat/textstat)
- TextDescriptives (https://hlasse.github.io/TextDescriptives)
- spaCy models (https://spacy.io/usage/models)
Media Tech Lab media-tech-lab
Cloud Creators GmbH cloud-creators