Fast profanity detection and filtering for 13 languages.
- Multi-format Detection: Single words, phrases, and contextual profanity
- Custom Word Lists: Extend built-in lists with your own profanity words
- Whitelisting: Exclude specific words from detection
- Auto Language Detection: From text or subtitle files
- Precise Filtering: Exact position tracking and custom censoring
- Simple Integration: One-line setup with a clean API
Install safetext with pip:
pip install safetext
For development setup, see our scripts documentation.
>>> from safetext import SafeText
>>> st = SafeText(language='en')
>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]
>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
'Some text with ***.'
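The start and end fields appear to be standard Python slice offsets ("Some text with <profanity-word>."[15:31] is exactly '<profanity-word>'), so you can build your own censoring on top of check_profanity() results. The sketch below assumes that offset convention and non-overlapping matches; censor_spans is a hypothetical helper, not part of safetext.

```python
def censor_spans(text, matches, replacement=None, mask_char="*"):
    """Hypothetical helper (not part of safetext): censor the spans in `matches`.

    Assumes each match is a dict shaped like check_profanity() output, with
    slice-style 'start'/'end' offsets, and that spans do not overlap.
    If `replacement` is None, each span is masked with `mask_char` repeated
    to the span's length; otherwise the span is replaced by `replacement`.
    """
    # Work right to left so earlier offsets stay valid even when the
    # replacement changes the length of the text.
    for match in sorted(matches, key=lambda m: m["start"], reverse=True):
        start, end = match["start"], match["end"]
        token = mask_char * (end - start) if replacement is None else replacement
        text = text[:start] + token + text[end:]
    return text


text = "Some text with <profanity-word>."
matches = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]
print(censor_spans(text, matches))                     # 16 asterisks replace the word
print(censor_spans(text, matches, replacement="***"))  # Some text with ***.
```

Processing matches from right to left is what lets a fixed token such as "***" coexist with offsets that were computed on the original text.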
Add your own profanity words by providing a custom words directory:
# Directory structure:
# custom_profanity_words/
# ├── en.txt  # English custom words
# ├── tr.txt  # Turkish custom words
# └── es.txt  # Spanish custom words
>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')
>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
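In both outputs so far, index matches the 1-based position of the matched word when the text is split on whitespace, while start and end are character offsets. This rule is inferred from the two examples rather than documented, so treat it as an assumption; word_position is a hypothetical helper that reproduces the inferred rule.

```python
def word_position(text, start):
    """Hypothetical helper: 1-based position of the whitespace-delimited word at `start`.

    Inferred rule, not documented safetext behavior. Assumes the match begins
    at a word boundary, as it does in both README examples.
    """
    # Count the words that lie entirely before the match, plus one for the match.
    return len(text[:start].split()) + 1


print(word_position("Some text with <profanity-word>.", 15))   # 4
print(word_position("This mycustomword is inappropriate", 5))  # 2
```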
Custom word files should contain one word or phrase per line:
# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
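If you maintain these lists from code (for example, exporting terms from another system), a small helper can write them in the one-entry-per-line layout. This is a sketch assuming files named <language>.txt as in the directory structure above and UTF-8 encoding (an assumption; check how safetext reads the files). write_custom_words is a hypothetical helper, not part of safetext; you would pass the resulting directory as custom_words_dir.

```python
import tempfile
from pathlib import Path


def write_custom_words(base_dir, language, words):
    """Hypothetical helper: write one word or phrase per line to <base_dir>/<language>.txt.

    UTF-8 encoding is an assumption of this sketch.
    """
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    # Trim whitespace, drop blank entries and duplicates, keep first-seen order.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    path = base / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_custom_words(
        Path(tmp) / "custom_profanity_words",
        "en",
        ["mycustomword", "inappropriate phrase", "  company specific term ", "", "mycustomword"],
    )
    print(path.read_text(encoding="utf-8").splitlines())
    # ['mycustomword', 'inappropriate phrase', 'company specific term']
```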
Exclude specific words from profanity detection with a whitelist:
# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])
# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')
# Combining custom words with whitelist
>>> st = SafeText(
... language='en',
... custom_words_dir='custom_profanity_words',
... whitelist=['allowedcustomword']
... )
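The whitelist parameter applies when the SafeText instance is constructed. If you need a different allow list per call without building a new instance, one option is to filter the results afterwards. This is a sketch only, not how safetext applies its whitelist internally: drop_whitelisted is a hypothetical helper, and comparing words case-insensitively with str.casefold() is a choice made here, not documented behavior.

```python
def drop_whitelisted(matches, whitelist):
    """Hypothetical helper: remove check_profanity()-style matches whose word is allowed."""
    # Case-insensitive comparison is this sketch's choice, not documented safetext behavior.
    allowed = {w.strip().casefold() for w in whitelist if w.strip()}
    return [m for m in matches if m["word"].casefold() not in allowed]


# Illustrative input shaped like results for "This mycustomword is allowedcustomword".
matches = [
    {"word": "mycustomword", "index": 2, "start": 5, "end": 17},
    {"word": "allowedcustomword", "index": 4, "start": 21, "end": 38},
]
print(drop_whitelisted(matches, ["AllowedCustomWord"]))
# [{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]
```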
Detect the language automatically:
- From text:
>>> from safetext import SafeText
>>> eng_text = "This story is about to take a dark turn."
>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)
>>> st.language
'en'
- From an .srt (subtitle) file:
>>> from safetext import SafeText
>>> turkish_srt_file_path = "turkish.srt"
>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)
>>> st.language
'tr'
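An .srt file interleaves cue numbers and timing lines with the dialogue, and only the dialogue carries language signal. The sketch below extracts it, for example if you want to pass subtitle text to set_language_from_text() or check_profanity() yourself. srt_to_text is a hypothetical helper, and this is not necessarily how set_language_from_srt() parses files.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,500".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Hypothetical helper (not part of safetext): join an SRT document's dialogue lines.

    Skips blank separator lines, numeric cue counters and timing lines.
    A dialogue line made only of digits is skipped too, a known limitation.
    """
    dialogue = []
    for raw_line in srt_content.splitlines():
        line = raw_line.strip()
        if not line or line.isdigit() or TIMING_LINE.match(line):
            continue
        dialogue.append(line)
    return " ".join(dialogue)


sample = """1
00:00:01,000 --> 00:00:03,500
Merhaba dünya.

2
00:00:04,000 --> 00:00:06,000
Nasılsın?
"""
print(srt_to_text(sample))  # Merhaba dünya. Nasılsın?
```

When reading a real file, opening it with encoding="utf-8-sig" drops a leading byte-order mark if one is present, so the first cue number is still recognized.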
safetext currently supports profanity detection in 13 languages:
Flag | ISO 639-1 Code | Language Name |
---|---|---|
🇸🇦 | ar | Arabic |
🇦🇿 | az | Azerbaijani |
🇩🇪 | de | German |
🇬🇧 | en | English |
🇪🇸 | es | Spanish |
🇮🇷 | fa | Persian (Farsi) |
🇫🇷 | fr | French |
🇮🇳 | hi | Hindi |
🇯🇵 | ja | Japanese |
🇵🇹 | pt | Portuguese |
🇷🇺 | ru | Russian |
🇹🇷 | tr | Turkish |
🇨🇳 | zh | Chinese |
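If language codes come from user input or another detector, checking them against this table before constructing SafeText avoids requesting an unsupported language. This is a sketch: SUPPORTED_LANGUAGES is transcribed from the table above, pick_language is a hypothetical helper, and falling back to English is an arbitrary choice.

```python
# Transcribed from the supported-languages table (ISO 639-1 code -> language name).
SUPPORTED_LANGUAGES = {
    "ar": "Arabic", "az": "Azerbaijani", "de": "German", "en": "English",
    "es": "Spanish", "fa": "Persian (Farsi)", "fr": "French", "hi": "Hindi",
    "ja": "Japanese", "pt": "Portuguese", "ru": "Russian", "tr": "Turkish",
    "zh": "Chinese",
}


def pick_language(code, default="en"):
    """Hypothetical helper: return a supported ISO 639-1 code, else `default`."""
    normalized = (code or "").strip().lower()
    # Keep only the primary subtag of regional tags such as "pt-BR" or "zh_CN".
    primary = normalized.replace("_", "-").split("-", 1)[0]
    return primary if primary in SUPPORTED_LANGUAGES else default


print(pick_language("pt-BR"))  # pt
print(pick_language("it"))     # en (not in the table, so the default is used)
```

The result can then be passed straight through, as in SafeText(language=pick_language(user_code)).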
Join our mission to refine content moderation!
Contribute by:
- Adding new languages: create a folder named with the ISO 639-1 code and include a words.txt file.
- Enhancing word lists: improve detection accuracy.
- Sharing feedback: your ideas can shape safetext.
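A sketch for preparing a language folder or extending an existing list before opening a pull request. The base directory is an assumption (check the contributing guidelines for where language folders live in the repository), and add_words is a hypothetical helper, not a project script. It rejects folder names that do not look like ISO 639-1 codes and appends only entries that are not already listed.

```python
import tempfile
from pathlib import Path


def add_words(base_dir, code, new_words):
    """Hypothetical helper: append new entries to <base_dir>/<code>/words.txt.

    Creates the folder and file if needed, drops blank lines, and returns
    the number of entries actually added.
    """
    # ISO 639-1 codes are two lowercase ASCII letters, e.g. "en" or "tr".
    if not (len(code) == 2 and code.isascii() and code.isalpha() and code.islower()):
        raise ValueError(f"not an ISO 639-1 code: {code!r}")
    folder = Path(base_dir) / code
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / "words.txt"
    existing = path.read_text(encoding="utf-8").splitlines() if path.exists() else []
    kept = [w.strip() for w in existing if w.strip()]
    seen = set(kept)
    added = []
    for word in new_words:
        w = word.strip()
        # Skip blanks and entries already present (exact match after trimming).
        if w and w not in seen:
            seen.add(w)
            added.append(w)
    path.write_text("\n".join(kept + added) + "\n", encoding="utf-8")
    return len(added)


with tempfile.TemporaryDirectory() as repo:
    print(add_words(repo, "it", ["parola1", "parola2"]))  # 2
    print(add_words(repo, "it", ["parola2", "parola3"]))  # 1
```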
See our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.
Meet our awesome contributors who make safetext better every day!