GitHub - viddexa/safetext: Fast profanity word, curse word, swear word, bad word filtering tool for English, Spanish, Chinese, Turkish and more.

🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

Multi-format Detection: Single words, phrases, and contextual profanity
Custom Word Lists: Extend built-in lists with your own profanity words
Whitelisting: Exclude specific words from detection
Auto Language Detection: From text or subtitle files
Precise Filtering: Exact position tracking and custom censoring
Simple Integration: One-line setup with clean API

📦 installation

easily install safetext with pip:

pip install safetext

for development setup, see our scripts documentation.

🎯 quickstart

check and censor profanity

>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."

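The `start` and `end` values in the result above look like character offsets with an exclusive end (15 to 31 spans the 16-character placeholder). Assuming that holds, they can drive a custom censoring style. The sketch below is a standalone illustration that works on a result list in the format shown; `mask_spans` is a hypothetical helper, not part of safetext.

```python
def mask_spans(text, results, mask_char="*"):
    """Replace every reported [start, end) span with mask_char, preserving length.

    Assumes `start`/`end` are character offsets and `end` is exclusive,
    as the quickstart output suggests. Hypothetical helper, not a safetext API.
    """
    chars = list(text)
    for hit in results:
        start, end = hit["start"], hit["end"]
        # Same-length replacement keeps the offsets of later hits valid.
        chars[start:end] = mask_char * (end - start)
    return "".join(chars)


text = "Some text with <profanity-word>."
# Result list copied from the quickstart output above.
results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(text, results)
print(masked)
```

Because the mask has the same length as the original span, the censored text stays aligned with the source, which can matter for subtitles or highlighted UI output.
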
extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:

# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
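
A minimal sketch of generating such a directory from Python, for example in a test fixture. It only relies on the layout described above (one `<language>.txt` file per language, one entry per line); `write_word_list` is a hypothetical helper and the UTF-8 encoding is an assumption.

```python
import tempfile
from pathlib import Path


def write_word_list(directory, language, words):
    """Write one word or phrase per line to <directory>/<language>.txt.

    Hypothetical helper; UTF-8 is assumed as the file encoding.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{language}.txt"
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


# A temporary location is used here so the sketch does not touch the project tree.
base = Path(tempfile.mkdtemp()) / "custom_profanity_words"
en_path = write_word_list(
    base, "en", ["mycustomword", "inappropriate phrase", "company specific term"]
)
print(en_path.read_text(encoding="utf-8"))
```

The resulting directory path can then be passed as `custom_words_dir`, as in the example above.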

using whitelist

exclude specific words from profanity detection:

# Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

# Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

# Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
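
To make the effect of a whitelist concrete, the sketch below filters a result list (in the `check_profanity` format shown earlier) against a set of allowed words. This only illustrates the idea and is not how safetext implements whitelisting; `drop_whitelisted` is a hypothetical helper, case-insensitive matching is an assumption, and the offsets in the sample data are made up.

```python
def drop_whitelisted(results, whitelist):
    """Return only the hits whose word is not in the whitelist.

    Hypothetical helper. Case-insensitive comparison is an assumption,
    not documented safetext behavior.
    """
    allowed = {word.casefold() for word in whitelist}
    return [hit for hit in results if hit["word"].casefold() not in allowed]


# Illustrative hits; the index and offset values are placeholders.
results = [
    {"word": "mycustomword", "index": 2, "start": 5, "end": 17},
    {"word": "allowedcustomword", "index": 5, "start": 30, "end": 47},
]

kept = drop_whitelisted(results, ["AllowedCustomWord"])
print([hit["word"] for hit in kept])
```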

automated language detection

from text:

>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'

from .srt (subtitle) file:

>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
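
For subtitle content that is already in memory rather than on disk, one option is to strip the SRT framing yourself and pass the remaining dialogue to `set_language_from_text`. The sketch below shows that extraction step only. It is an independent illustration, not the logic behind `set_language_from_srt`; `srt_to_text` is a hypothetical helper.

```python
import re

# Matches SRT cue timing lines such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Join the dialogue lines of an SRT document into one string.

    Hypothetical helper. Blank lines, numeric cue counters, and timing lines
    are skipped; a dialogue line made only of digits would also be dropped.
    """
    dialogue_lines = []
    for line in srt_content.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        dialogue_lines.append(line)
    return " ".join(dialogue_lines)


sample = """1
00:00:01,000 --> 00:00:03,000
Bu hikaye karanlık bir hal almak üzere.

2
00:00:03,500 --> 00:00:05,000
Hazır mısın?
"""

dialogue = srt_to_text(sample)
print(dialogue)
```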

🌍 supported languages

safetext currently supports profanity detection in 13 languages:

Flag	ISO 639-1 Code	Language Name
🇸🇦	`ar`	Arabic
🇦🇿	`az`	Azerbaijani
🇩🇪	`de`	German
🇬🇧	`en`	English
🇪🇸	`es`	Spanish
🇮🇷	`fa`	Persian (Farsi)
🇫🇷	`fr`	French
🇮🇳	`hi`	Hindi
🇯🇵	`ja`	Japanese
🇵🇹	`pt`	Portuguese
🇷🇺	`ru`	Russian
🇹🇷	`tr`	Turkish
🇨🇳	`zh`	Chinese
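
When the language code comes from user input or configuration, it can help to validate it against the table above before constructing a `SafeText` instance. A minimal sketch follows; the mapping is copied from the table, while `require_supported` and its error message are hypothetical and not part of safetext.

```python
# Codes and names copied from the supported languages table above.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def require_supported(code):
    """Return the normalized ISO 639-1 code, or raise ValueError if unsupported.

    Hypothetical helper for validating input before creating a SafeText object.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(require_supported(" EN "))
```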

🤝 contribute to safetext

join our mission in refining content moderation!

contribute by:

adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
enhancing word lists: improve detection accuracy.
sharing feedback: your ideas can shape safetext.
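
Before submitting a new or extended word list, a quick cleanup pass can catch blank lines and duplicates. The sketch below is a hypothetical housekeeping helper, not a project requirement or an existing script in the repository; check the contributing guidelines for the actual conventions.

```python
def tidy_word_list(lines):
    """Strip whitespace, drop blank lines, and remove duplicates, keeping order.

    Hypothetical helper for preparing a words.txt contribution.
    """
    seen = set()
    cleaned = []
    for line in lines:
        entry = line.strip()
        if not entry or entry in seen:
            continue
        seen.add(entry)
        cleaned.append(entry)
    return cleaned


raw = ["word one\n", "  word two  \n", "\n", "word one\n"]
tidy = tidy_word_list(raw)
print(tidy)
```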

see our contributing guidelines for development workflow, test documentation for running tests, and scripts guide for automation tools.

🏆 contributors

meet our awesome contributors who make safetext better every day!

follow us for more!

LinkedIn • Hugging Face • X
