Fast profanity word, curse word, swear word, bad word filtering tool for English, Spanish, Chinese, Turkish and more.

viddexa/safetext

🤔 why safetext?

Fast profanity detection and filtering for 13 languages.

  • Multi-format Detection: Single words, phrases, and contextual profanity
  • Custom Word Lists: Extend built-in lists with your own profanity words
  • Whitelisting: Exclude specific words from detection
  • Auto Language Detection: From text or subtitle files
  • Precise Filtering: Exact position tracking and custom censoring
  • Simple Integration: One-line setup with clean API

📦 installation

easily install safetext with pip:

pip install safetext

for development setup, see our scripts documentation.

🎯 quickstart

check and censor profanity

>>> from safetext import SafeText

>>> st = SafeText(language='en')

>>> results = st.check_profanity(text='Some text with <profanity-word>.')
>>> results
[{'word': '<profanity-word>', 'index': 4, 'start': 15, 'end': 31}]

>>> text = st.censor_profanity(text='Some text with <profanity-word>.')
>>> text
"Some text with ***."
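
The position fields returned by `check_profanity` can also drive a custom censoring style. The following is a minimal sketch, assuming `start` and `end` are character offsets with an exclusive `end` (which matches the sample output above). `mask_spans` is a hypothetical helper and is not part of safetext; the `results` list is copied from the quickstart rather than produced by a live call.

```python
def mask_spans(text, results, mask_char="#"):
    """Replace every reported span with mask_char, keeping the text length unchanged."""
    chars = list(text)
    for hit in results:
        # Assumption: 'start' is inclusive and 'end' is exclusive.
        for i in range(hit["start"], hit["end"]):
            chars[i] = mask_char
    return "".join(chars)


text = "Some text with <profanity-word>."
# Result copied from the quickstart output; in practice it would come
# from st.check_profanity(text=text).
results = [{"word": "<profanity-word>", "index": 4, "start": 15, "end": 31}]

masked = mask_spans(text, results)
print(masked)
```

Because the mask keeps the original length, offsets reported for later matches stay valid after earlier ones are masked.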

extending profanity lists with custom words

Add your own profanity words by providing a custom words directory:

# Directory structure:
# custom_profanity_words/
# ├── en.txt              # English custom words
# ├── tr.txt              # Turkish custom words
# └── es.txt              # Spanish custom words

>>> st = SafeText(language='en', custom_words_dir='custom_profanity_words')

>>> # Custom words from en.txt are now included
>>> results = st.check_profanity('This mycustomword is inappropriate')
>>> results
[{'word': 'mycustomword', 'index': 2, 'start': 5, 'end': 17}]

Custom word files should contain one word/phrase per line:

# custom_profanity_words/en.txt
mycustomword
inappropriate phrase
company specific term
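
A custom words directory in this layout can be generated from code. The sketch below is one possible approach: `write_custom_words` is a hypothetical helper, and UTF-8 encoding and the removal of blank or duplicate entries are assumptions, not documented safetext requirements. The demo writes into a temporary directory so it leaves no files behind.

```python
import tempfile
from pathlib import Path


def write_custom_words(directory, language, words):
    """Write one word or phrase per line to <directory>/<language>.txt."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Drop blank entries and duplicates while preserving the original order.
    cleaned = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    path = target_dir / f"{language}.txt"
    path.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    words_path = write_custom_words(
        Path(tmp) / "custom_profanity_words",
        "en",
        ["mycustomword", "inappropriate phrase", "mycustomword", " "],
    )
    written = words_path.read_text(encoding="utf-8")

print(written)
```

The directory passed to the helper is the same path that would then be given to `custom_words_dir`.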

using whitelist

exclude specific words from profanity detection:

>>> # Using a list of words
>>> st = SafeText(language='en', whitelist=['word1', 'word2'])

>>> # Using a file (one word per line)
>>> st = SafeText(language='en', whitelist='path/to/whitelist.txt')

>>> # Combining custom words with whitelist
>>> st = SafeText(
...     language='en',
...     custom_words_dir='custom_profanity_words',
...     whitelist=['allowedcustomword']
... )
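
The examples above pass the whitelist either as a list or as a file path. When both sources exist, one option is to merge them into a single list first. This is a sketch only: `load_whitelist` and `merge_whitelists` are hypothetical helpers, and the file is assumed to be UTF-8 with one word per line.

```python
import tempfile
from pathlib import Path


def load_whitelist(path):
    """Read a one-word-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def merge_whitelists(*sources):
    """Combine several word lists into one, dropping duplicates but keeping order."""
    merged = {}
    for words in sources:
        for word in words:
            merged.setdefault(word, None)
    return list(merged)


with tempfile.TemporaryDirectory() as tmp:
    whitelist_file = Path(tmp) / "whitelist.txt"
    whitelist_file.write_text("word1\nword2\n\n", encoding="utf-8")
    whitelist = merge_whitelists(
        load_whitelist(whitelist_file),
        ["word2", "allowedcustomword"],
    )

print(whitelist)  # ['word1', 'word2', 'allowedcustomword']
```

The merged list can then be passed as `whitelist=whitelist`, in the same way as the list form shown above.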

automated language detection

  • from text:
>>> from safetext import SafeText

>>> eng_text = "This story is about to take a dark turn."

>>> st = SafeText(language=None)
>>> st.set_language_from_text(eng_text)

>>> st.language
'en'
  • from .srt (subtitle) file:
>>> from safetext import SafeText

>>> turkish_srt_file_path = "turkish.srt"

>>> st = SafeText(language=None)
>>> st.set_language_from_srt(turkish_srt_file_path)

>>> st.language
'tr'
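
To run a profanity check on the subtitle text itself, the dialogue lines first need to be separated from the cue numbers and timestamps. The sketch below shows one simple way to do that. `srt_to_text` is a hypothetical helper and does not necessarily reflect how safetext reads `.srt` files internally. Note that a dialogue line consisting only of digits would be dropped along with the cue numbers.

```python
import re

# Matches SRT timing lines such as "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def srt_to_text(srt_content):
    """Return only the dialogue of an SRT document, joined by spaces."""
    kept = []
    for line in srt_content.splitlines():
        stripped = line.strip()
        # Skip blank separators, numeric cue indices, and timing lines.
        if not stripped or stripped.isdigit() or TIMESTAMP.match(stripped):
            continue
        kept.append(stripped)
    return " ".join(kept)


sample = """1
00:00:01,000 --> 00:00:03,000
This story is about

2
00:00:03,500 --> 00:00:05,000
to take a dark turn.
"""

dialogue = srt_to_text(sample)
print(dialogue)  # This story is about to take a dark turn.
```

The resulting string can be passed to `set_language_from_text` or `check_profanity` like any other text.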

๐ŸŒ supported languages

safetext currently supports profanity detection in 13 languages:

Flag | ISO 639-1 Code | Language Name
🇸🇦 | ar | Arabic
🇦🇿 | az | Azerbaijani
🇩🇪 | de | German
🇬🇧 | en | English
🇪🇸 | es | Spanish
🇮🇷 | fa | Persian (Farsi)
🇫🇷 | fr | French
🇮🇳 | hi | Hindi
🇯🇵 | ja | Japanese
🇵🇹 | pt | Portuguese
🇷🇺 | ru | Russian
🇹🇷 | tr | Turkish
🇨🇳 | zh | Chinese
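
When the language code comes from user input or from automatic detection, it may be worth checking it against this list before creating a `SafeText` instance. The sketch below hard-codes the table above. `SUPPORTED_LANGUAGES` and `resolve_language` are hypothetical names, not part of the safetext API, and falling back to English is only one possible policy.

```python
# Codes and names copied from the supported languages table.
SUPPORTED_LANGUAGES = {
    "ar": "Arabic",
    "az": "Azerbaijani",
    "de": "German",
    "en": "English",
    "es": "Spanish",
    "fa": "Persian (Farsi)",
    "fr": "French",
    "hi": "Hindi",
    "ja": "Japanese",
    "pt": "Portuguese",
    "ru": "Russian",
    "tr": "Turkish",
    "zh": "Chinese",
}


def resolve_language(code, fallback="en"):
    """Return the normalized code if it is listed, otherwise the fallback."""
    normalized = (code or "").strip().lower()
    return normalized if normalized in SUPPORTED_LANGUAGES else fallback


print(resolve_language("TR"))  # tr
print(resolve_language("xx"))  # en
```

The returned code can then be passed as the `language` argument, for example `SafeText(language=resolve_language(user_code))`, where `user_code` stands for whatever value the application received.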

๐Ÿค contribute to safetext

join our mission in refining content moderation!

contribute by:

  • adding new languages: create a folder with the ISO 639-1 code and include a words.txt.
  • enhancing word lists: improve detection accuracy.
  • sharing feedback: your ideas can shape safetext.
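
For the first option, the folder and its words.txt can be scaffolded with a few lines of Python. This is a sketch under stated assumptions: `scaffold_language` is a hypothetical helper, the base directory inside the repository is not specified here and must be supplied by the caller, and the one-entry-per-line layout is assumed by analogy with the custom word files described earlier.

```python
import re
import tempfile
from pathlib import Path


def scaffold_language(base_dir, iso_code, words):
    """Create <base_dir>/<iso_code>/words.txt with one entry per line."""
    # ISO 639-1 codes are two lowercase letters.
    if not re.fullmatch(r"[a-z]{2}", iso_code):
        raise ValueError(f"not an ISO 639-1 style code: {iso_code!r}")
    folder = Path(base_dir) / iso_code
    folder.mkdir(parents=True, exist_ok=True)
    words_file = folder / "words.txt"
    # Sorting and de-duplicating is a convenience choice, not a stated requirement.
    entries = sorted({w.strip() for w in words if w.strip()})
    words_file.write_text("\n".join(entries) + "\n", encoding="utf-8")
    return words_file


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_language(tmp, "it", ["parola2", "parola1", "parola2"])
    relative_path = created.relative_to(tmp).as_posix()
    content = created.read_text(encoding="utf-8")

print(relative_path)  # it/words.txt
```

The placeholder entries `parola1` and `parola2` only illustrate the file layout.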

see our contributing guidelines for the development workflow, the test documentation for running tests, and the scripts guide for automation tools.


๐Ÿ† contributors

meet our awesome contributors who make safetext better every day!


follow us for more!

LinkedIn • Hugging Face • X