wannaphong (Member) commented Jan 8, 2025

Fixes #663

Add a new function `display_cell_tokenize` to split Thai text into display cells without splitting tone marks.

* **New Functionality**
  - Add `display_cell_tokenize` function in `pythainlp/tokenize/core.py` to handle the splitting of Thai text into display cells.
  - Ensure the function does not split tone marks.
* **Initialization**
  - Update `pythainlp/tokenize/__init__.py` to include the new `display_cell_tokenize` function in the `__all__` list.
* **Testing**
  - Add tests for the `display_cell_tokenize` function in `tests/core/test_tokenize.py`.
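
To illustrate what "display cell" splitting means here, the following is a minimal sketch, not the code merged in this PR. It assumes a display cell is one base character plus any Thai above/below vowels and tone marks that follow it, and it detects those marks through their Unicode category (`Mn`/`Me`). The name `split_display_cells` is hypothetical; the actual `display_cell_tokenize` in `pythainlp/tokenize/core.py` may treat some characters (for example Sara Am, `ำ`) differently.

```python
import unicodedata
from typing import List


def split_display_cells(text: str) -> List[str]:
    """Hypothetical helper: group each base character with the
    combining marks (upper/lower vowels, tone marks) that follow it."""
    cells: List[str] = []
    for ch in text:
        # Nonspacing (Mn) and enclosing (Me) marks are drawn on the
        # previous character, so they stay in the same cell.
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if is_mark and cells:
            cells[-1] += ch
        else:
            # A base character (or a stray leading mark) opens a new cell.
            cells.append(ch)
    return cells


if __name__ == "__main__":
    # The tone mark in "ที่" stays attached to its consonant and vowel.
    print(split_display_cells("ที่"))
    print(split_display_cells("สวัสดี"))
```

In this sketch, `ที่` comes back as a single cell because both the vowel `ี` and the tone mark `่` are nonspacing marks, which is the "do not split tone marks" behaviour the description asks for.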

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/PyThaiNLP/pythainlp/issues/663?shareId=XXXX-XXXX-XXXX-XXXX).

pep8speaks commented Jan 8, 2025

Hello @wannaphong! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-08 13:42:07 UTC


sonarqubecloud bot commented Jan 8, 2025


coveralls commented Jan 8, 2025

Coverage Status

coverage: 52.898% (+0.1%) from 52.753%
when pulling 9c86f85 on wannaphong/add-display-cell-tokenizer
into 7332984 on dev.

@bact added the enhancement (enhance functionalities) label Jan 10, 2025
@wannaphong merged commit ef0e01d into dev Jan 13, 2025
24 of 25 checks passed
@wannaphong deleted the wannaphong/add-display-cell-tokenizer branch February 10, 2025 04:30
Labels
enhancement enhance functionalities
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Thai character splitter to display cell
4 participants