Non-Attention-Based LLM
A non-attention-based large language model (LLM) architecture adapted for seq2seq tasks such as summarization. It avoids the Transformer's standard QKᵀV self-attention, instead leveraging state-space blocks, multi-resolution convolution, recurrent chunk-level bridging, and an external retriever memory.

Key features:
- State-space-inspired mixing (`S4Block`) for near-linear mixing of tokens.
- Multi-resolution convolution (`MultiResConvBlock`) capturing local context at multiple dilations.
- `RecurrentSupervisor` for preserving hidden state across chunks.
- `SimpleRetrieverMemory` or an advanced ANN-based memory for retrieval augmentation.
- Chunked processing of the input plus a partial shift of the target for autoregressive seq2seq modeling.
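To make the data flow concrete, below is a minimal, self-contained sketch of how a chunked, attention-free forward pass can be wired together. The class and layer choices are illustrative assumptions for exposition, not the actual code in models/proposed_llm.py.

```python
# Illustrative sketch only: class and layer choices are assumptions about how the
# pieces in models/ compose; see models/proposed_llm.py for the real model.
import torch
import torch.nn as nn

class TinyNonAttentionLM(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, chunk_size=16):
        super().__init__()
        self.chunk_size = chunk_size
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Depthwise dilated conv as a stand-in for the S4 / multi-res conv mixing.
        self.mixer = nn.Conv1d(embed_dim, embed_dim, kernel_size=3,
                               padding=4, dilation=2, groups=embed_dim)
        self.bridge = nn.GRUCell(embed_dim, embed_dim)  # RecurrentSupervisor stand-in
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq_len) -> (batch, seq_len, vocab)
        batch, _ = tokens.shape
        state = torch.zeros(batch, self.embed.embedding_dim, device=tokens.device)
        outputs = []
        for start in range(0, tokens.size(1), self.chunk_size):
            chunk = self.embed(tokens[:, start:start + self.chunk_size])    # (B, C, E)
            mixed = self.mixer(chunk.transpose(1, 2))[..., :chunk.size(1)]  # causal trim
            mixed = mixed.transpose(1, 2) + chunk                           # residual
            state = self.bridge(mixed.mean(dim=1), state)  # carry state across chunks
            outputs.append(self.head(mixed + state.unsqueeze(1)))
        return torch.cat(outputs, dim=1)

logits = TinyNonAttentionLM()(torch.randint(0, 1000, (2, 48)))
print(logits.shape)  # torch.Size([2, 48, 1000])
```

The repository's model layers retrieval augmentation and the multi-resolution convolutions on top of a loop like this.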
Repository Structure

```
non_attention_llm/
├─ README.md
├─ requirements.txt
├─ models/
│  ├─ s4_block.py
│  ├─ multi_res_conv_block.py
│  ├─ retriever.py
│  ├─ recurrent_supervisor.py
│  ├─ proposed_llm.py
│  └─ __init__.py
├─ train.py
├─ example_inference.py
└─ LICENSE
```
- `models/s4_block.py`: Implements a simplified `S4Block`.
- `models/multi_res_conv_block.py`: Dilation-based local mixing.
- `models/retriever.py`: Simple vs. advanced retrieval-memory logic.
- `models/recurrent_supervisor.py`: GRU-based global state bridging.
- `models/proposed_llm.py`: The main `ProposedNonAttentionLLM` class tying it all together.
- `train.py`: Demonstration script for training on a toy or real dataset.
- `example_inference.py`: Example of loading a checkpoint and running inference.
- `requirements.txt`: Python dependencies.
Setup

- Clone the repository:

```bash
git clone https://github.com/<YourUsername>/non_attention_llm.git
cd non_attention_llm
```

- Create and activate a Conda environment:

```bash
conda create -n nonattention python=3.9
conda activate nonattention
```

- Install the dependencies:

```bash
pip install -r requirements.txt
```
Training

```bash
python train.py \
  --vocab_size 32000 \
  --embed_dim 256 \
  --hidden_dim 512 \
  --chunk_size 64 \
  --max_memory 100 \
  --s4_kernel_size 16 \
  --conv_kernel_size 3 \
  --conv_dilation "1,2,4" \
  --dropout 0.1 \
  --epochs 5 \
  --batch_size 4
```
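The actual loop lives in train.py; as a rough, hedged sketch of the teacher-forced next-token objective implied by the "partial shift of target" feature above (assuming a model that maps token IDs to logits), a single optimization step might look like:

```python
# Hedged sketch of the next-token objective train.py presumably optimizes; `model`
# is any module mapping (batch, seq) token IDs to (batch, seq, vocab) logits.
import torch.nn.functional as F

def training_step(model, optimizer, batch_tokens, pad_id=0):
    inputs = batch_tokens[:, :-1]   # model sees tokens 0..N-2
    targets = batch_tokens[:, 1:]   # and predicts tokens 1..N-1 (shifted by one)
    logits = model(inputs)          # (batch, seq-1, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,        # skip padding positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```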
Inference

```bash
python train.py --train "False" \
  --checkpoint "yourcheckpoint.pt" \
  --prompt "Summarize:\nThe cat is a domestic species of small carnivorous mammal..."
```
The script will:

1. Load the model and checkpoint.
2. Tokenize the prompt.
3. Generate or compute next-token predictions.
You can customize generation logic (greedy, sampling, partial chunk decoding, etc.) as needed.
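For example, a bare-bones greedy loop could look like the sketch below; `model` and `tokenizer` are assumed to be the checkpointed `ProposedNonAttentionLLM` and whatever tokenizer train.py uses, and the real example_inference.py may differ.

```python
# Minimal greedy-decoding sketch; `model` and `tokenizer` are assumptions about
# the objects example_inference.py builds, not its actual interface.
import torch

@torch.no_grad()
def greedy_generate(model, tokenizer, prompt, max_new_tokens=50):
    ids = torch.tensor([tokenizer.encode(prompt)])               # (1, prompt_len)
    for _ in range(max_new_tokens):
        logits = model(ids)                                      # (1, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most likely token
        ids = torch.cat([ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:             # stop at end-of-sequence
            break
    return tokenizer.decode(ids[0].tolist())
```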
Model Components

1. `S4Block`: Depthwise 1D convolution with a gating mechanism, inspired by structured state-space models.
2. `MultiResConvBlock`: Multiple parallel 1D convolutions with different dilation factors, combined with a residual connection.
3. `RecurrentSupervisor`: Maintains a global hidden state across chunk boundaries using a `GRUCell` or `LSTMCell`.
4. `SimpleRetrieverMemory`: Basic key-value store for chunk embeddings. For larger scale, consider FAISS or Annoy.
5. `ProposedNonAttentionLLM`: The top-level seq2seq-style model that processes the input in chunks, retrieves augmentations, and outputs token logits.
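For intuition, simplified stand-ins for the first two components might look like the following; the layer choices and parameter names are assumptions rather than the repository's exact implementation.

```python
# Simplified stand-ins for S4Block and MultiResConvBlock; these are assumptions
# for illustration, not the code in models/.
import torch
import torch.nn as nn

class GatedDepthwiseBlock(nn.Module):  # S4Block-style mixing
    def __init__(self, dim, kernel_size=16):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size - 1, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        seq_len = x.size(1)
        y = self.conv(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)  # causal trim
        return x + y * torch.sigmoid(self.gate(x))                       # gated residual

class MultiDilationBlock(nn.Module):   # MultiResConvBlock-style mixing
    def __init__(self, dim, kernel_size=3, dilations=(1, 2, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size,
                      padding=(kernel_size - 1) * d, dilation=d, groups=dim)
            for d in dilations
        )

    def forward(self, x):  # x: (batch, seq, dim)
        seq_len = x.size(1)
        out = x
        for conv in self.convs:  # sum of parallel dilated branches plus residual
            out = out + conv(x.transpose(1, 2))[..., :seq_len].transpose(1, 2)
        return out

x = torch.randn(2, 32, 64)
print(GatedDepthwiseBlock(64)(x).shape, MultiDilationBlock(64)(x).shape)
```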
Advanced Features

- Replace `SimpleRetrieverMemory` with FAISS or any advanced ANN library if you have large-scale retrieval needs.
- Configure kernel sizes, dilation factors, or S4 parameters.
- Chunk size can be adapted for memory constraints or longer contexts.
- Attention-free design for near-linear time with respect to sequence length.
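If you do swap in FAISS, a minimal wrapper along the lines below can stand in for the simple key-value store; the class name and methods are illustrative and not the interface defined in models/retriever.py.

```python
# Hedged sketch of a FAISS-backed chunk memory; illustrative only, not the
# retriever.py interface.
import faiss                 # pip install faiss-cpu
import numpy as np

class FaissRetrieverMemory:
    def __init__(self, dim):
        self.index = faiss.IndexFlatL2(dim)  # exact L2 search
        self.values = []                     # payloads aligned with index rows

    def add(self, embedding, payload):
        vec = np.asarray(embedding, dtype="float32").reshape(1, -1)
        self.index.add(vec)
        self.values.append(payload)

    def search(self, query, k=3):
        vec = np.asarray(query, dtype="float32").reshape(1, -1)
        distances, ids = self.index.search(vec, min(k, self.index.ntotal))
        return [(self.values[i], float(d))
                for i, d in zip(ids[0], distances[0]) if i != -1]

memory = FaissRetrieverMemory(dim=4)
memory.add([0.1, 0.2, 0.3, 0.4], "chunk-0 payload")
print(memory.search([0.1, 0.2, 0.3, 0.4], k=1))
```

`IndexFlatL2` performs exact search; for very large memories you would typically switch to an IVF or HNSW index.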
Requirements
See requirements.txt for typical packages:
```
torch>=2.0
numpy
tqdm
transformers
datasets
```
Explanation:
- README.md is thorough, covering:
  - Intro / features
  - Repo structure
  - Setup and usage
  - Summaries of each module
  - Potential advanced usage
- You can adjust the text (e.g. referencing your actual GitHub URL or your contact info).
For requirements.txt, the minimal list shown above should suffice; add more packages if your code uses them (faiss, annoy, etc.).
For train.py and example_inference.py, you can provide minimal scripts that parse command-line arguments, instantiate `ProposedNonAttentionLLM`, and demonstrate training or inference. The README references them, so users can see how to run the code.
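A skeleton for such a script might look like this, under the assumption that `ProposedNonAttentionLLM` accepts constructor arguments mirroring the CLI flags shown earlier:

```python
# Hedged skeleton for a train.py-style entry point; the ProposedNonAttentionLLM
# constructor signature shown here is an assumption mirrored from the CLI flags.
import argparse
from models.proposed_llm import ProposedNonAttentionLLM

def parse_args():
    parser = argparse.ArgumentParser(description="Train the non-attention LLM")
    parser.add_argument("--vocab_size", type=int, default=32000)
    parser.add_argument("--embed_dim", type=int, default=256)
    parser.add_argument("--hidden_dim", type=int, default=512)
    parser.add_argument("--chunk_size", type=int, default=64)
    parser.add_argument("--conv_dilation", type=str, default="1,2,4")
    parser.add_argument("--epochs", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=4)
    return parser.parse_args()

def main():
    args = parse_args()
    dilations = [int(d) for d in args.conv_dilation.split(",")]  # "1,2,4" -> [1, 2, 4]
    model = ProposedNonAttentionLLM(
        vocab_size=args.vocab_size,
        embed_dim=args.embed_dim,
        hidden_dim=args.hidden_dim,
        chunk_size=args.chunk_size,
        conv_dilation=dilations,
    )
    print(model)  # the training or inference loop would go here

if __name__ == "__main__":
    main()
```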
With these files in place, commit and push to your GitHub repo. Your non-attention-based LLM is now publicly available with a detailed README for others to install and experiment with!