RCJ 0.00022B LSTM

A Conditional Character-Level Text Generator

This model is a Conditional Character-Level LSTM (Long Short-Term Memory) network designed to generate text in the style of one of several predefined personas. It acts as a component within a larger interactive chat system, where a separate text classification model first identifies the persona, and then this LSTM generates a response conditioned on that identified persona.

The primary purpose of this model is to provide varied and persona-specific utterances for "RCJ," the chatbot persona in the system.

Model Architecture

  • Type: Character-Level Recurrent Neural Network (RNN)
  • Core: LSTM
  • Conditioning: The model takes a one-hot vector encoding the identified persona (person_id) as a condition. This condition vector is concatenated with the character embedding at each time step before being fed into the LSTM layers (a sketch follows the hyperparameter list below).
  • Input: Sequences of character indices (including special SOS, EOS, PAD tokens).
  • Output: Logits over the vocabulary of characters for the next character prediction.
  • Key Hyperparameters (examples; these must match the trained checkpoint):
    • VOCAB_SIZE: size of the character vocabulary (e.g., ~100-120)
    • EMBEDDING_DIM: e.g., 64 or 128
    • HIDDEN_DIM: e.g., 128 or 256
    • NUM_PERSONS (condition dimension): 4 (one per trained persona)
    • NUM_LAYERS: e.g., 1 or 2
    • DROPOUT: e.g., 0.1-0.3
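For concreteness, here is a minimal PyTorch sketch of the architecture described above. The class name, argument defaults, and tensor shapes are illustrative and would need to match the actual trained checkpoint:

```python
import torch
import torch.nn as nn

class ConditionalCharLSTM(nn.Module):
    """Character-level LSTM conditioned on a one-hot persona vector (illustrative)."""

    def __init__(self, vocab_size=120, embedding_dim=128, hidden_dim=256,
                 num_persons=4, num_layers=2, dropout=0.3, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=pad_idx)
        # The persona one-hot vector is appended to the character embedding
        # at every time step, so the LSTM input width is embedding_dim + num_persons.
        self.lstm = nn.LSTM(embedding_dim + num_persons, hidden_dim,
                            num_layers=num_layers, batch_first=True,
                            dropout=dropout if num_layers > 1 else 0.0)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, char_ids, condition, hidden=None):
        # char_ids:  (batch, seq_len) character indices
        # condition: (batch, num_persons) one-hot persona vector
        emb = self.embedding(char_ids)                              # (B, T, E)
        cond = condition.unsqueeze(1).expand(-1, emb.size(1), -1)   # (B, T, P)
        out, hidden = self.lstm(torch.cat([emb, cond], dim=-1), hidden)
        logits = self.fc(out)                                       # (B, T, V)
        return logits, hidden
```

Concatenating the condition at every time step (rather than only initializing the hidden state with it) keeps the persona signal available throughout generation.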

How it Works (in the Chat System)

  1. A user provides text input to the chat system.
  2. A Text Classification Model (qingy2024/personify-67m) processes the user's input to predict a person_id (0, 1, 2, or 3).
  3. This person_id is converted into a one-hot condition vector.
  4. The RCJ-Persona-LSTM (this model) takes the <SOS> (Start of Sentence) token and the condition vector as initial input.
  5. It then autoregressively generates a sequence of characters, one at a time, feeding the previously generated character and the same condition vector back in at each step, until an <EOS> (End of Sentence) token is generated or a maximum length is reached (see the sketch after this list).
  6. The generated character sequence forms RCJ's response.
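A minimal sketch of this generation loop, assuming the ConditionalCharLSTM sketch from the architecture section and hypothetical char2idx/idx2char lookup tables:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def generate(model, person_id, char2idx, idx2char,
             num_persons=4, max_len=200, temperature=0.8):
    """Autoregressively sample one response for the given persona."""
    condition = F.one_hot(torch.tensor([person_id]), num_persons).float()
    char_ids = torch.tensor([[char2idx["<SOS>"]]])   # seed with <SOS>
    hidden, chars = None, []
    for _ in range(max_len):
        logits, hidden = model(char_ids, condition, hidden)
        probs = F.softmax(logits[:, -1] / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample next char
        if next_id.item() == char2idx["<EOS>"]:
            break
        chars.append(idx2char[next_id.item()])
        char_ids = next_id  # feed the sampled character back in
    return "".join(chars)
```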

How to Use (via the rcj_inference.py script)

This model is primarily intended to be used within the provided rcj_inference.py script. Here's how it's used:

```bash
python rcj_inference.py --classifier_model_path /path/to/personify-67m/ --lstm_model_path rcj_lstm.pth
```
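If you need to load the checkpoint outside that script, a hedged sketch (this assumes rcj_lstm.pth stores a plain state_dict for the ConditionalCharLSTM sketch above; the actual checkpoint format is defined by rcj_inference.py):

```python
import torch

# Hypothetical direct-loading sketch; the constructor arguments must match
# the hyperparameters the checkpoint was trained with.
model = ConditionalCharLSTM()  # from the architecture sketch above
model.load_state_dict(torch.load("rcj_lstm.pth", map_location="cpu"))
model.eval()
```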

Training Data πŸ“š

This model was trained on a custom dataset consisting of pairs of (person_id, text_sequence). Each text_sequence is a sample utterance attributed to the corresponding person_id.

  • Format:
    • Person 0 Text: "Oh, it's just you Qing, where is handsome GG though?"
    • Person 1 Text: "Hey, if you want to be my friend, you can add me on Discord <3"
    • ... and so on for other personas.
  • Preprocessing: Texts were tokenized into characters, and special SOS_TOKEN and EOS_TOKEN were added to the beginning and end of each sequence, respectively.
  • Vocabulary: A fixed vocabulary of printable ASCII characters plus a few special symbols (<SOS>, <EOS>, <PAD>, ❀, etc.) was used (see the sketch after this list).
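A sketch of how such a vocabulary and encoding step might look; the exact token set and ordering used for the released checkpoint may differ:

```python
import string

# Illustrative vocabulary construction matching the description above.
SPECIALS = ["<PAD>", "<SOS>", "<EOS>"]
EXTRAS = ["❀"]  # plus any other non-ASCII symbols present in the data
vocab = SPECIALS + list(string.printable) + EXTRAS
char2idx = {ch: i for i, ch in enumerate(vocab)}
idx2char = {i: ch for ch, i in char2idx.items()}

def encode(text):
    """Wrap an utterance in <SOS>/<EOS> and map characters to indices."""
    return ([char2idx["<SOS>"]]
            + [char2idx[c] for c in text if c in char2idx]
            + [char2idx["<EOS>"]])
```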

Training Procedure βš™οΈ

  • Framework: PyTorch
  • Loss Function: nn.CrossEntropyLoss (ignoring padding token).
  • Optimizer: Adam or AdamW.
  • Technique: Teacher forcing was likely used during training, where the ground-truth previous character is fed as input to predict the next character (see the sketch after this list).
  • Epochs/Batch Size: (Specify if known, e.g., 50-200 epochs, batch size 32/64).
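A minimal teacher-forced training step consistent with the setup above (the PAD index and function signature are assumptions, not the actual training code):

```python
import torch.nn as nn
import torch.nn.functional as F

PAD_IDX = 0  # assumed padding index; must match the vocabulary
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

def train_step(model, optimizer, batch_ids, person_ids, num_persons=4):
    # batch_ids: (B, T) padded character indices; person_ids: (B,) LongTensor of persona IDs
    condition = F.one_hot(person_ids, num_persons).float()
    # Shifting the sequence by one position implements teacher forcing:
    # the model always sees the ground-truth previous character as input.
    inputs, targets = batch_ids[:, :-1], batch_ids[:, 1:]
    logits, _ = model(inputs, condition)
    loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```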

Generation Parameters

  • Temperature: A temperature parameter (typically 0.5-1.0) is used during sampling to control the randomness of the generated text (see the snippet after this list).
    • Lower temperature (e.g., 0.5): More deterministic, less surprising, potentially more repetitive.
    • Higher temperature (e.g., 1.0): More random, more creative, potentially more nonsensical.
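A small illustrative helper showing how temperature scaling is typically applied at sampling time:

```python
import torch
import torch.nn.functional as F

def sample_with_temperature(logits, temperature=0.8):
    """Scale logits by 1/temperature before sampling (illustrative helper)."""
    # Lower temperature sharpens the distribution toward the argmax;
    # temperature = 1.0 samples from the unmodified softmax.
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```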

Intended Uses & Limitations ⚠️

Intended Uses:

  • To generate short, stylized textual responses for specific, predefined personas within the RCJ chatbot system.
  • To add personality and variety to chatbot interactions based on a classified user type or topic.

Limitations:

  • Character-Level:
    • May sometimes produce misspelled words or grammatically awkward (but often charmingly so) sentences.
    • Generation can be slower than word/subword level models.
    • May struggle with very long-range dependencies compared to larger transformer models.
  • Data Dependency: The quality and style of generated text are heavily dependent on the training data. If the training data is limited or biased, the generated text will reflect that.
  • Fixed Personas: Only generates text for the personas it was trained on.
  • No True Understanding: The model doesn't "understand" language or context in a human sense. It learns statistical patterns of character sequences associated with each persona ID.
  • Potential for Repetition: Like many generative models, it can sometimes fall into repetitive loops, especially with lower temperatures or if the training data has strong repetitive patterns.
  • Not a General Chatbot: This model itself is not a complete chatbot; it's a response generation component.

Bias, Risks, and Ethical Considerations

  • Learned Biases: If the training data contains biases related to the personas (e.g., stereotypes, offensive language), the model will learn and potentially replicate these biases in its generated text.
  • Misleading "Personality": Users might attribute more understanding or sentience to the generated persona than is actually present.
  • Inappropriate Content: If not carefully curated, the training data could lead the model to generate inappropriate, nonsensical, or offensive text. It is crucial to ensure the training data is clean and aligns with the desired output.

Disclaimer

This model is a component created for a specific interactive system. Its performance and suitability are tied to that system and the data it was trained on. Use responsibly.


Model Author(s): qingy2024 (and contributors to the overall system)

Contact: qingy2024
