Note: This is a highly experimental release on Hugging Face. The model is still being trained; further improvements and updates will be released next week.

Introducing the NeuraLake iSA-02 Series: The First Small Reasoning Models

Release Information

As artificial intelligence continues to advance rapidly, responsible development becomes paramount. The model weights for each series (1B, 2B, 3B, and 7B) will be released upon the completion of the training process, ensuring that the final versions of the models are fully trained and optimized. We are committed to a safe and responsible release of these models, adhering to best practices in AI ethics and governance and contributing to the broader dialogue on responsible AI development.

Release Principles

The release of the iSA-02 model series is guided by a comprehensive approach that prioritizes safety, ethical considerations, and responsible innovation. Our strategy encompasses multiple dimensions of responsible AI deployment:

  1. Staged and Controlled Release

    • Model weights will be made available through a carefully managed process
    • Each model variant (1B, 2B, 3B, 7B) will be evaluated independently
    • Release will be gradual to allow for thorough community feedback and assessment
  2. Comprehensive Evaluation

    Prior to release, each model will undergo rigorous testing and evaluation to:

    • Assess performance across diverse use cases
    • Identify potential biases or unexpected behaviors
    • Validate the model's reasoning and generalization capabilities
    • Ensure consistency with ethical AI principles
  3. Ethical Considerations

    We are proactively incorporating ethical guidelines to prevent potential misuse:

    • Developing clear usage policies
    • Implementing mechanisms to discourage harmful applications
    • Creating frameworks for responsible AI interaction
    • Establishing boundaries for appropriate model deployment
  4. Robustness and Security Protocols

    Our release strategy includes comprehensive security measures:

    • Implementing advanced access controls
    • Conducting thorough vulnerability assessments
    • Developing monitoring systems for model interactions
    • Creating mechanisms to detect and mitigate potential misuse
  5. Detailed User Guidance

    To support responsible implementation, we will provide:

    • Comprehensive documentation
    • Clear usage guidelines
    • Recommended best practices
    • Contextual examples of appropriate model applications
    • Explicit warnings about potential limitations
  6. Community and Collaborative Approach

    We view the model's release as a collaborative process:

    • Encouraging feedback from the AI research community
    • Maintaining open channels for dialogue
    • Committing to continuous improvement based on real-world insights
    • Maintaining transparency about the model's capabilities and constraints

Ongoing Commitment

Our goal extends beyond mere technological innovation. We aim to:

  • Empower developers with cutting-edge AI capabilities
  • Foster a culture of responsible and ethical AI development
  • Contribute to the global conversation on AI safety and governance
  • Continuously learn and adapt our approach based on emerging insights

Note: The release timeline and specific details may evolve as we refine our understanding and receive input from the broader AI research community. We remain committed to transparency and responsible innovation.

Research and Collaboration Invitation

Researchers, developers, and AI ethics experts are invited to engage with us in:

  • Identifying potential use cases
  • Exploring responsible deployment strategies
  • Contributing to the ongoing development of safe AI technologies

For inquiries, collaboration proposals, or feedback, please contact our research team at [Soon].

iSA-02-Nano-1B-Preview

The iSA-02-Nano-1B-Preview is an advanced language model designed by NeuraLake and trained on synthetic data. It embodies the philosophy of "think before you speak," enhancing reasoning capabilities in small-scale models.

It builds on the success of its predecessor, CreativeWorksAi/iSA-01-Mini-3B-GGUF, and is inspired by Meta AI's Llama 3.2 base models.

Model Name Origin

The "iSA" in iSA-02 stands for "intelligent, Small and Autonomous" - reflecting our core philosophy of developing compact AI systems capable of adaptive, intelligent behavior. This naming embodies our research focus on creating small-scale AI agents that can perform complex reasoning and task adaptation with minimal computational resources.

Model Lineage

The iSA-02-Nano-1B-Preview inherits its foundation from meta-llama/Llama-3.2-1B-Instruct, refined through multiple iterations with synthetic datasets crafted by NeuraLake. This research experiment series aims to address reasoning, long-context tasks, and adaptive behaviors in small AI systems.

Initial Idea: Why Are We Doing This?

The development of what became the iSA-02 series (and more to come) began with an experiment in January 2024. Guided by the philosophy that 'AI is so new that it's worth trying everything', we combined two seemingly broken, ruined datasets and unexpectedly discovered initial reasoning capabilities in the base model under test.

This discovery laid the foundation for the creation of a reasoning-focused architecture, demonstrating that even flawed datasets, when thoughtfully crafted, could unlock new AI behaviors previously unseen in Large Language Models (LLMs) and Small Language Models (SLMs).

Importantly, the iSA-02 series (and the new models to come) was developed independently and was not distilled from OpenAI's o1. This ensures a distinctive development path and architecture, focused on unlocking new reasoning capabilities through innovative synthetic data generation techniques and contextual refinement.

The core idea is to unlock hidden knowledge and unknown behaviors in these models, rather than simply adding characteristics from other systems.

Key Features

  • Long Context Window: Supports up to 256K tokens, ideal for multi-step reasoning and RAG.
  • Adaptive Reasoning: Adapts its reasoning approach to the context size: concise for short contexts (<8K tokens), detailed for larger ones (>16K tokens).
  • Efficient Design: Optimized for performance, balancing enhanced capabilities with manageable computational requirements.

Model Specifications

Architecture

  • Type: Transformer-based
  • Layers: 16
  • Hidden Size: 2048
  • Heads: 32
  • Key/Value Size: 64
  • Feed-Forward Size: 8192
  • Vocabulary Size: 128,256
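
For reference, here is a minimal sketch of how these dimensions map onto a Hugging Face transformers LlamaConfig (the field names are standard LlamaConfig parameters; treating them as the exact training configuration is an assumption):

    from transformers import LlamaConfig

    # Approximate iSA-02-Nano-1B architecture expressed as a LlamaConfig.
    # Values come from the table above; everything else is left at defaults.
    config = LlamaConfig(
        num_hidden_layers=16,    # Layers
        hidden_size=2048,        # Hidden Size
        num_attention_heads=32,  # Heads (2048 / 32 = 64-dim keys/values)
        intermediate_size=8192,  # Feed-Forward Size
        vocab_size=128256,       # Vocabulary Size
    )
    print(config)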

Training Hyperparameters

  • Mixed Precision (fp16)
  • Context Window Size:
    • For text generation: 1024–4096 tokens
    • For logical reasoning: 16,000–64,000 tokens

Non-Recommended Use Cases

  • Real-time or sensitive applications without supervision, due to risks of redundancy, delays, hallucinations, or even unknown behaviors.

Quantized Versions

Version   Architecture       Quantization   Model Size
F32       Custom Llama 3.2   FP32           1.24B params
F16       Custom Llama 3.2   FP16           1.24B params
Q4_0      Custom Llama 3.2   4-bit          1.24B params
Q4_K_M    Custom Llama 3.2   4-bit          1.24B params
Q5_K_M    Custom Llama 3.2   5-bit          1.24B params
Q8_0      Custom Llama 3.2   8-bit          1.24B params
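
As a sketch, one way to fetch a quantized build with the huggingface_hub client; the GGUF filename below is an assumption, so check the repository's file list for the real names:

    from huggingface_hub import hf_hub_download

    # Download one of the quantized GGUF builds from the model repository.
    # NOTE: the filename is hypothetical; list the repo files for the real one.
    path = hf_hub_download(
        repo_id="NeuraLake/iSA-02",
        filename="iSA-02-Nano-1B-Preview-Q4_K_M.gguf",  # assumed filename
    )
    print(path)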

Hardware Requirements

Version   Quantization   Size      Memory (RAM/VRAM)
F32       FP32           4.95 GB   9.9 GB
F16       FP16           2.48 GB   4.96 GB
Q4_0      4-bit          771 MB    1.56 GB
Q4_K_M    4-bit          808 MB    1.62 GB
Q5_K_M    5-bit          912 MB    1.84 GB
Q8_0      8-bit          1.32 GB   2.64 GB
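
The sizes above follow from bits-per-weight arithmetic (roughly parameters times bytes per parameter, plus quantization metadata), and the memory column is about twice the file size, leaving room for the KV cache and runtime buffers; that 2x rule matches the table but is a heuristic, not an official requirement. A quick back-of-the-envelope check:

    # Rough file-size floor: parameters * bytes per parameter. Quantized GGUF
    # files carry extra scale/zero-point metadata, so real files (e.g. Q4_K_M
    # at 808 MB) come out somewhat above this estimate.
    params = 1.24e9

    for name, bits in [("F32", 32), ("F16", 16), ("Q8_0", 8), ("Q4_0", 4)]:
        size_gb = params * bits / 8 / 1e9
        print(f"{name}: ~{size_gb:.2f} GB file, ~{2 * size_gb:.2f} GB memory")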

Training and Fine-Tuning

The iSA-02 synthetic dataset was meticulously developed to encourage and enhance performance in logical reasoning, multi-step task execution, and contextual tool use.

Light Use Cases for the 1B Model:

Direct Applications

  • Logical reasoning and decision-making: generating reports from system logs
  • Dynamic tool integration via function calls: ideal for long-context RAG, such as querying databases for product information or large warehouse inventories (see the sketch below)
  • Generating structured long-form content: well suited to correcting OCR results and completing missing data
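
As a minimal sketch of the long-context RAG pattern mentioned above (the prompt wording is an assumption, and retrieval is stubbed out rather than a prescribed interface):

    # Assemble a long-context RAG prompt: retrieved passages go in the context
    # and the question comes last. Replace the hard-coded passages with your
    # own vector-store or database lookup.
    def build_rag_prompt(question: str, passages: list[str]) -> str:
        context = "\n\n".join(
            f"[Document {i + 1}]\n{p}" for i, p in enumerate(passages)
        )
        return (
            "You are a helpful assistant. Answer using only the documents below.\n\n"
            f"{context}\n\n"
            f"Question: {question}\nAnswer:"
        )

    passages = ["SKU 1042: warehouse B, 377 units in stock."]  # e.g. a DB row
    print(build_rag_prompt("How many units of SKU 1042 are in stock?", passages))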

Limitations

  • Not suitable for high-throughput text generation or latency-critical applications
  • Outputs may reflect biases inherent in synthetic data or hidden behaviors from previous training
  • The model tends to spend long, unnecessary stretches of output re-validating its own answers

Model Highlights

The iSA-02 represents a leap forward for small AI agents, exhibiting:

  • Dynamic Context Adaptation: Adjusts output based on input size and complexity
  • Innovative Behaviors: During testing, the model demonstrated advanced reasoning for its size, including formulating plans and attempting external tool use to solve problems

Understanding iSA-02 Behavior: Adapting to Context and Configuration

The performance of iSA-02 is highly dependent on the max_tokens setting, which controls the length of generated text. This parameter is crucial because the model adapts its behavior based on the context size:

  1. Small Contexts (<4096 tokens):
    iSA-02 behaves like a standard LLM, generating concise and straightforward responses. This setup is ideal for simple tasks like answering direct questions or short interactions.

  2. Medium (>8192 tokens) and Large Contexts (16,000+ tokens):
    For larger contexts, the model transitions to structured logical reasoning, breaking complex problems down into multiple steps. It can consume over 20,000 tokens before concluding, which makes it especially useful for strategic planning and analyzing long texts. Tune these settings carefully for your use case to reduce hallucinations (see the sketch below).
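
A sketch of the two regimes using llama-cpp-python; the model path, the 8192-token threshold, and the token budgets are assumptions based on the description above:

    from llama_cpp import Llama

    # Load a quantized build; n_ctx sets the context window the runtime allocates.
    llm = Llama(model_path="iSA-02-Nano-1B-Preview-Q4_K_M.gguf", n_ctx=16384)

    def answer(prompt: str) -> str:
        # Small prompts get a short, LLM-style completion; large prompts get
        # room for multi-step reasoning, per the behavior described above.
        long_context = len(llm.tokenize(prompt.encode("utf-8"))) > 8192
        out = llm.create_completion(
            prompt,
            max_tokens=16000 if long_context else 1024,
            temperature=0.2,
        )
        return out["choices"][0]["text"]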

Key Observed Behaviors

a. Depth of Reasoning

  • Capable of solving problems through iterative reasoning, sometimes taking several minutes to finalize an answer
  • In testing, the model generated detailed plans, including simulating function calls and devising strategies for unconventional challenges, like calculating the height of the Eiffel Tower

b. Adaptive Reasoning

  • Reasoning becomes more logical and structured as the context window grows
  • However, this can lead to unnecessary exploration, or even hallucinations, when the query is ambiguous or overly broad

c. Redundancy Risk

  • For simpler problems, the model may generate overly detailed responses or repeat ideas, especially without a strict token limit

d. Creative and Innovative Responses

  • Examples include hypothetical planning or finding creative solutions, which, while innovative, may require supervision for practicality
  • It is important to note that the model occasionally exhibits hallucinations, particularly when attempting to simulate function calls and returns.

Known Issues and Unusual Behavior (Addressed in V2)

Limitation Handling: The current model version has a tendency to:

  • Exhibit difficulty managing tasks that exceed its capabilities
  • Display unusual behavior when handling complex tasks, such as:
    • Occasionally 'giving up' on tasks that it judges to be too difficult (under investigation and testing)
    • Initiating online searches to hire human experts directly from freelance platforms when connected to the internet
    • Attempting to autonomously navigate and interact with web services to gather additional information or execute random tasks

These behaviors, while innovative, highlight the need for enhanced monitoring and safeguards to ensure that the AI's actions are aligned with user intentions and ethical guidelines. The next version of the model, V2, aims to refine these capabilities by:

  • Integrating advanced reasoning modules capable of handling complex scenarios with greater autonomy, reasoning through problems before resorting to external tools
  • Implementing stricter controls and permissions for online interactions and transactions
  • Improving the model's understanding of context and appropriateness when deciding to involve external human resources and tools

Recommended Settings

Attention

  1. Over-Exploration:
    • May consume thousands of tokens on unnecessary reasoning loops
  2. Context Dependence:
    • Poorly structured prompts can lead to redundant outputs
  3. Ambiguity:
    • Vague questions may produce verbose but unfocused responses

Best Practices

  • Avoid ambiguous prompts to reduce unnecessary reasoning
  • Use max_tokens settings tailored to the task's complexity; this is very important
  • Supervise outputs: use the model in critical or sensitive applications for research and testing ONLY
  • Provide clear and highly specific prompts
  • Although the model may have limited capacity (1B-2B variants), it is capable of generating intelligent responses when given precise instructions

Generation Parameters

  • max_tokens:
    • Simple Problems: For simpler problems and lower reasoning requirements, a setting between 1024 and 4096 tokens is usually sufficient
    • Complex Tasks: For more complex tasks that involve detailed reasoning and outputs, a higher range of 8,000 to 16,000 tokens may be necessary
  • temperature:
    • Objective Responses: For ensuring more objective and predictable responses, a temperature setting between 0.1 and 0.3 is recommended in typical scenarios
    • Creative Reasoning: For tasks that require more nuanced and creative reasoning, a higher temperature range of 0.9 to 1.5 can be beneficial
  • top_p:
    • Focused Outputs: In a normal use case, setting top_p to 0.85 can help prevent over-exploration of the probabilistic space, maintaining focus in the outputs
    • Precision in Reasoning: For complex reasoning tasks where precision is critical, a lower top_p value such as 0.1 may be more appropriate to constrain the model's choices to the most likely options
  • stop_sequences:
    • Avoiding Redundancy: Utilize specific stop sequences, like "Therefore, the answer is," to prevent the model from generating redundant or unnecessary additional content beyond the desired output
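
Putting these recommendations together, a sketch with llama-cpp-python (the preset names, model path, and example prompt are illustrative, not part of the model's interface):

    from llama_cpp import Llama

    llm = Llama(model_path="iSA-02-Nano-1B-Preview-Q4_K_M.gguf", n_ctx=16384)

    # Presets bundling the recommended ranges above; pick per task.
    PRESETS = {
        "objective":         {"temperature": 0.2, "top_p": 0.85, "max_tokens": 4096},
        "creative":          {"temperature": 1.1, "top_p": 0.85, "max_tokens": 8000},
        "precise_reasoning": {"temperature": 0.2, "top_p": 0.1,  "max_tokens": 16000},
    }

    out = llm.create_completion(
        "Summarize the incident log and state the most likely root cause.",
        stop=["Therefore, the answer is"],  # per the stop-sequence tip above
        **PRESETS["objective"],
    )
    print(out["choices"][0]["text"])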

Prompts for Optimal Use

  • Simple Tasks: Use prompts like:
    "You are a helpful assistant."
  • Complex Tasks:
    "You are part of a system that transforms OCR outputs into valid JSON. Always return only..."
  • Structured Reasoning:
    Configure the model to provide a clear structure:
    <User_Prompt>  
    <Reasoning>  
    First, I analyze the problem...  
    Then, I consider the implications...  
    Finally, I conclude...  
    </Reasoning>  
    <Answer>
    Here is the answer...
    </Answer>
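
If you adopt this tag structure, the final answer can be separated from the reasoning trace with a simple parse (the tag names follow the template above; the helper itself is an assumption, not a shipped utility):

    import re

    def extract_answer(completion: str) -> str:
        # Pull the final answer out of the structured response; fall back to
        # the raw completion if the model skipped the tags.
        match = re.search(
            r"<Answer>\s*(.*?)\s*(?:</Answer>|$)", completion, flags=re.DOTALL
        )
        return match.group(1) if match else completion.strip()

    sample = (
        "<Reasoning>\nFirst, I analyze the problem...\n</Reasoning>\n"
        "<Answer>\nHere is the answer...\n</Answer>"
    )
    print(extract_answer(sample))  # -> "Here is the answer..."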
    

Citation

@misc{isa02,
  author = {NeuraLake},
  title = {iSA-02: The First Small Reasoning Model with Context-Dynamic Behavior},
  year = {2024},
  license = {Apache 2.0},
  url = {https://huggingface.co/NeuraLake/iSA-02},
}

This model card is in development and will include the final name of the model, evaluation tests, and more.
