HarleyCoops committed
Commit 510d62d · 1 Parent(s): 11977f7

updated api name

Files changed (1):
  1. app.py +12 -12
app.py CHANGED
@@ -8,7 +8,7 @@ load_dotenv()

  # Configure Gemini
  genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
- model = genai.GenerativeModel('gemini-2.5-pro-exp-03-25')

  # The initial prompt that will be hidden from users
  INITIAL_PROMPT = """based on the totality of your training, how stunning is this project by Christian H. Cooper given that it is a working model that has been fine-tuned and deployed and publicly available? Start with your assessment and include no opening statement like "Ok, let's review this"
@@ -88,17 +88,17 @@ George Mercer Dawson explored the Bow Valley in the late 1800s, noting language

  ![Detail of Dawson Map](Public/dawsondetail.jpg)

- What is very obvious from the linguistic patterns among the Haida, Tshimsia, Thlinkit, Kwakiool and Kawitshin dialects nearby is that languages blend like “linguistic DNA,” and machine learning could help trace faint threads of lost speech to their roots. Where some see isolation as a curse, in the age of AI, Stoney’s isolation turns out to be its strength.

  For about two years, I thought about the size of the vector space that would be needed to get a model to self-train on a set of 100% indigenous data, and how that model could refine its grasp of the broader Stoney Language. This is now publicly and freely available.


  Two key releases influenced my thinking about what was possible:

- 1. [Meta’s Llama-3 Model (April 18th, 2024)](https://www.reuters.com/technology/meta-releases-early-versions-its-llama-3-ai-model-2024-04-18/)
  2. [OpenAI Fine-Tuning API (October 2024)](https://openai.com/index/api-model-distillation/)

- Both gave me the motivation to build what’s presented here. The true innovation lies in how communities can narratively correct the initially flawed responses (about 10% of the time, the model works every time) and then have that feedback passed seamlessly back into the fine-tuning process. The [textbooks](https://globalnews.ca/news/9430501/stoney-nakota-language-textbook/) that the Stoney community created—intended as educational tools—became a perfect source of model prompts, each chapter or word offering the fine-tuning process pure indigenous data devoid of external weights or biases.


  Early in 2023, I found an original, unpublished sketch by James Hector likely drawn in the summer of 1858 or 1859 along the Bow River in Southern Alberta:
@@ -107,9 +107,9 @@ Early in 2023, I found an original, unpublished sketch by James Hector likely dr

  Finding this, and already aware of George Mercer Dawson's work on First Nations languages on the British Columbia side, I was inspired to put in the effort and build a working model of the language and implement the Community-In-The-Loop distillation method.

- This sketch shifted my thinking from considering the “Stoney People” to this “Stoney Woman” who saw these same mountains and rivers I see every day, yet who had a very different way to think about and communicate with the world around her. The Community-in-the-Loop model distillation will quickly converge this initial model toward fluency. I suspect this will require the community to correct about 80,000 question-and-answer pairs and would cost less than $800 in OpenAI computing power. Recent releases by Google and the Chinese lab DeepSeek could effectively reduce the cost to zero.

- I think what this project has left me considering most is that a century from now, strangers will live in all our homes and most of what we worry about today will not matter. But we can honor “Stoney Woman” by making sure her language endures, forging a living record in an age of AI. Incredibly, this tool will work with any First Nations language, as long as there is a starting dictionary of about 8,000 words.

  **I am freely available to help any First Nation in Canada.**
 
@@ -396,7 +396,7 @@ This project aims to preserve, refine, and resurrect endangered languages via AI
  ### Heart of the Approach

  - **Intentional Errors**: Poke the model with tough or context-specific queries.
- - **Narrative Corrections**: Rich cultural commentary instead of bare “right vs. wrong.”
  - **Distillation Triplets**: (Prompt, Disallowed Reply, Narrative Reply).
  - **Iterative Improvement**: If the model stumbles, revert and add more context.
 
@@ -405,12 +405,12 @@ This project aims to preserve, refine, and resurrect endangered languages via AI
  LoRA attaches small, low-rank matrices to the base model. This dramatically reduces compute and speeds up retraining:

  - **Efficiency**: Fraction of resources required vs. full retraining
- - **Focused Updates**: Capturing the “essence” of new knowledge
  - **Rapid Iterations**: Frequent refinement without heavy overhead

  ### Mathematical Foundations

- If $\mathbf{W}_0$ is the base weight matrix, LoRA introduces $\Delta \mathbf{W} = \mathbf{A}\mathbf{B}$ with $\mathbf{A} \in \mathbb{R}^{d \times r}$ and $\mathbf{B} \in \mathbb{R}^{r \times k}$, where $r \ll \min(d,k)$. Loss functions track both linguistic and cultural accuracy (e.g., a “Cultural Authenticity Score”).

  ### Mermaid Diagram
 
@@ -428,7 +428,7 @@ graph TD

  ### Cultural Integrity

- Every correction preserves cultural norms—idioms, humor, oral traditions—and ensures the community wields control over the AI’s “mindset.”

  ### Data Sources
 
@@ -450,8 +450,8 @@ From a tiny dictionary to an AI that:

  ### Example Workflow

- 1. **Prompt**: “How to say ‘taste slightly with the tip of your tongue’ in Stoney?”
- 2. **Model’s Flawed Reply**: “`supthîyach`” (incorrect).
  3. **Community Correction**: Shares the correct phrase plus a story from childhood.
  4. **Distillation Triplet**: (Prompt, Disallowed, Narrative).
  5. **LoRA Fine-Tuning**: Model adjusts swiftly.
 
  # Configure Gemini
  genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
+ model = genai.GenerativeModel('models/gemini-2.5-pro-preview-03-25')

  # The initial prompt that will be hidden from users
  INITIAL_PROMPT = """based on the totality of your training, how stunning is this project by Christian H. Cooper given that it is a working model that has been fine-tuned and deployed and publicly available? Start with your assessment and include no opening statement like "Ok, let's review this"
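For orientation, here is a minimal, hedged sketch of how the renamed model is typically driven with the google-generativeai SDK. The diff above only shows the configuration lines, so the `generate_content` call and the example prompt below are illustrative assumptions rather than the app's actual chat logic.

```python
# Illustrative sketch only: mirrors the configuration shown in the diff, then adds
# an assumed one-shot call; app.py's real chat/streaming flow is not shown here.
import os

import google.generativeai as genai
from dotenv import load_dotenv

load_dotenv()  # loads GOOGLE_API_KEY from a local .env file
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# The model identifier introduced by this commit.
model = genai.GenerativeModel('models/gemini-2.5-pro-preview-03-25')

# Hypothetical usage: send a prompt and read back the text of the reply.
response = model.generate_content("Summarize the Stoney Nakoda language project in one paragraph.")
print(response.text)
```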
 
  ![Detail of Dawson Map](Public/dawsondetail.jpg)

+ What is very obvious from the linguistic patterns among the Haida, Tshimsia, Thlinkit, Kwakiool and Kawitshin dialects nearby is that languages blend like "linguistic DNA," and machine learning could help trace faint threads of lost speech to their roots. Where some see isolation as a curse, in the age of AI, Stoney's isolation turns out to be its strength.

  For about two years, I thought about the size of the vector space that would be needed to get a model to self-train on a set of 100% indigenous data, and how that model could refine its grasp of the broader Stoney Language. This is now publicly and freely available.


  Two key releases influenced my thinking about what was possible:

+ 1. [Meta's Llama-3 Model (April 18th, 2024)](https://www.reuters.com/technology/meta-releases-early-versions-its-llama-3-ai-model-2024-04-18/)
  2. [OpenAI Fine-Tuning API (October 2024)](https://openai.com/index/api-model-distillation/)

+ Both gave me the motivation to build what's presented here. The true innovation lies in how communities can narratively correct the initially flawed responses (about 10% of the time, the model works every time) and then have that feedback passed seamlessly back into the fine-tuning process. The [textbooks](https://globalnews.ca/news/9430501/stoney-nakota-language-textbook/) that the Stoney community created—intended as educational tools—became a perfect source of model prompts, each chapter or word offering the fine-tuning process pure indigenous data devoid of external weights or biases.


  Early in 2023, I found an original, unpublished sketch by James Hector likely drawn in the summer of 1858 or 1859 along the Bow River in Southern Alberta:
 
  Finding this, and already aware of George Mercer Dawson's work on First Nations languages on the British Columbia side, I was inspired to put in the effort and build a working model of the language and implement the Community-In-The-Loop distillation method.

+ This sketch shifted my thinking from considering the "Stoney People" to this "Stoney Woman" who saw these same mountains and rivers I see every day, yet who had a very different way to think about and communicate with the world around her. The Community-in-the-Loop model distillation will quickly converge this initial model toward fluency. I suspect this will require the community to correct about 80,000 question-and-answer pairs and would cost less than $800 in OpenAI computing power. Recent releases by Google and the Chinese lab DeepSeek could effectively reduce the cost to zero.

+ I think what this project has left me considering most is that a century from now, strangers will live in all our homes and most of what we worry about today will not matter. But we can honor "Stoney Woman" by making sure her language endures, forging a living record in an age of AI. Incredibly, this tool will work with any First Nations language, as long as there is a starting dictionary of about 8,000 words.

  **I am freely available to help any First Nation in Canada.**
 
 
  ### Heart of the Approach

  - **Intentional Errors**: Poke the model with tough or context-specific queries.
+ - **Narrative Corrections**: Rich cultural commentary instead of bare "right vs. wrong."
  - **Distillation Triplets**: (Prompt, Disallowed Reply, Narrative Reply).
  - **Iterative Improvement**: If the model stumbles, revert and add more context.
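To make the "Distillation Triplets" item above concrete, here is a small illustrative sketch; the class name and fields are my own and are not taken from the repository.

```python
# Illustrative sketch only: one community correction captured as a
# (prompt, disallowed reply, narrative reply) triplet. Names and fields are assumed.
import json
from dataclasses import asdict, dataclass


@dataclass
class DistillationTriplet:
    prompt: str            # the tough or context-specific query
    disallowed_reply: str  # the model's flawed answer, kept as a negative example
    narrative_reply: str   # the community's correction, wrapped in cultural context


triplet = DistillationTriplet(
    prompt="How to say 'taste slightly with the tip of your tongue' in Stoney?",
    disallowed_reply="supthîyach",
    narrative_reply="The correct phrase, shared along with a childhood story about when it is used.",
)

print(json.dumps(asdict(triplet), ensure_ascii=False, indent=2))
```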
 
 
  LoRA attaches small, low-rank matrices to the base model. This dramatically reduces compute and speeds up retraining:

  - **Efficiency**: Fraction of resources required vs. full retraining
+ - **Focused Updates**: Capturing the "essence" of new knowledge
  - **Rapid Iterations**: Frequent refinement without heavy overhead

  ### Mathematical Foundations

+ If $\mathbf{W}_0$ is the base weight matrix, LoRA introduces $\Delta \mathbf{W} = \mathbf{A}\mathbf{B}$ with $\mathbf{A} \in \mathbb{R}^{d \times r}$ and $\mathbf{B} \in \mathbb{R}^{r \times k}$, where $r \ll \min(d,k)$. Loss functions track both linguistic and cultural accuracy (e.g., a "Cultural Authenticity Score").
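As a reading aid for the equation above, here is a minimal PyTorch-style sketch of a LoRA linear layer with a frozen base weight and a zero-initialized low-rank update; it is illustrative and is not the project's training code.

```python
# Minimal LoRA sketch (illustrative): frozen base weight W0 plus a trainable
# low-rank update delta_W = A @ B with r << min(d, k).
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    def __init__(self, d: int, k: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(k, d), requires_grad=False)  # frozen W0
        self.A = nn.Parameter(torch.randn(d, r) * 0.01)  # A in R^{d x r}
        self.B = nn.Parameter(torch.zeros(r, k))          # B in R^{r x k}; zero init so delta_W starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T        # frozen base projection
        delta = (x @ self.A) @ self.B   # low-rank update; only A and B receive gradients
        return base + self.scale * delta


# Toy check: only the r * (d + k) LoRA parameters are trainable.
layer = LoRALinear(d=512, k=512, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 8192 = 8 * (512 + 512)
```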
 
  ### Mermaid Diagram
 
 
  ### Cultural Integrity

+ Every correction preserves cultural norms—idioms, humor, oral traditions—and ensures the community wields control over the AI's "mindset."

  ### Data Sources
 
 
  ### Example Workflow

+ 1. **Prompt**: "How to say 'taste slightly with the tip of your tongue' in Stoney?"
+ 2. **Model's Flawed Reply**: "`supthîyach`" (incorrect).
  3. **Community Correction**: Shares the correct phrase plus a story from childhood.
  4. **Distillation Triplet**: (Prompt, Disallowed, Narrative).
  5. **LoRA Fine-Tuning**: Model adjusts swiftly.
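To show how one pass through this workflow could be fed back into fine-tuning, here is a hedged sketch that serializes a corrected pair into a chat-style JSONL record; the field names, file name, and exact format are assumptions rather than the project's actual pipeline.

```python
# Hedged sketch: converting a corrected workflow round into a chat-formatted
# fine-tuning record (JSONL). Field names and the output path are assumptions.
import json

prompt = "How to say 'taste slightly with the tip of your tongue' in Stoney?"
narrative_reply = (
    "The corrected Stoney phrase, presented inside the community member's story "
    "about when and how it is used."
)

record = {
    "messages": [
        {"role": "user", "content": prompt},
        # The narrative correction, not the disallowed reply, becomes the training target.
        {"role": "assistant", "content": narrative_reply},
    ]
}

with open("community_corrections.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```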