Mozilla/smart-tab-topic

rolf-mozilla committed on May 1

Commit 91a6ebc · verified · 1 Parent(s): 6a9617a

Update README.md

Files changed (1)

README.md +5 -3

@@ -35,7 +35,7 @@ Training code is available here https://github.com/mozilla/smart-tab-grouping, t
 The model has a strict input in the following formats.
 ```
-Topic from keywords: [up to 3 comma-separated-lower-case-keywords]. titles \n [up to 3 \n separated titles]
 ```
 Keywords is optional and should not be included for single tab use cases.
@@ -48,7 +48,7 @@ Topic from keywords: . titles \n Dogs - Google Search
 ```
 ```
-Topic from keywords: dogs,food,pets. titles \n Dogs - Google Search\nDog Food - Shopping\nHow can I buy a pet - Google Search
 ```
@@ -63,12 +63,14 @@ It filters some swear words. In some instances it may output 'None' when uncerta
 ## Training Details
-Training data was created using OpenAI to create archetypes of 50 fake users and there imagined browsing activity for various tasks.
 Page titles from those synthetic pages were clustered, and then labeld using OpenAI, using an n-shot approach with hand-labeled examples in each query.
 Training data was augmented with page titles extracted from thosands of English page titles Common Crawl dataset.
 The training data was used to fine-tune a flan-t5-base model, which was later distilled on a t5-efficient-tiny model.
 The model was then quantized to q8 (8 bit precision) for use in production Firefox.

 The model has a strict input in the following formats.
 ```
+Topic from keywords: [up to 3 comma-separated-lower-case-keywords]. titles: \n [up to 3 \n separated titles]
 ```
 Keywords is optional and should not be included for single tab use cases.
 ```
 ```
+Topic from keywords: dogs,food,pets. titles: \nDogs - Google Search\nDog Food - Shopping\nHow can I buy a pet - Google Search
 ```
 ## Training Details
+Training data was created using OpenAI to create archetypes of 50 fake users and their imagined browsing activity for various tasks.
 Page titles from those synthetic pages were clustered, and then labeld using OpenAI, using an n-shot approach with hand-labeled examples in each query.
 Training data was augmented with page titles extracted from thosands of English page titles Common Crawl dataset.
+An additional pre-processing step applied during training removes less important words from the training topic labels in order to shorten the topic to 1 word in most cases.
 The training data was used to fine-tune a flan-t5-base model, which was later distilled on a t5-efficient-tiny model.
 The model was then quantized to q8 (8 bit precision) for use in production Firefox.
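
The input format introduced by this change (with the `titles:` colon) can be assembled programmatically. The following is a minimal sketch, not code from the model's repository: the helper name `build_topic_prompt` is hypothetical, and it assumes that `\n` in the README denotes an actual newline character and that the single-tab case keeps the empty keyword slot shown in the README's single-tab example.

```python
from typing import Iterable, Sequence

MAX_ITEMS = 3  # the README allows up to 3 keywords and up to 3 titles


def build_topic_prompt(titles: Sequence[str], keywords: Iterable[str] = ()) -> str:
    """Build a prompt in the README's 'Topic from keywords' format.

    Hypothetical helper: assumes '\\n' in the README means a real newline.
    """
    if not titles:
        raise ValueError("at least one page title is required")

    kept_titles = list(titles)[:MAX_ITEMS]

    # Keywords are optional and are dropped for single-tab use cases.
    if len(kept_titles) == 1:
        kept_keywords = []
    else:
        kept_keywords = [k.strip().lower() for k in keywords if k.strip()][:MAX_ITEMS]

    header = f"Topic from keywords: {','.join(kept_keywords)}. titles: \n"
    return header + "\n".join(kept_titles)


if __name__ == "__main__":
    print(build_topic_prompt(
        ["Dogs - Google Search", "Dog Food - Shopping", "How can I buy a pet - Google Search"],
        ["dogs", "food", "pets"],
    ))
```

Because the README describes the format as strict, building the string in one place like this avoids small mismatches such as a missing colon or an extra space after the newline.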