Domain,Task,Explanation | |
CV,depth-estimation,"Depth estimation is the task of predicting the depth of the objects present in an image. About Depth Estimation. Depth estimation models can be used to estimate the depth of different objects present in an image. Estimation of Volumetric Information | |
Depth estimation models are widely used to study volumetric formation of objects present inside an image. This is an important use case in the domain of computer graphics. | |
3D Representation | |
Depth estimation models can also be used to develop a 3D representation from a 2D image." | |
CV,image-classification,"Image classification is the task of assigning a label or class to an entire image. Each image is expected to have only one class. Image classification models take an image as input and return a prediction about which class the image belongs to. Use Cases | |
Image classification models can be used when we are not interested in specific instances of objects with location information or their shape. | |
Keyword Classification | |
Image classification models are used widely in stock photography to assign each image a keyword. | |
Image Search | |
Models trained in image classification can improve user experience by organizing and categorizing photo galleries on the phone or in the cloud, on multiple keywords or tags." | |
CV,image-segmentation,"Image Segmentation divides an image into segments where each pixel in the image is mapped to an object. This task has multiple variants such as instance segmentation, panoptic segmentation and semantic segmentation. Use Cases | |
Autonomous Driving | |
Segmentation models are used to identify road patterns such as lanes and obstacles for safer driving. | |
Background Removal | |
Image Segmentation models are used in cameras to erase the background of certain objects and apply filters to them. | |
Medical Imaging | |
Image Segmentation models are used to distinguish organs or tissues, improving medical imaging workflows. Models are used to segment dental instances, analyze X-Ray scans or even segment cells for pathological diagnosis. One such dataset contains images of lungs of healthy patients and patients with COVID-19 segmented with masks. Another segmentation dataset contains segmented MRI data of the lower spine to analyze the effect of spaceflight simulation. | |
Task Variants | |
Semantic Segmentation | |
Semantic Segmentation is the task of segmenting parts of an image that belong to the same class. Semantic Segmentation models make predictions for each pixel and return the probabilities of the classes for each pixel. These models are evaluated on Mean Intersection Over Union (Mean IoU). | |
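The Mean IoU metric mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the per-pixel probabilities have already been reduced to class ids by picking the most probable class; the function name `mean_iou` and the tiny flattened image are hypothetical and not taken from any library.

```python
def mean_iou(pred, target, num_classes):
    # pred and target are flat lists of per-pixel class ids of equal length.
    ious = []
    for cls in range(num_classes):
        # Pixels where both prediction and ground truth are this class.
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either prediction or ground truth is this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0

# Hypothetical 2x3 image flattened row by row: 0 is background, 1 is the object.
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
score = mean_iou(pred, target, num_classes=2)
print(score)
```

Averaging per class rather than per pixel keeps a large background class from dominating the score, which is one reason this metric is a common choice for segmentation.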
Instance Segmentation | |
Instance Segmentation is the variant of Image Segmentation where every distinct object is segmented, instead of one segment per class. | |
Panoptic Segmentation | |
Panoptic Segmentation is the Image Segmentation task that segments the image both by instance and by class, assigning each pixel both a class and an instance of that class." | |
CV,image-to-image,"Image-to-image is the task of transforming a source image to match the characteristics of a target image or a target image domain. Any image manipulation and enhancement is possible with image-to-image models. Style transfer | |
One of the most popular use cases of image-to-image is style transfer. Style transfer models can convert a regular photograph into a painting in the style of a famous painter. | |
Task Variants | |
Image inpainting | |
Image inpainting is widely used in photo editing to remove unwanted objects, such as poles, wires or sensor dust. | |
Image colorization | |
Old black-and-white images can be brought to life using an image colorization model. | |
Super Resolution | |
Super resolution models increase the resolution of an image, allowing for higher quality viewing and printing." | |
CV,object-detection,"Object Detection models allow users to identify objects of certain defined classes. Object detection models receive an image as input and output the image with bounding boxes and labels on detected objects. Autonomous Driving | |
Object Detection is widely used in computer vision for autonomous driving. Self-driving cars use Object Detection models to detect pedestrians, bicycles, traffic lights and road signs to decide which step to take. | |
Object Tracking in Matches | |
Object Detection models are widely used in sports where the ball or a player is tracked for monitoring and refereeing during matches. | |
Image Search | |
Object Detection models are widely used in image search. Smartphones use Object Detection models to detect entities (such as specific places or objects) and allow the user to search for the entity on the Internet. | |
Object Counting | |
Object Detection models are used to count instances of objects in a given image. This can include counting the objects in warehouses or stores, or counting the number of visitors in a store. They are also used to manage crowds at events to prevent disasters." | |
CV,video-classification,"Video classification is the task of assigning a label or class to an entire video. Each video is expected to have only one class. Video classification models take a video as input and return a prediction about which class the video belongs to. Activity Recognition | |
Video classification models are used to perform activity recognition which is useful for fitness applications. Activity recognition is also helpful for vision-impaired individuals especially when they're commuting. | |
Video Search | |
Models trained in video classification can improve user experience by organizing and categorizing video galleries on the phone or in the cloud, on multiple keywords or tags." | |
CV,unconditional-image-generation,"Unconditional image generation is the task of generating images with no condition in any context (like a prompt text or another image). Once trained, the model will create images that resemble its training data distribution. Unconditional image generation is the task of generating new images without any specific input. The main goal of this is to create novel, original images that are not based on existing images. This can be used for a variety of applications, such as creating new artistic images, improving image recognition algorithms, or generating photorealistic images for virtual reality environments. | |
Unconditional image generation models usually start with a seed that generates a random noise vector. The model will then use this vector to create an output image similar to the images used for training the model. | |
An example of unconditional image generation would be generating the image of a face on a model trained with the CelebA dataset or generating a butterfly on a model trained with the Smithsonian Butterflies dataset. | |
Generative adversarial networks (GANs) and diffusion models are common architectures for this task. | |
Use Cases | |
Unconditional image generation can be used for a variety of applications. | |
Artistic Expression | |
Unconditional image generation can be used to create novel, original artwork that is not based on any existing images. This can be used to explore new creative possibilities and produce unique, imaginative images. | |
Data Augmentation | |
Unconditional image generation models can be used to generate new images to improve the performance of image recognition algorithms. This makes algorithms more robust and able to handle a broader range of images. | |
Virtual Reality | |
Unconditional image generation models can be used to create photorealistic images that can be used in virtual reality environments. This makes the VR experience more immersive and realistic. | |
Medical Imaging | |
Unconditional image generation models can generate new medical images, such as CT or MRI scans, that can be used to train and evaluate medical imaging algorithms. This can improve the accuracy and reliability of these algorithms. | |
Industrial Design | |
Unconditional image generation models can generate new designs for products, such as clothing or furniture, that are not based on any existing designs. This way, designers can explore new creative possibilities and produce unique, innovative designs." | |
CV,zero-shot-image-classification,"Zero-shot image classification is the task of classifying images into classes that were not seen during the training of a model. It is a computer vision task in which images are assigned to one of several classes without any prior training on, or knowledge of, those classes. | |
Zero-shot image classification works by transferring knowledge learnt during the training of one model to classify novel classes that were not present in the training data, so it is a variation of transfer learning. For instance, a model trained to differentiate cars from airplanes can be used to classify images of ships. | |
The data in this learning paradigm consists of: | |
Seen data - images and their corresponding labels | |
Unseen data - only labels and no images | |
Auxiliary information - additional information given to the model during training connecting the unseen and seen data. This can be in the form of textual description or word embeddings. | |
Use Cases | |
Image Retrieval | |
Zero-shot learning resolves several challenges in image retrieval systems. For example, with the rapid growth of categories on the web, it is challenging to index images based on unseen categories. With zero-shot learning we can associate unseen categories to images by exploiting attributes to model the relationships among visual features and labels. | |
Action Recognition | |
Action recognition is the task of identifying when a person in an image/video is performing a given action from a set of actions. If all the possible actions are not known beforehand, conventional deep learning models fail. With zero-shot learning, for a given domain of a set of actions, we can create a mapping connecting low-level features and a semantic description of auxiliary data to classify unknown classes of actions." | |
NLP ,conversational,"Conversational response modelling is the task of generating conversational text that is relevant, coherent and knowledgeable given a prompt. These models have applications in chatbots, and as a part of voice assistants. Chatbot | |
Chatbots are used to have conversations instead of providing direct contact with a live human. They are used to provide customer service, sales, and can even be used to play games (see ELIZA from 1966 for one of the earliest examples). | |
Voice Assistants | |
Conversational response models are used as part of voice assistants to provide appropriate responses to voice based queries." | |
NLP ,fill-mask,"Masked language modeling is the task of masking some of the words in a sentence and predicting which words should replace those masks. These models are useful when we want to get a statistical understanding of the language on which the model is trained. Domain Adaptation | |
Masked language models do not require labelled data! They are trained by masking a couple of words in sentences and the model is expected to guess the masked word. This makes it very practical! | |
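The masking step described above can be sketched as follows. This is a minimal illustration rather than the procedure of any particular model: the `mask_tokens` helper is hypothetical, the `[MASK]` placeholder and the default 15% rate are assumptions borrowed from BERT-style training, and whitespace splitting stands in for a real tokenizer.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token='[MASK]', seed=0):
    # Replace a random subset of tokens with the mask token.
    # labels holds the original word at masked positions and None elsewhere,
    # so the training targets come from the text itself and no annotation is needed.
    rng = random.Random(seed)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)   # the model is expected to guess this word
        else:
            masked.append(tok)
            labels.append(None)  # position is not scored
    return masked, labels

sentence = 'masked language models learn from plain unlabelled text'.split()
masked, labels = mask_tokens(sentence, mask_rate=0.3)
print(masked)
print(labels)
```

Because the labels are derived from the input, any collection of plain text in the target domain can serve as training data.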
For example, masked language modeling is used to train large models for domain-specific problems. If you have to work on a domain-specific task, such as retrieving information from medical research papers, you can train a masked language model using those papers. | |
The resulting model has a statistical understanding of the language used in medical research papers, and can be further trained in a process called fine-tuning to solve different tasks, such as Text Classification or Question Answering to build a medical research papers information extraction system. Pre-training on domain-specific data tends to yield better results (see this paper for an example). | |
If you don't have the data to train a masked language model, you can also use an existing domain-specific masked language model from the Hub and fine-tune it with your smaller task dataset. That's the magic of Open Source and sharing your work!" | |
NLP ,question-answering,"Question Answering models can retrieve the answer to a question from a given text, which is useful for searching for an answer in a document. Some question answering models can generate answers without context! Frequently Asked Questions | |
You can use Question Answering (QA) models to automate the response to frequently asked questions by using a knowledge base (documents) as context. Answers to customer questions can be drawn from those documents. | |
If you'd like to save inference time, you can first use passage ranking models to see which document might contain the answer to the question and iterate over that document with the QA model instead. | |
Task Variants | |
There are different QA variants based on the inputs and outputs: | |
Extractive QA: The model extracts the answer from a context. The context here could be a provided text, a table or even HTML! This is usually solved with BERT-like models. | |
Open Generative QA: The model generates free text directly based on the context. You can learn more about the Text Generation task on its page. | |
Closed Generative QA: In this case, no context is provided. The answer is completely generated by a model. | |
The schema above illustrates extractive, open book QA. The model takes a context and the question and extracts the answer from the given context. | |
You can also differentiate QA models depending on whether they are open-domain or closed-domain. Open-domain models are not restricted to a specific domain, while closed-domain models are restricted to a specific domain (e.g. legal, medical documents)." | |
NLP ,sentence-similarity,"Sentence Similarity is the task of determining how similar two texts are. Sentence similarity models convert input texts into vectors (embeddings) that capture semantic information and calculate how close (similar) they are to each other. This task is particularly useful for information retrieval and clustering/grouping. Information Retrieval | |
You can extract information from documents using Sentence Similarity models. The first step is to rank documents using Passage Ranking models. You can then take the top-ranked document and search it with Sentence Similarity models by selecting the sentence that is most similar to the input query. | |
The Sentence Transformers library | |
The Sentence Transformers library is very powerful for calculating embeddings of sentences, paragraphs, and entire documents. An embedding is just a vector representation of a text and is useful for finding how similar two texts are. | |
You can find and use hundreds of Sentence Transformers models from the Hub by directly using the library, playing with the widgets in the browser or using the Inference API. | |
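The idea of comparing two embeddings can be sketched without any model. This is a minimal illustration, assuming cosine similarity as the closeness measure (one common choice, not necessarily what a given model uses); the three-dimensional vectors are made-up stand-ins for real embeddings, which typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for a query and two candidate sentences.
query = [1.0, 0.0, 1.0]
sentences = {
    'close match': [0.9, 0.1, 0.8],
    'unrelated': [0.0, 1.0, 0.0],
}

# Pick the candidate whose embedding is closest to the query.
best = max(sentences, key=lambda name: cosine_similarity(query, sentences[name]))
print(best)
```

In practice the vectors would come from an embedding model, and the same comparison would be run over every sentence of the top-ranked document.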
Task Variants | |
Passage Ranking | |
Passage Ranking is the task of ranking documents based on their relevance to a given query. The task is evaluated on Mean Reciprocal Rank. These models take one query and multiple documents and return ranked documents according to the relevancy to the query." | |
NLP ,summarization,"Summarization is the task of producing a shorter version of a document while preserving its important information. Some models can extract text from the original input, while other models can generate entirely new text. Research papers can be summarized to allow researchers to spend less time selecting which articles to read. There are several approaches you can take for a task like this: | |
Use an existing extractive summarization model on the Hub to do inference. | |
Pick an existing language model trained for academic papers. This model can then be trained in a process called fine-tuning so it can solve the summarization task. | |
Use a sequence-to-sequence model like T5 for abstractive text summarization." | |
NLP ,table-question-answering,"Table Question Answering (Table QA) is the task of answering a question about information in a given table. SQL execution | |
You can use the Table Question Answering models to simulate SQL execution by inputting a table. | |
Table Question Answering | |
Table Question Answering models are capable of answering questions based on a table. | |
Task Variants | |
This place can be filled with variants of this task if there are any." | |
NLP ,text-classification,"Text Classification is the task of assigning a label or class to a given text. Some use cases are sentiment analysis, natural language inference, and assessing grammatical correctness. Sentiment Analysis on Customer Reviews | |
You can track the sentiments of your customers from the product reviews using sentiment analysis models. This can help understand churn and retention by grouping reviews by sentiment, to later analyze the text and make strategic decisions based on this knowledge. | |
Task Variants | |
Natural Language Inference (NLI) | |
In NLI the model determines the relationship between two given texts. Concretely, the model takes a premise and a hypothesis and returns a class that can either be: | |
entailment, which means the hypothesis is true. | |
contradiction, which means the hypothesis is false. | |
neutral, which means there's no relation between the hypothesis and the premise. | |
The benchmark dataset for this task is GLUE (General Language Understanding Evaluation). NLI models have different variants, such as Multi-Genre NLI, Question NLI and Winograd NLI." | |
NLP ,text-generation,"Generating text is the task of producing new text. These models can, for example, fill in incomplete text or paraphrase. This task covers guides on both text-generation and text-to-text generation models. Popular large language models that are used for chats or following instructions are also covered in this task. You can find the list of selected open-source large language models here, ranked by their performance scores. | |
Use Cases | |
Instruction Models | |
A model trained for text generation can be later adapted to follow instructions. One of the most used open-source models for instruction is OpenAssistant, which you can try at Hugging Chat. | |
Code Generation | |
A Text Generation model, also known as a causal language model, can be trained on code from scratch to help programmers with repetitive coding tasks. One of the most popular open-source models for code generation is StarCoder, which can generate code in 80+ languages. You can try it here. | |
Stories Generation | |
A story generation model can receive an input like ""Once upon a time"" and proceed to create a story-like text based on those first words. You can try this application which contains a model trained on story generation, by MosaicML. | |
If your generative model training data is different than your use case, you can train a causal language model from scratch. Learn how to do it in the free transformers course! | |
Task Variants | |
Completion Generation Models | |
A popular variant of Text Generation models predicts the next word given a sequence of words. Word by word, a longer text is formed, which makes it possible to, for example: | |
Given an incomplete sentence, complete it. | |
Continue a story given the first sentences. | |
Provided a code description, generate the code. | |
The most popular models for this task are GPT-based models (such as GPT-2). These models are trained on data that has no labels, so you just need plain text to train your own model. You can train GPT models to generate a wide variety of documents, from code to stories. | |
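The word-by-word completion loop described above can be sketched with a toy model. This is only an illustration of next-word prediction: a bigram frequency table stands in for a GPT-style network, and the function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    # Count which word follows which; unlabelled plain text is the only training signal.
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, prompt, max_new_words=5):
    # Greedy decoding: repeatedly append the most frequent follower of the last word.
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never seen with a continuation
        words.append(candidates.most_common(1)[0][0])
    return ' '.join(words)

corpus = 'once upon a time there was a cat . once upon a time there was a dog .'
model = train_bigrams(corpus)
story = generate(model, 'once upon', max_new_words=3)
print(story)
```

A real causal language model conditions on the whole preceding context instead of only the last word, but the generation loop has the same shape: predict one word, append it, and repeat.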
Text-to-Text Generation Models | |
These models are trained to learn the mapping between a pair of texts (e.g. translation from one language to another). The most popular variants of these models are T5, T0 and BART. Text-to-Text models are trained with multi-tasking capabilities, so they can accomplish a wide range of tasks, including summarization, translation, and text classification." | |
NLP ,token-classification,"Token classification is a natural language understanding task in which a label is assigned to some tokens in a text. Some popular token classification subtasks are Named Entity Recognition (NER) and Part-of-Speech (PoS) tagging. NER models could be trained to identify specific entities in a text, such as dates, individuals and places; and PoS tagging would identify, for example, which words in a text are verbs, nouns, and punctuation marks. Information Extraction from Invoices | |
You can extract entities of interest from invoices automatically using Named Entity Recognition (NER) models. Invoices can be read with Optical Character Recognition models and the output can be used to do inference with NER models. In this way, important information such as date, company name, and other named entities can be extracted. | |
Task Variants | |
Named Entity Recognition (NER) | |
NER is the task of recognizing named entities in a text. These entities can be the names of people, locations, or organizations. The task is formulated as labeling each token with a class for each named entity and a class named ""O"" (outside) for tokens that do not contain any entities. The input for this task is text and the output is the annotated text with named entities." | |
NLP ,translation,"Translation is the task of converting text from one language to another. You can find over a thousand Translation models on the Hub, but sometimes you might not find a model for the language pair you are interested in. When this happens, you can use a pretrained multilingual Translation model like mBART and further train it on your own data in a process called fine-tuning. | |
Multilingual conversational agents | |
Translation models can be used to build conversational agents across different languages. This can be done in two ways. | |
Translate the dataset to a new language. You can translate a dataset of intents (inputs) and responses to the target language. You can then train a new intent classification model with this new dataset. This allows you to proofread responses in the target language and have better control of the chatbot's outputs. | |
Translate the input and output of the agent. You can use a Translation model on user inputs so that the chatbot can process them. You can then translate the output of the chatbot into the language of the user. This approach might be less reliable as the chatbot will generate responses that were not defined before." | |
NLP ,zero-shot-classification,"Zero-shot text classification is a task in natural language processing where a model is trained on a set of labeled examples but is then able to classify new examples from previously unseen classes. Zero Shot Classification is the task of predicting a class that wasn't seen by the model during training. This method, which leverages a pre-trained language model, can be thought of as an instance of transfer learning which generally refers to using a model trained for one task in a different application than what it was originally trained for. This is particularly useful for situations where the amount of labeled data is small. | |
In zero shot classification, we provide the model with a prompt and a sequence of text that describes what we want our model to do, in natural language. Zero-shot classification excludes any examples of the desired task being completed. This differs from single or few-shot classification, as these tasks include a single or a few examples of the selected task. | |
Zero, single and few-shot classification seem to be an emergent feature of large language models. This feature seems to come about at model sizes of around 100M+ parameters. The effectiveness of a model at a zero, single or few-shot task seems to scale with model size, meaning that larger models (models with more trainable parameters or layers) generally do better at this task. | |
Here is an example of a zero-shot prompt for classifying the sentiment of a sequence of text: | |
Classify the following input text into one of the following three categories: [positive, negative, neutral] | |
Input Text: Hugging Face is awesome for making all of these state of the art models available! | |
Sentiment: positive | |
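A prompt in the shape shown above can be assembled programmatically. This is a minimal sketch: the `build_zero_shot_prompt` helper is hypothetical, and the wording simply mirrors the example prompt. The resulting string would be sent to a text generation model, which is expected to continue it with one of the labels.

```python
def build_zero_shot_prompt(text, labels):
    # No solved examples are included, which is what makes the prompt zero-shot.
    categories = ', '.join(labels)
    return (
        'Classify the following input text into one of the following '
        'categories: [' + categories + ']\n'
        'Input Text: ' + text + '\n'
        'Sentiment:'
    )

prompt = build_zero_shot_prompt(
    'Hugging Face is awesome for making all of these state of the art models available!',
    ['positive', 'negative', 'neutral'],
)
print(prompt)
```

Adding one or more solved examples before the final input line would turn the same template into a single-shot or few-shot prompt.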
One great example of this task with a nice off-the-shelf model is available at the widget of this page, where the user can input a sequence of text and candidate labels to the model. This is a word-level example of zero-shot classification; more elaborate and lengthy generations are available with larger models. Testing these models out and getting a feel for prompt engineering is the best way to learn how to use them." | |