Alibaba-NLP/gte-base-en-v1.5 trained on query-to-dataset-viewer-descriptions

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-base-en-v1.5 on the query-to-dataset-viewer-descriptions dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("query-to-dataset-viewer-descriptions")
# Run inference
sentences = [
    'USER_QUERY: kotlin code dataset',
    'HUB_DATASET_PREVIEW: DATASET_NAME: "mvasiliniuc/iva-kotlin-codeint"\nFEATURES: {\'repo_name\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'path\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'copies\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'size\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'content\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'license\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "repo_name": "Cognifide/gradle-aem-plugin",\n      "path": "src/main/kotlin/com/cognifide/gradle/aem/instance/tasks/InstanceReload.kt",\n      "copies": "1",\n      "size": "1052",\n      "content": "package com.cognifide.gradle.aem.instance.tasks\\n\\nimport com.cognifide.gradle.aem.common.instance.action.AwaitUpAction\\nimport com.cognifide.gradle.aem.common.instance.action.ReloadAction\\nimport com.cognifide.gradle.aem.common.instance.names\\nimport com.cognifide.gradle.aem.common.tasks.Instance\\nimport org.gradle.api.tasks.TaskAction\\n\\nopen class InstanceReload : Instance() {\\n\\n    private var reloadOptions: ReloadAction.() -> Unit = {}\\n\\n    fun reload(options: ReloadAction.() -> Unit) {\\n        this.reloadOptions = options\\n    }\\n\\n    private var awaitUpOptions: AwaitUpAction.() -> Unit = {}\\n\\n    fun awaitUp(options: AwaitUpAction.() -> Unit) {\\n        this.awaitUpOptions = options\\n    }\\n\\n    @TaskAction\\n    fun reload() {\\n        instanceManager.awaitReloaded(anyInstances, reloadOptions, awaitUpOptions)\\n        common.notifier.lifecycle(\\"Instance(s) reloaded\\", \\"Which: ${anyInstances.names}\\")\\n    }\\n\\n    init {\\n        description = \\"Reloads all AEM instance(s).\\"\\n    }\\n\\n    companion object {\\n        const val NAME = \\"instanceReload\\"\\n    }\\n}\\n",\n      "license": "apache-2.0"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "repo_name": "80998062/Fank",\n      "path": "presentation/src/main/java/com/sinyuk/fanfou/ui/status/StatusView.kt",\n      "copies": "1",\n      "size": "8490",\n      "content": "/*\\n *\\n *  * Apache License\\n *  *\\n *  * Copyright [2017] Sinyuk\\n *  *\\n *  * Licensed under the Apache License, Version 2.0 (the \\"License\\");\\n *  * you may not use this file except in compliance with the License.\\n *  * You may obtain a copy of the License at\\n *  *\\n *  *     http://www.apache.org/licenses/LICENSE-2.0\\n *  *\\n *  * Unless required by applicable law or agreed to in writing, software\\n *  * distributed under the License is distributed on an \\"AS IS\\" BASIS,\\n *  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\\n *  * See the License for the specific language governing permissions and\\n *  * limitations under the License.\\n *\\n */\\n\\npackage com.sinyuk.fanfou.ui.status\\n\\nimport android.os.Build\\nimport android.os.Bundle\\nimport android.support.v4.app.Fragment\\nimport android.support.v4.app.FragmentPagerAdapter\\nimport android.text.Editable\\nimport android.text.TextWatcher\\nimport android.view.View\\nimport android.view.ViewTreeObserver\\nimport cn.dreamtobe.kpswitch.util.KeyboardUtil\\nimport com.linkedin.android.spyglass.suggestions.SuggestionsResult\\nimport com.linkedin.android.spyglass.suggestions.interfaces.Suggestible\\nimport com.linkedin.android.spyglass.suggestions.interfaces.SuggestionsResultListener\\nimport com.linkedin.android.spyglass.suggestions.interfaces.SuggestionsVisibilityManager\\nimport com.linkedin.android.spyglass.tokenization.QueryToken\\nimport com.linkedin.android.spyglass.tokenization.impl.WordTokenizer\\nimport com.linkedin.android.spyglass.tokenization.impl.WordTokenizerConfig\\nimport com.linkedin.android.spyglass.tokenization.interfaces.QueryTokenReceiver\\nimport com.sinyuk.fanfou.R\\nimport com.sinyuk.fanfou.base.AbstractActivity\\nimport com.sinyuk.fanfou.base.AbstractFragment\\nimport com.sinyuk.fanfou.di.Injectable\\nimport com.sinyuk.fanfou.domain.DO.Player\\nimport com.sinyuk.fanfou.domain.DO.Status\\nimport com.sinyuk.fanfou.domain.STATUS_LIMIT\\nimport com.sinyuk.fanfou.domain.StatusCreation\\nimport com.sinyuk.fanfou.domain.TIMELINE_CONTEXT\\nimport com.sinyuk.fanfou.ui.editor.EditorView\\nimport com.sinyuk.fanfou.ui.editor.MentionListView\\nimport com.sinyuk.fanfou.ui.timeline.TimelineView\\nimport com.sinyuk.fanfou.util.obtainViewModelFromActivity\\nimport com.sinyuk.fanfou.viewmodel.FanfouViewModelFactory\\nimport com.sinyuk.fanfou.viewmodel.PlayerViewModel\\nimport kotlinx.android.synthetic.main.status_view.*\\nimport kotlinx.android.synthetic.main.status_view_footer.*\\nimport kotlinx.android.synthetic.main.status_view_reply_actionbar.*\\nimport javax.inject.Inject\\n\\n\\n/**\\n * Created by sinyuk on 2018/1/12.\\n *\\n */\\nclass StatusView : AbstractFragment(), Injectable, QueryTokenReceiver, SuggestionsResultListener, SuggestionsVisibilityManager {\\n\\n    companion object {\\n        fun newInstance(status: Status, photoExtra: Bundle? = null) = StatusView().apply {\\n            arguments = Bundle().apply {\\n                putParcelable(\\"status\\", status)\\n                putBundle(\\"photoExtra\\", photoExtra)\\n            }\\n        }\\n    }\\n\\n    override fun layoutId() = R.layout.status_view\\n\\n    @Inject\\n    lateinit var factory: FanfouViewModelFactory\\n\\n    private val playerViewModel by lazy { obtainViewModelFromActivity(factory, PlayerViewModel::class.java) }\\n\\n    override fun onEnterAnimationEnd(savedInstanceState: Bundle?) {\\n        super.onEnterAnimationEnd(savedInstanceState)\\n        navBack.setOnClickListener { onBackPressedSupport() }\\n        setupEditor()\\n        setupKeyboard()\\n        onTextChanged(0)\\n        setupViewPager()\\n\\n        val status = arguments!!.getParcelable<Status>(\\"status\\")\\n        fullscreenButton.setOnClickListener {\\n            (activity as AbstractActivity).start(EditorView.newInstance(status.id,\\n                    replyEt.mentionsText,\\n                    StatusCreation.REPOST_STATUS))\\n            replyEt.text = null\\n        }\\n    }\\n\\n    private fun setupViewPager() {\\n        val status = arguments!!.getParcelable<Status>(\\"status\\")\\n        val bundle = arguments!!.getBundle(\\"photoExtra\\")\\n        val fragments: List<Fragment> = if (findChildFragment(TimelineView::class.java) == null) {\\n            val mentionView = MentionListView()\\n            mentionView.onItemClickListener = onSuggestionSelectListener\\n            mutableListOf(TimelineView.contextTimeline(TIMELINE_CONTEXT, status, bundle), mentionView)\\n        } else {\\n            mutableListOf(findChildFragment(TimelineView::class.java), MentionListView())\\n        }\\n\\n        viewPager.setPagingEnabled(false)\\n        viewPager.offscreenPageLimit = 1\\n        viewPager.adapter = object : FragmentPagerAdapter(childFragmentManager) {\\n            override fun getItem(position: Int) = fragments[position]\\n\\n            override fun getCount() = fragments.size\\n        }\\n    }\\n\\n    private var keyboardListener: ViewTreeObserver.OnGlobalLayoutListener? = null\\n\\n    private fun setupKeyboard() {\\n        keyboardListener = KeyboardUtil.attach(activity, panelRoot, {\\n            // TODO: how comes the Exception: panelRootContainer must not be null\\n            panelRootContainer?.visibility =\\n                    if (it) {\\n                        if (replyEt.requestFocus()) replyEt.setSelection(replyEt.text.length)\\n                        View.VISIBLE\\n                    } else {\\n                        replyEt.clearFocus()\\n                        View.GONE\\n                    }\\n        })\\n    }\\n\\n    private val config = WordTokenizerConfig.Builder()\\n            .setExplicitChars(\\"@\\")\\n            .setThreshold(3)\\n            .setMaxNumKeywords(5)\\n            .setWordBreakChars(\\" \\").build()\\n\\n    private fun setupEditor() {\\n        replyEt.tokenizer = WordTokenizer(config)\\n        replyEt.setAvoidPrefixOnTap(true)\\n        replyEt.setQueryTokenReceiver(this)\\n        replyEt.setSuggestionsVisibilityManager(this)\\n        replyEt.setAvoidPrefixOnTap(true)\\n\\n        replyCommitButton.setOnClickListener { }\\n\\n        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O)\\n            textCountProgress.min = 0\\n        textCountProgress.max = STATUS_LIMIT\\n        replyEt.addTextChangedListener(object : TextWatcher {\\n            override fun afterTextChanged(s: Editable?) {\\n                onTextChanged(s?.length ?: 0)\\n            }\\n\\n            override fun beforeTextChanged(s: CharSequence?, start: Int, count: Int, after: Int) {\\n\\n            }\\n\\n            override fun onTextChanged(s: CharSequence?, start: Int, before: Int, count: Int) {\\n\\n            }\\n        })\\n    }\\n\\n\\n    /**\\n     * @param count \\u5b57\\u6570\\n     */\\n    private fun onTextChanged(count: Int) {\\n        textCountProgress.progress = count\\n        replyCommitButton.isEnabled = count in 1..STATUS_LIMIT\\n    }\\n\\n\\n    private val onSuggestionSelectListener = object : MentionListView.OnItemClickListener {\\n        override fun onItemClick(position: Int, item: Suggestible) {\\n            (item as Player).let {\\n                replyEt.insertMention(it)\\n                displaySuggestions(false)\\n                playerViewModel.updateMentionedAt(it) //\\n                onTextChanged(replyEt.text.length)\\n                replyEt.requestFocus()\\n                replyEt.setSelection(replyEt.text.length)\\n            }\\n        }\\n    }\\n\\n    @Suppress(\\"PrivatePropertyName\\")\\n    private val BUCKET = \\"player-mentioned\\"\\n\\n    override fun onQueryReceived(queryToken: QueryToken): MutableList<String> {\\n        val data = playerViewModel.filter(queryToken.keywords)\\n        onReceiveSuggestionsResult(SuggestionsResult(queryToken, data), BUCKET)\\n        return arrayOf(BUCKET).toMutableList()\\n    }\\n\\n    override fun onReceiveSuggestionsResult(result: SuggestionsResult, bucket: String) {\\n        val data = result.suggestions\\n        if (data?.isEmpty() != false) return\\n        displaySuggestions(true)\\n        findChildFragment(MentionListView::class.java).setData(data)\\n    }\\n\\n    override fun displaySuggestions(display: Boolean) {\\n        viewPager.setCurrentItem(if (display) 1 else 0, true)\\n    }\\n\\n    override fun isDisplayingSuggestions() = viewPager.currentItem == 1\\n\\n    override fun onBackPressedSupport(): Boolean {\\n        when {\\n            panelRootContainer.visibility == View.VISIBLE -> KeyboardUtil.hideKeyboard(panelRootContainer)\\n            isDisplayingSuggestions -> displaySuggestions(false)\\n            else -> pop()\\n        }\\n        return true\\n\\n    }\\n\\n    override fun onDestroy() {\\n        keyboardListener?.let { KeyboardUtil.detach(activity, it) }\\n        activity?.currentFocus?.let { KeyboardUtil.hideKeyboard(it) }\\n        super.onDestroy()\\n    }\\n\\n}",\n      "license": "mit"\n    },\n    "truncated_cells": []\n  }\n]',
    'NEGATIVE: DATASET_NAME: "vikp/starcoder_cleaned"\nFEATURES: {\'code\': {\'dtype\': \'string\', \'_type\': \'Value\'}, \'repo_path\': {\'dtype\': \'string\', \'_type\': \'Value\'}}\nDATA SAMPLE:\n[\n  {\n    "row_idx": 0,\n    "row": {\n      "code": "# ---\\n# jupyter:\\n#   jupytext:\\n#     text_representation:\\n#       extension: .py\\n#       format_name: light\\n#       format_version: \'1.5\'\\n#       jupytext_version: 1.14.4\\n#   kernelspec:\\n#     display_name: Python 3\\n#     language: python\\n#     name: python3\\n# ---\\n\\n# # 09 Strain Gage\\n#\\n# This is one of the most commonly used sensor.  It is used in many transducers.  Its fundamental operating principle is fairly easy to understand and it will be the purpose of this lecture. \\n#\\n# A strain gage is essentially a thin wire that is wrapped on film of plastic.  \\n# <img src=\\"img/StrainGage.png\\" width=\\"200\\">\\n# The strain gage is then mounted (glued) on the part for which the strain must be measured.  \\n# <img src=\\"img/Strain_gauge_2.jpg\\" width=\\"200\\">\\n#\\n# ## Stress, Strain\\n# When a beam is under axial load, the axial stress, $\\\\sigma_a$, is defined as:\\n# \\\\begin{align*}\\n# \\\\sigma_a = \\\\frac{F}{A}\\n# \\\\end{align*}\\n# with $F$ the axial load, and $A$ the cross sectional area of the beam under axial load.\\n#\\n# <img src=\\"img/BeamUnderStrain.png\\" width=\\"200\\">\\n#\\n# Under the load, the beam of length $L$ will extend by $dL$, giving rise to the definition of strain, $\\\\epsilon_a$:\\n# \\\\begin{align*}\\n# \\\\epsilon_a = \\\\frac{dL}{L}\\n# \\\\end{align*}\\n# The beam will also contract laterally: the cross sectional area is reduced by $dA$.  This results in a transverval strain $\\\\epsilon_t$.  The transversal and axial strains are related by the Poisson\'s ratio:\\n# \\\\begin{align*}\\n# \\\\nu = - \\\\frac{\\\\epsilon_t }{\\\\epsilon_a}\\n# \\\\end{align*}\\n# For a metal the Poission\'s ratio is typically $\\\\nu = 0.3$, for an incompressible material, such as rubber (or water), $\\\\nu = 0.5$.\\n#\\n# Within the elastic limit, the axial stress and axial strain are related through Hooke\'s law by the Young\'s modulus, $E$:\\n# \\\\begin{align*}\\n# \\\\sigma_a = E \\\\epsilon_a\\n# \\\\end{align*}\\n#\\n# <img src=\\"img/ElasticRegime.png\\" width=\\"200\\">\\n\\n# ## Resistance of a wire\\n#\\n# The electrical resistance of a wire $R$ is related to its physical properties (the electrical resistiviy, $\\\\rho$ in $\\\\Omega$/m) and its geometry: length $L$ and cross sectional area $A$.\\n#\\n# \\\\begin{align*}\\n# R = \\\\frac{\\\\rho L}{A}\\n# \\\\end{align*}\\n#\\n# Mathematically, the change in wire dimension will result inchange in its electrical resistance.  This can be derived from first principle:\\n# \\\\begin{align}\\n# \\\\frac{dR}{R} = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - \\\\frac{dA}{A}\\n# \\\\end{align}\\n# If the wire has a square cross section, then:\\n# \\\\begin{align*}\\n# A & = L\'^2 \\\\\\\\\\n# \\\\frac{dA}{A} & = \\\\frac{d(L\'^2)}{L\'^2} = \\\\frac{2L\'dL\'}{L\'^2} = 2 \\\\frac{dL\'}{L\'}\\n# \\\\end{align*}\\n# We have related the change in cross sectional area to the transversal strain.\\n# \\\\begin{align*}\\n# \\\\epsilon_t = \\\\frac{dL\'}{L\'}\\n# \\\\end{align*}\\n# Using the Poisson\'s ratio, we can relate then relate the change in cross-sectional area ($dA/A$) to axial strain $\\\\epsilon_a = dL/L$.\\n# \\\\begin{align*}\\n# \\\\epsilon_t &= - \\\\nu \\\\epsilon_a \\\\\\\\\\n# \\\\frac{dL\'}{L\'} &= - \\\\nu \\\\frac{dL}{L} \\\\; \\\\text{or}\\\\\\\\\\n# \\\\frac{dA}{A} & = 2\\\\frac{dL\'}{L\'} = -2 \\\\nu \\\\frac{dL}{L}\\n# \\\\end{align*}\\n# Finally we can substitute express $dA/A$ in eq. for $dR/R$ and relate change in resistance to change of wire geometry, remembering that for a metal $\\\\nu =0.3$:\\n# \\\\begin{align}\\n# \\\\frac{dR}{R} & = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - \\\\frac{dA}{A} \\\\\\\\\\n# & = \\\\frac{d\\\\rho}{\\\\rho} + \\\\frac{dL}{L} - (-2\\\\nu \\\\frac{dL}{L}) \\\\\\\\\\n# & = \\\\frac{d\\\\rho}{\\\\rho} + 1.6 \\\\frac{dL}{L} = \\\\frac{d\\\\rho}{\\\\rho} + 1.6 \\\\epsilon_a\\n# \\\\end{align}\\n# It also happens that for most metals, the resistivity increases with axial strain.  In general, one can then related the change in resistance to axial strain by defining the strain gage factor:\\n# \\\\begin{align}\\n# S = 1.6 + \\\\frac{d\\\\rho}{\\\\rho}\\\\cdot \\\\frac{1}{\\\\epsilon_a}\\n# \\\\end{align}\\n# and finally, we have:\\n# \\\\begin{align*}\\n# \\\\frac{dR}{R} =  S \\\\epsilon_a\\n# \\\\end{align*}\\n# $S$ is materials dependent and is typically equal to 2.0 for most commercially availabe strain gages. It is dimensionless.\\n#\\n# Strain gages are made of thin wire that is wraped in several loops, effectively increasing the length of the wire and therefore the sensitivity of the sensor.\\n#\\n# _Question:\\n#\\n# Explain why a longer wire is necessary to increase the sensitivity of the sensor_.\\n#\\n# Most commercially available strain gages have a nominal resistance (resistance under no load, $R_{ini}$) of 120 or 350 $\\\\Omega$.\\n#\\n# Within the elastic regime, strain is typically within the range $10^{-6} - 10^{-3}$, in fact strain is expressed in unit of microstrain, with a 1 microstrain = $10^{-6}$.  Therefore, changes in resistances will be of the same order.  If one were to measure resistances, we will need a dynamic range of 120 dB, whih is typically very expensive.  Instead, one uses the Wheatstone bridge to transform the change in resistance to a voltage, which is easier to measure and does not require such a large dynamic range.\\n\\n# ## Wheatstone bridge:\\n# <img src=\\"img/WheatstoneBridge.png\\" width=\\"200\\">\\n#\\n# The output voltage is related to the difference in resistances in the bridge:\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\frac{R_1R_3-R_2R_4}{(R_1+R_4)(R_2+R_3)}\\n# \\\\end{align*}\\n#\\n# If the bridge is balanced, then $V_o = 0$, it implies: $R_1/R_2 = R_4/R_3$.\\n#\\n# In practice, finding a set of resistors that balances the bridge is challenging, and a potentiometer is used as one of the resistances to do minor adjustement to balance the bridge.  If one did not do the adjustement (ie if we did not zero the bridge) then all the measurement will have an offset or bias that could be removed in a post-processing phase, as long as the bias stayed constant.\\n#\\n# If each resistance $R_i$ is made to vary slightly around its initial value, ie $R_i = R_{i,ini} + dR_i$.  For simplicity, we will assume that the initial value of the four resistances are equal, ie $R_{1,ini} = R_{2,ini} = R_{3,ini} = R_{4,ini} = R_{ini}$.  This implies that the bridge was initially balanced, then the output voltage would be:\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\frac{1}{4} \\\\left( \\\\frac{dR_1}{R_{ini}} - \\\\frac{dR_2}{R_{ini}} + \\\\frac{dR_3}{R_{ini}} - \\\\frac{dR_4}{R_{ini}} \\\\right)\\n# \\\\end{align*}\\n#\\n# Note here that the changes in $R_1$ and $R_3$ have a positive effect on $V_o$, while the changes in $R_2$ and $R_4$ have a negative effect on $V_o$.  In practice, this means that is a beam is a in tension, then a strain gage mounted on the branch 1 or 3 of the Wheatstone bridge will produce a positive voltage, while a strain gage mounted on branch 2 or 4 will produce a negative voltage.  One takes advantage of this to increase sensitivity to measure strain.\\n#\\n# ### Quarter bridge\\n# One uses only one quarter of the bridge, ie strain gages are only mounted on one branch of the bridge.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\frac{1}{4} \\\\epsilon_a S\\n# \\\\end{align*}\\n# Sensitivity, $G$:\\n# \\\\begin{align*}\\n# G = \\\\frac{V_o}{\\\\epsilon_a} = \\\\pm \\\\frac{1}{4}S V_s\\n# \\\\end{align*}\\n#\\n#\\n# ### Half bridge\\n# One uses half of the bridge, ie strain gages are mounted on two branches of the bridge.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\frac{1}{2} \\\\epsilon_a S\\n# \\\\end{align*}\\n#\\n# ### Full bridge\\n#\\n# One uses of the branches of the bridge, ie strain gages are mounted on each branch.\\n#\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} = \\\\pm \\\\epsilon_a S\\n# \\\\end{align*}\\n#\\n# Therefore, as we increase the order of bridge, the sensitivity of the instrument increases.  However, one should be carefull how we mount the strain gages as to not cancel out their measurement.\\n\\n# _Exercise_\\n#\\n# 1- Wheatstone bridge\\n#\\n# <img src=\\"img/WheatstoneBridge.png\\" width=\\"200\\">\\n#\\n# > How important is it to know \\\\& match the resistances of the resistors you employ to create your bridge?\\n# > How would you do that practically?\\n# > Assume $R_1=120\\\\,\\\\Omega$, $R_2=120\\\\,\\\\Omega$, $R_3=120\\\\,\\\\Omega$, $R_4=110\\\\,\\\\Omega$, $V_s=5.00\\\\,\\\\text{V}$.  What is $V_\\\\circ$?\\n\\nVs = 5.00\\nVo = (120**2-120*110)/(230*240) * Vs\\nprint(\'Vo = \',Vo, \' V\')\\n\\n# typical range in strain a strain gauge can measure\\n# 1 -1000 micro-Strain\\nAxialStrain = 1000*10**(-6)  # axial strain\\nStrainGageFactor = 2\\nR_ini = 120 # Ohm\\nR_1 = R_ini+R_ini*StrainGageFactor*AxialStrain\\nprint(R_1)\\nVo = (120**2-120*(R_1))/((120+R_1)*240) * Vs\\nprint(\'Vo = \', Vo, \' V\')\\n\\n# > How important is it to know \\\\& match the resistances of the resistors you employ to create your bridge?\\n# > How would you do that practically?\\n# > Assume $R_1= R_2 =R_3=120\\\\,\\\\Omega$, $R_4=120.01\\\\,\\\\Omega$, $V_s=5.00\\\\,\\\\text{V}$.  What is $V_\\\\circ$?\\n\\nVs = 5.00\\nVo = (120**2-120*120.01)/(240.01*240) * Vs\\nprint(Vo)\\n\\n# 2- Strain gage 1:\\n#\\n# One measures the strain on a bridge steel beam.  The modulus of elasticity is $E=190$ GPa.  Only one strain gage is mounted on the bottom of the beam; the strain gage factor is $S=2.02$.\\n#\\n# > a) What kind of electronic circuit will you use?  Draw a sketch of it.\\n#\\n# > b) Assume all your resistors including the unloaded strain gage are balanced and measure $120\\\\,\\\\Omega$, and that the strain gage is at location $R_2$.  The supply voltage is $5.00\\\\,\\\\text{VDC}$.  Will $V_\\\\circ$ be positive or negative when a downward load is added?\\n\\n# In practice, we cannot have all resistances = 120 $\\\\Omega$.  at zero load, the bridge will be unbalanced (show $V_o \\\\neq 0$). How could we balance our bridge?\\n#\\n# Use a potentiometer to balance bridge, for the load cell, we \'\'zero\'\' the instrument.\\n#\\n# Other option to zero-out our instrument? Take data at zero-load, record the voltage, $V_{o,noload}$.  Substract $V_{o,noload}$ to my data.\\n\\n# > c) For a loading in which $V_\\\\circ = -1.25\\\\,\\\\text{mV}$, calculate the strain $\\\\epsilon_a$ in units of microstrain.\\n\\n# \\\\begin{align*}\\n# \\\\frac{V_o}{V_s} & = - \\\\frac{1}{4} \\\\epsilon_a S\\\\\\\\\\n# \\\\epsilon_a & = -\\\\frac{4}{S} \\\\frac{V_o}{V_s}\\n# \\\\end{align*}\\n\\nS = 2.02\\nVo = -0.00125\\nVs = 5\\neps_a = -1*(4/S)*(Vo/Vs)\\nprint(eps_a)\\n\\n# > d) Calculate the axial stress (in MPa) in the beam under this load.\\n\\n\\n\\n# > e) You now want more sensitivity in your measurement, you install a second strain gage on to\\n\\n# p of the beam.  Which resistor should you use for this second active strain gage?\\n#\\n# > f) With this new setup and the same applied load than previously, what should be the output voltage?\\n\\n# 3- Strain Gage with Long Lead Wires \\n#\\n# <img src=\\"img/StrainGageLongWires.png\\" width=\\"360\\">\\n#\\n# A quarter bridge strain gage Wheatstone bridge circuit is constructed with $120\\\\,\\\\Omega$ resistors and a $120\\\\,\\\\Omega$ strain gage.  For this practical application, the strain gage is located very far away form the DAQ station and the lead wires to the strain gage are $10\\\\,\\\\text{m}$ long and the lead wire have a resistance of $0.080\\\\,\\\\Omega/\\\\text{m}$.  The lead wire resistance can lead to problems since $R_{lead}$ changes with temperature.\\n#\\n# > Design a modified circuit that will cancel out the effect of the lead wires.\\n\\n# ## Homework\\n#\\n",\n      "repo_path": "Lectures/09_StrainGage.ipynb"\n    },\n    "truncated_cells": []\n  },\n  {\n    "row_idx": 1,\n    "row": {\n      "code": "# ---\\n# jupyter:\\n#   jupytext:\\n#     split_at_heading: true\\n#     text_representation:\\n#       extension: .py\\n#       format_name: light\\n#       format_version: \'1.5\'\\n#       jupytext_version: 1.14.4\\n#   kernelspec:\\n#     display_name: Python 3\\n#     language: python\\n#     name: python3\\n# ---\\n\\n#export\\nfrom fastai.basics import *\\nfrom fastai.tabular.core import *\\nfrom fastai.tabular.model import *\\n\\nfrom fastai.tabular.data import *\\n\\n#hide\\nfrom nbdev.showdoc import *\\n\\n\\n# +\\n#default_exp tabular.learner\\n# -\\n\\n# # Tabular learner\\n#\\n# > The function to immediately get a `Learner` ready to train for tabular data\\n\\n# The main function you probably want to use in this module is `tabular_learner`. It will automatically create a `TabulaModel` suitable for your data and infer the irght loss function. See the [tabular tutorial](http://docs.fast.ai/tutorial.tabular) for an example of use in context.\\n\\n# ## Main functions\\n\\n#export\\n@log_args(but_as=Learner.__init__)\\nclass TabularLearner(Learner):\\n    \\"`Learner` for tabular data\\"\\n    def predict(self, row):\\n        tst_to = self.dls.valid_ds.new(pd.DataFrame(row).T)\\n        tst_to.process()\\n        tst_to.conts = tst_to.conts.astype(np.float32)\\n        dl = self.dls.valid.new(tst_to)\\n        inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)\\n        i = getattr(self.dls, \'n_inp\', -1)\\n        b = (*tuplify(inp),*tuplify(dec_preds))\\n        full_dec = self.dls.decode((*tuplify(inp),*tuplify(dec_preds)))\\n        return full_dec,dec_preds[0],preds[0]\\n\\n\\nshow_doc(TabularLearner, title_level=3)\\n\\n\\n# It works exactly as a normal `Learner`, the only difference is that it implements a `predict` method specific to work on a row of data.\\n\\n#export\\n@log_args(to_return=True, but_as=Learner.__init__)\\n@delegates(Learner.__init__)\\ndef tabular_learner(dls, layers=None, emb_szs=None, config=None, n_out=None, y_range=None, **kwargs):\\n    \\"Get a `Learner` using `dls`, with `metrics`, including a `TabularModel` created using the remaining params.\\"\\n    if config is None: config = tabular_config()\\n    if layers is None: layers = [200,100]\\n    to = dls.train_ds\\n    emb_szs = get_emb_sz(dls.train_ds, {} if emb_szs is None else emb_szs)\\n    if n_out is None: n_out = get_c(dls)\\n    assert n_out, \\"`n_out` is not defined, and could not be infered from data, set `dls.c` or pass `n_out`\\"\\n    if y_range is None and \'y_range\' in config: y_range = config.pop(\'y_range\')\\n    model = TabularModel(emb_szs, len(dls.cont_names), n_out, layers, y_range=y_range, **config)\\n    return TabularLearner(dls, model, **kwargs)\\n\\n\\n# If your data was built with fastai, you probably won\'t need to pass anything to `emb_szs` unless you want to change the default of the library (produced by `get_emb_sz`), same for `n_out` which should be automatically inferred. `layers` will default to `[200,100]` and is passed to `TabularModel` along with the `config`.\\n#\\n# Use `tabular_config` to create a `config` and cusotmize the model used. There is just easy access to `y_range` because this argument is often used.\\n#\\n# All the other arguments are passed to `Learner`.\\n\\npath = untar_data(URLs.ADULT_SAMPLE)\\ndf = pd.read_csv(path/\'adult.csv\')\\ncat_names = [\'workclass\', \'education\', \'marital-status\', \'occupation\', \'relationship\', \'race\']\\ncont_names = [\'age\', \'fnlwgt\', \'education-num\']\\nprocs = [Categorify, FillMissing, Normalize]\\ndls = TabularDataLoaders.from_df(df, path, procs=procs, cat_names=cat_names, cont_names=cont_names, \\n                                 y_names=\\"salary\\", valid_idx=list(range(800,1000)), bs=64)\\nlearn = tabular_learner(dls)\\n\\n#hide\\ntst = learn.predict(df.iloc[0])\\n\\n# +\\n#hide\\n#test y_range is passed\\nlearn = tabular_learner(dls, y_range=(0,32))\\nassert isinstance(learn.model.layers[-1], SigmoidRange)\\ntest_eq(learn.model.layers[-1].low, 0)\\ntest_eq(learn.model.layers[-1].high, 32)\\n\\nlearn = tabular_learner(dls, config = tabular_config(y_range=(0,32)))\\nassert isinstance(learn.model.layers[-1], SigmoidRange)\\ntest_eq(learn.model.layers[-1].low, 0)\\ntest_eq(learn.model.layers[-1].high, 32)\\n\\n\\n# -\\n\\n#export\\n@typedispatch\\ndef show_results(x:Tabular, y:Tabular, samples, outs, ctxs=None, max_n=10, **kwargs):\\n    df = x.all_cols[:max_n]\\n    for n in x.y_names: df[n+\'_pred\'] = y[n][:max_n].values\\n    display_df(df)\\n\\n\\n# ## Export -\\n\\n#hide\\nfrom nbdev.export import notebook2script\\nnotebook2script()\\n\\n\\n",\n      "repo_path": "nbs/43_tabular.learner.ipynb"\n    },\n    "truncated_cells": []\n  }\n]',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Triplet

Metric Value
cosine_accuracy 1.0
dot_accuracy 0.0
manhattan_accuracy 1.0
euclidean_accuracy 1.0
max_accuracy 1.0

Training Details

Training Dataset

query-to-dataset-viewer-descriptions

  • Dataset: query-to-dataset-viewer-descriptions
  • Size: 1,141 training samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    query positive negative
    type string string string
    details
    • min: 9 tokens
    • mean: 11.72 tokens
    • max: 19 tokens
    • min: 40 tokens
    • mean: 2018.88 tokens
    • max: 8192 tokens
    • min: 41 tokens
    • mean: 2125.25 tokens
    • max: 8192 tokens
  • Samples:
    query positive negative
    USER_QUERY: LLM paper dataset HUB_DATASET_PREVIEW: DATASET_NAME: "MarkrAI/AutoRAG-evaluation-2024-LLM-paper-v1"
    FEATURES: {'doc_id': {'dtype': 'string', '_type': 'Value'}, 'contents': {'dtype': 'string', '_type': 'Value'}, 'metadata': {'creation_datetime': {'dtype': 'string', '_type': 'Value'}, 'file_name': {'dtype': 'string', '_type': 'Value'}, 'file_path': {'dtype': 'string', '_type': 'Value'}, 'file_size': {'dtype': 'int64', '_type': 'Value'}, 'file_type': {'dtype': 'null', '_type': 'Value'}, 'last_accessed_datetime': {'dtype': 'string', '_type': 'Value'}, 'last_modified_datetime': {'dtype': 'string', '_type': 'Value'}}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "doc_id": "6f86094c-47fe-43de-a77a-e8c34c69c997",
    "contents": "# Rag-Driver: Generalisable Driving Explanations With Retrieval-Augmented In-Context Learning In Multi-Modal Large Language Model\n\nJianhao Yuan1, Shuyang Sun1, Daniel Omeiza1, Bo Zhao2, Paul Newman1, Lars Kunze1, Matthew Gadd1\n1 University of Oxford 2 Beijing Academy of Artificial Intelligence\n{jianhaoyuan,kevinsun,daniel,pnewman,lars,mattgadd}@robots.ox.ac.uk \nAbstract\u2014Robots powered by 'blackbox' models need to provide\nhuman-understandable explanations which we can trust. Hence,\nexplainability plays a critical role in trustworthy autonomous\ndecision-making to foster transparency and acceptance among\nend users, especially in complex autonomous driving. Recent\nadvancements in Multi-Modal Large Language models (MLLMs)\nhave shown promising potential in enhancing the explainability\nas a driving agent by producing control predictions along with\nnatural language explanations. However, severe data scarcity\ndue to expensive annotation costs and significant domain gaps\nbetween different datasets makes the development of a robust and\ngeneralisable system an extremely challenging task. Moreover, the\nprohibitively expensive training requirements of MLLM and the\nunsolved problem of catastrophic forgetting further limit their\ngeneralisability post-deployment. To address these challenges, we\npresent RAG-Driver, a novel retrieval-augmented multi-modal\nlarge language model that leverages in-context learning for high-\nperformance, explainable, and generalisable autonomous driving.\nBy grounding in retrieved expert demonstration, we empirically\nvalidate that RAG-Driver achieves state-of-the-art performance in\nproducing driving action explanations, justifications, and control\nsignal prediction. More importantly, it exhibits exceptional zero-\nshot generalisation capabilities to unseen environments without \nfurther training endeavours1.\nIndex Terms\u2014Autonomous driving, multi-modal language\nmodel, end-to-end driving, domain generalisation",
    "metadata": {
    "creation_datetime": "2024-03-04",
    "file_name": "2402.10828v1.md",
    "file_path": "paper_data/2402.10828v1.md",
    "file_size": 64885,
    "file_type": null,
    "last_accessed_datetime": "2024-03-04",
    "last_modified_datetime": "2024-02-22"
    }
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "doc_id": "cf485ad0-8ec4-4a63-a0c6-5d7eb499c0c8",
    "contents": "# Rag-Driver: Generalisable Driving Explanations With Retrieval-Augmented In-Context Learning In Multi-Modal Large Language Model\n## I. Introduction\n\nDriven by the emerging development of deep learning, autonomous driving has observed a paradigm shift from rulesbased decision systems [66, 21] to data-driven learning-based approaches [28, 6, 36]. However, this comes at the cost of transparency in decision-making, especially for end-to-end autonomous driving systems which are considered black-box in nature [13]. Thus, in addition to precision in action control, explanation provision is key in ensuring trustworthy decisionmaking to reconcile the system's decisions with end-user expectations to foster confidence and acceptance [79, 8, 57] in dynamic driving environments. \nTraditional approaches have mainly relied on attention visualisation [5, 7, 55] as a proxy to rationalise the decisions of the black-box systems or auxiliary intermediate tasks such as semantic segmentation [25, 32], object detection [16, 31], and affordance prediction [68, 45] provide meaningful intermediate representation for decision-making. However, these methods do not engage end-users in the dialogue as they are onedirectional and not readily comprehensible by the general users for the purpose of fostering trust and confidence. An alternative promising approach is the integration of natural language explanations [38, 33, 54], in particular through Multi-Modal Large Language Models (MLLMs) [1, 70]. These models, pretrained on extensive web-scale datasets, demonstrate remarkable reasoning capacity, enabling the transformation of complex vehicular decision-making processes into more understandable narrative formats, thereby offering a new layer of explainability to conventional systems. \nWhile several early attempts have demonstrated the potential of MLLMs as general explainable driving agents [78, 76, 51], these methods fall short of human-level understanding. One of the limitations is their failure to generalise to unseen environments. A primary obstacle is the lack of high-quality annotated data [56], coupled with the significant domain shift across various datasets [23], which hinders the models' generalisation capacity to novel environments outside of the training data distribution. Another critical challenge is the prohibitively expensive training requirement and the unsolved problem of catastrophic forgetting [39], which make re-training or finetuning impractical solutions due to the immense computational demands and severe performance degradation. Consequently, this further limits the models' generalisability after deployment, as they struggle to effectively utilise new data in constantly evolving environments and driving scenarios. \nTo address these challenges, we introduce RAG-Driver, a novel retrieval-augment",
    "metadata": {
    "creation_datetime": "2024-03-04",
    "file_name": "2402.10828v1.md",
    "file_path": "paper_data/2402.10828v1.md",
    "file_size": 64885,
    "file_type": null,
    "last_accessed_datetime": "2024-03-04",
    "last_modified_datetime": "2024-02-22"
    }
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "emozilla/dolma-v1_7-arxiv"
    FEATURES: {'text': {'dtype': 'string', '_type': 'Value'}, 'id': {'dtype': 'string', '_type': 'Value'}, 'metadata': {'file_path': {'dtype': 'string', '_type': 'Value'}}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "text": "\section{Introduction}\nLet $G$ be a simple undirected graph with the \textit{vertex set} $V(G)$ and the \textit{edge set} $E(G)$. A vertex with degree one is called a \textit{pendant vertex}. The distance between the vertices $u$ and $v$ in graph $G$ is denoted by $d_G(u,v)$. A cycle $C$ is called \textit{chordless} if $C$ has no \textit{cycle chord} (that is an edge not in the edge set of $C$ whose endpoints lie on the vertices of $C$).\nThe \textit{Induced subgraph} on vertex set $S$ is denoted by $\langle S\rangle$. A path that starts in $v$ and ends in $u$ is denoted by $\stackrel\frown{v u}$.\nA \textit{traceable} graph is a graph that possesses a Hamiltonian path.\nIn a graph $G$, we say that a cycle $C$ is \textit{formed by the path} $Q$ if $
    USER_QUERY: code vulnerability dataset HUB_DATASET_PREVIEW: DATASET_NAME: "benjis/bigvul"
    FEATURES: {'CVE ID': {'dtype': 'string', '_type': 'Value'}, 'CVE Page': {'dtype': 'string', '_type': 'Value'}, 'CWE ID': {'dtype': 'string', 'type': 'Value'}, 'codeLink': {'dtype': 'string', 'type': 'Value'}, 'commit_id': {'dtype': 'string', 'type': 'Value'}, 'commit_message': {'dtype': 'string', 'type': 'Value'}, 'func_after': {'dtype': 'string', 'type': 'Value'}, 'func_before': {'dtype': 'string', 'type': 'Value'}, 'lang': {'dtype': 'string', 'type': 'Value'}, 'project': {'dtype': 'string', 'type': 'Value'}, 'vul': {'dtype': 'int8', 'type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "CVE ID": "CVE-2017-7586",
    "CVE Page": "https://www.cvedetails.com/cve/CVE-2017-7586/",
    "CWE ID": "CWE-119",
    "codeLink": "https://github.com/erikd/libsndfile/commit/708e996c87c5fae77b104ccfeb8f6db784c32074",
    "commit_id": "708e996c87c5fae77b104ccfeb8f6db784c32074",
    "commit_message": "src/ : Move to a variable length header buffer\n\nPreviously, the psf->header buffer was a fixed length specified by\nSF_HEADER_LEN which was set to 12292. This was problematic for\ntwo reasons; this value was un-necessarily large for the majority\nof files and too small for some others.\n\nNow the size of the header buffer starts at 256 bytes and grows as\nnecessary up to a maximum of 100k.",
    "func_after": "psf_get_date_str (char *str, int maxlen)\n{\ttime_t\t\tcurrent ;\n\tstruct tm\ttimedata, tmptr ;\n\n\ttime (¤t) ;\n\n#if defined (HAVE_GMTIME_R)\n\t/ If the re-entrant version is available, use it. /\n\ttmptr = gmtime_r (¤t, &timedata) ;\n#elif defined (HAVE_GMTIME)\n\t/ Otherwise use the standard one and copy the data to local storage. /\n\ttmptr = gmtime (¤t) ;\n\tmemcpy (&timedata, tmptr, sizeof (timedata)) ;\n#else\n\ttmptr = NULL ;\n#endif\n\n\tif (tmptr)\n\t\tsnprintf (str, maxlen, "%4d-%02d-%02d %02d:%02d:%02d UTC",\n\t\t\t1900 + timedata.tm_year, timedata.tm_mon, timedata.tm_mday,\n\t\t\ttimedata.tm_hour, timedata.tm_min, timedata.tm_sec) ;\n\telse\n\t\tsnprintf (str, maxlen, "Unknown date") ;\n\n\treturn ;\n} / psf_get_date_str */\n",
    "func_before": "psf_get_date_str (char *str, int maxlen)\n{\ttime_t\t\tcurrent ;\n\tstruct tm\ttimedata, tmptr ;\n\n\ttime (¤t) ;\n\n#if defined (HAVE_GMTIME_R)\n\t/ If the re-entrant version is available, use it. /\n\ttmptr = gmtime_r (¤t, &timedata) ;\n#elif defined (HAVE_GMTIME)\n\t/ Otherwise use the standard one and copy the data to local storage. /\n\ttmptr = gmtime (¤t) ;\n\tmemcpy (&timedata, tmptr, sizeof (timedata)) ;\n#else\n\ttmptr = NULL ;\n#endif\n\n\tif (tmptr)\n\t\tsnprintf (str, maxlen, "%4d-%02d-%02d %02d:%02d:%02d UTC",\n\t\t\t1900 + timedata.tm_year, timedata.tm_mon, timedata.tm_mday,\n\t\t\ttimedata.tm_hour, timedata.tm_min, timedata.tm_sec) ;\n\telse\n\t\tsnprintf (str, maxlen, "Unknown date") ;\n\n\treturn ;\n} / psf_get_date_str */\n",
    "lang": "C",
    "project": "libsndfile",
    "vul": 0
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "CVE ID": "CVE-2018-18352",
    "CVE Page": "https://www.cvedetails.com/cve/CVE-2018-18352/",
    "CWE ID": "CWE-732",
    "codeLink": "https://github.com/chromium/chromium/commit/a9cbaa7a40e2b2723cfc2f266c42f4980038a949",
    "commit_id": "a9cbaa7a40e2b2723cfc2f266c42f4980038a949",
    "commit_message": "Simplify "WouldTaintOrigin" concept in media/blink\n\nCurrently WebMediaPlayer has three predicates:\n - DidGetOpaqueResponseFromServiceWorker\n - HasSingleSecurityOrigin\n - DidPassCORSAccessCheck\n. These are used to determine whether the response body is available\nfor scripts. They are known to be confusing, and actually\nMediaElementAudioSourceHandler::WouldTaintOrigin misuses them.\n\nThis CL merges the three predicates to one, WouldTaintOrigin, to remove\nthe confusion. Now the "response type" concept is available and we\ndon't need a custom CORS check, so this CL removes\nBaseAudioContext::WouldTaintOrigin. This CL also renames\nURLData::has_opaque_data
    and its (direct and indirect) data accessors\nto match the spec.\n\nBug: 849942, 875153\nChange-Id: I6acf50169d7445c4ff614e80ac606f79ee577d2a\nReviewed-on: https://chromium-review.googlesource.com/c/1238098\nReviewed-by: Fredrik Hubinette hubbe@chromium.org\nReviewed-by: Kinuko Yasuda kinuko@chromium.org\nReviewed-by: Raymond Toy rtoy@chromium.org\nCommit-Queue: Yutaka Hirano yhirano@chromium.org\nCr-Commit-Position: refs/heads/master@{#598258}",
    "func_after": "void MultibufferDataSource::CreateResourceLoader(int64_t first_byte_position,\n int64_t last_byte_position) {\n DCHECK(render_task_runner
    ->BelongsToCurrentThread());\n\n SetReader(new MultiBufferReader(\n url_data()->multibuffer(), first_byte_position, last_byte_position,\n base::Bind(&MultibufferDataSource::ProgressCallback, weak_ptr
    )));\n reader
    ->SetIsClientAudioElement(is_client_audio_element
    );\n UpdateBufferSizes();\n}\n",
    "func_before": "void MultibufferDataSource::CreateResourceLoader(int64_t first_byte_position,\n int64_t last_byte_position) {\n DCHECK(render_task_runner
    ->BelongsToCurrentThread());\n\n SetReader(new MultiBufferReader(\n url_data()->multibuffer(), first_byte_position, last_byte_position,\n base::Bind(&MultibufferDataSource::ProgressCallback, weak_ptr
    )));\n reader
    ->SetIsClientAudioElement(is_client_audio_element
    );\n UpdateBufferSizes();\n}\n",
    "lang": "C",
    "project": "Chrome",
    "vul": 0
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "sfakhoury/NL2Fix"
    FEATURES: {'defects4j_project': {'dtype': 'string', '_type': 'Value'}, 'defects4j_bug_id': {'dtype': 'string', '_type': 'Value'}, 'file_path': {'dtype': 'string', '_type': 'Value'}, 'bug_start_line': {'dtype': 'string', '_type': 'Value'}, 'bug_end_line': {'dtype': 'string', '_type': 'Value'}, 'issue_title': {'dtype': 'string', '_type': 'Value'}, 'issue_description': {'dtype': 'string', '_type': 'Value'}, 'original_src': {'dtype': 'string', '_type': 'Value'}, 'original_src_wo_comments': {'dtype': 'string', '_type': 'Value'}, 'fixed_src': {'dtype': 'string', '_type': 'Value'}, 'fixed_src_wo_comments': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "defects4j_project": "Math",
    "defects4j_bug_id": "19",
    "file_path": "src/main/java/org/apache/commons/math3/optimization/direct/CMAESOptimizer.java",
    "bug_start_line": "504",
    "bug_end_line": "561",
    "issue_title": "Wide bounds to CMAESOptimizer result in NaN parameters passed to fitness function",
    "issue_description": "If you give large values as lower/upper bounds (for example -Double.MAX_VALUE as a lower bound), the optimizer can call the fitness function with parameters set to NaN. My guess is this is due to FitnessFunction.encode/decode generating NaN when normalizing/denormalizing parameters. For example, if the difference between the lower and upper bound is greater than Double.MAX_VALUE, encode could divide infinity by infinity.",
    "original_src": "private void checkParameters() {\n final double[] init = getStartPoint();\n final double[] lB = getLowerBound();\n final double[] uB = getUpperBound();\n\n // Checks whether there is at least one finite bound value.\n boolean hasFiniteBounds = false;\n for (int i = 0; i < lB.length; i++) {\n if (!Double.isInfinite(lB[i])
    USER_QUERY: english korean translation dataset HUB_DATASET_PREVIEW: DATASET_NAME: "yoonjae22/Aihub_translate"
    FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}, 'text': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "input": "Bible Coloring' is a coloring application that allows you to experience beautiful stories in the Bible.",
    "output": "'Bible Coloring'\uc740 \uc131\uacbd\uc758 \uc544\ub984\ub2e4\uc6b4 \uc774\uc57c\uae30\ub97c \uccb4\ud5d8 \ud560 \uc218 \uc788\ub294 \uceec\ub7ec\ub9c1 \uc571\uc785\ub2c8\ub2e4.",
    "instruction": "Please translate the English sentence into Korean.",
    "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nBible Coloring' is a coloring application that allows you to experience beautiful stories in the Bible.\n\n###Response:\n'Bible Coloring'\uc740 \uc131\uacbd\uc758 \uc544\ub984\ub2e4\uc6b4 \uc774\uc57c\uae30\ub97c \uccb4\ud5d8 \ud560 \uc218 \uc788\ub294 \uceec\ub7ec\ub9c1 \uc571\uc785\ub2c8\ub2e4."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "input": "Do you work at a City bank?",
    "output": "\uc528\ud2f0\uc740\ud589\uc5d0\uc11c \uc77c\ud558\uc138\uc694?",
    "instruction": "Please translate the English sentence into Korean.",
    "text": "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\nDo you work at a City bank?\n\n###Response:\n\uc528\ud2f0\uc740\ud589\uc5d0\uc11c \uc77c\ud558\uc138\uc694?"
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "werty1248/EnKo-Translation-LongTextOnly-dedup"
    FEATURES: {'english': {'dtype': 'string', '_type': 'Value'}, 'korean': {'dtype': 'string', '_type': 'Value'}, 'from': {'dtype': 'string', '_type': 'Value'}, 'category': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "english": "ROOFTOP GREENING STRUCTURETo provide a structure firmly and easily installing a house cultivation arch-like aggregate in a rooftop greening structure. This rooftop greening structure includes pressingly fixing each of support stands 24 in each of support stand line groups 24A to a rooftop slab surface through a greening support layer 6, using a fastener which pierces into the greening support layer 6, and steps over between each of the support stands 24 and the rooftop slab surface 2, and installing a holding member 36 for holding a house cultivation arch-like aggregate 50 each on the upper end surface of each of the support stands 24 in each of the support stand line groups 24A. As a result of this, the support stand 24 which has stiffness higher than the greening support layer 6, and is firmly fixed to the rooftop slab surface 2 through the greening support layer 6 is used for holding the end part of the arch-like aggregate 50. The holding member 36 for holding the end part of the arch-like aggregate 50 is installed on the upper end surface of the support stand 24 so as to suppress the holding member 36 from burying in soil and increase the visibility.In a rooftop greening structure in which a greening support layer is formed by laying a plurality of greening support panels on the rooftop floor and soil is arranged on the greening support layer, a pair of support stands are placed on the greening support layer. The rows are arranged so as to be separated from each other, and each of the support rows is arranged upright so as to form a row with a plurality of supports having higher rigidity than the greening support layer. Each support pedestal in each support pedestal row group is configured through the greening support layer by using a fastener that penetrates the greening support layer and straddles between each support pedestal and the rooftop floor surface. It is characterized in that it is pressed and fixed to the rooftop floor surface, and the upper end surface of each support stand in each support stand row group is provided with a holding portion for holding an arch-shaped aggregate for house cultivation. Rooftop greening structure.",
    "korean": "\uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc5d0 \uc788\uc5b4\uc11c \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac\ub97c \uacac\uace0\ud558\uace0 \uc6a9\uc774\ud558\uac8c \uace0\uc815\ud558\ub294 \uad6c\uc870\ub97c \uc81c\uacf5\ud55c\ub2e4. \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70 24 A\uc758 \uac01 \uc9c0\uc9c0\ub300 24\ub97c \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \uad00\ud1b5\ud574 \uac01 \uc9c0\uc9c0\ub300 24\uc640 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74 2 \uc0ac\uc774\ub97c \ub118\ub294 \uace0\uc815\uad6c\ub97c \uc774\uc6a9\ud568\uc73c\ub85c\uc368, \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \ud1b5\ud574 \uc0c1\uae30 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74\uc5d0 \uac00\uc555 \uace0\uc815\ud558\uace0 \uadf8 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70 24 A\uc758 \uac01 \uc9c0\uc9c0\ub300 24\uc758 \uc0c1\ub2e8\uba74\uc5d0 \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac 50\uc744 \uc9c0\uc9c0\ud558\uae30 \uc704\ud55c \uc9c0\uc9c0 \ubd80\uc7ac 36\uc744 \uac01\uac01 \ub9c8\ub828\ud55c\ub2e4. \uc774\uac83\uc5d0 \uc758\ud574 \uc544\uce58\ud615 \uace8\uc7ac 50\uc758 \ub2e8\ubd80\ub97c \uc9c0\uc9c0\ud558\ub294 \uac83\uc73c\ub85c\uc11c \ub179\ud654 \uc9c0\uc6d0\uce35 6\ubcf4\ub2e4 \uac15\uc131\uc774 \ub192\uace0 \uc625\uc0c1 \uc2ac\ub798\ube0c\uba74 2\uc5d0 \ub179\ud654 \uc9c0\uc6d0\uce35 6\uc744 \ud1b5\ud574 \uc81c\ub300\ub85c \uace0\uc815\ub41c \uc9c0\uc9c0\ub300 24\uac00 \uc774\uc6a9\ub418\ub3c4\ub85d \ud55c\ub2e4. \ub610\ud55c \uc544\uce58\ud615 \uace8\uc7ac 50\uc758 \ub2e8\ubd80\ub97c \uc9c0\uc9c0\ud558\ub294 \uc9c0\uc9c0 \ubd80\uc7ac 36\uc744 \uc9c0\uc9c0\ub300 24\uc758 \uc0c1\ub2e8\uba74\uc5d0 \ub9c8\ub828\ud568\uc73c\ub85c\uc368, \ud1a0\uc591\uc5d0 \ud30c\ubb3b\ud788\ub294 \uac83\uc744 \uc5b5\uc81c\ud558\uace0 \uadf8 \uc9c0\uc9c0 \ubd80\uc7ac 36\uc758 \uc2dc\uc778\uc131\uc744 \ud5a5\uc0c1\uc2dc\ud0a8\ub2e4.\uc625\uc0c1 \ubc14\ub2e5\uba74\uc0c1\uc5d0 \ubcf5\uc218\uc758 \ub179\ud654 \uc9c0\uc6d0 \ud328\ub110\uc744 \ubd80\uc124\ud568\uc73c\ub85c\uc368 \ub179\ud654 \uc9c0\uc6d0\uce35\uc774 \ud615\uc131\ub418\uace0 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc0c1\uc5d0 \ud1a0\uc591\uc774 \ubc30\uc124\ub418\ub294 \uc625\uc0c1 \ub179\ud654 \uad6c\uc870\uc5d0 \uc788\uc5b4\uc11c \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc0c1\uc5d0 \ud55c \uc30d\uc758 \uc9c0\uc9c0\ub300\ub82c\uad70\uc774 \uc11c\ub85c \uc774\uaca9\ub41c \uc0c1\ud0dc\ub97c \uac00\uc9c0\uace0 \ubc30\uce58\ub418\uace0 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc774 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\ubcf4\ub2e4 \uac15\uc131\uc774 \ud5a5\uc0c1\ub41c \ubcf5\uc218\uc758 \uc9c0\uc9c0\ub300\ub97c \uac04\uaca9\uc744 \ub450\uba74\uc11c \uc5f4\uc744 \uc774\ub8e8\ub3c4\ub85d \uc785\uc124 \ubc30\uce58\ud568\uc73c\ub85c\uc368 \uad6c\uc131\ub418\uace0 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc758 \uac01 \uc9c0\uc9c0\ub300\uac00 \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc744 \uad00\ud1b5\ud574 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\uc640 \uc0c1\uae30 \uc625\uc0c1 \ubc14\ub2e5\uba74 \uc0ac\uc774\ub97c \ub118\ub294 \uace0\uc815\uad6c\ub97c \uc774\uc6a9\ud568\uc73c\ub85c\uc368, \uc0c1\uae30 \ub179\ud654 \uc9c0\uc6d0\uce35\uc744 \ud1b5\ud574 \uc0c1\uae30 \uc625\uc0c1 \ubc14\ub2e5\uba74\uc5d0 \uac00\uc555 \uace0\uc815\ub418\uc5b4 \uc0c1\uae30 \uac01 \uc9c0\uc9c0\ub300\ub82c\uad70\uc758 \uac01 \uc9c0\uc9c0\ub300\uc758 \uc0c1\ub2e8\uba74\uc5d0\ub294 \ud558\uc6b0\uc2a4 \uc7ac\ubc30\uc6a9 \uc544\uce58\ud615 \uace8\uc7ac\ub97c \uc9c0\uc9c0\ud558\uae30 \uc704\ud55c \uc9c0\uc9c0\ubd80\uac00 \uac01\uac01 \uad6c\ube44\ub418\uc5b4 \uc788\ub294, \uac83\uc744 \ud2b9\uc9d5\uc73c\ub85c \ud558\ub294 \uc625\uc0c1 \ub179\ud654 \uad6c\uc870.",
    "from": "nayohan/aihub-en-ko-translation-12m",
    "category": "full"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "english": "Native chicken breeding methodThe invention discloses a native chicken breeding method, which includes steps that the shield degree of a breeding grove is 60-65%; a native chicken activity field with area of 5-8 mu is encircled bya 1.8-2.2m high nylon mesh; a ventilating and warming device is arranged in a henhouse; feed and water are delivered at 8: 00-15: 00 in every morning, and native chicken are put in grove activity field at 15:00-17: 00 in the afternoon; music is displayed at 17: 00-18: 30, and feed is delivered at outside of the henhouse to domesticize the native chickens , and then chickens are returned to the henhouse; the henhouse is cleaned at intervals of 12-15 days; the henhouse is sterilized by an automatic sterilizing system during the stocking period in the afternoon at intervals of 3-5 days. The native chicken breeding method can well consider about the stocking time, thus the stocking instinct of the native chickens is well guaranteed, the food intake of the native chickens is increased throughthe reasonable captive time; the meat growth is accelerated, the breeding cycle is shortened, and the meat quality of the native chickens is ensured.A kind of 1. cultural method of chicken, it is characterised in that\uff1ait the described method comprises the following steps\uff1a\uff081\uff09selection cultivation ground\uff1aselection away from livestock and poultry transaction place, slaughtering field, chemical plant, garbage disposal plant, avoid air, dust, water source, germ and the cultivation of the woods of noise pollution, the moon degree of covering of the woods is 6065%, with 1.82.2 meters of high nylon net circle area is 58 mu of chicken playground, and vegetable seeds is broadcasted sowing in forest land\uff1b\uff082\uff09build chicken house\uff1athe wind sheltering in woods ground on the sunny side, hen house is built in the chicken playground centre position that physical features is high and dry, draining blowdown condition is good, and ventilation heating is set in hen house equipment, hen house is interior to set automatic sterilizing system\uff1b\uff083\uff09select kind\uff1aselect it is resistance to it is extensive, action flexibly, the pure native that power of looking for food is strong, premunition is strong\uff1b\uff084\uff09dietary management\uff1aevery mu of forest land puts 260280 in a suitable place to breed, every morning 8:0015:feed and water are launched in stable breeding when 00, afternoon 15:0017:it is put into forest land playground when 00 to put in a suitable place to breed, 17:0018:dispensing feed outside music colony house is played when 30 to enter row domestication makes chicken return to colony house, and day temperature is maintained at 2023 degrees celsius in circle, and nocturnal temperature is maintained at 2023 degrees celsius\uff1b \uff085\uff09disinfectant management\uff1ato being cleaned in hen house, colony house is started certainly during chicken is put in a suitable place to breed afternoon within every 35 days within every 1215 days dynamic disinfection system is sterilized, and lime powder for every 23 months to the main passageway in woods forest land.",
    "korean": "\ud1a0\uc885\ub2ed \uc0ac\uc721\ubc29\ubc95\uc774 \ubc1c\uba85\ud488\uc740 \uc0ac\uc721\uc7a5\uc758 \ubc29\ud328\ub3c4\uac00 6065%\uc778 \ud1a0\uc885\ub2ed \uc0ac\uc721\ubc95\uc744 \uacf5\uac1c\ud558\uace0 \uc788\uc73c\uba70, \uba74\uc801\uc774 58m\uc778 \ud1a0\uc885\ub2ed \ud65c\ub3d9\uc7a5\uc744 1.82.2m \ub192\uc774\uc758 \ub098\uc77c\ub860 \uba54\uc2dc\ub85c \ub458\ub7ec\uc2f8\uace0 \uc788\uc73c\uba70, \ub2ed\uc7a5\uc5d0 \ud658\uae30 \ubc0f \ub09c\ubc29 \uc7a5\uce58\uac00 \ubc30\uce58\ub418\uc5b4 \uc788\uc73c\uba70, \ub9e4\uc77c \uc544\uce68 8\uc2dc15\ubd84\uc5d0 \uc0ac\ub8cc\uc640 \ubb3c\uc774 \uc804\ub2ec\ub418\uace0 \uc788\ub2e4. \uadf8\ub9ac\uace0 \ud1a0\uc885\ub2ed\uc740 \uc624\ud6c4 15:00-17:00\uc5d0 \uc232 \ud65c\ub3d9\uc7a5\uc5d0 \ud22c\uc785\ub418\uace0, 17: 00-18:30\uc5d0\ub294 \uc74c\uc545\uc774 \uc5f0\uc8fc\ub418\uba70, \ubaa8\uc774\ub294 \ub2ed\uc7a5 \ubc16\uc5d0\uc11c \ubc30\ub2ec\uc744 \ubc1b\uc544 \ud1a0\uc885\ub2ed\uc744 \uae38\ub4e4\uc774\uace0, \ub2ed\uc7a5\uc740 12-15\uc77c \uac04\uaca9\uc73c\ub85c \ub2ed\uc7a5\uc73c\ub85c \ub3cc\ub824\ubcf4\ub0b8\ub2e4; \ub2ed\uc7a5\uc740 \uc790\ub3d9\uc18c\ub3c5\ub41c\ub2e4.c \uc624\ud6c4\uc758 \ubcf4\uad00 \uae30\uac04 \ub3d9\uc548 35\uc77c \uac04\uaca9\uc73c\ub85c \uba78\uade0 \uc2dc\uc2a4\ud15c. \ud1a0\uc885\ub2ed \uc0ac\uc721\ubc95\uc740 \uc0ac\uc721 \uc2dc\uac04\uc744 \uc798 \uace0\ub824\ud560 \uc218 \uc788\uae30 \ub54c\ubb38\uc5d0 \ud1a0\uc885\ub2ed\uc758 \uc0ac\uc721 \ubcf8\ub2a5\uc774 \uc798 \ubcf4\uc7a5\ub418\uace0, \ud1a0\uc885\ub2ed\uc758 \uba39\uc774 \uc12d\ucde8\uac00 \uc801\uc808\ud55c \ud3ec\ud68d \uc2dc\uac04\uc744 \ud1b5\ud574 \uc99d\uac00\ud55c\ub2e4; \uc721\uc2dd \uc131\uc7a5\uc774 \uac00\uc18d\ud654\ub418\uace0, \ubc88\uc2dd \uc8fc\uae30\uac00 \uc9e7\uc544\uc9c0\uba70, \ud1a0\uc885\ub2ed\uc758 \uc721\uc9c8\ub3c4 e\uc774\ub2e4.\ub204\uc5d0\uc288\uc5b4\ub2ed\uc758 \uc77c\uc885\uc73c\ub85c, \ubb18\uc0ac\ub41c \ubc29\ubc95\uc740 \ub2e4\uc74c\uacfc \uac19\uc740 \ub2e8\uacc4\ub85c \uad6c\uc131\ub41c\ub2e4: \uff091select\uc120\uc815\uc7ac\ubc30\uc7a5: \uac00\ucd95\uacfc \uac00\uae08\ub958 \uac70\ub798\uc7a5\uc18c\ub85c\ubd80\ud130\uc758 \uc120\ud0dd, \ub3c4\ucd95\uc7a5, \ud654\ud559\uacf5\uc7a5, \uc4f0\ub808\uae30 \ucc98\ub9ac\uc7a5, \uacf5\uae30, \uba3c\uc9c0, \uc218\uc6d0, \uc138\uade0, \uadf8\ub9ac\uace0 \uc232\uc758 \ubb34\uade0 \uc7ac\ubc30\uc774\uc138\uc624\uc5fc, \uc232\uc758 \ub2ec\uc758 \ub36e\uc784\ub3c4\ub294 6065%\uc774\uace0, \ub192\uc740 \ub098\uc77c\ub860 \uadf8\ubb3c\ub9dd \uba74\uc801 1.82.2m\ub294 \ub2ed \ub180\uc774\ud130\uc758 58mu\uc774\uba70, \uc232 \uc18d\uc5d0 \ucc44\uc18c \uc528\uc557\uc744 \ubfcc\ub9ac\ub294 \uac83\uc744 \ubc29\uc1a1\ud55c\ub2e4. \uc2e0\uccb4\uc801 \ud2b9\uc9d5\uc774 \ub192\uace0 \uac74\uc870\ud558\uba70 \ubc30\uc218 \ube14\ub85c\uc6b0\ub2e4\uc6b4 \uc870\uac74\uc774 \uc88b\ub2e4, \uadf8\ub9ac\uace0 \ud658\uae30 \ub09c\ubc29\uc740 \ub2ed\uc9d1 \uc7a5\ube44\uc5d0 \uc124\uc815\ub41c\ub2e4, \ub2ed\uc9d1\uc740 \uc790\ub3d9 \uc0b4\uade0 \uc2dc\uc2a4\ud15c\uc744 \uc124\uc815\ud558\uae30 \uc704\ud55c \ub0b4\ubd80\uc774\ub2e4;33selectselect cind;select codelt it's \uad11\ubc94\uc704\ud558\uace0, \uc720\uc5f0\ud558\uac8c \uc791\uc6a9\ud558\uba70, \uc74c\uc2dd\uc744 \ucc3e\ub294 \ud798\uc774 \uac15\ud55c \uc21c\uc218\ud55c \ud1a0\uc885, \uc608\uac10\uc774 \uac15\ud558\ub2e4;select4aary \uad00\ub9ac:\uc784\uc57c\uc758 \ubaa8\ub4e0 \ubba4\ub294 260280\ubc88\uc2dd\uc744 \ud558\uae30\uc5d0 \uc801\ud569\ud55c \uc7a5\uc18c\uc5d0 \ubc30\uce58\ud558\uace0, \ub9e4\uc77c \uc544\uce68 8:0015:\uc0ac\ub8cc\uc640 \ubb3c\uc740 00\ubc88\uc2dd\uc744 \ud560 \ub54c \uc548\uc815\uc801\uc778 \ubc88\uc2dd\uc9c0\ub85c \ud22c\uc785\ud558\uace0, 17:0018:\uc74c\uc545\uc9d1 \uc678\ubd80\uc758 \uc0ac\ub8cc\ub4e4\uc774 30\ubc88 \uc904\uc5d0 \ub4e4\uc5b4\uc11c\uba74 \uc7ac\uc0dd\ub429\ub2c8\ub2e4.\ub2ed\uc758 \uad70\uc9d1 \ubcf5\uadc0\ub294 \uc544\uc774\ub514\ucf00\uc774\uc158\uc73c\ub85c, \ub0ae \uae30\uc628\uc740 \uc6d0\uc8fc 2023\ub3c4, \uc57c\ud589\uc131 \uc628\ub3c4\ub294 2023\ub3c4\ub97c \uc720\uc9c0\ud558\uba70, \u30105\u3011\uc911\uc694\ud55c \uad00\ub9ac:\ub2ed\uc9d1 \uccad\uc18c\ub294 \ubc18\ub4dc\uc2dc \uc2dc\uc791\ud558\uba70, \ub2ed\uc740 3\ub144\ub9c8\ub2e4 \uc624\ud6c4\ub9c8\ub2e4 \ubc88\uc2dd\ud558\uae30\uc5d0 \uc801\ud569\ud55c \uc7a5\uc18c\uc5d0 \ub454\ub2e4.1215\uc77c \uc774\ub0b4\uc5d0\ub294 5\uc77c \uc774\ub0b4 \ub3d9\uc801\uc18c\ub3c5\uc2dc\uc2a4\ud15c\uc774 \uba78\uade0 \ucc98\ub9ac\ub418\uba70, \uc232\uc18d\uc758 \uc8fc\ud1b5\ub85c\ub85c 2~3\uac1c\uc6d4\ub9c8\ub2e4 \ub77c\uc784\ud30c\uc6b0\ub354\uac00 \ud22c\uc785\ub41c\ub2e4.",
    "from": "nayohan/aihub-en-ko-translation-12m",
    "category": "full"
    },
    "truncated_cells": []
    }
    ]
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Evaluation Dataset

query-to-dataset-viewer-descriptions

  • Dataset: query-to-dataset-viewer-descriptions at eb9d1be
  • Size: 1,433 evaluation samples
  • Columns: query, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    query positive negative
    type string string string
    details
    • min: 10 tokens
    • mean: 11.75 tokens
    • max: 17 tokens
    • min: 46 tokens
    • mean: 2188.82 tokens
    • max: 8192 tokens
    • min: 33 tokens
    • mean: 1769.64 tokens
    • max: 8192 tokens
  • Samples:
    query positive negative
    USER_QUERY: multimodal conversation dataset HUB_DATASET_PREVIEW: DATASET_NAME: "BUAADreamer/llava-en-zh-2k"
    FEATURES: {'messages': [{'role': {'dtype': 'string', 'type': 'Value'}, 'content': {'dtype': 'string', 'type': 'Value'}}], 'images': {'feature': {'type': 'Image'}, 'type': 'Sequence'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "messages": [
    {
    "role": "user",
    "content": "How many baseball players are visible in the image?"
    },
    {
    "role": "assistant",
    "content": "There are three baseball players visible in the image."
    },
    {
    "role": "user",
    "content": "What are the players holding in their hands?"
    },
    {
    "role": "assistant",
    "content": "The players are holding baseball bats in their hands."
    },
    {
    "role": "user",
    "content": "Are the players in a dugout?"
    },
    {
    "role": "assistant",
    "content": "Yes, the three baseball players are standing in the dugout."
    },
    {
    "role": "user",
    "content": "Is the image in color or black and white?"
    },
    {
    "role": "assistant",
    "content": "The image is an old black and white photo of the three baseball players."
    },
    {
    "role": "user",
    "content": "Do the players belong to a specific baseball team?"
    },
    {
    "role": "assistant",
    "content": "Yes, the players belong to the Boston Red Sox baseball team."
    }
    ],
    "images": [
    {
    "src": "https://datasets-server.huggingface.co/assets/BUAADreamer/llava-en-zh-2k/--/fba994c834822bddd3cd79e929c33135f4289d2b/--/en/train/0/images/image-1d100e9.jpg?Expires=1726591851&Signature=QHGD147HyWamORfSjz0QoG51Ru86g3STPBNDAEOLK7NTq8Y~b4vVt3u~XF9njlRWwNnVF7AQ8-l9f2pCWxggPnZw1wZEfBAC5Q1oOW2CwT-gMYME~I-9qeJrbQtkszer9U0-H5rkECK0DVgWKKIagyjjJBmSSLz2QAb3BlV5aEMGEbJj-4G1X2xzoVaMIfg~ogGSb6~2PAKoLoKMPUZIxrm~2LoPEGVJeXr7JIa8-ISVukshW8ZEDQII~eXfOBeEsTd1Mk5BpJz26dH3W7X2rdhq54KeQ8H9OBt~HlU0XbXigFABeMymb-GBH-fDNHUDmb9Zt3io5hBMOd7I64DLyQ
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 308,
    "width": 384
    }
    ]
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "messages": [
    {
    "role": "user",
    "content": "What is the man wearing in the image?"
    },
    {
    "role": "assistant",
    "content": "The man in the image is wearing chaps, a cowboy hat, and is dressed like a cowboy."
    },
    {
    "role": "user",
    "content": "What is the cowboy doing in the image?"
    },
    {
    "role": "assistant",
    "content": "The cowboy is riding on a horse in a grassy field and pointing into the distance, likely directing attention towards a specific location or object."
    },
    {
    "role": "user",
    "content": "What kind of landscape surrounds the cowboy and his horse?"
    },
    {
    "role": "assistant",
    "content": "The cowboy and his horse are surrounded by a grassy field, which creates a natural and open setting for the scene."
    },
    {
    "role": "user",
    "content": "What is the significance of the cowboy attire and its relation to horse riding?"
    },
    {
    "role": "assistant",
    "content": "Cowboy attire is deeply rooted in the history, culture, and traditions of the American West. It is both functional and symbolic, representing the hardworking and rugged lifestyle of cowboys who spent much of their time working with livestock and riding horses.\n\nThe key elements of cowboy attire, like the cowboy hat and chaps, serve specific purposes:\n1. Cowboy Hat: A wide-brimmed hat made of stiffened felt or straw is designed to protect the cowboy from the sun, wind, and rain. The hat's distinctive high crown allows for air circulation and provides a touch of style.\n2. Chaps: These are sturdy leather leg coverings worn over pants to protect the cowboy's legs from rough terrain, thorny bushes, and other elements while riding or working with livestock. Chaps provide an additional layer of protection and durability, particularly while riding through rugged landscapes and herding cattle.\n\nIn summary, the cowboy attire seen in the image is not just a fashion statement but is deeply connected to the history and traditions of the American West. The functional pieces of clothing, like the cowboy hat and chaps, are designed to protect and support the cowboy during horse riding and working with livestock."
    }
    ],
    "images": [
    {
    "src": "https://datasets-server.huggingface.co/assets/BUAADreamer/llava-en-zh-2k/--/fba994c834822bddd3cd79e929c33135f4289d2b/--/en/train/1/images/image-1d100e9.jpg?Expires=1726591851&Signature=WyNDGZXVbzPOU9iOQSDPFt1MizgmdT-KqdVAG8nIVSK0Gg8OO-qmhKxgIVjyWMHnWyNbW5svuMoukPMyv9hiHMsNh0YmzdjMR9Gwb6mRvsisEAdaLl71Q053MYxEqkZWCB6PbXG5yEazHL4RHvDphsUEhZS-0Yk8Kzx0HHc12HNaJfiO4fO4IPkY3eLw5xLgNoKIcvvO9TDo0JEbc1ej6YkxGUdqXyVrG2Y4zYnhrCM0drgKVzq24cQ9YZ78HW5f-EsXsftbj0ZzEg4SKcuVgrqaKG8SJ~i0aV-OtkXiTCWxW16D4hfsmpXZShZAHesa1EOGprkYdtQG4Kfte12maQ
    &Key-Pair-Id=K3EI6M078Z3AC3",
    "height": 288,
    "width": 384
    }
    ]
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "passing2961/photochat_plus"
    FEATURES: {'photo_description': {'dtype': 'string', '_type': 'Value'}, 'trigger_sentences': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'dialogue_id': {'dtype': 'int64', '_type': 'Value'}, 'photo_url': {'dtype': 'string', '_type': 'Value'}, 'dialogue': [{'message': {'dtype': 'string', '_type': 'Value'}, 'share_photo': {'dtype': 'bool', '_type': 'Value'}, 'user_id': {'dtype': 'int64', '_type': 'Value'}}], 'image_descriptions': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'intents': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'salient_information': {'feature': {'dtype': 'string', '_type': 'Value'}, '_type': 'Sequence'}, 'photo_id': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "photo_description": "The photo has your brother Kannon. Objects in the photo: Man",
    "trigger_sentences": [
    "How is Kannon doing?"
    ],
    "dialogue_id": 500,
    "photo_url": "https://farm6.staticflickr.com/151/369716968_bde7e83418_o.jpg",
    "dialogue": [
    {
    "message": "Hello, how have you been, dear friend?",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "Great!",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Thanks for asking",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "And how have you been?",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "It seems like we haven't talked in forever",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "I have been doing well, keeping busy, spent a lot of time outdoors. What have you been up to?",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "Last night my brother Kannon did a poetry reading",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Really? How did it go? You know how much I love poetry.",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "It went really well",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Do you remember my brother Kannon?",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Absolutely! How could I forget, he left quite an impression",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "How is Kannon doing?",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "",
    "share_photo": true,
    "user_id": 0
    },
    {
    "message": "Great",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Here is a photo from last night",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Wow, he seems so confident in that pic! Wish that I could have been there.",
    "share_photo": false,
    "user_id": 1
    }
    ],
    "image_descriptions": [
    "A photo of Kannon",
    "A picture of Kannon.",
    "a photo of recent situation"
    ],
    "intents": [
    "Information Dissemination",
    "Social Bonding"
    ],
    "salient_information": [
    "poetry",
    "How is Kannon doing?",
    "Kannon doing"
    ],
    "photo_id": "train/19e8f436d4b2fc25"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "photo_description": "The photo has your uncle Kieran. Objects in the photo: Clothing, Man",
    "trigger_sentences": [
    "guess what new animal he got?",
    "He's always had goats and chickens, but guess what new animal he got?"
    ],
    "dialogue_id": 501,
    "photo_url": "https://farm8.staticflickr.com/53/189664134_f70fc8947a_o.jpg",
    "dialogue": [
    {
    "message": "Hey! You remember my uncle who owns the hobby farm, right?",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Yeah i do",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "Uncle Keiran?",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "How about him?",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "He's always had goats and chickens, but guess what new animal he got?",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "Dog?",
    "share_photo": false,
    "user_id": 1
    },
    {
    "message": "Nope, a wild hog!",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "And not the motorcycle kind ;)",
    "share_photo": false,
    "user_id": 0
    },
    {
    "message": "",
    "share_photo": true,
    "user_id": 0
    },
    {
    "message": "Wow",
    "share_photo": false,
    "user_id": 1
    }
    ],
    "image_descriptions": [
    "A photo of the hog's appearance.",
    "a photo of wild hog",
    "An image of the new wild hog"
    ],
    "intents": [
    "Social Bonding",
    "Visual Clarification"
    ],
    "salient_information": [
    "hog",
    "not the motorcycle kind",
    "wild hog",
    "a wild hog"
    ],
    "photo_id": "train/07d688f5e2142b87"
    },
    "truncated_cells": []
    }
    ]
    USER_QUERY: question answering dataset for technical documentation HUB_DATASET_PREVIEW: DATASET_NAME: "hythyt/preguntas-respuestas-RAG"
    FEATURES: {'DATOS': {'dtype': 'string', '_type': 'Value'}, 'PREGUNTA': {'dtype': 'string', '_type': 'Value'}, 'RESPUESTA': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "DATOS": "ARISbyhyt Manual de Usuario\nSecci\u00f3n 8\nARISbyhyt es un programa dise\u00f1ado para gestionar proyectos de forma eficiente. Permite crear tareas, asignar recursos y realizar un seguimiento detallado del progreso. Para agregar una nueva tarea, accede al men\u00fa "Proyectos" y selecciona "Nueva Tarea". Completa la informaci\u00f3n requerida, como nombre, fecha de inicio y asignaci\u00f3n de recursos.\n\nSecci\u00f3n 4\nEn ARISbyhyt, puedes personalizar tus informes para obtener datos espec\u00edficos sobre el progreso del proyecto. Accede al men\u00fa "Informes", elige la opci\u00f3n "Personalizar Informe" y selecciona los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase.",
    "PREGUNTA": "\u00bfC\u00f3mo puedo crear una nueva tarea en ARISbyhyt?",
    "RESPUESTA": "Para agregar una nueva tarea en ARISbyhyt, sigue estos pasos:\n1. Ve al men\u00fa "Proyectos".\n2. Selecciona "Nueva Tarea".\n3. Completa la informaci\u00f3n solicitada, como nombre, fecha de inicio y asignaci\u00f3n de recursos."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "DATOS": "ARISbyhyt Manual de Usuario \nSecci\u00f3n 2\nEn ARISbyhyt, puedes personalizar tus informes para obtener datos espec\u00edficos sobre el progreso del proyecto. Accede al men\u00fa "Informes", elige la opci\u00f3n "Personalizar Informe" y selecciona los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase.",
    "PREGUNTA": "\u00bfC\u00f3mo puedo personalizar un informe en ARISbyhyt para obtener datos espec\u00edficos sobre el progreso del proyecto?",
    "RESPUESTA": "Para personalizar un informe en ARISbyhyt, sigue estos pasos:\n1. Dir\u00edgete al men\u00fa "Informes".\n2. Selecciona "Personalizar Informe".\n3. Elige los campos que deseas incluir, como el estado de las tareas o el tiempo dedicado a cada fase."
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "cmalaviya/expertqa"
    FEATURES: {'example_id': {'dtype': 'int64', '_type': 'Value'}, 'context': {'dtype': 'string', '_type': 'Value'}, 'question': {'dtype': 'string', '_type': 'Value'}, 'answer': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "example_id": 0,
    "context": "",
    "question": "Some customers of mine are not paying their debts on time. Do I have to keep all my customers?",
    "answer": "You don't necessarily have to keep all your customers, especially if they consistently fail to pay their debts on time. There are different types of non-paying customers, such as cash-strapped, purposefully late, and non-payer by nature . It is essential to maintain a positive attitude and treat your customers with respect while trying to collect their debts . However, if you consistently face issues with particular customers not paying their debts, you may opt to discontinue providing services or products to them and focus on other reliable customers. You may need to consult a professional debt collector or a business attorney in such cases to decide the appropriate next steps in debt collections . To prevent nonpayment issues in the future, you can implement various strategies, such as researching new prospects, being clear with your payment policies, and setting up contracts detailing payment expectations and late fees ."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "example_id": 1,
    "context": "",
    "question": "When accounts are faced with ethical dilemmas that often bring their integrity into question, the question is whether they are equipped enough tp deal with those?",
    "answer": " The context provided does not give specific information on whether accountants are adequately equipped to handle ethical dilemmas that could question their integrity. The text does suggest, however, that when faced with an ethical dilemma, one must question the situation honestly and transparently. And, if doubts persist, they have the obligation to raise these questions with those in authority . This suggests the need for a strong understanding of ethics to navigate such situations. The text also implies a correlation between integrity and ethics stating, "Integrity can be measured by ethics" . In a broader perspective, the text suggests that professionals, like nurses for example, often face dilemmas uncommon to the general populace . Due to the rapid advancement in medical technology, the study of ethics has become increasingly relevant, indicating that equipping professionals with adequate knowledge in ethics is necessary to navigate the demands and challenges of their roles effectively . Furthermore, it shows that managers grapple with ethical decisions involving questions of morality and integrity especially in situations where prior decisions by other managers create ethical dilemmas . While this analysis provides general insights on the significance of ethical decision-making and the need for professionals to navigate ethical dilemmas effectively, it does not provide a specific commentary on the readiness or the adequacy of training or framework available to accountants to deal with such scenarios. Hence, it is not possible to definitively answer the question based on the context provided. In South Africa SAICA has equipped accountants with code of professional conduct that they should follow when faced with ethical dilemmas. the code gives them guidance on how to deal with those. SAICA code of professional conduct https://www.misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor freely express these thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to https://misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, https://www.misti.co.uk/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top 3 people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, reflected in honesty and harmony in what one thinks, SAICA equip accountants with all the relevant information in order to be able to identify ethical dilemmas https://www.misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor freely express these thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to https://misti.com/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor thoughts and ideas, the culture may be sending the wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top three people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, https://www.misti.co.uk/internal-audit-insights/ethics-and-the-internal-auditor Ethics and the Internal Auditor wrong message. When you are personally faced with an ethical dilemma, you must ask yourself whether you are looking at the situation as honestly and transparently as you can. If questions still arise, it is your obligation to raise those questions to individuals in positions of responsibility. Integrity can be measured by ethics If someone had you name the top 3 people in history that you felt displayed unquestionable integrity, would those same individuals measure high on the ethics scale? Most likely they would. Integrity is adherence to a moral code, reflected in honesty and harmony in what one thinks, https://www.bartleby.com/essay/The-Ethical-Dilemma-Of-A-Family-Nurse-F3H66JS4CPLLX The Ethical Dilemma Of A Family Nurse Practitioner
    USER_QUERY: conversational ai training dataset HUB_DATASET_PREVIEW: DATASET_NAME: "dataautogpt3/flan1m-alpaca-uncensored"
    FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "instruction": "You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.",
    "input": "Continue the following story.\n\nEmily held and rocked Kayla as they sobbed because \nSelect from the following.\n +Emily really needed a good friend.\n +Kayla really needed a good friend.",
    "output": "Once upon a time, Emily held and rocked Kayla as they both sobbed because Kayla really needed a good friend. You see, little Kayla was feeling very lonely and sad because she didn't have anyone to play with or talk to.\n\nEmily wanted to help Kayla and be her friend, so she did something truly special. Emily decided to throw a magical friendship party for Kayla. Emily knew that this would cheer her up and help her make some new friends!\n\nEmily worked really hard, preparing for the big day. She bought balloons, baked cookies, and even created a treasure hunt for everyone to enjoy. She invited lots of other kids from their neighborhood, so Kayla could meet them all and find a friend to play with.\n\nOn the day of the party, the sun was shining, and the sky was filled with fluffy clouds. All the kids in the neighborhood were excited to come to the party and play with Kayla.\n\nEmily and Kayla welcomed their guests with huge smiles and warm hugs. The kids played games, laughed, and explored the treasure hunt that Emily had set up all around the backyard.\n\nSoon, Kayla was surrounded by new friends, chatting and laughing together. She began to feel so happy and loved. As the sun began to set and the party ended, Kayla felt like she finally had found the friends she needed.\n\nThat night, as Emily tucked Kayla into bed, Kayla thanked her for being such an amazing friend and throwing the best party ever. Emily smiled and said, "That's what friends are for, to help each other when we're feeling sad and lonely."\n\nAnd from that day on, Emily and Kayla were the best of friends and shared many more magical adventures together. Now, Kayla knew that she always had a good friend by her side."
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "instruction": "You are an AI assistant that helps people find information. User will you give you a question. Your task is to answer as faithfully as you can. While answering think step-bystep and justify your answer.",
    "input": "For this chain-of-thought reasoning and answer, what was the question?\nA boy playing ball is not necessarily not very good at playing ball. There is no way to know what level of skill the boy has.\n A: it is not possible to tell",
    "output": "The question likely was:\n\n"Does a boy playing ball automatically mean he is not very good at playing ball?""
    },
    "truncated_cells": []
    }
    ]
    NEGATIVE: DATASET_NAME: "sgp-bench/open-instruct-gpt4o_40k"
    FEATURES: {'instruction': {'dtype': 'string', '_type': 'Value'}, 'input': {'dtype': 'string', '_type': 'Value'}, 'output': {'dtype': 'string', '_type': 'Value'}}
    DATA SAMPLE:
    [
    {
    "row_idx": 0,
    "row": {
    "instruction": "Can you tell me how cookies work for tracking user data?",
    "input": "",
    "output": "Sure! Cookies are small pieces of data that websites can store on your computer, allowing them to track your activities and improve the user experience. For example, cookies can remember your preferences, keep track of your shopping history, or save information you enter into forms. They\u2019re typically stored in a file on your hard drive, and you can configure your browser to block or delete cookies, or\n"
    },
    "truncated_cells": []
    },
    {
    "row_idx": 1,
    "row": {
    "instruction": "Given the sentence "White fluffy dog hangs out of the window of a turquoise building." is it true that "The dog is indoors."? Let's be accurate as possible and think first.",
    "input": "",
    "output": "A dog being in a turquoise building is the same as a dog being indoors. Final answer: yes."
    },
    "truncated_cells": []
    }
    ]
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 200
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 200
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss max_accuracy
0 0 - - 0.5
0.3497 100 1.0509 0.7070 -
0.6993 200 0.6183 0.3396 -
1.0490 300 0.3746 0.2282 -
1.3986 400 0.2481 0.1616 -
1.7483 500 0.2198 0.1302 -
2.0979 600 0.166 0.1164 -
2.4476 700 0.1045 0.1174 -
2.7972 800 0.0797 0.1095 -
3.1469 900 0.0422 0.1176 -
3.4965 1000 0.0595 0.1115 -
3.8462 1100 0.0416 0.1008 -
4.1958 1200 0.0174 0.1233 -
4.5455 1300 0.0273 0.1032 -
4.8951 1400 0.0389 0.0990 -
5.2448 1500 0.0126 0.0963 -
5.5944 1600 0.0074 0.1193 -
5.9441 1700 0.0165 0.1379 -
6.2937 1800 0.0046 0.1127 -
6.6434 1900 0.0158 0.1289 -
6.9930 2000 0.0157 0.1009 -
7.3427 2100 0.0032 0.1075 -
7.6923 2200 0.0072 0.1289 -
8.0420 2300 0.0192 0.1176 -
8.3916 2400 0.001 0.1214 -
8.7413 2500 0.024 0.1320 1.0
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
Downloads last month
32
Safetensors
Model size
137M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for davanstrien/query-to-dataset-viewer-descriptions

Finetuned
(11)
this model

Dataset used to train davanstrien/query-to-dataset-viewer-descriptions

Space using davanstrien/query-to-dataset-viewer-descriptions 1

Collection including davanstrien/query-to-dataset-viewer-descriptions

Evaluation results