---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:809
  - loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-distilroberta-v1
widget:
  - source_sentence: Data pipeline architecture, Azure Data Factory, Apache Spark
    sentences:
      - >-
        Experience »


        Prior experience working on a SAP ECC to SAP S4 Hana Migration
        Project.4+ years in an ETL or Data Engineering roles; building and
        implementing data pipelines and modeling data.Experience with SAP data
        and data structures.Experience managing Snowflake instances, including
        data ingestion and modeling.Experience with IBM DataStage is a plus.Very
        strong skills with SQL with the ability to write efficient
        queries.Familiarity with Fivetran for replication.


        What You’ll Do


        Job requirements are met.Perform data analysis required to troubleshoot
        data related issues and assist in the resolution of data issues.


        Interested?


        Qualified candidates should send their resumes to
        [email protected]


        V-Soft Consulting Group is recognized among the top 100 fastest growing
        staffing companies in North America, V-Soft Consulting Group is
        headquartered in Louisville, KY with strategic locations in India,
        Canada and the U.S. V-Soft is known as an agile, innovative technology
        services company holding several awards and distinctions and has a wide
        variety of partnerships across diverse technology stacks.


        As a valued V-Soft Consultant, you’re eligible for full benefits
        (Medical, Dental, Vision), a 401(k) plan, competitive compensation and
        more. V-Soft is partnered with numerous Fortune 500 companies,
        exceptionally positioned to advance your career growth.


        V-Soft Consulting provides equal employment opportunities to all
        employees and applicants for employment and prohibits discrimination and
        harassment of any type without regard to race, color, religion, age,
        sex, national origin, disability status, genetics, protected veteran
        status, sexual orientation, gender identity or expression, or any other
        characteristic protected by federal, state or local laws.


        For more information or to view all our open jobs, please visit
        www.vsoftconsulting.com or call (844) 425-8425.
      - >-
        experiences that leverage the latest technologies in open source and the
        Cloud. Digital Information Management (DIM) is a team of engineers
        committed to championing a data-driven decision-making culture and meets
        the business demand for timely insight-focused analytics and information
        delivery.


        You will be working with all levels of technology from backend data
        processing technologies (Databricks/Apache Spark) to other Cloud
        computing technologies / Azure Data Platform. You should be a strong
        analytical thinker, detail-oriented and love working with data with a
        strong background in data engineering and application development. Must
        be a hand-on technologist passionate about learning new technologies and
        help improve the ways we can better leverage Advanced Analytics and
        Machine Learning.


        Responsibilities


        Build end-to-end direct capabilities.Create and maintain optimal data
        pipeline architecture.Build the infrastructure required for optimal
        extraction, transformation, and loading of data from a wide variety of
        data sources.Use analytics for capitalizing on the data for making
        decisions and achieving better outcomes for the business.Derive insights
        to differentiate member and team member experiences. Collaborate with
        cross-functional teams.Analyze and define with product teams the data
        migration and data integration strategies.Apply experience in analytics,
        data visualization and modeling to find solutions for a variety of
        business and technical problems.Querying and analyzing small and large
        data sets to discover patterns and deliver meaningful insights.
        Integrate source systems with information management solutions and
        target systems for automated migration processes.Create
        proof-of-concepts to demonstrate viability of solutions under
        consideration.



        Qualifications


        Bachelor’s degree in computer science, information systems, or other
        technology-related field or equivalent number of years of
        experience.Advanced hands-on experience implementing and supporting
        large scale data processing pipelines and migrations using technologies
        (eg. Azure Services, Python programming).Significant hands-on experience
        with Azure services such as Azure Data Factory (ADF), Azure Databricks,
        Azure Data Lake Storage (ADLS Gen2), Azure SQL, and other data sources.
        Significant hands-on experience designing and implementing reusable
        frameworks using Apache Spark (PySpark preferred or Java/Scala).Solid
        foundation in data structures, algorithms, design patterns and strong
        analytical and problem-solving skills.Strong hands-on experience leading
        design thinking as well as the ability to translate ideas to clearly
        articulate technical solutions. Experience with any of the following
        Analytics and Information Management competencies: Data Management and
        Architecture, Performance Management, Information Delivery and Advanced
        Analytics.



        Desired Qualifications


        Proficiency in collaborative coding practices, such as pair programming,
        and ability to thrive in a team-oriented environment.The following
        certifications:Microsoft Certified Azure Data EngineerMicrosoft
        Certified Azure Solutions ArchitectDatabricks Certified Associate
        Developer for Apache 2.4/3.0

        Hours: Monday - Friday, 8:00AM - 4:30PM


        Location: 820 Follin Lane, Vienna, VA 22180 | 5510 Heritage Oaks Drive
        Pensacola, FL 32526 | 141 Security Drive Winchester, VA 22602


        About Us


        You have goals, dreams, hobbies, and things you're passionate
        about—what's important to you is important to us. We're looking for
        people who not only want to do meaningful, challenging work, keep their
        skills sharp and move ahead, but who also take time for the things that
        matter to them—friends, family, and passions. And we're looking for team
        members who are passionate about our mission—making a difference in
        military members' and their families' lives. Together, we can make it
        happen. Don't take our word for it:

         Military Times 2022 Best for Vets Employers WayUp Top 100 Internship Programs Forbes® 2022 The Best Employers for New Grads Fortune Best Workplaces for Women Fortune 100 Best Companies to Work For® Computerworld® Best Places to Work in IT Ripplematch Campus Forward Award - Excellence in Early Career Hiring Fortune Best Place to Work for Financial and Insurance Services




        Disclaimers: Navy Federal reserves the right to fill this role at a
        higher/lower grade level based on business need. An assessment may be
        required to compete for this position. Job postings are subject to close
        early or extend out longer than the anticipated closing date at the
        hiring team’s discretion based on qualified applicant volume. Navy
        Federal Credit Union assesses market data to establish salary ranges
        that enable us to remain competitive. You are paid within the salary
        range, based on your experience, location and market position


        Bank Secrecy Act: Remains cognizant of and adheres to Navy Federal
        policies and procedures, and regulations pertaining to the Bank Secrecy
        Act.
      - >-
        Data AnalystDakota Dunes, SD

        Entry Level SQL, Run SQL The queries. Client is using
        ThoughtspotUnderstanding of Dashbord and Proficient in Microsoft Office
        and excel 

        Please share your profile to [email protected] or reach me on
        619 771 1188.
  - source_sentence: >-
      Customer data management, regulatory compliance, advanced Excel and Access
      proficiency
    sentences:
      - >-
        skills, attention to detail, and experience working with data in Excel.
        The candidate must enjoy collaborative work, actively participate in the
        development of team presentations, and engage in review of other analyst
        findings. ResponsibilitiesThe Junior Analyst will be responsible for
        examining data from different sources with the goal of providing
        insights into NHLBI, its mission, business processes, and information
        systems. Responsibilities for this position include:Develop a strong
        understanding of the organization, functions, and data sources to be
        able to ensure analytical sources and methodologies are appropriately
        applied for the data need.Develop clear and well-structured analytical
        plans.Ensure data sources, assumptions, methodologies, and visualization
        approaches are consistent with prior work by the OPAE.Assess the
        validity of source data and subsequent findings.Produce high quality,
        reliable data analysis on a variety of functional areas.Explain the
        outcome/results by identifying trends and creating visualizations.Use
        best practices in data analysis and visualization.Exhibit results,
        conclusions, and recommendations to leadership, and customize
        presentations to align with various audiences.Document and communicate
        analysis results (briefings, reports, and/or backup analysis files) in a
        manner that clearly articulates the approach, results, and data-driven
        recommendations.Continually assess all current activities and
        proactively communicate potential issues and/or challenges.May support
        data scientists on various projects. Qualifications Minimum
        qualifications:Bachelor’s degree in data science or related
        fields.Minimum of 2 years of demonstrable experience in data
        analysis.Must have 2 years of experience in using Excel for data
        analysis and visualization andWillingness to learn basic data science
        tools and methodologies.Intermediate to advanced proficiency with
        industry-standard word processing, spreadsheet, and presentation
        software programs.Excellent verbal and written communication
        skills.Strong attention to detail.Collaborative team player.Proven
        problem solving and critical thinking skills.Must be able to obtain
        Public Trust Clearance.US work authorization (we participate in
        E-Verify). Preferred qualifications:Proficient in the use of basic data
        science tools and methodologies (python, SQL, machine learning).MS in
        data science or related fields.

        Salary and benefitsWe offer a competitive salary and a generous benefits
        package, including full health and dental, HSA and retirement accounts,
        short- and long-term disability insurance, life insurance, paid time off
        and 11 federal holidays. Location: Washington DC, Hybrid
      - >-
        SKILLS – Very Strong, Microsoft Excel (Pivot Tables, Sumifs, Vlookups
        etc), Data manipulation, Logistics and operations terminology Job
        SummaryApple AMR Ops Logistics is looking for an experienced Data
        Analyst to support its Business Analytics team. This position will be
        responsible for ensuring maintenance and frequent updates to Apple’s
        internal Shipping Exceptions Management System. The position will work
        closely with AMR Logistics stakeholders to ensure timely execution of
        daily jobs by transforming data in Excel into Apple’s internal tools. 
        Key Responsibilities• Review multiple Excel reports and ensure timely
        uploads into the Shipping Exceptions Management System• Develop robust
        data visualizations that will help to answer commonly asked questions
        quickly and thoroughly about Shipping Exceptions• Identify data
        anomalies, work to root cause and remediate issues in data collection,
        storage, transformation, or reporting Key Qualifications1 – 2 years of
        work experience preferredSkilled in Excel and data manipulation
        (mandatory)Familiarity with Logistics and Operations
        terminologyFamiliarity with Business Objects a plusAbility to create
        cross-platform reportsAbility to turn data into information and
        insightsHigh-level attention to detail, including the ability to spot
        data errors and potential issues in Apple’s internal systems Hard
        Skills:Microsoft Excel (Pivot Tables, Sumifs, Vlookups etc)Good Verbal
        and Communication skills
      - >-
        Qualifications:0-2 years relevant experienceAdvanced knowledge of MS
        Office Suite, including proficiency in Excel and Access.Consistently
        demonstrates clear and concise written and verbal communication
        skills.Demonstrated organization skills with an excellent attention to
        detail.Ability to focus on high quality work.

        Education:Bachelor’s/University degree or equivalent experiencePlease
        share with me your updated resume if you are interested in applying for
        this role.

        Dexian is a leading provider of staffing, IT, and workforce solutions
        with over 12,000 employees and 70 locations worldwide. As one of the
        largest IT staffing companies and the 2nd largest minority-owned
        staffing company in the U.S., Dexian was formed in 2023 through the
        merger of DISYS and Signature Consultants. Combining the best elements
        of its core companies, Dexian's platform connects talent, technology,
        and organizations to produce game-changing results that help everyone
        achieve their ambitions and goals.Dexian's brands include Dexian DISYS,
        Dexian Signature Consultants, Dexian Government Solutions, Dexian Talent
        Development and Dexian IT Solutions. Visit https://dexian.com/ to learn
        more.Dexian is
  - source_sentence: >-
      Clarity PPM reporting, data dashboard customization, performance quality
      assurance
    sentences:
      - >-
        skills and the ability to connect and communicate across multiple
        departments.Adept at report writing and presenting findings.Ability to
        work under pressure and meet tight deadlines.Be able to read and update
        project and program level resource forecasts.Identify recurring process
        issues and work with managers to find solutions and initiate
        improvements to mitigate future recurrence. 

        Skills and Qualifications:5+ years in a Data Analyst and/or Data
        Scientist capacity.5 years of experience with Clarity PPM reporting,
        developing data dashboards, charts and datasets in Clarity.Strong
        knowledge of and experience with reporting packages (Business Objects,
        Tableau, Power BI, etc.), databases (SQL), programming (XML, JavaScript,
        etc.).Knowledge of statistics and experience using statistical packages
        for analyzing datasets (Excel, SAS, R, SPSS, etc.)High understanding of
        PPM disciplines has worked in a team and covered strategic projects.
        Experience with Dashboard customization, configuration, user interface
        personalization and infrastructure management will be helpful.Strong
        analytical skills with the ability to collect, organize, analyze, and
        disseminate significant amounts of information with attention to detail,
        accuracy, and actionable insights.Excellent communicator, adjusting
        communication styles based on your audience.Quick learner, adaptable and
        can thrive in new environments.Proactive, confident, and engaging;
        especially when it comes to large stakeholder groups.Capable of
        critically evaluating data to derive meaningful, actionable
        insights.Demonstrate superior communication and presentation
        capabilities, adept at simplifying complex data insights for audiences
        without a technical background.
      - >-
        skills and current Lubrizol needs):


        Create predictive models by mining complex data for critical formulating
        or testing insights Implement and assess algorithms in R, Python, SAS,
        JMP or C#/C++ Research and implement new statistical, machine learning
        and/or optimization approaches (PhD level)Collaborate with data science
        team, as well as, scientists and engineers, to understand their needs,
        and find creative solutions to meet those needs 


        Previous Intern Projects Include


        Predictive modeling using Bayesian and machine learning methods R/Shiny
        tool development to enable model predictions and formulation
        optimization Creation of an interactive visualization tool for
        monitoring predictive models Multitask learning (transfer learning)
        using co-regionalized Gaussian Processes (PhD level)Multi-objective
        optimization using genetic algorithms (PhD level)Survival modeling using
        bagged Cox proportional hazards regression trees (PhD level)Bootstrap
        variance estimation for complex nonlinear models (PhD level)


        What tools do you need for success?


        Enrolled in a Masters or PhD program such as statistics, data analytics,
        machine learningExcellent programming skills with the ability to learn
        new methods quicklyExposure to database systems and the ability to
        efficiently manipulate complex data Interest and experience in advanced
        statistical modeling/machine learning methods (PhD level)Coursework in
        statistical modeling and data mining methodsCuriosity and creativity


        Benefits Of Lubrizol’s Chemistry Internship Programs


        Rewarding your hard work!Competitive payHoliday pay for holidays that
        fall within your work periodFUN! We host a variety of events and
        activities for our students. Past events include a Cleveland Cavaliers
        game, paid volunteering days, professional development and networking
        events, and even a picnic hosted by our CEO!

        While headquartered in the United States, Lubrizol is truly a global
        specialty chemical company. We have a major presence in five global
        regions and do business in more than 100 countries. Our corporate
        culture ensures that Lubrizol is one company throughout the world, but
        you will find each region is a unique place to work, live and play.


        Lubrizol is
      - >-
        experience with agile engineering and problem-solving creativity. United
        by our core values and our purpose of helping people thrive in the brave
        pursuit of next, our 20,000+ people in 53 offices around the world
        combine experience across technology, data sciences, consulting and
        customer obsession to accelerate our clients’ businesses through
        designing the products and services their customers truly value.

        Job Description

        This position requires in-depth knowledge and expertise in GCP services,
        architecture, and best practices. Will work closely with clients to
        understand their business objectives and develop strategies to leverage
        GCP to meet their needs. They will collaborate with cross-functional
        teams to design, implement, and manage scalable and reliable cloud
        solutions. They will also be responsible for driving innovation and
        staying up-to-date with the latest GCP technologies and trends to
        provide industry-leading solutions.

        Your Impact:

        Collaborate with clients to understand their business requirements and
        design GCP architecture to meet their needs.Develop and implement cloud
        strategies, best practices, and standards to ensure efficient and
        effective cloud utilization.Work with cross-functional teams to design,
        implement, and manage scalable and reliable cloud solutions on
        GCP.Provide technical guidance and mentorship to the team to develop
        their skills and expertise in GCP.Stay up-to-date with the latest GCP
        technologies, trends, and best practices and assess their applicability
        to client solutions.Drive innovation and continuous improvement in GCP
        offerings and services to provide industry-leading solutions.Collaborate
        with sales and business development teams to identify and pursue new
        business opportunities related to GCP.Ensure compliance with security,
        compliance, and governance requirements in GCP solutions.Develop and
        maintain strong relationships with clients, vendors, and internal
        stakeholders to promote the adoption and success of GCP solutions.

        Qualifications

        Must have good implementationexperience onvariousGCP’s Data Storage and
        Processing services such as BigQuery, Dataflow, Bigtable, Dataform, Data
        fusion, cloud spanner, Cloud SQLMust have programmatic experience with
        tools like Javascript, Python, Apache Spark.Experience in building
        advance Bigquery SQL and Bigquery modelling is requiredExperience in
        orchestrating end-end data pipelines with tools like cloud composer,
        Dataform is highly desired.Experience in managing complex and reusable
        dataflow pipelines is highly desired.

        What sets you apart:

        Experience in complex migrations from legacy data warehousing solutions
        or on-prem datalakes to GCPExperience in maneuvering resources in
        delivering tight projectsExperience in building real-time ingestion and
        processing frameworks on GCP.Adaptability to learn new technologies and
        products as the job demands.Experience in implementing Data-governance
        solutionsKnowledge in AI, ML and GEN-AI use casesMulti-cloud & hybrid
        cloud experienceAny cloud certification

        Additional Information

        Flexible vacation policy; Time is not limited, allocated, or accrued16
        paid holidays throughout the yearGenerous parental leave and new parent
        transition programTuition reimbursementCorporate gift matching program

        Career Level: Senior Associate

        Base Salary Range for the Role: 115,000-150,000 (varies depending on
        experience) The range shown represents a grouping of relevant ranges
        currently in use at Publicis Sapient. Actual range for this position may
        differ, depending on location and specific skillset required for the
        work itself.
  - source_sentence: Go-to-Market strategy, Salesforce dashboard development, SQL data analysis
    sentences:
      - >-
        experience: from patients finding clinics and making appointments, to
        checking in, to clinical documentation, and to the final bill paid by
        the patient. Our team is committed to changing healthcare for the better
        by innovating and revolutionizing on-demand healthcare for millions of
        patients across the country.


        Experity offers the following:


        Benefits  Comprehensive coverage starts first day of employment and
        includes Medical, Dental/Orthodontia, and Vision.Ownership - All Team
        Members are eligible for synthetic ownership in Experity upon one year
        of employment with real financial rewards when the company is
        successful!Employee Assistance Program - This robust program includes
        counseling, legal resolution, financial education, pet adoption
        assistance, identity theft and fraud resolution, and so much
        more.Flexibility Experity is committed to helping team members face
        the demands of juggling work, family and life-related issues by offering
        flexible work scheduling to manage your work-life balance.Paid Time Off
        (PTO) - Experity offers a generous PTO plan and increases with
        milestones to ensure our Team Members have time to recharge, relax, and
        spend time with loved ones.Career Development Experity maintains a
        learning program foundation for the company that allows Team Members to
        explore their potential and achieve their career goals.Team Building
        We bring our Team Members together when we can to strengthen the team,
        build relationships, and have fun! We even have a family company picnic
        and a holiday party.Total Compensation - Competitive pay, quarterly
        bonuses and a 401(k) retirement plan with an employer match to help you
        save for your future and ensure that you can retire with financial
        security.


        Hybrid workforce:


        Experity offers Team Members the opportunity to work remotely or in an
        office. While this position allows remote work, we require Team Members
        to live within a commutable distance from one of our locations to ensure
        you are available to come into the office as needed.


        Job Summary: 


        We are seeking a highly skilled and data-driven Go-to-Market (GTM) Data
        Analyst to join our team. The ideal candidate will be adept at
        aggregating and analyzing data from diverse sources, extracting valuable
        insights to inform strategic decisions, and proficient in building
        dynamic dashboards in Salesforce and other BI tools. Your expertise in
        SQL and data analytics will support our go-to-market strategy, optimize
        our sales funnel, and contribute to our overall success.


        Experience: 


        Bachelor’s or Master’s degree in Data Science, Computer Science,
        Information Technology, or a related field.Proven experience as a Data
        Analyst or similar role, with a strong focus on go-to-market
        strategies.Expertise in SQL and experience with database
        management.Proficiency in Salesforce and other BI tools (e.g., Tableau,
        Power BI).Strong analytical skills with the ability to collect,
        organize, analyze, and disseminate significant amounts of information
        with attention to detail and accuracy.Excellent communication and
        presentation skills, capable of conveying complex data insights in a
        clear and persuasive manner.Adept at working in fast-paced environments
        and managing multiple projects simultaneously.Familiarity with sales and
        marketing metrics, and how they impact business decisions.


        Budgeted salary range:


        $66,900 to $91,000


        Team Member Competencies:


        Understands role on the team and works to achieve goals to the best of
        your ability.Working within a team means there will be varying opinions
        and ideas. Active listening and thoughtfully responding to what your
        team member says.Take responsibility for your mistakes and look for
        solutions. Understand how your actions impact team.Provides assistance,
        information, or other support to others to build or maintain
        relationships.Maintaining a positive attitude. Tackle challenges as they
        come, and don’t let setbacks get you down.Gives honest and constructive
        feedback to other team members.When recognizing a problem, take action
        to solve it.Demonstrates and supports the organization's core values.


        Every team member exhibits our core values:


        Team FirstLift Others UpShare OpenlySet and Crush GoalsDelight the
        Client


        Our urgent care solutions include:


        Electronic Medical Records (EMR): Software that healthcare providers use
        to input patient data, such as medical history, diagnoses, treatment
        plans, medications, and test results.Patient Engagement (PE): Software
        that shows patients the wait times at various clinics, allows patients
        to reserve a spot in line if there's a wait, and book the
        appointment.Practice Management (PM): Software that the clinic front
        desk staff uses to register the patient once they arrive for their
        appointment.Billing and Revenue Cycle Management (RCM): Software that
        manages coding, billing and payer contracts for clinics so they don’t
        have to.Teleradiology: Board certified radiologist providing accurate
        and timely reads of results from X-rays, CT scans, MRIs, and
        ultrasounds, for our urgent care clients.Consulting: Consulting services
        for urgent care clinics to assist with opening, expanding and enhancing
        client's businesses
      - >-
        experience with Cloud Engineering / Services.3+ years of work experience
        as a backend software engineer in Python with exceptional software
        engineering knowledge. Experience with ML workflow orchestration tools:
        Airflow, Kubeflow etc. Advanced working knowledge of
        object-oriented/object function programming languages: Python, C/C++,
        JuliaExperience in DevOps: Jenkins/Tekton etc. Experience with cloud
        services, preferably GCP Services like Vertex AI, Cloud Function,
        BigQuery etc. Experience in container management solution: Kubernetes,
        Docker.Experience in scripting language: Bash, PowerShell etc.
        Experience with Infrastructure as code: Terraform etc.

        Skills Preferred:Master focused on Computer Science / Machine Learning
        or related field. Experience working with Google Cloud platform (GCP) -
        specifically Google Kubernetes engine, Terraform, and
        infrastructure.Experience in delivering cloud engineering
        products.Experience in programming concepts such as Paired Programming,
        Test Driven Development, etc. Understanding of MLOPs/Machine Learning
        Life Cycle and common machine learning frameworks: sklearn, TensorFlow,
        pytorch etc. is a big plus.Must be a quick learner and open to learning
        new technology. Experience applying agile practices to solution
        delivery. Experience in all phases of the development lifecycle. Must be
        team-oriented and have excellent oral and written communication skills.
        Good organizational and time-management skills. Must be a self-starter
        to understand existing bottlenecks and come up with innovative
        solutions. Knowledge of coding and software craftsmanship
        practices.Experience and good understanding of GCP processing /DevOPs/
        Machine Learning
      - >-
        Skills

         Good  banking domain  background with  Advanced SQL  knowledge is a MUST 

         Expert in Advanced Excel functions used for data analysis  Ability to Understand Physical and Logical Data Models and understanding of Data Quality Concepts. Write SQL Queries to pull/fetch data from systems/DWH Understanding of Data WareHousing concepts Understanding the Data Movement between Source and Target applications and perform data quality checks to maintain the data integrity, accuracy and consistency Experience in analysis/reconciliation of data as per the business requirements Conduct research and Analysis in order to come up with solution to business problems Understanding requirements directly from clients/ client stakeholders and writing code to extract relevant data and produce report

        Experience Required


        10-12 Years


        Roles & Responsibilities


        Interpret data, analyze results using Data Analysis techniques and
        provide ongoing reports

         Develop and implement databases, data repositories for performing analysis Acquire data from primary or secondary data sources and maintain databases/data repositories Identify, analyze, and interpret trends or patterns in complex data sets Filter and “clean” data by reviewing computer reports, printouts, and performance indicators to locate and correct code problems ; Work with management to prioritize business and information needs Locate and define new process improvement opportunities Good exposure and hands on exp with Excel features used for data analysis & reporting
  - source_sentence: >-
      Senior Data Scientist, Statistical Analysis, Data Interpretation, TS/SCI
      Clearance
    sentences:
      - >-
        Skills :8+ years of relevant experienceExperience with big data
        technology(s) or ecosystem in Hadoop, HDFS (also an understanding of
        HDFS Architecture), Hive, Map Reduce, Base - this is considering all of
        AMP datasets are in HDFS/S3Advanced SQL and SQL performance tuningStrong
        experience in Spark and Scala
      - >-
        experience, regulatory compliance & operational efficiencies, enabled by
        Google Cloud.


        This position will lead integration of core data from New North America
        Lending platforms into Data Factory (GCP BQ), and build upon the
        existing analytical data, including merging historical data from legacy
        platforms with data ingested from new platforms. To enable critical
        regulatory reporting, operational analytics, risk analytics and modeling


        Will provide overall technical guidance to implementation teams and
        oversee adherence to engineering patterns and data quality and
        compliance standards, across all data factory workstreams. Support
        business adoption of data from new platform and sunset of legacy
        platforms & technology stack.


        This position will collaborate with technical program manager, data
        platform enablement manager, analytical data domain leaders, subject
        matter experts, supplier partners, business partner and IT operations
        teams to deliver the Data integration workstream plan following agile
        framework.


        Responsibilities


        We are looking for dynamic, technical leader with prior experience of
        leading data warehouse as part of complex business & tech
        transformation. Has strong experience in Data Engineering, GCP Big
        Query, Data ETL pipelines, Data architecture, Data Governance, Data
        protection, security & compliance, and user access enablement.


        Key responsibilities -


        This role will focus on implementing data integration of new lending
        platform into Google Cloud Data Platform (Data factory), existing
        analytical domains and building new data marts, while ensuring new data
        is integrated seamlessly with historical data. Will lead a dedicated
        team of data engineers & analysts to understand and assess new data
        model and attributes, in upstream systems, and build an approach to
        integrate this data into factory.Will lead the data integration
        architecture (in collaboration with core mod platform & data factory
        architects) and designs, and solution approach for Data FactoryWill
        understand the scope of reporting for MMP (Minimal Marketable Product)
        launch & build the data marts required to enable agreed use cases for
        regulatory, analytical & operational reporting, and data required for
        Risk modeling. Will collaborate with Data Factory Analytical domain
        teams, to build new pipelines & expansion of analytical domains. Will
        lead data integration testing strategy & its execution within Data
        Factory (end-to-end, from ingestion, to analytical domains, to marts) to
        support use cases.Will be Data Factory SPOC for all Core Modernization
        program and help facilitate & prioritize backlogs of data
        workstreams.Ensure the data solutions are aligned to overall program
        goals, timing and are delivered with qualityCollaborate with program
        managers to plan iterations, backlogs and dependencies across all
        workstream to progress workstreams at required pace.Drive adoption of
        standardized architecture, design and quality assurance approaches
        across all workstreams and ensure solutions adheres to established
        standards.People leader for a team of 5+ data engineers and analysts.
        Additionally manage supplier partner team who will execute the migration
        planLead communication of status, issues & risks to key stakeholders



        Qualifications


        You'll have…..


        Bachelor’s degree in computer science or equivalent5+ years of
        experience delivering complex Data warehousing projects and leading
        teams of 10+ engineers and suppliers to build Big Data/Datawarehouse
        solutions.10+ years of experience in technical delivery of Data
        Warehouse Cloud Solutions for large companies, and business adoption of
        these platforms to build analytics , insights & modelsPrior experience
        with cloud data architecture, data modelling principles, DevOps,
        security and controls Google Cloud certified - Cloud Data Engineer
        preferred.Hands on experience of the following:Orchestration of data
        pipelines (e.g. Airflow, DBT, Dataform, Astronomer).Batch data pipelines
        (e.g. BQ SQL, Dataflow, DTS).Streaming data pipelines (e.g. Kafka,
        Pub/Sub, gsutil)Data warehousing techniques (e.g. data modelling,
        ETL/ELT).



        Even better, you may have….


        Master’s degree in- Computer science, Computer engineering, Data science
        or related fieldKnowledge of Ford credit business functional, core
        systems, data knowledge Experience in technical program management &
        delivering complex migration projects.Building high performance
        teamsManaging/or working with globally distributed teamsPrior experience
        in leveraging offshore development service providers.Experience in a
        Fintech or large manufacturing company.Very strong leadership,
        communication, organizing and problem-solving skills.Ability to
        negotiate with and influence stakeholders & drive forward strategic data
        transformation.Quick learner, self-starter, energetic leaders with drive
        to deliver results. Empathy and care for customers and teams, as a
        leader guide teams on advancement of skills, objective setting and
        performance assessments



        You may not check every box, or your experience may look a little
        different from what we've outlined, but if you think you can bring value
        to Ford Motor Company, we encourage you to apply!


        As an established global company, we offer the benefit of choice. You
        can choose what your Ford future will look like: will your story span
        the globe, or keep you close to home? Will your career be a deep dive
        into what you love, or a series of new teams and new skills? Will you be
        a leader, a changemaker, a technical expert, a culture builder...or all
        of the above? No matter what you choose, we offer a work life that works
        for you, including:


        Immediate medical, dental, and prescription drug coverageFlexible family
        care, parental leave, new parent ramp-up programs, subsidized back-up
        childcare and moreVehicle discount program for employees and family
        members, and management leasesTuition assistanceEstablished and active
        employee resource groupsPaid time off for individual and team community
        serviceA generous schedule of paid holidays, including the week between
        Christmas and New Year's DayPaid time off and the option to purchase
        additional vacation time



        For a detailed look at our benefits, click here:


        2024 New Hire Benefits Summary


        Visa sponsorship is not available for this position.


        Candidates for positions with Ford Motor Company must be legally
        authorized to work in the United States. Verification of employment
        eligibility will be required at the time of hire.


        We are
      - >-
        experience to solve some of the most challenging intelligence issues
        around data.


        Job Responsibilities & Duties


        Devise strategies for extracting meaning and value from large datasets.
        Make and communicate principled conclusions from data using elements of
        mathematics, statistics, computer science, and application specific
        knowledge. Through analytic modeling, statistical analysis, programming,
        and/or another appropriate scientific method, develop and implement
        qualitative and quantitative methods for characterizing, exploring, and
        assessing large datasets in various states of organization, cleanliness,
        and structure that account for the unique features and limitations
        inherent in data holdings. Translate practical needs and analytic
        questions related to large datasets into technical requirements and,
        conversely, assist others with drawing appropriate conclusions from the
        analysis of such data. Effectively communicate complex technical
        information to non-technical audiences.


        Minimum Qualifications


        10 years relevant experience with Bachelors in related field; or 8 years
        experience with Masters in related field; or 6 years experience with a
        Doctoral degree in a related field; or 12 years of relevant experience
        and an Associates may be considered for individuals with in-depth
        experienceDegree in an Mathematics, Applied Mathematics, Statistics,
        Applied Statistics, Machine Learning, Data Science, Operations Research,
        or Computer Science, or related field of technical
        rigorAbility/willingness to work full-time onsite in secure government
        workspacesNote: A broader range of degrees will be considered if
        accompanied by a Certificate in Data Science from an accredited
        college/university.


        Clearance Requirements


        This position requires a TS/SCI with Poly


        Looking for other great opportunities? Check out Two Six Technologies
        Opportunities for all our Company’s current openings!


        Ready to make the first move towards growing your career? If so, check
        out the Two Six Technologies Candidate Journey! This will give you
        step-by-step directions on applying, what to expect during the
        application process, information about our rich benefits and perks along
        with our most frequently asked questions. If you are undecided and would
        like to learn more about us and how we are contributing to essential
        missions, check out our Two Six Technologies News page! We share
        information about the tech world around us and how we are making an
        impact! Still have questions, no worries! You can reach us at Contact
        Two Six Technologies. We are happy to connect and cover the information
        needed to assist you in reaching your next career milestone.


        Two Six Technologies is 


        If you are an individual with a disability and would like to request
        reasonable workplace accommodation for any part of our employment
        process, please send an email to [email protected].
        Information provided will be kept confidential and used only to the
        extent required to provide needed reasonable accommodations.


        Additionally, please be advised that this business uses E-Verify in its
        hiring practices.




        By submitting the following application, I hereby certify that to the
        best of my knowledge, the information provided is true and accurate.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: SentenceTransformer based on sentence-transformers/all-distilroberta-v1
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job validation
          type: ai-job-validation
        metrics:
          - type: cosine_accuracy
            value: 0.9900990128517151
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: ai job test
          type: ai-job-test
        metrics:
          - type: cosine_accuracy
            value: 1
            name: Cosine Accuracy
---

SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
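
The trailing Normalize() module L2-normalizes each embedding, so dot products and cosine similarities coincide. A quick check of this property (a sketch, not part of the original card):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("krshahvivek/distilroberta-ai-job-embeddings")
# One of the widget queries from this card; any input works the same way
emb = model.encode(["Data pipeline architecture, Azure Data Factory, Apache Spark"])
print(np.linalg.norm(emb[0]))  # ~1.0, confirming unit-length embeddings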

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("krshahvivek/distilroberta-ai-job-embeddings")
# Run inference
sentences = [
    'Senior Data Scientist, Statistical Analysis, Data Interpretation, TS/SCI Clearance',
    'experience to solve some of the most challenging intelligence issues around data.\n\nJob Responsibilities & Duties\n\nDevise strategies for extracting meaning and value from large datasets. Make and communicate principled conclusions from data using elements of mathematics, statistics, computer science, and application specific knowledge. Through analytic modeling, statistical analysis, programming, and/or another appropriate scientific method, develop and implement qualitative and quantitative methods for characterizing, exploring, and assessing large datasets in various states of organization, cleanliness, and structure that account for the unique features and limitations inherent in data holdings. Translate practical needs and analytic questions related to large datasets into technical requirements and, conversely, assist others with drawing appropriate conclusions from the analysis of such data. Effectively communicate complex technical information to non-technical audiences.\n\nMinimum Qualifications\n\n10 years relevant experience with Bachelors in related field; or 8 years experience with Masters in related field; or 6 years experience with a Doctoral degree in a related field; or 12 years of relevant experience and an Associates may be considered for individuals with in-depth experienceDegree in an Mathematics, Applied Mathematics, Statistics, Applied Statistics, Machine Learning, Data Science, Operations Research, or Computer Science, or related field of technical rigorAbility/willingness to work full-time onsite in secure government workspacesNote: A broader range of degrees will be considered if accompanied by a Certificate in Data Science from an accredited college/university.\n\nClearance Requirements\n\nThis position requires a TS/SCI with Poly\n\nLooking for other great opportunities? Check out Two Six Technologies Opportunities for all our Company’s current openings!\n\nReady to make the first move towards growing your career? If so, check out the Two Six Technologies Candidate Journey! This will give you step-by-step directions on applying, what to expect during the application process, information about our rich benefits and perks along with our most frequently asked questions. If you are undecided and would like to learn more about us and how we are contributing to essential missions, check out our Two Six Technologies News page! We share information about the tech world around us and how we are making an impact! Still have questions, no worries! You can reach us at Contact Two Six Technologies. We are happy to connect and cover the information needed to assist you in reaching your next career milestone.\n\nTwo Six Technologies is \n\nIf you are an individual with a disability and would like to request reasonable workplace accommodation for any part of our employment process, please send an email to [email protected]. Information provided will be kept confidential and used only to the extent required to provide needed reasonable accommodations.\n\nAdditionally, please be advised that this business uses E-Verify in its hiring practices.\n\n\n\nBy submitting the following application, I hereby certify that to the best of my knowledge, the information provided is true and accurate.',
    'Skills :8+ years of relevant experienceExperience with big data technology(s) or ecosystem in Hadoop, HDFS (also an understanding of HDFS Architecture), Hive, Map Reduce, Base - this is considering all of AMP datasets are in HDFS/S3Advanced SQL and SQL performance tuningStrong experience in Spark and Scala',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
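
Since the model was fine-tuned on (skills query, job posting) pairs, a natural application is ranking postings against a short skills query. A minimal retrieval sketch; the postings below are made-up placeholders rather than entries from the training data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("krshahvivek/distilroberta-ai-job-embeddings")

query = "Go-to-Market strategy, Salesforce dashboard development, SQL data analysis"
postings = [
    "We are seeking a data-driven GTM Data Analyst skilled in SQL and Salesforce dashboards.",  # placeholder
    "Entry-level role focused on Excel reporting and office administration.",  # placeholder
]

# Embed the query and the candidate postings, then score with cosine similarity
query_embedding = model.encode([query])
posting_embeddings = model.encode(postings)
scores = model.similarity(query_embedding, posting_embeddings)  # shape [1, 2]

best = scores.argmax().item()
print(postings[best])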

Evaluation

Metrics

Triplet

Metric            ai-job-validation   ai-job-test
cosine_accuracy   0.9901              1.0
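
cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding is closer to the positive than to the negative. Below is a sketch of recomputing such a score with the TripletEvaluator from sentence-transformers; the triplet shown is an illustrative placeholder, not the actual validation set:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("krshahvivek/distilroberta-ai-job-embeddings")

# Placeholder triplet: a skills query, a matching posting, and a mismatched one
evaluator = TripletEvaluator(
    anchors=["Clarity PPM reporting, data dashboard customization, performance quality assurance"],
    positives=["5+ years of experience with Clarity PPM reporting, developing data dashboards ..."],
    negatives=["8+ years of relevant experience with Hadoop, HDFS, Hive, Spark and Scala ..."],
    name="ai-job-validation",
)
results = evaluator(model)
print(results)  # e.g. {'ai-job-validation_cosine_accuracy': ...}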

Training Details

Training Dataset

Unnamed Dataset

  • Size: 809 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 809 samples:
                sentence_0              sentence_1
    type        string                  string
    details     min: 8 tokens           min: 7 tokens
                mean: 15.02 tokens      mean: 348.14 tokens
                max: 40 tokens          max: 512 tokens
  • Samples:
    sentence_0: GCP Data Engineer, BigQuery, Airflow DAG, Hadoop ecosystem
    sentence_1: requirements for our direct client, please go through the below Job Description. If you are interested please send me your updated word format resume to [email protected] and reach me @ 520-231-4672.
      Title: GCP Data EngineerLocation: Hartford, CTDuration: Full Time
      6-8 Years of experience in data extraction and creating data pipeline workflows on Bigdata (Hive, HQL/PySpark) with knowledge of Data Engineering concepts.Experience in analyzing large data sets from multiple data sources, perform validation of data.Knowledge of Hadoop eco-system components like HDFS, Spark, Hive, Sqoop.Experience writing codes in Python.Knowledge of SQL/HQL to write optimized queries.Hands on with GCP Cloud Services such as Big Query, Airflow DAG, Dataflow, Beam etc.

    sentence_0: Data analysis for legal documents, meticulous data entry, active Top-Secret security clearance
    sentence_1: Requirements NOTE: Applicants with an Active TS Clearance preferred Requirements * High School diploma or GED, Undergraduate degree preferred Ability to grasp and understand the organization and functions of the customer Meticulous data entry skills Excellent communication skills; oral and written Competence to review, interpret, and evaluate complex legal and non-legal documents Attention to detail and the ability to read and follow directions is extremely important Strong organizational and prioritization skills Experience with the Microsoft Office suite of applications (Excel, PowerPoint, Word) and other common software applications, to include databases, intermediate skills preferred Proven commitment and competence to provide excellent customer service; positive and flexible Ability to work in a team environment and maintain a professional dispositionThis position requires U.S. Citizenship and a 7 (or 10) year minimum background investigation ** NOTE: The 20% pay differential is d...

    sentence_0: Trust & Safety, Generative AI, Recommender Systems
    sentence_1: experiences achieve more in their careers. Our vision is to create economic opportunity for every member of the global workforce. Every day our members use our products to make connections, discover opportunities, build skills and gain insights. We believe amazing things happen when we work together in an environment where everyone feels a true sense of belonging, and that what matters most in a candidate is having the skills needed to succeed. It inspires us to invest in our talent and support career growth. Join us to challenge yourself with work that matters.

      Location:

      At LinkedIn, we trust each other to do our best work where it works best for us and our teams. This role offers a hybrid work option, meaning you can work from home and commute to a LinkedIn office, depending on what’s best for you and when it is important for your team to be together.

      This role is based in Sunnyvale, CA.

      Team Information:

      The mission of the Anti-Abuse AI team is to build trust in every inte...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
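
For reference, below is a minimal sketch of a training setup consistent with the loss and hyperparameters reported on this card; the exact training script is not included here, and the two-row dataset stands in for the 809 real training pairs:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/all-distilroberta-v1")

# (sentence_0, sentence_1) pairs: a short skills query and its matching posting
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "GCP Data Engineer, BigQuery, Airflow DAG, Hadoop ecosystem",
        "Customer data management, regulatory compliance, advanced Excel and Access proficiency",
    ],
    "sentence_1": [
        "requirements for our direct client ...",  # truncated placeholder
        "Qualifications: 0-2 years relevant experience, advanced MS Office ...",  # truncated placeholder
    ],
})

# MultipleNegativesRankingLoss treats the other in-batch sentence_1 values as
# negatives; scale=20.0 and cosine similarity match the parameters above
loss = MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="distilroberta-ai-job-embeddings",
    per_device_train_batch_size=2,
    num_train_epochs=2,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()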
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • num_train_epochs: 2
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch    Step   Training Loss   ai-job-validation_cosine_accuracy   ai-job-test_cosine_accuracy
-1       -1     -               0.8812                              -
1.0      405    -               0.9901                              -
1.2346   500    0.07            -                                   -
2.0      810    -               0.9901                              -
-1       -1     -               0.9901                              1.0

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.4.1
  • Transformers: 4.48.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}