admarcosai's Collections

LLM-Security

updated Jan 13, 2024

  • Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

    Paper • 2401.05566 • Published Jan 10, 2024 • 30