tokyotech-llm/Llama-3.1-Swallow-70B-Instruct-v0.3 • Text Generation
Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models Paper • 2503.23714 • Published Mar 2025
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs Paper • 2411.08719 • Published Nov 10, 2024
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs Paper • 2412.14471 • Published Dec 19, 2024
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search Paper • 2503.04412 • Published Mar 2025
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization Paper • 2502.19261 • Published Feb 26, 2025
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability Paper • 2502.11336 • Published Feb 17, 2025
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models Paper • 2501.16937 • Published Jan 28, 2025
Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities Paper • 2404.17790 • Published Apr 27, 2024
Building a Large Japanese Web Corpus for Large Language Models Paper • 2404.17733 • Published Apr 27, 2024
Agent Skill Acquisition for Large Language Models via CycleQD Paper • 2410.14735 • Published Oct 16, 2024
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs Paper • 2407.03963 • Published Jul 4, 2024