@redwoodresearch

Redwood Research

Popular repositories

  1. Easy-Transformer Public

    Forked from TransformerLensOrg/TransformerLens

    Python · 122 stars · 18 forks

  2. mlab Public

    Machine Learning for Alignment Bootcamp

    Jupyter Notebook · 78 stars · 42 forks

  3. alignment_faking_public Public

    Forked from rgreenblatt/model_organism_public

    Python · 73 stars · 14 forks

  4. rust_circuit_public Public

    Rust · 63 stars · 2 forks

  5. Text-Steganography-Benchmark Public

    Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.

    Python · 21 stars · 4 forks

  6. remix_public Public

    Python · 18 stars · 3 forks

Repositories

Showing 10 of 20 repositories
  • dev-af Public

    Alignment Faking Model Organism Finetuning and Evaluation Utils

    Jupyter Notebook 0 0 0 0 Updated Sep 5, 2025
  • ctrlz-traj-viewer Public

    Better traj viewer for ctrlz

    TypeScript 0 0 0 1 Updated Sep 1, 2025
  • bench-af Public

    Bench Alignment Faking: Alignment faking model organisms, detectors, and environments to catch misaligned models (Joshua Clymer MATS Stream Summer 2025)

    Jupyter Notebook 4 3 4 3 Updated Aug 23, 2025
  • redwood-control-arena Public

    Forked from UKGovernmentBEIS/control-arena

    (Fork of ControlArena for Redwood Research's Control purposes)

    JavaScript 0 MIT 63 0 0 Updated Aug 18, 2025
  • subversion-strategy-eval
    Python 9 MIT 1 0 0 Updated Jun 2, 2025
  • alignment_faking_public Public
    Python 73 MIT 16 0 1 Updated May 24, 2025
  • apps-monitor-control-eval Public

    A repo to evaluate the performance of different monitors and attack policies

    Jupyter Notebook 1 1 0 0 Updated Feb 5, 2025
  • Easy-Transformer Public
    Python 122 MIT 440 0 0 Updated Aug 4, 2024
  • Text-Steganography-Benchmark Public

    Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.

    Python 21 4 0 0 Updated Jan 26, 2024
  • Gradient-Machine
    Python 0 1 0 0 Updated Oct 20, 2023
