SAE Visualizer


EXPLANATIONS

Method 1 works: in every MAX_ACTIVATING_TOKENS example the token “nob” attends to itself, indicating that the neuron fires on the “nob” token (as in “nobility”). [nob] => [nob]


IFRAME INTEGRATION

Direct URL

Iframe Code

<iframe src="http://localhost:3000/embed/dictionaries/qwen3-1.7b-lorsa-8x-topk64-layer0/features/46" width="100%" height="800" frameborder="0"></iframe>
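The embed snippet above follows a visible URL pattern (`/embed/dictionaries/<dictionary>/features/<index>`), so the same tag can be generated for other features. The following is a minimal sketch, not part of the visualizer itself: the helper names `embed_url` and `iframe_tag` are hypothetical, and the base URL, width, and height defaults are simply copied from the snippet above.

```python
from html import escape
from urllib.parse import quote

# Assumption: the embed route has the shape seen in the snippet above,
#   <base>/embed/dictionaries/<dictionary>/features/<index>
# The function names below are hypothetical helpers for illustration.


def embed_url(dictionary: str, feature: int,
              base: str = "http://localhost:3000") -> str:
    """Build the embed URL for one feature of one dictionary."""
    # Percent-encode the dictionary name so it is safe as a single path segment.
    name = quote(dictionary, safe="")
    return f"{base.rstrip('/')}/embed/dictionaries/{name}/features/{int(feature)}"


def iframe_tag(dictionary: str, feature: int,
               base: str = "http://localhost:3000",
               width: str = "100%", height: int = 800) -> str:
    """Return an <iframe> tag with the same attributes as the original snippet."""
    # Escape attribute values so they cannot break out of the quotes.
    src = escape(embed_url(dictionary, feature, base), quote=True)
    w = escape(str(width), quote=True)
    return (f'<iframe src="{src}" width="{w}" '
            f'height="{int(height)}" frameborder="0"></iframe>')


if __name__ == "__main__":
    print(iframe_tag("qwen3-1.7b-lorsa-8x-topk64-layer0", 46))
```

Called with the dictionary name and feature index shown above, the sketch reproduces the original tag; passing a different `base` would be needed when the visualizer is not served from localhost.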

NEGATIVE LOGITS

olis         -17.875
.Option      -15.938
陀           -15.813
_amen        -15.625
責           -15.438
镐           -15.438
_partisan    -15.250
_Pradesh     -15.250
than         -15.125
ój           -15.125

POSITIVE LOGITS

素           14.000
暂           14.000
_Dutch       13.875
chw          13.813
_lut         13.750
ivan         13.625
连           13.625
_illeg       13.500
_noble       13.188
典型         13.125

ACTIVATION TIMES 0.0157%

Activation histogram is not available.

Logits histogram is not available.

TOP ACTIVATIONS

SUBSAMPLING 80%

SUBSAMPLING 60%

SUBSAMPLING 40%

NON-ACTIVATING