Code-Lens: On the Latent Language of Code in Code Language Models
In this work, we use the logit lens [1] rather than the tuned lens [2]. The tuned lens would undermine our goal of understanding whether the models, when prompted with $X$, take a detour through $Y$ internal states before outputting the $X$ text. Since the tuned lens is specifically trained to map internal states to the final $X$ next-token prediction, it eliminates our signal of interest.
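As a minimal sketch of the difference (the names `final_norm`, `unembed`, and `translator` are placeholders, not the repository's actual API; the final norm and unembedding head vary by architecture), the logit lens reuses the model's own components, whereas a tuned lens would insert a learned per-layer map trained to match the final prediction:

```python
import torch

def logit_lens(hidden_state: torch.Tensor, final_norm, unembed) -> torch.Tensor:
    """Project an intermediate hidden state to vocabulary logits using only
    the model's own final normalization and unembedding (no extra training)."""
    return unembed(final_norm(hidden_state))

def tuned_lens(hidden_state: torch.Tensor, translator, final_norm, unembed) -> torch.Tensor:
    """A tuned lens additionally applies a per-layer affine `translator` trained to
    reproduce the model's final next-token distribution -- exactly the mapping we
    want to avoid, since it would erase the latent signal we are looking for."""
    return unembed(final_norm(translator(hidden_state)))
```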
Consider a pre-LayerNorm transformer model:
- The very first embedding vectors are just the (embedded) input tokens.
- The very last embedding vectors are just the output logits (one final norm and unembedding away).
What about the embedding vectors of the intermediate layers? To inspect them, we split the model at a layer $\ell$ into two parts:

- $\mathcal{M}_{\leq \ell}$: all layers up to and including layer $\ell$, which map the input tokens to hidden states.
- $\mathcal{M}_{>\ell}$: all layers after layer $\ell$, which convert hidden states into logits.
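In practice, the output of $\mathcal{M}_{\leq \ell}$ is exactly the hidden state that the `transformers` library exposes at index $\ell$. A sketch, using `gpt2` purely as a stand-in checkpoint (any causal code LM works the same way):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in checkpoint; replace with the code LM under study
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tok("def add(a, b):", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of length (num_layers + 1):
# index 0 holds the token embeddings, index l holds the output of M_{<=l}.
h_l = out.hidden_states[6][0, -1]  # hidden state of the last token after layer 6
```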
The update mechanism for a transformer layer at index $\ell$ is

$$h^{(\ell)} = h^{(\ell-1)} + a^{(\ell)} + m^{(\ell)},$$

where

$$a^{(\ell)} = \mathrm{Attn}^{(\ell)}\!\left(\mathrm{LN}\!\left(h^{(\ell-1)}\right)\right)$$

is the output of the self-attention sublayer, and where

$$m^{(\ell)} = \mathrm{MLP}^{(\ell)}\!\left(\mathrm{LN}\!\left(h^{(\ell-1)} + a^{(\ell)}\right)\right)$$

is the output of the MLP sublayer; $\mathrm{LN}$ denotes the LayerNorm applied before each sublayer.
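The same update written as a forward pass, as a sketch assuming standard multi-head attention and a two-layer MLP (real code LMs differ in details such as rotary embeddings, gated MLPs, or RMSNorm, and the causal mask is omitted here for brevity):

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """One pre-LayerNorm transformer layer: h_l = h_{l-1} + a_l + m_l."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.ln_attn = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln_mlp = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # a_l = Attn(LN(h_{l-1})), added onto the residual stream
        x = self.ln_attn(h)
        a, _ = self.attn(x, x, x, need_weights=False)
        h = h + a
        # m_l = MLP(LN(h_{l-1} + a_l)), also added onto the residual stream
        return h + self.mlp(self.ln_mlp(h))
```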
Applying the final LayerNorm and the unembedding matrix to a hidden state $h^{(\ell)}$ (i.e., the logit lens) yields logits over the vocabulary; a softmax over these logits then gives the probabilities of the next output token.
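Continuing the earlier sketch (the attribute names `model.transformer.ln_f` and `model.lm_head` are GPT-2-style and vary by architecture):

```python
# Decode the layer-6 hidden state from the example above with the logit lens.
logits = model.lm_head(model.transformer.ln_f(h_l))   # shape: (vocab_size,)
probs = torch.softmax(logits, dim=-1)

# Inspect the most probable next tokens at this intermediate layer.
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tok.decode(idx.item()):>12}  {p.item():.3f}")
```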
We use beam search to select the most likely output sequence.
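With a Hugging Face model this is just `generate` with `num_beams`; the snippet below continues the same sketch and its decoding parameters are illustrative, not necessarily those used in the experiments:

```python
gen = model.generate(
    **inputs,
    max_new_tokens=16,
    num_beams=4,                    # beam search instead of greedy decoding
    num_return_sequences=1,
    pad_token_id=tok.eos_token_id,  # GPT-2 has no dedicated pad token
)
print(tok.decode(gen[0], skip_special_tokens=True))
```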
We include keywords and builtins for different programming languages in `code_lens/utils/keywords`.
Builtins include primitive types, macros, modules, collections, containers, and built-in functions, but exclude keywords.
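The exact file format under `code_lens/utils/keywords` is not shown here; purely as a hypothetical illustration, a per-language lookup could be organized as below (the entries are examples, not the repository's actual lists):

```python
# Hypothetical structure; the real lists live in code_lens/utils/keywords.
KEYWORDS = {
    "python": {"def", "return", "if", "else", "for", "while", "import"},
    "rust": {"fn", "let", "mut", "match", "impl", "pub", "use"},
}
BUILTINS = {
    "python": {"int", "str", "list", "dict", "len", "print", "range"},
    "rust": {"String", "Vec", "Option", "Result", "Box", "println!"},
}

def classify_token(token: str, lang: str) -> str:
    """Label a decoded token as keyword, builtin, or other for one language."""
    t = token.strip()
    if t in KEYWORDS.get(lang, set()):
        return "keyword"
    if t in BUILTINS.get(lang, set()):
        return "builtin"
    return "other"
```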