Emil Sit
emilsit@discuss.systems

Can someone point me to details of the Code Llama training set? Curious about the licenses of the code, its recency, and the distribution of input programming languages. The research paper only says: “Code Llama is trained predominantly on a near-deduplicated dataset of publicly available code.” ai.meta.com/blog/code-llama-la

August 25, 2023