• nitroemdash@lemmy.wtf
    12 hours ago

    Tokens are well-defined groups of bytes, ranked by how frequently they occur in text, that let you efficiently translate text into a sequence of 32- or 64-bit integers: an LLM-optimised form of compression. They are well-known, and you can play with them here: https://gpt-tokenizer.dev/
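
If it helps to see the idea in code, here is a toy sketch of byte-pair encoding, the general family of scheme that GPT-style tokenizers are usually described as using. It is not the implementation behind the site linked above: real tokenizers ship a large pretrained vocabulary and extra text-splitting rules, while this one learns a few merges from a single string. The names `train_bpe`, `merge`, `encode` and `decode` are made up for illustration.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, most frequent adjacent pair first."""
    ids = list(text.encode("utf-8"))  # start from raw bytes: ids 0..255
    merges = {}  # (left_id, right_id) -> new token id, in the order learned
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so merging gains nothing
        new_id = 256 + len(merges)
        merges[pair] = new_id
        ids = merge(ids, pair, new_id)
    return merges


def encode(text, merges):
    """Turn text into a list of integer token ids by replaying the merges in order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts keep insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Turn token ids back into text by expanding each id to its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8")


if __name__ == "__main__":
    sample = "the cat sat on the mat the cat"
    learned = train_bpe(sample, 10)
    tokens = encode(sample, learned)
    print(len(sample.encode("utf-8")), "bytes ->", len(tokens), "tokens")
    print(decode(tokens, learned) == sample)
```

The point of the sketch is the compression angle: frequent byte groups collapse into single integers, so the token sequence is shorter than the byte sequence, and decoding gets the original text back exactly.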