next-token prediction

  • Reevaluating LLMs: Prediction vs. Reasoning


    "Next token prediction is not real reasoning"The argument that large language models (LLMs) merely predict the next token in a sequence without engaging in real reasoning is challenged by questioning if human cognition might operate in a similar manner. The focus should not be on the method of next-token prediction itself, but rather on the complexity and structure of the internal processes that drive it. If the system behind token selection is sophisticated enough, it could be considered a form of reasoning. The debate highlights the need to reconsider what constitutes intelligence and reasoning, suggesting that the internal processes are more crucial than the sequential output of tokens. This matters because it challenges our understanding of both artificial intelligence and human cognition, potentially reshaping how we define intelligence.

    Read Full Article: Reevaluating LLMs: Prediction vs. Reasoning

  • End-to-End Test-Time Training for Long Context


    [R] End-to-End Test-Time Training for Long Context

    Long-context language modeling is approached as a continual learning problem, using a standard Transformer architecture with sliding-window attention. The model continues to learn at test time by predicting the next token of the given context, effectively compressing the context into its weights. Meta-learning during training improves the model's initialization for this test-time learning. The resulting End-to-End Test-Time Training (TTT-E2E) method scales similarly to full-attention Transformers while keeping inference latency constant, yielding a significant speed advantage. This matters because it offers a more efficient approach to long-context language tasks, improving both performance and speed; the core test-time loop is sketched below.
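
    A minimal, hypothetical sketch of the inner test-time loop, assuming a PyTorch causal LM that returns logits of shape (batch, seq, vocab). The chunk size, learning rate, and choice of SGD are illustrative, not from the paper, and the meta-learned initialization that TTT-E2E acquires during training is not shown here.

    ```python
    # Sketch: test-time training on a long context. The model keeps doing
    # next-token prediction on the test context itself, so the context is
    # "compressed" into the weights rather than held in an attention cache.
    import torch
    import torch.nn.functional as F

    def test_time_train(model, context_ids, chunk_len=512, lr=1e-4, steps_per_chunk=1):
        """Continue next-token-prediction training on the given context.
        `context_ids` is a (batch, seq) tensor of token ids; values here
        are illustrative placeholders, not the paper's hyperparameters."""
        model.train()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        # Walk the context in fixed-size chunks, as a sliding-window model would.
        for start in range(0, context_ids.size(1) - 1, chunk_len):
            chunk = context_ids[:, start : start + chunk_len + 1]
            inputs, targets = chunk[:, :-1], chunk[:, 1:]  # shift by one token
            for _ in range(steps_per_chunk):
                logits = model(inputs)  # assumed to return (batch, seq, vocab)
                loss = F.cross_entropy(
                    logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
                )
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        model.eval()
        return model
    ```

    Because each chunk is processed with a fixed window and a fixed number of update steps, per-token inference cost stays constant regardless of total context length, which is the source of the speed advantage over full attention.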

    Read Full Article: End-to-End Test-Time Training for Long Context