I've been dedicating considerable effort to neural architecture design, and while I acknowledge the value of current innovations, I believe the field would benefit from some strategic reflection before we push further in the same direction.
One point I'd like to stress is the potential of shifting focus from discriminative models to generative ones. Despite the profound progress in architecture design for tasks like ImageNet classification, the time seems ripe to pivot towards generation, a shift made especially compelling by advances such as ChatGPT and Sora. Persisting with incremental improvements on ImageNet classification may not yield significant breakthroughs, nor is it as engaging as it once was. I therefore advocate exploring generative models more deeply.
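To make the distinction concrete (this is my own toy illustration, not tied to any particular published architecture): a discriminative model estimates p(y | x), mapping an input to label logits, while a generative model estimates the data distribution p(x) itself and can sample new data. A minimal PyTorch sketch, assuming a 32x32 RGB input and a 64-dimensional latent code:

```python
import torch
import torch.nn as nn

# Discriminative: given an input image x, predict a label, i.e. model p(y | x).
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512),
    nn.ReLU(),
    nn.Linear(512, 1000),   # e.g. 1000 ImageNet classes
)

# Generative: model the data distribution p(x) and produce new samples;
# here a toy decoder maps a latent code z to an image.
generator = nn.Sequential(
    nn.Linear(64, 512),
    nn.ReLU(),
    nn.Linear(512, 3 * 32 * 32),
    nn.Tanh(),
    nn.Unflatten(1, (3, 32, 32)),
)

x = torch.randn(8, 3, 32, 32)        # a batch of images
z = torch.randn(8, 64)               # a batch of latent codes
print(classifier(x).shape)           # torch.Size([8, 1000])      -> label logits
print(generator(z).shape)            # torch.Size([8, 3, 32, 32]) -> generated images
```

The point of the contrast is simply that the research question changes: instead of squeezing more accuracy out of p(y | x), the generative setting asks how well we can model and sample from the data itself.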
Another area for reflection is the next-token prediction framework, commonly associated with models like the Transformer. It's worth asking whether this approach is really the end point of our ambitions or whether we should pursue an alternative paradigm. The Transformer, for all its strengths, is not without flaws, and there's an opportunity to conceive a more efficient framework that addresses those shortcomings or moves beyond the Transformer's limitations, especially in modeling sequential dependencies. Moreover, whether next-token prediction is the only viable pathway to artificial general intelligence (AGI) remains a matter of debate, and paradigms that diverge from it could prove a promising direction for researchers to explore.
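For readers less familiar with what this framework actually optimizes, here is a minimal sketch of the next-token objective (my own toy code, not any particular production model): the target sequence is the input shifted by one position, and the model is trained with cross-entropy to predict each token from its prefix, with a causal mask preventing attention to future positions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Next-token prediction: the model sees tokens x_1..x_{T-1}
# and is trained to predict x_2..x_T with a cross-entropy loss.
class TinyLM(nn.Module):
    def __init__(self, vocab_size=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # A single Transformer encoder layer stands in for the usual decoder stack;
        # the causal mask below keeps position t from attending to future tokens.
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, tokens):                                   # tokens: (batch, seq)
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.block(self.embed(tokens), src_mask=causal)
        return self.head(h)                                      # next-token logits

model = TinyLM()
tokens = torch.randint(0, 1000, (2, 16))                         # toy token ids
logits = model(tokens[:, :-1])                                   # predict from each prefix
loss = F.cross_entropy(logits.reshape(-1, 1000), tokens[:, 1:].reshape(-1))
print(loss.item())
```

Questioning this framework means questioning either the architecture computing the logits (the Transformer block above) or the objective itself (the shifted cross-entropy loss), and the two concerns are largely independent.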
While I won't go into specifics today about how to devise a groundbreaking generative architecture, I encourage fellow researchers to ponder these questions. There is a wealth of potential in challenging the status quo and innovating beyond current methodologies.