OpenAI has announced plans to increase research into “superintelligence” – a form of AI with capabilities beyond human intellect.
While AI superintelligence might seem a distant prospect, OpenAI believes its development could happen within a few years.
OpenAI says, “Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI and preventing it from going rogue.” That’s quite an ominous statement from the world’s leading AI company.
To mitigate the risks of superintelligent AI, OpenAI plans to introduce new governance structures and dedicate resources to superintelligence alignment, which seeks to align highly intelligent AIs with human principles and values. Their time frame for accomplishing this is 4 years.
In a blog post, OpenAI identified 3 main goals – developing scalable training methods, validating the resulting model, and thoroughly testing the alignment process. This includes automating the process of discovering problematic behavior in large models. To accomplish this, OpenAI suggests developing a specialized “automated alignment researcher” – an AI tasked with aligning AI.
OpenAI is also rallying a team of top-tier (human) machine learning researchers and engineers to take on this Herculean task. To assist, the company is committing 20% of its acquired compute resources over the next 4 years to tackling superintelligence alignment.
Its newly formed ‘superalignment team’ will shoulder the task, spearheaded by OpenAI co-founder and Chief Scientist Ilya Sutskever and Head of Alignment Jan Leike.
Additionally, OpenAI plans to share the results of this work with others. They also pointed out that their existing work on improving the safety of current models, like ChatGPT, and mitigating other AI risks, such as misuse, economic disruption, and disinformation, will continue.
The blog post also requests interest in their positions for research engineer, research scientist, and research manager positions. The lowest quoted salary is $245,000 annually, ranging up to $450,000.
OpenAI’s techniques for AI alignment
In a previous blog post on superintelligence, OpenAI describes 2 broad alignment techniques:
1: Training AI with human feedback: This approach is about refining AI using human responses and instructions. For example, an AI is trained to provide responses that align with both direct commands and more subtle intentions. The AI learns from explicit instructions and more implicit signals, such as ensuring truthful and safe responses. However, a human-centric approach can struggle with complex capabilities and is laborious and time-consuming.
2: Training models to assist human evaluation: The second approach recognizes that as AI becomes more sophisticated, there are tasks that it can do which are hard for humans to evaluate. So, in this approach, AI is used to perform tasks and to evaluate the quality of those tasks. For instance, an AI could be used to help check the accuracy of information, summarize lengthy documents for easier human evaluation, or even critique its own work.
The ultimate goal here is to develop AI systems that can effectively help humans evaluate complex tasks as AI intelligence pushes beyond the realms of human cognition.
OpenAI says they believe superintelligence alignment is “tractable.” While the prospect of superintelligent AI seems lightyears away, it would be unwise to assume OpenAI is just being optimistic.