Yesterday, OpenAI unveiled the new GPT-4 model, which “exhibits human-level performance on various professional and academic benchmarks” and “often outscores the vast majority of human test takers”. Alongside the model release, OpenAI published a technical report on the model's development. In particular, the report features several case studies conducted by experts from diverse domains, including chemistry. Although the tests were geared towards AI safety, they also provide unique insight into the capabilities of GPT-4. So what does GPT-4 hold for the future of chemistry research and development (R&D)?
GPT-4 introduces a host of new capabilities and improvements. Here are some of the key advancements that I think are particularly relevant to R&D:
- Image-based input: In addition to processing text, GPT-4 can now interpret images as inputs. This functionality broadens the model's potential use cases across wet-lab operations and result analysis.
- Expanded prompt size: GPT-4 supports input text of up to 25,000 words, an increase from the previous 8,000-word limit. This allows users to provide more detailed context and instructions, enabling the model to tackle more complex tasks and nuanced scenarios.
- Advanced reasoning: GPT-4 can more effectively use “reasoning” to break down complex tasks, such as mathematical problems, into multiple steps and progressively construct solutions. This further bolsters GPT-4's potential for addressing more sophisticated R&D use cases.
A comprehensive evaluation of GPT-4's capabilities and their impact on chemistry R&D will take time. Nevertheless, the case studies presented in the technical report point to promising prospects. They suggest a vision of an end-to-end automated discovery process that spans literature search and review, hypothesis generation, novelty assessment, experimental design, and result analysis.
For example, the image-based analysis functionality enables GPT-4 to interpret information from data visualizations and reason step by step through a data analysis. This is exemplified in the “Chart Reasoning” case study (GPT-4 Technical Report, Page 32).
Even more remarkably, GPT-4's image processing capability can be applied directly to entire documents without the need for an OCR (Optical Character Recognition) step. As illustrated in the “Pixel to Paper” case, the model can summarize the complete content while also explaining specific details found in figures (GPT-4 Technical Report, Page 35). However, it is important to note that, according to Prof. Andrew White, who participated in GPT-4 development, the model cannot yet recognize chemical structures.
In another example, the “Chemical Compound Similarity and Purchase Tool Use” study led by Prof. Andrew White showcases a hypothetical, end-to-end drug discovery process. By integrating GPT-4 with external tools, an automated workflow was established to identify analogs of an existing drug molecule, screen their efficacy and patentability, and ultimately purchase (or request custom synthesis of) the selected chemical for evaluation (GPT-4 Technical Report, Pages 59-60).
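To make the idea of pairing a model with external tools more concrete, here is a minimal Python sketch of such a loop. It is only an illustration under stated assumptions, not the setup used in the report: the tool names, their placeholder return values, and the `scripted_model` function (a hard-coded stand-in for the language model's decision making) are all hypothetical, and the efficacy screening step is left out for brevity.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-ins for external tools. A real workflow would call a
# similarity search service, a patent database, and a chemical vendor.
def find_analogs(name: str) -> List[str]:
    """Return placeholder analog names for a query molecule."""
    return [f"{name}-analog-{i}" for i in (1, 2, 3)]

def check_patent(candidate: str) -> bool:
    """Return True if patented. Placeholder rule: analog-1 is patented."""
    return candidate.endswith("analog-1")

def purchase(candidate: str) -> str:
    """Pretend to place an order and return a confirmation string."""
    return f"order placed: {candidate}"

TOOLS: Dict[str, Callable] = {
    "find_analogs": find_analogs,
    "check_patent": check_patent,
    "purchase": purchase,
}

Observation = Tuple[str, object]  # (tool name, tool result)

def scripted_model(drug: str, observations: List[Observation]) -> Optional[Tuple[str, str]]:
    """Hypothetical stand-in for the language model.

    Given the observations so far, pick the next (tool, input) pair,
    or return None when the workflow is finished.
    """
    if not observations:
        return ("find_analogs", drug)
    if observations[-1][0] == "purchase":
        return None  # an order was placed, so stop
    candidates = observations[0][1]
    checked = [obs for obs in observations if obs[0] == "check_patent"]
    if checked and checked[-1][1] is False:
        # The most recently checked candidate is not patented: buy it.
        return ("purchase", candidates[len(checked) - 1])
    if len(checked) < len(candidates):
        return ("check_patent", candidates[len(checked)])
    return None  # every candidate is patented, nothing to buy

def run(drug: str, max_steps: int = 10) -> List[Observation]:
    """Alternate between model decisions and tool calls until done."""
    observations: List[Observation] = []
    for _ in range(max_steps):
        action = scripted_model(drug, observations)
        if action is None:
            break
        tool, arg = action
        observations.append((tool, TOOLS[tool](arg)))
    return observations

if __name__ == "__main__":
    for tool, result in run("drugX"):
        print(tool, "->", result)
```

The point of the design is the separation of roles: the model only decides which tool to call next, while the surrounding code executes the call and feeds the result back as an observation. Swapping the scripted function for a real model call would leave the loop itself unchanged.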
To summarize, the new GPT-4 model holds immense potential for revolutionizing chemistry R&D. Its advanced capabilities may pave the way for a future where end-to-end automated discovery processes become a reality. More detailed information regarding the test cases can be found in the GPT-4 technical report: https://cdn.openai.com/papers/gpt-4.pdf.