OpenAI developed a model called CriticGPT specifically for finding errors in ChatGPT's code
OpenAI has developed a model called CriticGPT, based on GPT-4, to help find errors in code generated by ChatGPT. CriticGPT critiques ChatGPT's answers and points out their errors, helping human trainers find problems more effectively and improving accuracy during reinforcement learning from human feedback (RLHF). Integrating CriticGPT into the RLHF annotation process makes AI-generated critiques more accurate and comprehensive, which in turn improves ChatGPT's overall performance and alignment.
- Limitations of human evaluation: When current LLMs such as ChatGPT generate complex code, even experienced experts find it hard to reliably evaluate the quality and correctness of the output. CriticGPT addresses this limitation by being trained to generate natural-language critiques that help humans evaluate code more accurately.
- Improved error detection: Model-generated code often contains errors that human evaluators miss. CriticGPT performs well at detecting such errors; the study found that it caught more errors than human contractors did.
- Reduced bias and hallucinations: Although CriticGPT is itself prone to hallucinated errors, pairing it with humans (human-machine teaming) significantly reduces these hallucinations while preserving strong error detection.
The study found that people who got help from CriticGPT while reviewing ChatGPT code outperformed reviewers working without help 60% of the time.
Key features of CriticGPT
1. Error Detection
CriticGPT can identify a variety of errors in code, including syntax errors, logic errors, and security vulnerabilities. By analyzing the code comprehensively, it aims to generate critiques that cover every obvious and serious error so that no important issue is missed, while avoiding hallucinated errors and unnecessary nitpicking.
Functional description:
- CriticGPT is trained to identify various errors in ChatGPT-generated code, including syntax errors, logic errors, and functional errors.
- It can automatically scan code, find parts that do not behave as expected or contain potential problems, and flag them.
Application examples:
- During code generation or review, CriticGPT can be used as an auxiliary tool to quickly flag errors that need attention and correction.
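For illustration, here is a hedged example of the kind of bug such a tool is meant to surface; the snippet and the critique text are hypothetical and are not taken from OpenAI's paper:

```python
# Hypothetical ChatGPT-style output: return the n largest values from a list.
def top_n(values, n):
    values.sort(reverse=True)   # Logic issue: sorts the caller's list in place (hidden side effect).
    return values[:n]           # Also silently returns fewer than n items when len(values) < n.

# The kind of critique a reviewer (human or model) would want to see:
# "top_n mutates its input list, which can surprise callers; use sorted(values, reverse=True)
#  or heapq.nlargest(n, values), and decide explicitly how to handle len(values) < n."
```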
2. Critical review generation
CriticGPT takes a code snippet and a description of its intended functionality and generates detailed natural language reviews. These reviews point out potential errors in the code and provide suggestions for improvements. For example, it might point out a security vulnerability in a code snippet and suggest a safer approach.
Functional description:
- CriticGPT not only identifies errors but also generates detailed critical comments explaining the nature of each error and its possible impact.
- The comments include the specific location of the error, its type, why it occurred, and suggested corrections.
Application examples:
- In code review meetings, CriticGPT provides detailed error explanations and improvement suggestions, helping team members better understand and resolve problems.
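The security-vulnerability case mentioned above can be made concrete with a small hypothetical example; the code and the suggested fix below are illustrative and not taken from the paper:

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable pattern: user input is interpolated directly into SQL (SQL injection risk).
    return conn.execute(f"SELECT id, email FROM users WHERE name = '{username}'").fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Safer approach a critique would typically suggest: a parameterized query.
    return conn.execute("SELECT id, email FROM users WHERE name = ?", (username,)).fetchall()
```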
3. Enhance training effectiveness
Functional description:
- By collaborating with CriticGPT, trainers produce more comprehensive reviews than when working alone. The study showed that review quality and coverage improved when humans worked with CriticGPT.
- CriticGPT augments a trainer's ability to identify and correct complex problems more effectively.
Application examples:
- During AI model training, trainers use CriticGPT's feedback to review and improve model-generated content more efficiently.
4. Reduce false errors
CriticGPT uses a Force Sampling Beam Search (FSBS) strategy when generating critiques: it samples multiple candidate critiques and selects the one with the highest score. This keeps the generated critiques comprehensive while reducing hallucinated errors, significantly improving critique quality and accuracy.
Functional description:
- When generating critical reviews, CriticGPT reduces annotations of "false errors" that do not actually exist. Compared with a standalone model, its reviews are more accurate and introduce less noise.
Application examples:
- In day-to-day code maintenance and optimization, CriticGPT provides accurate error reports, reducing the time developers waste on irrelevant issues.
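Based on the description of Force Sampling Beam Search above, the selection step might look roughly like the following minimal sketch. `sample_critique` and `reward_model_score` are hypothetical stand-ins for the model's sampling and reward-model calls, and `length_bonus` is an assumed knob for trading comprehensiveness against nitpicking:

```python
from typing import Callable, List

def fsbs_select(
    sample_critique: Callable[[], str],          # hypothetical: draws one candidate critique from the model
    reward_model_score: Callable[[str], float],  # hypothetical: scores a critique with the reward model
    num_candidates: int = 8,
    length_bonus: float = 0.1,                   # assumed knob favoring longer, more comprehensive critiques
) -> str:
    """Sample several candidate critiques and keep the one with the best combined score."""
    candidates: List[str] = [sample_critique() for _ in range(num_candidates)]

    def combined(critique: str) -> float:
        # Reward-model score plus a small coverage bonus; tuning length_bonus trades
        # comprehensiveness against the risk of hallucinated or nitpicky claims.
        return reward_model_score(critique) + length_bonus * len(critique.split())

    return max(candidates, key=combined)
```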
5. Model training and optimization
The critiques generated by CriticGPT are evaluated and compared on their comprehensiveness, whether they include the real error, how often they hallucinate or nitpick, and their overall subjective usefulness. These evaluation metrics make it possible to determine which critiques are most helpful for discovering and solving problems, and thus to keep optimizing the model's performance.
Functional description:
- CriticGPT itself is trained via RLHF (reinforcement learning from human feedback), specifically by processing inputs containing errors and generating critical feedback.
- During training, human trainers intentionally insert errors for CriticGPT to identify and criticize, which helps the model learn how to accurately point out and explain errors.
Application examples:
- When developing a new version of an AI model, CriticGPT can be used for internal testing and optimization so that the model has as few errors as possible before it is officially released.
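The evaluation dimensions listed above (comprehensiveness, whether the real error is included, hallucination and nitpick frequency, overall usefulness) can be pictured as a simple rating record; the field names and scales below are illustrative assumptions, not the paper's schema:

```python
from dataclasses import dataclass

@dataclass
class CritiqueRating:
    """Hypothetical record of how a human trainer rates one critique."""
    comprehensiveness: int        # e.g. 1-7 scale: does it cover the important problems?
    includes_inserted_bug: bool   # did the critique catch the known (inserted) error?
    hallucinated_claims: int      # count of claimed errors that do not actually exist
    nitpicks: int                 # count of trivial complaints about unimportant details
    overall_usefulness: int       # e.g. 1-7 subjective rating

def prefer(a: CritiqueRating, b: CritiqueRating) -> CritiqueRating:
    """Crude illustrative comparison: favor usefulness, penalize hallucinations and nitpicks."""
    def score(r: CritiqueRating) -> float:
        return r.overall_usefulness - r.hallucinated_claims - 0.5 * r.nitpicks
    return a if score(a) >= score(b) else b
```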
6. Accurate search and evaluation
Functional description:
- CriticGPT uses a precise search-and-evaluation mechanism at test time to balance actively finding problems against reducing false positives. This makes the generated critiques more targeted and practical.
Application examples:
- In large projects, CriticGPT can conduct a comprehensive code review and provide detailed, accurate error reports, keeping the project on track.
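One way to make the "find problems vs. avoid false positives" balance concrete is to score critiques against a set of known (for example, inserted) bugs. The helper below is an illustrative sketch, not the paper's evaluation code:

```python
from typing import Set, Tuple

def critique_precision_recall(claimed_bugs: Set[str], known_bugs: Set[str]) -> Tuple[float, float]:
    """Precision: how many claimed bugs are real. Recall: how many real bugs were caught."""
    true_positives = len(claimed_bugs & known_bugs)
    precision = true_positives / len(claimed_bugs) if claimed_bugs else 1.0
    recall = true_positives / len(known_bugs) if known_bugs else 1.0
    return precision, recall

# Example: a critique that flags two real bugs and one non-existent one.
print(critique_precision_recall({"off-by-one", "sql-injection", "imaginary-issue"},
                                {"off-by-one", "sql-injection", "missing-null-check"}))
# -> roughly (0.67, 0.67): being more aggressive raises recall but can lower precision.
```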
7. Human-AI Collaborative Augmentation
CriticGPT can be used as an auxiliary tool that pre-fills an initial critique during evaluation, helping human evaluators identify problems faster and more accurately. Working together, CriticGPT and human evaluators produce more comprehensive reviews with fewer hallucinations and nitpicks, significantly improving evaluation efficiency and accuracy.
Functional description:
- A core design goal of CriticGPT is to enhance human-AI collaboration: by assisting human trainers, it makes them more efficient and accurate at evaluating and correcting AI outputs.
Application examples:
- When teaching and training novice programmers, CriticGPT can provide real-time feedback and suggestions to improve learning outcomes and code quality.
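A minimal sketch of the "pre-fill, then human-edit" workflow described above; `draft_critique` and `human_edit` are hypothetical stand-ins for a CriticGPT-style model call and a human review step:

```python
from typing import Callable

def assisted_review(code: str, task: str,
                    draft_critique: Callable[[str, str], str],
                    human_edit: Callable[[str], str]) -> str:
    """Human-machine teaming: the model drafts a critique; a human removes hallucinations
    and nitpicks and adds anything the model missed."""
    draft = draft_critique(code, task)   # hypothetical model call that pre-fills the critique
    return human_edit(draft)             # human keeps, corrects, or extends the draft

# Usage sketch: in practice draft_critique would call the model and human_edit would be a UI step.
final = assisted_review("def add(a, b): return a - b", "Add two numbers",
                        draft_critique=lambda c, t: "Bug: the function subtracts instead of adding.",
                        human_edit=lambda d: d + " Suggested fix: return a + b.")
print(final)
```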
Technical method
CriticGPT is also trained with reinforcement learning from human feedback (RLHF), but its training data consists mainly of inputs that contain errors, which it must critique. Researchers deliberately inserted errors into code generated by ChatGPT and then wrote feedback on the buggy code to train CriticGPT. Experimental results show that for naturally occurring bugs, trainers preferred the criticism produced by CriticGPT over ChatGPT's in 63% of cases. This is partly because CriticGPT produces fewer nitpicks about minor details and fewer hallucinated problems, making its criticism more effective and accurate.
1. RLHF training:
- Like ChatGPT, CriticGPT is trained via reinforcement learning from human feedback (RLHF).
- During training, CriticGPT mainly processes inputs that contain errors and learns to critique them.
2. Error insertion:
- Researchers manually inserted errors into code generated by ChatGPT to create inputs that contain known bugs.
- Human trainers then wrote feedback on this bug-inserted code as if they had discovered the bugs themselves.
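A hedged sketch of what one such "tampered" training example might contain, following the bullets above; the structure and field names are assumptions for illustration, not the paper's actual data format:

```python
from dataclasses import dataclass

@dataclass
class TamperedExample:
    """One training example built by inserting a bug into a ChatGPT answer."""
    question: str            # the original programming task
    original_answer: str     # ChatGPT's code before tampering
    tampered_answer: str     # the code after a human inserted a subtle bug
    bug_description: str     # the trainer's note on what was broken and why it matters
    reference_critique: str  # the critique the trainer wrote as if they had found the bug

example = TamperedExample(
    question="Write a function that returns the average of a list of numbers.",
    original_answer="def mean(xs): return sum(xs) / len(xs)",
    tampered_answer="def mean(xs): return sum(xs) / (len(xs) + 1)",  # inserted off-by-one bug
    bug_description="Denominator is len(xs) + 1, so the result is always too small.",
    reference_critique="The function divides by len(xs) + 1 instead of len(xs); "
                       "mean([2, 4]) returns 2.0 instead of 3.0.",
)
```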
3. Critique comparison:
- The same human trainer compares multiple example critiques to determine which ones accurately capture the inserted errors.
- This ensures that the trainer can identify and assess the validity of different critiques.
4. Naturally occurring and inserted errors:
- The study also examined whether CriticGPT could catch not only inserted errors but also naturally occurring errors that trainers had previously discovered.
- The results show that CriticGPT's critiques of naturally occurring errors were judged better than ChatGPT's in 63% of cases, mainly because CriticGPT produces fewer unnecessary nitpicks and fewer hallucinated problems.
5. Search and balance:
- At test time, an additional search against the critique reward model is used to balance how aggressively errors are hunted against the precision of the critique.
- This search trades off hallucinated errors against missed errors, producing critiques that are as helpful as possible for RLHF.
Experimental Results
1. Criticism preference:
- In experiments, trainers preferred CriticGPT's criticisms of naturally occurring bugs over ChatGPT's 63% of the time.
- This preference is mainly because CriticGPT produces fewer unnecessary nitpicks (insignificant criticisms of minor details) and fewer hallucinated issues (errors invented by the model).
2. Results of Human + CriticGPT teams:
- A second, randomly selected trainer preferred critiques from the combined Human + CriticGPT team over critiques written by humans alone more than 60% of the time.
- This shows that CriticGPT can significantly improve human trainers' performance when it assists them with critiquing.
3. Fewer hallucinated errors and less nitpicking:
- CriticGPT's criticisms contain significantly fewer fabricated errors and minor nitpicks, which improves their quality and effectiveness.
- Reducing hallucinations and nitpicking makes the criticism more precise and helps trainers identify real problems.
Original article: https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
Paper: https://cdn.openai.com/llm-critics-help-catch-llm-bugs-paper.pdf
More about AI: https://kcgod.com