Min-K% Prob: A method for detecting whether a piece of text was used to train a large language model.
The Min-K% Prob method requires no knowledge of the pre-training corpus and no additional training. It decides whether a given text is part of the model's pre-training data by computing the average log-probability of the text's lowest-probability ("outlier") tokens.
Simply put, this method looks at which words in a piece of text the model finds "uncommon". If the model nevertheless assigns these "uncommon" words relatively high probabilities, that suggests the text has been learned by the model.
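The scoring step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the per-token probabilities have already been obtained from a language model and are passed in directly.

```python
import math

def min_k_percent_score(token_probs, k=20):
    """Min-K% Prob score: the average log-probability of the k% of
    tokens the model finds least likely. Higher scores suggest the
    text may have appeared in the training data.

    token_probs: per-token probabilities from a language model
    (supplied directly here for illustration).
    """
    n = max(1, int(len(token_probs) * k / 100))  # how many tokens form the bottom k%
    log_probs = sorted(math.log(p) for p in token_probs)
    lowest = log_probs[:n]                        # the k% least-likely tokens
    return sum(lowest) / n

# Hypothetical probabilities: a "seen" text has no deep outliers,
# an "unseen" text contains tokens the model finds very surprising.
seen   = min_k_percent_score([0.9, 0.8, 0.5, 0.4, 0.95], k=40)
unseen = min_k_percent_score([0.9, 0.8, 0.05, 0.01, 0.95], k=40)
print(seen > unseen)  # the seen text gets the higher score
```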
Main principles of Min-K% Prob:
In a large language model, each word has a "probability": the likelihood the model assigns to that word appearing. These probabilities are learned during the model's training.
Common words vs uncommon words:
Common words: Words like “and”, “is”, and “the” appear in almost all texts, so the model will give them a high probability.
Uncommon words: Words like technical terms or rare names do not appear in most texts, so the model will give them a lower probability.
How to tell whether the model has “learned” a certain text?
When we use the Min-K% Prob method to examine a specific text, we mainly focus on the probability of “uncommon words” in this text.
If the probability of these "uncommon words" is relatively high in the model, then it is possible that this text has been learned by the model: having learned the text, the model is more familiar with these "uncommon words" and assigns them relatively high probabilities.
On the other hand, if the probability of these “uncommon words” is relatively low in the model, then this text may not have been learned by the model.
Simply put, a higher probability for a word usually means the model has seen that word in its training data and therefore "understands" it better. This is why examining the probabilities of "uncommon words" gives us a degree of confidence about whether the model has learned a particular text.
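The decision described above reduces to a threshold test on the score. A minimal sketch, where the threshold value and the example log-probabilities are illustrative assumptions, not numbers from the paper:

```python
def classify_membership(token_log_probs, k=20, threshold=-3.0):
    """Flag a text as likely training data when the average
    log-probability of its bottom-k% tokens exceeds a calibrated
    threshold (the threshold here is purely illustrative)."""
    n = max(1, int(len(token_log_probs) * k / 100))
    lowest = sorted(token_log_probs)[:n]   # the k% least-likely tokens
    score = sum(lowest) / n
    return score >= threshold

# Hypothetical log-probs: the first text has no surprising tokens,
# the second contains tokens the model finds very unlikely.
print(classify_membership([-0.1, -0.2, -0.5], k=50))   # likely seen
print(classify_membership([-0.1, -8.0, -6.0], k=50))   # likely unseen
```

In practice the threshold would be calibrated on texts with known membership labels, which is exactly what a benchmark like WikiMIA enables.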
What’s the use of Min-K% Prob?
1. Detect contamination: A model may have been trained on data it should not have seen, such as benchmark test questions or copyrighted content. This method can help identify such problems.
2. Protect privacy: This method can also come in handy if you want to know whether a model has learned data that contains personal information.
3. Copyright issues: This method can also be used to detect whether the model has learned copyrighted texts, such as novels or articles.
Project address: https://t.co/24jcNAiiPK
Paper: https://t.co/fXBzlpgT36
GitHub: https://t.co/PBtGUdgHvW
Dynamic benchmark WikiMIA:
The researchers also introduced a dynamic benchmark called WikiMIA. WikiMIA uses data created before and after a model's training, so ground-truth membership labels are known, providing a gold standard for detection.
Simply put, it provides a way to verify the accuracy and validity of the Min-K% Prob method (or other similar methods).
WikiMIA dataset: huggingface.co/datasets/swj04