

And the worst part is, I’m not even sure if they believe it, or if they’re just lying to try and pump the value of the coins they’re investing in that claim to be capable of doing that in the future.
And honestly, I don’t know which I dislike more. Deliberate ignorance, or actual stupidity.
While true, it doesn’t keep you safe from sleeper agent attacks.
These can essentially allow the creator of your model to inject (seamlessly, undetectably until the desired response is triggered) behaviors into a model that will only trigger when given a specific prompt, or when a certain condition is met. (such as a date in time having passed)
https://arxiv.org/pdf/2401.05566
It’s obviously not as likely as a company simply tweaking their models when they feel like it, and it prevents them from changing anything on the fly after the training is complete and the model is distributed, (although I could see a model designed to pull from the internet being given a vulnerability where it queries a specific URL on the company’s servers that can then be updated with any given additional payload) but I personally think we’ll see vulnerabilities like this become evident over time, as I have no doubts it will become a target, especially for nation state actors, to simply slip some faulty data into training datasets or fine-tuning processes that get picked up by many models.