“Any commercial LLM that is out there, that is learning from the internet, is poisoned today,” Jennifer Swanson said, “but our main concern [is] those algorithms that are going to be informing battlefield decisions.”

Even as the Pentagon makes big bets on big data and artificial intelligence, the Army’s software acquisition chief is raising a new warning that adversaries could “poison” the well of data from which AI drinks, subtly sabotaging algorithms the US will use in future conflicts.

The fundamental problem is that every machine-learning algorithm has to be trained on data — lots and lots of data. The Pentagon is making a tremendous effort to collect, collate, curate, and clean its data so analytic algorithms and infant AIs can make sense of it. In particular, the prep team needs to throw out any erroneous datapoints before the algorithm can learn the wrong thing.
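That cleanup step can be pictured with a toy sketch (the sensor readings and the two-standard-deviation cutoff here are invented for illustration, not any real pipeline): screen out datapoints that sit far from the rest of the data before anything trains on them.

```python
# Hypothetical sketch of pre-training data cleaning: drop obvious outliers
# before an algorithm learns from them. Readings and cutoff are made up.
import statistics

readings = [10.1, 9.8, 10.3, 9.9, 10.0, 55.0, 10.2]  # one bogus datapoint

mean = statistics.mean(readings)
stdev = statistics.stdev(readings)

# Keep only points within 2 standard deviations of the mean.
clean = [x for x in readings if abs(x - mean) <= 2 * stdev]
print(clean)  # the 55.0 reading is thrown out
```

Real curation pipelines are far more involved, but the idea is the same: the prep team decides what counts as "erroneous" before the model ever sees it.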

Commercial chatbots from 2016’s Microsoft Tay to 2023’s ChatGPT are notorious for sucking up misinformation and racism along with all the other internet content they consume. But what’s worse, Swanson argued, is that the military’s own training data might be deliberately targeted by an adversary – a technique known as “data poisoning.”
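A toy illustration of the poisoning idea, with entirely made-up numbers (nothing here resembles a real military system): a crude nearest-centroid "classifier" trained on clean data calls a borderline contact a foe, but after an adversary slips a few mislabeled foe signatures into the friend training set, the same contact comes back as a friend.

```python
# Hypothetical label-flipping data poisoning on a toy 1-D classifier.
# All signatures and values are invented for illustration.

def centroid(points):
    return sum(points) / len(points)

def classify(x, friend_c, foe_c):
    # Assign whichever class centroid is closer.
    return "friend" if abs(x - friend_c) < abs(x - foe_c) else "foe"

# Clean training data: "friend" signatures near 1.0, "foe" signatures near 5.0.
friends = [0.9, 1.0, 1.1, 1.2]
foes = [4.8, 5.0, 5.2]

# An adversary slips mislabeled foe-like signatures into the friend set.
poisoned_friends = friends + [4.9, 5.1, 5.0]

clean_model = (centroid(friends), centroid(foes))
poisoned_model = (centroid(poisoned_friends), centroid(foes))

contact = 3.8  # a borderline contact
print(classify(contact, *clean_model))     # -> foe
print(classify(contact, *poisoned_model))  # -> friend
```

Three bad datapoints out of ten flipped the answer for the borderline case, which is exactly the kind of subtle sabotage Swanson is warning about.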

Breaking Defense - More Here

Think of it this way: if an AI is learning from Campfire forum posts, it's going to be pretty messed up, huh?

Imagine the 24hourcampfire as the pool of data that an AI learns from. Imagine the 5 worst users. I won't name names. Everyone has their own tastes. Imagine that every question you'd ask of that AI is somehow influenced by those users.

Imagine that every question you ask it gets the response: "You Suck," "Hint," "You're retarded," "It's Trump's fault," or just "GFY."

That's the dilemma in a nutshell.

Sadly, for the 5 worst users I can imagine, you could probably write a bot that produced random "GFY" comments and it would be just as entertaining.
Would it be too complicated to separate defense computer networks from the global networks?

Just asking for a friend.
Military networks are the most protected networks in existence, obviously. Any military AI is firewalled to the max.

This is just more scary talk about AI. Democrats are terrified of AI as they know the technology will displace 99% of their voting bloc.
Originally Posted by STRSWilson
This is just more scary talk about AI.

No it's not.
AI is already poisoned by Woke.
Originally Posted by DigitalDan
Would it be too complicated to separate defense computer networks from the global networks?

Internal sabotage.

NOTHING is SECURE.

Go write that on the Blackboard 500 times before you go home today.
Given the massive amounts of data collected, the poison data would have to be massive as well, and that would likely be rather noticeable at scale.

Be like adding salt to Lake Superior - you're going to notice it long before it's enough to make it salty.
Originally Posted by Teal
Given the massive amounts of data collected, the poison data would have to be massive as well, and that would likely be rather noticeable at scale.

I wouldn't be too sure about that.

Here is probably a silly example:

Designing a Smart Bomb with AI. You don't need or want to sweep the entire internet, so your sweep is targeted to likely areas. Bad guys set up a fake site, "Smart Bombs for Dummies," filled with bullchit, and the AI sweep picks it up. They would also ignore Facebook for, well, any kind of sweep on anything.
Originally Posted by Teal
Given the massive amounts of data collected, the poison data would have to be massive as well, and that would likely be rather noticeable at scale.

Be like adding salt to Lake Superior - you're going to notice it long before it's enough to make it salty.

And data quality is assessed virtually as much as the AI engine itself to ensure accuracy. It's critical in M&S, and it's one of the reasons global climate models vary so widely and one of the reasons "they" alter the data so often in the name of "data normalization."
Originally Posted by SupFoo
Originally Posted by Teal
Given the massive amounts of data collected, the poison data would have to be massive as well, and that would likely be rather noticeable at scale.

I wouldn't be too sure about that.

Here is probably a silly example:

Designing a Smart Bomb with AI. You don't need or want to sweep the entire internet, so your sweep is targeted to likely areas. Bad guys set up a fake site, "Smart Bombs for Dummies," filled with bullchit, and the AI sweep picks it up. They would also ignore Facebook for, well, any kind of sweep on anything.

AI doesn't create - it predicts based on data according to how it's been told/trained to interpret said data.

The point of AI is to let it sweep massive amounts of data at a speed humans can't, pick the fly crap out of the pepper, so to speak, and create a predictive model or outcome based upon that. Attended an excellent talk by Christoph Burkhart about this just last Thursday.

My company uses both AI and ML to process massive amounts of email data and attached documents in organizations and create outcomes, but it does NOT discount any email; it looks at all of them to get the data it needs (or not) to create the predictive outcome desired.
Opinions like these are always based around the author not liking the conclusion an AI comes to given whatever data it is fed. Essentially it's the same thing they apply to other people: if they don't like your conclusion, they say you are spreading misinformation and call you a racist or some sort of -phobe.

But who are they to question any of this? If an AI is fed an enormous amount of data and it comes to the conclusion there are differences between the races, maybe there are differences between the races.

Originally Posted by Teal
My company uses both AI and ML to process massive amounts of email data and attached documents in organizations and create outcomes but it does NOT discount any email, it looks at all of them to get the data it needs (or not) to create the predictive outcome desired.

So your sweep IS targeted. It's looking at email only. All in your case. You could target specific (criteria) email if you chose to.

Targeting will have to be used in many (most?) cases for AI. Sweeping the entire internet on any issue would take too much time and horsepower. E.g., why would you include NK in a sweep about the "Virtues of Capitalism"?
Originally Posted by SupFoo
Originally Posted by Teal
My company uses both AI and ML to process massive amounts of email data and attached documents in organizations and create outcomes but it does NOT discount any email, it looks at all of them to get the data it needs (or not) to create the predictive outcome desired.

So your sweep IS targeted. It's looking at email only. All in your case. You could target specific (criteria) email if you chose to.

Targeting will have to be used in many (most?) cases for AI. Sweeping the entire internet on any issue would take too much time and horsepower. E.g., why would you include NK in a sweep about the "Virtues of Capitalism"?

The data being swept or targeted is massive in scale on purpose; otherwise you'd not need AI to do it. It's a tool to create efficiency, one that's unneeded at small data set levels.
Sweep the email, not a singular spreadsheet, email being more volume than the spreadsheets. Emails with attachments have the attachments also crunched and sorted, per expected and programmed results.

In my example, using AI to sweep Lake Superior, not the pond behind your house. It's also extremely expensive, which is why it's used on the largest of problems, not the smallest. So again, you need to poison the largest of data sets to skew them, which requires large amounts of bad data that also gets past the data validation in AI.
Data poisoning... Non-whites being labeled as Whites during booking and intake. News reports decrying the rise of Christian Nationalism. Claiming that men can have babies. "Mostly peaceful" protests. January 6th "Insurrection." Domestic disinformation can be just as harmful as intentional misinformation from an enemy when feeding your LLM...
Fully trust nothing of what you see or hear unless in person. Then retain some skepticism. Cops know all about "witnesses".
One could develop an accurate map of the bad players by scanning emails from the Clintons, Podesta, etc., etc.
© 24hourcampfire