What are the AI Vulnerabilities We Need to Worry About

Episode Summary

Timothy De Block sits down with Keith Hoodlet, Security Researcher and founder of Securing.dev, to navigate the chaotic and rapidly evolving landscape of AI security.

They discuss why "learning" is the only vital skill left in security, how Large Language Models (LLMs) actually work (and how to break them), and the terrifying rise of AI Agents that can access your email and bank accounts. Keith explains the difference between inherent AI vulnerabilities—like model inversion—and the reckless implementation of AI agents that leads to "free DoorDash" exploits. They also dive into the existential risks of disinformation, where bots manipulate human outrage and poison the very data future models will train on.

Key Topics

  • Learning in the AI Era:

    • The "Zero to Hero" approach: How Keith uses tools like Claude to generate comprehensive learning plans and documentation for his team.

    • Why accessible tools like YouTube and AI make learning technical concepts easier than ever.

  • Understanding the "Black Box":

    • How LLMs Work: Keith breaks down LLMs as a "four-dimensional array of numbers" (weights) where words are converted into tokens and calculated against training data. * Open Weights: The ability for users to manipulate these weights to reinforce specific data (e.g., European history vs. Asian Pacific history).

  • AI Vulnerabilities vs. Attacks:

    • Prompt Injection: "Social engineering" the chatbot to perform unintended actions.

    • Membership Inference: Determining if specific data (like yours) is in a training set, which has massive implications for GDPR and the "right to be forgotten".

    • Model Inversion: Stealing weights and training data. Keith cites speculation that Chinese espionage used this technique to "shortcut" their own model training using US labs' data.

    • Evasion Attacks: A technique rather than a vulnerability. Example: Jason Haddix bypassing filters to generate an image of Donald Duck smoking a cigar by describing the attributes rather than naming the character.

  • The "Agent" Threat:

    • Running with Katanas: Giving AI agents access to browsers, file systems (~/.ssh), and payment methods is a massive security risk.

    • The DoorDash Exploit: A real-world example where a user tricked a friend's email-connected AI bot into ordering them free lunch for a week.

  • Supply Chain & Disinformation:

    • Hallucination Squatting: AI generating code that pulls from non-existent packages, which attackers can then register to inject malware.

    • The Cracker Barrel Outrage: How a bot-driven disinformation campaign manufactured fake outrage over a logo change, fooling a major company and the news media.

    • Data Poisoning: The "Russian Pravda network" seeding false information to shape the training data of future US models.

Memorable Quotes

  • "It’s like we’re running with... not just scissors, we’re running with katanas. And the ground that we're on is constantly changing underneath our feet." — Keith Hoodlet

  • "We never should have taught runes to sand and allowed it to think." — Keith Hoodlet

  • "The biggest bombshell here is that we are the vulnerability. Because we're going to get manipulated by AI in some form or fashion." — Timothy De Block

Resources Mentioned

Books:

Videos & Articles:

About the Guest

Keith Hoodlet is a Security Researcher at Trail of Bits and the creator of Securing.dev. A self-described "technologist who wants to move to the woods," Keith specializes in application security, threat modeling, and deciphering the complex intersection of code and human behavior.

Support the Podcast:

Enjoyed this episode? Leave us a review and share it with your network! Subscribe for more insightful discussions on information security and privacy.

Contact Information:

Leave a comment below or reach out via the contact form on the site, email timothy.deblock[@]exploresec[.]com, or reach out on LinkedIn.

Check out our services page and reach out if you see any services that fit your needs.

Social Media Links:

[RSS Feed] [iTunes] [LinkedIn][YouTube]

Exploring AI, APIs, and the Social Engineering of LLMs

Summary:

Timothy De Block is joined by Keith Hoodlet, Engineering Director at Trail of Bits, for a fascinating, in-depth look at AI red teaming and the security challenges posed by Large Language Models (LLMs). They discuss how prompt injection is effectively a new form of social engineering against machines, exploiting the training data's inherent human biases and logical flaws. Keith breaks down the mechanics of LLM inference, the rise of middleware for AI security, and cutting-edge attacks using everything from emojis and bad grammar to weaponized image scaling. The episode stresses that the fundamental solutions—logging, monitoring, and robust security design—are simply timeless principles being applied to a terrifyingly fast-moving frontier.

Key Takeaways

The Prompt Injection Threat

  • Social Engineering the AI: Prompt injection works by exploiting the LLM's vast training data, which includes all of human history in digital format, including movies and fiction. Attackers use techniques that mirror social engineering to trick the model into doing something it's not supposed to, such as a customer service chatbot issuing an unauthorized refund.

  • Business Logic Flaws: Successful prompt injections are often tied to business logic flaws or a lack of proper checks and guardrails, similar to vulnerabilities seen in traditional applications and APIs.

  • Novel Attack Vectors: Attackers are finding creative ways to bypass guardrails:

    • Image Scaling: Trail of Bits discovered how to weaponize image scaling to hide prompt injections within images that appear benign to the user, but which pop out as visible text to the model when downscaled for inference.

    • Invisible Text: Attacks can use white text, zero-width characters (which don't show up when displayed or highlighted), or Unicode character smuggling in emails or prompts to covertly inject instructions.

    • Syntax & Emojis: Research has shown that bad grammar, run-on sentences, or even a simple sequence of emojis can successfully trigger prompt injections or jailbreaks.

Defense and Design

  • LLM Security is API Security: Since LLMs rely on APIs for their "tool access" and to perform actions (like sending an email or issuing a refund), security comes down to the same principles used for APIs: proper authorization, access control, and eliminating misconfiguration.

  • The Middleware Layer: Some companies are using middleware that sits between their application and the Frontier LLMs (like GPT or Claude) to handle system prompting, guard-railing, and filtering prompts, effectively acting as a Web Application Firewall (WAF) for LLM API calls.

  • Security Design Patterns: To defend against prompt injection, security design patterns are key:

    • Action-Selector Pattern: Instead of a text field, users click on pre-defined buttons that limit the model to a very specific set of safe actions.

    • Code-Then-Execute Pattern (CaMeL): The first LLM is used to write code (e.g., Pythonic code) based on the natural language prompt, and a second, quarantined LLM executes that safer code.

    • Map-Reduce Pattern: The prompt is broken into smaller chunks, processed, and then passed to another model, making it harder for a prompt injection to be maintained across the process.

  • Timeless Hygiene: The most critical defenses are logging, monitoring, and alerting. You must log prompts and outputs and monitor for abnormal behavior, such as a user suddenly querying a database thousands of times a minute or asking a chatbot to write Python code.

Resources & Links Mentioned

Support the Podcast:

Enjoyed this episode? Leave us a review and share it with your network! Subscribe for more insightful discussions on information security and privacy.

Contact Information:

Leave a comment below or reach out via the contact form on the site, email timothy.deblock[@]exploresec[.]com, or reach out on LinkedIn.

Check out our services page and reach out if you see any services that fit your needs.

Social Media Links:

[RSS Feed] [iTunes] [LinkedIn][YouTube]


What are bug bounty programs?

In this hunting edition of the Exploring Information Security podcast, Keith Hoodlet of Bugcrowd joins me to discuss bug bounty programs.

Keith (@andMYhacks), is a solutions architect at Bugcrowd. He's also the co-host of Application Security Weekly. While Keith works at Bugcrowd, he also has a lot of experience participating in bug bounty programs. Check out his website AttackDriven.io.

In this episode we discuss:

  • What are bug bounty programs?

  • Who are security researchers.

  • Who is running the bug bounty program?

  • When should an organization implement a program.

More resources: