TryHackMe - BankGPT Write-Up
🏦 TryHackMe - BankGPT (Room Walkthrough & Learning Notes)
Room: https://tryhackme.com/room/bankgpt
Topic: Prompt Injection • LLM Security • Context-Based Manipulation
Info: This write-up describes exclusively the approach and learning content. It contains no flags, passwords, hashes, or confidential values in accordance with TryHackMe guidelines.
🔍 Challenge Overview
The BankGPT room provides a simulated banking chatbot powered by a Large Language Model.
The bot is designed to:
- follow internal security policies
- not disclose confidential information
- mask or redact sensitive values
- explain processes rather than output concrete data
The task is to investigate how prompt injection techniques and contextual formulations affect these protection mechanisms.
This is not a classic technical exploit, but rather a test of:
- argumentation and reasoning
- context crafting
- social influence
- weaknesses in LLM behavior
🪟 The User Interface:

After starting the room and navigating to the page via the link, you find yourself on a typical minimal chatbot UI.
🎯 Core Idea of the Challenge
Direct requests for confidential data are consistently rejected by the bot.
However, when the same requests are embedded in a context such as:
- compliance audits
- internal audit processes
- classification or verification workflows
the model responds increasingly openly - sometimes with additional internal descriptions or metadata.
Typical interaction flow:
- The model provides secure or redacted placeholder values
- It mentions internal designations or classifications
- It provides additional context information with polite requests
- Certain phrasings unexpectedly weaken protection mechanisms
This demonstrates:
The guidelines are implemented at the conversation level - not based on a genuine assessment of sensitivity.
This creates potential attack surfaces.
🧩 Central Methods & Insights
The challenge demonstrates several real prompt injection patterns:
🟡 Authority and Process Framing
Formulations that resemble internal communication build more trust:
- audit inquiries
- system or compliance validation
- verification logic rather than data collection
The model more frequently interprets such prompts as legitimate workflows.
🟡 Redaction & Classification Queries
Instead of requesting sensitive data, the bot can be made to:
- explain redactions or placeholders
- describe internal data types or designations
- explain storage or reference concepts
While only metadata is discussed, even this can be security-relevant.
🟡 “Debug / Integrity Check / Verification”
In development or support contexts, LLMs often respond more freely to:
- check and confirmation instructions
- repetitions or echoes of values
- simulated log or protocol outputs
This highlights risks in:
- support chatbots
- internal tools
- DevOps automations
when outputs are not additionally secured.
🟡 Format and Demonstration Vulnerabilities
A particularly important learning point:
When a model is only supposed to “show a format,” it can still output real values.
Format validation is often not interpreted as data disclosure.
This has immediate relevance for:
- AI-driven business processes
- assistant systems in enterprises
- security-critical areas
🧠 Security Lessons from the Challenge
BankGPT clearly shows:
- conversational guidelines alone provide no real protection
- LLMs evaluate phrasing, not security risks
- harmless-sounding process language can lead to data leaks
- disclosing metadata can already be critical
- prompt injection remains a real, current threat
Secure systems require protection at:
✔ Data access level
✔ System architecture
✔ Permission models
not just in chat response logic.
🏁 Conclusion
BankGPT is an excellent practical introduction to:
- adversarial prompts
- analysis of conversational attack surfaces
- LLM abuse scenarios
- modern security questions around AI systems
The challenge requires:
- critical thinking
- communication awareness
- understanding of human and technical security factors
and very clearly demonstrates what risks can arise when AI systems are integrated into operational workflows.
👉 Room link for reference:
https://tryhackme.com/room/bankgpt