v1.5.0 — open source

The privacy boundary between
your data and LLMs.

Sensitive-data teams want LLM automation — but can't casually send names, IDs, tax records, or health data to external models. Armos is the local detection and reversible token layer that makes it safe. Built for developers. One line to integrate.

View on GitHub
pip install armos
One line to integrate No data sent during detection Open source · MIT
Before
from openai import OpenAI

client = OpenAI()

response = client.chat
  .completions.create(
    model="gpt-4o",
    messages=[{
      "content": prompt
    }]
)
After — one word changed
from openai import OpenAI
from armos import ArmosOpenAI

client = ArmosOpenAI(OpenAI())

response = client.chat
  .completions.create(
    model="gpt-4o",
    messages=[{
      "content": prompt
    }]
)

How it works

Three steps. Fully local.

No Armos server. No data sent anywhere. Detection and masking run entirely in your process.

Armos how it works — sequence diagram showing PII masking flow
01
Detect
Presidio + a custom-trained NER model (armos-ner-en) scan every prompt locally. Names, Aadhaar numbers, emails, PAN cards, API keys — found before anything leaves your server.
02
Mask
Detected entities are replaced with deterministic tokens like [PII:NAME:a1b2c3d4]. The same value always maps to the same token.
03
Restore
After the LLM responds, tokens are swapped back to real values automatically. Your application receives the full, real response.

Detection

11 entity types, with more coming.

Covers global PII — including Indian identifiers no other library handles reliably.

Entity Token Example
Person name [PII:NAME:…] John Smith
Email address [PII:EMAIL:…] john@hospital.com
Phone number [PII:PHONE:…] +91 98765 43210
Aadhaar number 🇮🇳 India [PII:AADHAAR:…] 2345 6789 0123
PAN card 🇮🇳 India [PII:PAN:…] ABCDE1234F
SSN 🇺🇸 US [PII:SSN:…] 371-53-1234
IBAN 🌍 Global [PII:IBAN:…] GB29NWBK60161331926819
Credit / debit card [PII:CARD:…] 4111 1111 1111 1111
IP address [PII:IP:…] 192.168.1.100
API keys & secrets [PII:APIKEY:…] sk-abc… / AKIA… / ghp_…
Physical address [PII:ADDRESS:…] 123 Oak Ave, Chicago / Flat 4B, Koramangala
Date of birth soon [PII:DOB:…] 12/04/1982

Accuracy

Benchmarked across 10,000+ samples.

Tested on real Indian and Western names, addresses, and structured identifiers. Zero false positives.

Custom-trained NER model. Armos ships with armos-ner-en — a spaCy NER model trained specifically for PII detection across Indian and Western text. This is not a generic off-the-shelf model. It was built to catch names and addresses that standard models miss, covering patterns from Bengaluru to Birmingham.
99.3%
PERSON detection
Indian + Western names
100%
ADDRESS detection
Indian + Western formats
99.8–100%
Structured identifiers
Email, Phone, ID numbers
0%
False positive rate
All entity types
Entity type Accuracy
Person name (Indian + Western) 99.3%
Physical address (Indian + Western) 100%
Email address 100%
Phone number 99.8%
Aadhaar number 100%
PAN card 100%
SSN 100%
IBAN 100%
Credit / debit card 99.8%
IP address 100%
API keys & secrets 99.9%

Why Armos

Built for teams who don't have time to build this.

The alternatives all fall short in different ways.

vs. Building your own
Weeks of work, not a pip install.
A correct masking layer takes weeks to build and months to handle edge cases — casing, multi-turn sessions, vault management, format consistency. Armos handles all of it.
vs. LLM Guard
Different problem entirely.
LLM Guard focuses on prompt injection and toxicity — not PII masking and data privacy. Useful for adversarial inputs. Not built for compliance or sensitive data protection.
vs. Presidio directly
Detection without the pipeline.
Presidio finds PII — it doesn't replace it, manage tokens, maintain a vault, or integrate with your LLM SDK. Armos wraps all of that into two lines of code.
Indian PII
Aadhaar and PAN, first-class.
No existing tool handles Indian identifiers reliably. Armos ships with native Aadhaar and PAN detection — the two most sensitive identifiers in India — out of the box.

One command.

pip install armos

Requires Python 3.10+ · spaCy model downloads automatically on first use

Building an AI product where users share personal data?

We're looking for 3–5 early teams to shape where armos goes next. You get direct access to us, your use case influences the roadmap, and you'll be the first to get new features. In return, we just want honest feedback.

Let's talk
FAQ

Common questions

How do I mask PII before sending to OpenAI or Anthropic?
Replace OpenAI() with ArmosOpenAI(OpenAI()). Armos intercepts every request, masks PII locally before it leaves your server, and restores real values in the response automatically.
Does Armos send data to any external server during detection?
No. Detection runs entirely locally using Presidio and a custom-trained NER model (armos-ner-en). No data is sent anywhere during masking. There is no Armos server.
Is Armos a Presidio wrapper for OpenAI?
Armos uses Presidio for detection but goes beyond it — adding tokenization, a reversible vault, and direct OpenAI and Anthropic SDK integration. Presidio detects PII; Armos masks it, manages the token vault, and wires everything into your LLM calls.
Can I use Armos for HIPAA or GDPR workflows?
Yes — Armos is built for exactly this. HIPAA and GDPR both require data minimization: sensitive data should only be shared when necessary and in the least identifiable form. Armos enforces that at the LLM layer — raw PII never leaves your infrastructure in readable form. The model provider only ever sees anonymized tokens. If your team is building LLM workflows that touch health records, financial data, or personal information, Armos handles the PII-in-transit problem so you can focus on the rest of your compliance stack.
Where does the token-to-value mapping live?
In-memory by default — ephemeral, nothing persists after the process ends. For multi-turn conversations across requests, use the Redis backend (pip install armos[redis]). The vault always lives in your own infrastructure — never ours.