Nightshade

Protects source code from being scraped for AI training.

Problem

AI companies scrape public code to train their models — usually without asking the people who wrote it.

Nightshade gives developers a way to fight back. It subtly rewrites your source code so that it still compiles and runs exactly as before, but becomes "poisoned", low-quality training material for any AI that scrapes it. Your code keeps working; the scraper gets noise.

What I built

Key Decision

Every transformation must be verifiably semantics-preserving: after each pass, the pipeline checks that the code still compiles and behaves identically.
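To make the verify-after-each-pass idea concrete, here is a minimal sketch in Java. It is not Nightshade's actual code: the class name, the Result type, and the choice to skip a rejected pass (rather than abort the run) are all assumptions made for illustration. The verifier is injected as a plain predicate standing in for the real "compiles and behaves identically" check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch; these names are not Nightshade's real API.
final class VerifiedPipeline {

    /** Final source plus the indices of passes whose output was rejected. */
    static final class Result {
        final String source;
        final List<Integer> rejectedPasses;

        Result(String source, List<Integer> rejectedPasses) {
            this.source = source;
            this.rejectedPasses = List.copyOf(rejectedPasses);
        }
    }

    private final List<UnaryOperator<String>> passes;
    // Assumed stand-in for "compiles and behaves identically":
    // receives (original, candidate) and returns true if the candidate is acceptable.
    private final BiPredicate<String, String> verifier;

    VerifiedPipeline(List<UnaryOperator<String>> passes,
                     BiPredicate<String, String> verifier) {
        this.passes = List.copyOf(passes);
        this.verifier = verifier;
    }

    Result run(String original) {
        String current = original;
        List<Integer> rejected = new ArrayList<>();
        for (int i = 0; i < passes.size(); i++) {
            String candidate = passes.get(i).apply(current);
            // Always compare against the untouched original, so small
            // deviations cannot accumulate across passes.
            if (verifier.test(original, candidate)) {
                current = candidate;
            } else {
                rejected.add(i); // discard this pass, continue from the last good source
            }
        }
        return new Result(current, rejected);
    }
}
```

In a real setup the predicate would presumably compile the candidate and run a behavioral check such as the project's test suite. The point of the sketch is only the control flow: no pass's output is accepted until it has been checked against the original.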

Nightshade applies eight obfuscation strategies through a weighted entropy pipeline that operates at the lexer and AST level. The strategies include misleading identifier renames, plausible dead-code injection, comment poisoning, string encoding, control-flow flattening, and a steganographic watermark that can prove ownership. Built for Java first, with Python, JavaScript, and TypeScript support, it ships as a CLI, a GitHub Action, and a pre-commit hook. The name is an homage to the Nightshade image-poisoning research project; this tool applies the same idea to source code. Co-created with Saif-ur-Rehman.
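As an illustration of one of these strategies, the sketch below shows what a string-encoding rewrite could look like in Java. It assumes Base64 as the encoding and uses hypothetical names (StringEncodingSketch, encodeLiteral, decode). Nightshade's actual encoding scheme is not described here, so treat this as one possible instance of the technique, not the tool's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of a string-encoding transform; not Nightshade's real code.
final class StringEncodingSketch {

    /**
     * Given the value of a string literal, returns Java source text for an
     * expression that evaluates to the same string at runtime.
     */
    static String encodeLiteral(String value) {
        String b64 = Base64.getEncoder()
                .encodeToString(value.getBytes(StandardCharsets.UTF_8));
        // The Base64 alphabet contains no quotes or backslashes, so the
        // encoded text can be embedded in a Java string literal unescaped.
        return "new String(java.util.Base64.getDecoder().decode(\"" + b64 + "\"), "
                + "java.nio.charset.StandardCharsets.UTF_8)";
    }

    /** What the emitted expression computes at runtime; used to check the round trip. */
    static String decode(String b64) {
        return new String(Base64.getDecoder().decode(b64), StandardCharsets.UTF_8);
    }
}
```

Even a rewrite this small shows why verification matters. The emitted expression is no longer a compile-time constant, so a literal used as a switch case label or an annotation value cannot be replaced this way, and code that compares strings with == may behave differently once the value is no longer an interned literal.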

Impact / Learnings

Released at v3.5.0 with a JUnit 5 test suite and a hardened release pipeline: SLSA provenance, Sigstore signing, an SBOM, and CodeQL scanning.

This project taught me compiler-level engineering: lexing, AST manipulation, and what "the code must behave identically" actually demands from a verification step.