Everyone says they can "feel" when code is written by AI. But feelings aren't data. So we decided to put the vibes to the test.
We collected 5,000 human-written snippets from open-source repositories (pre-2021 to avoid contamination) and generated 5,000 corresponding snippets using GPT-4, Claude 3.5, and Llama 3.
Then we ran them through our heuristic engine. The results were not just statistically significant—they were staggering.
The Dataset
- Human Set: Popular JS/TS repos (React, Vue, D3, Express).
- AI Set: Generated via prompts like "Write a function to..." based on the human function signatures (see the sketch after this list).
- Languages: JavaScript, TypeScript, Python.
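As a rough illustration of that prompt construction, here is a minimal sketch; the template and the `buildPrompt` helper are hypothetical, not the exact prompts we used.

```js
// Hypothetical sketch: derive a generation prompt from a human function's
// name, parameters, and description. Illustrative only.
const buildPrompt = ({ name, params, description }) =>
  `Write a JavaScript function named ${name}(${params.join(", ")}) that ${description}.`;

console.log(buildPrompt({
  name: "debounce",
  params: ["fn", "waitMs"],
  description: "returns a debounced version of fn that fires after waitMs of inactivity",
}));
// -> "Write a JavaScript function named debounce(fn, waitMs) that returns a debounced version..."
```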
Top Observed Differences
| Metric | Human Code | AI Code |
|---|---|---|
| Avg. Comment Density | 12% | 28% (Over-commenting) |
| Variable Name Length | 8.4 chars | 6.1 chars (Generic names) |
| "Guard Clause" Usage | 68% of functions | 22% of functions |
| Error Handling | Specific / Bubbling | Generic `try/catch` |
| Unique Vocabulary | High (Domain slang) | Low (Standard English) |
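For the curious, here is a minimal sketch of how two of these metrics (comment density and average identifier length) could be approximated for a JavaScript snippet. The regexes and helper names are illustrative simplifications, not our actual measurement pipeline.

```js
// Comment density: fraction of non-blank lines that are line comments.
// (Only handles // comments; block comments are ignored in this sketch.)
const commentDensity = (source) => {
  const lines = source.split("\n").filter((line) => line.trim().length > 0);
  const commentLines = lines.filter((line) => line.trim().startsWith("//"));
  return commentLines.length / lines.length;
};

// Average identifier length: looks only at const/let/var declarations,
// ignoring parameters and destructuring for simplicity.
const avgIdentifierLength = (source) => {
  const names = [...source.matchAll(/\b(?:const|let|var)\s+([A-Za-z_$][\w$]*)/g)]
    .map((match) => match[1]);
  if (names.length === 0) return 0;
  return names.reduce((sum, name) => sum + name.length, 0) / names.length;
};

const snippet = `
// filter out invalid items
const validItems = items.filter(isValid);
`;

console.log(commentDensity(snippet));       // 0.5
console.log(avgIdentifierLength(snippet));  // 10
```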
Visualizing the "Vibe Gap"
The most striking difference was in structural variety. Human code has peaks and valleys. AI code is a flat plain.
AI Structure (The Block)
```js
const process = (data) => {
  if (data) {
    const result = [];
    for (let i = 0; i < data.length; i++) {
      if (data[i].isValid) {
        result.push(data[i]);
      }
    }
    return result;
  }
  return [];
};
```
Dense. Nested. Uniform.
Human Structure (The Flow)
```js
const process = (items = []) => {
  if (!items.length) return [];
  return items.filter(isValid);
};
```
Spaced out. Linear. Expressive.
What Surprised Us
We expected AI to be "better" at syntax, and it is: it rarely makes a syntax error. What we didn't expect was how anxious AI coding styles are.
AI code is terrified of runtime errors. It checks for `null` constantly, even when types guarantee existence. It wraps simple logic in `try/catch`. It defaults to defensive coding patterns that actually make debugging harder because they swallow the useful crash data.
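Here is an illustrative pair (not taken from the dataset) showing the pattern: a defensive wrapper that swallows the crash versus the bubbling style the human set favors.

```js
// AI-style: redundant null checks, a blanket try/catch, a vague log message,
// and a silent fallback. The stack trace and error message are gone.
const loadConfigDefensive = (raw) => {
  try {
    if (raw !== null && raw !== undefined) {
      return JSON.parse(raw);
    }
    return {};
  } catch (error) {
    console.log("An error occurred");
    return {};
  }
};

// Human-style: guard the one case that matters, then let a malformed config
// crash loudly with the original error intact.
const loadConfig = (raw) => {
  if (!raw) return {};
  return JSON.parse(raw);
};
```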
The "Turing Test" for Code
Based on this data, we've refined our detection algorithm. We don't just look for "bad" code. We look for "statistically average" code.
If your code looks like the average of all code on GitHub, it triggers our AI detector. If it has quirks, weird formatting choices, and domain-specific slang, it passes as human.
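To make the idea concrete, here is a toy sketch of a "distance from average" score. The corpus numbers and metrics are made up for illustration; this is not our production Vibe Engine.

```js
// Hypothetical corpus-wide averages and standard deviations (illustrative numbers).
const CORPUS = {
  commentDensity:   { mean: 0.20, std: 0.08 },
  identifierLength: { mean: 7.2,  std: 1.5 },
  guardClauseRate:  { mean: 0.45, std: 0.20 },
};

// The closer a snippet sits to the mean on every metric, the more
// "statistically average" it is, and the more AI-like it looks.
const averageness = (metrics) => {
  const zScores = Object.keys(CORPUS).map((key) => {
    const { mean, std } = CORPUS[key];
    return Math.abs((metrics[key] - mean) / std);
  });
  const meanZ = zScores.reduce((a, b) => a + b, 0) / zScores.length;
  return 1 / (1 + meanZ); // 1.0 = perfectly average, lower = quirkier
};

// Dead-average metrics score high (more likely flagged as AI)...
console.log(averageness({ commentDensity: 0.21, identifierLength: 7.0, guardClauseRate: 0.5 })); // ~0.86
// ...while a quirky snippet sits far from the mean and reads as human.
console.log(averageness({ commentDensity: 0.05, identifierLength: 11.0, guardClauseRate: 0.9 })); // ~0.31
```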
Does Your Code Pass the Test?
We've fed the lessons from these 10,000 snippets into our Vibe Engine. Paste your snippet to see where it falls on the Human-AI spectrum.
Run the Vibe Test →