Introducing the Content Farm Detector

Content farms have a specific signature: articles written not for humans, but for search engines. They use generic templates, filler phrases, and surface-level coverage to rank for keywords without providing real value.

We built the Content Farm Detector to identify these articles automatically — via a single REST API call.

What Makes Something a “Content Farm” Article?

Content farm articles share common characteristics:

Template openings — “In today’s fast-paced world…”, “It is important to note that…”, “When it comes to…”
Filler phrases — “cutting-edge solutions”, “leveraging technology”, “maximizing efficiency”
Low information density — Lots of words, not much substance
Surface-level coverage — Never goes deep enough to be useful
Keyword stuffing — The article exists to rank for specific terms

Traditional plagiarism checkers miss these because they’re not copying words — they’re copying patterns.
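
To show the difference between copying words and copying patterns, here is a deliberately naive sketch. It counts stock phrases from the list above and reports them per 100 words. This is a toy for illustration only. It is not how the Content Farm Detector works (the detector uses an AI model), and the phrase list and the per-100-words metric are our own assumptions. Two articles that share no sentences can still both score high on it.

```javascript
// Toy illustration only: NOT the detector's method. Counts template and
// filler phrases to show that content farms repeat patterns, not sentences.
const TEMPLATE_PHRASES = [
  "in today's fast-paced world",
  'it is important to note that',
  'when it comes to',
  'cutting-edge solutions',
  'leveraging technology',
  'maximizing efficiency',
]

// Case-insensitive count of each stock phrase; curly apostrophes are
// normalized so "today’s" matches "today's".
function countTemplatePhrases(text) {
  const normalized = text.toLowerCase().replace(/[\u2018\u2019]/g, "'")
  const hits = {}
  for (const phrase of TEMPLATE_PHRASES) {
    const count = normalized.split(phrase).length - 1
    if (count > 0) hits[phrase] = count
  }
  return hits
}

// Stock-phrase hits per 100 words: a crude "pattern density" number.
function templateDensity(text) {
  const words = text.split(/\s+/).filter(Boolean).length
  if (words === 0) return 0
  const total = Object.values(countTemplatePhrases(text)).reduce((a, b) => a + b, 0)
  return (total / words) * 100
}
```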

How the Detector Works

The Content Farm Detector evaluates articles against a defined set of content-farm signals using an AI model. You submit an article by URL or paste the raw text, and the service returns a structured JSON assessment in under 3 seconds.

The analysis checks for:

Keyword stuffing — unnatural repetition of target phrases beyond what is editorially justified
Thin content — articles below ~400 words with no substantive information
Structural uniformity — rigid template patterns (e.g., H2 → 3 sentences → H2 → 3 sentences) with no natural variation
Missing original research — no data, quotes, studies, interviews, or firsthand reporting
Filler paragraphs — content that restates the introduction without adding new information
Generic stock advice — content that could appear verbatim on any low-effort site
No author voice — no discernible author perspective, opinion, or expertise signals

Each request returns a structured JSON response with scores, detected signals, SEO manipulation risk, and a clear safe_to_link_to verdict — all from a single POST /api/analyze call.

Quick Start

Send a POST request to the /api/analyze endpoint with either a URL or raw article text:

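// Requires Node.js 18+ for the built-in fetch. Save as an ES module (.mjs)
// so the top-level await further down works.
const RAPIDAPI_KEY = process.env.RAPIDAPI_KEY
const RAPIDAPI_HOST = 'content-farm-detector-api-ai-seo-quality-scorer.p.rapidapi.com'

if (!RAPIDAPI_KEY) throw new Error('Set the RAPIDAPI_KEY environment variable')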

async function analyzeArticle(articleUrl) {
  const response = await fetch(`https://${RAPIDAPI_HOST}/api/analyze`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-RapidAPI-Key': RAPIDAPI_KEY,
      'X-RapidAPI-Host': RAPIDAPI_HOST,
    },
    body: JSON.stringify({
      url: articleUrl,
      options: {
        check_seo_signals: true,
      },
    }),
  })

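  if (!response.ok) {
    // Error bodies are not guaranteed to be JSON, so read them as text first
    const body = await response.text()
    let message = body
    try {
      message = JSON.parse(body)?.message ?? body
    } catch {
      // Not JSON: keep the raw text as the message
    }
    throw new Error(`API error ${response.status}: ${message}`)
  }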

  return response.json()
}

const result = await analyzeArticle('https://example-blog.com/10-best-seo-tips')

console.log(`Quality Score:  ${result.content_quality_score}/100`)
console.log(`AI Probability: ${result.ai_probability}`)
console.log(`Safe to link:   ${result.safe_to_link_to}`)
console.log(`SEO Risk:       ${result.seo_manipulation_risk}`)
console.log(`Signals:        ${result.content_farm_signals.join(', ')}`)
console.log(`Cached:         ${result.cached}`)

You can also submit raw text directly instead of a URL:

// articleText holds the raw article body as a string
const textResult = await fetch(`https://${RAPIDAPI_HOST}/api/analyze`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-RapidAPI-Key': RAPIDAPI_KEY,
    'X-RapidAPI-Host': RAPIDAPI_HOST,
  },
  body: JSON.stringify({
    text: articleText, // minimum 50 characters
    options: {
      domain: 'example-blog.com',
      check_seo_signals: true,
    },
  }),
}).then((r) => {
  if (!r.ok) throw new Error(`API error ${r.status}`)
  return r.json()
})
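
Since the two request shapes differ only in `url` versus `text`, you may prefer to build the body in one place. The sketch below is a hypothetical helper, not part of the API. The 50-character minimum comes from the example above; rejecting a body that sets both `url` and `text` is a client-side choice of ours, not a documented API rule.

```javascript
// Hypothetical helper: builds the JSON body for POST /api/analyze from either
// a URL or raw text, mirroring the two request shapes shown above.
const MIN_TEXT_LENGTH = 50 // from the "minimum 50 characters" note above

function buildAnalyzeBody({ url, text, domain, checkSeoSignals = true } = {}) {
  // Client-side choice: refuse ambiguous input rather than guess which wins.
  if (url && text) throw new Error('Provide either url or text, not both')

  const options = { check_seo_signals: checkSeoSignals }
  if (url) return { url, options }

  if (typeof text === 'string') {
    if (text.length < MIN_TEXT_LENGTH) {
      throw new Error(`text must be at least ${MIN_TEXT_LENGTH} characters`)
    }
    // `domain` only appears in the raw-text example, so it is only sent here.
    if (domain) options.domain = domain
    return { text, options }
  }

  throw new Error('Provide a url or a text string')
}
```

For example, the raw-text request could pass `JSON.stringify(buildAnalyzeBody({ text: articleText, domain: 'example-blog.com' }))` as its `body`, which also fails fast on input that is too short.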

The API returns two key scores: content_quality_score (0–100, where higher is better — 80–100 is excellent, 0–39 is content-farm/spam) and ai_probability (0.0–1.0, where higher means more likely AI-generated). The safe_to_link_to boolean combines these with SEO risk analysis into a single actionable verdict: true only when content_quality_score ≥ 50, seo_manipulation_risk is not “high”, and ai_probability ≤ 0.85.
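
If you want to apply the same rule on your side (for example, with tighter thresholds for a stricter pipeline), it translates directly into a small predicate. The defaults below restate the documented thresholds. How the API itself handles a null `content_quality_score` or `ai_probability` is not stated here, so treating null as "not safe" is an assumption of this sketch.

```javascript
// Restates the documented safe_to_link_to rule. Defaults match the docs;
// pass stricter thresholds for your own pipeline.
function isSafeToLinkTo(result, { minQuality = 50, maxAiProbability = 0.85 } = {}) {
  const { content_quality_score: score, ai_probability: ai, seo_manipulation_risk: risk } = result
  // Assumption: a null score or probability is treated as "not safe".
  if (score == null || ai == null) return false
  return score >= minQuality && risk !== 'high' && ai <= maxAiProbability
}
```

Note that a risk of `unknown` passes, because the documented rule only excludes `high`. The boundaries are inclusive: a score of exactly 50 with an `ai_probability` of exactly 0.85 counts as safe.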

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| `ai_probability` | `number \| null` | 0.0–1.0 probability the content was AI-generated |
| `content_quality_score` | `number \| null` | 0–100 editorial quality score (higher is better) |
| `content_farm_signals` | `string[]` | Detected signals: `keyword_stuffing`, `thin_content`, `structural_uniformity`, `no_original_research`, and more |
| `seo_manipulation_risk` | `string` | One of: `low`, `medium`, `high`, `unknown` |
| `original_research_detected` | `boolean` | `true` if the text contains original data, quotes, or firsthand reporting |
| `keyword_density_score` | `number \| null` | 0.0–1.0 measure of keyword stuffing intensity |
| `recommendation` | `string` | 1–3 sentence explanation and recommended action |
| `safe_to_link_to` | `boolean` | Actionable verdict for automated filtering rules |
| `cached` | `boolean` | `true` if served from the 2-hour KV cache |
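
Several fields can be null and `seo_manipulation_risk` is a closed set, so it can be worth checking each response before automated rules act on it. Below is a minimal sketch that assumes the field names and types in the table above are the complete contract. `isAnalysisResult` is a hypothetical helper, not part of the API, and it checks types only, not numeric ranges.

```javascript
// Hypothetical runtime check that a parsed response matches the field table.
const SEO_RISK_VALUES = new Set(['low', 'medium', 'high', 'unknown'])

// Accepts null or a finite number (rejects NaN, Infinity, strings).
const isNumberOrNull = (v) => v === null || (typeof v === 'number' && Number.isFinite(v))

function isAnalysisResult(r) {
  return (
    r !== null &&
    typeof r === 'object' &&
    isNumberOrNull(r.ai_probability) &&
    isNumberOrNull(r.content_quality_score) &&
    Array.isArray(r.content_farm_signals) &&
    r.content_farm_signals.every((s) => typeof s === 'string') &&
    SEO_RISK_VALUES.has(r.seo_manipulation_risk) &&
    typeof r.original_research_detected === 'boolean' &&
    isNumberOrNull(r.keyword_density_score) &&
    typeof r.recommendation === 'string' &&
    typeof r.safe_to_link_to === 'boolean' &&
    typeof r.cached === 'boolean'
  )
}
```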

Use Cases

SEO Agencies — Automate Link Prospect Filtering

Link-building teams prospect 200–500 domains per campaign using Ahrefs or Semrush, then spend days manually rejecting content farms. Instead, pipe your export through the API overnight and auto-reject any site whose content_quality_score is below 40. A 500-domain batch costs under $5 on the PRO plan.
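
Here is a sketch of that overnight pass. The API analyzes URLs, so it assumes you have already picked one representative article URL per prospect domain. `analyze` is any async function that returns the API's JSON for a URL, such as `analyzeArticle` from the Quick Start. The 40-point cutoff comes from the paragraph above. The default concurrency of 5 is an arbitrary assumption, so check your plan's rate limits.

```javascript
// Hypothetical overnight prospect filter with a small worker pool.
async function filterProspects(urls, analyze, { minQuality = 40, concurrency = 5 } = {}) {
  const kept = []
  const rejected = []
  const failed = []
  let next = 0

  // Each worker claims the next unprocessed URL until none are left.
  // Claiming (next++) happens synchronously, so workers never share a URL.
  async function worker() {
    while (next < urls.length) {
      const url = urls[next++]
      try {
        const result = await analyze(url)
        const score = result.content_quality_score
        // A null score is treated as a rejection here (an assumption).
        if (score == null || score < minQuality) rejected.push({ url, score })
        else kept.push({ url, score })
      } catch (err) {
        // Keep failures separate so they can be retried later.
        failed.push({ url, error: err.message })
      }
    }
  }

  const workerCount = Math.min(concurrency, urls.length)
  await Promise.all(Array.from({ length: workerCount }, () => worker()))
  return { kept, rejected, failed }
}
```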

Ad Networks — Gate Publisher Onboarding

Add the API to your publisher onboarding flow. Sample three URLs from each new domain, then reject the domain if the average ai_probability across those samples is above 0.75 or if any sample comes back with seo_manipulation_risk set to “high”. This cuts MFA (Made-for-Advertising) sites from your inventory at the source.
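
Here is a sketch of that gate. `analyze` is any async URL-to-result function, and `gatePublisher` is a hypothetical name. Reading the rule as "average ai_probability above 0.75, or any single sample rated high risk" is our interpretation. Samples with a null ai_probability are left out of the average, which is also an assumption.

```javascript
// Hypothetical onboarding gate over a handful of sample URLs per domain.
async function gatePublisher(sampleUrls, analyze, { maxAvgAiProbability = 0.75 } = {}) {
  const results = await Promise.all(sampleUrls.map((url) => analyze(url)))

  // Average ai_probability over samples that returned a number.
  const probs = results
    .map((r) => r.ai_probability)
    .filter((p) => typeof p === 'number')
  const avgAiProbability =
    probs.length > 0 ? probs.reduce((sum, p) => sum + p, 0) / probs.length : null

  const reasons = []
  if (avgAiProbability !== null && avgAiProbability > maxAvgAiProbability) {
    reasons.push(`average ai_probability ${avgAiProbability.toFixed(2)} exceeds ${maxAvgAiProbability}`)
  }
  if (results.some((r) => r.seo_manipulation_risk === 'high')) {
    reasons.push('at least one sample has seo_manipulation_risk "high"')
  }

  return { approved: reasons.length === 0, avgAiProbability, reasons }
}
```

Returning the reasons alongside the verdict makes rejections easier to explain to publishers or to queue for manual review.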

Content Aggregators — Filter Before Storage

Call the API on every article before storing it in your database. Articles with content_quality_score below 50 or safe_to_link_to set to false are discarded silently. The filter adds 2–3 seconds on a cache miss and under 200ms on a cache hit.
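
Here is a sketch of that pre-storage step. `analyze` and `store` are placeholders for your own API client and database write, and `ingestArticle` is a hypothetical name. Per the documented rule, safe_to_link_to is already false whenever content_quality_score is below 50, so the explicit score check only changes the outcome if you raise `minQuality`. Errors from `analyze` are left to propagate so the caller can choose whether to retry or skip.

```javascript
// Hypothetical pre-storage filter: analyze first, store only what passes.
async function ingestArticle(article, analyze, store, { minQuality = 50 } = {}) {
  const result = await analyze(article.url)
  const score = result.content_quality_score
  const keep = result.safe_to_link_to === true && score != null && score >= minQuality
  if (!keep) return { stored: false, result } // discarded silently
  await store({ ...article, quality: score, cached: result.cached })
  return { stored: true, result }
}
```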

Performance & Caching

Responses for identical inputs are cached for 2 hours in KV storage. This means:

Cache miss: ~2,200ms (dominated by AI inference)
Cache hit: ~90ms (KV read + JSON parse)
Cost: repeated checks on the same URL within the 2-hour window (for example, during a crawl batch) are effectively free

At a 35%+ cache hit rate, average response time drops below 1.5 seconds and AI costs decrease proportionally.
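
The 1.5-second figure follows from a weighted average of the hit and miss latencies above. The snippet below works through that arithmetic using only the ~90 ms and ~2,200 ms figures. Real latencies vary, so treat the outputs as estimates.

```javascript
// Expected average latency for cache hit rate h:
//   avg(h) = h * hitMs + (1 - h) * missMs
const HIT_MS = 90
const MISS_MS = 2200

function expectedLatencyMs(hitRate, hitMs = HIT_MS, missMs = MISS_MS) {
  return hitRate * hitMs + (1 - hitRate) * missMs
}

// Minimum hit rate that brings the average down to a target latency:
//   h >= (missMs - targetMs) / (missMs - hitMs)
function hitRateForTarget(targetMs, hitMs = HIT_MS, missMs = MISS_MS) {
  return (missMs - targetMs) / (missMs - hitMs)
}

console.log(expectedLatencyMs(0.35).toFixed(1)) // "1461.5"
console.log(hitRateForTarget(1500).toFixed(3)) // "0.332"
```

So the break-even point for a 1.5-second average is about a 33% hit rate, and 35% clears it. The share of requests that need AI inference is 1 - h, which is why AI costs fall in proportion to the hit rate.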

Limitations

No detector is perfect. We want to be clear about what this does and doesn’t catch:

Catches: Template-based, SEO-focused content at scale; AI-generated articles with robotic structure and no original reporting
May miss: Well-researched AI-assisted articles with genuine data and quotes; human-written content that happens to use common phrases

An article written by a human who researched thoroughly but used some common phrases would score differently (typically higher, since its original research counts in its favor) than an AI-generated article using the same phrases without research backing.

Use this as one signal in your assessment, not a definitive verdict.

The Bigger Picture

Content farms exist because search engines have historically rewarded volume and keywords over quality. As AI makes it easier to generate high-volume, keyword-optimized content, the problem gets worse.

Tools like this one won’t solve the incentive problem. But they can help the humans in the loop — editors, link builders, ad ops engineers — make better decisions about what to read, what to link to, and what to recommend.

We’re not trying to police AI-generated content. We’re trying to help identify content that doesn’t serve human readers, regardless of how it was made.

Try the Content Farm Detector free on RapidAPI — the BASIC plan includes 50 requests per month at no cost. If you find it useful (or not), we’d love to hear your feedback.