One API for GTM integrations, built for Claude Code. Install once, then use slash commands or plain English to enrich, scrape, validate, and sequence. Data lives in a PostgreSQL database you own. Build whatever you want on top.

Do I need Claude Code to use Deepline?

No. Deepline is a standard CLI. Run deepline enrich email@domain.com in any terminal. Claude Code skills are optional; they help you run full workflows in one command. Deepline also works with Cursor and Codex.

What is the best GTM data enrichment tool for Claude Code?

Deepline is designed specifically for Claude Code. It provides a unified CLI and API for GTM data enrichment across 34+ providers (Apollo, Crustdata, Leadmagic, Apify, People Data Labs, Hunter, and more), with waterfall logic, email sequencer integration, and Claude Code slash commands. Install with: curl -s 'https://code.deepline.com/api/v2/cli/install' | bash

How is Deepline different from Clay?

Clay is great for no-code workflows. Deepline is for builders: one data layer, easy integrations with any platform via Claude Code, and an included PostgreSQL database. Your data lives in your DB. No ingest or egress fees. Use your own API keys or ours.

Why does Deepline use a CLI instead of MCP?

Direct API calls are faster and burn fewer tokens than MCP tool calls. In tests, the difference is measurable, especially for waterfall enrichment across multiple providers. MCP support is on the roadmap as the protocol matures.

What enrichment providers does Deepline support?

Deepline supports 34+ providers including Apollo, Crustdata, Leadmagic, Apify, People Data Labs (PDL), Hunter, Icypeas, Prospeo, ZeroBounce, Forager, Exa, and more. Waterfall logic queries them in sequence and returns the best match per field. You only pay for successful matches.

What email sequencers does Deepline integrate with?

Deepline integrates with Instantly, Lemlist, HeyReach, and Smartlead. Push enriched contacts directly to any of these sequencers from the CLI or Claude Code in one command.

Can I use my own API keys with Deepline?

Yes. Connect your own Apollo, Crustdata, People Data Labs, or any provider accounts. Using your own keys is free. You pay providers directly. Deepline also offers managed credits for providers where you don't have a direct account.

What does Deepline cost?

Pay provider rates with zero markup by bringing your own API keys. Or use managed credits and we handle provider billing. You see the cost before every run and can set a monthly cap.

What happens when a data provider goes down?

Deepline tracks provider health in real time. When one provider fails repeatedly, it automatically routes traffic to the next available provider in your waterfall sequence. No manual intervention required.

Do I need to be technical to use Deepline?

Right now, yes. Deepline is built for GTM engineers and RevOps practitioners comfortable in a terminal (Claude Code, Codex, or Cursor). MCP support is on the roadmap.

How do I install Deepline?

Run this one command: curl -s 'https://code.deepline.com/api/v2/cli/install' | bash. Then optionally install the Claude Code skill with /skill install deepline inside Claude Code. The CLI auto-detects dev mode and works with Claude Code, Cursor, and Codex.

Will I accidentally burn through credits?

No. You can set a monthly spending cap from the dashboard. Deepline checks your balance before every API call and shows the estimated cost before running. Use --rows 0:1 to pilot on two rows before committing a full run.

Do I get charged for failed lookups?

Most enrichment operations only charge on a successful match. If a provider returns no result, you usually do not pay. Failed lookups do not consume credits.

Can I see what I am spending?

Yes. The dashboard shows your credit balance, per-provider cost breakdowns, and full charge history. You can also set a monthly billing cap with deepline billing --set-monthly-limit.

Skill

niche-signal-discovery

"Discover niche first-party signals that differentiate Closed Won vs Closed Lost accounts for ICP analysis. Use when the user provides won/lost customer domain lists and wants differential signals (website content, job listings, tech stack, maturity markers) to build account scoring models and prospecting criteria. Triggers: ICP analysis, niche signals, won vs lost analysis, differential signals, signal discovery, ICP signal report, account scoring signals, lead scoring, first-party signals, buyer signals. Before reading this file, first read gtm-meta-skill to understand the Deepline CLI tool and how to use it. Then read this file for guidance on the task."

Back to Skills

# Niche Signal Discovery

Discover differential signals between Closed Won and Closed Lost accounts by extracting multi-page website content and job listings, then computing Laplace-smoothed lift scores to identify what distinguishes buyers from non-buyers.

## Prerequisites

- **Deepline CLI** — All enrichment runs through `deepline enrich`. No separate API keys for exa/crustdata/apollo etc.
- **Python 3** stdlib only — no pip dependencies for any shipped script.
- **Credits** — ~6 credits/company for enrichment. Step 7 contact discovery is additional. **Always get user approval before paid steps.**

## Deepline-First Principle

Use `deepline enrich` for all enrichment, `deepline tools execute` for one-offs, `deepline playground` for inspection. Reruns are idempotent. Refer to `gtm-meta-skill` for command patterns and provider playbooks.

## Input requirements

- Won and lost customer domain lists (≥20 won + ≥10 lost for statistical significance)
- **Lookalikes can supplement Won** if Closed Won < 15. Add a Dataset Caveat to the report.
- **Target company context** from Step 0 — what they sell, who they sell to, key personas.

## Pipeline

```
0.    Discover target company (what they sell, who they sell to)
0.5.  Discover ecosystem (competitors, tech stack, buyer personas)
1.    Prepare input CSV (deduplicate within won/lost groups)
1.0.5 Build "do not re-contact" index from user's existing list (scripts/dedupe_utils.py)
1.5.  Generate vertical-specific configs (keywords, tools, job roles)
2.    Multi-page website + job extraction (deepline enrich)
3.    Quality gate — verify file completeness + coverage (>80%)
3.5.  Review configs against enriched data
4.    Differential analysis (scripts/analyze_signals.py)
5.    Generate report — every top signal must include cited evidence
6.    Signal interpretation review
7.    Top 10 net-new prospects [REQUIRED] + contacts/emails [optional, costs credits]
```

**Step 7 is required.** A signal report without 10 actionable companies forces the reader to do their own prospecting pass — exactly the expensive thing they wanted to skip. Contacts/emails are optional only because they cost extra credits; always offer them.

## Signal reliability hierarchy

Highest → lowest confidence:

1. **Job listings** — active budget + acknowledged pain. Highest-intent.
2. **Analyst validation** (Gartner/Forrester) — typically 4-7x lift, rare in lost.
3. **Compliance infrastructure** (SOC2/GDPR/ISO) — procurement maturity.
4. **Buyer pain language** on careers/blog — operational awareness.
5. **Tech stack tools** (niche SaaS) — infrastructure readiness.
6. **Website product/marketing content** — variable; can be buyer OR competitor.

**When website signals fail:** For B2B back-office tools (AR, billing, compliance), buyers don't publish their pain on marketing pages. Prioritize jobs + tech stack + firmographics for these verticals.

## What NOT to use for scoring

CRM fields populated by AE activity — catalyst note count, OCR-derived counts (`number_of_champions_c`, `number_of_decision_makers_c`), MEDDPICC picklists, any "did the AE do X on this opp" field — correlate with win-rate as **engagement artifacts, not causal signals**. They get filled in *after* the AE decides an opp is worth working. **Never use them as scoring inputs.** On one real run, catalyst notes showed "109x lift" — almost made the TL;DR before we caught the direction of causality.

Rule of thumb: every scoring input must be observable BEFORE the AE touches the account. Read `references/scoring-pitfalls.md` for the full list and the "safer alternative read" for loss-reason data.

## Step 0: Target company discovery

**Do this FIRST.** The entire pipeline (exa query, keywords, tech stack, job roles) adapts based on this discovery; skipping it produces generic/irrelevant signals.

```bash
deeplineagent: "Research {{company-domain}}. Summarize what the company sells, who they sell to, what makes them different, and any example customers."
```

Document: (1) product category, (2) target buyer persona, (3) key differentiation, (4) example customers.

## Step 0.5: Ecosystem discovery

Three parallel `deeplineagent` queries:

- **Competitors** — `"{product category} software alternatives competitors"` → 3-5 names
- **Tech stack** — `"{buyer persona} software stack"` → 10-15 tools by category
- **Job roles** — `"{buyer persona} job titles"` → 10-15 title variations

These feed Step 1.5 config generation.

## Step 1: Prepare input CSV

```csv
domain,status
customer1.com,won
non-customer1.com,lost
```

**Deduplicate within the input.** If a domain appears in BOTH won and lost (same company, multiple deals), Deepline only fetches job listings once — silently undercounting `won_with_jobs`. Remove ALL rows for cross-group domains:

```python
from collections import Counter
counts = Counter(r['domain'] for r in rows)
duplicate_domains = {d for d, c in counts.items() if c > 1}
# Drop every row in duplicate_domains, not just one copy.
```

## Step 1.0.5: Build "do not re-contact" index

Before any prospects ship in Step 7, dedupe candidates against whatever "already known" list the user provides — customers, CRM export, past outbound, a previous run's output. **Always ask explicitly**; if the user has no list, note it as a caveat in the final report rather than silently skipping.

**Order: apex domain first, fuzzy company name as fallback.** Use the shipped helper — it handles public-suffix multi-label TLDs (`co.uk`, `co.jp`, `com.au`) and corporate-suffix stripping:

```bash
python3 scripts/dedupe_utils.py --selftest   # one-time sanity check
python3 scripts/dedupe_utils.py \
    --existing customers.csv --candidates prospects_raw.csv \
    --out-actionable prospects_actionable.csv --out-matched already_known.csv
```

Don't silently drop CRM matches — **categorize** them: Net-new / Account-only / Re-engage / Active-open / Current-customer.

**Read `references/dedupe.md`** for the failure modes (raw-string match missing `amsynergy.nikon.com → nikon.com` cost 24 of 50 prospects in one run), category definitions, and library usage.

## Step 1.5: Generate vertical-specific configs

Create three JSON files in `output/{{company}}/`:

```
{{company}}-keywords.json    # product category, pain language, competitor names, maturity terms
{{company}}-tools.json       # niche SaaS tools by category
{{company}}-job-roles.json   # buyer persona job titles
```

**Read `references/keyword-catalog.md`** for the JSON schema, generation patterns, and multi-vertical examples (creative ops, AR automation, sales engagement, developer tools).

**Validation:** Do the configs match the target's vertical and buyer persona? If not, refine based on Step 0/0.5 findings.

## Step 2: Deepline enrichment

**Never scrape just the homepage.** Use `exa_search` with `contents.text` to discover AND scrape ~8 pages per domain in one call.

Generate the exa query dynamically based on the target's product category — see `gtm-meta-skill` for query variants by buyer type (back-office, developer, creative, sales).

```bash
deepline enrich \
  --input output/{{company}}-icp-input.csv \
  --output output/{{company}}-enriched.csv \
  --with '{"alias":"website","tool":"exa_search","payload":{"query":"{{exa-query}}","numResults":8,"type":"auto","includeDomains":["{{domain}}"],"contents":{"text":{"maxCharacters":3000,"verbosity":"compact","includeSections":["body"]}}}}' \
  --with '{"alias":"jobs","tool":"crustdata_job_listings","payload":{"companyDomains":"{{domain}}","limit":50}}'
```

**Cost:** ~5 credits/company (exa) + ~1 credit (crustdata) = **~6 credits/company**. Get user approval first.

## Step 3: Quality gate

`deepline enrich` returns to terminal **before** OS buffers fully flush. Running the analysis script immediately can read a partially-written file and produce `won_with_jobs: 0` even when data is fine. Always verify:

```bash
INPUT_ROWS=$(wc -l < output/{{company}}-icp-input.csv)
OUTPUT_ROWS=$(wc -l < output/{{company}}-enriched.csv)
echo "Input: $INPUT_ROWS, Output: $OUTPUT_ROWS"  # should match
```

Then spot-check that won rows have job data, that website coverage is >80%, and that average content depth is 6-8 pages / 12-20K chars per company.

**Read `references/quality-gate.md`** for the full verification script, the buffer-flush retry pattern, and the "auto-extracted domain validation" check that has caught up to **53% false-positive rates** in CRM-exported customer lists.

## Step 3.5: Review configs against enriched data

```bash
deepline playground output/{{company}}-enriched.csv
```

**Red flags:**
- Keyword in <10% of enriched companies → too niche, broaden
- Keyword in >90% → too generic, refine
- Product-category keywords appear frequently in Won → wrong product category, those companies are competitors not buyers
- Job roles missing from actual listings → wrong buyer persona

Fix and regenerate configs if needed.

## Step 4: Differential analysis

```bash
python3 scripts/analyze_signals.py \
  --input output/{{company}}-enriched.csv \
  --keywords output/{{company}}-keywords.json \
  --tools output/{{company}}-tools.json \
  --job-roles output/{{company}}-job-roles.json \
  --output output/{{company}}-analysis.json
```

The script computes substring-match presence, Laplace-smoothed lift, source breakdown (website/jobs/both), tech-stack mentions, job-role prevalence, anti-fit signals, and **per-keyword evidence quotes** (±40 chars with URLs) — the evidence array is what Step 5 renders.

## Step 5: Report generation

**Read `references/report-template.md`** for the full report structure (Quick Reference Dashboard at the top, then detail sections), the signal-strength visual scale, Apollo URL format, and all quality rules. Critical rules in brief:

- Raw counts always (`15% (6)`, not just `15%`); sample sizes in headers (`Won (n=37)`)
- Bold only signals with lift > 2x AND count ≥ 3 companies
- Flag n=1 signals — they're statistically meaningless
- **Source evidence is mandatory for every top signal** (lift ≥ 1.5 AND won ≥ 3) — 3-5 cited quotes per signal with source type, company, page/job title, ±40-char quote, and live URL. The analysis script outputs this; render it, don't decide whether to. Signals without 3+ citations get demoted and flagged `*(insufficient evidence)*`.
- Annotate each evidence quote with ✅ (clear buyer signal) or ⚠️ (vendor-adjacent — the company sells something similar, so the keyword on their product page isn't a buyer signal)
- Tier 1 cheatsheet point values must match the Section 6 scoring model — cross-check both before shipping

## Step 6: Signal interpretation

**Read `references/signal-interpretation.md`** before writing interpretation columns. Key rules:
- Website content mentioning what the target sells = competitor signal (not buyer)
- Job listings = highest-intent buyer signal
- Same keyword means different things on product page vs careers page vs blog
- Tech stack tools need context — do they create or solve the target's problem?

## Step 7: Top 10 net-new prospects (required)

**10 companies are required for every run; contacts + emails are optional** (additional Deepline credits). Always offer contact discovery; only run it if the user approves the spend.

```bash
# Companies only — no extra credits beyond Step 2 enrichment:
python3 scripts/find_contacts.py --input prospects_actionable.csv --output top10.csv --top 10 --no-contacts

# Companies + contacts + emails — asks for credit approval.
# --roles is REQUIRED in --contacts mode and must be the buyer-persona job
# titles surfaced in YOUR Step 0/0.5 (not a stale list from a different vertical):
python3 scripts/find_contacts.py --input prospects_actionable.csv --output top10.csv --top 10 \
    --contacts --roles "<persona job titles from Step 0.5>"
```

When `--contacts` is on, the orchestrator runs a 3-phase chain via Deepline:
1. `company_to_contact_by_role_waterfall` (free, mature companies)
2. **`exa_search_people` fallback for any company Phase 1 missed** — mandatory. On the run that motivated this, Phase 1 returned 0 contacts on all 10 top prospects (small/non-US industrial); Exa found 15 real contacts at 6 of those 10 in the same pass.
3. `person_linkedin_only_to_email_waterfall` with **apex-domain validation** — providers return stale addresses (`@orbitalatk.com` for someone now at X-Bow, personal Gmails, wrong-company false positives). Mismatched apex → publish "(email not found)", keep the raw value in `raw_email` for auditing.

**Read `references/step-7-prospects.md`** for the required vs. optional output fields, the prospect-card skeleton, the Phase 2 Exa guardrails (title parsing + company-match filter), and the "10 is a ceiling, not a floor" guidance.

## Enrichment data structure

After unwrap, each enriched row has:

- `website` cell → JSON with `data.results[]`, each item: `{text, url, title}`
- `jobs` cell → JSON with `data.listings[]`, each item: `{title, description, url, category}` (note: `title` not `job_title`)

`scripts/analyze_signals.py` auto-detects these columns; override with `--website-col N --jobs-col N` if your CSV uses different column names.

## Common pitfalls (top 6 — full list in references/pitfalls.md)

1. **Skipping target discovery (Step 0)** → generic/irrelevant configs.
2. **Homepage-only scraping** → misses pricing, integrations, security, careers.
3. **Generic tech stack** ("AWS", "GitHub", "Slack" appear on most B2B sites) → search for niche SaaS specific to the buyer persona.
4. **Trusting n=1 signals** → require 3+ companies for Tier 1 scoring; flag single-company signals with a verification note.
5. **Raw-string dedupe missing parent domains** — `amsynergy.nikon.com ≠ nikon.com` for naive comparison. Always use `extract_apex()`. **24 of 50 "net-new" prospects in one real run were already in the CRM** as parent-domain entries the raw-string dedupe missed.
6. **Trusting confirmation-biased CRM fields** (catalyst notes, OCR counts, MEDDPICC) as signals — they're downstream of AE engagement, not causal. Read the "What NOT to use for scoring" section above.

**Read `references/pitfalls.md`** for the full 18-item list including substring false positives, vendor-vs-buyer signal context, back-office-tool interpretation, and shipping-without-prospects.

## Proven signal patterns

**Read `references/proven-signals.md`** for typical lift ranges across verticals (analyst validation 4.5-6.5x, hiring signals 3.8-5.5x, compliance infra 2.1-6.5x, etc.), high-confidence anti-fit patterns (consumer signals 0.2x, retention/churn 0.2-0.4x), and a starter 0-100 scoring model with three tiers (Core Fit / Buying Intent / Infrastructure Readiness).

## References

- **`references/keyword-catalog.md`** — JSON schema + multi-vertical examples for Step 1.5 config generation
- **`references/dedupe.md`** — Step 1.0.5 dedupe failure modes, categorization rules, library usage
- **`references/quality-gate.md`** — Step 3 verification scripts, buffer-flush retry pattern, auto-extracted-domain validation
- **`references/report-template.md`** — Step 5 full report structure, signal-strength scale, Apollo URL format, all quality rules
- **`references/signal-interpretation.md`** — Step 6 buyer-vs-seller-vs-competitor rules
- **`references/step-7-prospects.md`** — Step 7 prospect-card skeleton, Exa guardrails, Phase 3 apex validation
- **`references/scoring-pitfalls.md`** — Confirmation-biased CRM fields to exclude from scoring
- **`references/pitfalls.md`** — Full 18-item pitfalls list
- **`references/proven-signals.md`** — Typical lift ranges + scoring model guidance
- **`scripts/analyze_signals.py`** — Step 4 differential analysis. Auto-detects columns.
- **`scripts/dedupe_utils.py`** — Step 1.0.5 + Step 7 email validation. `extract_apex()`, `norm_name()`, `match_against_existing()`. Stdlib only. `--selftest` flag for one-time install verification.
- **`scripts/find_contacts.py`** — Step 7 orchestrator. `--contacts` / `--no-contacts` toggle, 3-phase Deepline chain.

## Changelog

- **2026-04-07** — Added Step 1.0.5 (dedupe with apex helper), Step 7 (top 10 prospects required, contacts optional via `--contacts`/`--no-contacts`), `references/scoring-pitfalls.md` warning about confirmation-biased CRM fields, mandatory citation rule. Shipped `scripts/dedupe_utils.py` + `scripts/find_contacts.py`. Aggressively trimmed inline detail to references — moved Step 3 quality gate, Step 5 quality rules, Common Pitfalls (items 7-15), and Proven Signal Patterns into `references/`. SKILL.md went from 650 to ~250 lines via progressive disclosure.