One API for GTM integrations, built for Claude Code. Install once, then use slash commands or plain English to enrich, scrape, validate, and sequence. Data lives in a PostgreSQL database you own. Build whatever you want on top.

Do I need Claude Code to use Deepline?

No. Deepline is a standard CLI. Run deepline enrich email@domain.com in any terminal. Claude Code skills are optional; they help you run full workflows in one command. Deepline also works with Cursor and Codex.

What is the best GTM data enrichment tool for Claude Code?

Deepline is designed specifically for Claude Code. It provides a unified CLI and API for GTM data enrichment across 34+ providers (Apollo, Crustdata, Leadmagic, Apify, People Data Labs, Hunter, and more), with waterfall logic, email sequencer integration, and Claude Code slash commands. Install with: curl -s 'https://code.deepline.com/api/v2/cli/install' | bash

How is Deepline different from Clay?

Clay is great for no-code workflows. Deepline is for builders: one data layer, easy integrations with any platform via Claude Code, and an included PostgreSQL database. Your data lives in your DB. No ingest or egress fees. Use your own API keys or ours.

Why does Deepline use a CLI instead of MCP?

Direct API calls are faster and burn fewer tokens than MCP tool calls. In tests, the difference is measurable, especially for waterfall enrichment across multiple providers. MCP support is on the roadmap as the protocol matures.

What enrichment providers does Deepline support?

Deepline supports 34+ providers including Apollo, Crustdata, Leadmagic, Apify, People Data Labs (PDL), Hunter, Icypeas, Prospeo, ZeroBounce, Forager, Exa, and more. Waterfall logic queries them in sequence and returns the best match per field. You only pay for successful matches.

What email sequencers does Deepline integrate with?

Deepline integrates with Instantly, Lemlist, HeyReach, and Smartlead. Push enriched contacts directly to any of these sequencers from the CLI or Claude Code in one command.

Can I use my own API keys with Deepline?

Yes. Connect your own Apollo, Crustdata, People Data Labs, or any provider accounts. Using your own keys is free. You pay providers directly. Deepline also offers managed credits for providers where you don't have a direct account.

What does Deepline cost?

Pay provider rates with zero markup by bringing your own API keys. Or use managed credits and we handle provider billing. You see the cost before every run and can set a monthly cap.

What happens when a data provider goes down?

Deepline tracks provider health in real time. When one provider fails repeatedly, it automatically routes traffic to the next available provider in your waterfall sequence. No manual intervention required.

Do I need to be technical to use Deepline?

Right now, yes. Deepline is built for GTM engineers and RevOps practitioners comfortable in a terminal (Claude Code, Codex, or Cursor). MCP support is on the roadmap.

How do I install Deepline?

Run this one command: curl -s 'https://code.deepline.com/api/v2/cli/install' | bash. Then optionally install the Claude Code skill with /skill install deepline inside Claude Code. The CLI auto-detects dev mode and works with Claude Code, Cursor, and Codex.

Will I accidentally burn through credits?

No. You can set a monthly spending cap from the dashboard. Deepline checks your balance before every API call and shows the estimated cost before running. Use --rows 0:1 to pilot on two rows before committing a full run.

Do I get charged for failed lookups?

Most enrichment operations only charge on a successful match. If a provider returns no result, you usually do not pay. Failed lookups do not consume credits.

Can I see what I am spending?

Yes. The dashboard shows your credit balance, per-provider cost breakdowns, and full charge history. You can also set a monthly billing cap with deepline billing --set-monthly-limit.

Skill

clay-to-deepline

Back to Skills

# Clay → Deepline Migration Skill

---

## ⚠️ Security: Cookie Isolation — Read First

Clay session cookies in `--with` args are sent to Deepline telemetry (`POST /api/v2/telemetry/activity`). `run_javascript` executes locally but the command string is transmitted.

**Two rules, always apply both:**

1. **Never embed the cookie in the payload.** Read it from env in JS:
   ```js
   'cookie': process.env.CLAY_COOKIE   // RIGHT
   'cookie': 'claysession=abc123'      // WRONG — appears in telemetry
   ```

2. **Store in `.env.deepline`, add to `.gitignore`:**
   ```bash
   # .env.deepline  (migration-specific — does not affect any other project env files)
   # Use SINGLE quotes — GA cookies contain $o7, $g1 etc. that bash expands with double quotes
   CLAY_COOKIE='claysession=<value>; ajs_user_id=<id>'
   ```
   Load with: `set -a; source .env.deepline; set +a`

   **How to get the cookie:** HAR exports strip `HttpOnly` cookies — `claysession` will be missing. Instead, copy a `curl` command from Chrome DevTools Network tab (right-click any api.clay.com request → Copy → Copy as cURL). Extract everything between `-b '...'` and store as `CLAY_COOKIE`.

---

## Input Forms

| Input type | What it contains | How to parse |
|---|---|---|
| **HAR file** (`app.clay.com_*.har`) | Full network traffic including `bulk-fetch-records` responses with rendered formula cell values — the richest input | Base64-decode + gunzip `response.content.text`; extract `results[].cells` |
| **ClayMate Lite export** (`clay-claude-t_xxx-date.json`) | `tableSchema` (raw GET response) + `portableSchema` ({{@Name}} refs) + `bulkFetchRecords` (N sample rows, or `null` if table was empty at export time) — second richest input after HAR | Access `.tableSchema` for field IDs, `.bulkFetchRecords.results[].cells` for cell values (if non-null), `.portableSchema.columns[].typeSettings.inputsBinding` for **full prompts and JSON schemas** — recoverable even when `bulkFetchRecords` is null |
| **`GET /v3/tables/{ID}` JSON** | Schema only: field names, IDs, action types, column order. No cell values, no prompts | Use for column inventory and field ID→name mapping |
| **`POST bulk-fetch-records` response** | Schema + actual cell values for sampled rows — contains rendered formula prompts, action outputs, `NO_CELL` markers | `results[].cells[field_id].value` for each field |
| **Clay workbook URL** | Nothing directly — extract `TABLE_ID` from URL; fetch schema + records via API with user's `CLAY_COOKIE` | `GET /v3/tables/{TABLE_ID}` then `POST bulk-fetch-records` |
| **User description** | Column names + action types only — no field IDs, no actual prompts | Weakest input; must approximate everything |

**Priority order when multiple inputs available: HAR > ClayMate Lite export > bulk-fetch-records > schema JSON > user description.** Always use the richest available input. A ClayMate Lite export already bundles schema + records — use `.tableSchema` and `.bulkFetchRecords` from it directly.

**When `bulkFetchRecords` is null:** Fall back to `portableSchema` for prompt and schema recovery:
- Full prompts: `.portableSchema.columns[].typeSettings.inputsBinding` → find `{name: "prompt"}` entry → `.formulaText`
- JSON schemas: same → find `{name: "answerSchemaType"}` → `.formulaMap.jsonSchema` (double-escaped string — `JSON.parse` twice)
- Conditional run logic: `.typeSettings.conditionalRunFormulaText` — convert to row-filter in Deepline
- Mark prompts as `# RECOVERED FROM PORTABLE SCHEMA — field f_xxx` (verbatim, not approximated)

**How to extract bulk-fetch-records from a HAR:**
```bash
# Find bulk-fetch-records entries (response body is base64+gzip)
python3 - <<'EOF'
import json, base64, gzip
with open('your-export.har') as f:  # replace with your HAR filename
    har = json.load(f)
for entry in har['log']['entries']:
    url = entry['request']['url']
    if 'bulk-fetch-records' in url:
        body = entry['response']['content'].get('text', '')
        enc  = entry['response']['content'].get('encoding', '')
        data = base64.b64decode(body) if enc == 'base64' else body.encode()
        try:
            data = gzip.decompress(data)
        except Exception:
            pass
        print(json.dumps(json.loads(data), indent=2)[:5000])
EOF
```

---

## Output Files

Every migration produces this structure:

```
project/
├── .env.deepline             # Clay credentials (never commit — add to .gitignore)
├── .env.deepline.example     # Template showing required vars — safe to commit
├── .gitignore                # Excludes .env.deepline, *.csv, work_*.csv
├── prompts/
│   └── <name>.txt            # One file per AI column. Header documents source:
│                             #   "# RECOVERED FROM HAR — field f_xxx"  ← verbatim from Clay
│                             #   "# ⚠️ APPROXIMATED — could not recover from HAR"  ← guessed
├── scripts/
│   ├── fetch_<table>.sh      # Fetches Clay records → seed_<table>.csv
│   └── enrich_<table>.sh     # Runs deepline enrich passes → output_<table>.csv
```

**Script output CSV columns:** All columns fetched from Clay (using exact field IDs) + one column per enrichment pass. Column names use `snake_case` aliases matching the pass plan.

**Prompt file format:** Plain text system prompt. Variables use `{{column_name}}` syntax (Deepline's interpolation). First line is always a `#` comment documenting the source (HAR field ID or approximation warning).

---

## Phase 1: Documentation (Always First)

Produce before writing any scripts. Get user confirmation before Phase 2.

### 1.1 — Table Summary

| # | Column Name | Clay Action | Tool/Model | Output Type | Notes |
|---|---|---|---|---|---|
| 1 | `record_id` | built-in | — | string | |
| … | | | | | |

### 1.2 — Dependency Graph (Mermaid)

```mermaid
graph TD
    A[record_id] --> B[clay_record]
    B --> C[fields]
    C --> D[exa_research]
    D --> E[strategic_initiatives]
    C --> F[qualify_person]
    E --> F
```

Use `classDef` colors: blue = local (`run_javascript`), orange = remote API, green = AI (`deeplineagent`).

### 1.3 — Pass Plan

**Column alias rule:** Derive aliases from the actual Clay column name, snake_cased (e.g. "Work Email" → `work_email`, "Strategic Initiatives" → `strategic_initiatives`). The two structural aliases `clay_record` and `fields` are fixed — all others follow the Clay schema. Do NOT invent names from a memorized list.

```markdown
| Pass | Column alias | Deepline tool | Depends on | Notes |
|---|---|---|---|---|
| 1 | clay_record | run_javascript (fetch) | record_id | Cookie from env; alias is always clay_record |
| 2 | fields | run_javascript (flatten) | clay_record | alias is always fields |
| N | <clay_col_snake> | <see clay-action-mappings.md> | <prior passes> | Alias = snake_case(Clay column name) |
```

Example (illustrative — actual aliases must match your table's column names):

```markdown
| Pass | Column alias | Deepline tool | Depends on | Notes |
|---|---|---|---|---|
| 1 | clay_record | run_javascript (fetch) | record_id | |
| 2 | fields | run_javascript (flatten) | clay_record | |
| 3 | work_email | cost_aware_first_name_and_domain_to_email_waterfall | fields.first_name, fields.last_name, fields.company_domain | Primary |
| 4 | work_email_li | person_linkedin_to_email_waterfall | fields.linkedin_url | Fallback for rows where 3 returned empty |
| 5 | email_valid | leadmagic_email_validation | work_email | Optional final gate |
| 6 | job_function | deeplineagent | fields.job_title | Classification |
| 7 | company_research | deeplineagent or exa_search -> deeplineagent | fields.company_domain | Pass 1 of 2 |
| 8 | strategic_initiatives | deeplineagent + jsonSchema | company_research | Pass 2 of 2 |
| 9 | qualify_person | deeplineagent + jsonSchema | fields.*, strategic_initiatives | ICP score |
```

### 1.4 — Assumptions Log

State every unverifiable assumption. Get confirmation before Phase 2.

### 1.5 — HAR/bulk-fetch-records Prompt Extraction

**Do this before writing any prompt approximations.** Actual Clay prompt templates often live in formula field cell values in the bulk-fetch-records response.

**Discovery procedure:**
1. In the bulk-fetch-records response, look for `formula` type fields with cell values that start with "You are..." or contain numbered requirements — these are Clay's rendered prompt templates.
2. Check `action` type fields for actual cell values: if they contain `"Status Code: 200"` or `"NO_CELL"`, they are webhook calls or unfired actions — not AI outputs.
3. Any field whose value reads like prompt instructions (numbered requirements, tone description, example output) is the rendered prompt template — use it verbatim, noting the field ID.

**Prompt recovery priority (richest to weakest):**
1. **HAR** — bulk-fetch-records cell values rendered formula prompts verbatim. Use directly.
2. **ClayMate `portableSchema`** — `columns[].typeSettings.inputsBinding[name=prompt].formulaText` has the full prompt even when `bulkFetchRecords` is null. Mark as `# RECOVERED FROM PORTABLE SCHEMA — field f_xxx`.
3. **Approximated** — reverse-engineer from outputs or user description. Mark as `# ⚠️ APPROXIMATED — could not recover`.

**JSON schema recovery from portableSchema:**
```python
import json
for col in d['portableSchema']['columns']:
    if col['type'] == 'action':
        for inp in col['typeSettings'].get('inputsBinding', []):
            if inp['name'] == 'answerSchemaType':
                schema_raw = inp.get('formulaMap', {}).get('jsonSchema', '').strip('"')
                # Double-escaped: unescape \\" → " and \\n → \n, then parse
                schema_raw = schema_raw.replace('\\"', '"').replace('\\n', '\n').replace('\\\\', '\\')
                schema = json.loads(schema_raw)
```

**Fix any Clay formula bugs in recovered prompts:** Wrong field references (e.g., `{{@Name}}` should become `{{name}}`), `{single_brace}` syntax (not interpolated by Deepline), `Clay.formatForAIPrompt(...)` wrapper calls (strip, use field ref directly).

### 1.6 — Pipeline Architecture Verification

Before assuming how many AI passes a pipeline has, check actual cell values across 3+ records:

| Cell value | Meaning | How to replicate |
|---|---|---|
| `NO_CELL` | Action never fired | Build from scratch |
| `"Status Code: 200"` / `{"status":200}` | HTTP/webhook action (n8n, Zapier) — NOT AI output | `run_javascript` fetch or stub |
| `""` (empty string) | Column ran but produced nothing, or was disabled | Treat as NO_CELL |
| Varied generation-shaped text | Actual AI output | `deeplineagent` |

A column in the schema may never have run. Always verify cell values before counting AI passes. An empty or "Status Code: 200" column is not a pipeline step.

---

## Phase 2: Pre-flight Checklist

Answer these **before writing scripts** based on what Phase 1 revealed. Only answer the questions that apply to your table — not every table has email columns or AI passes.

**Table type (check all that apply):**
- [ ] Has person enrichment columns (LinkedIn lookup, profile data) → verify enrichment tool with `deepline tools search "person enrichment linkedin"`
- [ ] Has email finding columns → see Email strategy below
- [ ] Has AI generation columns (use-ai, claygent, octave) → see AI strategy below
- [ ] Has scoring/qualification columns → see Scoring below
- [ ] Has campaign push / CRM update columns → identify target platform, verify tool with `deepline tools search "<platform> add leads"`
- [ ] Has cross-table lookups → export linked table to CSV first
- [ ] **Is a company intelligence table** (source = Mixrank/company search, no `record_id` fetch step) → see Company Intelligence pipeline below

**Email strategy (only if table has email columns):**
- [ ] Does it have `generate-email-permutations` OR `validate-email`? → Use `cost_aware_first_name_and_domain_to_email_waterfall` as primary (permutation + validation built-in), not `person_linkedin_to_email_waterfall` alone.
- [ ] Does it have a LinkedIn URL column? → Add `person_linkedin_to_email_waterfall` as a fallback pass for rows that the primary approach missed.
- [ ] What column identifies a row? → This is your join key for merging enriched output back.

**AI strategy (only if table has use-ai / claygent / octave columns):**
- [ ] Are recovered prompts from HAR (Phase 1 §1.5)? → Use verbatim. Fix any `{single_brace}` bugs and wrong field refs.
- [ ] Was any AI column empty / `NO_CELL` / `"Status Code: 200"` (Phase 1 §1.6)? → Exclude it from the pass plan entirely.
- [ ] Does any AI pass do web research + generation? → Split into 2 passes (research first, generation second). Never combine.

**Scoring / qualification (only if table has scoring columns):**
- [ ] Were ICP criteria visible in the Clay table config or HAR? → Use them verbatim. Do NOT invent scoring dimensions.
- [ ] If no criteria visible → ask the user before writing a scoring prompt.

**Dependency ordering (all tables):**
- [ ] Which columns are `run_javascript`? → Must execute before any `--in-place` pass that references `{{col_name}}`.
- [ ] Are any paid columns (exa_search, email finders) only needed on rows where cheaper stages returned empty? → Use the filter → enrich subset → merge pattern.

**Company intelligence pipeline (only if `source` field is Mixrank/company-search):**
- [ ] `Find companies` / `source` field → Replace with `apollo_company_search` + optional `prospeo_enrich_company`. Ask user for original Mixrank filter criteria (location, size, industry, tech stack).
- [ ] `route-row` action columns → NOT replicable. Produce filtered output CSV instead; document destination table IDs for reference.
- [ ] Conditional run columns (check `conditionalRunFormulaText` in portableSchema) → implement as row-filter before each pass (skip rows that don't match condition).
- [ ] Before generating scripts, confirm with user: (1) how to provide seed company data, (2) row count / scope, (3) whether downstream table routing is needed.

**Security (all tables):**
- [ ] Is `CLAY_COOKIE` stored in `.env.deepline` (not hardcoded in the script)? → Verify `.env.deepline` in `.gitignore`.
- [ ] Is `output/` in `.gitignore`? → Output CSVs contain Clay record PII (names, emails, LinkedIn URLs).
- [ ] Do all `run_javascript` fetch calls use `process.env.CLAY_COOKIE`, not a hardcoded string?
- [ ] Does `.env.deepline` use **single quotes** for `CLAY_COOKIE`? GA cookie values contain `$` characters that bash expands with double quotes.
- [ ] Does your CSV have any legacy columns from a previous script version? → Document as legacy in a comment.

---

## Phase 2: Script Generation

**Two scripts per migration:**

**`clay_fetch_records.sh`** — Fetches Clay records via `run_javascript` + `fetch()`.
- `schema` mode: `GET /v3/tables/{id}` metadata
- `pilot` mode: `--rows 0:3` (rows 0-2)
- `full` mode: all rows

**`claygent_replicate.sh`** — Replicates AI + enrichment columns via `deepline enrich`.

**Architecture choice: `deepline enrich` CLI vs Python SDK**

For Claygent-heavy tables (multiple `use-ai (claygent+web)` columns), the validated pattern is a **pure Python script** that calls `deepline tools execute exa_search` for external research and `deepline tools execute deeplineagent` for AI synthesis. This approach:
- Avoids `deepline enrich` entirely for AI passes when the logic is easier to orchestrate in Python
- Enables true parallel execution with `ThreadPoolExecutor` across both rows and passes simultaneously
- Gives full control over retry logic, confidence gates, and conditional branching
- Is compatible with `{{field}}` interpolation only when accessing `clay_record` data directly in Python (`json.loads(row['clay_record'])`) rather than via `deepline enrich --in-place`

The `deepline enrich` CLI pattern (shown below) still applies for non-AI passes (`run_javascript`, email waterfall, provider lookups) and for simple single-column `deeplineagent` enrichments.

**Cookie pattern (mandatory):**
```bash
set -a; source .env.deepline; set +a
: "${CLAY_COOKIE:?CLAY_COOKIE must be set in .env.deepline}"
CLAY_VERSION="${CLAY_VERSION:-v20260311_192407Z_5025845142}"

# clay_curl wrapper — required for all Clay API calls (bare curl gets 401)
clay_curl() {
  curl -s --fail \
    -b "${CLAY_COOKIE}" \
    -H "accept: application/json, text/plain, */*" \
    -H "origin: https://app.clay.com" \
    -H "referer: https://app.clay.com/" \
    -H "x-clay-frontend-version: ${CLAY_VERSION}" \
    -H "user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/145.0.0.0 Safari/537.36" \
    "$@"
}
```

**⚠️ Security: Never hardcode `CLAY_COOKIE` as a literal value in the script — it will appear in logs, telemetry, and git history.** Read it from env only. Check your `.env.deepline` is in `.gitignore`. Check your `output/` directory (which contains Clay PII data) is also gitignored.

**Clay API endpoint facts (verified):**

| What you need | Correct endpoint | Notes |
|---|---|---|
| All record IDs | `GET /v3/tables/{TABLE_ID}/views/{VIEW_ID}/records/ids` | `GET /v3/tables/{TABLE_ID}/records/ids` returns `NotFound` — **view ID required** |
| View ID | `GET /v3/tables/{TABLE_ID}` → `.table.firstViewId` | Always fetch dynamically from schema |
| Fetch records | `POST /v3/tables/{TABLE_ID}/bulk-fetch-records` | Body: `{"recordIds": [...], "includeExternalContentFieldIds": []}` |
| Response format | `{"results": [{id, cells, ...}]}` | Key is `results`; record ID is `.id` (not `.recordId`) |
| Record IDs response | `{"results": ["r_abc", "r_def", ...]}` | Parse with `.get("results", [])` |

**Python subprocess for JSON payloads (mandatory when JS code is in the payload):**
```bash
WITH_ARG=$(python3 - <<'PYEOF'
import json
code = "const fn=(row.first_name||'').toLowerCase()..."
print('col_name=run_javascript:' + json.dumps({'code': code}))
PYEOF
)
deepline enrich --input seed.csv --output work.csv --with "$WITH_ARG"
```
This avoids all bash/JSON quoting issues. Never hand-build JSON with embedded JS in bash strings.

**Execution ordering** — always follow the staged pattern:
1. Declare all independent columns → execute run_javascript first
2. Add validation/AI columns that reference JS output (--in-place) → execute
3. Add merge column (--in-place) → execute
4. Export

See [execution-ordering.md](references/execution-ordering.md) for the full pattern with polling loop.

---

## Clay Action → Deepline Tool Mapping

Full CLI patterns: [clay-action-mappings.md](references/clay-action-mappings.md). Always verify tool IDs before use.

### Unknown Action Fallback — Required When Action Is Not in the Table

When you encounter a Clay action not listed below, **do not guess**. Run discovery:

```bash
# Step 1 — describe what the action does, search by intent
deepline tools search "<what the action does>"

# Step 2 — inspect the best candidate
deepline tools get <candidate_tool_id>

# Step 3 — if nothing found, fall back to deeplineagent
# Use deeplineagent with jsonSchema for structured outputs; split exa_search and synthesis for research-heavy work
```

**Discovery examples by Clay action type:**

| You see in Clay | Search query | Likely result |
|---|---|---|
| `enrich-person-with-*` | `deepline tools search "person enrich linkedin"` | `leadmagic_profile_search`, `crustdata_person_enrichment` |
| `find-email-*` | `deepline tools search "email finder"` | `hunter_email_finder`, `leadmagic_email_finder` |
| `verify-email-*` | `deepline tools search "email verify validate"` | `leadmagic_email_validation`, `zerobounce_email_validation` |
| `company-*` / `enrich-company-*` | `deepline tools search "company enrich"` | `apollo_enrich_company`, `prospeo_enrich_company` |
| `add-to-campaign-*` | `deepline tools search "add leads campaign"` | `instantly_add_to_campaign`, `smartlead_api_request` |
| `social-media-*` | `deepline tools search "linkedin posts scrape"` | `crustdata_linkedin_posts`, `apify_run_actor_sync` |
| Completely novel action | `deepline tools search "<verb> <noun from column name>"` | Use top result or `deeplineagent` fallback |

**When to use `deeplineagent` fallback**: If `deepline tools search` returns no relevant tools, or the action involves model judgment (classification, scoring, generation, summarization), use `deeplineagent`. Reconstruct the prompt from `portableSchema.inputsBinding[name=prompt].formulaText` or the cell values in `bulkFetchRecords`.

| Clay action | Deepline tool | Test status |
|---|---|---|
| `generate-email-permutations` + entire email waterfall + `validate-email` | **`cost_aware_first_name_and_domain_to_email_waterfall`** (primary) + manual `perm_fln` + `leadmagic_email_validation` + `person_linkedin_to_email_waterfall` (fallback). See Pass Plan 5a–5e. **⚠️ `person_linkedin_to_email_waterfall` alone = ~13% match rate vs ~99% with permutation-first approach.** CLI syntax: `deepline enrich --input seed.csv --output out.csv --with '{"alias":"email_result","tool":"cost_aware_first_name_and_domain_to_email_waterfall","payload":{"first_name":"{{first_name}}","last_name":"{{last_name}}","domain":"{{domain}}"}}'` — the play expansion is previewed before running and the waterfall stops early once a valid email is found. | ✅ Tested — found `patrick.valle@zoominfo.com` (status=valid) via dot pattern on first try; 7 downstream providers skipped |
| `enrich-person-with-mixrank-v2` | `leadmagic_profile_search` → `crustdata_person_enrichment` | Not yet tested against real Clay Mixrank output |
| `lookup-company-in-other-table` | `run_javascript` (local CSV join) | Not yet tested |
| `chat-gpt-schema-mapper` | `deeplineagent`; add `jsonSchema` when you need structured extraction | ✅ Tested (analogous to `data_warehouse` + `job_function` passes) |
| `use-ai` (no web) | `deeplineagent` | ✅ Tested (data_warehouse, job_function, technical_resources_readiness, key_gtm_friction passes) |
| `use-ai` (claygent + web) | **Binary search optimizer** — Pass 1: 3× parallel `exa_search` (highlights-only, ~10x cheaper) targeting financial/IR, product launches, new segments → Pass 2: `deeplineagent` synthesis with `jsonSchema` containing `confidence: "high\|medium\|low"` and `missing_angles: [...]` → **Binary gate**: only if `confidence != "high"`: Pass 3a targeted follow-up exa searches on `missing_angles` → Pass 3b: re-synthesize → Pass 3c: primary-source deep-read (IR site, newsroom, official blog — scored by domain type) with `text: true` → final synthesis. Tracks `research_confidence` and `research_passes` columns. | ✅ Tested — 26 rows, side-by-side vs Clay traces (see empirical results in Phase 3) |
| `octave-qualify-person` | `deeplineagent` + `jsonSchema` ICP scorer | ✅ Tested — 26 rows |
| `octave-run-sequence-runner` | Pass 1: `deeplineagent` (signals) → Pass 2: `deeplineagent` (email) | Pattern tested (find_tension_mapping + verified_pvp_messages); not yet validated against a real sequence-runner Clay column |
| `add-lead-to-campaign` (Smartlead) | `smartlead_api_request` POST /v1/campaigns/{id}/leads | Not yet tested |
| `add-lead-to-campaign` (Instantly) | `instantly_add_to_campaign` — use `deepline tools execute instantly_add_to_campaign --payload '{"campaign_id":"<id>","contacts":[{"email":"...","first_name":"...","last_name":"...","company":"..."}]}'`. List campaigns first with `instantly_list_campaigns`. | ✅ Tested — `{"pushed": 1, "failed": 0, "errors": []}` |
| `exa_search` | `exa_search` (direct) | ✅ Tested extensively — highlights-only and full-text modes, include_domains |
| `route-row` | **Not replicable in Deepline.** Clay's routing action pushes rows to downstream tables conditionally. Replace with: produce a filtered output CSV per destination. Note destination `tableId` values from `inputsBinding` so user knows where data was going. | N/A — produces output CSV instead |
| `find-lists-of-companies-with-mixrank-source` (source type) | **Pass 1**: `apollo_company_search` (location, size, industry, tech stack filters) → **Pass 2** (optional): `prospeo_enrich_company` (description, industry, employee count). Ask user for original Mixrank filter criteria. See [clay-action-mappings.md](references/clay-action-mappings.md) Company Source section. | ✅ apollo_company_search tested |
| `social-posts-*` | Two separate tools: (1) **Profile post scraper** → `apify_run_actor_sync` (apimaestro/linkedin-profile-scraper) for fetching posts from a specific person's profile. (2) **Content search/signal monitoring** → `crustdata_linkedin_posts` (requires `keyword` field + optional `filters` for `MEMBER`, `COMPANY`, `AUTHOR_COMPANY`, etc.) — this is a keyword search, NOT a profile URL scraper. Use (1) for Clay's `social-posts-person` action; use (2) for signal monitoring (e.g., "who at ZoomInfo is posting about GTM?"). | ✅ `crustdata_linkedin_posts` keyword search tested; `apify_run_actor_sync` profile scraper not yet validated end-to-end |

---

## Binary Search Optimizer (Claygent Web Research Pattern)

Use whenever replicating a `use-ai (claygent + web)` column. Full pattern, pass structure, `_extract_primary_source_url()` implementation, confidence calibration data, failure modes, and search angle variants: **[binary-search-optimizer.md](references/binary-search-optimizer.md)**

**Summary:** Pass A: 3× parallel `exa_search` (highlights-only, domain-quoted queries) → Pass B: `deeplineagent` synthesis with `jsonSchema` containing `confidence` + `missing_angles` → **gate**: if `confidence == "high"` stop → Pass C: follow-up searches on `missing_angles` → Pass D: re-synthesize → Pass E: primary-source deep-read via `_extract_primary_source_url(company_domain=domain)`.

Always add `research_confidence` and `research_passes` tracking columns. `low` confidence ≠ bad output (26-row test: 0% high, 35% medium, 65% low — but 50% of `low` rows had specific useful content).

---

## Critical Rules

- **Execution ordering**: `run_javascript` columns must be executed before adding `--in-place` columns that reference their values. See [execution-ordering.md](references/execution-ordering.md).
- **Conditional row execution**: Never run expensive paid columns (provider waterfalls, validation APIs) on rows where a cheaper stage already found the answer. Use the filter → enrich → merge pattern: after each "find" stage, filter to only rows still missing a value, then run the next stage on that subset only. See [execution-ordering.md](references/execution-ordering.md) for the full pattern. Local `run_javascript` columns are exempt — always run them on all rows.
- **Flatten first** (CLI pattern): `fields=run_javascript:{flatten clay_record}` — required before any `{{fields.xxx}}` reference when using `deepline enrich --in-place`. **Not required when using the Python SDK approach** — just call `json.loads(row['clay_record'])` directly in Python and access any key.
- **2-level max interpolation**: `{{col.field}}` only works when the stored cell actually has `field` at the root. `{{col.field.nested}}` still fails. Flatten first (CLI) or go multi-level in Python.
- **`MAX_LONG` cap warning**: If you add a row-count cap for expensive downstream passes (e.g. `row_range_long = row_range[:20]`), make sure to document this explicitly — it's easy to miss that some batches silently skip rows beyond the cap on a `full` run.
- **Always use structured JSON for `deeplineagent`**: Make a **single** `deeplineagent` invocation per column and extract all needed fields from one structured response. Never make multiple model calls where one `jsonSchema` output would suffice. Example: `qualify_person` should return one JSON object with `score`, `tier`, and `reasoning` — not three separate calls.
- **Structured output access**: In `deepline enrich`, do not assume `deeplineagent` with `jsonSchema` stores a bare object at the cell root. Passing `{{col}}` as whole-context is reliable; if you need a specific field later, flatten it into a dedicated scalar column with `run_javascript` first. For Python SDK flows, parse the JSON and access the nested keys directly in Python.
- **Separate passes for deps**: A column referenced by `{{xxx}}` must be in a prior enrich call.
- **Python subprocess for payloads**: Use `python3 -c "import json; print('col=tool:' + json.dumps({...}))"` — never hand-write JSON with embedded JS in bash strings.
- **Cookie in env**: Never embed `CLAY_COOKIE` value in code — always `process.env.CLAY_COOKIE`.
- **Catch-all is valid**: Accept `valid`, `valid_catch_all`, AND `catch_all` from `leadmagic_email_validation` — all three are as reliable as Clay's own ZeroBounce output. `valid_catch_all` is the highest-confidence version (engagement-confirmed, <5% bounce rate). Do NOT accept `unknown`.

See [patterns.md](references/patterns.md) for prescriptive patterns + antipatterns for every common mistake.

---

## Phase 3: Evaluation

After running the full pipeline, compare against Clay ground truth.

**Run the bundled comparison script:**
```bash
# Auto-detects clay_ prefixed columns → unprefixed mapping
python3 /path/to/skill/scripts/compare.py ground_truth.csv enriched.csv

# Or with explicit column mapping
python3 /path/to/skill/scripts/compare.py ground_truth.csv enriched.csv \
  --map '{"clay_final_email":"work_email","clay_job_function":"job_function"}'
```

### Accuracy Thresholds

| Column type | Pass threshold | How to check |
|---|---|---|
| Email (`work_email`) | DL found rate ≥ 95% of Clay found rate | compare.py auto-flags |
| Classify (`job_function`) | ≥ 95% exact match on pilot rows | compare.py distribution output |
| Structured (`deeplineagent` + `jsonSchema`) | Object present in 100% of rows, all schema fields populated | Spot-check 5 rows |
| Fetch (`run_javascript`) | 100% non-null for all mapped fields | compare.py fill rate |
| Claygent/web research (`exa_search` + `deeplineagent`) | `is_failed_research()` returns False on ≥ 85% of rows | See confidence calibration section |

### Empirical results from 26-row production test (March 2026)

**Company mix**: ZoomInfo, Klaviyo, PetScreening, PulsePoint, Tines, Bloomreach, Aviatrix, BambooHealth, Circle, ZipHQ, Edmentum, PandaDoc, project44, Apollo GraphQL, Digital Turbine, Amagi, Stash, ConstructConnect, Momentive Software, and others.

| Metric | Value |
|---|---|
| Total rows | 26 |
| All 3 passes fired | 26/26 (100%) |
| Confidence: high | 0/26 (0%) |
| Confidence: medium/medium-high | 9/26 (35%) |
| Confidence: low (but useful content) | 13/26 (50%) |
| Failed (UNCHANGED/UNRESOLVED) | 4/26 (15%) |
| Score: Tier B (5–6) | 14/26 (54%) |
| Score: Tier C (0–4) | 12/26 (46%) |

**Clay vs Deepline content comparison (ZoomInfo, Patrick Valle):**
- Clay: `confidence=high`, 10 search steps, 133.75s, $0.13 cost, queries Google
- Deepline: `confidence=low` (despite specific accurate content), 3 passes, parallel execution, ~$0.02–0.05 cost, queries Exa
- Content quality: equivalent — both surface the ZI→GTM ticker rebranding, AI platform expansion, and international data initiatives. Deepline output is more recent (includes 2025 10-K filed Feb 2026 vs Clay's older run).

**Companies that reliably reach medium confidence**: Large public companies (ZoomInfo, Klaviyo, Circle), well-funded startups with active newsrooms (Tines, Aviatrix, project44), and companies with recent PR activity (PandaDoc, BambooHealth, Digital Turbine, Amagi, Onit).

**Companies that stay low**: Small niche B2B (LERETA, ConstructConnect, ZipHQ, Edmentum), companies with ambiguous domain names (getflex.com, onit.com, stash.com), private equity-owned businesses with sparse web presence.

### Accuracy Expectations (What "Found" Actually Means)

**Email accuracy is not 100% even at 99% match rate.** There are two categories:

1. **Confirmed deliverable**: LeadMagic returns `valid` or `valid_catch_all` → high confidence. `valid_catch_all` means engagement signal data confirmed the address on a catch-all domain (<5% bounce rate).
2. **Unverifiable catch-all**: Returns `catch_all` → domain accepts all addresses. The permutation format (fn.ln, fln) is a best guess. Same limitation as Clay (Clay uses the same ZeroBounce validation).
3. **Unknown**: Server no response → skip; do not treat as found.

This matches Clay's own accuracy characteristics — Clay uses the same ZeroBounce validation. If a user asks "is this 100% accurate?", the honest answer is: **same accuracy as Clay, which is high but not 100%** due to catch-all domains and format edge cases.

---

## Workflow

1. **Phase 1**: Table summary, dependency graph, pass plan, assumptions log
2. **Phase 2 pre-flight**: Complete checklist — especially email strategy verification
3. **Confirm**: Get user approval on assumptions
4. **Phase 2**: Generate `clay_fetch_records.sh` + `claygent_replicate.sh`
5. **Pilot gate**: Run `--rows 0:1` for any paid tools; show preview
6. **Full run**: After approval
7. **Phase 3**: `python3 compare.py ground_truth.csv enriched.csv` — confirm all thresholds pass

---

## Pilot Gate (before paid tools)

`run_javascript` needs no pilot gate.
For `deeplineagent`, `exa_search`, `leadmagic_*`, `hunter_*`, and other non-trivial or paid tools: run rows 0:1 first.

```bash
./claygent_replicate.sh           # pilot: row 0 only
./claygent_replicate.sh 0:3       # rows 0-2
./claygent_replicate.sh full      # all rows
```