INFRATEX | Document Parsing

/ QUICKSTART

Run the full document pipeline in minutes.

Start server-side. Upload a document, wait for parsing, create a hybrid index, then stream a cited answer from that indexed context.

Install

Use the SDK in your backend service, or start with curl while wiring environment variables.

export INFRATEX_API_KEY=infratex_sk_...

Get an API key

Create a key in the dashboard. The full key is shown once and should be stored in server secrets.

http

Authorization: Bearer infratex_sk_your_key_here

# 1. Upload and parse
curl -X POST https://api.infratex.io/api/v1/documents \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -F "file=@contract.pdf" \
  -F "method=standard"

# 2. Create a hybrid index
curl -X POST https://api.infratex.io/api/v1/documents/{document_id}/indexes \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"method":"hybrid"}'

# 3. Stream a cited answer
curl -N -X POST https://api.infratex.io/api/v1/responses \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"Summarize termination rights with citations.","method":"hybrid","model":"fast","document_ids":["{document_id}"],"limit":5}'
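
For backends that do not shell out to curl, steps 2 and 3 above can be sketched with the Python standard library. This is a minimal sketch, not the SDK: `build_json_request` and `quickstart_requests` are hypothetical helper names, and the functions only construct the requests so they can be inspected before anything is sent.

```python
import json
import os
import urllib.request

# Base path copied from the curl examples above.
BASE_URL = "https://api.infratex.io/api/v1"


def build_json_request(path: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def quickstart_requests(document_id: str, api_key: str) -> list:
    """Return the index-creation and response requests from steps 2 and 3."""
    index_req = build_json_request(
        f"/documents/{document_id}/indexes", {"method": "hybrid"}, api_key
    )
    answer_req = build_json_request(
        "/responses",
        {
            "message": "Summarize termination rights with citations.",
            "method": "hybrid",
            "model": "fast",
            "document_ids": [document_id],
            "limit": 5,
        },
        api_key,
    )
    return [index_req, answer_req]


if __name__ == "__main__":
    # The fallback value is a placeholder, not a working key.
    key = os.environ.get("INFRATEX_API_KEY", "infratex_sk_placeholder")
    for req in quickstart_requests("doc_123", key):
        print(req.get_method(), req.full_url)
```

Passing a request to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payloads easy to unit test.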

/ SDKS

Use the same resource model from Python or Node.

Python service

Best for ETL workers, extraction jobs, notebooks, and backend APIs.

pip install infratex

Node.js service

Best for Next.js route handlers, Express services, queues, and streaming app backends.

npm install infratex

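import os

from infratex import Infratex

# Read the key from the environment instead of hardcoding it in source.
client = Infratex(api_key=os.environ["INFRATEX_API_KEY"])

# Upload a PDF into an existing collection and queue it for parsing.
doc = client.documents.upload(
    "board_pack.pdf",
    method="standard",
    collection_id="col_123",
)
# Wait for parsing to finish before requesting Markdown.
client.documents.wait_until_parsed(doc.id)

# Fetch the parsed Markdown and preview the first 1,000 characters.
markdown = client.documents.markdown(doc.id)
print(markdown[:1000])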

/ DOCUMENTS

Upload files and retrieve parsed Markdown.

POST /api/v1/documents (HTTP 202)

Upload PDF

Multipart upload with file, method, an optional pipeline (for the legacy method), and an optional collection_id.

POST /api/v1/documents/images (HTTP 202)

Upload page images

Multipart upload with repeated files. File order is treated as page order.

GET /api/v1/documents/{id}

Get document

Returns parse status, metadata, and index summaries.

GET /api/v1/documents/{id}/markdown

Get Markdown

Returns extracted Markdown as text/markdown after parsing completes.

curl -X POST https://api.infratex.io/api/v1/documents \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -F "file=@report.pdf" \
  -F "method=standard" \
  -F "collection_id=col_123"
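
Because the page-image upload treats file order as page order, it can help to sort files numerically before building the multipart request. The sketch below is one possible way to do that on the client; `page_sort_key` and `order_pages` are hypothetical helpers, and the `page_N.png` naming scheme is only an assumption about how the images were exported.

```python
import re
from pathlib import Path


def page_sort_key(path: Path) -> list:
    """Split a filename into text and number chunks so page_10 sorts after page_2."""
    parts = re.split(r"(\d+)", path.name)
    # re.split with a capture group alternates text (even index) and digits (odd index).
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def order_pages(paths) -> list:
    """Return paths in natural page order, ready to attach as repeated files."""
    return sorted((Path(p) for p in paths), key=page_sort_key)


if __name__ == "__main__":
    for page in order_pages(["page_10.png", "page_2.png", "page_1.png"]):
        print(page.name)
```

A plain lexicographic sort would place `page_10.png` before `page_2.png`, which would scramble the parsed page order.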

/ INDEXES

Create retrieval artifacts before search or generation.

vector

Semantic retrieval over document chunks. Good default for natural-language questions.

hybrid

Semantic, keyword, and document-structure retrieval. Recommended for production contracts, filings, and tables.

POST /api/v1/documents/{id}/indexes (HTTP 202)

Create index

Queues vector or hybrid indexing for a parsed document.

GET /api/v1/documents/{id}/indexes/{method}

Get index status

Poll until status is indexed before search or response calls.

curl -X POST https://api.infratex.io/api/v1/documents/{document_id}/indexes \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"method":"hybrid"}'
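
Index creation is asynchronous, so callers need to poll the index status endpoint until it reports indexed. The helper below is a minimal sketch of such a loop. It takes any callable that returns the current status string, so it makes no assumption about the HTTP client. The `"failed"` terminal status and the default timings are assumptions, not documented values.

```python
import time
from typing import Callable


def wait_for_status(
    fetch_status: Callable[[], str],
    target: str = "indexed",
    failed: tuple = ("failed",),  # assumed terminal error status
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns target, a failure status, or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"resource entered terminal status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status was still {status!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated status sequence standing in for GET .../indexes/{method}.
    fake = iter(["pending", "indexing", "indexed"])
    print(wait_for_status(lambda: next(fake), sleep=lambda _: None))
```

Injecting `sleep` keeps the loop testable without real delays. In production the default `time.sleep` applies.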

/ SEARCH

Retrieve cited context without generating text.

Use search for previews, evidence panels, ranking inspection, and retrieval debugging. Send exactly one scope per request: document_ids or collection_id. Conversation-backed scope is available through responses.

POST /api/v1/searches

Search indexed context

Returns ranked chunks with document, page, score, content, and metadata.

curl -X POST https://api.infratex.io/api/v1/searches \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"Find indemnity carve-outs","method":"hybrid","document_ids":["doc_123"],"limit":5}'
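
Since a search request carries a single scope, a small client-side check can reject ambiguous payloads before they reach the API. The sketch below assumes the field names shown in the curl example. `build_search_payload` is a hypothetical helper, and the server may apply its own validation rules.

```python
from typing import Optional, Sequence


def build_search_payload(
    query: str,
    method: str = "hybrid",
    document_ids: Optional[Sequence[str]] = None,
    collection_id: Optional[str] = None,
    limit: int = 5,
) -> dict:
    """Build a search body with exactly one scope: document_ids or collection_id."""
    if method not in ("vector", "hybrid"):
        raise ValueError("method must be 'vector' or 'hybrid'")
    has_docs = bool(document_ids)
    has_collection = collection_id is not None
    if has_docs == has_collection:
        # Either both scopes were given or neither was.
        raise ValueError("provide exactly one of document_ids or collection_id")
    payload = {"query": query, "method": method, "limit": limit}
    if has_docs:
        payload["document_ids"] = list(document_ids)
    else:
        payload["collection_id"] = collection_id
    return payload


if __name__ == "__main__":
    print(build_search_payload("Find indemnity carve-outs", document_ids=["doc_123"]))
```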

/ RESPONSES

Stream answers grounded in indexed documents.

fast

Lower-latency response model for product surfaces, summaries, and routine Q&A.

pro

Higher-capability response model for complex synthesis, legal analysis, and cross-document questions.

POST /api/v1/responses

Create streaming response

Streams server-sent events: sources, thinking when enabled, text deltas, then done.

curl -N -X POST https://api.infratex.io/api/v1/responses \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"What are the top risks?","method":"hybrid","model":"fast","collection_id":"col_123","limit":8,"reasoning":false}'
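
A client that consumes this stream has to split it into server-sent events and concatenate the text deltas. The sketch below implements the standard `event:` / `data:` framing. The event names (`sources`, `text`, `done`) and the JSON payload shapes (a `delta` key on text events) are assumptions made for illustration and should be checked against a real stream.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from an iterable of SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


def collect_answer(lines: Iterable[str]):
    """Return (answer_text, sources) using the assumed event names and payloads."""
    chunks, sources = [], None
    for event, data in iter_sse_events(lines):
        if event == "sources":
            sources = json.loads(data)
        elif event == "text":
            chunks.append(json.loads(data)["delta"])  # hypothetical payload key
        elif event == "done":
            break
    return "".join(chunks), sources


if __name__ == "__main__":
    sample = [
        "event: sources\n", 'data: [{"document_id": "doc_123", "page": 4}]\n', "\n",
        "event: text\n", 'data: {"delta": "Top "}\n', "\n",
        "event: text\n", 'data: {"delta": "risks"}\n', "\n",
        "event: done\n", "data: {}\n", "\n",
    ]
    print(collect_answer(sample))
```

Events with other names, such as thinking events when reasoning is enabled, are ignored by `collect_answer` and can be handled by adding another branch.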

/ EXTRACTION

Extract structured fields with evidence.

Extraction runs accept either a reusable template_id or inline fields. Inline fields require a name, type, and description, and can include objects, arrays, enums, and field-specific instructions.

POST /api/v1/documents/{id}/extractions (HTTP 202)

Create extraction run

Queues a run against parsed Markdown and returns pending status.

GET /api/v1/extractions/{run_id}

Poll or fetch result

Use include_evidence=true when you need evidence payloads in the response.

GET /api/v1/documents/{id}/extractions

List document runs

Returns prior extraction runs for a document with pagination.

GET /api/v1/extractions/{run_id}/export

Export tabular results

Download xlsx or csv when the result contains array<object> fields.

curl -X POST https://api.infratex.io/api/v1/documents/{document_id}/extractions \
  -H "Authorization: Bearer $INFRATEX_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "include_evidence": true,
    "inline_fields": [
      {
        "name": "counterparty",
        "type": "string",
        "description": "The legal name of the counterparty"
      },
      {
        "name": "effective_date",
        "type": "date",
        "description": "The contract effective date"
      },
      {
        "name": "termination_fee",
        "type": "number",
        "description": "Any explicit termination fee amount"
      }
    ]
  }'
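
Inline fields need a name, type, and description, so a pre-flight check can catch incomplete definitions before a run is queued. The following is a minimal sketch of such a check. `validate_inline_fields` is a hypothetical helper, and it does not validate type names because the full list of accepted types is not given here.

```python
REQUIRED_KEYS = ("name", "type", "description")


def validate_inline_fields(fields: list) -> list:
    """Return a list of human-readable problems; an empty list means the fields look complete."""
    errors = []
    for index, field in enumerate(fields):
        for key in REQUIRED_KEYS:
            value = field.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"inline_fields[{index}]: missing or empty '{key}'")
    return errors


if __name__ == "__main__":
    fields = [
        {"name": "counterparty", "type": "string",
         "description": "The legal name of the counterparty"},
        {"name": "effective_date", "type": "date"},  # description omitted on purpose
    ]
    for problem in validate_inline_fields(fields):
        print(problem)
```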

/ COLLECTIONS

Group documents into product-ready scopes.

Collections let you upload many documents into a durable scope and query them together with the same search and response APIs.

POST /api/v1/collections

Create collection

Create a named document group for retrieval and responses.

GET /api/v1/collections

List collections

Returns all collections for the tenant.

PATCH /api/v1/documents/{id}

Move document

Set collection_id or remove_collection on an existing document.

DELETE /api/v1/collections/{id}

Delete collection

Deletes the collection record and unsets it from documents.

json

{
  "message": "Compare the warranty limits across the uploaded agreements.",
  "method": "hybrid",
  "model": "pro",
  "collection_id": "col_123",
  "limit": 10,
  "reasoning": true
}

/ MCP

Connect agent clients to the same pipeline.

The remote MCP server exposes document creation, indexing, retrieval, and grounded answer generation over the same tenant-scoped API key model.

Endpoint

https://api.infratex.io/mcp

Use streamable HTTP and pass the same Authorization: Bearer infratex_sk_... header.

json

{
  "mcpServers": {
    "infratex": {
      "url": "https://api.infratex.io/mcp",
      "headers": {
        "Authorization": "Bearer infratex_sk_..."
      }
    }
  }
}
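
To keep the full key out of checked-in config files, the JSON above can be generated at deploy time from the environment. This is a small sketch of that idea. `mcp_config` is a hypothetical helper, and where the resulting file is written depends on the agent client being configured.

```python
import json
import os


def mcp_config(api_key: str) -> dict:
    """Return the MCP client configuration shown above with the key filled in."""
    return {
        "mcpServers": {
            "infratex": {
                "url": "https://api.infratex.io/mcp",
                "headers": {"Authorization": f"Bearer {api_key}"},
            }
        }
    }


if __name__ == "__main__":
    # The fallback value is a placeholder, not a working key.
    key = os.environ.get("INFRATEX_API_KEY", "infratex_sk_placeholder")
    print(json.dumps(mcp_config(key), indent=2))
```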

create_document

Queue PDF parsing from a base64 payload.

create_document_images

Queue parsing for ordered image batches.

create_index

Queue vector or hybrid indexing.

search_documents

Run retrieval across documents or collections.

answer_documents

Generate cited answers from indexed context.

/ REFERENCE

Core parameters and endpoint map.

Parameter	Values	Use when
`Parse method`	standard, max, legacy, cost-efficient, standard-html, standard-ultra-2, dots-mocr, infratex-phi	Controls parser quality, cost profile, or compatibility.
`Image parse method`	standard, max, standard-html, standard-ultra-2, dots-mocr, infratex-phi	Used with ordered PNG, JPEG, or WebP page batches.
`Retrieval method`	vector, hybrid	Use hybrid for exact terms, tables, identifiers, and audit-heavy workflows.
`Response model`	fast, pro	Use fast for latency-sensitive product surfaces, pro for harder synthesis.
`reasoning`	true, false	When true, response streams may include thinking events before text.

Documents

POST /api/v1/documents
POST /api/v1/documents/images
GET /api/v1/documents/{id}
GET /api/v1/documents/{id}/markdown
GET /api/v1/documents/{id}/ast

Retrieval

POST /api/v1/documents/{id}/indexes
GET /api/v1/documents/{id}/indexes
POST /api/v1/searches
POST /api/v1/responses

Extraction

POST /api/v1/extraction-templates
GET /api/v1/extraction-templates
POST /api/v1/documents/{id}/extractions
GET /api/v1/extractions/{run_id}
GET /api/v1/extractions/{run_id}/export

Account

GET /api/v1/account
GET /api/v1/billing
POST /api/v1/keys
GET /api/v1/collections
POST /api/v1/collections

Readiness invariant

Search and response calls require a ready index for the selected method. If you request hybrid, the selected documents or collection must already have a hybrid index.
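
The readiness invariant can be enforced on the client before a search or response call is made. The sketch below assumes the caller has already gathered index statuses into a mapping of document ID to per-method status. That mapping shape and the `documents_missing_index` helper are illustrative, not part of the API. The `indexed` status value comes from the index polling guidance.

```python
def documents_missing_index(index_status: dict, method: str) -> list:
    """Return the document IDs whose index for the given method is not yet 'indexed'."""
    return [
        doc_id
        for doc_id, by_method in index_status.items()
        if by_method.get(method) != "indexed"
    ]


if __name__ == "__main__":
    # Hypothetical statuses collected from GET /api/v1/documents/{id}/indexes/{method}.
    statuses = {
        "doc_123": {"vector": "indexed", "hybrid": "indexed"},
        "doc_456": {"vector": "indexed"},
    }
    blocked = documents_missing_index(statuses, "hybrid")
    if blocked:
        print("Create or wait for a hybrid index on:", blocked)
```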

Integrate document AI without stitching five systems together.

Server-side keys

Async resources

Explicit scope

Citations by default