Process File

POST /api/v1/document/{extractionSchemaId}/process-file/{fileId}

Extract structured data from uploaded files using AI-powered document processing. This endpoint analyzes your document synchronously and returns structured data immediately upon completion.

For large files or scalable workflows, consider using async processing instead.

Synchronous vs Asynchronous Processing

Choose the right processing method based on your use case and file characteristics.

Synchronous Processing

Blocks and waits for processing to complete before returning results.

Best For:

  • Small to medium files (< 10MB)
  • Real-time user interfaces
  • Simple integration workflows
  • When you need immediate results

Limitations:

  • May timeout on large files
  • Blocks the request thread
  • Not suitable for batch processing

Asynchronous Processing

Returns immediately with a job ID; processing happens in the background and you poll for results.

Best For:

  • Large files (10MB+)
  • Batch processing workflows
  • Scalable applications
  • When using webhooks

Benefits:

  • No timeout limitations
  • Duplicate job prevention
  • Better resource utilization
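
As a rough illustration of this guidance, the sketch below picks a processing mode from the file size and delivery needs. The 10MB threshold comes from this page; the function name, option names, and return values are hypothetical.

```javascript
// Hypothetical helper: choose a processing mode based on the guidance above
// (sync for files under ~10MB, async for large files or webhook-driven flows).
const SYNC_SIZE_LIMIT = 10 * 1024 * 1024; // 10MB

function chooseProcessingMode(fileSizeBytes, { useWebhooks = false } = {}) {
  if (useWebhooks || fileSizeBytes >= SYNC_SIZE_LIMIT) {
    return "async"; // large files, batch jobs, webhook-driven workflows
  }
  return "sync"; // small files where immediate results are needed
}

console.log(chooseProcessingMode(2 * 1024 * 1024));                        // "sync"
console.log(chooseProcessingMode(25 * 1024 * 1024));                       // "async"
console.log(chooseProcessingMode(1 * 1024 * 1024, { useWebhooks: true })); // "async"
```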

When to Use Synchronous Processing

Real-time UI

When building user interfaces that need to show processing results immediately after upload, such as document viewers or data entry forms.

Small Documents

For documents under 10MB that typically process within 30-60 seconds, such as invoices, receipts, or single-page forms.

Simple Workflows

For straightforward integrations where you want to keep the implementation simple without polling or webhook complexity.

Request Parameters

Path Parameters

extractionSchemaId (string, required)

The unique identifier of the extraction schema to use for processing.

fileId (string, required)

The file ID returned from the get-url endpoint after uploading your file.

Headers

Authorization (string, required)

Bearer token for authentication.

Content-Type (string, required)

Must be set to application/json.

Request Body

Send an empty JSON object {} as the request body.

process-file-request.js
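
The referenced example file is not reproduced here; below is a minimal sketch of the request using fetch. The base URL, schema ID, file ID, and API token are placeholders you would substitute with your own values.

```javascript
// Minimal sketch of the synchronous process-file request.
// baseUrl, extractionSchemaId, fileId, and apiToken are placeholders.
const baseUrl = "https://YOUR_API_HOST";
const extractionSchemaId = "YOUR_SCHEMA_ID";
const fileId = "YOUR_FILE_ID"; // returned by the get-url upload step
const apiToken = "YOUR_API_TOKEN";

const response = await fetch(
  `${baseUrl}/api/v1/document/${extractionSchemaId}/process-file/${fileId}`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({}), // the endpoint expects an empty JSON object
  }
);

if (!response.ok) {
  throw new Error(`Processing failed with status ${response.status}`);
}

const result = await response.json();
console.log(result);
```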

Response

Success Response (200) - Simple Table

Example response for a document with a single table structure.

simple-table-response.json

Success Response (200) - Nested Tables

Example response for a document with nested table structures (parent-child relationships).

nested-tables-response.json

Error Responses

Common error responses you might encounter.

error-responses.json

💡 Timeout Handling

If you encounter timeout errors (408), consider switching to async processing for better handling of large files and complex documents.
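
A minimal sketch of that fallback is shown below. Both helpers are hypothetical: processFileSync is assumed to perform the POST shown earlier and return the raw fetch Response, and processAsync is assumed to wrap the separately documented async endpoint, whose path is not shown on this page.

```javascript
// Sketch: call the sync endpoint and fall back to async on a 408 timeout.
// processFileSync() and processAsync() are hypothetical helpers (see lead-in).
async function processWithFallback(extractionSchemaId, fileId) {
  const response = await processFileSync(extractionSchemaId, fileId);

  if (response.status === 408) {
    console.warn("Sync processing timed out, falling back to async...");
    return processAsync(extractionSchemaId, fileId); // hypothetical helper
  }

  if (!response.ok) {
    throw new Error(`Processing failed with status ${response.status}`);
  }

  return response.json();
}
```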

Data Structure Details

Table Structure

  • id: Unique table identifier
  • name: Human-readable table name
  • columns: Array of column definitions
  • rows: Array of extracted data rows

Row Structure

  • id: Unique row identifier
  • index: Row position (as a string)
  • status: Row status (PENDING, ACCEPTED, or REJECTED)
  • cells: Object containing cell data, keyed by column name
  • childTables: Optional nested child tables (present if the schema defines relationships)

Cell Structure

  • value: Extracted cell value
  • columnId: Reference to the column definition
  • metadata: Column metadata (type, description)
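
To make the structure above concrete, here is a sketch that walks a response's tables, rows, and cells, including nested child tables. The exact top-level field that holds the tables is an assumption; check the example responses above for the actual shape.

```javascript
// Sketch: walk extracted tables, rows, cells, and nested child tables.
// Assumes the caller passes a table object matching the structure above.
function printTable(table, depth = 0) {
  const indent = "  ".repeat(depth);
  console.log(`${indent}Table ${table.name} (${table.id})`);

  for (const row of table.rows) {
    console.log(`${indent}  Row ${row.index} [${row.status}]`);

    // cells is an object keyed by column name; each cell carries value,
    // columnId, and column metadata (type, description).
    for (const [columnName, cell] of Object.entries(row.cells)) {
      console.log(`${indent}    ${columnName}: ${cell.value}`);
    }

    // childTables is only present when the schema defines relationships.
    for (const child of row.childTables ?? []) {
      printTable(child, depth + 2);
    }
  }
}
```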

Complete Workflow Example

Here's a complete example showing the full synchronous processing workflow from upload to result processing.

complete-sync-workflow.js
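
The referenced file is not shown here; as a rough sketch of the flow it describes, the code below uploads a file and then processes it synchronously. getUploadTarget and the assumed PUT upload are placeholders for the separately documented get-url step; only the process-file call matches the endpoint documented on this page.

```javascript
// Sketch of the end-to-end sync workflow. getUploadTarget() is a hypothetical
// wrapper around the get-url endpoint; it is assumed to return an upload URL
// plus the fileId used by this endpoint.
async function processDocument(baseUrl, extractionSchemaId, file, apiToken) {
  // 1. Get an upload URL and fileId (assumed shape of the get-url response).
  const { uploadUrl, fileId } = await getUploadTarget(file, apiToken);

  // 2. Upload the file to the returned URL (assumed to accept a PUT).
  await fetch(uploadUrl, { method: "PUT", body: file });

  // 3. Process the uploaded file synchronously.
  const response = await fetch(
    `${baseUrl}/api/v1/document/${extractionSchemaId}/process-file/${fileId}`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({}),
    }
  );

  if (!response.ok) {
    throw new Error(`Processing failed with status ${response.status}`);
  }

  // 4. Use the extracted tables (see Data Structure Details above).
  return response.json();
}
```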

Best Practices

File Size Considerations

  • Keep files under 10MB for reliable sync processing
  • Use async processing for files larger than 10MB
  • Monitor processing times and switch to async if needed
  • Consider file compression for large documents

Error Handling

  • Always implement timeout handling for large files
  • Provide a fallback to async processing on timeouts
  • Include file metadata in error logs for debugging
  • Handle network errors gracefully with retries

User Experience

  • Show progress indicators during processing
  • Provide estimated processing times to users
  • Offer async processing as an option for large files
  • Display helpful error messages with next steps

Performance

  • Process files during off-peak hours when possible
  • Implement proper request timeouts in your client (see the sketch after this list)
  • Consider caching results for frequently processed files
  • Monitor API response times and adjust your strategy accordingly
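
For the client-side timeout recommendation above, a minimal sketch using AbortController is shown below. The 90-second budget is an illustrative value, not a documented limit.

```javascript
// Sketch: enforce a client-side timeout on the sync request with AbortController.
// The 90-second default is illustrative; tune it to your observed processing times.
async function processFileWithTimeout(url, options, timeoutMs = 90_000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  try {
    return await fetch(url, { ...options, signal: controller.signal });
  } catch (err) {
    if (err.name === "AbortError") {
      // Client-side timeout: consider retrying via async processing.
      throw new Error(`Request exceeded ${timeoutMs}ms; consider async processing`);
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }
}
```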