
Getting Started

Introduction

The Rossum API allows you to programmatically access and manage your organization's Rossum data and account information. With the API, you can do the following:

On this page, you will find an introduction to the API usage from a developer perspective, and a reference to all the API objects and methods.

Developer Resources

There are several other key resources related to implementing, integrating and extending the Rossum platform:

Quick API Tutorial

For a quick tutorial on how to authenticate, upload a document and export extracted data, see the sections below. If you want to skip this quick tutorial, continue directly to the Overview section.

It is a good idea to go through the introduction to the Rossum platform on the Developer Portal first to make sure you are up to speed on the basic Rossum concepts.

If in trouble, feel free to contact us at support@rossum.ai.

Install curl tool

Test that curl is installed properly

curl 'https://<example>.rossum.app/api/v1'
{
  "organizations":"https://<example>.rossum.app/api/v1/organizations",
  "workspaces":"https://<example>.rossum.app/api/v1/workspaces",
  "schemas":"https://<example>.rossum.app/api/v1/schemas",
  "connectors":"https://<example>.rossum.app/api/v1/connectors",
  "inboxes":"https://<example>.rossum.app/api/v1/inboxes",
  "queues":"https://<example>.rossum.app/api/v1/queues",
  "documents":"https://<example>.rossum.app/api/v1/documents",
  "users":"https://<example>.rossum.app/api/v1/users",
  "groups":"https://<example>.rossum.app/api/v1/groups",
  "annotations":"https://<example>.rossum.app/api/v1/annotations",
  "pages":"https://<example>.rossum.app/api/v1/pages"
}

All code samples included in this API documentation use curl, the command-line data transfer tool. On MS Windows 10, macOS and most Linux distributions, curl should already be pre-installed. If not, please download it from curl.haxx.se.

Optionally use jq tool to pretty-print JSON output

curl -s 'https://<example>.rossum.app/api/v1' | jq
{
  "organizations": "https://<example>.rossum.app/api/v1/organizations",
  "workspaces": "https://<example>.rossum.app/api/v1/workspaces",
  "schemas": "https://<example>.rossum.app/api/v1/schemas",
  "connectors": "https://<example>.rossum.app/api/v1/connectors",
  "inboxes": "https://<example>.rossum.app/api/v1/inboxes",
  "queues": "https://<example>.rossum.app/api/v1/queues",
  "documents": "https://<example>.rossum.app/api/v1/documents",
  "users": "https://<example>.rossum.app/api/v1/users",
  "groups": "https://<example>.rossum.app/api/v1/groups",
  "annotations": "https://<example>.rossum.app/api/v1/annotations",
  "pages": "https://<example>.rossum.app/api/v1/pages"
}

You may also want to install the jq tool to make curl output human-readable.

Use the API on Windows

This API documentation is written for command-line interpreters running on UNIX-based operating systems (Linux and Mac). Windows users may need to use the following substitutions when working with the API:

Character used in this documentation | Meaning/usage | Substitute character for Windows users
' | single quotes | "
" | double quotes | "" or \"
\ | continue the command on the next line | ^

Example of an API call on a UNIX-based OS

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"target_queue": "https://<example>.rossum.app/api/v1/queues/8236", "target_status": "to_review"}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/copy'

Examples of the same API call on Windows

curl -H "Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03" -H "Content-Type: application/json" ^
  -d "{""target_queue"": ""https://<example>.rossum.app/api/v1/queues/8236"", ""target_status"": ""to_review""}" ^
  "https://<example>.rossum.app/api/v1/annotations/315777/copy"


curl -H "Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03" -H "Content-Type: application/json" ^
  -d "{\"target_queue\": \"https://<example>.rossum.app/api/v1/queues/8236\", \"target_status\": \"to_review\"}" ^
  "https://<example>.rossum.app/api/v1/annotations/315777/copy"

Create an account

In order to interact with the API, you need an account. If you do not have one, you can create one via our self-service portal.

Log in to the account

Fill in your username and password (the API login credentials are the same as those you use to log in to your account). Call the login endpoint to obtain a key (token) that can be used in subsequent calls.

curl -s -H 'Content-Type: application/json' \
  -d '{"username": "east-west-trading-co@example.com", "password": "aCo2ohghBo8Oghai"}' \
  'https://<example>.rossum.app/api/v1/auth/login'
{"key": "db313f24f5738c8e04635e036ec8a45cdd6d6b03"}

This key remains valid for the default expiration time (currently 162 hours) or until you log out.

Upload a document

In order to upload a document (PDF, image, XLSX, XLS, DOCX, DOC) through the API, you need to obtain the id of a queue first.

curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/queues?page_size=1' | jq -r '.results[0].url'
https://<example>.rossum.app/api/v1/queues/8199

Then you can upload the document to the queue. Alternatively, you can send documents to the inbox associated with the queue. See upload for more information about importing files.

curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -F content=@document.pdf 'https://<example>.rossum.app/api/v1/uploads?queue=8199' | jq -r .url
https://<example>.rossum.app/api/v1/tasks/9231

Wait for document to be ready and review extracted data

As soon as a document is uploaded, it shows up in the queue and data extraction begins. Processing a document may take from a few seconds to several minutes. You can check the status of the annotation and wait until it changes to to_review.

curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668' | jq .status
"to_review"

After that, you can open the Rossum web interface at <example>.rossum.app to review and confirm the extracted data.
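Instead of checking the status by hand, a script can poll the annotation until it reaches to_review. The following is a minimal sketch, assuming jq is installed and that a hypothetical ROSSUM_TOKEN variable holds the key from the login step. The function names, the default of 60 checks and the 5-second interval are illustrative choices, not API requirements. The status lookup is kept in its own function so it can be replaced, for example in tests.

```shell
# Hypothetical helper: print the status of one annotation.
# Assumes ROSSUM_TOKEN holds a valid key and jq is installed.
get_annotation_status() {
  curl -s -H "Authorization: Bearer $ROSSUM_TOKEN" "$1" | jq -r .status
}

# Poll until the annotation reaches the wanted status or the checks run out.
# Usage: wait_for_status <annotation_url> [wanted_status] [max_checks] [interval_s]
wait_for_status() {
  local url=$1 wanted=${2:-to_review} tries=${3:-60} interval=${4:-5} i=0 status=""
  while [ "$i" -lt "$tries" ]; do
    status=$(get_annotation_status "$url")
    if [ "$status" = "$wanted" ]; then
      printf '%s\n' "$status"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "gave up after $tries checks, last status: $status" >&2
  return 1
}
```

For example, wait_for_status 'https://<example>.rossum.app/api/v1/annotations/319668' checks every 5 seconds for roughly 5 minutes before giving up.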

Download reviewed data

Now you can export the extracted data using the export endpoint of the queue. You can select XML, CSV, XLSX or JSON format. For CSV, use a URL like:

curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/queues/8199/export?status=exported&format=csv&id=319668'
Invoice number,Invoice Date,PO Number,Due date,Vendor name,Vendor ID,Customer name,Customer ID,Total amount
2183760194,2018-06-08,PO2231233,2018-06-08,Alza.cz a.s.,02231233,Rossum,05222322,500.00

Logout

Finally, you can safely dispose of the token using the logout endpoint:

curl -s -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/auth/logout'
{"detail":"Successfully logged out."}

Overview

HTTP and REST

The Rossum API is organized around REST. Our API has predictable, resource-oriented URLs, and uses HTTP response codes to indicate API errors. We use built-in HTTP features, like HTTP authentication and HTTP verbs, which are understood by off-the-shelf HTTP clients.

HTTP Verbs

Call the API using the following standard HTTP methods:

We support cross-origin resource sharing, allowing you to interact securely with our API from a client-side web application. JSON is returned by API responses, including errors (except when another format is requested, e.g. XML).

Base URL

The base API endpoint URL depends on the account type, deployment and location. The default URL is https://<example>.rossum.app/api, where <example> is the domain selected during account creation. URLs of companies using a dedicated deployment may look like https://acme.rossum.app/api.

If you are not sure about the correct URL, navigate to https://app.rossum.ai and use your email address to receive your account information via email.

Please note that we previously recommended the https://api.elis.rossum.ai endpoint for interacting with the Rossum API, but it is now deprecated. For new integrations, use the https://<example>.rossum.app/api endpoint. For accounts created before Nov 2022, use https://elis.rossum.ai/api.

Authentication

Most of the API endpoints require the user to be authenticated. To log in to the Rossum API, post an object with username and password fields. Login returns an access key to be used for token authentication.

Our API also makes it possible to authenticate via a one-time token, which is returned after registration. This token allows users to authenticate against our API, but it is invalidated after one call. It can be exchanged for a regular access token, which is limited only by its validity period. For the token exchange, use the /auth/token endpoint.

A token is deleted either when the user calls the logout endpoint or automatically after a configured time (the default expiration time is 162 hours). The default expiration time can be lowered using the max_token_lifetime_s field. When the token expires, the 401 status is returned, and users are expected to log in again to obtain a new token.

Rossum's API also supports session authentication, where a user session is stored in cookies after login. If enabled, the session lasts 1 day, until it expires or until logout. While the session is valid, there is no need to send the authentication token with every request, but "unsafe" requests (POST, PUT, PATCH, DELETE) whose MIME type is different from application/json must include the X-CSRFToken header with a valid CSRF token, which is returned in a cookie at login. When a session expires, the 401 status is returned, as with token authentication, and users are expected to log in again to start a new session.

Login

Login user using username and password

curl -H 'Content-Type: application/json' \
  -d '{"username": "east-west-trading-co@<example>.rossum.app", "password": "aCo2ohghBo8Oghai"}' \
  'https://<example>.rossum.app/api/v1/auth/login'
{
  "key": "db313f24f5738c8e04635e036ec8a45cdd6d6b03",
  "domain": "acme-corp.app.rossum.ai"
}

POST /v1/auth/login

Use token key in requests

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/organizations/406'

Note: The Token authorization scheme is also supported for compatibility with earlier versions.

curl -H 'Authorization: Token db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/organizations/406'

Login user expiring after 1 hour

curl -H 'Content-Type: application/json' \
  -d '{"username": "east-west-trading-co@<example>.rossum.app", "password": "aCo2ohghBo8Oghai", "max_token_lifetime_s": 3600}' \
  'https://<example>.rossum.app/api/v1/auth/login'
{
  "key": "ltcg2p2w7o9vxju313f04rq7lcc4xu2bwso423b3",
  "domain": null
}
Attribute Type Required Description
username string true Username of the user to be logged in.
password string true Password of the user.
origin string false For internal use only. Using this field may affect throttling of your API requests.
max_token_lifetime_s integer false Duration (in seconds) for which the token will be valid. The default is 162 hours (583200 seconds), which is also the maximum.
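When scripting the login call, it can help to build the request body in one place and check max_token_lifetime_s against the 162-hour (583200-second) maximum before sending it. The following is a minimal sketch; the function names are hypothetical, jq is assumed for extracting the key, and the escaping only handles double quotes and backslashes (not control characters), so a JSON tool is safer for arbitrary credentials.

```shell
MAX_TOKEN_LIFETIME_S=583200  # 162 hours, the documented maximum

# Escape backslashes and double quotes for use inside a JSON string.
json_escape() {
  printf '%s' "$1" | sed 's/[\\"]/\\&/g'
}

# Print the JSON body for POST /v1/auth/login.
# Usage: login_payload <username> <password> [max_token_lifetime_s]
login_payload() {
  local user pass lifetime
  user=$(json_escape "$1")
  pass=$(json_escape "$2")
  lifetime=${3:-}
  if [ -z "$lifetime" ]; then
    printf '{"username": "%s", "password": "%s"}\n' "$user" "$pass"
    return 0
  fi
  case $lifetime in
    *[!0-9]*) echo "lifetime must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$lifetime" -le 0 ] || [ "$lifetime" -gt "$MAX_TOKEN_LIFETIME_S" ]; then
    echo "lifetime must be between 1 and $MAX_TOKEN_LIFETIME_S seconds" >&2
    return 1
  fi
  printf '{"username": "%s", "password": "%s", "max_token_lifetime_s": %s}\n' \
    "$user" "$pass" "$lifetime"
}

# Log in and print only the access key (requires curl and jq).
rossum_login() {
  local body
  body=$(login_payload "$@") || return 1
  curl -s -H 'Content-Type: application/json' -d "$body" \
    'https://<example>.rossum.app/api/v1/auth/login' | jq -r .key
}
```

For example, ROSSUM_TOKEN=$(rossum_login east-west-trading-co@example.com aCo2ohghBo8Oghai 3600) stores a key that expires after one hour.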

Response

Status: 200

Returns an object with "key", which is the access token, and "domain", the user's domain.

Attribute Type Description
key string Access token.
domain string The domain the token was issued for.

Logout

Logout user

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/auth/logout'
{
  "detail": "Successfully logged out."
}

POST /v1/auth/logout

Logout user, discard auth token.

Response

Status: 200

Token Exchange

Exchange a one-time authentication token for a longer-lived access token.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/auth/token'
{
  "key": "ltcg2p2w7o9vxju313f04rq7lcc4xu2bwso423b3",
  "domain": "<example>.rossum.app",
  "scope": "default"
}

POST /v1/auth/token

Attribute Type Required Description
scope string false Supported values are default, approval (for internal use only)
max_token_lifetime_s float false Duration (in seconds) for which the token will be valid (default: lifetime of the current token or 162 hours if the current token is one-time). Can be set to a maximum of 583200 seconds (162 hours).
origin string false For internal use only. Using this field may affect throttling of your API requests.

This endpoint enables the exchange of a one-time token for a longer-lived access token.

It accepts either one-time tokens provided after registration or JWT tokens, if you have such a setup configured. The token must be provided in the Bearer authorization header.

JWT authentication

Short-lived JWT tokens can be exchanged for access tokens. A typical use case is logging in your users via SSO in your own application and displaying the Rossum app embedded in it.

To enable JWT authentication, you need to provide Rossum with the public key that will be used to decode the tokens. Currently, only tokens with EdDSA (signed using the Ed25519 or Ed448 curves) or RS512 signatures are allowed, and token validity must not exceed 60 seconds.

The expected formats of the header and encoded payload of the JWT token are as follows:

Decoded JWT Header Format

Example format of a decoded JWT token header (not encrypted)

{
   "alg":"EdDSA",
   "kid":"urn:rossum.ai:organizations:100",
   "typ":"JWT"
}

Example format of a decoded JWT token payload

{
   "ver":"1.0",
   "iss":"ACME Corporation",
   "aud":"https://<example>.rossum.app",
   "sub":"john.doe@rossum.ai",
   "exp":1514764800,
   "email":"john.doe@rossum.ai",
   "name":"John F. Doe",
   "rossum_org":"100",
   "roles": ["annotator"]
}
Attribute Type Required Description
kid string true Identifier. Must end with :{your Rossum org ID}, e.g. "urn:rossum.ai:organizations:123"
typ string false Type of the token.
alg string true Signature algorithm to be used for decoding the token. Only EdDSA or RS512 values are allowed.

Decoded JWT Payload Format

Attribute Type Required Description
ver string true Version of the payload format. Available versions: 1.0.
iss string true Name of the issuer of the token (e.g. company name).
aud string true Target domain used for API queries (e.g. https://<example>.rossum.app)
sub string true User email that will be matched against username in Rossum.
exp int true UNIX timestamp of the JWT token expiration. Must be at most 60 seconds after the current UTC time.
email string true User email.
name string true User's first name and last name separated by space. Will be used for creation of new users if auto-provisioning is enabled.
rossum_org string true Rossum organization id.
roles list[string] false Names of the user roles that will be assigned to a user created by auto-provisioning. Must be a subset of the roles stated in the auto-provisioning configuration for the organization.
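A minimal sketch of producing and exchanging such a token from a shell, assuming OpenSSL 3.x (whose pkeyutl requires -rawin for Ed25519 signing) and an Ed25519 private key in a hypothetical key.pem whose public part has been shared with Rossum. The helper names and the fixed issuer and audience (taken from the example payload) are illustrative, claim values are inserted without JSON escaping, and RS512 signing is not covered. A production integration would more likely use a JWT library in your SSO backend.

```shell
# Base64url-encode stdin without padding, as used for JWT segments.
b64url() {
  base64 | tr '+/' '-_' | tr -d '=\n'
}

# Print an expiration timestamp: now plus a validity window capped at 60 s.
jwt_exp() {
  local ttl=${1:-60}
  if [ "$ttl" -gt 60 ]; then ttl=60; fi
  echo $(( $(date +%s) + ttl ))
}

# Build "<header>.<payload>" for an org ID, user email and full name.
jwt_signing_input() {
  local org=$1 email=$2 name=$3 header payload
  header=$(printf '{"alg":"EdDSA","kid":"urn:rossum.ai:organizations:%s","typ":"JWT"}' "$org")
  payload=$(printf '{"ver":"1.0","iss":"ACME Corporation","aud":"https://<example>.rossum.app","sub":"%s","exp":%s,"email":"%s","name":"%s","rossum_org":"%s"}' \
    "$email" "$(jwt_exp 60)" "$email" "$name" "$org")
  printf '%s.%s\n' "$(printf '%s' "$header" | b64url)" "$(printf '%s' "$payload" | b64url)"
}

# Sign the input with an Ed25519 private key and print the complete JWT.
# Requires OpenSSL 3.x.
jwt_sign() {
  local input=$1 key=$2 tmp sig
  tmp=$(mktemp)
  printf '%s' "$input" > "$tmp"
  sig=$(openssl pkeyutl -sign -inkey "$key" -rawin -in "$tmp" | b64url)
  rm -f "$tmp"
  printf '%s.%s\n' "$input" "$sig"
}

# Exchange the JWT for a regular access token (POST /v1/auth/token).
exchange_jwt() {
  curl -s -X POST -H "Authorization: Bearer $1" \
    'https://<example>.rossum.app/api/v1/auth/token'
}
```

Usage: jwt=$(jwt_sign "$(jwt_signing_input 100 john.doe@rossum.ai 'John F. Doe')" key.pem), then exchange_jwt "$jwt". Because exp is computed when the payload is built, the token should be exchanged immediately.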

Response

Status: 200

Attribute Type Description
key string Access token.
domain string The domain the token was issued for.
scope string Supported values are default, approval (for internal use only)

Single Sign-On (SSO)

Rossum allows customers to integrate with their own identity provider, such as Google, Azure AD or any other provider using OAuth2 OpenID Connect protocol (OIDC). Rossum then acts as a service provider.

When SSO is enabled for an organization, users are redirected to the configured identity provider's login page and can access the Rossum application only after successful authentication. An identity provider user claim (e.g. email (default), sub, preferred_username, unique_name) is used to match a user account in Rossum. If auto-provisioning is enabled for the organization, Rossum accounts are created automatically for users who do not have one yet.

Required setup of the OIDC identity provider:

Required information to allow OIDC setup for the Rossum service provider:

If you need to set up SSO for your organization, please contact support@rossum.ai.

Pagination

All object list operations are paginated by default, so you may need several API calls to obtain all objects of a given type.

Parameter Default Maximum Description
page_size 20 100 (*) Number of results per page
page 1 Page of results

(*) Maximum page size differs for some endpoints:

Filters and ordering

List queues of workspace 7540, with locale en_US and order results by name.

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/queues?workspace=7540&locale=en_US&ordering=name'

Lists may be filtered using various attributes. Multiple filters are combined with & in the URL, which narrows down the results. Please refer to the particular object description for the available filters.

Ordering of results may be enforced by the ordering parameter with one or more keys delimited by a comma. Prefixing a key with a minus sign (-) enforces descending order.
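Pagination, filters and ordering are all plain query parameters, so a script can assemble them in one helper and walk through the pages. The following is a minimal sketch; the function names and argument order are hypothetical, filter values are assumed to be URL-safe already (nothing is encoded), and page_size is capped at the general maximum of 100 even though some endpoints use a different limit. fetch_all assumes curl, jq and a key in a hypothetical ROSSUM_TOKEN variable. It stops at the first page that returns no results, which costs one extra request but does not depend on the endpoint's page-size limit.

```shell
# Build a list URL with filters, pagination and ordering.
# Usage: list_url <resource> <page> <page_size> <ordering> [filter=value ...]
# Pass an empty <ordering> to leave the parameter out.
list_url() {
  local resource=$1 page=$2 size=$3 ordering=$4 query="" filter
  shift 4
  if [ "$size" -gt 100 ]; then size=100; fi   # general maximum page size
  for filter in "$@"; do
    query="$query$filter&"
  done
  query="${query}page=$page&page_size=$size"
  if [ -n "$ordering" ]; then query="$query&ordering=$ordering"; fi
  printf 'https://<example>.rossum.app/api/v1/%s?%s\n' "$resource" "$query"
}

# Print every object of a resource, one compact JSON line per object.
# Usage: fetch_all <resource> <ordering> [filter=value ...]
fetch_all() {
  local resource=$1 ordering=$2 page=1 body count
  shift 2
  while :; do
    body=$(curl -s -H "Authorization: Bearer $ROSSUM_TOKEN" \
      "$(list_url "$resource" "$page" 100 "$ordering" "$@")")
    printf '%s' "$body" | jq -c '.results[]?'
    count=$(printf '%s' "$body" | jq '.results | length' 2>/dev/null)
    # Stop at the first page without results (or an unreadable response).
    if [ "${count:-0}" -eq 0 ]; then break; fi
    page=$((page + 1))
  done
}
```

For example, fetch_all queues name workspace=7540 locale=en_US lists all queues from the request shown above, one per line.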

Metadata

Example metadata in a document object

{
  "id": 319768,
  "url": "https://<example>.rossum.app/api/v1/documents/319768",
  "s3_name": "05feca6b90d13e389c31c8fdeb7fea26",
  "annotations": [
    "https://<example>.rossum.app/api/v1/annotations/319668"
  ],
  "mime_type": "application/pdf",
  "arrived_at": "2019-02-11T19:22:33.993427Z",
  "original_file_name": "document.pdf",
  "content": "https://<example>.rossum.app/api/v1/documents/319768/content",
  "metadata": {
    "customer-id": "f205ec8a-5597-4dbb-8d66-5a53ea96cdea",
    "source": 9581,
    "authors": ["Joe Smith", "Peter Doe"]
  }
}

When working with API objects, it may be useful to attach some information to the object (e.g. a customer ID to a document). You can store a custom JSON object in the metadata section available in most objects.

List of objects with metadata support: organization, workspace, user, queue, schema, connector, inbox, document, annotation, page, survey.

Total metadata size may be up to 4 kB per object.

Versioning

API Version is part of the URL, e.g. https://<example>.rossum.app/api/v1/users.

To allow the API to evolve, we consider adding a field to a JSON object, as well as adding a new item to an enum, to be backward-compatible changes that may be introduced at any time. Clients are expected to handle such changes.

Dates

All date fields are represented as ISO 8601 formatted strings, e.g. 2018-06-01T21:36:42.223415Z. All returned dates are in the UTC timezone.

Errors

Our API uses conventional HTTP response codes to indicate the success or failure of an API request.

Code Status Meaning
400 Bad Request Invalid input data or error from connector.
401 Unauthorized The username/password is invalid or token is invalid (e.g. expired).
403 Forbidden Insufficient permission, missing authentication, invalid CSRF token and similar issues.
404 Not Found Entity not found (e.g. already deleted).
405 Method Not Allowed You tried to access an endpoint with an invalid method.
409 Conflict Trying to change annotation that was not started by the current user.
413 Payload Too Large The request payload is too large (typically an uploaded file).
429 Too Many Requests The allowed number of requests per minute has been exceeded. Please wait before sending more requests.
500 Internal Server Error We had a problem with the server. Try again later.
503 Service Unavailable We're temporarily offline for maintenance. Please try again later.
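Scripts usually need to treat these codes differently: 401 calls for a new login, 429, 500 and 503 are worth retrying after a pause, and the other 4xx codes point to a problem with the request itself. The following is a minimal sketch of that mapping, plus a GET wrapper that reads the status code with curl's -w '%{http_code}' option. The action names, the five attempts and the doubling pause are assumptions, ROSSUM_TOKEN is a hypothetical variable holding the key, and handling of 401 (logging in again) is left to the caller.

```shell
# Map an HTTP status code to what a client script should do next.
status_action() {
  case $1 in
    2??) echo ok ;;
    401) echo relogin ;;
    429|500|503) echo retry ;;
    *) echo fail ;;
  esac
}

# GET a URL, retrying on 429/500/503 with a doubling pause (requires curl).
# Prints the response body on success; returns 1 otherwise.
rossum_get() {
  local url=$1 attempt=1 max_attempts=5 pause=2 out code
  out=$(mktemp)
  while [ "$attempt" -le "$max_attempts" ]; do
    code=$(curl -s -o "$out" -w '%{http_code}' \
      -H "Authorization: Bearer $ROSSUM_TOKEN" "$url")
    case $(status_action "$code") in
      ok) cat "$out"; rm -f "$out"; return 0 ;;
      retry) sleep "$pause"; pause=$((pause * 2)) ;;
      *) echo "HTTP $code" >&2; cat "$out" >&2; rm -f "$out"; return 1 ;;
    esac
    attempt=$((attempt + 1))
  done
  rm -f "$out"
  echo "giving up after $max_attempts attempts" >&2
  return 1
}
```

If curl cannot connect at all, -w reports 000, which this mapping treats as a failure rather than retrying.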

Import and Export

Documents may be imported into Rossum using the REST API and the email gateway. Supported file formats are PDF, PNG, JPEG, TIFF, XLSX/XLS and DOCX/DOC. The maximum supported file size is 40 MB (this limit also applies to the uncompressed size of the files within a .zip archive).

To get the best results from Rossum, documents should be in A4 format with a resolution of at least 150 DPI (in case of scans/photos). Read more about import recommendations.

Importing non-standard mime types

Support for other MIME types can be added by handling the upload.created webhook event. With this setup, users can pre-process uploaded files (e.g. XML or JSON formats) into a form that Rossum understands. Such MIME types usually need to be enabled at the queue level first (by adding the appropriate MIME type to accepted_mime_types in the queue settings attribute). If your document MIME types are not supported, please contact the Rossum support team for more information.

Upload API

You can upload a document to the queue using the upload endpoint with one or more files. You can also specify additional field values in the upload endpoint, e.g. your internal document ID. As soon as a document is uploaded, data extraction starts.

The upload endpoint supports basic authentication to enable easy integration with third-party systems.
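Before calling the upload endpoint, a script can reject files that Rossum would not accept, and then upload with basic authentication using curl's -u option. The following is a minimal sketch; the function names are hypothetical, the extension list only approximates the supported formats (Rossum itself works with MIME types, and queues can accept additional ones), and reading "40 MB" as 40 × 1024 × 1024 bytes is an assumption.

```shell
MAX_UPLOAD_BYTES=$((40 * 1024 * 1024))  # assumed meaning of "40 MB"

# Return 0 if the file extension matches one of the listed formats.
is_supported_format() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case $ext in
    pdf|png|jpg|jpeg|tif|tiff|xlsx|xls|docx|doc) return 0 ;;
    *) return 1 ;;
  esac
}

# Upload one file to a queue using basic authentication.
# Usage: upload_file <file> <queue_id> <username> <password>
upload_file() {
  local file=$1 queue=$2 size
  is_supported_format "$file" || { echo "unsupported format: $file" >&2; return 1; }
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -gt "$MAX_UPLOAD_BYTES" ]; then
    echo "file too large: $size bytes" >&2
    return 1
  fi
  curl -s -u "$3:$4" -F "content=@$file" \
    "https://<example>.rossum.app/api/v1/uploads?queue=$queue"
}
```

For example, upload_file document.pdf 8199 east-west-trading-co@example.com aCo2ohghBo8Oghai performs the same upload as the tutorial, but without a token.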

Import by Email

It is also possible to send documents by email using a properly configured inbox that is associated with a queue. Users then only need to know the email address to forward emails to.

For every incoming email, Rossum extracts PDF documents, images and zip files, stores them in the queue and starts data extraction process.

The size limit for incoming emails is 50 MB (the raw email message with base64 encoded attachments).

All files from the root of the archive are extracted. If the root contains only one directory (and no other files), the whole directory is extracted. The zip files and all extracted files must be allowed in accepted_mime_types (see queue settings) and must pass the inbox filtering rules (see document rejection conditions) for annotations to be created.

Small images (up to 100x100 pixels) are ignored, see inbox for reference.

You can use selected email header data (e.g. Subject) to initialize additional field values, see rir_field_names attribute description for details.

Zip attachment limits:

Export

To export extracted and confirmed data, call the export endpoint. You can specify status, time-range filters and an annotation ID list to limit the returned results.

The export endpoint supports basic authentication to enable easy integration with third-party systems.
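An export request is a GET with query parameters, so it can be assembled like any other URL. The following is a minimal sketch with a hypothetical helper; joining several annotation IDs with commas in the id parameter is an assumption to confirm in the export endpoint reference.

```shell
# Build an export URL for a queue.
# Usage: export_url <queue_id> <format> <status> [annotation_id ...]
export_url() {
  local queue=$1 format=$2 status=$3 ids
  shift 3
  ids=$(printf '%s,' "$@")
  ids=${ids%,}                      # drop the trailing comma
  printf 'https://<example>.rossum.app/api/v1/queues/%s/export?status=%s&format=%s' \
    "$queue" "$status" "$format"
  if [ -n "$ids" ]; then printf '&id=%s' "$ids"; fi
  printf '\n'
}
```

Combined with basic authentication, curl -s -u 'user:password' "$(export_url 8199 csv exported 319668)" reproduces the CSV export from the tutorial.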

Auto-split of document

It is possible to process a single PDF file that contains several invoices. Just insert a special separator page between the documents. You can print this page and insert it between documents while scanning.

Rossum will recognize a QR code on the page and split the PDF into individual documents automatically. Produced documents are imported to the queue, while the original document is set to a split state.

Document Schema

Every queue has an associated schema that specifies which fields will be extracted from documents as well as the structure of the data sent to connector and exported from the platform.

Rossum schema supports data fields with single values (datapoint), fields with multiple values (multivalue) or tuples of fields (tuple). At the topmost level, each schema consists of sections, which may either directly contain actual data fields (datapoints) or use nested multivalues and tuples as containers for single datapoints.

While a schema may theoretically consist of an arbitrary number of nested containers, the Rossum UI supports only certain combinations of datapoint types. The supported shapes are:

Schema content

Schema content consists of a list of section objects.

Common attributes

The following attributes are common for all schema objects:

Attribute Type Description Required
category string Category of an object, one of section, multivalue, tuple or datapoint. yes
id string Unique identifier of an object. Maximum length is 50 characters. yes
label string User-friendly label for an object, shown in the user interface yes
hidden boolean If set to true, the object is not visible in the user interface, but remains stored in the database and may be exported. Default is false. Note that section is hidden if all its children are hidden. no
disable_prediction boolean Can be set to true to disable field extraction, while still preserving the data shape. Ignored by aurora engines. no

Section

Example of a section

{
  "category": "section",
  "id": "amounts_section",
  "label": "Amounts",
  "children": [...],
  "icon": ""
}

Section represents a logical part of the document, such as amounts or vendor info. It is allowed only at the top level. Schema allows multiple sections, and there should be at least one section in the schema.

Attribute Type Description Required
children list[object] Specifies objects grouped under a given section. It can contain multivalue or datapoint objects. yes
icon string The icon that appears on the left panel in the UI for a given section (not yet supported on UI).

Datapoint

A datapoint represents a single value, typically a field of a document or some global document information. Fields common to all datapoint types:

Attribute Type Description Required
type string Data type of the object, must be one of the following: string, number, date, enum, button yes
can_export boolean If set to false, datapoint is not exported through export endpoint. Default is true.
can_collapse boolean If set to true, tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false.
rir_field_names list[string] List of references used to initialize an object value. See below for the description. Must be empty for schemas connected to queues with aurora engines
default_value string Default value used either for fields that do not use hints from AI engine predictions (i.e. rir_field_names are not specified), or when the AI engine does not return any data for the field.
constraints object A map of various constraints for the field. See Value constraints.
ui_configuration object A group of settings affecting behaviour of the field in the application. See UI configuration.
width integer Width of the column (in characters). Default widths are: number: 8, string: 20, date: 10, enum: 20. Only supported for table datapoints.
stretch boolean If total width of columns doesn’t fill up the screen, datapoints with stretch set to true will be expanded proportionally to other stretching columns. Only supported for table datapoints.
width_chars integer (Deprecated) Use width and stretch properties instead.
score_threshold float [0;1] Threshold used to automatically validate field content based on AI confidence scores. If not set, queue.default_score_threshold is used.
formula string[0;500] Formula definition, required only for fields of type formula (beta), see Formula Fields. rir_field_names should also be empty for these fields.

The rir_field_names attribute specifies the source of the object's initial value. List items may be:

If multiple items are specified in rir_field_names, the first available value is used.

String type

Example string datapoint

{
  "category": "datapoint",
  "id": "document_id",
  "label": "Invoice ID",
  "type": "string",
  "default_value": null,
  "rir_field_names": ["document_id"],
  "constraints": {
    "length": {
      "exact": null,
      "max": 16,
      "min": null
    },
    "regexp": {
      "pattern": "^INV[0-9]+$"
    },
    "required": false
  }
}

String datapoint does not have any special attribute.

Date type

Example date datapoint

{
  "id": "item_delivered",
  "type": "date",
  "label": "Item Delivered",
  "format": "MM/DD/YYYY",
  "category": "datapoint"
}

Attributes specific to Date datapoint:

Attribute Type Description Required
format string Enforces a format for date datapoint on the UI. See Date format below for more details. Default is YYYY-MM-DD.

Date format supported: available tokens

Example date formats:

Number type

Example number datapoint

{
  "id": "item_quantity",
  "type": "number",
  "label": "Quantity",
  "format": "#,##0.#",
  "category": "datapoint"
}

Attributes specific to Number datapoint:

Attribute Type Default Description Required
format string # ##0.# Number format; see the table below for the available choices. A null value is allowed.
aggregations object A map of various aggregations for the field. See aggregations.

The following table shows numeric formats with their examples.

Format | Example
# ##0,# | 1 234,5 or 1234,5
# ##0.# | 1 234.5 or 1234.5
#,##0.# | 1,234.5 or 1234.5
#'##0.# | 1'234.5 or 1234.5
#.##0,# | 1.234,5 or 1234,5
# ##0 | 1 234 or 1234
#,##0 | 1,234 or 1234
#'##0 | 1'234 or 1234
#.##0 | 1.234 or 1234
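In each of these formats, the character after the first # is the thousands separator and the character after 0 (if any) is the decimal separator. If a client needs to turn a value written in one of these formats back into a plain decimal number, for example to compare it with data from another system, it can read both separators from the format string. The following is a minimal sketch of such a hypothetical helper; it covers only the formats in the table and does not validate its input.

```shell
# Normalize a value written in one of the listed formats to a plain decimal.
# Usage: normalize_number <format> <value>
normalize_number() {
  local fmt=$1 value=$2 group dec=""
  group=$(printf '%s' "$fmt" | cut -c2)            # thousands separator
  if [ "${#fmt}" -gt 5 ]; then
    dec=$(printf '%s' "$fmt" | cut -c6)            # decimal separator
  fi
  value=$(printf '%s' "$value" | tr -d "$group")   # drop grouping characters
  if [ -n "$dec" ]; then
    value=$(printf '%s' "$value" | tr "$dec" '.')  # use '.' for decimals
  fi
  printf '%s\n' "$value"
}
```

For example, normalize_number '#.##0,#' '1.234,5' prints 1234.5, and so does normalize_number '# ##0,#' '1234,5'.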
Aggregations

Example aggregations

{
  "id": "quantity",
  "type": "number",
  "label": "Quantity",
  "category": "datapoint",
  "aggregations": {
    "sum": {
      "label": "Total"
    }
  },
  "default_value": null,
  "rir_field_names": []
}

Aggregations allow computation of informative values, e.g. the sum of a table column with numeric values. They are returned among the messages when the /v1/annotations/{id}/content/validate endpoint is called. Aggregations can be computed only for tables (multivalues of tuples).

Attribute Type Description Required
sum object Sum of values in a column. Default label: "Sum".

All aggregation objects can have an attribute label that will be shown in the UI.

Enum type

Example enum datapoint with options and enum_value_type

{
  "id": "document_type",
  "type": "enum",
  "label": "Document type",
  "hidden": false,
  "category": "datapoint",
  "options": [
    {
      "label": "Invoice Received",
      "value": "21"
    },
    {
      "label": "Invoice Sent",
      "value": "22"
    },
    {
      "label": "Receipt",
      "value": "23"
    }
  ],
  "default_value": "21",
  "rir_field_names": [],
  "enum_value_type": "number"
}

Attributes specific to Enum datapoint:

Attribute Type Description Required
options list[object] List of options; see the object description below. yes
enum_value_type string Data type of the option's value attribute. Must be one of the following: string, number, date no

Every option consists of an object with keys:

Attribute Type Description Required
value string Value of the option. yes
label string User-friendly label for the option, shown in the UI. yes

Enum datapoint values are matched case-insensitively; e.g. the EUR currency value returned by the AI Core Engine successfully matches the {"value": "eur", "label": "Euro"} option.
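The same case-insensitive matching can be mirrored on the client, for instance when checking a value before writing it to an enum field through the API. The following is a minimal sketch with a hypothetical helper that prints the option value (in its original casing) that matches the input, or fails if none does; it compares option values only, not labels.

```shell
# Print the enum option value matching <value> case-insensitively.
# Usage: match_enum_option <value> <option_value> [<option_value> ...]
match_enum_option() {
  local wanted option
  wanted=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  shift
  for option in "$@"; do
    if [ "$(printf '%s' "$option" | tr '[:upper:]' '[:lower:]')" = "$wanted" ]; then
      printf '%s\n' "$option"
      return 0
    fi
  done
  return 1   # no option matched
}
```

For example, match_enum_option EUR eur usd czk prints eur.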

Button type

Specifies a button shown in Rossum UI. For more details please refer to custom UI extension.

Example button datapoint

{
  "id": "show_email",
  "type": "button",
  "category": "datapoint",
  "popup_url": "http://example.com/show_customer_data",
  "can_obtain_token": true
}

Buttons cannot be direct children of multivalues (simple multivalues with buttons are not allowed; in tables, buttons are children of tuples). Despite being a datapoint object, a button currently cannot hold any value. Therefore, the set of available Button datapoint attributes is limited to:

Attribute Type Description Required
type string Data type of the object; must be button for button datapoints. yes
can_export boolean If set to false, datapoint is not exported through export endpoint. Default is true.
can_collapse boolean If set to true, tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false.
popup_url string URL of a popup window to be opened when button is pressed.
can_obtain_token boolean If set to true, the popup window is allowed to ask the main Rossum window for the authorization token.

Value constraints

Example value constraints

{
  "id": "document_id",
  "type": "string",
  "label": "Invoice ID",
  "category": "datapoint",
  "constraints": {
    "length": {
      "max": 32,
      "min": 5
    },
    "required": false
  },
  "default_value": null,
  "rir_field_names": [
    "document_id"
  ]
}

Constraints limit allowed values. When a constraint is not satisfied, the annotation is considered invalid and cannot be exported.

Attribute Type Description Required
length object Defines minimum, maximum or exact length for the datapoint value. By default, minimum and maximum are 0 and infinity, respectively. Supported attributes: min, max and exact
regexp object When specified, content must match a regular expression. Supported attributes: pattern. To ensure that entire value matches, surround your regular expression with ^ and $.
required boolean Specifies if the datapoint is required by the schema. Default value is true.
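The constraint attributes above can be sketched as a small Python validator, useful for example to pre-check values before sending them to the API. This is a hypothetical helper, not Rossum's validation code. It assumes that an empty value only triggers the required check (length and regexp are checked on non-empty values), and it uses `re.search`, so a pattern matches anywhere in the value unless it is anchored with ^ and $, as the regexp description says.

```python
import re
from typing import Optional


def check_constraints(value: Optional[str], constraints: dict) -> list[str]:
    """Hypothetical helper: return a list of constraint violations (empty list means valid)."""
    errors: list[str] = []
    required = constraints.get("required", True)  # documented default is true
    if value is None or value == "":
        if required:
            errors.append("value is required")
        return errors  # assumption: length/regexp are only checked on non-empty values
    length = constraints.get("length", {})
    if "exact" in length and len(value) != length["exact"]:
        errors.append(f"length must be exactly {length['exact']}")
    if len(value) < length.get("min", 0):  # default minimum is 0
        errors.append(f"length must be at least {length['min']}")
    if "max" in length and len(value) > length["max"]:  # default maximum is unbounded
        errors.append(f"length must be at most {length['max']}")
    regexp = constraints.get("regexp")
    if regexp and not re.search(regexp["pattern"], value):  # unanchored unless ^...$ is used
        errors.append(f"value must match {regexp['pattern']}")
    return errors


# Constraints taken from the document_id example above.
invoice_id_constraints = {"length": {"max": 32, "min": 5}, "required": False}
print(check_constraints("123", invoice_id_constraints))  # ['length must be at least 5']
```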

UI configuration

Example UI configuration

{
  "id": "document_id",
  "type": "string",
  "label": "Invoice ID",
  "category": "datapoint",
  "ui_configuration": {
    "type":  "captured",
    "edit": "disabled"
  },
  "default_value": null,
  "rir_field_names": [
    "document_id"
  ]
}

UI configuration provides a group of settings that alter the behaviour of the field in the application. It does not affect the behaviour of the field via the API. For example, disabling edit prevents changing the value of the datapoint in the application, but the value can still be modified through the API.

Attribute Type Description Required
type string Logical type of the datapoint. Possible values are: captured, data, manual, formula or null. Default value is null. false
edit string When set to disabled, the value of the datapoint is not editable via the UI. When set to enabled_without_warning, no warnings about editing this field are displayed in the UI. Default value is enabled: field editing is allowed, but the user receives dismissible warnings when editing. false
Logical types

Multivalue

Example of a multivalue:

{
  "category": "multivalue",
  "id": "line_item",
  "label": "Line Item",
  "children": {
    ...
  },
  "show_grid_by_default": false,
  "min_occurrences": null,
  "max_occurrences": null,
  "rir_field_names": null
}

Example of a multivalue with grid row-types specification:

{
  "category": "multivalue",
  "id": "line_item",
  "label": "Line Item",
  "children": {
    ...
  },
  "grid": {
    "row_types": [
      "header", "data", "footer"
    ],
    "default_row_type": "data",
    "row_types_to_extract": [
      "data"
    ]
  },
  "min_occurrences": null,
  "max_occurrences": null,
  "rir_field_names": ["line_items"]
}

A multivalue is a list of datapoints or tuples of the same type. It represents a container for data with multiple occurrences (such as line items) and can contain only objects with the same id.

Attribute Type Description Required
children object Object specifying type of children. It can contain only objects with categories tuple or datapoint. yes
min_occurrences integer Minimum number of occurrences of nested objects. If the min_occurrences condition is violated, the corresponding fields should be reviewed manually. Minimum allowed value for the field is 0. If not specified, it is set to 0 by default.
max_occurrences integer Maximum number of occurrences of nested objects. All additional rows above max_occurrences are removed by the extraction process. Minimum allowed value for the field is 1. If not specified, it is set to 1000 by default.
grid object Configure magic-grid feature properties, see below.
show_grid_by_default boolean If set to true, the magic-grid is opened instead of the footer upon entering the multivalue. Default is false. Applied only in the UI. Useful when annotating documents for custom training.
rir_field_names list[string] List of names used to initialize content from the AI engine predictions. If specified, the value of the first field from the array is used, otherwise default name line_items is used. Attribute can be set only for multivalue containing objects with category tuple. no
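The effect of min_occurrences and max_occurrences on extracted rows can be sketched in Python. The helper `apply_occurrences` is hypothetical, and it assumes that a null (None) limit behaves like an unspecified one, falling back to the documented defaults of 0 and 1000.

```python
from typing import Optional


def apply_occurrences(
    rows: list,
    min_occurrences: Optional[int] = None,
    max_occurrences: Optional[int] = None,
) -> tuple[list, bool]:
    """Hypothetical helper: trim rows above max_occurrences and flag a min_occurrences violation."""
    # Assumption: null limits fall back to the documented defaults (0 and 1000).
    min_occ = 0 if min_occurrences is None else min_occurrences
    max_occ = 1000 if max_occurrences is None else max_occurrences
    kept = rows[:max_occ]  # rows above max_occurrences are removed
    needs_review = len(kept) < min_occ  # violated min_occurrences: review manually
    return kept, needs_review
```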

Multivalue grid object

The multivalue grid object allows you to specify a row type for each row of the grid. For the data representation of actual grid rows, see the Grid object description.

Attribute Type Description Default Required
row_types list[string] List of allowed row type values. ["data"] yes
default_row_type string Row type to be used by default data yes
row_types_to_extract list[string] Types of rows to be extracted to related table ["data"] yes

For example, to distinguish two header types and a footer in the validation interface, the following row types may be used: header, subsection_header, data and footer.

Currently, data extraction classifies every row as either data or header (additional row types may be introduced in the future). We remove rows returned by data extraction whose type is not in the row_types list (e.g. header, by default) if they are at the top or bottom of the table. When such rows are in the middle of the table, we mark them as skipped (null).
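The trimming rule for extracted rows can be sketched in Python as follows. This is an illustration of the described behaviour, not Rossum's code: the helper name `filter_extracted_rows` and the row representation (a dict with a "type" key plus arbitrary other keys) are assumptions.

```python
def filter_extracted_rows(rows: list[dict], row_types: list[str]) -> list[dict]:
    """Hypothetical helper applying the rule: drop disallowed rows at the table edges,
    mark disallowed rows in the middle as skipped (type None)."""
    allowed = [row["type"] in row_types for row in rows]
    if not any(allowed):
        return []  # every row is disallowed, so every row sits at the top or bottom
    first = allowed.index(True)
    last = len(allowed) - 1 - allowed[::-1].index(True)
    result = []
    for row, ok in zip(rows[first:last + 1], allowed[first:last + 1]):
        # Rows in the middle keep their data but lose their type (skipped, null).
        result.append(row if ok else {**row, "type": None})
    return result


rows = [
    {"type": "header", "text": "Item"},
    {"type": "data", "text": "A"},
    {"type": "header", "text": "Page 2"},
    {"type": "data", "text": "B"},
    {"type": "header", "text": "Totals"},
]
print([r["type"] for r in filter_extracted_rows(rows, ["data"])])  # ['data', None, 'data']
```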

There are three visual modes, based on the number of row_types:

Tuple

Example of a tuple:

{
  "category": "tuple",
  "id": "tax_details",
  "label": "Tax Details",
  "children": [
    ...
  ],
  "rir_field_names": [
    "tax_details"
  ]
}

A tuple is a container representing tabular data with related values, such as tax details. It must be nested within a multivalue object but, unlike a multivalue, it may consist of objects with different ids.

Attribute Type Description Required
children list[object] Array specifying objects that belong to a given tuple. It can contain only objects with category datapoint. yes
rir_field_names list[string] List of names used to initialize content from the AI engine predictions. If specified, the value of the first extracted field from the array is used; otherwise, no AI engine initialization is done for the object.

Updating Schema

As a project evolves, it is common practice to enhance or change the set of extracted fields. This is done by updating the schema object.

By design, Rossum supports multiple schema versions at the same time. However, each document annotation is related to only one of those schemas. If the schema is updated, all related document annotations are updated accordingly. See preserving data on schema change below for limitations of schema updates.

In addition, every queue is linked to a schema, which is used for all newly imported documents.

When updating a schema, there are two possible approaches:

Use case 1 - Initial setting of a schema

Use case 2 - Updating attributes of a field (label, constraints, options, etc.)

Use case 3 - Adding a new field to a schema, even for already imported documents

Use case 4 - Adding a new field to a schema, only for newly imported documents

Use case 5 - Deleting a schema field, even for already imported documents

Use case 6 - Deleting a schema field, only for newly imported documents

Preserving data on schema change

In order to transfer annotation field values properly during the schema update, a datapoint's category and schema_id must be preserved.
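Before updating a schema, you may want to check which fields keep both their id (the schema_id of annotation datapoints) and their category. The Python sketch below does only that. It checks the necessary condition stated above, not a full compatibility test, and the helper names are hypothetical. It walks schema content where multivalue children is a single object and section or tuple children is a list.

```python
def iter_schema_nodes(nodes):
    """Hypothetical helper: yield every schema object (section, multivalue, tuple, datapoint)."""
    for node in nodes:
        yield node
        children = node.get("children")
        if isinstance(children, dict):  # multivalue: a single child object
            yield from iter_schema_nodes([children])
        elif isinstance(children, list):  # section or tuple: a list of children
            yield from iter_schema_nodes(children)


def preserved_ids(old_content: list, new_content: list) -> set[str]:
    """Return ids present in both schemas with an unchanged category."""
    old = {n["id"]: n["category"] for n in iter_schema_nodes(old_content)}
    new = {n["id"]: n["category"] for n in iter_schema_nodes(new_content)}
    return {schema_id for schema_id, category in old.items() if new.get(schema_id) == category}
```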

Supported operations that preserve field values are:

Extracted field types

The AI engine currently extracts the following fields automatically at the all endpoint; the list is subject to ongoing expansion.

Identifiers

Example of a schema with different identifiers:

[
  {
    "category": "section",
    "children": [
      {
        "category": "datapoint",
        "constraints": {
          "required": false
        },
        "default_value": null,
        "id": "document_id",
        "label": "Invoice number",
        "rir_field_names": [
          "document_id"
        ],
        "type": "string"
      },
      {
        "category": "datapoint",
        "constraints": {
          "required": false
        },
        "default_value": null,
        "format": "D/M/YYYY",
        "id": "date_issue",
        "label": "Issue date",
        "rir_field_names": [
          "date_issue"
        ],
        "type": "date"
      },
      {
        "category": "datapoint",
        "constraints": {
          "required": false
        },
        "default_value": null,
        "id": "terms",
        "label": "Terms",
        "rir_field_names": [
          "terms"
        ],
        "type": "string"
      }
    ],
    "icon": null,
    "id": "invoice_info_section",
    "label": "Basic information"
  }
]
Attr. rir_field_names Field label Description
account_num Bank Account Bank account number. Whitespaces are stripped.
bank_num Sort Code Sort code. Numerical code of the bank.
iban IBAN Bank account number in IBAN format.
bic BIC/SWIFT Bank BIC or SWIFT code.
const_sym Constant Symbol Statistical code on payment order.
spec_sym Specific Symbol Payee id on the payment order, or similar.
var_sym Variable symbol In some countries used by the supplier to match the payment received against the invoice. Possible non-numeric characters are stripped.
terms Terms Payment terms as written on the document (e.g. "45 days", "upon receipt").
payment_method Payment method Payment method defined on a document (e.g. 'Cheque', 'Pay order', 'Before delivery')
customer_id Customer Number The number by which the customer is registered in the system of the supplier. Whitespaces are stripped.
date_due Date Due The due date of the invoice.
date_issue Issue Date Date of issue of the document.
date_uzp Tax Point Date The date of taxable event.
document_id Document Identifier Document number. Whitespaces are stripped.
order_id Order Number Purchase order identification (Order Numbers not captured as "sender_order_id"). Whitespaces are stripped.
recipient_address Recipient Address Address of the customer.
recipient_dic Recipient Tax Number Tax identification number of the customer. Whitespaces are stripped.
recipient_ic Recipient Company ID Company identification number of the customer. Possible non-numeric characters are stripped.
recipient_name Recipient Name Name of the customer.
recipient_vat_id Recipient VAT Number VAT identification number of the customer.
recipient_delivery_name Recipient Delivery Name Name of the recipient to whom the goods will be delivered.
recipient_delivery_address Recipient Delivery Address Address of the recipient where the goods will be delivered.
sender_address Supplier Address Address of the supplier.
sender_dic Supplier Tax Number Tax identification number of the supplier. Whitespaces are stripped.
sender_ic Supplier Company ID Business/organization identification number of the supplier. Possible non-numeric characters are stripped.
sender_name Supplier Name Name of the supplier.
sender_vat_id Supplier VAT Number VAT identification number of the supplier.
sender_email Supplier Email Email of the sender.
sender_order_id Supplier's Order ID Internal order ID in the supplier's system.
delivery_note_id Delivery Note ID Delivery note ID defined on the invoice.
supply_place Place of Supply Place of supply (the name of the city or state where the goods will be supplied).

Document attributes

Attr. rir_field_names Field label Description
currency Currency The currency in which the invoice is to be paid. Possible values: CZK, DKK, EUR, GBP, NOK, SEK, HUF, USD, AUD, INR, CHF, CNY, JPY, PLN, RON, RUB or other. Values may also be in lowercase.
document_type Document Type Possible values: credit_note, debit_note, tax_invoice (most typical), proforma, receipt, delivery_note, order or other.
language Language The language which the document was written in. Possible values: ces, deu, eng, fra, slk, esp, hun, swe, dan, fin, ital, nor, pol, por or other.
payment_method_type Payment Method Type Payment method used for the transaction. Possible values: card, cash.

Amounts

Attr. rir_field_names Field label Description
amount_due Amount Due Final amount including tax to be paid after deducting all discounts and advances.
amount_rounding Amount Rounding Remainder after rounding amount_total.
amount_total Total Amount Subtotal over all items, including tax.
amount_paid Amount paid Amount paid already.
amount_total_base Tax Base Total Base amount for tax calculation.
amount_total_tax Tax Total Total tax amount.

Typical relations (may depend on local laws):

amount_total = amount_total_base + amount_total_tax
amount_rounding = round(amount_total) - amount_total
amount_due = amount_total - amount_paid + amount_rounding

All amounts are in the main currency of the invoice (as identified in the currency response field). Amounts in other currencies are generally excluded.
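The typical relations can be turned into a simple consistency check on extracted values, for example in your own post-processing. The Python sketch below is hypothetical. It uses `decimal.Decimal` to avoid binary floating-point error and assumes a tolerance of 0.01. It takes amount_rounding as an extracted input rather than recomputing it, and it skips a relation when one of its values is missing.

```python
from decimal import Decimal
from typing import Optional


def check_amount_relations(
    fields: dict[str, Optional[str]], tolerance: Decimal = Decimal("0.01")
) -> list[str]:
    """Hypothetical helper: return the typical relations that do not hold for the given amounts."""
    d = {name: Decimal(value) for name, value in fields.items() if value is not None}
    failures = []
    if all(k in d for k in ("amount_total", "amount_total_base", "amount_total_tax")):
        if abs(d["amount_total"] - (d["amount_total_base"] + d["amount_total_tax"])) > tolerance:
            failures.append("amount_total = amount_total_base + amount_total_tax")
    if all(k in d for k in ("amount_due", "amount_total")):
        # Missing amount_paid / amount_rounding are treated as zero (assumption).
        expected_due = (
            d["amount_total"]
            - d.get("amount_paid", Decimal(0))
            + d.get("amount_rounding", Decimal(0))
        )
        if abs(d["amount_due"] - expected_due) > tolerance:
            failures.append("amount_due = amount_total - amount_paid + amount_rounding")
    return failures
```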

Tables

At the moment, the AI engine automatically extracts 2 types of tables. To pick one of them, set the rir_field_names attribute on the multivalue.

Attr. rir_field_names Table
tax_details Tax details
line_items Line items

Tax details

Example of a tax details table:

{
  "category": "section",
  "children": [
    {
      "category": "multivalue",
      "children": {
        "category": "tuple",
        "children": [
          {
            "category": "datapoint",
            "constraints": {
              "required": false
            },
            "default_value": null,
            "format": "# ##0.#",
            "id": "vat_detail_rate",
            "label": "VAT rate",
            "rir_field_names": [
              "tax_detail_rate"
            ],
            "type": "number",
            "width": 15
          },
          ...
        ],
        "id": "vat_detail",
        "label": "VAT detail"
      },
      "default_value": null,
      "id": "vat_details",
      "label": "VAT details",
      "max_occurrences": null,
      "min_occurrences": null,
      "rir_field_names": [
        "tax_details"
      ]
    }
  ],
  "icon": null,
  "id": "amounts_section",
  "label": "Amounts section"
}

Tax details table and breakdown by tax rates.

Attr. rir_field_names Field label Description
tax_detail_base Tax Base Sum of tax bases for items with the same tax rate.
tax_detail_rate Tax Rate One of the tax rates in the tax breakdown.
tax_detail_tax Tax Amount Sum of taxes for items with the same tax rate.
tax_detail_total Tax Total Total amount including tax for all items with the same tax rate.
tax_detail_code Tax Code [BETA] Text on the document describing the tax code of the tax rate (e.g. 'GST', 'CGST', 'DPH', 'TVA'). If multiple tax rates belong to one tax code on the document, the tax code is assigned only to the first tax rate. (In the future, such a tax code will be distributed to all matching tax rates.)
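The definitions of the tax detail fields describe a group-by over items with the same tax rate. The Python sketch below rebuilds such a breakdown from line items, for example to cross-check an extracted tax details table. The input keys "rate", "amount_total_base" and "tax" are hypothetical names chosen for this example, not Rossum field names.

```python
from collections import defaultdict
from decimal import Decimal


def tax_breakdown(items: list[dict]) -> list[dict]:
    """Hypothetical helper: sum item bases and taxes per tax rate."""
    groups = defaultdict(lambda: {"base": Decimal(0), "tax": Decimal(0)})
    for item in items:
        group = groups[item["rate"]]
        group["base"] += item["amount_total_base"]
        group["tax"] += item["tax"]
    return [
        {
            "tax_detail_rate": rate,
            "tax_detail_base": g["base"],  # sum of tax bases for this rate
            "tax_detail_tax": g["tax"],  # sum of taxes for this rate
            "tax_detail_total": g["base"] + g["tax"],  # total including tax
        }
        for rate, g in sorted(groups.items(), key=lambda kv: kv[0])
    ]
```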

Line items

Example of a line items table:

{
  "category": "section",
  "children": [
    {
      "category": "multivalue",
      "children": {
        "category": "tuple",
        "children": [
          {
            "category": "datapoint",
            "constraints": {
              "required": true
            },
            "default_value": null,
            "id": "item_desc",
            "label": "Description",
            "rir_field_names": [
              "table_column_description"
            ],
            "type": "string",
            "stretch": true
          },
          {
            "category": "datapoint",
            "constraints": {
              "required": false
            },
            "default_value": null,
            "format": "# ##0.#",
            "id": "item_quantity",
            "label": "Quantity",
            "rir_field_names": [
              "table_column_quantity"
            ],
            "type": "number",
            "width": 15
          },
          {
            "category": "datapoint",
            "constraints": {
              "required": false
            },
            "default_value": null,
            "format": "# ##0.#",
            "id": "item_amount_total",
            "label": "Price w tax",
            "rir_field_names": [
              "table_column_amount_total"
            ],
            "type": "number"
          }
        ],
        "id": "line_item",
        "label": "Line item",
        "rir_field_names": []
      },
      "default_value": null,
      "id": "line_items",
      "label": "Line item",
      "max_occurrences": null,
      "min_occurrences": null,
      "rir_field_names": [
        "line_items"
      ]
    }
  ],
  "icon": null,
  "id": "line_items_section",
  "label": "Line items"
}

The AI engine currently extracts line item table content automatically and recognizes row and column types as detailed below. Invoice line items come in a wide variety of shapes and forms. The current implementation can deal with (or learn) most layouts, with or without borders, different spacings, header rows, etc. We currently make two further assumptions:

We plan to gradually remove both assumptions in the future.

Attribute rir_field_names Field label Description
table_column_code Item Code/Id Can be the SKU, EAN, a custom code (string of letters/numbers) or even just the line number.
table_column_description Item Description Line item description. Can be multi-line with details.
table_column_quantity Item Quantity Quantity of the item.
table_column_uom Item Unit of Measure Unit of measure of the item (kg, container, piece, gallon, ...).
table_column_rate Item Rate Tax rate for the line item.
table_column_tax Item Tax Tax amount for the line. Rule of thumb: tax = rate * amount_base.
table_column_amount_base Amount Base Unit price without tax. (This is the primary unit price extracted.)
table_column_amount Amount Unit price with tax. Rule of thumb: amount = amount_base + tax.
table_column_amount_total_base Amount Total Base The total amount to be paid for all the items excluding the tax. Rule of thumb: amount_total_base = amount_base * quantity.
table_column_amount_total Amount Total The total amount to be paid for all the items including the tax. Rule of thumb: amount_total = amount * quantity.
table_column_other Other Unrecognized data type.
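The rules of thumb in the table can be chained to derive the remaining line item amounts from the primary unit price. The Python sketch below is illustrative only. The helper name is hypothetical, the rate is assumed to be a fraction (0.21 for 21 %), and real documents may deviate from these rules (rounding, discounts).

```python
from decimal import Decimal


def complete_line_item(amount_base: Decimal, quantity: Decimal, rate: Decimal) -> dict:
    """Hypothetical helper: apply the rules of thumb; `rate` is a fraction (assumption)."""
    tax = rate * amount_base  # tax = rate * amount_base
    amount = amount_base + tax  # amount = amount_base + tax
    return {
        "table_column_tax": tax,
        "table_column_amount": amount,
        "table_column_amount_total_base": amount_base * quantity,  # amount_base * quantity
        "table_column_amount_total": amount * quantity,  # amount * quantity
    }
```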

Annotation Lifecycle

When a document is submitted to Rossum within a given queue, an annotation object is assigned to it. An annotation goes through a variety of states as it is processed, and eventually exported.

State Description
created Annotation was created manually via POST to the annotations endpoint. An annotation created this way may be switched to the importing state only at the end of the upload.created event (this happens automatically).
importing Document is being processed by the AI Engine for data extraction.
failed_import Import failed e.g. due to a malformed document file.
split Annotation was split in user interface or via API and new annotations were created from it.
to_review Initial extraction step is done and the annotation is waiting for user validation.
reviewing Annotation is undergoing validation in the user interface.
in_workflow Annotation is being processed in a workflow. Annotation content cannot be modified while in this state. Please note that any manual interaction with this status may introduce conflicts with Rossum automated workflows. Read more about Rossum Workflows here.
confirmed Annotation is validated and confirmed by the user. This status must be explicitly enabled on the queue to be present.
rejected Annotation was rejected by user. This status must be explicitly enabled on the queue to be present. You can read about when a rejection is possible here.
exporting Annotation is validated and is now awaiting the completion of the connector save call. See connector extension for more information on this status.
exported Annotation is validated and successfully passed all hooks; this is the typical terminal state of an annotation.
failed_export Export failed because the connector returned an error.
postponed Operator has chosen to postpone the annotation instead of exporting it.
deleted Annotation was deleted by the user.
purged Only metadata was preserved after a deletion. This status is terminal and cannot be further changed. See purge deleted if you want to know how to purge an annotation.

This diagram shows the exact flow between the annotation states while working with the UI.

Usage report

To obtain an overview of your Rossum usage, you can download a CSV file with basic Rossum statistics.

The statistics contain the following attributes:

Download usage statistics (January 2019).

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/usage_report?from=2019-01-01&to=2019-01-31'

The report in CSV format may be downloaded from https://<example>.rossum.app/api/v1/annotations/usage_report?format=csv.

You may specify a date range using the from and to parameters (both inclusive). If not specified, a report for the last 12 months is generated.

Request

POST /v1/annotations/usage_report

Attribute Type Description
filter object Filters to be applied on documents used for the computation of usage report
filter.users list[URL] Filter documents modified by the specified users (not applied to imported_count)
filter.queues list[URL] Filter documents from the specified queues
filter.begin_date datetime Filter documents whose date (arrived_at for imported_count; deleted_at for deleted_count; rejected_at for rejected_count; or exported_at for the rest) is greater than the specified value.
filter.end_date datetime Filter documents whose date (arrived_at for imported_count; deleted_at for deleted_count; rejected_at for rejected_count; or exported_at for the rest) is lower than the specified value.
exported_on_time_threshold_s float Threshold (in seconds) below which documents are denoted as on_time.
group_by list[string] List of attributes by which the series is to be grouped. Possible values: user, workspace, queue, month, week, day.
{
  "filter": {
    "users": [
      "https://<example>.rossum.app/api/v1/users/173"
    ],
    "queues": [
      "https://<example>.rossum.app/api/v1/queues/8199"
    ],
    "begin_date": "2019-12-01",
    "end_date": "2020-01-31"
  },
  "exported_on_time_threshold_s": 86400,
  "group_by": [
    "user",
    "workspace",
    "queue",
    "month"
  ]
}
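To send the same POST request from Python without third-party packages, you can build it with the standard `urllib.request` module. This is a sketch. The host name, the token placeholder and the helper names are assumptions. The request body is the example shown above. `fetch_usage_report` is defined but not called, because it performs a real network request.

```python
import json
import urllib.request


def build_usage_report_request(base_url: str, token: str, body: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare POST /v1/annotations/usage_report with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/api/v1/annotations/usage_report",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_usage_report(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


example_body = {
    "filter": {
        "users": ["https://example.rossum.app/api/v1/users/173"],
        "queues": ["https://example.rossum.app/api/v1/queues/8199"],
        "begin_date": "2019-12-01",
        "end_date": "2020-01-31",
    },
    "exported_on_time_threshold_s": 86400,
    "group_by": ["user", "workspace", "queue", "month"],
}
# Placeholder host and token: replace them with your own values.
request = build_usage_report_request("https://example.rossum.app", "YOUR_API_TOKEN", example_body)
```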

Response

Status: 200

{
  "series": [
    {
      "begin_date": "2019-12-01",
      "end_date": "2020-01-01",
      "queue": "https://<example>.rossum.app/api/v1/queues/8199",
      "workspace": "https://<example>.rossum.app/api/v1/workspaces/7540",
      "values": {
        "imported_count": 2,
        "confirmed_count": 6,
        "rejected_count": 2,
        "rejected_automatically_count": 1,
        "rejected_manually_count": 1,
        "deleted_count": null,
        "exported_count": null,
        "turnaround_avg_s": null,
        "corrections_per_document_avg": null,
        "exported_on_time_count": null,
        "exported_late_count": null,
        "time_per_document_avg_s": null,
        "time_per_document_active_avg_s": null,
        "time_and_corrections_per_field": []
      }
    },
    {
      "begin_date": "2020-01-01",
      "end_date": "2020-02-01",
      "queue": "https://<example>.rossum.app/api/v1/queues/8199",
      "workspace": "https://<example>.rossum.app/api/v1/workspaces/7540",
      "user": "https://<example>.rossum.app/api/v1/users/173",
      "values": {
        "imported_count": null,
        "confirmed_count": 6,
        "rejected_count": 3,
        "rejected_automatically_count": 2,
        "rejected_manually_count": 1,
        "deleted_count": 2,
        "exported_count": 2,
        "turnaround_avg_s": 1341000,
        "corrections_per_document_avg": 1.0,
        "exported_on_time_count": 1,
        "exported_late_count": 1,
        "time_per_document_avg_s": 70.0,
        "time_per_document_active_avg_s": 50.0,
        "time_and_corrections_per_field": [
          {
            "schema_id": "date_due",
            "label": "Date due",
            "total_count": 1,
            "corrected_ratio": 0.0,
            "time_spent_avg_s": 0.0
          },
          ...
        ]
      }
    },
    ...
  ],
  "totals": {
    "imported_count": 7,
    "confirmed_count": 6,
    "rejected_count": 5,
    "rejected_automatically_count": 3,
    "rejected_manually_count": 2,
    "deleted_count": 2,
    "exported_count": 3,
    "turnaround_avg_s": 894000,
    "corrections_per_document_avg": 1.0,
    "exported_on_time_count": 2,
    "exported_late_count": 1,
    "time_per_document_avg_s": 70.0,
    "time_per_document_active_avg_s": 50.0
  }
}

The response consists of two parts: totals and series.

Totals

Totals contain summary information for the whole period (between begin_date and end_date).

Attribute Type Description
imported_count int Count of documents that were uploaded to Rossum
confirmed_count int Count of documents that were confirmed
rejected_count int Count of documents that were rejected
rejected_automatically_count int Count of documents that were automatically rejected
rejected_manually_count int Count of documents that were manually rejected
deleted_count int Count of documents that were deleted
exported_count int Count of documents that were successfully exported
turnaround_avg_s float Average time (in seconds) that a document spends in Rossum (computed as exported_at - arrived_at)
corrections_per_document_avg float Average count of corrections on documents
exported_on_time_count int Number of documents whose turnaround was below exported_on_time_threshold_s
exported_late_count int Number of documents whose turnaround was above exported_on_time_threshold_s
time_per_document_avg_s float Average time (in seconds) that users spent validating documents. Based on the time_spent_overall metric, see annotation processing duration
time_per_document_active_avg_s float Average active time (in seconds) that users spent validating documents. Based on the time_spent_active metric, see annotation processing duration
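Counts in the response may be null, as the first series entry in the example above shows, so derived metrics should handle missing values. As an illustration, the Python sketch below computes the share of documents exported on time from a totals or values object. The ratio itself and the helper name are assumptions for this example. The usage report does not return such a field.

```python
from typing import Optional


def exported_on_time_ratio(values: dict) -> Optional[float]:
    """Hypothetical metric: on-time exports divided by all exports classified as on time or late."""
    on_time = values.get("exported_on_time_count") or 0  # null counts are treated as 0
    late = values.get("exported_late_count") or 0
    if on_time + late == 0:
        return None  # nothing to compute the ratio from
    return on_time / (on_time + late)
```

With the totals from the example response (2 on time, 1 late), the ratio is 2/3.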

Series

Series contain information grouped by the fields defined in group_by. The data (see above) are wrapped in the values object and accompanied by the values of the attributes that were used for grouping.

Attribute Type Description
user URL User who modified the documents within the group
workspace URL Workspace containing the documents within the group
queue URL Queue containing the documents within the group
begin_date date Start date of the documents within the group
end_date date Final date of the documents within the group
values object Contains the data of totals and time_and_corrections_per_field list (for details see below).

In addition to the totals data, series contain the time_and_corrections_per_field list, which provides detailed statistics for each field.

Series details

The detail object contains statistics grouped per field (schema_id).

Attribute Type Description
schema_id string Reference mapping of the data object to the schema tree
label string Label of the data object (taken from schema).
total_count int Number of data objects
corrected_ratio* float [0;1] Ratio of data objects that had to be corrected after automatic extraction.
time_spent_avg_s float Average time (in seconds) spent on validating the data objects

*The corrected ratio is calculated based on human corrections. If any kind of automation (hook, webhook, etc.) is run on the datapoints, even after a human correction took place, the corrected_ratio is not calculated and is set to 0.

Extensions

The Rossum platform may be extended via third-party, externally running services or custom functions. These extensions are registered to receive callbacks from the Rossum platform on various occasions and allow modifying the platform behavior. Currently, we support these callback extensions: Webhooks, Serverless Functions, and Connectors.

Webhooks and connectors require a third-party service accessible through an HTTP endpoint. This may incur additional operational and implementation costs. User-defined serverless functions, by contrast, are executed within the Rossum platform, and no additional setup is necessary. They share the interface (input and output data format, error handling) with webhooks.

See the Building Your Own Extension set of guides in Rossum's developer portal for an introduction to Rossum extensions.

Webhook Extension

The webhook component is the most flexible extension. It covers all the most frequent use cases:

Implement a webhook

Webhooks follow a push model over the common HTTPS protocol: when an event is triggered, the webhook endpoint is called with a relevant request payload. The webhook must be deployed with a public IP address so that the Rossum platform can call its endpoints; for testing, a middleware like ngrok or serveo may be useful.
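A minimal webhook receiver can be sketched with Python's standard `http.server` module. It is only a sketch and is not production ready. It serves plain HTTP and assumes that TLS is terminated in front of it (for example by ngrok during testing). It does not verify that requests come from Rossum. The `handle_event` function is a hypothetical placeholder that returns empty messages and operations lists, mirroring the shape of the annotation_content response example further down this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload: dict) -> dict:
    """Hypothetical business logic: inspect the event and leave the annotation unchanged."""
    event = payload.get("event")  # e.g. "annotation_content"
    action = payload.get("action")  # e.g. "updated"
    _ = (event, action)  # plug your own logic in here
    return {"messages": [], "operations": []}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read and decode the JSON request payload sent by the platform.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_event(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the receiver (blocks forever); call it explicitly when deploying."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```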

Webhook vs. Connector

Webhook extensions are similar to connectors, but they are more flexible and easier to use. A webhook is notified when a defined type of webhook event occurs for a related queue.

Advantages of webhooks over connectors:

Webhooks are defined using a hook object of type webhook. For a description of how to create and manage hooks, see the Hook API.

Webhook Events

Example data sent for the annotation_status event to the hook.config.url when the status of the annotation changes

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5514b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
  "hook": "https://<example>.rossum.app/api/v1/hooks/789",
  "settings": {
    "example_target_service_type": "SFTP",
    "example_target_hostname": "sftp.elis.rossum.ai"
  },
  "secrets": {
    "username": "my-rossum-importer",
    "password": "secret-importer-user-password"
  },
  "action": "changed",
  "event": "annotation_status",
  "annotation": {
    "document": "https://<example>.rossum.app/api/v1/documents/314621",
    "id": 314521,
    "queue": "https://<example>.rossum.app/api/v1/queues/8236",
    "schema": "https://<example>.rossum.app/api/v1/schemas/223",
    "pages": [
      "https://<example>.rossum.app/api/v1/pages/551518"
    ],
    "creator": "https://<example>.rossum.app/api/v1/users/1",
    "modifier": null,
    "assigned_at": null,
    "created_at": "2021-04-26T10:08:03.856648Z",
    "confirmed_at": null,
    "deleted_at": null,
    "exported_at": null,
    "export_failed_at": null,
    "modified_at": null,
    "purged_at": null,
    "rejected_at": null,
    "confirmed_by": null,
    "deleted_by": null,
    "exported_by": null,
    "purged_by": null,
    "rejected_by": null,
    "status": "to_review",
    "previous_status": "importing",
    "rir_poll_id": "54f6b91cfb751289e71ddf12",
    "messages": null,
    "url": "https://<example>.rossum.app/api/v1/annotations/314521",
    "content": "https://<example>.rossum.app/api/v1/annotations/314521/content",
    "time_spent": 0,
    "metadata": {},
    "organization": "https://<example>.rossum.app/api/v1/organizations/1"
  },
  "document": {
    "id": 314621,
    "url": "https://<example>.rossum.app/api/v1/documents/314621",
    "s3_name": "272c2f41ae84a4f19a422cb432a490bb",
    "mime_type": "application/pdf",
    "arrived_at": "2019-02-06T23:04:00.933658Z",
    "original_file_name": "test_invoice_1.pdf",
    "content": "https://<example>.rossum.app/api/v1/documents/314621/content",
    "metadata": {}
  }
}

Example data sent for the annotation_content event to the hook.config.url when a user updates a value in the UI

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
  "hook": "https://<example>.rossum.app/api/v1/hooks/781",
  "settings": {
    "example_target_hostname": "sftp.elis.rossum.ai"
  },
  "secrets": {
    "password": "secret-importer-user-password"
  },
  "action": "updated",
  "event": "annotation_content",
  "annotation": {
    "document": "https://<example>.rossum.app/api/v1/documents/314621",
    "id": 314521,
    "queue": "https://<example>.rossum.app/api/v1/queues/8236",
    "schema": "https://<example>.rossum.app/api/v1/schemas/223",
    "pages": [
      "https://<example>.rossum.app/api/v1/pages/551518"
    ],
    "creator": "https://<example>.rossum.app/api/v1/users/1",
    "modifier": null,
    "assigned_at": null,
    "created_at": "2021-04-26T10:08:03.856648Z",
    "confirmed_at": null,
    "deleted_at": null,
    "exported_at": null,
    "export_failed_at": null,
    "modified_at": null,
    "purged_at": null,
    "rejected_at": null,
    "confirmed_by": null,
    "deleted_by": null,
    "exported_by": null,
    "purged_by": null,
    "rejected_by": null,
    "status": "to_review",
    "previous_status": "importing",
    "rir_poll_id": "54f6b91cfb751289e71ddf12",
    "messages": null,
    "url": "https://<example>.rossum.app/api/v1/annotations/314521",
    "organization": "https://<example>.rossum.app/api/v1/organizations/1",
    "content": [
      {
        "id": 1123123,
        "url": "https://<example>.rossum.app/api/v1/annotations/314521/content/1123123",
        "schema_id": "basic_info",
        "category": "section",
        "children": [
          {
            "id": 20456864,
            "url": "https://<example>.rossum.app/api/v1/annotations/314521/content/20456864",
            "content": {
              "value": "18 492.48",
              "normalized_value": "18492.48",
              "page": 2,
              ...
            },
            "category": "datapoint",
            "schema_id": "number",
            "validation_sources": [
              "checks",
              "score"
            ],
            "time_spent": 0
          }
        ]
      }
    ],
    "time_spent": 0,
    "metadata": {}
  },
  "document": {
    "id": 314621,
    "url": "https://<example>.rossum.app/api/v1/documents/314621",
    "s3_name": "272c2f41ae84a4f19a422cb432a490bb",
    "mime_type": "application/pdf",
    "arrived_at": "2019-02-06T23:04:00.933658Z",
    "original_file_name": "test_invoice_1.pdf",
    "content": "https://<example>.rossum.app/api/v1/documents/314621/content",
    "metadata": {}
  },
  "updated_datapoints": [11213211, 11213212]
}

Example of a response for annotation_content hook

{
  "messages": [
    {
      "content": "Invalid invoice number format",
      "id": 197467,
      "type": "error"
    }
  ],
  "operations": [
    {
      "op": "replace",
      "id": 198143,
      "value": {
        "content": {
          "value": "John",
          "position": [103, 110, 121, 122],
          "page": 1
        },
        "hidden": false,
        "options": [],
        "validation_sources": ["human"]
      }
    },
    {
      "op": "remove",
      "id": 884061
    },
    {
      "op": "add",
      "id": 884060,
      "value": [
        {
          "schema_id": "item_description",
          "content": {
            "page": 1,
            "position": [162, 852, 371, 875],
            "value": "Bottle"
          }
        }
      ]
    }
  ]
}

Example data sent for email event to the hook.config.url when an email is received by the Rossum mail server

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
  "hook": "https://<example>.rossum.app/api/v1/hooks/781",
  "settings": {
    "example_target_hostname": "sftp.elis.rossum.ai"
  },
  "secrets": {
    "password": "secret-importer-user-password"
  },
  "action": "received",
  "event": "email",
  "email": "https://<example>.rossum.app/api/v1/emails/987",
  "queue": "https://<example>.rossum.app/api/v1/queues/41",
  "files": [
    {
      "id": "1",
      "filename": "image.png",
      "mime_type": "image/png",
      "n_pages": 1,
      "height_px": 100.0,
      "width_px": 150.0,
      "document": "https://<example>.rossum.app/api/v1/documents/427"
    },
    {
      "id": "2",
      "filename": "MS word.docx",
      "mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "n_pages": 1,
      "height_px": null,
      "width_px": null,
      "document": "https://<example>.rossum.app/api/v1/documents/428"
    },
    {
      "id": "3",
      "filename": "A4 pdf.pdf",
      "mime_type": "application/pdf",
      "n_pages": 3,
      "height_px": 3510.0,
      "width_px": 2480.0,
      "document": "https://<example>.rossum.app/api/v1/documents/429"
    },
    {
      "id": "4",
      "filename": "unknown_file",
      "mime_type": "text/xml",
      "n_pages": 1,
      "height_px": null,
      "width_px": null,
      "document": "https://<example>.rossum.app/api/v1/documents/430"
    }
  ],
  "headers": {
    "from": "test@example.com",
    "to": "east-west-trading-co-a34f3a@<example>.rossum.app",
    "reply-to": "support@example.com",
    "subject": "Some subject",
    "date": "Mon, 04 May 2020 11:01:32 +0200",
    "message-id": "15909e7e68e4b5f56fd78a3b4263c4765df6cc4d",
    "authentication-results": "example.com;\n    dmarc=pass d=example.com"
  },
  "body": {
    "body_text_plain": "Some body",
    "body_text_html": "<div dir=\"ltr\">Some body</div>"
  }
}

Example of a response for email hook

{
  "files": [
    {
      "id": "3",
      "values": [
        {
          "id": "email:invoice_id",
          "value": "INV001234"
        },
        {
          "id": "email:customer_name",
          "value": "John Doe"
        }
      ]
    }
  ]
}

Example data sent for invocation.scheduled event and action

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5514b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
  "hook": "https://<example>.rossum.app/api/v1/hooks/789",
  "settings": {
    "example_target_service_type": "SFTP",
    "example_target_hostname": "sftp.elis.rossum.ai"
  },
  "secrets": {
    "username": "my-rossum-importer",
    "password": "secret-importer-user-password"
  },
  "action": "scheduled",
  "event": "invocation"
}

Example data sent for upload event to the hook.config.url when documents are uploaded (either through the API or as an email attachment)

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
  "hook": "https://<example>.rossum.app/api/v1/hooks/781",
  "settings": {},
  "secrets": {},
  "action": "created",
  "event": "upload",
  "email": "https://<example>.rossum.app/api/v1/emails/987",
  "upload": "https://<example>.rossum.app/api/v1/uploads/2046",
  "metadata": {},
  "files": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/427",
      "prevent_importing": false,
      "values": [],
      "queue": "https://<example>.rossum.app/api/v1/queues/41",
      "annotation": null
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/428",
      "prevent_importing": true,
      "values": [],
      "queue": "https://<example>.rossum.app/api/v1/queues/41",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/1638"
    }
  ],
  "documents": [
    {
      "id": 427,
      "url": "https://<example>.rossum.app/api/v1/documents/427",
      "mime_type": "application/pdf",
      ...
    },
    {
      "id": 428,
      "url": "https://<example>.rossum.app/api/v1/documents/428",
      "mime_type": "application/json",
      ...
    }
  ]
}

Example of a response for upload hook

{
  "files": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/427",
      "prevent_importing": false
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/428",
      "prevent_importing": true
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/429"
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/430"
    }
  ]
}

Webhook events specify when the hook should be notified. They are defined as follows:

Supported events and their actions

Event and Action Payload (outside default attributes) Response Description Retry on failure
annotation_status.changed annotation, document N/A Annotation status change occurred yes
annotation_content.initialize annotation + content, document, updated_datapoints operations, messages Annotation was initialized (data extracted) yes
annotation_content.started annotation + content, document, updated_datapoints (empty) operations, messages User entered validation screen no (interactive)
annotation_content.user_update annotation + content, document, updated_datapoints operations, messages (Deprecated in favor of annotation_content.updated) Annotation was updated by the user no (interactive)
annotation_content.updated annotation + content, document, updated_datapoints operations, messages Annotation data was updated by the user no (interactive)
annotation_content.confirm annotation + content, document, updated_datapoints (empty) operations, messages User confirmed validation screen no (interactive)
annotation_content.export annotation + content, document, updated_datapoints (empty) operations, messages Annotation is being moved to exported state yes
upload.created files, documents, metadata, email, upload files Upload object was created yes
email.received files, headers, body, email, queue files (*) Email with attachments was received yes
invocation.scheduled N/A N/A Hook was invoked at the scheduled time yes
invocation.manual custom payload fields forwarded hook response Event for manual hook triggering no

(*) May also contain other optional attributes - read more in this section.

Webhook Events Occurrence Diagram

The following diagram gives an overview of the hook events and when they occur.

Hook Events

Hooks common attributes

Key Type Description
request_id UUID Hook call request ID
timestamp datetime Timestamp when the hook was called
hook URL Hook's url
action string Hook's action
event string Hook's event
settings object Copy of hook.settings attribute
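Because every payload carries the common event and action attributes, a single receiver can route requests on that pair. The sketch below assumes one endpoint subscribed to several events; the registry, the `on` decorator and the handler names are hypothetical, and answering unhandled combinations with an empty JSON object is an assumption, not documented behavior.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]

# Hypothetical registry keyed by "<event>.<action>", as named in the events table.
HANDLERS: Dict[str, Handler] = {}


def on(event_action: str) -> Callable[[Handler], Handler]:
    """Register a handler for one event/action pair (illustrative helper)."""
    def register(func: Handler) -> Handler:
        HANDLERS[event_action] = func
        return func
    return register


@on("annotation_content.updated")
def handle_content_updated(payload: Dict[str, Any]) -> Dict[str, Any]:
    # A real handler would inspect payload["annotation"]["content"] here.
    return {"messages": [], "operations": []}


@on("invocation.scheduled")
def handle_scheduled(payload: Dict[str, Any]) -> Dict[str, Any]:
    # Scheduled invocations carry no event-specific payload.
    return {}


def dispatch(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Route a hook payload using its common event and action attributes."""
    key = f"{payload['event']}.{payload['action']}"
    handler = HANDLERS.get(key)
    if handler is None:
        # Assumption: unhandled combinations are acknowledged with an empty object.
        return {}
    return handler(payload)
```

Keeping the routing table in one place makes it easy to compare with the events the hook is actually subscribed to.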

Annotation status event data format

The annotation_status event contains the following additional event-specific attributes.

Key Type Description
annotation object Annotation object (enriched with attribute previous_status)
document object Document object (attribute annotations is excluded)
queues* list[object] list of related queue objects
modifiers* list[object] list of related modifier objects
schemas* list[object] list of related schema objects
emails* list[object] list of related email objects (for annotations created after email ingestion)
related_emails* list[object] list of related email objects (other related emails)
relations* list[object] list of related relation objects
child_relations* list[object] list of related child_relation objects
suggested_edits* list[object] list of related suggested_edits objects
assignees* list[object] list of related assignee objects
pages* list[object] list of related pages objects
notes* list[object] list of related notes objects
labels* list[object] list of related labels objects
automation_blockers* list[object] list of related automation_blockers objects

* Attribute is only included in the request when specified in hook.sideload. Note that sideloading modifier objects from a different organization is not supported and that sideloading can decrease performance. See also the annotation sideloading section.

Example data sent to hook with sideloaded queue objects

{
  "request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
  "timestamp": "2020-01-01T00:00:00.000000Z",
  "base_url": "https://<example>.rossum.app",
  "hook": "https://<example>.rossum.app/api/v1/hooks/781",
  "action": "changed",
  "event": "annotation_status",
  ...,
  "queues": [
    {
      "id": 8198,
      "name": "Received invoices",
      "url": "https://<example>.rossum.app/api/v1/queues/8198",
      ...,
      "metadata": {},
      "use_confirmed_state": false,
      "settings": {}
    }
  ]
}
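Sideloaded objects arrive as plain lists, so a receiver typically indexes them by URL before use. This minimal sketch assumes queues were requested in hook.sideload and resolves the name of the annotation's queue through the annotation's queue URL; the function name is hypothetical.

```python
from typing import Any, Dict, Optional


def queue_name_for_annotation(payload: Dict[str, Any]) -> Optional[str]:
    """Return the name of the annotation's queue from the sideloaded queues.

    Returns None when queues were not sideloaded (the key is then absent)
    or when no sideloaded queue matches the annotation's queue URL.
    """
    queues_by_url = {q["url"]: q for q in payload.get("queues", [])}
    queue = queues_by_url.get(payload["annotation"]["queue"])
    return queue["name"] if queue else None
```

Using `payload.get("queues", [])` keeps the handler working if sideloading is later removed from the hook configuration.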

Annotation content event data format

The annotation_content event contains the following additional event-specific attributes.

Key Type Description
annotation object Annotation object. Content is pre-loaded with annotation data. Annotation data are enriched with normalized_value, see example.
document object Document object (attribute annotations is excluded)
updated_datapoints** list[int] List of IDs of datapoints that were changed by the last event or by any of its predecessor events.
queues* list[object] list of related queue objects
modifiers* list[object] list of related modifier objects
schemas* list[object] list of related schema objects
emails* list[object] list of related email objects (for annotations created after email ingestion)
related_emails* list[object] list of related email objects (other related emails)
relations* list[object] list of related relation objects
child_relations* list[object] list of related child_relation objects
suggested_edits* list[object] list of related suggested_edits objects
assignees* list[object] list of related assignee objects
pages* list[object] list of related pages objects
notes* list[object] list of related notes objects
labels* list[object] list of related labels objects
automation_blockers* list[object] list of related automation_blockers objects

* Attribute is only included in the request when specified in hook.sideload. Note that sideloading modifier objects from a different organization is not supported and that sideloading can decrease performance. See also the annotation sideloading section.

** If hooks are chained using the run_after attribute, updated_datapoints contains the IDs of all datapoints that were updated by any of the predecessor hooks. Moreover, for an add operation on a multivalue table, updated_datapoints contains the ID of the multivalue, the IDs of the new tuple datapoints and the IDs of all newly created cell datapoints.
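A hook often needs only the nodes listed in updated_datapoints. The sketch below walks the annotation content tree depth-first and keeps nodes whose id is listed; the helper names are hypothetical. Because an add on a multivalue table also lists the multivalue and tuple IDs, filter on `node["category"] == "datapoint"` if you want only field values.

```python
from typing import Any, Dict, Iterator, List


def iter_nodes(nodes: List[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every node of the content tree, depth-first."""
    for node in nodes:
        yield node
        yield from iter_nodes(node.get("children", []))


def updated_nodes(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return content-tree nodes whose IDs appear in updated_datapoints."""
    updated_ids = set(payload.get("updated_datapoints", []))
    content = payload["annotation"]["content"]
    return [node for node in iter_nodes(content) if node["id"] in updated_ids]
```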

Annotation content event response format

All annotation_content events expect a JSON object in the response with the following optional lists: messages and operations.

The message object contains attributes:

Key Type Description
id integer Optional unique id of the relevant datapoint; omit for document-wide issues
type enum One of: error, warning or info.
content string A descriptive message to be shown to the user
detail object Detail object that enhances the response from a hook. (For more info refer to message detail)

For example, you may use error for fatal issues such as a missing required field, whereas info is suitable for decorating a supplier company ID with the company name looked up in a suppliers database.

The operations list describes the operations to be performed on the annotation data (replace, add, remove). The format of the operations key is the same as for the bulk update of annotations; refer to the annotation data API for a complete description.
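A few small builders keep responses consistent with the formats above. This is a sketch, not an official client: the helper names are hypothetical, and the replace operation sends only content.value, as the serverless function examples in this documentation do.

```python
from typing import Any, Dict, List, Optional


def message(msg_type: str, content: str, datapoint_id: Optional[int] = None) -> Dict[str, Any]:
    """Build a message object; the id key is omitted for document-wide issues."""
    if msg_type not in ("error", "warning", "info"):
        raise ValueError(f"unsupported message type: {msg_type}")
    msg: Dict[str, Any] = {"type": msg_type, "content": content}
    if datapoint_id is not None:
        msg["id"] = datapoint_id
    return msg


def replace_value(datapoint_id: int, value: str) -> Dict[str, Any]:
    """Operation replacing the value of one datapoint."""
    return {"op": "replace", "id": datapoint_id, "value": {"content": {"value": value}}}


def remove(datapoint_id: int) -> Dict[str, Any]:
    """Operation removing one datapoint (for example a line-item row)."""
    return {"op": "remove", "id": datapoint_id}


def content_response(messages: List[Dict[str, Any]], operations: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble the JSON body returned by an annotation_content hook."""
    return {"messages": messages, "operations": operations}
```

For instance, `content_response([message("error", "Invalid invoice number format", 197467)], [remove(884061)])` mirrors part of the response example earlier in this section.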

Parsable error response format

It's possible to use the same format even with non-2XX response codes. In this type of response, operations are not considered.

Example of a parsable error response

{
  "messages": [
    {
      "id": "all",
      "type": "error",
      "content": "custom error message to be displayed in the UI"
    }
  ]
}
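When a hook fails, it can still return a message the UI can display. The sketch below produces a non-2XX status together with a body in the parsable error format; it reuses the "id": "all" value from the example above, and the function name and default status 400 are assumptions. Operations are left out because they are not considered in such responses.

```python
import json
from typing import Tuple


def parsable_error(text: str, status: int = 400) -> Tuple[int, str]:
    """Return an HTTP status code and a JSON body in the parsable error format."""
    if 200 <= status < 300:
        raise ValueError("parsable error responses are meant for non-2XX status codes")
    body = {"messages": [{"id": "all", "type": "error", "content": text}]}
    return status, json.dumps(body)
```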

The initialize action of the annotation_content event additionally accepts a list of automation_blockers objects. This allows you to manually create automation blockers of type extension, which stops the automation without the need to create an error message.

The automation_blockers object contains attributes:

Key Type Description
id integer Optional unique id of the relevant datapoint; omit for document-wide issues
content string A descriptive message to be stored as an automation blocker

Example of a response for annotation_content.initialize hook creating automation blockers

{
  "messages": [...],
  "operations": [...],
  "automation_blockers": [
    {
      "id": 1357,
      "content": "Unregistered vendor"
    },
    {
      "content": "PO not found in the master data!"
    }
  ]
}
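Since automation_blockers is only accepted for the initialize action, a shared response builder can attach it conditionally. This is a minimal sketch with hypothetical helper names; how the blockers are decided (vendor or PO lookups) is left to your own logic.

```python
from typing import Any, Dict, List, Optional


def automation_blocker(content: str, datapoint_id: Optional[int] = None) -> Dict[str, Any]:
    """Build an automation_blockers entry; omit the id for document-wide blockers."""
    blocker: Dict[str, Any] = {"content": content}
    if datapoint_id is not None:
        blocker["id"] = datapoint_id
    return blocker


def build_response(
    payload: Dict[str, Any],
    messages: List[Dict[str, Any]],
    operations: List[Dict[str, Any]],
    blockers: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Assemble an annotation_content response, adding blockers only for initialize."""
    body: Dict[str, Any] = {"messages": messages, "operations": operations}
    is_initialize = (
        payload.get("event") == "annotation_content"
        and payload.get("action") == "initialize"
    )
    if is_initialize and blockers:
        body["automation_blockers"] = blockers
    return body
```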

Email received event data format

The email event contains the following additional event-specific attributes.

Key Type Description
files list[object] List of objects with metadata of each attachment contained in the arriving email.
headers object Headers extracted from the arriving email.
body object Body extracted from the arriving email.
email URL URL of the arriving email.
queue URL URL of the arriving email's queue.

The files object contains attributes:

Key Type Description
id string An arbitrary identifier.
filename string Name of the attachment.
mime_type string MIME type of the attachment.
n_pages integer Number of pages (defaults to 1 if it could not be acquired).
height_px float Height in pixels (300 DPI is assumed for PDF files, defaults to null if it could not be acquired).
width_px float Width in pixels (300 DPI is assumed for PDF files, defaults to null if it could not be acquired).
document URL URL of related document object.

The headers object contains the same values as are available for initialization of values in email_header:<id> (namely: from, to, reply-to, subject, message-id, date).

The body object contains the body_text_plain and body_text_html.

Email received event response format

All email events expect a JSON object in the response with the following keys: files, additional_files and extracted_original_sender.

The files object contains attributes:

Key Type Description
id int ID of the file (as listed in files of the request) that will be used for creating an annotation
values list[object] This is used to initialize datapoint values. See values object description below

The values object consists of the following:

Key Type Description
id string Id of value - must start with email: prefix (to use this value refer to it in rir_field_names field in the schema similarly as described here).
value string String value to be used when annotation content is being constructed

This is useful for filtering out unwanted files based on criteria that are not available in Rossum by default.

The additional_files object contains attributes:

Key Type Description
document URL URL of the document object that should be included; it must be from the same queue. Documents without an annotation will be skipped.
values list[object] This is used to initialize datapoint values. See values object description above

The extracted_original_sender object looks as follows:

Key Type Description
extracted_original_sender email_address_object Information about sender containing keys email and name.

This is useful for updating the email address field on the email object with a new sender name and email address.
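The sketch below builds an email hook response under two illustrative assumptions: only PDF attachments are wanted (based on the filtering use case described above), and the schema refers to a hypothetical email:subject value in rir_field_names. Treating the reply-to header as the original sender is likewise a hypothetical rule; additional_files is not used here.

```python
from email.utils import parseaddr
from typing import Any, Dict, List


def email_response(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Keep PDF attachments, pre-fill a subject value and report a sender."""
    headers = payload.get("headers", {})
    subject = headers.get("subject", "")

    # Assumption: only PDF attachments should become annotations.
    files: List[Dict[str, Any]] = [
        {"id": f["id"], "values": [{"id": "email:subject", "value": subject}]}
        for f in payload.get("files", [])
        if f["mime_type"] == "application/pdf"
    ]
    response: Dict[str, Any] = {"files": files}

    # Hypothetical rule: treat the Reply-To address as the original sender.
    reply_to = headers.get("reply-to")
    if reply_to:
        name, address = parseaddr(reply_to)
        response["extracted_original_sender"] = {"email": address, "name": name}
    return response
```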

Upload created event data format

The upload event contains the following additional event-specific attributes.

Key Type Description
files list[object] List of objects with metadata of each uploaded document.
documents list[object] List of document objects corresponding to the files list.
upload object Object representing the upload.
metadata object Client data passed in through the upload resource to create annotations with.
email URL URL of the arriving email or null if the document was uploaded via API.

The files object contains attributes:

Key Type Description
document URL URL of the uploaded document object.
prevent_importing bool If set, no annotation is created for the document; if an annotation already exists, it is not switched to the importing status.
values list[object] This is used to initialize datapoint values. See values object description below
queue URL URL of the queue the document is being uploaded to.
annotation URL URL of the document's annotation or null if it doesn't exist.

The values object consists of the following:

Key Type Description
id string Id of value (to use this value refer to it in rir_field_names field in the schema similarly as described here).
value string String value to be used when annotation content is being constructed

Upload created event response format

All upload events expect a JSON object with a files list in the response.

The files object contains attributes:

Key Type Description
document URL URL of the uploaded document object.
prevent_importing bool If set, no annotation is created for the document; if an annotation already exists, it is not switched to the importing status. Optional, default false.
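As a sketch of the response format, the function below sets prevent_importing for uploaded documents whose MIME type is on a caller-supplied block list, looking the type up in the documents list of the request. The function name and the MIME-type criterion are illustrative assumptions.

```python
from typing import Any, Dict, Set


def upload_response(payload: Dict[str, Any], blocked_mime_types: Set[str]) -> Dict[str, Any]:
    """Return an upload hook response that skips documents with blocked MIME types."""
    # Map document URL -> MIME type using the documents list from the request.
    mime_by_url = {doc["url"]: doc.get("mime_type") for doc in payload.get("documents", [])}
    return {
        "files": [
            {
                "document": f["document"],
                "prevent_importing": mime_by_url.get(f["document"]) in blocked_mime_types,
            }
            for f in payload["files"]
        ]
    }
```

With the example upload payload above and `{"application/json"}` as the block list, document 427 is imported and document 428 is not.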

Validating payloads from Rossum

Example of a hook receiver that verifies the validity of a Rossum request

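import hashlib
import hmac
import os

from flask import Flask, request, abort

app = Flask(__name__)

# The secret stored in hook.config.secret; read it from the environment
# instead of hardcoding it (ROSSUM_HOOK_SECRET is a hypothetical variable name)
SECRET_KEY = os.environ["ROSSUM_HOOK_SECRET"]

@app.route("/test_hook", methods=["POST"])
def test_hook():
    # Expected header format: "sha1=<hex digest>"
    try:
        prefix, signature = request.headers["X-Elis-Signature"].split("=", 1)
    except (KeyError, ValueError):
        abort(401, "Missing or incorrectly formatted X-Elis-Signature header")

    # HMAC-SHA1 of the raw request body, keyed with the shared secret
    digest = hmac.new(SECRET_KEY.encode(), request.get_data(), hashlib.sha1).hexdigest()
    if not (prefix == "sha1" and hmac.compare_digest(signature.encode(), digest.encode())):
        abort(401, "Authorization failed.")

    # The request is authentic; process request.get_json() and return
    # the event-specific JSON response
    return {}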

For authorization of payloads, the shared secret method is used. When a secret token is set in hook.config.secret, Rossum uses it to create a hash signature with each payload. This hash signature is passed along with each request in the headers as X-Elis-Signature.

To verify a request, compute an HMAC SHA1 digest of the raw request body using hook.config.secret as the key, and check that it matches the signature sent by Rossum. The header value has the form sha1=<hex digest>.
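The same check can be written without any web framework, which makes it easy to unit test. This sketch mirrors the Flask receiver above; the function name is hypothetical, and it compares bytes so that unexpected non-ASCII header content cannot raise an error in `hmac.compare_digest`.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check an X-Elis-Signature header value of the form "sha1=<hex digest>"."""
    prefix, sep, signature = header_value.partition("=")
    if prefix != "sha1" or not sep:
        return False
    expected = hmac.new(secret.encode(), body, hashlib.sha1).hexdigest()
    # compare_digest runs in constant time, avoiding timing side channels
    return hmac.compare_digest(signature.encode(), expected.encode())
```

Always pass the raw body bytes as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.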

Webhook requests may be authenticated using a client SSL certificate; see the Hook API for reference.

Access to Rossum API

You can access the Rossum API from the webhook. Each execution gets a unique API key. The key is valid for 10 minutes or until Rossum receives a response from the webhook; you can set token_lifetime_s up to 2 hours to keep the token valid longer. The API key and the environment's base URL are passed to webhooks as the top-level attributes rossum_authorization_token and base_url of the webhook payload.
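A minimal sketch of calling the API with these two attributes, using only the standard library. It assumes the token is accepted as an `Authorization: Bearer <token>` header and that endpoints live under `<base_url>/api/v1`, as the URLs in the payload examples suggest; the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_api_request(payload: Dict[str, Any], path: str) -> urllib.request.Request:
    """Build (but do not send) a GET request authorized with the hook's token."""
    url = f"{payload['base_url'].rstrip('/')}/api/v1/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        # Assumption: Bearer token authorization
        headers={"Authorization": f"Bearer {payload['rossum_authorization_token']}"},
    )


def get_json(payload: Dict[str, Any], path: str, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_api_request(payload, path), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json(payload, "queues")` would fetch the first page of queues visible to the token. Keep such calls well within the token lifetime described above.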

Serverless Function Extension

Serverless functions allow you to extend Rossum functionality without setting up and maintaining additional services.

Webhooks and Serverless functions share a basic setup: input and output data format and error handling. They are both configured using a hook API object.

Unlike webhooks, serverless functions do not send the event and action notifications to a specific URL. Instead, the function's code snippet is executed within the Rossum platform. See the function API description for details on how to set up a serverless function and connect it to a queue.

Supported events and their actions

For a description of supported events, actions and input/output data examples, please refer to the Webhook Extensions section.

Supported runtimes

Rossum currently supports the nodejs18.x runtime (Node.js 18) for JavaScript functions and the python3.12 runtime for Python functions. If you would like to use another runtime, please let us know at product@rossum.ai.

Please be aware that runtimes may be deprecated and removed in the future; deprecations are announced at least 6 months before the deprecation date.

Runtime Deprecations

The nodejs12.x runtime is being phased out by the serverless vendors, so it has been scheduled to be discontinued on Mar 22 2023. From that date, creating or updating hooks with the nodejs12.x runtime returns an error response. Existing hooks will continue to work indefinitely without change.

The recommended action is to upgrade to the up-to-date nodejs18.x runtime.

Environment differences

On Azure, serverless function instances are held per organization, so creating or updating a serverless function can cause other functions in the same organization to be updated as well. This happens only when the underlying implementation on the Rossum side changes or when the contents of the third_party_library_pack change.

In other environments the function instances are kept per hook instance and therefore this side effect is not present.

Implementation

Example serverless function usable for annotation_content event (Python implementation).

'''
This custom function example can be used for showing custom messages to the
user on the validation screen or for updating values of specific fields.
(annotation_content event and updated action which provides annotation
content tree as an input). The function below shows how to:
1. Display a warning message to the user if "item_amount_base" field of
a line item exceeds a predefined threshold
2. Remove all dashes from the "document_id" field

item_amount_base and document_id should be fields defined in a schema.

More about custom functions - https://developers.rossum.ai/docs/how-to-use-serverless-functions
'''

'''
The rossum_hook_request_handler is a mandatory main function that accepts
input and produces output of the rossum custom function hook.
:param payload: see https://example.rossum.app/api/docs/#annotation-content-event-data-format
:return: messages and operations that update the annotation content or show messages
'''


def rossum_hook_request_handler(payload):

    if payload['event'] == 'annotation_content' and payload['action'] == 'updated':

        try:

            messages, operations = example_main_function(payload)

        except Exception as e:
            messages = [create_message('error', 'Serverless Function: ' + str(e))]
            return {"messages": messages}

        return {"messages": messages, "operations": operations}

    # Other events and actions: nothing to show or change
    return {"messages": [], "operations": []}


'''
Main function that implements the custom logic for messages and operations on datapoints.
:param payload: see https://example.rossum.app/api/docs/#annotation-content-event-data-format
:return: tuple - messages and operations to be returned from the hook
'''


def example_main_function(payload):

    messages = []
    operations = []

    content = payload['annotation']['content']

    # Values over the threshold trigger a warning message
    TOO_BIG_THRESHOLD = 1000000

    # List of all datapoints of item_amount_base schema id
    amount_base_column_datapoints = find_by_schema_id(content, 'item_amount_base')

    for amount_base_column_datapoint in amount_base_column_datapoints:

        # Use normalized_value for comparing values of
        # Date and Number fields (https://example.rossum.app/api/docs/#content-object)
        value = float(amount_base_column_datapoint['content']['normalized_value'] or 0)
        if value >= TOO_BIG_THRESHOLD:
            messages.append(create_message('warning', 'Value is too big', amount_base_column_datapoint['id']))

    # There should be only one datapoint of document_id schema id
    # (looked up once, outside the loop; skipped if the schema has none)
    document_id_datapoints = find_by_schema_id(content, 'document_id')

    if document_id_datapoints:
        document_id_datapoint = document_id_datapoints[0]
        operations = [
            create_replace_operation(
                document_id_datapoint,
                document_id_datapoint['content']['value'].replace('-', ''),
            ),
        ]

    return messages, operations


'''
Return datapoints matching a schema id.
:param content: annotation content tree (see https://example.rossum.app/api/docs/#annotation-data)
:param schema_id: field's ID as defined in the extraction schema(see https://example.rossum.app/api/docs/#document-schema)
:return: the list of datapoints matching the schema ID
'''


def find_by_schema_id(content, schema_id: str):
    accumulator = []
    for node in content:
        if node["schema_id"] == schema_id:
            accumulator.append(node)
        elif "children" in node:
            accumulator.extend(find_by_schema_id(node["children"], schema_id))

    return accumulator


'''
Create a message which will be shown to the user
:param message_type: type of the message, any of {info|warning|error}. Errors prevent confirmation in the UI.
:param message_content: message shown to the user
:param datapoint_id: id of the datapoint where the message will appear (None for "global" messages).
:return: dict with the message definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)
'''


def create_message(message_type, message_content, datapoint_id=None):
    return {
        "content": message_content,
        "type": message_type,
        "id": datapoint_id,
    }


'''
Replace the value of the datapoint with a new value.
:param datapoint: content of the datapoint
:param new_value: new value of the datapoint
:return: dict with replace operation definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)
'''


def create_replace_operation(datapoint, new_value):
    return {
        "op": 'replace',
        "id": datapoint['id'],
        "value": {
            "content": {
                "value": new_value,
            },
        },
    }

Example serverless function usable for annotation_content event (JavaScript/NodeJS implementation).

// This serverless function example can be used for annotation_content events
// (e.g. updated action). annotation_content events provide annotation
// content tree as the input.
//
// The function below shows how to:
// 1. Display a warning message to the user if "item_amount_base" field of
//    a line item exceeds a predefined threshold
// 2. Remove all dashes from the "invoice_id" field
//
// item_amount_base and invoice_id should be fields defined in a schema.

// --- ROSSUM HOOK REQUEST HANDLER ---

// The rossum_hook_request_handler is a mandatory main function that accepts
// input and produces output of the rossum serverless function hook.
// @param {Object} payload - see https://example.rossum.app/api/docs/#annotation-content-event-data-format
// @returns {Object} - the messages and operations that update the annotation content

exports.rossum_hook_request_handler = async (payload) => {
  const content = payload.annotation.content;

  try {
    // Values over the threshold trigger a warning message
    const TOO_BIG_THRESHOLD = 1000000;

    // List of all datapoints of item_amount_base schema id
    const amountBaseColumnDatapoints = findBySchemaId(
      content,
      'item_amount_base',
    );

    const messages = [];
    for (var i = 0; i < amountBaseColumnDatapoints.length; i++) {

      // Use normalized_value for comparing values of Date and Number fields (https://example.rossum.app/api/docs/#content-object)
      if (amountBaseColumnDatapoints[i].content.normalized_value >= TOO_BIG_THRESHOLD) {
        messages.push(
          createMessage(
            'warning',
            'Value is too big',
            amountBaseColumnDatapoints[i].id,
          ),
        );
      }
    }

    // There should be only one datapoint of invoice_id schema id
    const [invoiceIdDatapoint] = findBySchemaId(content, 'invoice_id');

    // "Replace" operation is returned to update the invoice_id value
    // (skipped when the schema has no invoice_id datapoint)
    const operations = invoiceIdDatapoint
      ? [
          createReplaceOperation(
            invoiceIdDatapoint,
            invoiceIdDatapoint.content.value.replace(/-/g, ''),
          ),
        ]
      : [];

    // Return messages and operations to be used to update current annotation data
    return {
      messages,
      operations,
    };
  } catch (e) {
    // In case of exception, create and return error message. This may be useful for debugging.
    const messages = [
      createMessage('error', 'Serverless Function: ' + e.message)
    ];
    return {
      messages,
    };
  }
};

// --- HELPER FUNCTIONS ---

// Return datapoints matching a schema id.
// @param {Object} content - the annotation content tree (see https://example.rossum.app/api/docs/#annotation-data)
// @param {string} schemaId - the field's ID as defined in the extraction schema(see https://example.rossum.app/api/docs/#document-schema)
// @returns {Array} - the list of datapoints matching the schema ID

const findBySchemaId = (content, schemaId) =>
  content.reduce(
    (results, dp) =>
    dp.schema_id === schemaId ? [...results, dp] :
    dp.children ? [...results, ...findBySchemaId(dp.children, schemaId)] :
    results,
    [],
  );

// Create a message which will be shown to the user
// @param {String} type - the type of the message, any of {info|warning|error}. Errors prevent confirmation in the UI.
// @param {String} content - the message shown to the user
// @param {number} datapointId - the id of the datapoint where the message will appear (null for "global" messages).
// @returns {Object} - the JSON message definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)

const createMessage = (type, content, datapointId = null) => ({
  content: content,
  type: type,
  id: datapointId,
});

// Replace the value of the datapoint with a new value.
// @param {Object} datapoint - the content of the datapoint
// @param {string} newValue - the new value of the datapoint
// @returns {Object} - the JSON replace operation definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)

const createReplaceOperation = (datapoint, newValue) => ({
  op: 'replace',
  id: datapoint.id,
  value: {
    content: {
      value: newValue,
    },
  },
});

To implement a serverless function, create a hook object of type function. Use the code attribute of the config object to specify the serialized source code. You can use the code editor built into the Rossum UI, which also allows you to test and debug the function before updating its code.

See Python and NodeJS examples of a serverless function implementation next to this section or check out this article (and others in the relevant section).

If there is an issue with the extension code itself, it is displayed as a CallFunctionException in the annotation view. This exception usually indicates issues such as:

Testing

To write, test and debug a serverless function, you can refer to this guide.

Limitations

By default, no internet access is allowed from a serverless function, except to the Rossum API. If your functions require internet access to work properly, e.g. when exporting data to an ERP system over its API, please let us know at product@rossum.ai.

Access Rossum API

Access to the Rossum API is granted through a proxy server whose URL is available in the HTTPS_PROXY environment variable. See the examples below for how to access the Rossum API from a serverless function. Python's urllib.request picks up an HTTPS proxy from the environment variable on its own. For Node.js, https.globalAgent is set to an https-proxy-agent instance if https-proxy-agent is present in the selected library pack.

Python code snippet to access Rossum API to get a list of queue names

import json
import urllib.request

def rossum_hook_request_handler(payload):
    request = urllib.request.Request(
      "https://<example>.rossum.app/api/v1/queues",
      headers={"Authorization": "Bearer " + payload["rossum_authorization_token"]}
    )
    with urllib.request.urlopen(request) as response:
        queues = json.loads(response.read())
    queue_names = (q["name"] for q in queues["results"])
    return {"messages": [{"type": "info", "content": ", ".join(queue_names)}]}

NodeJS code snippet to access Rossum API to get a list of queue names

exports.rossum_hook_request_handler = async (payload) => {
  const token = payload.rossum_authorization_token;

  const queues = JSON.parse(await getFromRossumApi("https://<example>.rossum.app/api/v1/queues", token));
  const queueNames = queues.results.map(q => q.name).join(", ");
  return { "messages": [{"type": "info", "content": queueNames}] };
};

const getFromRossumApi = async (url, token) => {
  const http = require('http');
  // Send the request to the proxy from HTTPS_PROXY, passing the full target URL as the path
  const proxy = new URL(process.env.HTTPS_PROXY);
  const options = {
    hostname: proxy.hostname,
    port: proxy.port,
    path: url,
    method: 'GET',
    headers: {
      'Authorization': 'token ' + token,
    },
  };
  const response = await new Promise((resolve, reject) => {
    let dataString = '';
    const req = http.request(options, (res) => {
      res.on('data', chunk => {
        dataString += chunk;
      });
      res.on('end', () => {
        // Report the real status code instead of assuming success
        resolve({
          statusCode: res.statusCode,
          body: dataString,
        });
      });
    });
    req.on('error', (e) => {
      reject(new Error('Request to Rossum API failed: ' + e.message));
    });
    req.end();
  });
  if (response.statusCode !== 200) {
    throw new Error('Rossum API returned HTTP ' + response.statusCode);
  }
  return response.body;
};

Connector Extension

The connector component is aimed at two main use-cases: applying custom business rules during data validation, and direct integration of Rossum with downstream systems.

The connector component receives two types of callbacks - an on-the-fly validation callback on every update of captured data, and an on-export save callback when the document capture is finalized.

The custom business rules rely chiefly on the on-the-fly validation callback. The connector can auto-validate and transform both the initial AI-based extractions and each edit the operator makes in the validation screen; based on the input, it can push user-visible messages and value updates back to Rossum. This allows for both simple tweaks (like verifying that two amounts sum together or transforming decimal points to thousand separators) and complex functionality like intelligent PO match.

The integration with downstream systems, on the other hand, relies mainly on the save callback: at the same moment a document is exported from Rossum, it can be imported into a downstream system. Since downstream systems typically impose constraints on the captured data, these constraints can already be enforced within the validation callback.

Implement a connector

Connectors are designed to be implemented using a push model over the common HTTPS protocol. When annotation data changes, or when data export is triggered, a specific connector endpoint is called with the annotation data as the request payload. The connector must be deployed with a public IP address so that the Rossum platform can call its endpoints; for testing, a tunneling tool like ngrok or serveo may come in useful.

Example of a valid no-op (empty) validate response

{"messages": [], "updated_datapoints": []}

Example of a valid no-op (empty) save response

{}

The connector API consists of two endpoints, validate and save, described below. A connector must always implement both endpoints, though they do not necessarily have to perform a function in a particular connector (see the empty response examples above); the platform raises an error if it is not able to call an endpoint.
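A minimal connector can be sketched with Python's standard library alone. The sketch assumes that requests arrive at paths ending in /validate and /save (possibly with the initial=true query parameter); the route function, the serve helper and the default port are illustrative choices, not part of the Rossum API. Both endpoints return the no-op responses shown above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def route(path, payload):
    """Map a connector request to (HTTP status, JSON response body).

    payload is unused by these no-op endpoints but is passed for real logic.
    """
    endpoint = urlsplit(path).path  # drops a query string such as ?initial=true
    if endpoint.endswith("/validate"):
        # No-op validate: no messages, no updated datapoints
        return 200, {"messages": [], "updated_datapoints": []}
    if endpoint.endswith("/save"):
        # No-op save: an empty JSON object with a 2xx status
        return 200, {}
    return 404, {"error": "Unknown endpoint: " + endpoint}


class ConnectorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        status, body = route(self.path, payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Start the connector on a hypothetical port; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), ConnectorHandler).serve_forever()
```

Calling serve() handles requests until the process is stopped; for testing, the port can be exposed through a tunnel such as ngrok. Keeping the routing in a plain function makes it easy to unit-test without a running server.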

Set up a connector

The next step after implementing the first version of a connector is configuring it in the Rossum platform.

In Rossum, a connector object defines service_url and params, which are used to construct the HTTPS requests, and authorization_token, which is passed in every request to authenticate the caller as the actual Rossum server. The token may also uniquely identify the organization when multiple Rossum organizations share the same connector server.

To set up a connector for a queue, create a connector object using either our API or the rossum tool – follow these instructions. A connector object may be associated with one or more queues. One queue can only have one connector object associated with it.
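As a rough sketch, a connector object could be created with a POST request to the connectors endpoint listed in the API root. The payload uses service_url, authorization_token and asynchronous, which this section mentions, plus name and queues, which we assume the endpoint accepts as well; all values are placeholders, and the Bearer header follows the serverless-function example above. The request is only built here, not sent.

```python
import json
import urllib.request

API_BASE = "https://<example>.rossum.app/api/v1"  # placeholder domain


def build_create_connector_request(api_token, queue_urls, service_url, secret):
    """Build (but do not send) a POST request that creates a connector object."""
    body = {
        "name": "My connector",          # hypothetical display name (assumed attribute)
        "queues": queue_urls,             # assumed attribute: queues served by the connector
        "service_url": service_url,       # base URL; requests go to /validate and /save
        "authorization_token": secret,    # sent back to the connector on every call
        "asynchronous": True,             # save is called asynchronously by default
    }
    return urllib.request.Request(
        API_BASE + "/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it; please check the connector object reference for the authoritative list of attributes.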

Connector API

Example data sent to connector (validate, save)

{
  "meta": {
    "document_url": "https://<example>.rossum.app/api/v1/documents/6780",
    "arrived_at": "2019-01-30T07:55:13.208304Z",
    "original_file": "https://<example>.rossum.app/api/v1/original/bf0db41937df8525aa7f3f9b18a562f3",
    "original_filename": "Invoice.pdf",
    "queue_name": "Invoices",
    "workspace_name": "EU",
    "organization_name": "East West Trading Co",
    "annotation": "https://<example>.rossum.app/api/v1/annotations/4710",
    "queue": "https://<example>.rossum.app/api/v1/queues/63",
    "workspace": "https://<example>.rossum.app/api/v1/workspaces/62",
    "organization": "https://<example>.rossum.app/api/v1/organizations/1",
    "modifier": "https://<example>.rossum.app/api/v1/users/27",
    "updated_datapoint_ids": ["197468"],
    "modifier_metadata": {},
    "queue_metadata": {},
    "annotation_metadata": {},
    "rir_poll_id": "54f6b9ecfa751789f71ddf12",
    "automated": false
  },
  "content": [
    {
      "id": "197466",
      "category": "section",
      "schema_id": "invoice_info_section",
      "children": [
        {
          "id": "197467",
          "category": "datapoint",
          "schema_id": "invoice_number",
          "page": 1,
          "position": [916, 168, 1190, 222],
          "rir_position": [916, 168, 1190, 222],
          "rir_confidence": 0.97657,
          "value": "FV103828806S",
          "validation_sources": ["score"],
          "type": "string"
        },
        {
          "id": "197468",
          "category": "datapoint",
          "schema_id": "date_due",
          "page": 1,
          "position": [938, 618, 1000, 654],
          "rir_position": [940, 618, 1020, 655],
          "rir_confidence": 0.98279,
          "value": "12/22/2018",
          "validation_sources": ["score"],
          "type": "date"
        },
        {
          "id": "197469",
          "category": "datapoint",
          "schema_id": "amount_due",
          "page": 1,
          "position": [1134, 1050, 1190, 1080],
          "rir_position": [1134, 1050, 1190, 1080],
          "rir_confidence": 0.74237,
          "value": "55.20",
          "validation_sources": ["human"],
          "type": "number"
        }
      ]
    },
    {
      "id": "197500",
      "category": "section",
      "schema_id": "line_items_section",
      "children": [
        {
          "id": "197501",
          "category": "multivalue",
          "schema_id": "line_items",
          "children": [
            {
              "id": "198139",
              "category": "tuple",
              "schema_id": "line_item",
              "children": [
                {
                  "id": "198140",
                  "category": "datapoint",
                  "schema_id": "item_desc",
                  "page": 1,
                  "position": [173, 883, 395, 904],
                  "rir_position": null,
                  "rir_confidence": null,
                  "value": "Red Rose",
                  "validation_sources": [],
                  "type": "string"
                },
                {
                  "id": "198142",
                  "category": "datapoint",
                  "schema_id": "item_net_unit_price",
                  "page": 1,
                  "position": [714, 846, 768, 870],
                  "rir_position": null,
                  "rir_confidence": null,
                  "value": "1532.02",
                  "validation_sources": ["human"],
                  "type": "number"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

All connector endpoints, representing particular points in the document lifetime, are simple verbs that receive a POSTed JSON payload and may return JSON in turn.

The authorization type and authorization token are passed in the Authorization HTTP header. The authorization type may be secret_key (shared secret) or Basic for HTTP basic authentication.

Please note that for Basic authentication, authorization_token is passed as-is, so it must be set to a correctly base64-encoded value. For example, username connector and password secure123 are encoded as the authorization token Y29ubmVjdG9yOnNlY3VyZTEyMw==.
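The Basic token from the example can be reproduced with Python's base64 module. The sketch also shows one way a connector might check the incoming header; it assumes the header value has the form "<authorization type> <authorization token>", and both function names are hypothetical.

```python
import base64
import hmac


def basic_token(username, password):
    """Base64-encode "username:password" for use as a Basic authorization_token."""
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def is_authorized(header_value, expected_type, expected_token):
    """Check an Authorization header of the assumed form "<type> <token>"."""
    auth_type, _, token = (header_value or "").partition(" ")
    # compare_digest avoids leaking the token through timing differences
    return auth_type == expected_type and hmac.compare_digest(
        token.encode("utf-8"), expected_token.encode("utf-8")
    )


print(basic_token("connector", "secure123"))  # Y29ubmVjdG9yOnNlY3VyZTEyMw==
```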

Connector requests may be authenticated using a client SSL certificate, see Connector API for reference.

Errors

If a connector does not implement an endpoint, it may return HTTP status 404. An endpoint may fail by returning either HTTP 4xx or HTTP 5xx. For some endpoints (like validate and save), a failure may trigger a message in the user interface: either the error key of a JSON response is used, or the response body itself if it is not JSON. The save endpoint can be called in asynchronous (default) as well as synchronous mode (useful for embedded mode).

Data format

The received JSON object contains two keys, meta carrying the metadata and content carrying endpoint-specific content.

The metadata identify the document concerned and contain the following attributes:

Key Type Description
document_url URL document URL
arrived_at timestamp A time of document arrival in Rossum (ISO 8601)
original_file URL Permanent URL for the document original file
original_filename string Filename of the document on arrival in Rossum
queue_name string Name of the document's queue
workspace_name string Name of the document's workspace
organization_name string Name of the document's organization
annotation URL Annotation URL
queue URL Document's queue URL
workspace URL Document's workspace URL
organization URL Document's organization URL
modifier URL Modifier URL
modifier_metadata object Metadata attribute of the modifier, see metadata
queue_metadata object Metadata attribute of the queue, see metadata
annotation_metadata object Metadata attribute of the annotation, see metadata
rir_poll_id string Internal extractor processing id
updated_datapoint_ids list[string] Ids of objects that were recently modified by the user
automated bool Flag indicating whether the annotation was automated

A common class of content is the annotation tree, which is a JSON object that can contain nested datapoint objects, and matches the schema datapoint tree.

Intermediate nodes have the following structure:

Key Type Description
id integer A unique id of the given node
schema_id string Reference mapping the node to the schema tree
category string One of section, multivalue, tuple
children list A list of other nodes

Datapoint (leaf) nodes contain the actual data:

Key Type Description
id integer A unique id of the given node
schema_id string Reference mapping the node to the schema tree
category string datapoint
type string One of string, date or number, as specified in the schema
value string The datapoint value, represented as a string but normalized so that it is machine readable: ISO format for dates, a decimal for numbers
page integer A 1-based integer index of the page, optional
position list[float] List of four floats describing the x1, y1, x2, y2 bounding box coordinates
rir_position list[float] Bounding box of the value as detected by the data extractor. Format is the same as for position.
rir_confidence float Confidence (estimated probability) that this field was extracted correctly.
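Because intermediate nodes keep their descendants in the children list and leaves have category datapoint, the annotation tree can be walked recursively. A minimal Python sketch, similar in spirit to the findBySchemaId helper from the NodeJS example (unlike that helper, it only returns leaf datapoints); the function names are our own.

```python
def iter_datapoints(nodes):
    """Yield every datapoint (leaf) node from an annotation content tree."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        else:
            # section, multivalue and tuple nodes hold their nodes in "children"
            yield from iter_datapoints(node.get("children", []))


def find_by_schema_id(nodes, schema_id):
    """Return all datapoints whose schema_id matches, in document order."""
    return [dp for dp in iter_datapoints(nodes) if dp["schema_id"] == schema_id]


content = [
    {"id": "197466", "category": "section", "schema_id": "invoice_info_section",
     "children": [
         {"id": "197469", "category": "datapoint", "schema_id": "amount_due",
          "value": "55.20", "type": "number"},
     ]},
]
print([dp["value"] for dp in find_by_schema_id(content, "amount_due")])  # ['55.20']
```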

Annotation lifecycle with a connector

If an asynchronous connector is deployed to a queue, the annotation status changes from reviewing to exporting and subsequently to exported or failed_export. If no connector extension is deployed to the queue, or if its asynchronous attribute is set to false, the annotation status changes from reviewing to exported (or failed_export) directly.

Endpoint: validate

This endpoint is called after document processing has finished, when the operator opens a document in the Rossum verification interface, and then every time the operator updates a field. The initial validate call after processing is marked with the initial=true URL parameter; all other calls go to /validate without the parameter.

The request path is fixed to /validate and cannot be changed.

It may:

Both the messages and the updated data are shown in the verification interface. Moreover, the messages may block confirmation in the case of errors.

This endpoint should be fast as it is part of an interactive workflow.

Receives an annotation tree as content.

Returns a JSON object with the lists: messages, operations and updated_datapoints.

Keys messages, operations (optional)

The description of these keys was moved to the Hook Extension. See the description here.

Key updated_datapoints (optional, deprecated)

We also support a simplified version of updates using the updated_datapoints response key. It only supports updates (no add or remove operations) and is now deprecated. The updated datapoint object contains these attributes:

Key Type Description
id string A unique id of the relevant datapoint, currently only datapoints of category datapoint can be updated
value string New value of the datapoint. Value is formatted according to the datapoint type (e.g. date is string representation of ISO 8601 format).
hidden boolean Toggle for hiding/showing of the datapoint, see datapoint
options list[object] Options of the datapoint -- valid only for type=enum, see enum options
position list[float] New position of the datapoint, list of four numbers.

The validate endpoint should always return the 200 OK status.

An error message returned from the connector prevents user from confirming the document.
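The following sketch shows what a validate handler might look like in Python. It enforces a hypothetical business rule (amount_due must be a positive number) and reports violations as messages in the same {"content", "id", "type"} format used by the save response examples; an error message attached to the datapoint blocks confirmation. The function names are illustrative.

```python
from decimal import Decimal, InvalidOperation


def iter_datapoints(nodes):
    """Yield every datapoint (leaf) node from an annotation content tree."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        else:
            yield from iter_datapoints(node.get("children", []))


def validate(payload):
    """Return a validate response for the hypothetical rule "amount_due > 0"."""
    messages = []
    for dp in iter_datapoints(payload["content"]):
        if dp["schema_id"] != "amount_due":
            continue
        try:
            amount = Decimal(dp["value"])
        except (InvalidOperation, TypeError):
            amount = None  # empty or non-numeric value
        if amount is None or not amount.is_finite() or amount <= 0:
            # Attaching the datapoint id shows the message next to the field
            messages.append({
                "content": "Amount due must be a positive number.",
                "id": dp["id"],
                "type": "error",
            })
    return {"messages": messages, "operations": []}
```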

Endpoint: save

This endpoint is called when the document transitions to the exported state. The connector may process the final document annotation and save it to the target system. It receives an annotation tree as content. The request path is fixed to /save and cannot be changed.

The save endpoint is called asynchronously (unless synchronous mode is set in the related connector object). The timeout of the save endpoint is 60 seconds.

For a successful export, the response should have a 2xx status.

Example of successful save response without messages in UI

HTTP/1.1 204 No Content

HTTP/1.1 200 OK
Content-Type: text/plain

this response body is ignored

HTTP/1.1 200 OK
Content-Type: application/json

{
  "messages": []
}

When messages are expected to be displayed in the UI, they should be sent in the same format as in the validate endpoint.

Example of successful save response with messages in UI

HTTP/1.1 200 OK
Content-Type: application/json

{
  "messages": [
    {
      "content": "Everything is OK.",
      "id": null,
      "type": "info"
    }
  ]
}

If the endpoint fails with an HTTP error and/or a message of type error is received, the document transitions to the failed_export state. It is then available to the operators for manual review and re-queuing to the to_review state in the user interface. Re-queuing can also be done programmatically via the API, using a PATCH call that sets the annotation status to to_review. Patching the annotation status to exporting triggers an export retry.
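A sketch of the re-queuing call with Python's urllib: it builds a PATCH request that sets an annotation's status to to_review (or to exporting, to retry the export). The annotation URL and token are placeholders, the Bearer header follows the serverless-function example above, and restricting the status to these two values is our own design choice; the request is built but not sent.

```python
import json
import urllib.request


def build_status_patch(annotation_url, api_token, status="to_review"):
    """Build a PATCH request that changes an annotation's status."""
    if status not in ("to_review", "exporting"):
        raise ValueError("expected 'to_review' (re-queue) or 'exporting' (retry export)")
    return urllib.request.Request(
        annotation_url,
        data=json.dumps({"status": status}).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


request = build_status_patch("https://<example>.rossum.app/api/v1/annotations/4710", "<token>")
# urllib.request.urlopen(request) would send it
```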

Example of unsuccessful save response with messages in UI

HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json

{
  "messages": [
    {
      "content": "Even though this message is info, the export will fail due to the status code.",
      "id": null,
      "type": "info"
    }
  ]
}

HTTP/1.1 500 Internal Server Error
Content-Type: text/plain

An error message "Export failed." will show up in the UI

HTTP/1.1 200 OK
Content-Type: application/json

{
  "messages": [
    {
      "content": "Proper status code could not be set.",
      "id": null,
      "type": "error"
    }
  ]
}
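A save handler could translate the outcome of a downstream export into the responses shown above. In this sketch, export_to_erp stands in for a hypothetical callable that pushes data to your target system and raises an exception on failure; a failure is reported with status 422 plus an error message, which moves the document to failed_export.

```python
def save(payload, export_to_erp):
    """Return (HTTP status, JSON body) for the save endpoint.

    export_to_erp is a hypothetical callable taking the annotation URL and
    content; it should raise an exception when the export fails.
    """
    try:
        export_to_erp(payload["meta"]["annotation"], payload["content"])
    except Exception as exc:
        # Any 4xx/5xx status (or an error message) makes the export fail
        return 422, {"messages": [
            {"content": "Export failed: " + str(exc), "id": None, "type": "error"},
        ]}
    return 200, {"messages": [
        {"content": "Everything is OK.", "id": None, "type": "info"},
    ]}
```

Keeping the downstream call injectable makes the handler easy to test and to wire into whichever HTTP framework serves the /save path.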

Custom UI Extension

Sometimes users might want to extend the behavior of the validation view in the UI with something special. This is the goal of custom UI extensions.

Buttons

Currently, there are two different ways of using a custom button:

  1. Popup Button - opens a specific URL in the web browser
  2. Validate Button - triggers a standard validate call to connector

If you would like to read more about how to create a button, see the Button schema.

Popup Button opens a website completely managed by the user in a separate tab. It runs in parallel to the validation interface session in the app. Such a website can be used for any interface that assists operators in the reviewing process.

Example Use Cases of Popup Button:

  1. opening an email linked to the annotated document
  2. creating a new item in external database according to extracted data

Communication with the Validation Interface

You can communicate with the validation interface directly using the standard browser API window.postMessage. To receive messages from the validation interface, register a listener with window.addEventListener:

window.addEventListener('message', ({ data: { type, result } }) => {
  // logic
});

The shape of the result key is the same as the top level content attribute of the annotation data response.

Once the listener is in place, you can post one of the supported message types:

window.opener.postMessage(
  { type: 'GET_DATAPOINTS' },
  'https://<example>.rossum.app'
);

window.opener.postMessage(
  {
    type: 'UPDATE_DATAPOINT',
    data: {id: DATAPOINT_ID, value: "Updated value"}
  },
  'https://<example>.rossum.app'
);

window.opener.postMessage(
  { type: 'FINISH' },
  'https://<example>.rossum.app'
);

Providing a message type to postMessage lets the Rossum interface know which operation the user requests. It also determines the type of the answer, which can be used to match the appropriate response.

Validate button

If the popup_url key is missing in the button’s schema, clicking the button triggers a standard validate call to the connector. In such a call, updated_datapoint_ids contains the ID of the pressed button.

Note: if you’re missing some annotation data that you’d like to receive in a similar way, do contact our support team. We’re collecting feedback to further expand this list.

Extension Logs

To make the development of extensions easier and more efficient, our backend logs requests, responses (if enabled) and additional information whenever a hook is called.

Hook Log

Hook log objects consist of the following attributes, some of which depend on the hook event:

Base Hook Log object

These attributes are included in all logs, independent of the hook event.

Key Type Description Optional
timestamp* str Timestamp of the log-record
request_id UUID Hook call request ID
event string Hook's event
action string Hook's action
organization_id int ID of the associated Organization.
queue_id int ID of the associated Queue. true
hook_id int ID of the associated Hook.
hook_type str Hook type. Possible values: webhook, function
log_level str Log-level. Possible values: INFO, ERROR, WARNING
message str A log-message
request str Raw request sent to the Hook true
response str Raw response received from the Hook true

*The timestamp is in ISO 8601 format in the UTC timezone, e.g. 2023-04-21T07:58:49.312655
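Since the timestamp carries no explicit offset, a consumer has to attach UTC itself. The Python sketch below assumes log records are available as dicts with the attributes listed above (however you obtain them) and returns the ERROR records of one hook, oldest first; the function name is hypothetical.

```python
from datetime import datetime, timezone


def error_logs(records, hook_id):
    """Return (timestamp, message) pairs of ERROR records for one hook, oldest first."""
    result = []
    for record in records:
        if record["hook_id"] != hook_id or record["log_level"] != "ERROR":
            continue
        # ISO 8601 without an offset; the documentation states it is UTC
        ts = datetime.fromisoformat(record["timestamp"]).replace(tzinfo=timezone.utc)
        result.append((ts, record["message"]))
    return sorted(result, key=lambda pair: pair[0])
```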

Annotation Content or Annotation Status Hook Events

In addition to the Base Hook Log object, the annotation content and annotation status event hook logs contain the following attributes:

Key Type Description Optional
annotation_id int ID of the associated Annotation. true

Email Hook Events

In addition to the Base Hook Log object, the email event hook logs contain the following attributes:

Key Type Description Optional
email_id int ID of the associated Email. true

Source IP Address ranges

Rossum will use these source IP addresses for outgoing connections to your services (e.g. when sending requests to a webhook URL):

Europe (Ireland):

Europe 2 (Frankfurt):

US (N. Virginia):

JP (Tokyo):

You can use the list to limit incoming connections on a firewall. The list may change over time, so please review your configuration at least once every three months.
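On the receiving side, the check can also be done in application code with Python's ipaddress module. The networks below are RFC 5737 documentation ranges standing in for the Rossum ranges; replace them with the ranges for your region.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), NOT Rossum's real ranges
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_allowed_source(remote_addr):
    """Return True if the request's source IP falls into one of the allowed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address
    return any(address in network for network in ALLOWED_NETWORKS)
```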

If you have a customer-specific deployment, contact Rossum support for a specific IP list.

Automation

All imported documents are processed by the data extraction process to obtain values of fields specified in the schema. Extracted values are then available for validation in the UI.

Using per-queue automation settings, it is possible to skip the manual validation step in the UI and automatically switch the document to the confirmed state or proceed with its export. Whether the document is exported or switched to the confirmed state depends on the Queue settings.

Currently, there are three levels of automation:

Read more about the Automation framework on our developer hub.

Sources of field validation

Low-confidence fields are marked in the UI by an "eye" icon; we consider them not validated. On the API level, they have an empty validation_sources list.

Validation of a field may come from various sources: data extraction confidence above a threshold, computation of various checksums (e.g. VAT rate, net amount and gross amount) or a human review. These validations are recorded in the validation_sources list. The data extraction confidence threshold may be adjusted; see validation sources for details.

AI Confidence Scores

While there are multiple ways to automatically pre-validate fields, the most prominent one is score-based validation based on AI Core Engine confidence scores.

The confidence score predicted for each AI-extracted field is stored in the rir_confidence attribute. The score is a number between 0 and 1 and is calibrated so that it corresponds to the probability that a given value is correct. In other words, a field with score 0.80 is expected to be correct 4 out of 5 times.

The score_threshold attribute (set on the queue, or individually per datapoint in the schema; the default is 0.8) represents the minimum score that triggers automatic validation. Because scores are calibrated probabilities, the threshold directly bounds the achieved accuracy: with a threshold of 0.8, each automatically validated field has an expected error rate of at most 20%.
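To make the threshold arithmetic concrete, the sketch below applies a queue-level threshold with optional per-schema-id overrides to (id, schema_id, rir_confidence) tuples taken from the connector example data, and sums 1 - score over the validated fields, which is the expected number of errors if the scores are calibrated. It illustrates the rule only and is not Rossum's implementation; fields with a null rir_confidence are never score-validated here.

```python
def score_validated(datapoints, queue_threshold=0.8, overrides=None):
    """Return (ids validated by score, expected number of errors among them).

    datapoints: iterable of (id, schema_id, rir_confidence) tuples.
    overrides: optional {schema_id: score_threshold} set per datapoint in the schema.
    """
    overrides = overrides or {}
    validated, expected_errors = [], 0.0
    for dp_id, schema_id, score in datapoints:
        threshold = overrides.get(schema_id, queue_threshold)
        if score is not None and score >= threshold:
            validated.append(dp_id)
            # A calibrated score s means a (1 - s) chance that the value is wrong
            expected_errors += 1 - score
    return validated, expected_errors


fields = [
    ("197467", "invoice_number", 0.97657),
    ("197468", "date_due", 0.98279),
    ("197469", "amount_due", 0.74237),
]
print(score_validated(fields))
```

With the default threshold, amount_due (0.74237) stays unvalidated, and the two validated fields together contribute about 0.04 expected errors.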

Autopilot

Autopilot is an automatic process that removes the "eye" icon from fields. It is based on past occurrences of a field value on documents that have already been processed in the same queue.

Read more about this Automation component on our developer hub.

Autopilot configuration

Default Autopilot configuration

{
  "autopilot": {
    "enabled": true,

    "search_history":{
      "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
      "matching_fields_threshold": 2
    },
    "automate_fields":{
      "rir_field_names": [
        "account_num",
        "bank_num",
        "iban",
        "bic",
        "sender_dic",
        "sender_ic",
        "recipient_dic",
        "recipient_ic",
        "const_sym"
      ],
      "field_repeated_min": 3
    }
  }
}

Autopilot configuration can be modified in Queue.settings, where you can set rules for each queue. Unless Autopilot is explicitly disabled by setting enabled to false, it is enabled.

Configuration is divided into two sections:

History search

This section configures the process of finding documents from the same sender as the document currently being processed. An annotation is considered to be from the same sender if it contains fields with the same rir_field_name and value as the current document.

When at least two fields listed in rir_field_names match the values of the current document, the document is considered to have the same sender

{
  "search_history":{
    "rir_field_names": ["sender_ic", "sender_dic", "account_num"],
    "matching_fields_threshold": 2
  }
}
Attribute Type Description
rir_field_names list List of rir_field_names used to find annotations from the same sender. This should contain fields which are unique for each sender. For example sender_ic or sender_dic.
Please note that due to technical reasons it is not possible to use document_type in this field and it will be ignored.
matching_fields_threshold int At least matching_fields_threshold fields must match the current annotation for it to be considered from the same sender. See the example above.

Automate fields

This section describes rules that are applied to the annotations found in the previous step, History search. A field has its "eye" icon removed if at least field_repeated_min fields with the same rir_field_name and value were found on the documents from the History search step.

Attribute Type Description
rir_field_names list List of rir_field_names which can be validated based on past occurrence
field_repeated_min int Number of times field must be repeated in order to be validated

If any configuration section is missing, the default value shown in the default Autopilot configuration above is applied.
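The two-step rule can be illustrated with a short Python sketch. It represents each document as a plain dict mapping rir_field_name to value (a simplification; real annotations are trees), fills in missing sections from the default configuration above, and returns the rir_field_names whose "eye" icon would be removed. It illustrates the documented rule only and is not Rossum's implementation.

```python
DEFAULTS = {
    "search_history": {
        "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
        "matching_fields_threshold": 2,
    },
    "automate_fields": {
        "rir_field_names": ["account_num", "bank_num", "iban", "bic", "sender_dic",
                            "sender_ic", "recipient_dic", "recipient_ic", "const_sym"],
        "field_repeated_min": 3,
    },
}


def same_sender(current, past, search):
    """History search: enough listed fields share a value with the current document."""
    matching = sum(
        1 for name in search["rir_field_names"]
        if current.get(name) is not None and current.get(name) == past.get(name)
    )
    return matching >= search["matching_fields_threshold"]


def autopilot_fields(current, history, config):
    """Return the rir_field_names that Autopilot would validate for `current`."""
    if not config.get("enabled", True):
        return set()
    search = config.get("search_history", DEFAULTS["search_history"])
    automate = config.get("automate_fields", DEFAULTS["automate_fields"])
    senders = [doc for doc in history if same_sender(current, doc, search)]
    validated = set()
    for name in automate["rir_field_names"]:
        value = current.get(name)
        if value is None:
            continue
        # Automate fields: the same value was seen often enough for this sender
        repeated = sum(1 for doc in senders if doc.get(name) == value)
        if repeated >= automate["field_repeated_min"]:
            validated.add(name)
    return validated
```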

Using Triggers

Trigger REST operations can be found here.

When an event occurs, all triggers of that type will perform actions of their related objects:

Related object Action Description
Email template Send email with the template to the event triggerer if automate=true Automatically respond to document vendors based on the document's content. The document has to come from an email
Delete recommendations Stop automation if one of the validation rules applies to the processed document Based on the user's rules for delete recommendations, stop automation for the document which applies to these rules. The document requires manual evaluation

Trigger Event Types

Trigger objects can have one of the following event types:

Trigger Event type Description (Trigger for an event of)
email_with_no_processable_attachments An Email has been received without any processable attachments
annotation_created Processing of the Annotation started (Rossum received the Annotation)
annotation_imported Annotation data have been extracted by Rossum
annotation_confirmed Annotation was checked and confirmed by user (or automated)
annotation_exported Annotation was exported
validation Document is being validated

Trigger Events Occurrence Diagram

The following diagram gives an overview of the trigger events and when they occur.

Trigger Events

Trigger Condition

Simple condition validating the presence of vendor_id equal to Meat ltd.

{
  "$and": [
    {
      "field.vendor_id": {
        "$and": [
          {"$exists": true},
          {"$regex": "Meat ltd\\."}
        ]
      }
    }
  ]
}

Any required field is missing

{
  "$and": [
    {"required_field_missing": true}
  ]
}

At least one of the iban, date_due, and sender_vat_id fields is missing

{
  "$and": [
    {
      "missing_fields": {
        "$elemMatch": {
          "$in": ["iban", "date_due", "sender_vat_id"]
        }
      }
    }
  ]
}

Will match if a required field is missing in the annotation and the annotation contains a vendor_id field with a value that matches the Milk( inc\.)? regex. In other words, the trigger activates if the Milk company sent us an invoice with missing data.

{
  "$and": [
    {
      "field.vendor_id": {
        "$and": [
          {"$exists": true},
          {"$regex": "Milk( inc\\.)?"}
        ]
      }
    },
    {"required_field_missing": true}
  ]
}

Will match if at least one of the document_type (Receipt, Other), language (CZ, EN, CH), or currency (USD, CZK) fields matches.

{
  "$or": [
    {
      "field.document_type": {
          "$in": ["Receipt", "Other"]
      }
    },
    {
      "field.language": {
          "$in": ["CZ", "EN", "CH"]
      }
    },
    {
      "field.currency": {
          "$in": ["CZK", "USD"]
      }
    }
  ]
}

Will match if the filename matches the specified regular expression.

{
  "$or": [
    {
      "filename": {"$regex": "Milk( inc\\.)?"}
    }
  ]
}

Will match if the filename matches one of the specified regular expressions.

{
  "$or": [
    {
      "filename": {
        "$or": [
          {"$regex": "Milk( inc\\.)?"},
          {"$regex": "Barn( inc\\.)?"}
        ]
      }
    }
  ]
}

Will match if the number of pages in the processed document is higher than the specified threshold.

{
  "$or": [
    {
      "number_of_pages": {
        "$gt": 10
      }
    }
  ]
}

Trigger conditions use a subset of the MongoDB Query Language (MQL). Behind the scenes, the annotation is converted into JSON records, and the trigger is activated if at least one such record matches the condition according to the MQL query rules. A null condition matches any record, just like {}. Record format:

   {
     "field": {
       "{schema_id}": string | null,
     },
     "required_field_missing": boolean,
     "missing_fields": string[],
   }

Supported MQL subset based on the trigger event type:

All trigger event types:

   {}

Only annotation_imported, annotation_confirmed, and annotation_exported trigger event types:

   {
     "$and": [
       {"field.{schema_id}": {"$and": [{"$exists": true}, REGEX]}}
     ]
   }

Only annotation_imported trigger event type:

   {
     "$and": [
       {"field.{schema_id}": {"$and": [{"$exists": true}, REGEX]}},
       {"required_field_missing": true},
       {"missing_fields": {"$elemMatch": {"$in": list[str[schema_id]]}}}
     ]
   }

Only validation trigger event type:

   {
     "$or": [
       {"field.document_type": {"$in": list[str[document_type]]}},
       {"field.language": {"$in": list[str[language]]}},
       {"field.currency": {"$in": list[str[currency]]}},
       {"number_of_pages": {"$gt": 10}},
       {"filename": REGEX}
     ]
   }
   {
     "$or": [
       {"field.document_type": {"$in": list[str[document_type]]}},
       {"field.language": {"$in": list[str[language]]}},
       {"field.currency": {"$in": list[str[currency]]}},
       {"number_of_pages": {"$gt": 10}},
       {"filename": {"$or": [REGEX, REGEX]}}
     ]
   }
Field Required Description
field.{schema_id} A field contained in the annotation data. The schema_id is the schema ID under which the field was extracted.
required_field_missing Whether any of the schema-required fields is missing. (*) Cannot be combined with missing_fields.
missing_fields At least one of the specified schema fields is missing. (*) Cannot be combined with required_field_missing.
field.{validation_field} A field containing a list of Delete Recommendation data. The validation_field is the schema ID under which it was extracted.
number_of_pages A threshold value for the number of pages. A document with more pages is matched by the trigger.
filename The document filename to match, given either as REGEX or as an $or of REGEX expressions.
REGEX true Either {"$regex": re2} or {"$not": {"$regex": re2}} (**). Uses re2 regex syntax.

(*) A field is considered missing if the extraction engine did not extract any value for it with an rir_confidence score of at least 0.95.

(**) The $not option for REGEX is not valid for the validation trigger.
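The (*) rule and the annotation_imported operators can be modelled with a short Python sketch. The input shape is a hypothetical simplification: a mapping from schema ID to the rir_confidence of the extracted value, or None when nothing was extracted. How Rossum builds the real records is internal.

```python
from typing import Dict, Iterable, List, Optional

CONFIDENCE_THRESHOLD = 0.95  # "at least 0.95" per the (*) footnote


def missing_fields(confidences: Dict[str, Optional[float]]) -> List[str]:
    """Schema IDs with no extracted value at rir_confidence >= 0.95.

    `confidences` is a hypothetical mapping: schema_id -> best rir_confidence,
    or None when the extraction engine returned no value for that field.
    """
    return sorted(
        schema_id
        for schema_id, score in confidences.items()
        if score is None or score < CONFIDENCE_THRESHOLD
    )


def any_missing(missing: List[str], schema_ids: Iterable[str]) -> bool:
    """Model of {"missing_fields": {"$elemMatch": {"$in": schema_ids}}}.

    Passing the schema-required IDs instead models required_field_missing.
    """
    wanted = set(schema_ids)
    return any(schema_id in wanted for schema_id in missing)


scores = {"invoice_id": 0.99, "due_date": 0.80, "iban": None, "total": 0.95}
missing = missing_fields(scores)
print(missing)  # ['due_date', 'iban']
print(any_missing(missing, ["iban", "vat_id"]))  # True
```

A score of exactly 0.95 counts as extracted, which is why total is not reported as missing.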

Triggering Email Templates

Email template REST operations can be found here.

To set up email template trigger automation, link an email template object to a trigger object and set its automate attribute to true. Currently, only one trigger can be linked. To set up the recipient(s) of the automated emails, you can use built-in placeholders or direct values in the to, cc, and bcc fields in email templates.

Only some email template types and some trigger event types can be linked together:

Template type Allowed trigger events
custom *
email_with_no_processable_attachments email_with_no_processable_attachments
rejection annotation_imported
rejection_default annotation_imported

Email templates of type rejection and rejection_default will also reject the associated annotation when triggered.

Every newly created queue comes with default email templates. Some of them have a trigger linked, including the email template of type email_with_no_processable_attachments, whose trigger cannot be unlinked or replaced with another trigger. To disable its automation, set its automate attribute to false.
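The email template setup described above could be scripted roughly as follows. This Python sketch only builds the request and does not send it. The /email_templates/{id} path, the use of PATCH, and the triggers attribute name are assumptions that you should check against the email template reference. This section only confirms the automate attribute. The host, IDs and token are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL


def automate_template_request(template_id: int, trigger_url: str,
                              token: str) -> urllib.request.Request:
    """Build a PATCH that links one trigger and enables automation.

    ASSUMPTION: the endpoint path and the `triggers` attribute name are
    not confirmed by this section; only `automate` is.
    """
    body = {
        "triggers": [trigger_url],  # currently only one trigger can be linked
        "automate": True,
    }
    return urllib.request.Request(
        f"{BASE_URL}/email_templates/{template_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PATCH",
    )
```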

Triggering Validation

Delete Recommendation REST operations can be found here.

To set up validation trigger automation, specify the validation rules and set the enabled attribute of the Delete Recommendation object to true.

This trigger is only valid for the validation trigger event.

Hooks and Triggers Workflow

To find out which triggers and hooks are run, and when, see the following workflow.

Hook and Trigger Events Order

Workflows

This feature must be explicitly enabled in queue settings.

Approval workflows

Approval workflows allow you to define a multi-step approval process.

The workflow starts when the data extraction process is done (the annotation is confirmed). At that point, the annotation enters the in_workflow status. The annotation must then be approved by the defined approvers before it can move further (to the confirmed or exported status).

The annotation is moved to the rejected status if any of the assignees rejects it.

The current status of the workflow is stored in the workflow run object. All events that happen during the workflow can be tracked via workflow activity resources.

Embedded Mode

In some use cases, it is desirable to use only the per-annotation validation view of the Rossum application. For this purpose, Rossum can be integrated with other systems using the so-called embedded mode.

In embedded mode, a special URL is constructed and then used in an iframe or a popup browser window to show the Rossum annotation view. Some view navigation widgets (such as the home, postpone, and delete buttons) are hidden, so that the user can only update and confirm field values.

Embedded mode can be used to view annotations only in the to_review, reviewing, postponed, or confirmed status.

Embedded mode workflow

The host application first uploads a document using the standard Rossum API. During this process, an annotation object is created. The host application can then use the annotation endpoint to check the annotation status and wait for it to become to_review (ready for checking).

As soon as the annotation object has finished importing, an authenticated user may call the start_embedded endpoint to obtain a URL to be included in an iframe or popup browser window of the host application. The call accepts the return_url and cancel_url parameters. The browser is redirected to these URLs when the user finishes or cancels the annotation.

The URL contains a security token that the embedded Rossum application uses to access the Rossum API. When the document check is finished, the user clicks the done button, and the host application is notified about the finished annotation through the save endpoint of the connector HTTP API. By default, this call is made asynchronously, which causes a lag (up to a few seconds) between the click on the done button and the call to the save endpoint. To make the calls synchronous, set the connector's asynchronous toggle to false (see connector for reference).
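The embedded mode workflow above can be sketched in Python using only the standard library. The host example.rossum.app and the token are placeholders. The polling interval and timeout are arbitrary choices, and error handling is omitted. Treat this as an outline of the call order rather than a finished client.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL


def fetch_status(annotation_id: int, token: str) -> str:
    """GET /annotations/{id} and return its status (performs a network call)."""
    request = urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["status"]


def wait_for_status(get_status: Callable[[], str], wanted: str = "to_review",
                    timeout_s: float = 300.0, interval_s: float = 2.0) -> str:
    """Poll `get_status` until it returns `wanted` or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == wanted:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"annotation still in status {status!r}")
        time.sleep(interval_s)


def start_embedded_request(annotation_id: int, token: str, return_url: str,
                           cancel_url: str) -> urllib.request.Request:
    """Build (but do not send) POST /annotations/{id}/start_embedded."""
    body = {"return_url": return_url, "cancel_url": cancel_url}
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/start_embedded",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

In a real integration, you would:

1. Call wait_for_status(lambda: fetch_status(319668, token)).
2. Send the request from start_embedded_request with urllib.request.urlopen.
3. Open the url key of the JSON response in the iframe or popup.

A production version should also stop polling when the import fails, instead of waiting for the timeout.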

API Reference

For an introduction to the Rossum API, see Overview.

Most API endpoints require the user to be authenticated; see Authentication for details.

Annotation

Example annotation object

{
  "document": "https://<example>.rossum.app/api/v1/documents/314628",
  "id": 314528,
  "queue": "https://<example>.rossum.app/api/v1/queues/8199",
  "schema": "https://<example>.rossum.app/api/v1/schemas/95",
  "relations": [],
  "pages": [
    "https://<example>.rossum.app/api/v1/pages/558598"
  ],
  "creator": "https://<example>.rossum.app/api/v1/users/1",
  "modifier": null,
  "modified_by": null,
  "assigned_at": null,
  "created_at": "2021-04-26T10:08:03.856648Z",
  "confirmed_at": null,
  "deleted_at": null,
  "exported_at": null,
  "export_failed_at": null,
  "modified_at": null,
  "purged_at": null,
  "rejected_at": null,
  "confirmed_by": null,
  "deleted_by": null,
  "exported_by": null,
  "purged_by": null,
  "rejected_by": null,
  "status": "to_review",
  "rir_poll_id": "54f6b9ecfa751789f71ddf12",
  "messages": null,
  "url": "https://<example>.rossum.app/api/v1/annotations/314528",
  "content": "https://<example>.rossum.app/api/v1/annotations/314528/content",
  "time_spent": 0,
  "metadata": {},
  "related_emails": [],
  "email": "https://<example>.rossum.app/api/v1/emails/96743",
  "automation_blocker": null,
  "email_thread": "https://<example>.rossum.app/api/v1/email_threads/34567",
  "has_email_thread_with_replies": true,
  "has_email_thread_with_new_replies": false,
  "organization": "https://<example>.rossum.app/api/v1/organizations/1",
  "prediction": null,
  "assignees": [],
  "labels": []
}

An annotation object contains all extracted and verified data related to a document. Every document belongs to a queue and is related to a schema object that defines datapoint types and the overall shape of the extracted data.

Annotation instances are typically created through the queue upload endpoint.

Attribute Type Default Description Read-only
id integer Id of the annotation true
url URL URL of the annotation true
status enum Status of the document, see Document Lifecycle for the list of values.
document URL Related document.
queue URL Queue that annotation belongs to.
schema URL Schema that defines content shape.
relations list[URL] (Deprecated) List of relations that annotation belongs to.
pages list[URL] List of rendered pages. true
creator URL User that created the annotation. true
created_at datetime Timestamp of object's creation. true
modifier URL User that last modified the annotation.
modified_by URL User that last modified the annotation.
modified_at datetime Timestamp of last modification. true
assigned_at datetime Timestamp of the last assignment to a user, or of when annotating the document started. true
confirmed_at datetime Timestamp when the annotation was moved to status confirmed. true
deleted_at datetime Timestamp when the annotation was moved to status deleted. true
exported_at datetime Timestamp of finished export. true
export_failed_at datetime Timestamp of failed export. true
purged_at datetime Timestamp when the annotation was purged. true
rejected_at datetime Timestamp when the annotation was moved to status rejected. true
confirmed_by URL User that confirmed the annotation. true
deleted_by URL User that deleted the annotation. true
exported_by URL User that exported the annotation. true
purged_by URL User that purged the annotation. true
rejected_by URL User that rejected the annotation. true
rir_poll_id string Internal.
messages list[object] [] List of messages from the connector (save).
content URL Link to annotation data (datapoint values), see Annotation data. true
suggested_edit URL Link to Suggested edit object. true
time_spent float 0 Total time spent while validating the annotation.
metadata object {} Client data.
automated boolean false Whether the annotation was automated.
related_emails list[URL] List of emails related to the annotation. true
email URL Related email that the annotation was imported by (for annotations imported by email). true
automation_blocker URL Related automation blocker object. true
email_thread URL Related email thread object. true
has_email_thread_with_replies bool Related email thread contains more than one incoming email. true
has_email_thread_with_new_replies bool Related email thread contains an unread incoming email. true
organization URL Link to related organization. true
automatically_rejected bool Whether the annotation was automatically rejected. true
prediction object Internal. true
assignees list[URL] List of assigned users (only for internal purposes). true
labels list[URL] List of selected labels true
restricted_access bool false Whether access to the annotation is restricted. true

Start annotation

Start annotation of object 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/start'
{
  "annotation": "https://<example>.rossum.app/api/v1/annotations/319668",
  "session_timeout": "01:00:00"
}

POST /v1/annotations/{id}/start

Starts reviewing the annotation by the calling user. The request can include a statuses payload listing the statuses the annotation is allowed to be in when it is started. Returns 409 Conflict if the annotation is not in one of the specified statuses.

Attribute Type Default Description required
statuses list[str] ["to_review", "reviewing", "postponed", "confirmed"] List of statuses the annotation is allowed to be in when it is started false

Response

Status: 200

Returns an object with annotation and session_timeout keys.
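The following minimal Python sketch calls the start endpoint with a statuses restriction. It treats 409 Conflict as "not startable" rather than as an error. The host and token are placeholders, and only standard-library calls are used.

```python
import json
import urllib.error
import urllib.request
from typing import Optional, Sequence

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL


def start_request(annotation_id: int, token: str,
                  statuses: Optional[Sequence[str]] = None) -> urllib.request.Request:
    """Build POST /annotations/{id}/start, optionally restricting allowed statuses."""
    body = {} if statuses is None else {"statuses": list(statuses)}
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def try_start(request: urllib.request.Request) -> Optional[dict]:
    """Send the request; return the JSON body, or None on 409 Conflict."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)  # {"annotation": ..., "session_timeout": ...}
    except urllib.error.HTTPError as error:
        if error.code == 409:  # annotation is not in an allowed status
            return None
        raise
```

For example, try_start(start_request(319668, token, ["to_review"])) returns None instead of raising if someone else is already reviewing the annotation.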

Start embedded annotation

Start embedded annotation of object 319668

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"return_url": "https://service.com/return", "cancel_url": "https://service.com/cancel"}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/start_embedded'
{
  "url": "https://<example>.rossum.app/embedded/document/319668#authToken=1c50ae8552441a2cda3c360c1e8cb6f2d91b14a9"
}

POST /v1/annotations/{id}/start_embedded

Start embedded annotation.

Key Description Required
return_url URL browser is redirected to in case of successful user validation No
cancel_url URL browser is redirected to in case of user canceling the annotation No
postpone_url URL browser is redirected to in case of user postponing the annotation No
delete_url URL browser is redirected to in case of user deleting the annotation No
max_token_lifetime_s Duration (in seconds) for which the token will be valid (default: queue's session_timeout, max: 162 hours) No

Response

Status: 200

Returns an object with url, the URL to be used in the browser iframe/popup window. The URL includes a token that is valid only for this document and only for a limited period of time.
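The optional start_embedded keys can be assembled and checked on the client before sending. This Python sketch drops unset keys so that the server defaults apply. It also rejects a max_token_lifetime_s above the documented 162-hour limit (583,200 seconds). The helper name is hypothetical.

```python
from typing import Optional

MAX_TOKEN_LIFETIME_S = 162 * 60 * 60  # documented maximum: 162 hours


def embedded_payload(return_url: Optional[str] = None,
                     cancel_url: Optional[str] = None,
                     postpone_url: Optional[str] = None,
                     delete_url: Optional[str] = None,
                     max_token_lifetime_s: Optional[int] = None) -> dict:
    """Build the JSON body for start_embedded.

    All keys are optional; omitted keys fall back to server defaults
    (for the token lifetime, the queue's session_timeout).
    """
    if max_token_lifetime_s is not None and max_token_lifetime_s > MAX_TOKEN_LIFETIME_S:
        raise ValueError(f"max_token_lifetime_s exceeds {MAX_TOKEN_LIFETIME_S} s (162 h)")
    candidates = {
        "return_url": return_url,
        "cancel_url": cancel_url,
        "postpone_url": postpone_url,
        "delete_url": delete_url,
        "max_token_lifetime_s": max_token_lifetime_s,
    }
    return {key: value for key, value in candidates.items() if value is not None}


print(embedded_payload(return_url="https://service.com/return", max_token_lifetime_s=3600))
# {'return_url': 'https://service.com/return', 'max_token_lifetime_s': 3600}
```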

Create embedded URL for annotation

Create embedded URL for annotation object 319668

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"return_url": "https://service.com/return", "cancel_url": "https://service.com/cancel"}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/create_embedded_url'
{
  "url": "https://<example>.rossum.app/embedded/document/319668#authToken=1c50ae8552441a2cda3c360c1e8cb6f2d91b14a9",
  "status": "exported"
}

POST /v1/annotations/{id}/create_embedded_url

Similar to the start embedded annotation endpoint, but it can be called for annotations in any status and does not switch the status.

Key Description Required
return_url URL browser is redirected to in case of successful user validation No
cancel_url URL browser is redirected to in case of user canceling the annotation No
postpone_url URL browser is redirected to in case of user postponing the annotation No
delete_url URL browser is redirected to in case of user deleting the annotation No
max_token_lifetime_s Duration (in seconds) for which the token will be valid (default: queue's session_timeout, max: 162 hours) No

Response

Status: 200

Key Type Description
url str URL to be used in the browser iframe/popup window. The URL includes a token that is valid only for this document and only for a limited period of time.
status enum Status of annotation, see annotation lifecycle.
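The returned url carries the access token in its fragment, as in the example responses above, so treat it as a secret. This Python sketch assumes the #authToken=... layout shown in those examples. It extracts the token and produces a redacted copy of the URL that is safe to log.

```python
from urllib.parse import parse_qs, urlsplit


def embedded_token(url: str) -> str:
    """Return the authToken carried in the URL fragment (after '#')."""
    tokens = parse_qs(urlsplit(url).fragment).get("authToken")
    if not tokens:
        raise ValueError("URL has no authToken fragment")
    return tokens[0]


def redact_embedded_url(url: str) -> str:
    """Mask the token so the URL can be written to logs."""
    return urlsplit(url)._replace(fragment="authToken=***").geturl()


url = ("https://example.rossum.app/embedded/document/319668"
       "#authToken=1c50ae8552441a2cda3c360c1e8cb6f2d91b14a9")
print(redact_embedded_url(url))
# https://example.rossum.app/embedded/document/319668#authToken=***
```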

Confirm annotation

Confirm annotation of object 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/confirm'

POST /v1/annotations/{id}/confirm

Confirms the annotation and switches its status to exported (or exporting). If the confirmed state is enabled, this call moves the annotation to the confirmed status instead.

Key Default Description Required
skip_workflows False Whether to skip workflows evaluation. Read more about workflows here. To use this feature, bypass_workflows_allowed must be set to true in the workflows queue settings. No

Confirm annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

Response

Status: 204

Cancel annotation

Cancel annotation of object 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/cancel'

POST /v1/annotations/{id}/cancel

Cancel annotation, switch its status back to to_review or postponed.

Cancel annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

Response

Status: 204

Approve annotation

Approve annotation of object 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -d '{}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/approve'

POST /v1/annotations/{id}/approve

Approves the annotation. Depending on the evaluation of the current workflow step, its status is switched to exporting or confirmed, or it stays in in_workflow.

Only an admin, an organization group admin, or an assigned user with the approver role can approve an annotation in this state. A workflow activity record object is created.

Response

Status: 200

Key Type Description
status string New status of the annotation

Assign annotation

Assign annotation 319668 to the user 1122

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type: application/json' \
  -d '{"annotations": ["https://<example>.rossum.app/api/v1/annotations/319668"], "assignees": ["https://<example>.rossum.app/api/v1/users/1122"], "note_content": "I just want to reassign as I do not care about it"}' \
  'https://<example>.rossum.app/api/v1/annotations/assign'

POST /v1/annotations/assign

Change assignees of the annotation.

Key Type Description Required Default
annotations list[URL] List of annotations to change the assignees of (currently, only one annotation at a time is supported) yes
assignees list[URL] List of users to be added as annotation assignees yes
note_content string Content of the note that will be added to the workflow activity of action reassign (only applicable for annotation in in_workflow state) no ""

Response

Status: 204
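The following Python sketch builds the same assign request as the curl call above. The host and token are placeholders. The body wraps the single annotation URL in a list because the endpoint currently accepts only one annotation per call.

```python
import json
import urllib.request
from typing import Sequence

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL


def assign_request(annotation_url: str, assignee_urls: Sequence[str], token: str,
                   note_content: str = "") -> urllib.request.Request:
    """Build POST /annotations/assign for a single annotation."""
    body = {
        "annotations": [annotation_url],  # one annotation per call is supported
        "assignees": list(assignee_urls),
    }
    if note_content:  # only applies to annotations in the in_workflow state
        body["note_content"] = note_content
    return urllib.request.Request(
        f"{BASE_URL}/annotations/assign",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```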

Reject annotation

Reject annotation of object 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -d '{"note_content": "Rejected due to invalid due date."}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/reject'

POST /v1/annotations/{id}/reject

Reject annotation, switch its status to rejected.

Key Description Required Default
note_content Rejection note No ""
automatically_rejected For internal use only (designates whether the annotation is displayed as automatically rejected in the statistics) No false

Reject annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

If the annotation is rejected in the in_workflow state, annotation.workflow_run.workflow_status is also set to rejected, and a workflow activity record object is created. Only an admin, an organization group admin, or an assigned user can reject an annotation in this state.

Response

Status: 200

Key Type Description
status string New status of the annotation (rejected).
note URL Link to Note object.

Switch to postponed

Switch annotation status of object 319668 to postponed

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/postpone'

POST /v1/annotations/{id}/postpone

Switch annotation status to postponed.

Postpone annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

Response

Status: 204

Switch to deleted

Switch annotation status of object 319668 to deleted

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/delete'

POST /v1/annotations/{id}/delete

Switch annotation status to deleted. An annotation with status deleted is still available in the Rossum UI.

Delete annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

Response

Status: 204
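The status-switching endpoints above (confirm, cancel, approve, reject, postpone and delete) share the same request shape, so a client can build them with one helper. In this Python sketch, the host and token are placeholders. A JSON body is always sent; when none is given, it is an empty object, as in the approve example. The ACTIONS table records the success status codes listed above. Approve and reject return JSON, while the others return no content.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL

# Action endpoints documented above, mapped to their success status codes.
ACTIONS = {"confirm": 204, "cancel": 204, "approve": 200,
           "reject": 200, "postpone": 204, "delete": 204}


def action_request(annotation_id: int, action: str, token: str,
                   body: Optional[dict] = None) -> urllib.request.Request:
    """Build POST /annotations/{id}/{action} with an optional JSON payload."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/{action}",
        data=json.dumps(body or {}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

For example, action_request(319668, "reject", token, {"note_content": "Rejected due to invalid due date."}) builds the same call as the reject curl example.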

Rotate the annotation

Rotate the annotation 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"rotation_deg": 270}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/rotate'

POST /v1/annotations/{id}/rotate

Rotate a document. It requires one parameter: rotation_deg.

The status of the annotation is switched to importing, and the extraction phase starts over. After the new extraction, the value of the rotation_deg field is copied to the rotation_deg field of the pages.

Key Description
rotation_deg States degrees by which the document shall be rotated. Possible values: 0, 90, 180, 270.

Response

Status: 204
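Because rotation_deg accepts only 0, 90, 180 and 270, a client may want to normalize other multiples of 90 before calling the endpoint. The following is a minimal Python sketch. The normalization is a client-side convenience, not something the API does. The host and token are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://example.rossum.app/api/v1"  # placeholder organization URL


def normalize_rotation(degrees: int) -> int:
    """Map any multiple of 90 (e.g. -90 or 450) onto 0, 90, 180 or 270."""
    if degrees % 90 != 0:
        raise ValueError(f"rotation must be a multiple of 90, got {degrees}")
    return degrees % 360


def rotate_request(annotation_id: int, degrees: int,
                   token: str) -> urllib.request.Request:
    """Build POST /annotations/{id}/rotate with a valid rotation_deg."""
    body = {"rotation_deg": normalize_rotation(degrees)}
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/rotate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```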

Edit the annotation

Edit the annotation 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"documents": [{"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/1", "rotation_deg": 90}, {"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 90}], "metadata": {"document": {"my_info": "something I want to store here"}, "annotation": {"some_key": "some value"}}}, {"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 180}]}]}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/edit'
{
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320551",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320552",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/320222"
    }
  ]
}

POST /v1/annotations/{id}/edit

Edits a document. The documents parameter is required. It describes the annotations that should be created from the original annotation; each entry contains a list of pages and their rotation degrees.

If the edit leaves only one document, the original annotation is edited. If the call creates multiple documents, the status of the original annotation is switched to split, the newly created annotations get the importing status, and the extraction phase starts over. To split an annotation into multiple annotations, consider using the newer dedicated split endpoint instead.

Key Description
documents Documents that should be created from the original annotation. Each document contains list of pages and rotation degree.

Each object in documents consists of the following parameters:

Key Type Description
pages list[object] A list of objects containing information about page (URL) and rotation_deg (integer)
metadata object (optional) A dictionary with attributes document and annotation for adding/updating metadata of edited annotation and its related document.

Response

Status: 200

Returns results with a list of objects:

Key Type Description
document URL URL to the document that was newly created after calling the edit endpoint.
annotation URL URL of the annotation assigned to the document.
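The nested documents body is easy to get wrong by hand. This Python sketch builds it from plain (page URL, rotation) tuples and reproduces the body of the curl example above, with the placeholder host example.rossum.app. The helper name and the index-based metadata mapping are illustrative choices.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Page = Tuple[str, int]  # (page URL, rotation_deg)


def edit_body(groups: Sequence[Sequence[Page]],
              metadata: Optional[Dict[int, dict]] = None) -> dict:
    """Build the body for POST /annotations/{id}/edit.

    Each group of pages becomes one resulting document. `metadata` optionally
    maps a group index to {"document": {...}, "annotation": {...}}.
    """
    metadata = metadata or {}
    documents: List[dict] = []
    for index, pages in enumerate(groups):
        document: dict = {
            "pages": [{"page": url, "rotation_deg": deg} for url, deg in pages]
        }
        if index in metadata:
            document["metadata"] = metadata[index]
        documents.append(document)
    return {"documents": documents}


P = "https://example.rossum.app/api/v1/pages/"
body = edit_body(
    [[(P + "1", 90), (P + "2", 90)], [(P + "2", 180)]],
    metadata={0: {"document": {"my_info": "something I want to store here"},
                  "annotation": {"some_key": "some value"}}},
)
```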

Split the annotation

Split the annotation 319668

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"documents": [{"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/1", "rotation_deg": 90}, {"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 90}], "metadata": {"document": {"my_info": "something I want to store here"}, "annotation": {"some_key": "some value"}}}]}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/split'
{
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320551",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
    }
  ]
}

POST /v1/annotations/{id}/split

Splits a document based on editing rules. The documents parameter is required. It describes the annotations that should be created from the original annotation; each entry contains a list of pages and their rotation degrees.

When this endpoint is used, the status of the original annotation is switched to split, the newly created annotations get the importing status, and the extraction phase starts over.

This endpoint can also be used to split annotations from a webhook listening to the annotation_content.initialize event and action.

Key Description
documents Documents that should be created from the original annotation. Each document contains list of pages and rotation degree.

Each object in documents consists of the following parameters:

Key Type Description
pages list[object] A list of objects containing information about page (URL) and rotation_deg (integer)
metadata object (optional) A dictionary with attributes document and annotation for adding/updating metadata of edited annotation and its related document.

Split annotation can optionally accept time spent data as described in annotation time spent, for internal use only.

Response

Status: 200

Returns results with a list of objects:

Key Type Description
document URL URL to the document that was newly created after calling the split endpoint.
annotation URL URL of the annotation assigned to the document.
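A common split pattern is cutting a page sequence at known page boundaries. This hypothetical Python helper builds the documents body for the split endpoint from 0-based start indices. Every page gets the same rotation. The helper only prepares the payload and does not call the API.

```python
from typing import List, Sequence


def split_body(page_urls: Sequence[str], starts: Sequence[int],
               rotation_deg: int = 0) -> dict:
    """Build a split `documents` body that cuts the pages into consecutive runs.

    `starts` holds 0-based indices where a new document begins; index 0 is
    always implied. Every page gets the same rotation_deg.
    """
    if not page_urls:
        raise ValueError("no pages to split")
    if any(not 0 <= i < len(page_urls) for i in starts):
        raise ValueError("split indices must lie within the page list")
    bounds = sorted(set(starts) | {0}) + [len(page_urls)]
    documents: List[dict] = []
    for begin, end in zip(bounds, bounds[1:]):
        documents.append({
            "pages": [{"page": url, "rotation_deg": rotation_deg}
                      for url in page_urls[begin:end]]
        })
    return {"documents": documents}
```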

Edit pages Start

Start splitting the document and all its child documents.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/start'
{
    "parent_annotation": "http://<example>.rossum.app/api/v1/annotations/111",
    "children": [
        {
            "url": "http://<example>.rossum.app/api/v1/annotations/120",
            "queue": "http://<example>.rossum.app/api/v1/queues/1",
            "status": "reviewing",
            "started": true,
            "original_file_name": "large_4.pdf",
            "parent_pages": [
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/142",
                    "rotation_deg": 0
                },
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/143",
                    "rotation_deg": 0
                },
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/144",
                    "rotation_deg": 0
                }
            ]
        },
        {
            "url": "http://<example>.rossum.app/api/v1/annotations/119",
            "queue": "http://<example>.rossum.app/api/v1/queues/1",
            "status": "reviewing",
            "started": true,
            "original_file_name": "large_3.pdf",
            "parent_pages": [
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/139",
                    "rotation_deg": 0
                },
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/140",
                    "rotation_deg": 0
                },
                {
                    "page": "http://<example>.rossum.app/api/v1/pages/141",
                    "rotation_deg": 0
                }
            ]
        }
    ],
    "session_timeout": "01:00:00"
}

POST /v1/annotations/{id}/edit_pages/start

Starts editing the annotation and all its child documents (the documents into which the original document was split). The parent annotation must be in the to_review, split, or reviewing state (for the calling user). This call "locks" the parent and child annotations so that they cannot be edited by others. It returns basic information about the parent annotation and a list of its children. Children to which the current user does not have rights contain only limited information. If the parent annotation cannot be locked, an error is returned. If a child annotation cannot be locked, it is skipped and returned in the response with started set to false.

Response

Status: 200

Returns an object with the following keys:

Key Type Description
parent_annotation URL URL of annotation
children list[object] List of child annotation objects
session_timeout string Timeout in the format "HH:MM:SS"

Each object in children has the following keys:

Key Type Description
url URL URL of the annotation
queue URL URL of the queue
status string Status of the child annotation
started boolean Whether the annotation was started
original_file_name string File name of the original document
parent_pages list[object] List of the parent document's pages that belong to this annotation, with their rotation.

Each object in parent_pages has the following keys:

Key Type Description
page URL URL of the page
rotation_deg integer Rotation in degrees

Status: 403

The user does not have permission to edit the parent annotation.
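After edit_pages/start, a client typically needs to know which children it actually locked and how long the session lasts. The following minimal Python sketch works on the response shape documented above. It assumes every child entry includes at least url, even when the user has only limited rights to that child.

```python
from typing import List, Tuple


def session_seconds(timeout: str) -> int:
    """Convert a "HH:MM:SS" session_timeout string into seconds."""
    hours, minutes, seconds = (int(part) for part in timeout.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def partition_children(response: dict) -> Tuple[List[str], List[str]]:
    """Split child annotation URLs into (locked, skipped) using `started`."""
    locked: List[str] = []
    skipped: List[str] = []
    for child in response.get("children", []):
        # started=false means the child could not be locked and was skipped
        (locked if child.get("started") else skipped).append(child["url"])
    return locked, skipped


response = {
    "parent_annotation": "http://example.rossum.app/api/v1/annotations/111",
    "children": [
        {"url": "http://example.rossum.app/api/v1/annotations/120", "started": True},
        {"url": "http://example.rossum.app/api/v1/annotations/119", "started": False},
    ],
    "session_timeout": "01:00:00",
}
locked, skipped = partition_children(response)
print(len(locked), len(skipped), session_seconds(response["session_timeout"]))  # 1 1 3600
```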

Edit pages Cancel

Cancel splitting the document and its child documents.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/cancel' -d \
  '{"annotations": ["http://<example>.rossum.app/api/v1/annotations/119"], "cancel_parent": false, "processing_duration": {"time_spent": 10.0}}'

POST /v1/annotations/{id}/edit_pages/cancel

Cancels multiple started child annotations at once. By default, the parent annotation is cancelled as well (see cancel_parent).

Key Type Description
annotations list[URL] List of urls of child annotations to cancel. Must be in reviewing state.
cancel_parent boolean Cancel parent annotation. Optional, default true.
processing_duration object Optional processing_duration object

Response

Status: 204 on success.

Status: 400 when preconditions are not met.

Edit pages

Split the document and move one of the new child documents to a different queue.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
  '{"edit": [{"parent_pages": [{"page": "http://<example>.rossum.app/api/v1/pages/142", "rotation_deg": 90}]},{"parent_pages": [{"page": "http://<example>.rossum.app/api/v1/pages/141", "rotation_deg": 90}], "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}], "stop_parent": true}'
{
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320551",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320552",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/320222"
    }
  ]
}

Join two child documents (784 and 785, each with one page) into a single new document.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
  '{"edit": [{"parent_pages": [{"page":"https://<example>.rossum.app/api/v1/pages/1088", "rotation_deg": 0}, {"page": "https://<example>.rossum.app/api/v1/pages/1089", "rotation_deg": 0}], "document_name": "joined_pages.pdf"}],"delete": ["https://<example>.rossum.app/api/v1/annotations/784", "https://<example>.rossum.app/api/v1/annotations/785"]}'
{
  "results": [
       {"document": "https://<example>.rossum.app/api/v1/documents/320551","annotation": "https://<example>.rossum.app/api/v1/annotations/786"}
  ]
}

Move one child document to a different queue.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
  '{"move": [{"annotation": "https://<example>.rossum.app/api/v1/annotations/784", "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}]}'
{
  "results": [
       {"document": "https://<example>.rossum.app/api/v1/documents/320551","annotation": "https://<example>.rossum.app/api/v1/annotations/784"}
  ]
}

Delete one child document.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
  '{"delete": ["https://<example>.rossum.app/api/v1/annotations/784"]}'
{
  "results": []
}

POST /v1/annotations/{parent_id}/edit_pages

Edits document pages: splits a document, or re-splits an already split one.

When this endpoint is used (other than for editing an existing split), the status of the original annotation is switched to split, the newly created annotations get the importing status, and the extraction phase starts over.

This endpoint can also be used to split annotations from a webhook listening to the annotation_content.initialize event and action.

Key Type Description
delete list[URL] Optional list of urls of child annotations to delete.
move list[object] Optional list of Move objects.
edit list[object] Optional list of Edit objects.
stop_reviewing list[URL] Optional list of urls of child annotations to stop reviewing. Must be in reviewing state.
stop_parent boolean Stop also parent annotation. Optional, default true.
edit_data_source String Optional source of edit data. Either automation, suggest, modified_suggest or manual.
processing_duration object Optional processing_duration object.

The Move object has the following keys:

Key Type Description
annotation URL URL of annotation.
target_queue URL URL of target queue.

The Edit object has the following keys:

Key Type Description
annotation URL Optional URL of annotation.
target_queue URL Optional URL of target queue.
document_name String Optional document name. When not provided, generated automatically.
parent_pages list[object] List of parent pages with rotation.
metadata object Metadata object. May contain the objects annotation and document, which are saved in the metadata of the created/edited annotation and document, respectively.

The Parent page object has the following keys:

Key Type Description Required Default value
page URL URL of page. yes
rotation_deg int Rotation angle in degrees with a step of 90 degrees no 0

Response

Status: 200 on success.

Returns results with a list of objects:

Key Type Description
document URL URL to the document that was newly created after calling the edit endpoint.
annotation URL URL of the annotation assigned to the document.

Status: 400 when preconditions are not met.

Edit pages in-place

Edit pages of a document and move it to a different queue.

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type: application/json' \
  'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/in_place' -d \
  '{"parent_pages": [{"page": "https://<example>.rossum.app/api/v1/pages/142", "rotation_deg": 90}], "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}'
{
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/2121",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/111"
    }
  ]
}

POST /v1/annotations/{parent_id}/edit_pages/in_place

Edit existing document pages without creating new annotations. You can rotate pages, delete pages, or move the annotation to another queue. This endpoint can be used in embedded mode.

Key Type Description
parent_pages list[object] List of parent pages with rotation.
target_queue URL Optional URL of target queue.
metadata object Optional metadata object. May contain the objects annotation and metadata, which are saved to the metadata of the created/edited annotation and document.
edit_data_source string Optional source of edit data. One of automation, suggest, modified_suggest or manual.
processing_duration object Optional processing_duration object.

The Parent page object has the following keys:

Key Type Description
page URL URL of page.
rotation_deg int Rotation angle in degrees, in steps of 90 degrees.
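
The in-place call can also be prepared from Python. The sketch below only builds the request with the standard library's urllib.request.Request and does not send it. The token, the <example> domain and the page and queue IDs are placeholders taken from the curl example above.

```python
import json
import urllib.request
from typing import List, Optional, Tuple

BASE = "https://<example>.rossum.app/api/v1"  # placeholder domain
TOKEN = "<your-token>"  # placeholder; load real tokens from a secret store


def in_place_request(annotation_id: int,
                     pages: List[Tuple[int, int]],
                     target_queue_id: Optional[int] = None) -> urllib.request.Request:
    """Prepare (without sending) a POST to .../edit_pages/in_place.

    pages is a list of (page_id, rotation_deg) pairs.
    """
    body = {
        "parent_pages": [
            {"page": f"{BASE}/pages/{page_id}", "rotation_deg": rotation}
            for page_id, rotation in pages
        ]
    }
    if target_queue_id is not None:
        body["target_queue"] = f"{BASE}/queues/{target_queue_id}"
    return urllib.request.Request(
        f"{BASE}/annotations/{annotation_id}/edit_pages/in_place",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Mirrors the curl example: rotate page 142 by 90 degrees and move to queue 23.
req = in_place_request(111, [(142, 90)], target_queue_id=23)
# urllib.request.urlopen(req) would send it; deliberately omitted here.
```
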

Response

Status: 200 on success.

Returns results with a list of objects:

Key Type Description
document URL URL to the document that was newly created after calling the edit endpoint.
annotation URL URL of the annotation assigned to the document.

Status: 400 when preconditions are not met.

Search for text

Search for text in annotation 319668

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319668/search?phrase=some'
{
  "results": [
    {
      "rectangle": [
        67.15157010915198,
        545.9286363906203,
        87.99106633081445,
        563.4617583852776
      ],
      "page": 1
    },
    {
      "rectangle": [
        45.27717884130982,
        1060.3084761056693,
        66.11667506297229,
        1077.8415981003266
      ],
      "page": 1
    }
  ],
  "status": "ok"
}

GET /v1/annotations/{id}/search

Search for a phrase in the document.

Argument Type Description
phrase string The phrase to search for.
tolerance integer Allowed edit distance from the search phrase (the number of deletions, insertions or substitutions needed for the strings to match). Only used for OCR documents (images such as PNG, or PDFs with scanned images). The default value is computed as length(phrase)/4.
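
The tolerance is an edit (Levenshtein) distance. The following Python sketch computes that distance with the classic dynamic-programming recurrence and builds a search URL. It is an illustration, not the server's implementation: rounding length(phrase)/4 down is an assumption, and the <example> domain is a placeholder.

```python
from urllib.parse import urlencode

BASE = "https://<example>.rossum.app/api/v1"  # placeholder domain


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def default_tolerance(phrase: str) -> int:
    # Assumption: length(phrase)/4 is rounded down to a whole number.
    return len(phrase) // 4


def search_url(annotation_id: int, phrase: str, tolerance=None) -> str:
    """Build a GET .../annotations/{id}/search URL; tolerance is optional."""
    params = {"phrase": phrase}
    if tolerance is not None:
        params["tolerance"] = tolerance
    return f"{BASE}/annotations/{annotation_id}/search?{urlencode(params)}"
```

For instance, edit_distance("sorne", "some") is 2, which exceeds default_tolerance("some") of 1 under the rounding assumption above.
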

Response

Status: 200

Returns results with a list of objects:

Key Type Description
rectangle list[float] Bounding box of an occurrence.
page integer Page of occurrence.

Search for annotations

Supported ordering: id, arrived_at, assigned_at, assignees, automated, confirmed_at, confirmed_by__username, confirmed_by, created_at, creator__username, creator, deleted_at, deleted_by__username, deleted_by, document, exported_at, exported_by__username, exported_by, export_failed_at, has_email_thread_with_new_replies, has_email_thread_with_replies, labels, modified_at, modifier__username, modifier, original_file_name, purged_at, purged_by__username, purged_by, queue, rejected_at, rejected_by__username, rejected_by, relations__key, relations__parent, relations__type, rir_poll_id, status, workspace, email_thread, email_sender, field.<schema_id>.<format> (where format is one of number, date, string).

Obtain only annotations matching a complex filter

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' \
  -d '{"query": {"$and": [{"field.vendor_name.string": {"$eq": "ACME corp"}}, {"labels": {"$in": ["https://<example>.rossum.app/api/v1/labels/12", "https://<example>.rossum.app/api/v1/labels/34"]}}]}, "query_string": {"string": "explosives"}}' \
  'https://<example>.rossum.app/api/v1/annotations/search?ordering=status,confirmed_by__username,field.amount_total.number'
{
  "pagination": {
    "total": 101,
    "total_pages": 6,
    "next": "https://<example>.rossum.app/api/v1/annotations/search?search_after=eyJxdWVyeV9oYXNoIjogImM2ZWIzNjA5MDI1NWNmNTg4ODk0YWE5MGZiMjVmZjBlIiwgInNlYXJjaF9hZnRlciI6IFsxNTg2NTMwMzI0MDAwLCAyXSwgInJldmVyc2VkIjogZmFsc2V9%3A1NYBmgNCV-Ssmf7G9rd9vXnBY-BuvCZWrD95wcb2jIg",
    "previous": null
  },
  "results": [
    {
      "url": "https://<example>.rossum.app/api/v1/annotations/315777",
      "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
      "document": "https://<example>.rossum.app/api/v1/documents/315877",
      ...
    }
  ]
}

POST /v1/annotations/search

Search for annotations matching a complex filter.

Key Type Description
query object A subset of MongoDB Query Language (see query definition below)
query_string object Object with configuration for full-text search (see query string definition below)

If query_string is used together with query, search is done as a conjunction of these expressions (query_string AND query).

Search Query

A list of definitions under a $and key:

Key Type Description
<meta_field> object Matches against annotation metadata according to <meta_field>. (See definition below)
field.<schema_id>.<type> object Matches against annotation content value according to <schema_id> treating it as <type>. (See definition below)

In field.<schema_id>.<type>, <type> is one of string, number or date (in ISO 8601 format). String values may be at most 256 characters long.

meta_field can be one of:

Meta field name Type
annotation URL
arrived_at date
assigned_at date
assignees URL
automated bool
automatically_rejected bool
confirmed_at date
confirmed_by__username string
confirmed_by URL
created_at date
creator__username string
creator URL
deleted_at date
deleted_by__username string
deleted_by URL
document URL
exported_at date
exported_by__username string
exported_by URL
has_email_thread_with_new_replies bool
has_email_thread_with_replies bool
labels URL
messages string
modified_at date
modifier__username string
modifier URL
original_file_name string
purged_at date
purged_by__username string
purged_by URL
queue URL
rejected_at date
rejected_by__username string
rejected_by URL
relations__key string
relations__parent URL
relations__type string
restricted_access bool
rir_poll_id string
status string
workspace URL
email_thread URL
email_sender string

Search Query Objects

Key Type Description
$startsWith string Matches the start of a value. Must be at least 2 characters long.
$anyTokenStartsWith string Matches the start of each token within a string. Must be at least 2 characters long.
$containsPrefixes string Same as $anyTokenStartsWith, but the query is split into tokens (words). Must be at least 2 characters long. For example, the query quick brown matches quick brown fox, but also brown quick dog or quickest brown fox; it does not match quick dog.
$emptyOrMissing bool Matches values that are empty or missing. When false, matches existing non-empty values.
$eq | $ne number | string | date | URL Default MQL behavior
$gt | $lt | $gte | $lte number | string | date Default MQL behavior
$in | $nin list[number | string | URL] Default MQL behavior
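
A search body is plain JSON, so it can be assembled programmatically. This Python sketch wraps field conditions in a single $and and reproduces the body of the curl example above. The client-side checks mirror the stated limits (at least 2 characters for the prefix operators, at most 256 characters for field string values). Allowing only one condition per key is a simplification of this sketch, and the labels URL uses a placeholder domain.

```python
import json

MIN_PREFIX_OPS = {"$startsWith", "$anyTokenStartsWith", "$containsPrefixes"}
MAX_FIELD_STRING = 256


def search_body(conditions: dict, query_string=None) -> dict:
    """Wrap {field: {operator: value}} conditions in a single $and query."""
    for field, condition in conditions.items():
        for op, value in condition.items():
            if op in MIN_PREFIX_OPS and len(value) < 2:
                raise ValueError(f"{op} on {field} needs at least 2 characters")
            if field.startswith("field.") and isinstance(value, str) \
                    and len(value) > MAX_FIELD_STRING:
                raise ValueError(f"value for {field} exceeds {MAX_FIELD_STRING} characters")
    body = {"query": {"$and": [{field: cond} for field, cond in conditions.items()]}}
    if query_string is not None:
        body["query_string"] = {"string": query_string}
    return body


LABELS = "https://<example>.rossum.app/api/v1/labels"  # placeholder domain
body = search_body(
    {
        "field.vendor_name.string": {"$eq": "ACME corp"},
        "labels": {"$in": [f"{LABELS}/12", f"{LABELS}/34"]},
    },
    query_string="explosives",
)
print(json.dumps(body))
```
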

Related objects can be sideloaded and query fields can be used in the same way as when listing annotations.

Response

Status: 200

Returns a paginated response with a list of annotation objects, in the same format as the annotation list.

Search Query String

Obtain only annotations with values starting with the prefix expl (e.g. explosive)

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' \
  -d '{"query_string": {"string": "expl"}}' \
  'https://<example>.rossum.app/api/v1/annotations/search?ordering=status,confirmed_by__username,field.amount_total.number'
{
  "pagination": {
    "total": 101,
    "total_pages": 6,
    "next": "https://<example>.rossum.app/api/v1/annotations/search?search_after=eyJxdWVyeV9oYXNoIjogImM2ZWIzNjA5MDI1NWNmNTg4ODk0YWE5MGZiMjVmZjBlIiwgInNlYXJjaF9hZnRlciI6IFsxNTg2NTMwMzI0MDAwLCAyXSwgInJldmVyc2VkIjogZmFsc2V9%3A1NYBmgNCV-Ssmf7G9rd9vXnBY-BuvCZWrD95wcb2jIg",
    "previous": null
  },
  "results": [
    {
      "url": "https://<example>.rossum.app/api/v1/annotations/315777",
      "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
      "document": "https://<example>.rossum.app/api/v1/documents/315877",
      ...
    }
  ]
}

Apply full-text search to datapoint values using a chosen term. Values are matched by prefix, separately for each whitespace-separated term, and case-insensitively. Special characters at the end of the strings are ignored. For example, a datapoint with the value Large drink is matched by any of the following search strings: lar#, lar dri, dri. Non-extracted page data is searched as well, if available.

If query_string is used together with query, search is done as a conjunction of these expressions (query_string AND query).

Key Type Description
string string String used for full-text search. Must be at least 2 characters long for the search to be applied, and at most 256 characters long.
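
The prefix matching described above can be approximated locally, for example to predict which values a query string should hit. This Python sketch is only an approximation, not the server's implementation. It assumes that every search term must be a prefix of some word in the value (an AND across terms) and that trailing non-word characters are stripped per term.

```python
import re


def _normalize(term: str) -> str:
    """Lower-case a term and drop trailing special characters ("lar#" -> "lar")."""
    return re.sub(r"\W+$", "", term.lower())


def prefix_match(value: str, search: str) -> bool:
    """Approximate matching: every search term must prefix some word of value."""
    words = value.lower().split()
    terms = [t for t in (_normalize(s) for s in search.split()) if t]
    return bool(terms) and all(
        any(word.startswith(term) for word in words) for term in terms
    )


for query in ["lar#", "lar dri", "dri", "drinks", "arge"]:
    print(query, prefix_match("Large drink", query))
```

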

Annotation search pagination

Pagination is controlled by URL query parameters. The request body and the ordering must not change while paging through results; otherwise a 400 response is returned.

Key Default Type Description
page_size 20 int Number of results per page. The maximum value is 500 (*)
search_after null string Encoded value acting as a cursor. Do not modify it; it is for internal use only.

(*) For requests that sideload content, the maximum value is limited to 100. Sideloading content for this endpoint is deprecated and will be removed in the near future.
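
Because the cursor lives in the next URL and the body must stay fixed, a client can simply follow next until it is null while resending the same body. The Python sketch below does exactly that. The post callable is a hypothetical transport (any function that POSTs JSON and returns the decoded response), and the stub URLs exist only for illustration. It assumes the returned next URL carries everything needed for the following page.

```python
from typing import Callable, Iterator


def iter_search_results(post: Callable[[str, dict], dict],
                        first_url: str, body: dict) -> Iterator[dict]:
    """Yield annotations from every page, reusing the same request body each time."""
    url = first_url
    while url:
        page = post(url, body)            # body must stay unchanged between pages
        yield from page["results"]
        url = page["pagination"]["next"]  # None (null) on the last page


# Stub transport standing in for real HTTP calls (hypothetical URLs).
fake_pages = {
    "search?page_size=2": {"results": [{"id": 1}, {"id": 2}],
                           "pagination": {"next": "search?search_after=abc"}},
    "search?search_after=abc": {"results": [{"id": 3}],
                                "pagination": {"next": None}},
}
ids = [a["id"] for a in iter_search_results(lambda u, b: fake_pages[u],
                                            "search?page_size=2",
                                            {"query_string": {"string": "expl"}})]
print(ids)
```
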

Convert grid to table data

Convert grid to tabular data in annotation 319623

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319623/content/37507202/transform_grid_to_datapoints'

POST /v1/annotations/{id}/content/{id of the child node}/transform_grid_to_datapoints

Transform the grid structure into tabular data of the related multivalue object.

Response

Status: 200

All tuple datapoints and their children are returned.

Add new row to multivalue datapoint

Add row to annotation 319623 multivalue 37507202

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/319623/content/37507202/add_empty'

POST /v1/annotations/{id}/content/{id of the child node}/add_empty

Add an empty row to a multivalue table. The row is not connected to the grid, so modifying the grid does not trigger OCR on this row's cells.

Response

Status: 200

Validate annotation content

Validate the content of annotation 319623

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"updated_datapoint_ids": [37507204]}' \
  'https://<example>.rossum.app/api/v1/annotations/319623/content/validate'
{
  "messages": [
    {
      "id": "1038654",
      "type": "error",
      "content": "required",
      "detail": {
        "hook_id": "42345",
        "hook_name": "Webhook 8365",
        "request_id": "6166deb3-2f89-4fc2-9359-56cc8e3838e4",
        "is_exception": true,
        "timestamp": "2022-10-10T15:00:00.000000Z"
      }
    },
    {
      "id": "all",
      "type": "error",
      "content": "Whole document is invalid.",
      "detail": {
        "hook_id": "94634",
        "hook_name": "Function 4934",
        "request_id": "5477aeb2-8f43-3fe1-9279-23bc8e4121e5",
        "is_exception": true,
        "timestamp": "2022-10-10T15:00:00.000000Z"
      }
    },
    {
      "id": "1038456",
      "type": "aggregation",
      "content": "246.456",
      "aggregation_type": "sum",
      "schema_id": "vat_detail_tax2"
    }
  ],
  "updated_datapoints": [
    {
      "id": 37507205,
      "url": "https://<example>.rossum.app/api/v1/annotations/319623/content/37507205",
      "content": {
        "value": "new value",
        "page": 1,
        "position": [
          0.0,
          1.0,
          2.0,
          3.0
        ],
        "rir_text": null,
        "rir_page": null,
        "rir_position": null,
        "rir_confidence": null,
        "connector_position": [
          0.0,
          1.0,
          2.0,
          3.0
        ],
        "connector_text": "new value"
      },
      "category": "datapoint",
      "schema_id": "vat_rate",
      "validation_sources": [
        "connector",
        "history"
      ],
      "time_spent": 0.0,
      "time_spent_overall": 0.0,
      "options": [
        {
          "value": "value",
          "label": "label"
        }
      ],
      "hidden": false
    }
  ],
  "suggested_operations": [
    {
      "op": "replace",
      "id": "198143",
      "value": {
        "content": {
          "value": "John",
          "position": [
            103,
            110,
            121,
            122
          ],
          "page": 1
        },
        "hidden": false,
        "options": [],
        "validation_sources": [
          "human"
        ]
      }
    },
    {
      "op": "remove",
      "id": "884061"
    }
  ],
  "matched_trigger_rules": [
    {
      "type": "page_count",
      "value": 24,
      "threshold": 10
    },
    {
      "type": "filename",
      "value": "spam.pdf",
      "regex": "^spam.*"
    },
    {
      "id": 198143,
      "value": "foobar",
      "type": "datapoint"
    }
  ]
}

POST /v1/annotations/{id}/content/validate

Validate the content of an annotation. First, the content is sent to the validate hook of the connected extension. Then Rossum performs standard validations (data types and constraints are checked). Additionally, if the annotation's queue has delete recommendation conditions enabled, they are evaluated as well.

Key Type Description
actions list[enum] Validation actions. Possible values: ["user_update"], ["user_update", "updated"] or ["user_update", "started"] (default: ["user_update"])
updated_datapoint_ids list[int] List of IDs of datapoints that were changed since last call of this endpoint.

Response

Status: 200

Key Type Description
messages list[object] Messages concerning individual datapoints or the whole document (see Messages below).
updated_datapoints list[object] Datapoints updated by an extension (see Updated datapoints below).
suggested_operations list[object] Datapoint operations suggested as a result of validation.
matched_trigger_rules list[object] Delete Recommendation rules that matched.

Messages

The message object contains attributes:

Key Type Description
id string ID of the concerned datapoint, or "all" for document-wide issues.
type enum One of: error, warning, info or aggregation.
content string A message shown in UI. Limited to 4096 characters.
detail object Detail object that enhances the response from a hook.
aggregation_type (*) enum Type of aggregation (currently supported "sum" aggregation type).
schema_id (*) string Identifier of the schema datapoint for which the aggregation is computed.

(*) Attribute present only in message with type "aggregation".
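
A client typically needs to separate document-wide errors from per-datapoint errors and read aggregation values. The Python sketch below groups a messages list that way, using the messages from the example response above with the detail objects trimmed. The grouping itself is an illustrative choice, and warning and info messages are simply ignored.

```python
def summarize_messages(messages: list) -> dict:
    """Group validate-endpoint messages into document errors, datapoint errors and aggregations."""
    summary = {"document_errors": [], "datapoint_errors": {}, "aggregations": {}}
    for msg in messages:
        if msg["type"] == "aggregation":
            # content holds the computed value as a string; key it by schema_id
            summary["aggregations"][msg["schema_id"]] = msg["content"]
        elif msg["type"] == "error":
            if msg["id"] == "all":
                summary["document_errors"].append(msg["content"])
            else:
                summary["datapoint_errors"].setdefault(msg["id"], []).append(msg["content"])
        # warning and info messages are ignored in this sketch
    return summary


sample = [
    {"id": "1038654", "type": "error", "content": "required"},
    {"id": "all", "type": "error", "content": "Whole document is invalid."},
    {"id": "1038456", "type": "aggregation", "content": "246.456",
     "aggregation_type": "sum", "schema_id": "vat_detail_tax2"},
]
print(summarize_messages(sample))
```
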

Message detail

The message detail object is present only in responses to annotation_content hook events and contains the following attributes:

Key Type Description
hook_id int ID of the responding hook.
hook_name string Name of the responding hook.
request_id string ID of the request preceding this hook's response.
is_exception bool Flag signaling non-200 response from the hook.
timestamp string Timestamp of the request preceding this hook's response.

Updated datapoints

The updated datapoint object contains the subtrees of datapoints updated by an extension.

Suggested operations

The suggestions follow the same format as the operations that can be specified in requests; see the annotation data API for a complete description.

Matched trigger rules

Every matched rule contains the base field below; the remaining fields depend on the "type" field and are subject to change.

Key Type Description
type string One of "page_count", "filename", "datapoint".

List all annotations

List all annotations

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations'
{
  "pagination": {
    "total": 22,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/315877",
      "id": 315777,
      "queue": "https://<example>.rossum.app/api/v1/queues/8236",
      "schema": "https://<example>.rossum.app/api/v1/schemas/31336",
      "pages": [
        "https://<example>.rossum.app/api/v1/pages/561206"
      ],
      "creator": "https://<example>.rossum.app/api/v1/users/1",
      "modifier": null,
      "modified_by": null,
      "assigned_at": null,
      "created_at": "2021-04-26T10:08:03.856648Z",
      "confirmed_at": null,
      "deleted_at": null,
      "exported_at": null,
      "export_failed_at": null,
      "modified_at": null,
      "purged_at": null,
      "rejected_at": null,
      "confirmed_by": null,
      "deleted_by": null,
      "exported_by": null,
      "purged_by": null,
      "rejected_by": null,
      "status": "to_review",
      "rir_poll_id": "54f6b9ecfa751789f71ddf12",
      "messages": null,
      "url": "https://<example>.rossum.app/api/v1/annotations/315777",
      "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
      "time_spent": 0,
      "metadata": {},
      ...
    },
    {
      ...
    }
  ]
}

GET /v1/annotations

Retrieve all annotation objects.

Supported ordering: document, document__arrived_at, document__original_file_name, modifier, modifier__username, modified_by, modified_by__username, creator, creator__username, queue, status, created_at, assigned_at, confirmed_at, modified_at, exported_at, export_failed_at, purged_at, rejected_at, deleted_at, confirmed_by, deleted_by, exported_by, purged_by, rejected_by, confirmed_by__username, deleted_by__username, exported_by__username, purged_by__username, rejected_by__username

Filters

Obtain only annotations with parent annotation 1500

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations?relations__parent=1500'
{
  "pagination": {
    "total": 2,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/2",
      "id": 2,
      "queue": "https://<example>.rossum.app/api/v1/queues/1",
      "schema": "https://<example>.rossum.app/api/v1/schemas/1",
      "relations": [
        "https://<example>.rossum.app/api/v1/relations/1"
      ],
      ...
      "url": "https://<example>.rossum.app/api/v1/annotations/2",
      ...
    },
    {
      "document": "https://<example>.rossum.app/api/v1/documents/3",
      "id": 3,
      "queue": "https://<example>.rossum.app/api/v1/queues/2",
      "schema": "https://<example>.rossum.app/api/v1/schemas/2",
      "relations": [
        "https://<example>.rossum.app/api/v1/relations/1"
      ],
      ...
      "url": "https://<example>.rossum.app/api/v1/annotations/3",
      ...
    }
  ]
}

Filters may be specified to limit the annotations listed.

Attribute Description
status Annotation status, multiple values may be separated using a comma
id List of ids separated by a comma
modifier User id
confirmed_by User id
deleted_by User id
exported_by User id
purged_by User id
rejected_by User id
assignees User id, multiple values may be separated using a comma
labels Label id, multiple values may be separated using a comma
document Document id
queue List of queue ids separated by a comma
queue__workspace List of workspace ids separated by a comma
relations__parent ID of parent annotation defined in related Relation object
relations__type Type of Relation that annotation belongs to
relations__key Key of Relation that annotation belongs to
arrived_at_before ISO 8601 timestamp (e.g. arrived_at_before=2019-11-15)
arrived_at_after ISO 8601 timestamp (e.g. arrived_at_after=2019-11-14)
assigned_at_before ISO 8601 timestamp (e.g. assigned_at_before=2019-11-15)
assigned_at_after ISO 8601 timestamp (e.g. assigned_at_after=2019-11-14)
confirmed_at_before ISO 8601 timestamp (e.g. confirmed_at_before=2019-11-15)
confirmed_at_after ISO 8601 timestamp (e.g. confirmed_at_after=2019-11-14)
modified_at_before ISO 8601 timestamp (e.g. modified_at_before=2019-11-15)
modified_at_after ISO 8601 timestamp (e.g. modified_at_after=2019-11-14)
deleted_at_before ISO 8601 timestamp (e.g. deleted_at_before=2019-11-15)
deleted_at_after ISO 8601 timestamp (e.g. deleted_at_after=2019-11-14)
exported_at_before ISO 8601 timestamp (e.g. exported_at_before=2019-11-14 22:00:00)
exported_at_after ISO 8601 timestamp (e.g. exported_at_after=2019-11-14 12:00:00)
export_failed_at_before ISO 8601 timestamp (e.g. export_failed_at_before=2019-11-14 22:00:00)
export_failed_at_after ISO 8601 timestamp (e.g. export_failed_at_after=2019-11-14 12:00:00)
purged_at_before ISO 8601 timestamp (e.g. purged_at_before=2019-11-15)
purged_at_after ISO 8601 timestamp (e.g. purged_at_after=2019-11-14)
rejected_at_before ISO 8601 timestamp (e.g. rejected_at_before=2019-11-15)
rejected_at_after ISO 8601 timestamp (e.g. rejected_at_after=2019-11-14)
restricted_access Boolean
automated Boolean
has_email_thread_with_replies Boolean (related email thread contains more than one incoming email)
has_email_thread_with_new_replies Boolean (related email thread contains unread incoming email)
search String, see Annotation search

If this filter is used, annotations are filtered based on full-text search in annotation's datapoint values, original file name, modifier user email and messages. Max. 256 characters allowed.
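
Multi-value filters are comma-separated query parameters. The Python sketch below builds such a list URL with urllib.parse.urlencode, keeping commas literal for readability (the server would see the same value if they were percent-encoded). The <example> domain is a placeholder, and spelling booleans as true/false is an assumption.

```python
from urllib.parse import urlencode

BASE = "https://<example>.rossum.app/api/v1"  # placeholder domain


def annotations_url(**filters) -> str:
    """Build a GET /v1/annotations URL; lists become comma-separated values."""
    params = {}
    for key, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        elif isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean spelling
        params[key] = value
    # safe="," keeps commas literal instead of encoding them as %2C
    return f"{BASE}/annotations?{urlencode(params, safe=',')}"


url = annotations_url(status=["to_review", "postponed"], queue=[8236, 8237],
                      confirmed_at_after="2019-11-14", automated=False)
print(url)
```

The same helper covers the fields parameter, e.g. annotations_url(fields=["id", "url"]).
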

Query fields

Obtain only subset of annotation attributes

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations?fields=id,url'
{
  "pagination": {
    "total": 22,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 320332,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332"
    },
    {
      "id": 319668,
      "url": "https://<example>.rossum.app/api/v1/annotations/319668"
    },
    ...
  ]
}

To obtain only a subset of annotation object attributes, use the query parameter fields.

Argument Description
fields Comma-separated list of attributes to be included in the response.
fields! Comma-separated list of attributes to be excluded from the response.

Sideloading

Sideload documents, modifiers and content

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations?sideload=modifiers,documents,content&content.schema_id=item_amount_total'
{
  "pagination": {
    "total": 22,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320432",
      "id": 320332,
      ...,
      "modifier": "https://<example>.rossum.app/api/v1/users/10775",
      "status": "to_review",
      "rir_poll_id": "a898b6bdc8964721b38e0160",
      "messages": null,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332",
      "content": "https://<example>.rossum.app/api/v1/annotations/320332/content",
      "time_spent": 0,
      "metadata": {}
    },
    ...
  ],
  "documents": [
    {
      "id": 320432,
      "url": "https://<example>.rossum.app/api/v1/documents/320432",
      ...
    },
    ...
  ],
  "modifiers": [
    {
      "id": 10775,
      "url": "https://<example>.rossum.app/api/v1/users/10775",
      ...
    },
    ...
  ],
  "content": [
    {
      "id": 19434,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332/content/19434",
      "category": "datapoint",
      "schema_id": "item_amount_total",
      ...
    },
    ...
  ]
}

Sideload content filtered by schema_id

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations?sideload=content&content.schema_id=sender_id,vat_detail_tax'
{
  "pagination": {
    "total": 22,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "document": "https://<example>.rossum.app/api/v1/documents/320432",
      "id": 320332,
      ...,
      "modifier": "https://<example>.rossum.app/api/v1/users/10775",
      "status": "to_review",
      "rir_poll_id": "a898b6bdc8964721b38e0160",
      "messages": null,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332",
      "content": "https://<example>.rossum.app/api/v1/annotations/320332/content",
      "time_spent": 0,
      "metadata": {}
    },
    ...
  ],
  "content": [
    {
      "id": 15984,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332/content/15984",
      "category": "datapoint",
      "schema_id": "sender_id",
      ...
    },
    {
      "id": 15985,
      "url": "https://<example>.rossum.app/api/v1/annotations/320332/content/15985",
      "category": "datapoint",
      "schema_id": "vat_detail_tax",
      ...
    },
    ...
  ]
}

To reduce the number of requests needed to obtain related information about annotations, related objects (such as modifiers and documents) can be sideloaded using the query parameter sideload. This parameter accepts a comma-separated list of keywords: assignees, automation_blockers, confirmed_bys, content, deleted_bys, documents, emails, exported_bys, labels, modifiers, notes, organizations, pages, purged_bys, queues, rejected_bys, related_emails, relations, child_relations, schemas, suggested_edits, workspaces. The response is then enriched by the requested keys, which contain lists of the sideloaded objects. Sideloaded content can be filtered by schema_id to obtain only a subset of datapoints in the content part of the response; this feature is deprecated and will be removed in the future. The filter is specified using the query parameter content.schema_id, which accepts a comma-separated list of the required schema_ids.
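
Sideloaded objects arrive as separate top-level lists, so the client joins them back to the annotations by URL. The Python sketch below does this for single-URL link fields such as document or modifier, using a trimmed version of the example response above. It is an illustrative helper, not part of any Rossum client library.

```python
def attach_sideloaded(response: dict, sideload_key: str, link_field: str) -> list:
    """Replace each annotation's link_field URL with the matching sideloaded object.

    Works for single-URL fields (e.g. "document", "modifier"); annotations whose
    URL has no sideloaded match keep the plain URL.
    """
    by_url = {obj["url"]: obj for obj in response.get(sideload_key, [])}
    joined = []
    for annotation in response["results"]:
        annotation = dict(annotation)  # copy so the original response is untouched
        url = annotation.get(link_field)
        if isinstance(url, str) and url in by_url:
            annotation[link_field] = by_url[url]
        joined.append(annotation)
    return joined


DOC = "https://<example>.rossum.app/api/v1/documents/320432"  # placeholder domain
USER = "https://<example>.rossum.app/api/v1/users/10775"
response = {
    "results": [{"id": 320332, "document": DOC, "modifier": USER}],
    "documents": [{"id": 320432, "url": DOC}],
    "modifiers": [{"id": 10775, "url": USER}],
}
with_docs = attach_sideloaded(response, "documents", "document")
print(with_docs[0]["document"]["id"])
```
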

Response

Status: 200

Returns a paginated response with a list of annotation objects.

Create an annotation

Create an annotation

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"status": "created", "document": "https://<example>.rossum.app/api/v1/documents/315877", "queue": "https://<example>.rossum.app/api/v1/queues/8236", "content_data": [{"category": "datapoint", "schema_id": "doc_id", "content": {"value": "122"}, "validation_sources": []}], "values": {}, "metadata": {}}' \
  'https://<example>.rossum.app/api/v1/annotations'
{
  "document": "https://<example>.rossum.app/api/v1/documents/315877",
  "id": 315777,
  "queue": "https://<example>.rossum.app/api/v1/queues/8236",
  "schema": "https://<example>.rossum.app/api/v1/schemas/31336",
  "pages": [
    "https://<example>.rossum.app/api/v1/pages/561206"
  ],
  "creator": "https://<example>.rossum.app/api/v1/users/1",
  "modifier": null,
  "modified_by": null,
  "assigned_at": null,
  "created_at": "2021-04-26T10:08:03.856648Z",
  "confirmed_at": null,
  "deleted_at": null,
  "exported_at": null,
  "modified_at": null,
  "purged_at": null,
  "rejected_at": null,
  "confirmed_by": null,
  "deleted_by": null,
  "exported_by": null,
  "purged_by": null,
  "rejected_by": null,
  "status": "created",
  "rir_poll_id": null,
  "messages": null,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777",
  "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
  "time_spent": 0,
  "metadata": {},
  "related_emails": [],
  "email": null,
  ...
}

POST /v1/annotations

Create an annotation object.

Normally you create annotations via the upload endpoint.

This endpoint can be used to create annotation instances, including their content, with the status set to an explicitly requested value. Currently only created is supported; annotations in this status are not touched by the rest of the platform and are not visible in the Rossum UI. This allows subsequent updates before the status is switched to importing, after which the annotation passes through the rest of the upload pipeline.

The intended use case is the upload.created hook event, in which new annotations can be created; the platform runtime then switches the status of all such annotations to importing.

Key Type Description Required
status enum Requested annotation status. Only created is currently supported. Yes
document URL Annotation document. Yes
queue URL Target queue. Yes
content_data list[object] Array of annotation data content objects. No
values object Values object as described in upload endpoint. No
metadata object Client data. No
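
Hand-written JSON bodies are easy to get wrong (for example, unquoted keys). Serializing a dict sidesteps that. This Python sketch builds the create body from the example above, includes optional keys only when given, and fixes status to created because that is the only supported value. The <example> domain is a placeholder.

```python
import json

BASE = "https://<example>.rossum.app/api/v1"  # placeholder domain


def create_annotation_body(document_id: int, queue_id: int,
                           content_data=None, values=None, metadata=None) -> str:
    """Serialize a POST /v1/annotations body; json.dumps guarantees valid JSON."""
    body = {
        "status": "created",  # currently the only supported status
        "document": f"{BASE}/documents/{document_id}",
        "queue": f"{BASE}/queues/{queue_id}",
    }
    # Optional keys are included only when explicitly given.
    for key, value in (("content_data", content_data), ("values", values),
                       ("metadata", metadata)):
        if value is not None:
            body[key] = value
    return json.dumps(body)


payload = create_annotation_body(
    315877, 8236,
    content_data=[{"category": "datapoint", "schema_id": "doc_id",
                   "content": {"value": "122"}, "validation_sources": []}],
    metadata={},
)
print(payload)
```
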

Response

Status: 200

Returns annotation object.

Retrieve an annotation

Get annotation object 315777

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777'
{
  "document": "https://<example>.rossum.app/api/v1/documents/315877",
  "id": 315777,
  "queue": "https://<example>.rossum.app/api/v1/queues/8236",
  "schema": "https://<example>.rossum.app/api/v1/schemas/31336",
  "pages": [
    "https://<example>.rossum.app/api/v1/pages/561206"
  ],
  "creator": "https://<example>.rossum.app/api/v1/users/1",
  "modifier": null,
  "modified_by": null,
  "assigned_at": null,
  "created_at": "2021-04-26T10:08:03.856648Z",
  "confirmed_at": null,
  "deleted_at": null,
  "exported_at": null,
  "export_failed_at": null,
  "modified_at": null,
  "purged_at": null,
  "rejected_at": null,
  "confirmed_by": null,
  "deleted_by": null,
  "exported_by": null,
  "purged_by": null,
  "rejected_by": null,
  "status": "to_review",
  "rir_poll_id": "54f6b9ecfa751789f71ddf12",
  "messages": null,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777",
  "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
  "time_spent": 0,
  "metadata": {},
  "related_emails": [],
  "email": null,
  ...
}

GET /v1/annotations/{id}

Get an annotation object.

Response

Status: 200

Returns annotation object.

Update an annotation

Update annotation object 315777

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"document": "https://<example>.rossum.app/api/v1/documents/315877", "queue": "https://<example>.rossum.app/api/v1/queues/8236", "status": "postponed"}' \
  'https://<example>.rossum.app/api/v1/annotations/315777'
{
  "document": "https://<example>.rossum.app/api/v1/documents/315877",
  "id": 315777,
  "queue": "https://<example>.rossum.app/api/v1/queues/8236",
  ...
  "status": "postponed",
  "rir_poll_id": "a898b6bdc8964721b38e0160",
  "messages": null,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777",
  "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
  "time_spent": 0,
  "metadata": {},
  "related_emails": [],
  "email": null
}

PUT /v1/annotations/{id}

Update annotation object.

Response

Status: 200

Returns updated annotation object.

Update part of an annotation

Update status of annotation object 315777

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"status": "deleted"}' \
  'https://<example>.rossum.app/api/v1/annotations/315777'
{
  "document": "https://<example>.rossum.app/api/v1/documents/315877",
  "id": 315777,
  ...
  "status": "deleted",
  "rir_poll_id": "a898b6bdc8964721b38e0160",
  "messages": null,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777",
  "content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
  "time_spent": 0,
  "metadata": {},
  "related_emails": [],
  "email": null
}

PATCH /v1/annotations/{id}

Update part of annotation object.

Response

Status: 200

Returns updated annotation object.
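All curl examples on this page send a Bearer token and, when there is a body, a JSON payload. A minimal Python sketch of that request shape, assuming only the standard library; the helper name build_request is hypothetical, and the host and token are the placeholders used in the examples. It builds the request without sending it.

```python
import json
import urllib.request

# Placeholder host and token taken from the curl examples; replace with your own.
BASE_URL = "https://<example>.rossum.app/api/v1"
TOKEN = "db313f24f5738c8e04635e036ec8a45cdd6d6b03"


def build_request(method, path, body=None, token=TOKEN):
    """Build (but do not send) an authenticated request to the API (hypothetical helper)."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if body is not None:
        # JSON bodies need an explicit Content-Type, like -H 'Content-Type: application/json'.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method
    )


# Same call as the PATCH example above: only the status is sent.
req = build_request("PATCH", "annotations/315777", {"status": "deleted"})
# To send it: urllib.request.urlopen(req), then json.load() the response.
```

Because PATCH only needs the changed keys, the body here carries just status; a PUT would need the full object, including document and queue.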

Copy annotation

Copy annotation 315777 to queue 8236

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"target_queue": "https://<example>.rossum.app/api/v1/queues/8236", "target_status": "to_review"}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/copy'
{
  "annotation": "https://<example>.rossum.app/api/v1/annotations/320332"
}

POST /v1/annotations/{id}/copy

Make a copy of an annotation in another queue. All data and metadata are copied.

Key Description
target_queue URL of the queue where the copy should be placed.
target_status Status of the copied annotation (if not set, it stays the same).

To reimport the copied annotation directly, use the reimport=true query parameter (such an annotation will be billed).

Response

Status: 200

Returns URL of the new annotation object.
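The copy call combines a JSON body with the optional reimport query parameter. A small sketch, assuming the placeholder host from the examples; copy_annotation_call is a hypothetical helper that returns the URL and body without sending anything.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def copy_annotation_call(annotation_id, target_queue_url, target_status=None, reimport=False):
    """Return (url, json_body) for POST /v1/annotations/{id}/copy (hypothetical helper)."""
    body = {"target_queue": target_queue_url}
    if target_status is not None:
        # When omitted, the copy keeps the status of the original annotation.
        body["target_status"] = target_status
    url = f"{BASE_URL}/annotations/{annotation_id}/copy"
    if reimport:
        # A reimported copy is billed.
        url += "?" + urlencode({"reimport": "true"})
    return url, json.dumps(body)
```

Leaving target_status out of the body, rather than sending null, keeps the documented default of preserving the original status.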

Delete annotation

Delete annotation 315777

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777'

DELETE /v1/annotations/{id}

Delete an annotation object from the database. It also deletes the related page objects.

Never call this internal API; mark the annotation as deleted instead.

Response

Status: 204

Get suggested email recipients

Get suggested email recipients for annotations 315777 and 78590

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"annotations": ["https://<example>.rossum.app/api/v1/annotations/315777", "https://<example>.rossum.app/api/v1/annotations/78590"]}' \
  'https://<example>.rossum.app/api/v1/annotations/suggested_recipients'
{
  "results": [
    {
      "source": "email_header",
      "email": "don.joe@corp.us",
      "name": "Don Joe"
    },
    ...
  ]
}

POST /v1/annotations/suggested_recipients

Retrieves suggested email recipients for the given annotations, based on the suggested recipients settings of their queues.

Response

Status: 200

Returns a list of source objects.

Suggested recipients source object

Parameter Description
source Specifies where the email is found, see possible sources
email Email address of the suggested recipient
name Name of the suggested recipient. Either a value from an email header or a value from parsing the email address
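Building the request body with a JSON serializer avoids quoting mistakes in the annotations list, and the source objects can be turned into address strings directly. A sketch under the assumption that the response has the results list shown in the example; both function names are hypothetical.

```python
import json


def suggested_recipients_body(annotation_urls):
    """JSON body for POST /v1/annotations/suggested_recipients (hypothetical helper)."""
    return json.dumps({"annotations": list(annotation_urls)})


def recipient_addresses(response):
    """Format source objects as 'Name <email>', or the bare email when no name is given."""
    addresses = []
    for item in response.get("results", []):
        name = item.get("name")
        addresses.append(f"{name} <{item['email']}>" if name else item["email"])
    return addresses
```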

Purge deleted annotations

Purge deleted annotations from queue 42

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"queue": "https://<example>.rossum.app/api/v1/queues/42"}' \
  'https://<example>.rossum.app/api/v1/annotations/purge_deleted'

POST /v1/annotations/purge_deleted

Start the asynchronous process of purging the customer's data related to the selected annotations with deleted status. The following operations will happen:

Key Type Required Description
annotations list[URL] false List of annotations to be purged
queue URL false Queue whose deleted annotations should be purged.

At least one of the annotations and queue fields must be filled in. The resulting set of annotations is the union (disjunction) of the queue and annotations filters.

Response

Status: 202

This is an asynchronous endpoint: the status of the annotations is changed to purged and the related objects are deleted gradually.
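Since at least one of the two filters is required, a client can reject an empty request before calling the API. A minimal sketch; purge_deleted_body is a hypothetical helper name.

```python
import json


def purge_deleted_body(annotations=None, queue=None):
    """Body for POST /v1/annotations/purge_deleted; at least one filter is required."""
    if not annotations and not queue:
        raise ValueError("specify annotations, queue, or both")
    body = {}
    if annotations:
        body["annotations"] = list(annotations)
    if queue:
        body["queue"] = queue
    return json.dumps(body)
```

When both filters are given, the API purges the union of the two sets, so passing a queue together with annotations from another queue widens the purge rather than narrowing it.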

Annotation time spent

Time spent information can optionally be passed to the following annotation endpoints: cancel, confirm, delete, edit, postpone, reject.

Confirm annotation 315777 and also update time spent data

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"processing_duration": {"time_spent_active": 10.0, "time_spent_overall": 20.0, "time_spent_edit": 1.0, "time_spent_blockers": 2.0, "time_spent_emails": 3.0, "time_spent_opening": 1.5}}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/confirm'

POST /v1/annotations/{id}/cancel

POST /v1/annotations/{id}/confirm

POST /v1/annotations/{id}/delete

POST /v1/annotations/{id}/edit

POST /v1/annotations/{id}/postpone

POST /v1/annotations/{id}/reject

See annotation processing duration object.
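The time spent data goes under a processing_duration key whose fields match the annotation processing duration object. A sketch that rejects unknown field names; the helper name is hypothetical.

```python
import json

# Field names of the annotation processing duration object (values in seconds).
DURATION_FIELDS = (
    "time_spent_active", "time_spent_overall", "time_spent_edit",
    "time_spent_blockers", "time_spent_emails", "time_spent_opening",
)


def action_body_with_time_spent(**durations):
    """Body for e.g. POST /v1/annotations/{id}/confirm carrying time spent data."""
    unknown = set(durations) - set(DURATION_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return json.dumps({"processing_duration": {k: float(v) for k, v in durations.items()}})


body = action_body_with_time_spent(time_spent_active=10, time_spent_overall=20)
```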

Get page spatial data

Get spatial data for the first two pages of annotation 1421

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/1421/page_data?granularity=words&page_numbers=1,2'
{
  "results": [
    {
      "page_number": 1,
      "granularity": "words",
      "items": [
        {"position": [120, 22, 33, 44], "text": "full"},
        {"position": [180, 22, 33, 44], "text": "of"},
        {"position": [180, 22, 33, 44], "text": "eels"}
      ]
    },
    {
      "page_number": 2,
      "granularity": "words",
      "items": [
        {"position": [120, 22, 33, 44], "text": "it"},
        {"position": [180, 22, 33, 44], "text": "is"},
        {"position": [180, 22, 33, 44], "text": "scratched"}
      ]
    }
  ]
}

GET /v1/annotations/{id}/page_data

Get the text content of every page, including position coordinates, at the chosen granularity: lines, words, characters, or the complete page text.

Query parameters:

Key Type Default Description Required
granularity str (no default) One of lines, words, chars, texts. Yes
page_numbers str First 20 pages of the document Comma-separated page numbers. At most 20 page numbers are processed; any additional ones are silently ignored. No

Response

Status: 200

Response result objects consist of the following keys:

Key Type Description
page_number int Page number.
granularity str One of lines, words, chars, texts.
items list[object] List of objects divided by the chosen granularity.

Items consist of the following keys:

Key Type Description
position list[int] Coordinates of the item on the given page. For texts granularity, the item objects have no position key, since the text value is the full page text.
text str Value of the item.

Status: 404

Returned if no spatial data are available for the given annotation.
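A sketch of a client for this endpoint, assuming the placeholder host: it validates granularity, fails loudly instead of letting page numbers past the 20th be silently ignored, and joins item texts per page. Both helper names are hypothetical. Joining with a space suits words granularity; for texts there is a single item per page, so the join returns it unchanged.

```python
from urllib.parse import urlencode

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host
GRANULARITIES = {"lines", "words", "chars", "texts"}


def page_data_url(annotation_id, granularity, page_numbers=None):
    """URL for GET /v1/annotations/{id}/page_data (hypothetical helper)."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity}")
    params = {"granularity": granularity}
    if page_numbers:
        # The API ignores page numbers beyond the 20th without an error.
        if len(page_numbers) > 20:
            raise ValueError("at most 20 page numbers are processed")
        params["page_numbers"] = ",".join(str(n) for n in page_numbers)
    # Keep commas literal, matching the curl example.
    return f"{BASE_URL}/annotations/{annotation_id}/page_data?{urlencode(params, safe=',')}"


def page_texts(response, sep=" "):
    """Map page number to the joined texts of its items."""
    return {r["page_number"]: sep.join(i["text"] for i in r["items"]) for r in response["results"]}
```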

Annotation Data

Example annotation data

{
  "content": [
    {
      "id": 27801931,
      "url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801931",
      "children": [
        {
          "id": 27801932,
          "url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801932",
          "content": {
            "value": "2183760194",
            "normalized_value": "2183760194",
            "page": 1,
            "position": [
              761,
              48,
              925,
              84
            ],
            "rir_text": "2183760194",
            "rir_position": [
              761,
              48,
              925,
              84
            ],
            "connector_text": null,
            "rir_confidence": 0.99234
          },
          "category": "datapoint",
          "schema_id": "document_id",
          "validation_sources": [
            "score"
          ],
          "time_spent": 0,
          "time_spent_overall": 0,
          "hidden": false
        },
        {
          "id": 27801933,
          "url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801933",
          "content": {
            "value": "6/8/2018",
            "normalized_value": "2018-08-06",
            "page": 1,
            "position": [
              283,
              300,
              375,
              324
            ],
            "rir_text": "6/8/2018",
            "rir_position": [
              283,
              300,
              375,
              324
            ],
            "connector_text": null,
            "rir_confidence": 0.98279
          },
          "category": "datapoint",
          "schema_id": "date_issue",
          "validation_sources": [
            "score"
          ],
          "time_spent": 0,
          "time_spent_overall": 0,
          "hidden": false
        },
        {
          "id": 27801934,
          "url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801934",
          "content": null,
          "category": "datapoint",
          "schema_id": "email_button",
          "validation_sources": [
            "NA"
          ],
          "time_spent": 0,
          "time_spent_overall": 0,
          "hidden": false
        },
        ...
      ]
    }
  ]
}

Annotation data is used by the Rossum UI to display annotations properly. Be aware that values in the value attribute are not normalized (e.g. numbers, dates) and that the data structure may change to accommodate UI requirements.

The top-level content attribute contains a list of section objects. The results attribute is currently a copy of content and is deprecated.

Section objects:

Attribute Type Description Read-only
id int64 A unique ID of a given section. true
url URL URL of the section. true
schema_id string Reference mapping the object to the schema tree.
category string Always section.
children list Array specifying objects that belong to the section.

Datapoint, multivalue and tuple objects:

Attribute Type Description Read-only
id int64 A unique ID of a given object. true
url URL URL of a given object. true
schema_id string Reference mapping the object to the schema tree.
category string Type of the object (datapoint, multivalue or tuple). true
children list Array specifying child objects. Only available for multivalue and tuple categories. true
content object (optional) A dictionary of the attributes of a given datapoint (only available for the datapoint category); see below for details. true
validation_sources list[object] Source of validation of the extracted data, see below.
time_spent float (optional) Time spent while actively working on a given node, in seconds.
time_spent_overall float (optional) Total time spent while validating a given node, in seconds. (only for internal purposes).
time_spent_grid float (optional) Total time spent while actively working on a grid, in seconds. Only available for multivalue category. (only for internal purposes).
time_spent_grid_overall float (optional) Total time spent while validating a given grid, in seconds. Only available for multivalue category. (only for internal purposes).
hidden bool If set to true, the datapoint is not visible in the user interface, but remains stored in the database.
no_recalculation bool If set to true, the datapoint's formula is not recalculated automatically. Only available for editable formula datapoints of the datapoint category; see below.
grid object Specify grid structure, see below for details. Only allowed for multivalue object.

Time spent

Time spent values are in seconds and are stored on datapoint objects of category multivalue or datapoint. For time spent at the annotation level, see annotation processing duration.

Active time spent is stored in time_spent. Overall time spent is stored in time_spent_overall. Active time spent with an active magic grid is stored in time_spent_grid. Overall time spent with an active magic grid is stored in time_spent_grid_overall.

Measuring starts when a datapoint is selected while the annotation is not in read-only mode.

Measuring ends when:

When measuring ends, the time_spent of the previously selected datapoint is incremented by the measured time, and the result is patched together with a human validation source added to validation_sources.

Content object

Can be null for datapoints of type button.

Attribute Type Description Read-only
value string The extracted data of a given node. Maximum length: 1500 UTF characters.
normalized_value string Normalized value for date (in ISO 8601 format) and number fields (in JSON number format).
page int Number of the page where the data is located (see position).
position list List of the coordinates of the label box of the given node. (left, top, right, bottom)
rir_text string The extracted text, used as a reference for data extraction models. true
rir_raw_text string Raw extracted text (only for internal purposes, may be removed in the future). true
rir_page int The extracted page, used as a reference for data extraction models. true
rir_position list The extracted position, used as a reference for data extraction models. (left, top, right, bottom) true
rir_confidence float Confidence (estimated probability) that this field was extracted correctly. true
connector_text string Text set by the connector. true
connector_position list Position set by the connector. (left, top, right, bottom) true
ocr_text string Value extracted by OCR, if applicable. (only for internal purposes, may be removed in the future) true
ocr_raw_text string Raw value extracted by OCR, if applicable. (only for internal purposes, may be removed in the future) true
ocr_position string OCR position, if applicable. (left, top, right, bottom) (only for internal purposes, may be removed in the future) true

When both value and normalized_value are set, normalized_value is ignored on update.

Formula datapoints

For datapoint category fields whose schema UI configuration type property is set to formula, the datapoint content and attributes are updated automatically based on the provided formula code.

For editable formula fields (i.e. where the corresponding UI configuration's edit property is not set to the disabled option), automatic recalculation can be disabled by setting the datapoint's no_recalculation flag to true. To re-enable automatic formula recalculation, set the no_recalculation flag back to false.

Validation sources

The validation_sources property is a list of sources that verified the extracted data. When the list is non-empty, the datapoint is considered validated (and no eye icon is displayed next to it in the Rossum UI).

Currently, these are the sources of validation:

An additional possible validation source value, NA, signifies that validation sources are "Not Applicable"; it currently occurs only for button datapoints.

The list is subject to ongoing expansion.
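Sections, multivalues and tuples nest their objects in children, while datapoints are the leaves. A sketch of walking that tree to read values by schema_id and to list datapoints that are not yet validated (empty validation_sources). The function names are hypothetical, and content may be null, as for buttons.

```python
def iter_datapoints(nodes):
    """Yield every datapoint from a list of section, multivalue or tuple nodes."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        else:
            # Sections, multivalues and tuples keep nested objects in "children".
            yield from iter_datapoints(node.get("children", []))


def find_values(annotation_data, schema_id):
    """Raw (not normalized) values of all datapoints with the given schema_id."""
    return [
        (dp.get("content") or {}).get("value")  # content is null for buttons
        for dp in iter_datapoints(annotation_data["content"])
        if dp.get("schema_id") == schema_id
    ]


def unvalidated_schema_ids(annotation_data):
    """Schema IDs of datapoints whose validation_sources list is empty."""
    return [
        dp["schema_id"]
        for dp in iter_datapoints(annotation_data["content"])
        if not dp.get("validation_sources")
    ]
```

A datapoint marked NA counts as validated here, because only an empty list means "not validated".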

Example multivalue datapoint object with a grid

{
  "id": 122852,
  "schema_id": "line_items",
  "category": "multivalue",
  "time_spent": 3.4,
  "time_spent_overall": 4.5,
  "time_spent_grid": 1.2,
  "time_spent_grid_overall": 2.3,
  "grid": {
    "parts": [
      {
        "page": 1,
        "columns": [
          {
            "left_position": 348,
            "schema_id": "item_description",
            "header_texts": ["Description"]
          },
          {
            "left_position": 429,
            "schema_id": "item_quantity",
            "header_texts": ["Qty"]
          }
        ],
        "rows": [
          {
            "top_position": 618,
            "tuple_id": null,
            "type": "header"
          },
          {
            "top_position": 649,
            "tuple_id": 123,
            "type": "data"
          }
        ],
        "width": 876,
        "height": 444
      }
    ]
  },
  ...
}

The grid object (for internal use only) stores the vertical and horizontal table separators and related attributes. Every grid consists of zero or more parts.

Every part object consists of several attributes:

Attribute Type Description
page int Number of the page the grid part is located on.
columns list[object] Description of grid columns.
rows list[object] Description of grid rows.
width float Total width of the grid.
height float Total height of the grid.

Every column contains attributes:

Attribute Type Description
left_position float Position of the column left edge.
schema_id string Reference to datapoint schema id. Used in grid-to-table conversion.
header_texts list[string] Extracted texts from column headers.

Every row contains attributes:

Attribute Type Description
top_position float Position of the row top edge.
tuple_id int ID of the corresponding tuple datapoint if it exists, otherwise null.
type string Row type. Allowed values are specified in the schema, see grid. If null, the row is ignored during grid-to-table conversion.

Currently, it is only allowed to have one part per page (for a particular grid).
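The grid rules above can be checked on the client side: at most one part per page, and rows whose type is null are skipped during grid-to-table conversion. A sketch with hypothetical helper names, using the field names from the example grid.

```python
from collections import Counter


def check_one_part_per_page(grid):
    """Raise if two parts of the grid are on the same page."""
    pages = Counter(part["page"] for part in grid["parts"])
    duplicates = sorted(p for p, n in pages.items() if n > 1)
    if duplicates:
        raise ValueError(f"more than one grid part on page(s) {duplicates}")


def convertible_rows(part):
    """Rows used in grid-to-table conversion (those whose type is not null)."""
    return [row for row in part["rows"] if row["type"] is not None]


def column_schema_ids(part):
    """Column schema IDs ordered from left to right."""
    return [c["schema_id"] for c in sorted(part["columns"], key=lambda c: c["left_position"])]
```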

Get the annotation data

Get annotation data of annotation 315777

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777/content'

GET /v1/annotations/{id}/content

Get annotation data.

Response

Status: 200

Returns annotation data.

Update annotation data

Update annotation data of annotation 315777

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"content": [{"category": "section", "schema_id": "invoice_details_section", "children": [{"category": "datapoint", "schema_id": "document_id", "content": {"value": "12345"}, "validation_sources": ["human"], "type": "string", "rir_confidence": 0.99}]}]}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/content'
{
  "content": [
    {
      "category": "section",
      "schema_id": "invoice_details_section",
      "children": [
        {
          "category": "datapoint",
          "schema_id": "document_id",
          "value": "12345",
          "type": "string",
          "rir_confidence": 0.99
        }
      ]
    }
  ]
}

PATCH /v1/annotations/{id}/content

Update annotation data. The format is the same as for GET; datapoints missing from the uploaded content are preserved.

Response

Status: 200

Returns annotation data.

Bulk update annotation data

Example of body for bulk update of annotation data

{
  "operations": [
    {
      "op": "replace",
      "id": "198143",
      "value": {
        "content": {
          "value": "John",
          "position": [103, 110, 121, 122],
          "page": 1
        },
        "hidden": false,
        "options": [],
        "validation_sources": ["human"]
      }
    },
    {
      "op": "remove",
      "id": "884061"
    },
    {
      "op": "add",
      "id": "884060",
      "value": [
        {
          "schema_id": "item_description",
          "content": {
            "page": 1,
            "position": [162, 852, 371, 875],
            "value": "Bottle"
          }
        }
      ]
    }
  ]
}

POST /v1/annotations/{id}/content/operations

Allows you to specify a sequence of operations to be performed on particular datapoint objects.

To replace a datapoint value (or another supported attribute), use the replace operation:

Key Type Description
op string Type of operation: replace
id integer Datapoint id
value object Updated data, in the same format as in Annotation Data. Only value(*), position, page, validation_sources, hidden and options attributes may be updated. Note that value is parsed and formatted.

(*) normalized_value may also be specified. When both value and normalized_value are specified, they must match; otherwise the datapoint is not modified (this may change in the future).

Note that section, multivalue and tuple objects should not be updated.

To add a new row to a table multivalue, use the add operation:

Key Type Description
op string Type of operation: add
id integer Multivalue id (parent of new datapoint)
value list[object] Added row data. A list of objects in the same format as in Annotation Data. The schema_id attribute is required; only value, position, page, validation_sources, hidden and options attributes may be set.
validation_sources list[object] (optional) List of validation sources to set for all fields of the row by default (unless overridden in value). This makes it easy to add rows without breaking automation. See the Validation sources section.

The row will be appended to the current list of rows.

For simple multivalues, the add operation can be used to add one child datapoint:

Key Type Description
op string Type of operation: add
id integer Multivalue id (parent of new datapoint)
value object Data of the new child datapoint, in the same format as in Annotation Data. Only value(*), position, page, validation_sources, hidden and options attributes may be set. Note that value is parsed and formatted.

To remove a row from a multivalue, use the remove operation:

Key Type Description
op string Type of operation: remove
id integer Datapoint id

Note that only child datapoints of a multivalue may be removed.

Response

Status: 200

Returns annotation data.
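The three operation shapes can be assembled with small builders so that each operation carries exactly the documented keys. A sketch with hypothetical function names; it uses integer ids as in the tables, while the body example above happens to show them as strings.

```python
import json


def replace_op(datapoint_id, content=None, **attrs):
    """A replace operation; content holds value/position/page, as in the example body."""
    value = dict(attrs)  # e.g. hidden, options, validation_sources
    if content is not None:
        value["content"] = content
    return {"op": "replace", "id": datapoint_id, "value": value}


def add_row_op(multivalue_id, cells, validation_sources=None):
    """An add operation appending one row to a table multivalue."""
    if any("schema_id" not in cell for cell in cells):
        raise ValueError("each cell of the new row requires schema_id")
    op = {"op": "add", "id": multivalue_id, "value": list(cells)}
    if validation_sources is not None:
        # Default for all fields of the row unless a cell overrides it.
        op["validation_sources"] = list(validation_sources)
    return op


def remove_op(datapoint_id):
    return {"op": "remove", "id": datapoint_id}


def operations_body(*ops):
    """Body for POST /v1/annotations/{id}/content/operations."""
    return json.dumps({"operations": list(ops)})
```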

Replace annotation data by OCR

Replace annotation data value by text extracted from a rectangle

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"rectangle": [316.2, 533.9, 352.7, 556.5], "page": "https://<example>.rossum.app/api/v1/pages/12221"}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/select'

POST /v1/annotations/{id}/content/{id of child node}/select

Replace annotation data with text extracted by OCR from a rectangle on the document page. Payload of the request:

Key Type Description
rectangle list[float] Bounding box of an occurrence.
page URL Page of occurrence.

When the rectangle size is unsuitable for OCR (any side of the rectangle is smaller than 4 px), the rectangle is extended to cover the text that overlaps with it.

Response

Status: 200

Returns annotation data.
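A sketch of the select payload, plus a check that predicts whether the API will extend the rectangle because a side is under 4 px. It assumes the rectangle uses the same (left, top, right, bottom) order as other positions in this document; the helper names are hypothetical.

```python
import json

MIN_SIDE_PX = 4  # sides shorter than this make the API extend the rectangle


def select_body(rectangle, page_url):
    """Body for POST /v1/annotations/{id}/content/{node_id}/select (hypothetical helper)."""
    if len(rectangle) != 4:
        raise ValueError("rectangle must have four coordinates")
    return json.dumps({"rectangle": [float(v) for v in rectangle], "page": page_url})


def will_be_extended(rectangle):
    """True if either side is below MIN_SIDE_PX (assumes left, top, right, bottom order)."""
    left, top, right, bottom = rectangle
    return (right - left) < MIN_SIDE_PX or (bottom - top) < MIN_SIDE_PX
```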

Grid operations

Update multiple grid parts and perform OCR on created and updated grids

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"operations": [{"op": "update", "grid_index": 0, "grid": {"page": 1, "columns": [...], "rows": [...]}}]}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/grid_operations'

POST /v1/annotations/{id}/content/{id of the multivalue}/grid_operations

This endpoint applies multiple operations to multiple grids of one multivalue, performs OCR if required, and updates the multivalue with the resulting grid.

For the update operation, the position of the grid and of its rows and columns can be changed, and so can the column layout, but the row structure must stay unchanged.

Payload of the request:

Key Type Description
operations list[object] List of operations to apply to the grid

Single operations:

Key Type Description Required
op str update or delete or create Yes
grid_index int Index of the grid. Yes
grid object New grid part For create and update operations

The operations are applied sequentially. The grid_index corresponds to the index of the grid parts when the operation is applied. Combining different types of operations is not supported.

Response

Status: 200

Returns the updated multivalue content as a tree, containing only the updated datapoints.
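Because mixing operation types in one request is not supported, a client can validate the list before sending it. A minimal sketch; grid_operations_body is a hypothetical helper that enforces the table above: op and grid_index are always required, and grid is required for create and update.

```python
import json

VALID_OPS = {"create", "update", "delete"}


def grid_operations_body(operations):
    """Body for .../content/{multivalue_id}/grid_operations; one operation type only."""
    kinds = {op["op"] for op in operations}
    if not kinds <= VALID_OPS:
        raise ValueError(f"unknown op(s): {sorted(kinds - VALID_OPS)}")
    if len(kinds) > 1:
        raise ValueError("combining different operation types is not supported")
    for op in operations:
        if "grid_index" not in op:
            raise ValueError("grid_index is required")
        if op["op"] in ("create", "update") and "grid" not in op:
            raise ValueError(f"{op['op']} requires a grid part")
    return json.dumps({"operations": list(operations)})
```

Operations run in order, and grid_index refers to the grid parts as they exist at that step, so deleting index 0 twice removes the first two original parts.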

Partial grid updates

Update a grid part and perform OCR on modified cell datapoints

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"grid_index": 0, "grid": {"page": 1, "columns": [...], "rows": [...]}, "operations": {"columns": [{"op": "update", "schema_id": "vat_rate"}], "rows": [{"op": "delete", "tuple_id": 1256}]}}' \
  'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/grid_parts_operations'

POST /v1/annotations/{id}/content/{id of the multivalue}/grid_parts_operations

Apply multiple operations on a grid and perform OCR on modified cell datapoints. Update the multivalue with the new grid.

Query parameters

Query parameter Type Default Required Description
full_response boolean false false Use this parameter to get all datapoints in the grid part in the response

Payload of the request:

Key Type Description
operations object Operations to apply to the grid
grid object Updated grid part
grid_index int Index of the grid part

Operations are grouped into row operations and column operations:

Key Type Description
rows list[object] List of row operations
columns list[object] List of column operations

Single operations use the following parameters:

Key Type Description
op str update or delete or create
row_index int Required for row update and row create operations
tuple_id int Id of the tuple datapoint, required for row delete and row update operations
schema_id str Schema ID of the column, required for column operations

Possible operations:

axis op required parameters OCR Result
columns update schema_id Yes Update column datapoints
columns delete schema_id No Set content to empty for column datapoints
rows create row_index Yes Insert a new row, create datapoints and perform OCR
rows update row_index, tuple_id Yes Update datapoints via OCR
rows delete tuple_id No Delete the tuple associated to this row

OCR is performed only for rows of an extractable type, as defined by row_types_to_extract in the multivalue schema; by default, only rows of type data are extracted.

Response

Status: 200

Returns the updated multivalue content as a tree. By default, only the updated datapoints and the updated grid are returned. Add ?full_response=true to the URL to get all datapoints of this grid in the response.
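The "Possible operations" table maps each axis and op pair to its required parameters. A sketch that encodes that table and builds the URL and body, assuming the placeholder host; grid_parts_request is a hypothetical helper name.

```python
import json

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host

# Required parameters per (axis, op), taken from the "Possible operations" table.
REQUIRED = {
    ("columns", "update"): {"schema_id"},
    ("columns", "delete"): {"schema_id"},
    ("rows", "create"): {"row_index"},
    ("rows", "update"): {"row_index", "tuple_id"},
    ("rows", "delete"): {"tuple_id"},
}


def grid_parts_request(annotation_id, multivalue_id, grid_index, grid,
                       rows=(), columns=(), full_response=False):
    """Return (url, json_body) for .../grid_parts_operations."""
    for axis, ops in (("rows", rows), ("columns", columns)):
        for op in ops:
            needed = REQUIRED.get((axis, op["op"]))
            if needed is None:
                raise ValueError(f"unsupported {axis} operation: {op['op']}")
            missing = needed - set(op)
            if missing:
                raise ValueError(f"{axis} {op['op']} is missing {sorted(missing)}")
    url = f"{BASE_URL}/annotations/{annotation_id}/content/{multivalue_id}/grid_parts_operations"
    if full_response:
        url += "?full_response=true"  # return every datapoint of the grid part
    body = {"grid_index": grid_index, "grid": grid,
            "operations": {"rows": list(rows), "columns": list(columns)}}
    return url, json.dumps(body)
```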

Send updated annotation data

Send feedback on annotation 315777

Start the annotation

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777/start'
{
  "annotation": "https://<example>.rossum.app/api/v1/annotations/315777",
  "session_timeout": "01:00:00"
}

Get the annotation data

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777/content'
{
  "id": 37507206,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777/content/37507206",
  "content": {
    "value": "001",
    "page": 1,
    "position": [
      302,
      91,
      554,
      56
    ],
    "rir_text": "000957537",
    "rir_position": [
      302,
      91,
      554,
      56
    ],
    "connector_text": null,
    "rir_confidence": null
  },
  "category": "datapoint",
  "schema_id": "document_id",
  "validation_sources": [
    "human"
  ],
  "time_spent": 2.7,
  "time_spent_overall": 6.1,
  "hidden": false
}

Patch the annotation

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Type:application/json' -d '{"content": {"value": "#INV00011", "position": [302, 91, 554, 56]}}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/content/37507206'
{
  "id": 37507206,
  "url": "https://<example>.rossum.app/api/v1/annotations/315777/content/37507206",
  "content": {
    "value": "#INV00011",
    "page": 1,
    "position": [
      302,
      91,
      554,
      56
    ],
    "rir_text": "",
    "rir_position": null,
    "rir_confidence": null,
    "connector_text": null
  },
  "category": "datapoint",
  "schema_id": "document_id",
  "validation_sources": [],
  "time_spent": 0,
  "time_spent_overall": 0,
  "hidden": false
}

Confirm the annotation

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777/confirm'

PATCH /v1/annotations/{id}/content/{id of the child node}

Update a particular annotation content node.

Only the updated attributes need to be passed in the PATCH payload.

Response

Status: 200

Returns updated annotation data for the given node.
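The feedback flow in the examples above (start, patch one node, confirm) can be written against an injected transport, which keeps the sequencing testable without network access. This is a sketch only. The send callable is hypothetical: it is assumed to perform the HTTP call (for example with urllib) and to return decoded JSON or None. The GET step from the example is omitted for brevity.

```python
BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def correct_and_confirm(send, annotation_id, datapoint_id, new_value):
    """Start the annotation, patch one datapoint value, then confirm the annotation.

    send(method, url, body) is a hypothetical transport returning decoded JSON or None.
    """
    annotation_url = f"{BASE_URL}/annotations/{annotation_id}"
    send("POST", f"{annotation_url}/start", None)
    # PATCH only the attributes that change, as described above.
    node = send("PATCH", f"{annotation_url}/content/{datapoint_id}",
                {"content": {"value": new_value}})
    send("POST", f"{annotation_url}/confirm", None)
    return node
```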

Annotation Processing Duration

Example annotation processing duration

{
  "annotation": "https://<example>.rossum.app/api/v1/annotations/1",
  "time_spent_active": 12.3,
  "time_spent_overall": 23.4,
  "time_spent_edit": 1.23,
  "time_spent_blockers": 2.34,
  "time_spent_emails": 3.45,
  "time_spent_opening": 4.56
}

Annotation processing duration stores additional time spent information for an Annotation.

Annotation processing duration object:

Attribute Type Description Read-only Optional
annotation URL Annotation that the processing duration is related to true
time_spent_active float Total active time spent on the annotation, in seconds true
time_spent_overall float Total time spent on the annotation, in seconds (same value as Annotation.time_spent) true
time_spent_edit float Time spent editing the annotation, in seconds true
time_spent_blockers float Time spent on annotation blockers, in seconds true
time_spent_emails float Time spent on emails, in seconds true
time_spent_opening float Time spent opening the annotation, in seconds true

Measuring of time spent starts after an annotation is successfully started and its datapoints and schema have been fetched.

Measuring ends when:

time_spent_overall is the total time spent on the annotation; time_spent_active is measured the same way, but measurement is stopped after 10 seconds of inactivity (no mouse movement or keystroke, or an inactive tab).

Get the annotation processing duration

Get annotation processing duration of annotation 315777

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/annotations/315777/processing_duration'

GET /v1/annotations/{id}/processing_duration

Get annotation processing duration.

Response

Status: 200

Returns annotation processing duration.

Update annotation processing duration

Update annotation processing duration of annotation 315777

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"time_spent_active": 10.00, "time_spent_overall": 20.0, "time_spent_edit": 1.0, "time_spent_blockers": 2.0, "time_spent_emails": 3.0, "time_spent_opening": 1.5}' \
  'https://<example>.rossum.app/api/v1/annotations/315777/processing_duration'
{
  "annotation": "https://<example>.rossum.app/api/v1/annotations/315777",
  "time_spent_active": 10.0,
  "time_spent_overall": 20.0,
  "time_spent_edit": 1.0,
  "time_spent_blockers": 2.0,
  "time_spent_emails": 3.0,
  "time_spent_opening": 1.5
}

PATCH /v1/annotations/{id}/processing_duration

Update annotation processing duration.

Response

Status: 200

Returns annotation processing duration.

Audit log

Audit log represents a log record of actions performed by users.

Only users with the admin or organization group admin role can access the log records. Logs do not include records of changes made by Rossum representatives via internal systems. Log records are retained for 1 year.

Attribute Type Description
organization_id integer ID of the organization.
timestamp* str Timestamp of the log record.
username str Username of the user that performed the action.
object_id int ID of the object on which the action was performed.
object_type str Type of the object on which the action was performed.
action str Type of the action performed.
content object Detailed content of the action.

*The timestamp is in ISO 8601 format in the UTC timezone, e.g. 2024-07-01T07:00:00.000000

content consists of the following elements:

Attribute Type Description
path str Partial URL path of the request.
method str Method of the request.
request_id str ID of the request. Use this when contacting Rossum support with any related questions.
status_code int Status code of the response.
details object Details about the request (if available). For most cases, this field will be {}.

details may include the following attributes:

Attribute Type Description
groups list Names of the user roles sent in a request on a user object (if any were sent).

List all audit logs

List all audit logs for update actions on user objects

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/audit_logs?object_type=user&action=update'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "object_type": "user",
      "action": "update",
      "username": "john.doe@example.com",
      "object_id": 131,
      "timestamp": "2024-07-01T07:00:00.000000",
      "details": {
        "path": "api/v1/users/131",
        "method": "PATCH",
        "request_id": "0aadfd75-8dcz-4e62-94d9-a23811d0d0b0",
        "status_code": 200,
        "payload": {"groups": ["admin"]}
      }
    }
  ]
}

GET /v1/audit_logs

List audit log records for chosen objects and actions.

Using filters, you can narrow down the number of records. object_type is a required filter.

Supported filters:

Attribute Type Description Required
object_type str Type of the object on which the action was performed. Available types are document, annotation, user. Yes
action str Type of the action performed. See below. No
object_id int ID of the object on which the action was performed. No
timestamp_before str Filter for log entries before the given timestamp. No
timestamp_after str Filter for log entries after the given timestamp. No
username str Username of the user that performed the action. No

You can also filter the logs by action. The available actions depend on object_type:

object_type Available actions
document create
annotation update-status
user create, delete, purge, update, destroy, app_load*, reset-password, change_password

*The app_load action records calls to the api/v1/auth/user endpoint.

Response

Status: 200

Returns a paginated response with a list of audit log objects.
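The filter rules above can be checked on the client before any request is sent. The Python sketch below builds a GET /v1/audit_logs URL. It rejects an unknown object_type, and it rejects any action that the table does not list for that object_type. The helper names (audit_logs_url, format_timestamp) are hypothetical. Two further points are assumptions: that timestamp_before and timestamp_after accept the same ISO 8601 format as the timestamp attribute, and that naive datetimes are already in UTC.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode

# Actions available per object_type, copied from the table above.
AUDIT_LOG_ACTIONS = {
    "document": {"create"},
    "annotation": {"update-status"},
    "user": {"create", "delete", "purge", "update", "destroy",
             "app_load", "reset-password", "change_password"},
}


def format_timestamp(dt: datetime) -> str:
    """Render dt as UTC ISO 8601 with microseconds and no offset.

    Assumption: naive datetimes are already in UTC and are used unchanged.
    """
    if dt.tzinfo is not None:
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.%f")


def audit_logs_url(base_url: str, object_type: str, action: Optional[str] = None,
                   object_id: Optional[int] = None, username: Optional[str] = None,
                   timestamp_after: Optional[datetime] = None,
                   timestamp_before: Optional[datetime] = None) -> str:
    """Build a GET /v1/audit_logs URL; object_type is mandatory."""
    if object_type not in AUDIT_LOG_ACTIONS:
        raise ValueError(f"unsupported object_type: {object_type!r}")
    if action is not None and action not in AUDIT_LOG_ACTIONS[object_type]:
        raise ValueError(f"action {action!r} is not available for {object_type!r}")
    params = {"object_type": object_type, "action": action,
              "object_id": object_id, "username": username}
    if timestamp_after is not None:
        params["timestamp_after"] = format_timestamp(timestamp_after)
    if timestamp_before is not None:
        params["timestamp_before"] = format_timestamp(timestamp_before)
    # Drop filters that were not given so they are not sent as empty values.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url.rstrip('/')}/audit_logs?{query}"
```

For example, audit_logs_url("https://<example>.rossum.app/api/v1", "user", action="update") reproduces the query string of the curl example above. Send the request with the same Authorization: Bearer header.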

Automation blocker

Example automation blocker object

{
  "id": 1,
  "url": "https://<example>.rossum.app/api/v1/automation_blockers/1",
  "annotation": "https://<example>.rossum.app/api/v1/annotations/4",
  "content": [
    {
      "level": "datapoint",
      "type": "low_score",
      "schema_id": "invoice_id",
      "samples_truncated": false,
      "samples": [
        {
          "datapoint_id": 1234,
          "details": {
            "score": 0.901,
            "threshold": 0.975
          }
        }
      ]
    },
    {
      "level": "datapoint",
      "type": "failed_checks",
      "schema_id": "invoice_id",
      "samples_truncated": false,
      "samples": [
        {
          "datapoint_id": 1234,
          "details": {"validation": "bad"}
        }
      ]
    },
    {
      "level": "datapoint",
      "type": "no_validation_sources",
      "schema_id": "invoice_id",
      "samples_truncated": false,
      "samples": [
        {
          "datapoint_id": 1234
        }
      ]
    },
    {
      "level": "datapoint",
      "type": "error_message",
      "schema_id": "invoice_id",
      "samples_truncated": false,
      "samples": [
        {
          "datapoint_id": 1234,
          "details": {
            "message_content": ["Error 1", "Error 2"]
          }
        }
      ]
    },
    {
      "level": "annotation",
      "type": "suggested_edit_present"
    },
    {
      "level": "annotation",
      "type": "is_duplicate"
    },
    {
      "level": "annotation",
      "type": "error_message",
      "details": {
        "message_content": ["Error 1"]
      }
    }
  ]
}

An automation blocker stores the reasons why an annotation was not automated.

Attribute Type Read-only Description
id integer yes AutomationBlocker object ID.
url URL yes AutomationBlocker object URL.
annotation URL yes URL of related Annotation object.
content list[object] no List of reasons why automation is blocked.

content consists of the following elements:

Attribute Type Description
level enum Whether the automation blocker relates to a specific datapoint (datapoint) or to the whole annotation (annotation).
type enum See below for possible values.
schema_id string Schema ID of the affected field. Only for datapoint-level objects.
samples list[object] Samples of the affected datapoints with details (only for datapoint-level objects). At most the first 10 samples are listed.
samples_truncated bool Whether samples was truncated to 10 items (true) or contains all of them (false).
details object Only for annotation-level objects. For type: error_message, it contains message_content with a list of error messages.

Automation blocker types

low_score automation blocker example

{
  "level": "datapoint",
  "type": "low_score",
  "schema_id": "invoice_id",
  "samples_truncated": false,
  "samples": [
    {
      "datapoint_id": 1234,
      "details": {
        "score": 0.901,
        "threshold": 0.975
      }
    },
    {
      "datapoint_id": 1235,
      "details": {
        "score": 0.968,
        "threshold": 0.975
      }
    }
  ]
}

failed_checks automation blocker example

{
  "level": "datapoint",
  "type": "failed_checks",
  "schema_id": "schema_id",
  "samples_truncated": false,
  "samples": [
    {
      "datapoint_id": 43,
      "details": {
        "validation": "bad"
      }
    }
  ]
}

no_validation_sources automation blocker example

{
  "level": "datapoint",
  "type": "no_validation_sources",
  "schema_id": "schema_id",
  "samples_truncated": false,
  "samples": [
    {
      "datapoint_id": 412
    }
  ]
}

error_message automation blocker example

[
    {
      "level": "annotation",
      "type": "error_message",
      "details": {
        "message_content": ["annotation error"]
      }
    },
    {
      "level": "datapoint",
      "type": "error_message",
      "schema_id": "schema_id",
      "samples_truncated": false,
      "samples": [
        {
          "datapoint_id": 45,
          "details": {
            "message_content": ["longer than 3 characters"]
          }
        }
      ]
    }
]

delete_recommendations automation blocker example

[
  {
    "level": "annotation",
    "type": "delete_recommendation_filename | delete_recommendation_page_count",
    "details": {
      "message_content": ["annotation error"]
    }
  },
  {
    "level": "datapoint",
    "type": "delete_recommendation_field",
    "schema_id": "document_type",
    "samples_truncated": false,
    "samples": [
      {
        "datapoint_id": 45
      }
    ]
  }
]

extension automation blocker example

[
  {
    "level": "annotation",
    "type": "extension",
    "details": {
      "content": ["PO not found in the master data!"]
    }
  },
  {
    "level": "datapoint",
    "type": "extension",
    "schema_id": "sender_name",
    "samples_truncated": false,
    "samples": [
      {
        "datapoint_id": 1357,
        "details": {
          "content": ["Unregistered vendor"]
        }
      }
    ]
  }
]
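The content list mixes annotation-level and datapoint-level entries. A client usually has to flatten it before it can show why an annotation was not automated. The Python sketch below groups entries by (level, type) and collects datapoint IDs and messages. summarize_blockers is a hypothetical helper. Following the examples above, it reads messages from message_content (error_message) and from content (extension). As a defensive choice, it also accepts a single sample object where a list is expected.

```python
from typing import Any, Dict, Iterable, List, Tuple


def _messages(details: Any) -> List[str]:
    """Extract human-readable messages from a details object, if any."""
    if not isinstance(details, dict):
        return []
    # error_message blockers use "message_content"; extension blockers use "content".
    return list(details.get("message_content", [])) + list(details.get("content", []))


def summarize_blockers(content: Iterable[Dict[str, Any]]) -> Dict[Tuple[str, str], Dict[str, Any]]:
    """Group automation blocker entries by (level, type)."""
    summary: Dict[Tuple[str, str], Dict[str, Any]] = {}
    for entry in content:
        item = summary.setdefault(
            (entry["level"], entry["type"]),
            {"schema_ids": [], "datapoint_ids": [], "messages": [], "truncated": False},
        )
        if "schema_id" in entry:
            item["schema_ids"].append(entry["schema_id"])
        item["truncated"] = item["truncated"] or bool(entry.get("samples_truncated", False))
        item["messages"].extend(_messages(entry.get("details")))
        samples = entry.get("samples", [])
        if isinstance(samples, dict):  # defensive: treat a lone sample object as a one-item list
            samples = [samples]
        for sample in samples:
            item["datapoint_ids"].append(sample["datapoint_id"])
            item["messages"].extend(_messages(sample.get("details")))
    return summary
```

Entries such as low_score carry score details but no messages, so they contribute only datapoint IDs to the summary.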

List all automation blockers

List all automation blockers

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/automation_blockers'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1,
      "url": "https://<example>.rossum.app/api/v1/automation_blockers/1",
      "annotation": "https://<example>.rossum.app/api/v1/annotations/4",
      "content": [
        {
          "level": "datapoint",
          "type": "low_score",
          "schema_id": "invoice_id",
          "samples_truncated": false,
          "samples": [
            {
              "datapoint_id": 1234,
              "details": {
                "score": 0.901,
                "threshold": 0.975
              }
            }
          ]
        },
        {
          "level": "datapoint",
          "type": "failed_checks",
          "schema_id": "invoice_id",
          "samples_truncated": false,
          "samples": [
            {
              "datapoint_id": 1234,
              "details": {"validation": "bad"}
            }
          ]
        },
        {
          "level": "datapoint",
          "type": "error_message",
          "schema_id": "invoice_id",
          "samples_truncated": false,
          "samples": [
            {
              "datapoint_id": 1234,
              "details": {
                "message_content": ["Error 1", "Error 2"]
              }
            }
          ]
        },
        {
          "level": "annotation",
          "type": "suggested_edit_present"
        },
        {
          "level": "annotation",
          "type": "is_duplicate"
        },
        {
          "level": "annotation",
          "type": "error_message",
          "details": {
            "message_content": ["Error 1"]
          }
        }
      ]
    }
  ]
}

GET /v1/automation_blockers

List all automation blocker objects.

Supported filters: annotation

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of automation blocker objects.
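List endpoints such as GET /v1/automation_blockers return one page at a time. Below is a minimal Python sketch that walks all pages. It assumes that pagination.next holds the absolute URL of the next page, or null on the last page; the examples here only show null. The fetch function is injected so that the paging logic can be tested without network access. iter_results and make_fetch are hypothetical names.

```python
import json
from typing import Callable, Dict, Iterator
from urllib.request import Request, urlopen


def iter_results(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every object from a paginated list endpoint, following pagination.next."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["pagination"]["next"]  # None (JSON null) on the last page


def make_fetch(token: str) -> Callable[[str], Dict]:
    """Return a fetch function that GETs a URL with the Bearer token and decodes the JSON body."""
    def fetch(url: str) -> Dict:
        request = Request(url, headers={"Authorization": f"Bearer {token}"})
        with urlopen(request) as response:
            return json.load(response)
    return fetch
```

A typical call would be iter_results("https://<example>.rossum.app/api/v1/automation_blockers?annotation=4", make_fetch(token)). Passing the annotation filter as a numeric ID is an assumption.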

Retrieve automation blocker

Get automation blocker 12

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/automation_blockers/12'
{
  "id": 12,
  "url": "https://<example>.rossum.app/api/v1/automation_blockers/12",
  "annotation": "https://<example>.rossum.app/api/v1/annotations/481",
  "content": [
    {
      "level": "annotation",
      "type": "automation_disabled"
    }
  ]
}

GET /v1/automation_blockers/{id}

Response

Status: 200

Returns automation blocker object.

Connector

Example connector object

{
  "id": 1500,
  "name": "MyQ Connector",
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199"
  ],
  "url": "https://<example>.rossum.app/api/v1/connectors/1500",
  "service_url": "https://myq.east-west-trading.com",
  "params": "strict=true",
  "client_ssl_certificate": "-----BEGIN CERTIFICATE-----\n...",
  "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
  "asynchronous": true,
  "metadata": {},
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}

A connector is a Rossum extension that allows you to validate and modify data during validation and to export data to an external system. A connector object configures the external or internal endpoint of such an extension service. For more information, see Extensions.

Attribute Type Default Description Read-only
id integer Id of the connector true
name string Name of the connector (not visible in UI)
url URL URL of the connector true
queues list[URL] List of queues that use the connector object.
service_url URL URL of the connector endpoint
params string Query params appended to the service_url
client_ssl_certificate string Client SSL certificate used to authenticate requests. Must be PEM encoded.
client_ssl_key string Client SSL key (write only). Must be PEM encoded. The key must not be encrypted.
authorization_type string secret_key Scheme used in the Authorization HTTP header sent to the connector; can be set to secret_key or Basic. For details see Connector API.
authorization_token string Token sent to the connector in the Authorization header so the connector can verify that it was contacted by Rossum (displayed only to admin users).
asynchronous bool true Affects exporting: when true, the confirm endpoint returns immediately and the connector's save endpoint is called asynchronously later.
metadata object {} Client data.
modified_by URL null URL of the last connector modifier true
modified_at datetime null Date of last modification true
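The authorization_token lets a connector service reject requests that did not come from Rossum. The Python sketch below shows such a check on the connector side. It assumes that, when authorization_type is secret_key, the header value is the word secret_key, a space, and the token. Verify the exact format in the Connector API section. The Basic variant is not handled. is_request_from_rossum is a hypothetical helper.

```python
import hmac
from typing import Optional


def is_request_from_rossum(authorization_header: Optional[str], expected_token: str) -> bool:
    """Return True if the Authorization header carries the configured authorization_token.

    Assumption: for authorization_type "secret_key" the header value is
    "secret_key <authorization_token>". The Basic variant is not handled here.
    """
    if not authorization_header:
        return False
    scheme, _, value = authorization_header.partition(" ")
    if scheme != "secret_key":
        return False
    # compare_digest avoids leaking, through timing, how many leading characters matched.
    return hmac.compare_digest(value.encode("utf-8"), expected_token.encode("utf-8"))
```

The connector would call this with the raw Authorization header of each incoming request and respond with an error status when it returns False.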

List all connectors

List all connectors

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/connectors'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1500,
      "name": "MyQ Connector",
      "queues": [
        "https://<example>.rossum.app/api/v1/queues/8199"
      ],
      "url": "https://<example>.rossum.app/api/v1/connectors/1500",
      "service_url": "https://myq.east-west-trading.com",
      "params": "strict=true",
      "client_ssl_certificate": null,
      "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
      "asynchronous": true,
      "metadata": {},
      "modified_by": "https://<example>.rossum.app/api/v1/users/1",
      "modified_at": "2020-01-01T10:08:03.856648Z"
    }
  ]
}

GET /v1/connectors

Retrieve all connector objects.

Supported filters: id, name, service_url

Supported ordering: id, name, service_url

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of connector objects.

Create a new connector

Create a new connector for queue 8199 with endpoint URL https://myq.east-west-trading.com

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"name": "MyQ Connector", "queues": ["https://<example>.rossum.app/api/v1/queues/8199"], "service_url": "https://myq.east-west-trading.com", "authorization_token":"wuNg0OenyaeK4eenOovi7aiF"}' \
  'https://<example>.rossum.app/api/v1/connectors'
{
  "id": 1500,
  "name": "MyQ Connector",
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199"
  ],
  "url": "https://<example>.rossum.app/api/v1/connectors/1500",
  "service_url": "https://myq.east-west-trading.com",
  "params": null,
  "client_ssl_certificate": null,
  "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
  "asynchronous": true,
  "metadata": {},
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}

POST /v1/connectors

Create a new connector object.

Response

Status: 201

Returns created connector object.
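The curl calls in this section can also be issued with Python's standard library. The sketch below only builds urllib.request.Request objects for POST, PUT, PATCH or DELETE on /v1/connectors, so each request can be inspected before anything is sent. connector_request is a hypothetical helper, and base_url stands for your https://<example>.rossum.app/api/v1.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import Request


def connector_request(base_url: str, token: str, method: str,
                      payload: Optional[Dict[str, Any]] = None,
                      connector_id: Optional[int] = None) -> Request:
    """Build (but do not send) a request against /v1/connectors or /v1/connectors/{id}."""
    path = "connectors" if connector_id is None else f"connectors/{connector_id}"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if payload is not None:
        # JSON body, as in the curl -d examples.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return Request(f"{base_url.rstrip('/')}/{path}", data=data, headers=headers, method=method)
```

Pass the result to urllib.request.urlopen to send it. Following the examples in this section, a PATCH body carries only the attributes being changed (for example service_url), while a PUT body carries the full object. A DELETE request has no body and is answered with status 204.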

Retrieve a connector

Get connector object 1500

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/connectors/1500'
{
  "id": 1500,
  "name": "MyQ Connector",
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199"
  ],
  "url": "https://<example>.rossum.app/api/v1/connectors/1500",
  "service_url": "https://myq.east-west-trading.com",
  "params": null,
  "client_ssl_certificate": null,
  "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
  "asynchronous": true,
  "metadata": {},
  "modified_by": null,
  "modified_at": null
}

GET /v1/connectors/{id}

Get a connector object.

Response

Status: 200

Returns connector object.

Update a connector

Update connector object 1500

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"name": "MyQ Connector (stg)", "queues": ["https://<example>.rossum.app/api/v1/queues/8199"], "service_url": "https://myq.stg.east-west-trading.com", "authorization_token":"wuNg0OenyaeK4eenOovi7aiF"}' \
  'https://<example>.rossum.app/api/v1/connectors/1500'
{
  "id": 1500,
  "name": "MyQ Connector (stg)",
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199"
  ],
  "url": "https://<example>.rossum.app/api/v1/connectors/1500",
  "service_url": "https://myq.stg.east-west-trading.com",
  "params": null,
  "client_ssl_certificate": null,
  "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
  "asynchronous": true,
  "metadata": {},
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}

PUT /v1/connectors/{id}

Update connector object.

Response

Status: 200

Returns updated connector object.

Update part of a connector

Update the service URL of connector object 1500

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"service_url": "https://myq.stg2.east-west-trading.com"}' \
  'https://<example>.rossum.app/api/v1/connectors/1500'
{
  "id": 1500,
  "name": "MyQ Connector",
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199"
  ],
  "url": "https://<example>.rossum.app/api/v1/connectors/1500",
  "service_url": "https://myq.stg2.east-west-trading.com",
  "params": null,
  "client_ssl_certificate": null,
  "authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
  "asynchronous": true,
  "metadata": {},
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}

PATCH /v1/connectors/{id}

Update part of connector object.

Response

Status: 200

Returns updated connector object.

Delete a connector

Delete connector 1500

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/connectors/1500'

DELETE /v1/connectors/{id}

Delete connector object.

Response

Status: 204

Dedicated Engine

Example engine object

{
    "id": 3000,
    "name": "Dedicated engine 1",
    "description": "AI engine trained to recognize data for the specific data capture requirement",
    "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
    "status": "draft",
    "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
    "queues": []
}

A Dedicated Engine object holds the specification and current state of the training setup for a Dedicated Engine.

Attribute Type Default Description Read-only
id integer Id of the engine true
name string Name of the engine
description string Description of the engine
url URL URL of the engine true
status enum draft Current status of the engine, see below true
schema URL null Related dedicated engine schema

Dedicated Engine Status

Can be one of draft, schema_review, annotating_initial, annotating_review, annotating_training, training_started, training_finished, and retraining

If status is not draft, the whole engine and its schema become read-only.

Request a new Dedicated Engine

Request a new Dedicated Engine using a form (multipart/form-data)

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -F document_type="Custom invoice" -F document_language="en-US" -F volume="9" \
  -F sample_uploads=@document1.pdf -F sample_uploads=@document2.pdf \
  'https://<example>.rossum.app/api/v1/dedicated_engines/request'
{
    "id": 3001,
    "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3001",
    "name": "Requested engine - Custom invoice",
    "status": "sample_review",
    "description": "AI engine trained to recognize customer-provided data for the customer's specific data capture requirements",
    "schema": null
}

POST /v1/dedicated_engines/request

Request training of a new Dedicated Engine

Field Type Description Required
document_type str Type of the document the engine should predict True
document_language str Language of the documents True
volume int Estimated volume per year True
sample_uploads list[FILE] Multiple sample files of the documents.

Response

Status: 200

Returns created dedicated engine object.
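The curl -F flags above produce a multipart/form-data body. For clients without a multipart library, the Python sketch below assembles the same fields by hand. build_engine_request_body is a hypothetical helper. Guessing each file's content type from its extension, with application/octet-stream as the fallback, is an assumption. Filenames are not escaped, so keep them free of quotes.

```python
import mimetypes
import uuid
from typing import Iterable, Tuple


def build_engine_request_body(document_type: str, document_language: str, volume: int,
                              samples: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Return (body, content_type) for POST /v1/dedicated_engines/request.

    samples is an iterable of (filename, file_bytes) pairs.
    """
    if not document_type or not document_language:
        raise ValueError("document_type and document_language are required")
    boundary = uuid.uuid4().hex
    parts = []
    fields = {
        "document_type": document_type,
        "document_language": document_language,
        "volume": str(int(volume)),  # estimated volume per year
    }
    for name, value in fields.items():
        text = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        parts.append(text.encode("utf-8"))
    for filename, data in samples:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="sample_uploads"; '
            f'filename="{filename}"\r\nContent-Type: {mime}\r\n\r\n'
        )
        # Repeating the sample_uploads field sends several files, like repeated curl -F flags.
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

To send the body, use it as the data of a POST request. Set the Content-Type header to the returned content type and add the usual Authorization: Bearer header.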

List all dedicated engines

List all dedicated engines

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engines'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 3000,
      "name": "Dedicated engine 1",
      "description": "AI engine trained to recognize data for the specific data capture requirement",
      "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
      "status": "draft",
      "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
    }
  ]
}

GET /v1/dedicated_engines

Retrieve all dedicated engine objects.

Response

Status: 200

Returns paginated response with a list of dedicated engine objects.

Create a new dedicated engine

Create a new dedicated engine

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"name": "Dedicated engine 1", "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001"}' \
  'https://<example>.rossum.app/api/v1/dedicated_engines'
{
  "id": 3001,
  "name": "Dedicated engine 1",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3001",
  "status": "draft",
  "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001"
}

POST /v1/dedicated_engines

Create a new dedicated engine object.

Response

Status: 201

Returns created dedicated engine object.

Retrieve a dedicated engine

Get dedicated engine object 3000

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
  "id": 3000,
  "name": "Dedicated engine 1",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
  "status": "draft",
  "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}

GET /v1/dedicated_engines/{id}

Get a dedicated engine object.

Response

Status: 200

Returns dedicated engine object.

Update a dedicated engine

Update dedicated engine object 3000

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"name": "New name", "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"}' \
  'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
  "id": 3000,
  "name": "New name",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
  "status": "draft",
  "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}

PUT /v1/dedicated_engines/{id}

Update dedicated engine object.

Response

Status: 200

Returns updated dedicated engine object.

Update part of a dedicated engine

Update the name of dedicated engine object 3000

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"name": "New name"}' \
  'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
  "id": 3000,
  "name": "New name",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
  "status": "draft",
  "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}

PATCH /v1/dedicated_engines/{id}

Update part of a dedicated engine object.

Response

Status: 200

Returns updated dedicated engine object.

Delete a dedicated engine

Delete dedicated engine 3000

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engines/3000'

DELETE /v1/dedicated_engines/{id}

Delete dedicated engine object.

Response

Status: 204

Dedicated Engine Schema

Example dedicated engine schema object

{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
  "content": {
    "training_queues": [
      "https://<example>.rossum.app/api/v1/queues/123",
      "https://<example>.rossum.app/api/v1/queues/200",
      "https://<example>.rossum.app/api/v1/queues/321"
    ],
    "fields": [
      {
        "category": "datapoint",
        "engine_output_id": "document_id",
        "type": "string",
        "label": "Document ID",
        "description": "Document number",
        "trained": true,
        "sources": [
          {
            "queue": "https://<example>.rossum.app/api/v1/queues/123",
            "schema_id": "document_id"
          },
          {
            "queue": "https://<example>.rossum.app/api/v1/queues/200",
            "schema_id": "custom_name_document_id"
          }
        ]
      },
      {
        "category": "multivalue",
        "children": {
          "category": "datapoint",
          "engine_output_id": "order_id",
          "type": "string",
          "label": "Order Number",
          "description": "Purchase order identification (Order Numbers not captured as 'sender_order_id')",
          "trained": false,
          "sources": [
            {
              "queue": "https://<example>.rossum.app/api/v1/queues/200",
              "schema_id": "custom_name_order_id"
            },
            {
              "queue": "https://<example>.rossum.app/api/v1/queues/321",
              "schema_id": "order_id"
            }
          ]
        }
      },
      {
        "category": "multivalue",
        "engine_output_id": "line_items",
        "type": "grid",
        "label": "Line Items",
        "description": "Line item column types.",
        "trained": true,
        "children": {
          "category": "tuple",
          "children": [
            {
              "category": "datapoint",
              "engine_output_id": "table_column_tax",
              "type": "number",
              "label": "Item Tax",
              "description": "Tax amount for the line",
              "trained": true,
              "sources": [
                {
                  "queue": "https://<example>.rossum.app/api/v1/queues/123",
                  "schema_id": "table_column_tax"
                },
                {
                  "queue": "https://<example>.rossum.app/api/v1/queues/200",
                  "schema_id": "custom_table_column_tax"
                }
              ]
            },
            {
              "category": "datapoint",
              "engine_output_id": "table_column_rate",
              "type": "number",
              "label": "Item Rate",
              "description": "Tax rate for the line item",
              "trained": true,
              "sources": [
                {
                  "queue": "https://<example>.rossum.app/api/v1/queues/321",
                  "schema_id": "table_column_rate"
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

An engine schema is an object which describes what fields are available in the engine. Do not confuse engine schema with Document Schema.

Attribute Type Default Description Read-only
id integer Id of the engine schema true
url URL URL of the engine schema true
content object See below for description of the engine schema content

A schema can be edited only if its Dedicated Engine has the status draft.

Content structure

Top-level
Attribute Type Description
training_queues list[URL] List of Queues used for the training. These queues must not have the delete_after field set; otherwise, a validation error is raised (see queue fields).
fields list[object] Container for field declarations. It may contain only objects of category multivalue or datapoint.
Multivalue
Attribute Type Description Read-only
category string Category of the object, multivalue
engine_output_id string Unique name of the new extracted field in the trained Dedicated Engine
label string User-friendly label for an object, shown in the user interface
trained bool Whether the field was successfully trained true
type enum Type of the trained field. One of: grid, freeform.
description string Description of field attribute
children object Object specifying type of children. It may contain only objects with categories tuple or datapoint.

Multivalue objects with datapoint children do not have engine_output_id, label, trained, type, or description attributes.

Tuple
Attribute Type Description
category string Category of the object, tuple
children list[object] Array specifying objects that belong to a given tuple. It may contain only objects with category datapoint.
Datapoint
Attribute Type Description Read-only
category string Category of the object, datapoint
engine_output_id string Name of the new extracted field in the trained Dedicated Engine
label string User-friendly label for an object, shown in the user interface
trained bool Whether the field was successfully trained true
type enum Type of the trained field. One of: number, string, date, enum.
description string Description of field attribute
sources list[Sources] Mapping describing the source Queues and their fields to train this field from
Sources
Attribute Type Description
queue URL Queue to map the field from. Only one Queue per engine output is allowed
schema_id string Id of the field to map. The id must exist in the mapped Queue's schema
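Structural mistakes can be caught locally before the validate endpoint is called. The Python sketch below checks content against the tables above. It covers the allowed categories at each level, the datapoint and multivalue type enums, and the presence of engine_output_id where one is required. It is not the server's validation. It cannot check that a schema_id exists in the mapped queue's schema. It also makes two assumptions: that engine_output_id is unique across the whole schema, and that the one-queue-per-engine-output rule means a queue may appear only once in a field's sources. check_engine_schema is a hypothetical helper.

```python
from typing import Any, Dict, List

DATAPOINT_TYPES = {"number", "string", "date", "enum"}
MULTIVALUE_TYPES = {"grid", "freeform"}


def check_engine_schema(content: Dict[str, Any]) -> List[str]:
    """Client-side structural pre-check; returns problems (empty list if none found)."""
    problems: List[str] = []
    seen_ids = set()

    def register(output_id, path):
        # Assumption: engine_output_id values are unique across the whole schema.
        if not output_id:
            problems.append(f"{path}: missing engine_output_id")
        elif output_id in seen_ids:
            problems.append(f"{path}: duplicate engine_output_id {output_id!r}")
        else:
            seen_ids.add(output_id)

    def check_datapoint(dp, path):
        if dp.get("category") != "datapoint":
            problems.append(f"{path}: expected category 'datapoint'")
            return
        register(dp.get("engine_output_id"), path)
        if dp.get("type") not in DATAPOINT_TYPES:
            problems.append(f"{path}: invalid datapoint type {dp.get('type')!r}")
        # Assumption: a queue may be listed only once in a field's sources.
        queues = [source.get("queue") for source in dp.get("sources", [])]
        if len(queues) != len(set(queues)):
            problems.append(f"{path}: a queue is listed more than once in sources")

    def check_multivalue(mv, path):
        children = mv.get("children")
        if not isinstance(children, dict):
            problems.append(f"{path}: children must be an object")
        elif children.get("category") == "datapoint":
            # Multivalues with datapoint children have no engine_output_id or type of their own.
            check_datapoint(children, f"{path}.children")
        elif children.get("category") == "tuple":
            register(mv.get("engine_output_id"), path)
            if mv.get("type") not in MULTIVALUE_TYPES:
                problems.append(f"{path}: invalid multivalue type {mv.get('type')!r}")
            for i, dp in enumerate(children.get("children", [])):
                check_datapoint(dp, f"{path}.children.children[{i}]")
        else:
            problems.append(f"{path}: children must be a tuple or a datapoint")

    for i, field in enumerate(content.get("fields", [])):
        path = f"fields[{i}]"
        if field.get("category") == "datapoint":
            check_datapoint(field, path)
        elif field.get("category") == "multivalue":
            check_multivalue(field, path)
        else:
            problems.append(f"{path}: category must be multivalue or datapoint")
    return problems
```

An empty result means only that none of these local checks failed. The validate endpoint remains the authoritative check.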

Validate a dedicated engine schema

Validate content and integrity of dedicated engine schema object

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"content":{"training_queues":["https://<example>.rossum.app/api/v1/queues/123"],"fields":[{"engine_output_id":"document_id","category":"datapoint","type":"string","label":"ID","description":"Document ID","sources":[{"queue":"https://<example>.rossum.app/api/v1/queues/123","schema_id":"document_id"}]}]}}' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/validate'

POST /v1/dedicated_engine_schemas/validate

Validate a dedicated engine schema object and check it for errors. In addition to the basic checks done by the CRUD endpoints, this endpoint checks that:

Response

Status: 200

Returns 200; in case of validation failure, the response contains an error description.

Predict a dedicated engine schema

Predict a dedicated engine schema

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"training_queues":["https://<example>.rossum.app/api/v1/queues/123", "https://<example>.rossum.app/api/v1/queues/200", "https://<example>.rossum.app/api/v1/queues/321"]}' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/predict'
{
  "content": {
    "training_queues": [
      "https://<example>.rossum.app/api/v1/queues/123",
      "https://<example>.rossum.app/api/v1/queues/200",
      "https://<example>.rossum.app/api/v1/queues/321"
    ],
    "fields": [...]
  }
}

POST /v1/dedicated_engine_schemas/predict

Predicts a dedicated engine schema based on the schemas of the provided training queues. The predicted schema is only guaranteed to pass the checks done when an engine schema is saved; it may not pass the /v1/dedicated_engine_schemas/validate check.

Response

Status: 200

Returns the predicted dedicated engine schema.
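Because a predicted schema may not pass the validate check, one workflow is to send the prediction's content to /v1/dedicated_engine_schemas/validate first and create the schema only afterwards. The Python sketch below builds both requests from a predict response. It reuses the body shape of the curl examples in this section, a JSON object with a single content key. schema_requests_from_prediction is a hypothetical helper. It only builds the requests; sending them and reading the validation result is left to the caller.

```python
import json
from typing import Any, Dict, Tuple
from urllib.request import Request


def _json_post(url: str, token: str, body: Dict[str, Any]) -> Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def schema_requests_from_prediction(base_url: str, token: str,
                                    prediction: Dict[str, Any]) -> Tuple[Request, Request]:
    """Return (validate_request, create_request) for a schema returned by .../predict."""
    root = f"{base_url.rstrip('/')}/dedicated_engine_schemas"
    body = {"content": prediction["content"]}
    return _json_post(f"{root}/validate", token, body), _json_post(root, token, body)
```

Send the validate request first. If its response reports errors, adjust content before sending the create request.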

List all dedicated engine schemas

List all dedicated engine schemas

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 6000,
      "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
      "content": {
        "training_queues": [...],
        "fields": [...]
      }
    }
  ]
}

GET /v1/dedicated_engine_schemas

Retrieve all dedicated engine schema objects.

Response

Status: 200

Returns paginated response with a list of dedicated engine schema objects.

Create a new dedicated engine schema

Create a new dedicated engine schema

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"content": {"fields": [...], "training_queues": [...]}}' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas'
{
  "id": 6001,
  "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001",
  "content": {
    "training_queues": [...],
    "fields": [...]
  }
}

POST /v1/dedicated_engine_schemas

Create a new dedicated engine schema object.

Response

Status: 201

Returns created dedicated engine schema object.

Retrieve a dedicated engine schema

Retrieve dedicated engine schema object 6000

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
  "content": {
    "training_queues": [...],
    "fields": [...]
  }
}

GET /v1/dedicated_engine_schemas/{id}

Get a dedicated engine schema object.

Response

Status: 200

Returns dedicated engine schema object.

Update a dedicated engine schema

Update dedicated engine schema object 6000

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"content": {"fields": [...], "training_queues": [...]}}' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
  "content": {
    "training_queues": [...],
    "fields": [...]
  }
}

PUT /v1/dedicated_engine_schemas/{id}

Update dedicated engine schema object.

Response

Status: 200

Returns updated dedicated engine schema object.

Update part of a dedicated engine schema

Update content of dedicated engine schema object 6000

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"content": {"fields": [...], "training_queues": [...]}}' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
  "content": {
    "training_queues": [...],
    "fields": [...]
  }
}

PATCH /v1/dedicated_engine_schemas/{id}

Update part of a dedicated engine schema object.

Response

Status: 200

Returns updated dedicated engine schema object.

Delete a dedicated engine schema

Delete dedicated engine schema 6000

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'

DELETE /v1/dedicated_engine_schemas/{id}

Delete a dedicated engine schema object.

Response

Status: 204

Delete Recommendation

Example delete-recommendation object

{
  "id": 1244,
  "enabled": true,
  "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "triggers": [
    "https://<example>.rossum.app/api/v1/triggers/500"
  ]
}
Attribute Type Required Description Read-only
id integer Id of the delete recommendation. true
enabled boolean Whether the associated triggers' rules should be active
url URL URL of the delete recommendation. true
organization URL URL of the associated organization. true
queue URL URL of the associated queue.
triggers List[URL] URL of the associated triggers.

A delete recommendation is an object that binds together triggers that fire when a document meets a queue's criteria for a deletion recommendation. Currently, only a single trigger can be bound. The trigger bound to a delete recommendation must belong to the same queue.

List all delete recommendations

List all delete recommendations

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/delete_recommendations'
{
  "pagination": {
    "total": 2,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1244,
      "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
      "organization": "https://<example>.rossum.app/api/v1/organizations/132",
      "queue": "https://<example>.rossum.app/api/v1/queues/4857",
      "triggers": [
        "https://<example>.rossum.app/api/v1/triggers/500"
      ]
    },
    ...
  ]
}

GET /v1/delete_recommendations

Retrieve all delete recommendation objects.

Supported filters

Delete recommendations currently support the following filters:

Filter name Type Description
queue integer Filter only delete recommendations associated with given queue id (or multiple ids).

Supported ordering

Delete recommendations currently support the following ordering: id, queue

For additional info please refer to filters and ordering.
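Filters and ordering are passed as query parameters. The Python sketch below builds such a URL with `urllib.parse.urlencode`, which percent-encodes a comma as `%2C`. It assumes, without confirmation from this section, that multiple queue ids are joined with commas and that a leading `-` on the `ordering` value requests descending order; check the filters and ordering reference before relying on either. `delete_recommendations_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def delete_recommendations_url(base_url, queue_ids=(), ordering=None):
    """Build a filtered, optionally ordered list URL (hypothetical helper)."""
    params = {}
    if queue_ids:
        # Assumption: several queue ids are given as a comma-separated list.
        params["queue"] = ",".join(str(q) for q in queue_ids)
    if ordering:
        # Assumption: "-id" sorts by id in descending order.
        params["ordering"] = ordering
    url = f"{base_url}/api/v1/delete_recommendations"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `delete_recommendations_url("https://<example>.rossum.app", [4857], "queue")` yields `https://<example>.rossum.app/api/v1/delete_recommendations?queue=4857&ordering=queue`.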

Response

Status: 200

Returns paginated response with a list of delete recommendation objects.

Retrieve a delete recommendation

Get the delete recommendation object with ID 1244

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
  "id": 1244,
  "enabled": true,
  "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "triggers": [
    "https://<example>.rossum.app/api/v1/triggers/500"
  ]
}

GET /v1/delete_recommendations/{id}

Get a delete recommendation object.

Response

Status: 200

Returns a delete recommendation object.

Create a delete recommendation

Create a new delete recommendation

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"organization": "https://<example>.rossum.app/api/v1/organizations/132", "triggers": ["https://<example>.rossum.app/api/v1/triggers/5000"], "queue": "https://<example>.rossum.app/api/v1/queues/4857", "enabled": true}' \
  'https://<example>.rossum.app/api/v1/delete_recommendations/'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "enabled": true,
  "triggers": ["https://<example>.rossum.app/api/v1/triggers/5000"]
}

POST /v1/delete_recommendations/

Create a new delete recommendation
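`enabled` is a boolean attribute, so a client should serialize a JSON `true`/`false` rather than a string such as `"True"`. A minimal Python sketch that builds the create body and enforces the single-trigger limit stated above. The function name is hypothetical, and the same-queue rule for the trigger is not checked here because that would require fetching the trigger object.

```python
import json


def delete_recommendation_payload(organization, queue, triggers, enabled=True):
    """Return a JSON body for POST /v1/delete_recommendations/ (hypothetical helper)."""
    if len(triggers) > 1:
        # Currently only a single trigger can be bound to a delete recommendation.
        raise ValueError("only one trigger can be bound to a delete recommendation")
    body = {
        "organization": organization,
        "queue": queue,
        "triggers": list(triggers),
        "enabled": bool(enabled),  # serialized as true/false, never as "True"
    }
    return json.dumps(body)
```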

Update a delete recommendation

Update the delete recommendation object with ID 1244

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"triggers": [], "enabled": false}' \
  'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "enabled": false,
  "triggers": [],
  ...
}

PUT /v1/delete_recommendations/{id}

Update a delete recommendation

Update a part of a delete recommendation

Update the enabled flag of delete recommendation object 1244

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"enabled": false}' \
  'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "enabled": false,
  ...
}

PATCH /v1/delete_recommendations/{id}

Update a part of a delete recommendation

Remove a delete recommendation

Remove the delete recommendation object 1244

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/delete_recommendations/1244'

DELETE /v1/delete_recommendations/{id}

Remove a delete recommendation.

Document

Example document object

{
  "id": 314628,
  "url": "https://<example>.rossum.app/api/v1/documents/314628",
  "s3_name": "272c2f01ae84a4e19a421cb432e490bb",
  "parent": "https://<example>.rossum.app/api/v1/documents/203517",
  "email": "https://<example>.rossum.app/api/v1/emails/987654",
  "annotations": [
    "https://<example>.rossum.app/api/v1/annotations/314528"
  ],
  "mime_type": "application/pdf",
  "creator": "https://<example>.rossum.app/api/v1/users/1",
  "created_at": "2019-10-13T23:04:00.933658Z",
  "arrived_at": "2019-10-13T23:04:00.933658Z",
  "original_file_name": "test_invoice_1.pdf",
  "content": "https://<example>.rossum.app/api/v1/documents/314628/content",
  "attachment_status": null,
  "metadata": {}
}

A document object contains information about one input file. To create it, one can:

Attribute Type Default Description Read-only
id integer Id of the document true
url URL URL of the document true
s3_name string Internal true
parent URL null URL of the parent document (e.g. the zip file it was extracted from) true
email URL URL of the email object that document was imported by (only for documents imported by email). true
annotations list[URL] List of annotations related to the document. Usually there is only one annotation. true
mime_type string MIME type of the document (e.g. application/pdf) true
creator URL User that created the document. true
created_at datetime Timestamp of document upload or incoming email attachment extraction. true
arrived_at datetime (Deprecated) See created_at true
original_file_name string File name of the attachment or upload. true
content URL Link to the document's raw content (e.g. PDF file). May be null if there is no file associated. true
attachment_status string null Reason why the document was filtered out during email ingestion. See attachment status. true
metadata object {} Client data.

Attachment status

Possible values: filtered_by_inbox_resolution, filtered_by_inbox_size, filtered_by_inbox_mime_type, filtered_by_inbox_file_name, filtered_by_hook_custom, filtered_by_queue_mime_type, hook_additional_file, filtered_by_insecure_mime_type, extracted_archive, failed_to_extract, processed, password_protected_archive, broken_image and null

List all documents

List all documents

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/documents'
{
  "pagination": {
    "total": 2,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 314628,
      "url": "https://<example>.rossum.app/api/v1/documents/314628",
      ...
    },
    {
      "id": 315609,
      "url": "https://<example>.rossum.app/api/v1/documents/315609",
      ...
    }
  ]
}

GET /v1/documents

Retrieve all document objects.

Supported filters: id, email, creator, arrived_at, created_at, original_file_name, attachment_status

Supported ordering: id, arrived_at, created_at, original_file_name, mime_type, attachment_status

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of document objects.

Retrieve a document

Get document object 314628

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/documents/314628'
{
  "id": 314628,
  "url": "https://<example>.rossum.app/api/v1/documents/314628",
  ...
}

GET /v1/documents/{id}

Get a document object.

Response

Status: 200

Returns document object.

Create document

Create new document using a form (multipart/form-data)

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -F content=@document.pdf \
  'https://example.app.rossum.ai/api/v1/documents'

Create new document by sending file in a request body

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H 'Content-Disposition: attachment; filename=document.pdf' --data-binary @file.pdf \
  'https://example.app.rossum.ai/api/v1/documents'

Create new document by sending file in a request body (UTF-8 filename must be URL encoded)

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -H "Content-Disposition: attachment; filename*=utf-8''document%20%F0%9F%8E%81.pdf" --data-binary @file.pdf \
  'https://example.app.rossum.ai/api/v1/documents'
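When the file is sent as the raw request body, its name travels in the `Content-Disposition` header; a UTF-8 name uses the `filename*=utf-8''...` form with percent-encoded bytes, as in the example above. A Python sketch that builds that header with `urllib.parse.quote` and prepares (but does not send) the upload. The helper names are hypothetical, and the set of characters treated as safe for the plain `filename=` form is our own conservative assumption.

```python
import re
from urllib.parse import quote
from urllib.request import Request

# Assumption: names made only of these characters can use an unquoted filename=.
SIMPLE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def content_disposition(file_name: str) -> str:
    """Build the Content-Disposition value for a raw-body upload (hypothetical helper)."""
    if SIMPLE_NAME.fullmatch(file_name):
        return f"attachment; filename={file_name}"
    # Any other name: percent-encode its UTF-8 bytes in the filename* form.
    return f"attachment; filename*=utf-8''{quote(file_name, safe='')}"


def raw_upload_request(base_url: str, token: str, file_name: str, data: bytes) -> Request:
    """Prepare a POST that sends the file bytes directly as the request body."""
    return Request(
        f"{base_url}/api/v1/documents",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Disposition": content_disposition(file_name),
        },
    )
```

`content_disposition("document 🎁.pdf")` produces the same `filename*=utf-8''document%20%F0%9F%8E%81.pdf` value used in the curl example.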

Create documents using basic authentication

curl -u 'east-west-trading-co@example.app.rossum.ai:secret' \
  -F content=@document.pdf \
  'https://example.app.rossum.ai/api/v1/documents'

Create document with metadata and a parent document

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -F content=@document.pdf \
  -F metadata='{"project":"Market ABC"}' \
  -F parent='https://example.app.rossum.ai/api/v1/documents/456700' \
  'https://example.app.rossum.ai/api/v1/documents'
{
  "id": 314628,
  "url": "https://example.app.rossum.ai/api/v1/documents/314628",
  ...
}

POST /v1/documents

Create a new document object.

Use this call to create a document without an annotation, for example for MIME types that Rossum cannot extract. Only one document can be created per request. Allowed attributes for the creation request:

Attribute Type Description
content bytes The file to be uploaded.
metadata object Client data.
parent URL URL of the parent document (e.g. the original file based on which the uploaded content was created)

Response

Status: 201

Returns created document object.
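Outside curl, the `-F` form shown above corresponds to a `multipart/form-data` body in which `metadata` is a JSON-encoded string field and `parent` is a plain URL field. A standard-library Python sketch that prepares such a request. The function name, the `application/octet-stream` part type and the field order are our assumptions; a third-party HTTP client would normally do this encoding for you.

```python
import json
import uuid
from urllib.request import Request


def multipart_upload_request(base_url, token, file_name, file_bytes, metadata=None, parent=None):
    """Prepare a multipart/form-data POST with content, metadata and parent (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []

    def text_field(name, value):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )

    if metadata is not None:
        text_field("metadata", json.dumps(metadata))  # metadata is sent as a JSON string
    if parent is not None:
        text_field("parent", parent)  # URL of the parent document
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="content"; '
        f'filename="{file_name}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return Request(
        f"{base_url}/api/v1/documents",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```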

Update part of a document

Update metadata of document object 314628

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"metadata": {"translation_file_name": "Rechnung.pdf"}}' \
  'https://<example>.rossum.app/api/v1/documents/314628'
{
  "id": 314628,
  "url": "https://<example>.rossum.app/api/v1/documents/314628",
  "metadata": {"translation_file_name": "Rechnung.pdf"},
  ...
}

PATCH /v1/documents/{id}

Update part of a document object.

Document content

Download document original

To download multiple documents in one archive, refer to the documents download object.

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/documents/314628/content'

GET /v1/documents/{id}/content

Get original document content (e.g. PDF file).

Response

Status: 200

Returns original document file.

Permanent URL

Download document original from a permanent URL

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/original/272c2f01ae84a4e19a421cb432e490bb'

GET /v1/original/{s3_name}

Get original document content (e.g. PDF file), addressed by the document's s3_name.

Response

Status: 200

Returns original document file.
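In the example, the permanent URL ends with the document's `s3_name` (`272c2f01ae84a4e19a421cb432e490bb` in the example document object). Assuming that holds in general, a client can derive the URL from a document object, as in this small Python sketch with a hypothetical helper name.

```python
def permanent_url(base_url: str, document: dict) -> str:
    """Derive the permanent original-file URL from a document's s3_name (hypothetical helper)."""
    # Assumption: the path is always /api/v1/original/<s3_name>.
    return f"{base_url}/api/v1/original/{document['s3_name']}"
```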

Delete a document

Delete document 314628

curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/documents/314628'

DELETE /v1/documents/{id}

Delete a document object from the database. It also deletes the related annotation and page objects.

Do not call this internal API; mark the annotation as deleted instead.

Response

Status: 204

Documents Download

Example download object

{
  "id": 105,
  "url": "https://<example>.rossum.app/api/v1/documents/downloads/105",
  "file_name": "test_invoice_1.pdf",
  "expires_at": "2023-09-13T23:04:00.933658Z",
  "content": "https://<example>.rossum.app/api/v1/documents/downloads/105/content"
}

These endpoints enable downloading multiple documents at once. The workflow is as follows:

A download object contains information about a downloadable archive in .zip format.

Attribute Type Description Read-only
id integer Id of the download object true
url URL URL of the download object true
expires_at datetime Timestamp until which the download object and its content are guaranteed to be available, set to the archive creation time plus 2 hours. Expired downloads are deleted periodically. true
file_name string Name of the archive to be downloaded. true
content URL Link to the download's raw content. May be null if there is no archive associated yet. true

Retrieve a download

GET /v1/documents/downloads/{id}

Get a download object.

Response

Status: 200

Returns download object.

Create new download

Create new download object

curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -d '{"documents": ["https://<example>.rossum.app/api/v1/documents/123000", "https://<example>.rossum.app/api/v1/documents/123001"], "file_name": "monday_invoices.zip"}' \
  'https://<example>.rossum.app/api/v1/documents/downloads'
{
  "url": "https://<example>.rossum.app/api/v1/tasks/301"
}

POST /v1/documents/downloads

Create a new download object.

Argument Type Required Default Description
documents list[URL] true List of URLs of the documents to include in the resulting downloadable archive. Max. 500 documents.
file_name string documents.zip The filename of the resulting archive. Must include a .zip extension.
type enum document One of: document and source_document.
zip boolean true Use application/zip to bundle the download contents.

Response

Status: 202

The response Location header provides the task url (same as in the JSON body of the response).

Returns created task object.
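The constraints in the argument table (at most 500 documents, a `.zip` file name, `type` being `document` or `source_document`) can be checked client-side before the request is sent; the server remains the authority on validation. A Python sketch with a hypothetical helper name (the parameter is called `type_` only to avoid shadowing Python's built-in `type`; the JSON key is `type`). Polling the returned task is not shown, because the task object is not described in this section.

```python
import json

MAX_DOCUMENTS = 500  # limit stated for the documents argument


def download_payload(documents, file_name="documents.zip", type_="document"):
    """Validate arguments and return a JSON body for POST /v1/documents/downloads."""
    if not documents:
        raise ValueError("documents is required")
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(f"at most {MAX_DOCUMENTS} documents per download")
    if not file_name.endswith(".zip"):
        raise ValueError("file_name must include a .zip extension")
    if type_ not in ("document", "source_document"):
        raise ValueError("type must be document or source_document")
    return json.dumps({"documents": list(documents), "file_name": file_name, "type": type_})
```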

Retrieve download content

Download archive with original documents files

curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -o documents.zip \
  'https://<example>.rossum.app/api/v1/documents/downloads/100/content'

GET /v1/documents/downloads/{id}/content

Get archive with original document files.

Response

Status: 200

Returns an archive with original document files.

Email

Example email object

{
  "id": 1234,
  "url": "https://<example>.rossum.app/api/v1/emails/1234",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
  "documents": [
    "https://<example>.rossum.app/api/v1/documents/5678"
  ],
  "parent": "https://<example>.rossum.app/api/v1/emails/1230",
  "children": [
      "https://<example>.rossum.app/api/v1/emails/1244"
  ],
  "created_at": "2021-03-26T14:31:46.993427Z",
  "last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
  "subject": "Some email subject",
  "from": {"email": "company@east-west.com", "name": "Company East"},
  "to": [{"email": "east-west-trading-co-a34f3a@<example>.rossum.app", "name": "East West Trading"}],
  "cc": [],
  "bcc": [],
  "body_text_plain": "Some body",
  "body_text_html": "<div dir=\"ltr\">Some body</div>",
  "metadata": {},
  "type": "outgoing",
  "annotation_counts": {
    "annotations": 3,
    "annotations_processed": 1,
    "annotations_purged": 0,
    "annotations_unprocessed": 1,
    "annotations_rejected": 1
  },
  "annotations": [
    "https://<example>.rossum.app/api/v1/annotations/1",
    "https://<example>.rossum.app/api/v1/annotations/2",
    "https://<example>.rossum.app/api/v1/annotations/4"
  ],
  "related_annotations": [],
  "related_documents": [
    "https://<example>.rossum.app/api/v1/documents/3"
  ],
  "filtered_out_document_count": 2,
  "labels": ["rejected"]
}

An email object represents emails sent to Rossum inboxes.

Attribute Type Required Description Read-only
id integer Id of the email true
url URL URL of the email true
queue URL true URL of the associated queue
inbox URL true URL of the associated inbox
parent URL URL of the parent email
email_thread URL URL of the associated email thread true
children list[URL] List of URLs of the children emails
documents list[URL] List of documents attached to email true
created_at datetime Timestamp of incoming email true
last_thread_email_created_at datetime (Deprecated) Timestamp of the most recent email in this email thread true
subject string Email subject
from email_address_object Information about sender containing keys email and name. true
to list[email_address_object] List that contains information about recipients. true
cc list[email_address_object] List that contains information about recipients of carbon copy. true
bcc list[email_address_object] List that contains information about recipients of blind carbon copy. true
body_text_plain string Plain text email section (truncated to 4 kB).
body_text_html string HTML email section (truncated to 4 kB).
metadata object Client data.
type string Email type. Can be incoming or outgoing. true
annotation_counts object This attribute is intended for INTERNAL use only and may be changed in the future. Information about how many annotations were extracted from email attachments and their current states. true
annotations list[URL] List of URLs of annotations that arrived via email true
related_annotations list[URL] List of URLs of annotations that are related to the email (e.g. rejected by that, added as attachment etc.) true
related_documents list[URL] List of URLs of documents related to the email (e.g. by forwarding email containing document as attachment etc.) true
creator URL User that sent the email; null if the email was received via SMTP. true
filtered_out_document_count integer This attribute is intended for INTERNAL use only and may be changed in the future without notice. Number of documents automatically filtered out by Rossum smart inbox (this feature can be configured in inbox settings). true
labels list[string] List of email labels. Possible values are rejection, automatic_rejection, rejected, automatic_status_changed_info, forwarded, reply false
content URL URL of the email's content true

Email address object

Attribute Type Default Description Required
email string Email address true
name string Name of the email recipient

Annotation counts object

This object stores numbers of annotations extracted from email attachments and their current status.

Attribute Type Description Annotation status
annotations integer Total number of annotations Any
annotations_processed integer Number of processed annotations exported, deleted, purged, split
annotations_purged integer Number of purged annotations purged
annotations_unprocessed integer Number of not yet processed annotations importing, failed_import, to_review, reviewing, confirmed, exporting, postponed, failed_export
annotations_rejected integer Number of rejected annotations rejected
related_annotations integer Total number of related annotations Any

Email labels

Any number of labels can be assigned to an email object.

Label name Description
rejection Outgoing informative email sent by Rossum after an email was manually rejected.
automatic_rejection Informative automatic email sent by Rossum when no document was extracted from an incoming email.
automatic_status_changed_info Informative automatic email sent by Rossum about a document status change.
rejected Incoming email rejected together with all attached documents.
forwarded Outgoing email sent by forwarding another email.
reply Outgoing email sent by replying to another email.

List all emails

List all emails

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/emails'
{
  "pagination": {
    "total": 2,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1234,
      "url": "https://<example>.rossum.app/api/v1/emails/1234",
      "inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
      "queue": "https://<example>.rossum.app/api/v1/queues/4321",
      "documents": [
        "https://<example>.rossum.app/api/v1/documents/5678"
      ],
      ...
    },
    ...
  ]
}

GET /v1/emails

Retrieve all email objects.

Supported filters: id, created_at, subject, queue, inbox, documents, from__email, from__name, to, last_thread_email_created_at_before, last_thread_email_created_at_after, type, email_thread, has_documents

Supported ordering: id, created_at, subject, queue, inbox, from__email, from__name, last_thread_email_created_at

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of email objects.

Retrieve an email

Get email object 1244

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/emails/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/emails/1244",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
  "documents": ["https://<example>.rossum.app/api/v1/documents/5678"],
  "parent": "https://<example>.rossum.app/api/v1/emails/1230",
  "children": [],
  "created_at": "2021-03-26T14:31:46.993427Z",
  "last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
  "subject": "Some email subject",
  "from": {"email": "company@east-west.com"},
  "to": [{"email": "east-west-trading-co-a34f3a@<example>.rossum.app"}],
  "cc": [],
  "bcc": [],
  "body_text_plain": "",
  "body_text_html": "",
  "metadata": {},
  "type": "outgoing",
  "labels": [],
  ...
}

GET /v1/emails/{id}

Get an email object.

Response

Status: 200

Returns email object.

Update an email

Update email object 1244

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "inbox": "https://<example>.rossum.app/api/v1/inboxes/8236", "subject": "Some subject", "to": [{"email": "jack@east-west-trading.com"}]}' \
  'https://<example>.rossum.app/api/v1/emails/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/emails/1244",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "inbox": "https://<example>.rossum.app/api/v1/inboxes/8236",
  "documents": [],
  "parent": null,
  "children": [],
  "created_at": "2021-03-26T14:31:46.993427Z",
  "last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
  "subject": "Some subject",
  "from": null,
  "to": [{"email": "jack@east-west-trading.com"}],
  "body_text_plain": "",
  "body_text_html": "",
  "metadata": {},
  "type": "outgoing",
  "labels": [],
  ...
}

PUT /v1/emails/{id}

Update email object.

Response

Status: 200

Returns updated email object.

Update part of an email

Update subject of email object 1244

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"subject": "Some subject"}' \
  'https://<example>.rossum.app/api/v1/emails/1244'
{
  "id": 1244,
  "subject": "Some subject",
  ...
}

PATCH /v1/emails/{id}

Update part of email object.

Response

Status: 200

Returns updated email object.

Send email

Send email

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"to": [{"email": "jack@east-west-trading.com"}], "queue": "https://<example>.rossum.app/api/v1/queues/145300", "template_values": {"subject": "Some subject", "message": "<b>Hello!</b>"}}' \
  'https://<example>.rossum.app/api/v1/emails/send'

POST /v1/emails/send

Send an email to the specified recipients. The number of emails that can be sent is limited (10 for trial accounts).

Key Type Required Description
to list[email_address_object] List that contains information about recipients.
cc list[email_address_object] List that contains information about recipients of carbon copy.
bcc list[email_address_object] List that contains information about recipients of blind carbon copy.
template_values object false Values to fill in the email template; it should always contain the subject and message keys. See Template values below.
queue URL true Link to email-related queue.
related_annotations list[URL] false List of links to email-related annotations.
related_documents list[URL] false List of URLs of email-related documents (on top of the documents of related_annotations, which are linked automatically).
attachments object false Keys are attachment types (currently only the documents key is supported); values are lists of URLs.
parent_email URL false Link to parent email.
labels list[string] false List of email labels.

At least one of to, cc, or bcc must contain an email address.

An email address object consists of an email address and an optional name:

Key Type Required Description
email email true Email address, e.g. john.doe@example.com
name string false Name related to the email, e.g. John Doe
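The rules above (a required `queue`, at least one recipient across `to`, `cc` and `bcc`, an `email` key in every address object, and `subject` and `message` keys in `template_values`) can be checked before calling the endpoint. A Python sketch with a hypothetical helper name that builds the request body without calling the API; optional keys such as `related_annotations` or `attachments` pass through `extra` unchanged.

```python
import json


def send_email_body(queue, template_values, to=(), cc=(), bcc=(), **extra):
    """Validate and serialize a body for POST /v1/emails/send (hypothetical helper)."""
    if not (to or cc or bcc):
        raise ValueError("at least one of to, cc or bcc must contain an address")
    for addr in (*to, *cc, *bcc):
        if "email" not in addr:
            raise ValueError("every address object needs an email key")
    missing = {"subject", "message"} - set(template_values)
    if missing:
        raise ValueError(f"template_values is missing: {sorted(missing)}")
    body = {"queue": queue, "template_values": template_values, **extra}
    for key, addrs in (("to", to), ("cc", cc), ("bcc", bcc)):
        if addrs:  # include only the recipient lists that are actually used
            body[key] = list(addrs)
    return json.dumps(body)
```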

Template values

The template_values object is used to create an outgoing email. The subject key fills the email subject and message fills the email body (it may contain a subset of HTML). Values may contain other placeholders that are either built-in (see below) or specified in the template_values object as well. For placeholders that refer to annotations, the annotations from the related_annotations attribute are used to fill in the values.

Example of template_values

{
  ...
  "template_values": {
    "subject": "Document processed",
    "message": "<p>The document was processed.<br>{{user_name}}<br>Additional notes: {{note}}</p>",
    "note": "No issues found"
  }
  ...
}
List of built-in placeholders
Placeholder Description Can be used in automation
organization_name Name of the organization. True
app_url App root url True
user_name Username of the user sending the email. False
current_user_fullname Full name of user sending the email. False
current_user_email Email address of the user sending the email. False
parent_email_subject Subject of the email we are replying to. True
sender_email Email address of the author of the incoming email. True
annotation.document.original_file_name Filenames of the documents belonging to the related annotation(s) True
annotation.content.value.{schema_id} Content value of datapoints from email related annotation(s) True
annotation.id Ids of the related annotation(s) True
annotation.url Urls of the related annotation(s) True
annotation.assignee_email Emails of the assigned users to the related annotation(s) True

Example request data

{
  "to": [{"name": "John Doe", "email": "john.doe@rossum.ai"}],
  "template_values": {
    "subject": "Rejected!: {{parent_email_subject}}",
    "message": "<p>Dear user,<br>Error occurred!<br><br>Note: {{rejection_note}}. Occurred on your document issued at {{ annotation.content.value.date_issue }}.<br>Yours, Rossum</p>",
    "rejection_note": "There is no invoice id!"
  },
  "queue": "https://<example>.rossum.app/api/v1/queues/145300",
  "related_annotations": ["https://<example>.rossum.app/api/v1/annotations/123"],
  "attachments": {
    "documents": ["https://<example>.rossum.app/api/v1/documents/123"]
  }
}

Response

Status: 200

Returns created email link.

Get email counts

Get email counts

curl -X GET -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  'https://<example>.rossum.app/api/v1/emails/counts'
{
  "incoming": {
    "total": 12,
    "no_documents": 5,
    "recent_with_no_documents_not_replied": 2,
    "rejected": 1,
    "recent_with_filtered_out_documents": 2
  }
}

GET /v1/emails/counts

Retrieve counts of emails grouped based on status of extracted annotations.

Supports the same filters as list emails endpoint.

Response

Status: 200

Returns an object whose incoming key holds email counts computed from the status of the extracted documents:

Attribute Type Description
total integer Total number of emails
no_documents integer Number of emails containing no attachment which was processed by Rossum
recent_with_no_documents_not_replied integer Number of emails that arrived in the last 14 days with no attachment processed by Rossum, without the rejected label and without any reply (i.e. the email has no children emails; see email docs).
rejected integer Number of emails containing at least one document in rejected status (see document lifecycle) or with rejected label.
recent_with_filtered_out_documents integer Number of emails that arrived in the last 14 days with one or more attachments automatically rejected by the Rossum smart inbox (attachment filtering rules are defined here).

Email content

GET /v1/emails/{id}/content

Retrieve the content of an email.

Response

Status: 200

Email notifications management

Unsubscribe from automatic email notifications

curl -X GET 'https://<example>.rossum.app/api/v1/emails/subscription?content=eyJldmVudCI6ImRvY3VtZW50X3JlY2VpdmVkIiwiZW1haWwiOiJqaXJpLmJhdWVyQHJvc3N1bS5haSIsIm9yZ2FuaXphdGlvbiI6Imh0dHA6Ly9sb2NhbGhvc3Q6ODAwMC92MS9vcmdhbml6YXRpb25zLzEifQ&signature=LhgMR01vQ9NAsvAtOKifZpaYBi20vkhOK-Cm7HT1Cqs&subscribe=false'
<!DOCTYPE html>
...
</html>

GET /v1/emails/subscription?subscribe=false

Enable or disable subscription to automatic email notifications sent by Rossum.

Query parameter Type Default Required Description
signature string true Signature used to sign the content (generated by our backend).
content string true Signed content of the payload (generated by our backend).
subscribe boolean true false Designates whether the subscription is enabled or disabled.

Response

Status: 200

Renders HTML page.

Email tracking events

Email tracking events

curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"payload": "ORSXG5DTFVZHVZLSFZQG6Y3BNQ5DC===", "signature": "nGoqalaYlSMFiCPmJDPWaiN3FLEm_cPbxA4mrgqodpk", "link": "https://rossum.ai", "event": "click"}' \
  'https://<example>.rossum.app/api/v1/email_tracking_events'

POST /v1/email_tracking_events

Rossum can track the following events for sent emails: send, delivery, open, click and bounce.

Key Type Required Description
payload string True Encrypted email, domain and organization ID.
event string True Actions performed on the sent email: bounce, send, delivery, open, click.
link URL False The link from the email body that the user clicked on.
signature string True Signature used to sign the encrypted domain (generated by our backend).

Response

Status: 201
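A minimal sketch of how a client might assemble the request body, assuming the four keys from the table above and rejecting event names outside the documented list. The helper name tracking_event_body is hypothetical; payload and signature must still be the values generated by Rossum.

```python
import json
from typing import Optional

# Event names listed in the table above.
ALLOWED_EVENTS = {"send", "delivery", "open", "click", "bounce"}


def tracking_event_body(payload: str, signature: str, event: str, link: Optional[str] = None) -> str:
    """Return the JSON body for POST /v1/email_tracking_events (hypothetical helper)."""
    if event not in ALLOWED_EVENTS:
        raise ValueError(f"unsupported event: {event!r}")
    body = {"payload": payload, "signature": signature, "event": event}
    if link is not None:
        # Optional: the link from the email body that the user clicked on.
        body["link"] = link
    return json.dumps(body)
```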

Email Template

Example email template object

{
  "id": 1234,
  "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
  "name": "My Email Template",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "organization": "https://<example>.rossum.app/api/v1/organizations/210",
  "triggers": [
    "https://<example>.rossum.app/api/v1/triggers/500",
    "https://<example>.rossum.app/api/v1/triggers/600"
  ],
  "type": "custom",
  "subject": "My Email Template Subject",
  "message": "<p>My Email Template Message</p>",
  "automate": true
}

An email template object represents a template that can be selected when sending an email from Rossum.

Attribute Type Default Required Description Read-only
id integer Id of the email template true
url URL URL of the email template
name string true Name of the email template
queue URL true URL of the associated queue
organization URL URL of the associated organization
triggers list[URL] URLs of the linked triggers. Read more
type string custom Type of the email template (see email template types)
subject string "" Email subject
message string "" Email body, written in a subset of HTML
enabled bool true (Deprecated) Use automate instead
automate bool true If true, the email is sent automatically when the associated action occurs; see email template types
to list[email_address_object] [] List of recipients.
cc list[email_address_object] [] List of carbon copy (CC) recipients.
bcc list[email_address_object] [] List of blind carbon copy (BCC) recipients.

Email Template Types

Email Template objects can have one of the following types. Only templates with types rejection and custom can be manually created and deleted.

Template type name Description
rejection Template for a rejection email
rejection_default Default template for a rejection email
email_with_no_processable_attachments Template for a reply to an email with no processable attachments
custom Custom email template

Default Email templates

Creating a new queue automatically creates five default email templates with default subjects and messages:

[
  {
    "id": 1234,
    "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
    "name": "Annotation status change - confirmed",
    "queue": "https://<example>.rossum.app/api/v1/queues/501",
    "organization": "https://<example>.rossum.app/api/v1/organizations/123",
    "subject": "Verified documents: {{ parent_email_subject }}",
    "message": "<p>Dear sender,<br><br>Your documents have been checked by annotator.<br><br>{{ document_list }}<br><br>Regards</p>",
    "type": "custom",
    "triggers": ["https://<example>.rossum.app/api/v1/triggers/456"],
    "automate": false,
    "to": [{"email": "{{sender_email}}"}]
  },
  {
    "id": 1235,
    "url": "https://<example>.rossum.app/api/v1/email_templates/1235",
    "name": "Annotation status change - exported",
    "queue": "https://<example>.rossum.app/api/v1/queues/501",
    "organization": "https://<example>.rossum.app/api/v1/organizations/123",
    "subject": "Documents exported: {{ parent_email_subject }}",
    "message": "<p>Dear sender,<br><br>Your documents have been successfully exported.<br><br>{{ document_list }}<br><br>Regards</p>",
    "type": "custom",
    "triggers": ["https://<example>.rossum.app/api/v1/triggers/457"],
    "automate": false,
    "to": [{"email": "{{sender_email}}"}]
  },
  {
    "id": 1236,
    "url": "https://<example>.rossum.app/api/v1/email_templates/1236",
    "name": "Annotation status change - received",
    "queue": "https://<example>.rossum.app/api/v1/queues/501",
    "organization": "https://<example>.rossum.app/api/v1/organizations/123",
    "subject": "Documents received: {{ parent_email_subject }}",
    "message": "<p>Dear sender,<br><br>Your documents have been successfully received.<br><br>{{ document_list }}<br><br>Regards</p>",
    "type": "custom",
    "triggers": ["https://<example>.rossum.app/api/v1/triggers/458"],
    "automate": false,
    "to": [{"email": "{{sender_email}}"}]
  },
  {
    "id": 1237,
    "url": "https://<example>.rossum.app/api/v1/email_templates/1237",
    "name": "Default rejection template",
    "queue": "https://<example>.rossum.app/api/v1/queues/501",
    "organization": "https://<example>.rossum.app/api/v1/organizations/123",
    "subject": "Rejected document {{parent_email_subject}}",
    "message": "<p>Dear sender,<br><br>The attached document has been rejected.<br><br><br>Best regards,<br>{{ user_name }}</p>",
    "type": "rejection_default",
    "triggers": [],
    "automate": true,
    "to": [{"email": "{{sender_email}}"}]
  },
  {
    "id": 1238,
    "url": "https://<example>.rossum.app/api/v1/email_templates/1238",
    "name": "Email with no processable attachments",
    "queue": "https://<example>.rossum.app/api/v1/queues/501",
    "organization": "https://<example>.rossum.app/api/v1/organizations/123",
    "subject": "No processable documents: {{ parent_email_subject }}",
    "message": "<p>Dear sender,<br><br>Unfortunately, we have not received any document in the email that we can process. Please send a corrected version if appropriate.<br><br>Regards</p>",
    "type": "email_with_no_processable_attachments",
    "triggers": ["https://<example>.rossum.app/api/v1/triggers/459"],
    "automate": false,
    "to": [{"email": "{{sender_email}}"}]
  }
]

Email template rendering

Email templates support Django Template Variables.

Only simple variables are supported; filters, template tags and the dot (.) lookup are not. For example, the template:

  {% if subject %}
  The subject is {{ subject }}.
  {% endif %}
  The message is {{ message|lower }}.

rendered with the template values:

  {'subject': 'Hello', 'message': 'World'}

will render as:

  {% if subject %}
  The subject is Hello.
  {% endif %}
  The message is .
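To preview what a template will produce under these rules, the documented behavior can be approximated with a small regex-based renderer. This is an illustrative approximation, not Rossum's implementation: it assumes that unknown variable names render as empty strings (as in Django's default behavior) and that dot lookups render empty in the same way filters do in the example above.

```python
import re

# Any {{ ... }} expression; the inner text is captured without surrounding spaces.
VARIABLE = re.compile(r"\{\{\s*(.*?)\s*\}\}")
# A "simple variable": a plain identifier with no filter (|) and no dot lookup.
IDENTIFIER = re.compile(r"\w+")


def render_simple(template: str, values: dict) -> str:
    """Approximate the documented rendering rules (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        expr = match.group(1)
        if IDENTIFIER.fullmatch(expr):
            # Assumption: unknown names render as an empty string.
            return str(values.get(expr, ""))
        # Filters and dot lookups are unsupported and render as an empty string.
        return ""

    # {% ... %} tags are never matched, so they stay in the output verbatim.
    return VARIABLE.sub(replace, template)
```

Because tags are left untouched rather than evaluated, the {% if %} block in the example above appears unchanged in the rendered output.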

List all email templates

List all email templates

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/email_templates'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1234,
      "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
      "name": "My Email Template",
      "queue": "https://<example>.rossum.app/api/v1/queues/4321",
      "organization": "https://<example>.rossum.app/api/v1/organizations/210",
      "subject": "My Email Template Subject",
      "message": "<p>My Email Template Message</p>",
      "type": "custom",
      "automate": true
    }
  ]
}

GET /v1/email_templates

Retrieve all email template objects.

Supported filters: id, queue, type, name

Supported ordering: id, name

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of email template objects.
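A sketch of a client that applies filters and ordering, then follows pagination.next until it is null. It assumes that ordering is passed in a query parameter named ordering, and that fetch is any function returning the decoded JSON page for a URL (for example a thin wrapper around urllib.request.urlopen). Both list_url and iter_results are hypothetical helpers.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def list_url(resource: str, ordering: Optional[str] = None, **filters) -> str:
    """Build a list URL such as /email_templates?queue=4321&ordering=name."""
    params = dict(filters)
    if ordering:
        params["ordering"] = ordering  # assumption: parameter name is "ordering"
    query = urlencode(params)
    return f"{BASE_URL}/{resource}" + (f"?{query}" if query else "")


def iter_results(fetch: Callable[[str], dict], url: str) -> Iterator[dict]:
    """Yield every object of a paginated list by following pagination.next."""
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page["results"]
        next_url = page["pagination"]["next"]
```

Keeping the HTTP call behind fetch makes the pagination logic independent of the HTTP client and easy to test against canned pages.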

Create new email template object

Create new email template in queue 4321

curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "name": "My Email Template", "subject": "My Email Template Subject", "message": "<p>My Email Template Message</p>", "type": "custom"}' \
  'https://<example>.rossum.app/api/v1/email_templates'
{
  "id": 1234,
  "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
  "name": "My Email Template",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "organization": "https://<example>.rossum.app/api/v1/organizations/210",
  "subject": "My Email Template Subject",
  "message": "<p>My Email Template Message</p>",
  "type": "custom"
}

POST /v1/email_templates

Create new email template object.

Response

Status: 201

Returns new email template object.
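The curl call above can be reproduced with the Python standard library. The sketch only builds a urllib.request.Request object and does not send it; the helper name create_template_request and its default values (taken from the attribute table) are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def create_template_request(token: str, queue_url: str, name: str, subject: str = "",
                            message: str = "", template_type: str = "custom") -> urllib.request.Request:
    """Build (but do not send) a POST /v1/email_templates request."""
    body = {
        "queue": queue_url,  # required
        "name": name,  # required
        "subject": subject,
        "message": message,  # subset of HTML
        "type": template_type,
    }
    return urllib.request.Request(
        f"{BASE_URL}/email_templates",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and decode the JSON response; a 201 status indicates that the template was created.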

Retrieve an email template object

Get email template object 1234

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/email_templates/1234'
{
  "id": 1234,
  "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
  "name": "My Email Template",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "organization": "https://<example>.rossum.app/api/v1/organizations/210",
  "subject": "My Email Template Subject",
  "message": "<p>My Email Template Message</p>",
  "type": "custom",
  "automate": true
}

GET /v1/email_templates/{id}

Get an email template object.

Response

Status: 200

Returns email template object.

Update an email template

Update email template object 1234

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "name": "My Email Template", "subject": "Some new subject"}' \
  'https://<example>.rossum.app/api/v1/email_templates/1234'
{
  "id": 1234,
  "url": "https://<example>.rossum.app/api/v1/email_templates/1234",
  "name": "My Email Template",
  "queue": "https://<example>.rossum.app/api/v1/queues/4321",
  "organization": "https://<example>.rossum.app/api/v1/organizations/210",
  "subject": "Some new subject",
  "message": "<p>My Email Template Message</p>",
  "type": "custom",
  "automate": true
}

PUT /v1/email_templates/{id}

Update email template object.

Response

Status: 200

Returns updated email template object.

Update part of an email template

Update subject of email template object 1234

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"subject": "Some new subject"}' \
  'https://<example>.rossum.app/api/v1/email_templates/1234'
{
  "id": 1234,
  "subject": "Some new subject",
  ...
}

PATCH /v1/email_templates/{id}

Update part of an email template object.

Response

Status: 200

Returns updated email template object.

Delete an email template

Delete email template object 1234

curl -X DELETE 'https://<example>.rossum.app/api/v1/email_templates/1234' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03'

DELETE /v1/email_templates/{id}

Delete an email template object.

Response

Status: 204

Get email templates stats

Get stats for all email templates in the queue with ID 478

curl -X GET 'https://<example>.rossum.app/api/v1/email_templates/stats?queue=478' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03'
{
  "pagination": {
    "total": 6,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "url": "https://<example>.rossum.app/api/v1/email_templates/2",
      "manual_count": 12,
      "automated_count": 190
    },
    {
      "url": "https://<example>.rossum.app/api/v1/email_templates/3",
      "manual_count": 87,
      "automated_count": 0
    },
    ...
  ]
}

GET /v1/email_templates/stats

Get usage statistics (numbers of sent emails) for email templates.

Response

Status: 200

Returns a paginated response with a list of the following objects:

Attribute Type Description
url URL Link of the email template.
manual_count integer Number of emails sent manually using the email template in the last 90 days.
automated_count integer Number of emails sent automatically using the email template in the last 90 days.

Supports the same filters as list email templates endpoint.
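The stats response lends itself to quick usage reports, for example spotting templates that have not been used in the last 90 days. A sketch of two hypothetical helpers operating on the results list of a single page:

```python
from typing import Dict, Iterable, List


def usage_totals(results: Iterable[dict]) -> Dict[str, int]:
    """Map each template URL to its total number of sent emails (last 90 days)."""
    return {r["url"]: r["manual_count"] + r["automated_count"] for r in results}


def unused_templates(results: Iterable[dict]) -> List[str]:
    """URLs of templates with no manually or automatically sent email."""
    return [url for url, total in usage_totals(results).items() if total == 0]
```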

Render email template

Render email template 221

curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  -d '{"parent_email": "https://<example>.rossum.app/api/v1/emails/1234", "document_list": ["https://<example>.rossum.app/api/v1/documents/2314"], "to": [{"email": "{{ current_user_email }}"}]}' \
  'https://<example>.rossum.app/api/v1/email_templates/221/render'
{
  "to": [{"email": "satisfied.customer@rossum.ai"}],
  "cc": [],
  "bcc": [],
  "subject": "My Email Template Subject: Rendered Parent Email Subject",
  "message": "<p>My Email Template Message from user@example.com</p>"
}

POST /v1/email_templates/{id}/render

The rendered email template can be requested via the render endpoint with the following attributes:

Attribute Type Default Required Description
to* list[email_address_object] [] false List that contains information about recipients to be rendered.
cc* list[email_address_object] [] false List that contains information about recipients of carbon copy to be rendered.
bcc* list[email_address_object] [] false List that contains information about recipients of blind carbon copy to be rendered.
parent_email URL false Link to parent_email.
document_list list[URL] [] false List of document URLs, simulating documents sent to Rossum by email
annotation_list list[URL] [] false List of annotation URLs used to render values for annotation.content placeholders
template_values object {} false Values to fill in the email template. Read more.

Supported placeholders:

Placeholder Description Can be used in automation
current_user_email Email address of the user sending the email. False
sender_email Email address of the author of the incoming email. True
annotation.document.original_file_name Filename of the documents passed under annotation_list. True
annotation.content.value.{schema_id} Email address from a datapoint value of a related annotation. True

Render an email template object.

Response

Status: 200

Returns the rendered recipients, subject and message of the email template:

Attribute Type Description
to list[email_address_object] List that contains rendered information about recipients.
cc list[email_address_object] List that contains rendered information about recipients of carbon copy.
bcc list[email_address_object] List that contains rendered information about recipients of blind carbon copy.
message string Rendered email template's message.
subject string Rendered email template's subject.
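Because all render attributes are optional, a client may simply omit the ones it does not need. A minimal sketch of assembling the request body; render_body is a hypothetical helper, and the recipient placeholder in the usage note comes from the placeholder table above.

```python
import json

# Attribute names accepted by POST /v1/email_templates/{id}/render (see table above).
RENDER_ATTRIBUTES = {"to", "cc", "bcc", "parent_email", "document_list", "annotation_list", "template_values"}


def render_body(**fields) -> str:
    """Build the JSON body for the render endpoint (hypothetical helper)."""
    unknown = set(fields) - RENDER_ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    # All attributes are optional; drop the ones left as None so server defaults apply.
    return json.dumps({key: value for key, value in fields.items() if value is not None})
```

For example, render_body(parent_email="https://<example>.rossum.app/api/v1/emails/1234", to=[{"email": "{{ sender_email }}"}]) produces a body that asks Rossum to resolve the sender of the parent email as the recipient.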

Email Thread

Example email thread object

{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/email_threads/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "root_email": "https://<example>.rossum.app/api/v1/emails/5432",
  "has_replies": false,
  "has_new_replies": false,
  "root_email_read": false,
  "last_email_created_at": "2021-11-01T18:02:24.740600Z",
  "subject": "Root email subject",
  "from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
  "created_at": "2021-06-10T12:38:44.866180Z",
  "labels": [],
  "annotation_counts": {
    "annotations": 4,
    "annotations_processed": 2,
    "annotations_purged": 0,
    "annotations_rejected": 1,
    "annotations_unprocessed": 1
  }
}

An email thread object represents a thread of related emails in Rossum's inbox.

Attribute Type Required Description Read-only
id integer Id of the email thread. true
url URL URL of the email thread.
organization URL URL of the associated organization. true
queue URL URL of the associated queue. true
root_email URL URL of the associated root email (first incoming email in the thread). true
has_replies boolean True if the thread has more than one incoming email. true
has_new_replies boolean True if the thread has unread incoming emails.
root_email_read boolean True if the root email has been opened in Rossum UI at least once. true
created_at datetime Timestamp of the creation of email thread (inherited from arrived_at timestamp of the root email). true
last_email_created_at datetime Timestamp of the most recent email in this email thread. true
subject string Subject of the root email. true
from object Information about sender of the root email containing keys email and name. true
labels list[string] This attribute is intended for INTERNAL use only and may be changed without notice. List of email thread labels set by root email. If root email is rejected and no other incoming emails are in thread, labels field is set to [rejected]. Labels is an empty list in all the other cases. true
annotation_counts object This attribute is intended for INTERNAL use only and may be changed without notice. Information about how many annotations were extracted from all emails in the thread and which states they are currently in. true

Thread Annotation counts object

This object stores the numbers of annotations extracted from all emails in the given email thread, grouped by annotation status.

Attribute Type Description Annotation status
annotations integer Total number of annotations Any
annotations_processed integer Number of processed annotations exported, deleted, purged, split
annotations_purged integer Number of purged annotations purged
annotations_unprocessed integer Number of not yet processed annotations importing, failed_import, to_review, reviewing, confirmed, exporting, postponed, failed_export
annotations_rejected integer Number of rejected annotations rejected
related_annotations integer Total number of related annotations Any
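The status mapping in the table can be expressed directly in code, for example to sanity-check the counts against a list of annotation statuses you fetched yourself. This is an illustrative sketch only (Rossum computes these counts server-side), and it leaves out related_annotations, which is not defined by a status mapping. As the table states, purged annotations count both as processed and as purged.

```python
from collections import Counter
from typing import Dict, Iterable

# Status groups from the table above.
PROCESSED = {"exported", "deleted", "purged", "split"}
UNPROCESSED = {"importing", "failed_import", "to_review", "reviewing",
               "confirmed", "exporting", "postponed", "failed_export"}


def thread_counts(statuses: Iterable[str]) -> Dict[str, int]:
    """Bucket annotation statuses the way the thread annotation counts object does."""
    counts = Counter(statuses)
    return {
        "annotations": sum(counts.values()),
        "annotations_processed": sum(counts[s] for s in PROCESSED),
        "annotations_purged": counts["purged"],
        "annotations_rejected": counts["rejected"],
        "annotations_unprocessed": sum(counts[s] for s in UNPROCESSED),
    }
```

With the statuses exported, deleted, rejected and to_review, this yields the same numbers as the annotation_counts in the example email thread object.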

List all email threads

List all email threads

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/email_threads'
{
  "pagination": {
    "total": 2,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1244,
      "url": "https://<example>.rossum.app/api/v1/email_threads/1244",
      "organization": "https://<example>.rossum.app/api/v1/organizations/132",
      "queue": "https://<example>.rossum.app/api/v1/queues/4857",
      "root_email": "https://<example>.rossum.app/api/v1/emails/5432",
      "has_replies": false,
      "has_new_replies": false,
      "root_email_read": false,
      "last_email_created_at": "2021-11-01T18:02:24.740600Z",
      "subject": "Root email subject",
      "from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
      "created_at": "2021-06-10T12:38:44.866180Z",
      ...
    },
    ...
  ]
}

GET /v1/email_threads

Retrieve all email thread objects.

Supported filters

Email threads support the following filters:

Filter name Type Description
has_root_email boolean Filter only email threads with a root email.
has_replies boolean Filter only email threads with two or more incoming emails.
queue integer Filter only email threads associated with given queue id (or multiple ids).
has_new_replies boolean Filter only email threads with unread incoming emails.
created_at_before datetime Filter only email threads with root email created before given timestamp.
created_at_after datetime Filter only email threads with root email created after given timestamp.
last_email_created_at_before datetime Filter only email threads with the last email in the thread created before given timestamp.
last_email_created_at_after datetime Filter only email threads with the last email in the thread created after given timestamp.
recent_with_no_documents_not_replied boolean Filter only email threads whose root email arrived in the last 14 days with no attachment processed by Rossum, excluding threads with the rejected label, threads that already have a reply, and threads whose root email has been read.

Supported ordering

Email threads support following ordering: id, created_at, last_email_created_at, subject, from__email, from__name, queue

For additional info please refer to filters and ordering.

Response

Status: 200

Returns paginated response with a list of email thread objects.
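A sketch of building a filtered, ordered thread list URL. It assumes that multiple queue IDs are passed comma-separated, that booleans are sent as lowercase true/false, that timestamps use ISO 8601, and that a leading - requests descending order; verify these conventions in the filters and ordering section. threads_url is a hypothetical helper.

```python
from datetime import datetime
from typing import Iterable, Optional
from urllib.parse import urlencode

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def threads_url(queue_ids: Iterable[int] = (), has_new_replies: Optional[bool] = None,
                created_after: Optional[datetime] = None, ordering: Optional[str] = None) -> str:
    """Build a GET /v1/email_threads URL (hypothetical helper)."""
    params = []
    queue_ids = list(queue_ids)
    if queue_ids:
        # Assumption: multiple queue ids are comma-separated.
        params.append(("queue", ",".join(str(q) for q in queue_ids)))
    if has_new_replies is not None:
        params.append(("has_new_replies", "true" if has_new_replies else "false"))
    if created_after is not None:
        params.append(("created_at_after", created_after.isoformat()))
    if ordering:
        params.append(("ordering", ordering))
    return f"{BASE_URL}/email_threads?{urlencode(params)}"
```

For example, threads_url([4857], has_new_replies=True, ordering="-last_email_created_at") lists threads with unread replies in queue 4857, newest activity first.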

Retrieve an email thread

Get email thread object 1244

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/email_threads/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/email_threads/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "root_email": "https://<example>.rossum.app/api/v1/emails/5432",
  "has_replies": false,
  "has_new_replies": false,
  "root_email_read": false,
  "last_email_created_at": "2021-11-01T18:02:24.740600Z",
  "subject": "Root email subject",
  "from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
  "created_at": "2021-06-10T12:38:44.866180Z",
  ...
}

GET /v1/email_threads/{id}

Get an email thread object.

Response

Status: 200

Returns email thread object.

Update an email thread

Update email thread object 1244

curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"root_email": "https://<example>.rossum.app/api/v1/emails/5432", "has_new_replies": true}' \
  'https://<example>.rossum.app/api/v1/email_threads/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/email_threads/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "root_email": "https://<example>.rossum.app/api/v1/emails/5432",
  "has_replies": false,
  "has_new_replies": true,
  "root_email_read": true,
  "last_email_created_at": "2021-11-01T18:02:24.740600Z",
  "subject": "Root email subject",
  "from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
  "created_at": "2021-06-10T12:38:44.866180Z",
  ...
}

PUT /v1/email_threads/{id}

Update email thread object.

Response

Status: 200

Returns updated email thread object.

Update part of an email thread

Update flag has_new_replies of email thread object 1244

curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  -d '{"has_new_replies": true}' \
  'https://<example>.rossum.app/api/v1/email_threads/1244'
{
  "id": 1244,
  "url": "https://<example>.rossum.app/api/v1/email_threads/1244",
  "organization": "https://<example>.rossum.app/api/v1/organizations/132",
  "queue": "https://<example>.rossum.app/api/v1/queues/4857",
  "root_email": "https://<example>.rossum.app/api/v1/emails/5432",
  "has_replies": false,
  "has_new_replies": true,
  "root_email_read": true,
  "last_email_created_at": "2021-11-01T18:02:24.740600Z",
  "subject": "Root email subject",
  "from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
  "created_at": "2021-06-10T12:38:44.866180Z",
  ...
}

PATCH /v1/email_threads/{id}

Update part of email thread object.

Response

Status: 200

Returns updated email thread object.
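Boolean attributes should be sent as JSON booleans (true/false) rather than strings. A minimal sketch that builds a PATCH request clearing the has_new_replies flag of a thread, for example after a user has read the new replies; the helper name is hypothetical and the request is only constructed, not sent.

```python
import json
import urllib.request

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def clear_new_replies_request(token: str, thread_id: int) -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets has_new_replies to false."""
    return urllib.request.Request(
        f"{BASE_URL}/email_threads/{thread_id}",
        # json.dumps emits the JSON literal false, not the string "False".
        data=json.dumps({"has_new_replies": False}).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="PATCH",
    )
```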

Get email thread counts

Get email thread counts

curl -X GET -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
  'https://<example>.rossum.app/api/v1/email_threads/counts'
{
  "with_replies": 5,
  "with_new_replies": 3,
  "recent_with_no_documents_not_replied": 2
}

GET /v1/email_threads/counts

Retrieve counts of email threads.

Supports the same filters as list email threads endpoint.

Response

Status: 200

Returns object with email thread counts.

Attribute Type Description
with_replies integer Number of email threads containing two or more incoming emails
with_new_replies integer Number of email threads containing unread incoming replies.
recent_with_no_documents_not_replied integer Number of email threads whose root email arrived in the last 14 days without any attachment processed by Rossum, excluding threads with the rejected label, threads that already have a reply (i.e. the thread contains more than the root email), and threads whose root email has been read.

Generic Engine

Example generic engine object

{
  "id": 3000,
  "url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
  "name": "Generic engine",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
  "schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
}

A generic engine object holds the specification of the training setup for an engine trained by Rossum.

Attribute Type Default Description Read-only
id integer Id of the generic engine true
url URL URL of the generic engine true
name string Name of the generic engine
description string Description of the generic engine
documentation_url URL null URL of the generic engine's documentation
schema URL null Related generic engine schema

List all generic engines

List all generic engines

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/generic_engines'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 3000,
      "url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
      "name": "Generic engine",
      "description": "AI engine trained to recognize data for the specific data capture requirement",
      "documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
      "schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
    }
  ]
}

GET /v1/generic_engines

Retrieve all generic engine objects.

Response

Status: 200

Returns paginated response with a list of generic engine objects.

Retrieve a generic engine

Get generic engine object 3000

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/generic_engines/3000'
{
  "id": 3000,
  "url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
  "name": "Generic engine",
  "description": "AI engine trained to recognize data for the specific data capture requirement",
  "documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
  "schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
}

GET /v1/generic_engines/{id}

Get a generic engine object.

Response

Status: 200

Returns generic engine object.

Generic Engine Schema

Example generic engine schema object

{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000",
  "content": {
    "training_queues": [],
    "fields": [
      {
        "category": "datapoint",
        "engine_output_id": "document_id",
        "type": "string",
        "label": "label text",
        "description": "description text",
        "trained": true,
        "sources": []
      },
      {
        "category": "multivalue",
        "engine_output_id": "my_cool_ids",
        "label": "label text",
        "description": "description text",
        "type": "freeform",
        "trained": false,
        "children": {
          "category": "datapoint",
          "engine_output_id": "my_cool_id",
          "type": "enum",
          "label": "label text",
          "description": "description text",
          "trained": false,
          "sources": []
        }
      },
      {
        "category": "multivalue",
        "engine_output_id": "date_timezone_table",
        "label": "label text",
        "description": "description text",
        "type": "grid",
        "trained": true,
        "children": {
          "category": "tuple",
          "children": [
            {
              "category": "datapoint",
              "engine_output_id": "date",
              "type": "date",
              "label": "label text",
              "description": "description text",
              "trained": true,
              "sources": []
            },
            {
              "category": "datapoint",
              "engine_output_id": "timezone",
              "type": "string",
              "label": "label text",
              "description": "description text",
              "trained": true,
              "sources": []
            }
          ]
        }
      }
    ]
  }
}

An engine schema is an object which describes what fields are available in the engine. Do not confuse engine schema with Document Schema.

Attribute Type Default Description Read-only
id integer Id of the generic engine schema true
url URL URL of the generic engine schema true
content object Content of the engine schema; see Dedicated Engine Schema for a description of its structure
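Engine schema fields nest: a multivalue holds a single children object, and a tuple holds a children list. The sketch below walks this structure as it appears in the example above and collects every datapoint; iter_datapoints is a hypothetical helper and only handles the three categories shown there.

```python
from typing import Iterable, Iterator, Tuple

Datapoint = Tuple[str, str, bool]  # (engine_output_id, type, trained)


def iter_datapoints(fields: Iterable[dict]) -> Iterator[Datapoint]:
    """Yield every datapoint from content.fields of a generic engine schema."""
    for node in fields:
        yield from _walk(node)


def _walk(node: dict) -> Iterator[Datapoint]:
    category = node["category"]
    if category == "datapoint":
        yield node["engine_output_id"], node["type"], node["trained"]
    elif category == "multivalue":
        # A multivalue has a single child object (a datapoint or a tuple).
        yield from _walk(node["children"])
    elif category == "tuple":
        # A tuple has a list of datapoint children.
        for child in node["children"]:
            yield from _walk(child)
```

For the example schema, filtering on the third element lists my_cool_id as the only datapoint that is not trained.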

List all generic engine schemas

List all generic engine schemas

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/generic_engine_schemas'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 6000,
      "url": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000",
      "content": {
        "training_queues": [...],
        "fields": [...]
      }
    }
  ]
}

GET /v1/generic_engine_schemas

Retrieve all generic engine schema objects.

Response

Status: 200

Returns paginated response with a list of generic engine schema objects.

Retrieve a generic engine schema

Get generic engine schema object 6000

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/generic_engine_schemas/6000'
{
  "id": 6000,
  "url": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000",
  "content": {
    "training_queues": [...],
    "fields": [...]
  }
}

GET /v1/generic_engine_schemas/{id}

Get a generic engine schema object.

Response

Status: 200

Returns generic engine schema object.

Hook

Example hook object of type webhook

{
  "id": 1500,
  "type": "webhook",
  "url": "https://<example>.rossum.app/api/v1/hooks/1500",
  "name": "Change of Status",
  "metadata": {},
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199",
    "https://<example>.rossum.app/api/v1/queues/8191"
  ],
  "run_after": [],
  "sideload": [
    "queues"
  ],
  "active": true,
  "events": [
    "annotation_status.changed"
  ],
  "config": {
    "url": "https://myq.east-west-trading.com/api/hook1?strict=true",
    "secret": "secret-token",
    "insecure_ssl": false,
    "client_ssl_certificate": "-----BEGIN CERTIFICATE-----\n...",
    "timeout_s": 30,
    "retry_count": 4,
    "app": {
      "display_mode": "drawer",
      "url": "https://myq.east-west-trading.com/api/hook1?strict=true",
      "settings": {}
    }
  },
  "test": {
    "saved_input": {
      ...
    }
  },
  "extension_source": "custom",
  "settings": {},
  "settings_schema": {
    "type": "object",
    "properties": {}
  },
  "secrets": {},
  "secrets_schema": {
    "type": "object",
    "properties": {}
  },
  "guide": "Here we explain how the extension should be used.",
  "read_more_url": "https://github.com/rossumai/simple-vendor-matching-webhook-python",
  "extension_image_url": "https://rossum.ai/wp-content/themes/rossum/static/img/logo.svg",
  "hook_template": "https://<example>.rossum.app/api/v1/hook_templates/998877",
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}

Example hook object of type function

{
  "id": 1500,
  "type": "function",
  "url": "https://<example>.rossum.app/api/v1/hooks/1500",
  "name": "Empty function",
  "metadata": {},
  "queues": [
    "https://<example>.rossum.app/api/v1/queues/8199",
    "https://<example>.rossum.app/api/v1/queues/8191"
  ],
  "run_after": [],
  "sideload": ["modifiers"],
  "active": true,
  "events": [
    "annotation_status.changed"
  ],
  "config": {
    "runtime": "nodejs18.x",
    "code": "exports.rossum_hook_request_handler = () => {\nconst messages = [{\"type\": \"info\", \"content\": \"Yup!\"}];\nconst operations = [];\nreturn {\nmessages,\noperations\n};\n};",
    "timeout_s": 30,
    "retry_count": 4,
    "app": {
      "display_mode": "drawer",
      "url": "https://myq.east-west-trading.com/api/hook1?strict=true",
      "settings": {}
    }
  },
  "token_owner": "https://<example>.rossum.app/api/v1/users/12345",
  "token_lifetime_s": 1000,
  "test": {
    "saved_input": {...}
  },
  "status": "ready",
  "extension_source": "custom",
  "settings": {},
  "settings_schema": {
    "type": "object",
    "properties": {}
  },
  "secrets": {},
  "secrets_schema": {
    "type": "object",
    "properties": {}
  },
  "guide": "Here we explain how the extension should be used.",
  "read_more_url": "https://github.com/rossumai/simple-vendor-matching-webhook-python",
  "extension_image_url": "https://rossum.ai/wp-content/themes/rossum/static/img/logo.svg",
  "hook_template": "https://<example>.rossum.app/api/v1/hook_templates/998877",
  "modified_by": "https://<example>.rossum.app/api/v1/users/1",
  "modified_at": "2020-01-01T10:08:03.856648Z"
}
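The "code" value above is the handler source serialized into a single JSON string. The sketch below shows a hypothetical Python counterpart of that Node.js handler and how it ends up in config.code. It assumes that the python3.x runtimes call a top-level function named rossum_hook_request_handler and pass it the hook payload as a dict; the local exec call only checks the return shape and is not how Rossum runs the code.

```python
import json

# Hypothetical Python counterpart of the "Empty function" code above.
# Assumption: the python3.x runtimes call a top-level function named
# rossum_hook_request_handler and pass it the hook payload as a dict.
HOOK_SOURCE = '''\
def rossum_hook_request_handler(payload):
    messages = [{"type": "info", "content": "Yup!"}]
    operations = []
    return {"messages": messages, "operations": operations}
'''

# config.code holds the source as a plain string; json.dumps escapes
# its newlines and quotes, just like in the "code" value above.
config = {"runtime": "python3.12", "code": HOOK_SOURCE, "timeout_s": 30, "retry_count": 4}
serialized = json.dumps(config)

# Local sanity check only: run the source and call the handler with an empty payload.
namespace = {}
exec(HOOK_SOURCE, namespace)
result = namespace["rossum_hook_request_handler"]({})
print(result)
```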

A hook is an extension of Rossum that is notified when a specific event occurs. A hook object configures which endpoint or function is executed and when. For an overview of other extension options, see Extensions.

In the Azure environment, serverless function instances are held per organization. In rare situations, this can cause side effects on create or update that do not occur in other environments. See Environment differences for more details.

Attribute Type Default Description Read-only
id integer Id of the hook true
type string webhook Hook type. Possible values: webhook, function
name string Name of the hook
url URL URL of the hook true
queues list[URL] List of queues that use the hook object.
run_after list[URL] List of all hooks that have to be executed before running this hook.
active bool If set to true, the hook is notified.
events list[string] List of events on which the hook should be notified. For the list of events, see Webhook events.
sideload list[string] [] List of related objects that should be included in hook request. For the list of possible sideloads see Webhook events.
metadata object {} Client data.
config object Configuration of the hook.
token_owner URL URL of a user object. If present, an API access token is generated for this user and sent to the hook. If null, no token is generated.
token_lifetime_s integer null Lifetime of rossum_authorization_token in seconds (min=0, max=7200). Use it to ensure the token remains valid after the hook response is returned. If null, the default lifetime of 600 seconds is used.
test object {} Input saved for hook testing purposes, see Test a hook.
description string Hook description text.
extension_source string custom Import source of the extension.
settings object {} Specific settings that will be included in the payload when executing the hook. Field is validated with json schema stored in settings_schema field.
settings_schema object null [BETA] JSON schema for settings field validation.
secrets object {} Specific secrets that are stored securely encrypted. The values are merged into the hook execution payload. Field is validated with json schema stored in secrets_schema field. (write only)
secrets_schema object JSON schema [BETA] JSON schema for secrets field validation.
guide string Description of how to use the extension.
read_more_url URL URL of a page with more information about the extension.
extension_image_url URL URL of the extension image.
hook_template URL null URL of the hook template used to create the hook.
modified_by URL null URL of the last hook modifier true
modified_at datetime null Date of last modification true
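run_after defines ordering between hooks: a hook runs only after all hooks listed in its run_after. When designing a chain of hooks, it can help to treat run_after as dependency edges and sort them topologically to see the resulting order and catch cycles. The sketch below uses hypothetical hook URLs and Python's standard graphlib module (Python 3.9+). It illustrates the ordering semantics only and is not how Rossum schedules hooks internally.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(hooks: dict[str, list[str]]) -> list[str]:
    """Order hook URLs so that every hook comes after the hooks in its run_after list.

    hooks maps a hook URL to its run_after list (also hook URLs).
    Raises graphlib.CycleError if the run_after references form a cycle.
    """
    # TopologicalSorter expects node -> predecessors, which is exactly run_after
    return list(TopologicalSorter(hooks).static_order())


# Hypothetical hook URLs forming the chain 1 -> 2 -> 3
base = "https://<example>.rossum.app/api/v1/hooks/"
hooks = {
    base + "1": [],
    base + "2": [base + "1"],
    base + "3": [base + "2"],
}
print(execution_order(hooks))

try:
    execution_order({"a": ["b"], "b": ["a"]})
except CycleError:
    print("run_after references form a cycle")
```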

Config attribute

The config attribute specifies configuration specific to the hook type.

Webhook config
Attribute Type Default Description Read-only
url URL URL of the webhook endpoint to call
secret string (optional) If set, it is used to create a hash signature for each payload. For more information, see Validating payloads from Rossum.
insecure_ssl bool false Disable SSL certificate verification (only use for testing purposes).
client_ssl_certificate string Client SSL certificate used to authenticate requests. Must be PEM encoded.
client_ssl_key string Client SSL key (write only). Must be PEM encoded. The key must not be encrypted.
private bool false (optional) If set, the url and secret values become hidden and immutable once the hook is created. The value of this flag cannot be changed to false once set.
schedule object {} Specific configuration for hooks of invocation.scheduled event and action interval. See schedule
timeout_s int 30 Webhook call timeout in seconds. For non-interactive webhooks only (min=0, max=60).
retry_count int 4 Number of times the webhook call is retried in case of failure. For non-interactive webhooks only (min=0, max=4).
app obj [BETA] (optional) Configuration of the app
payload_logging_enabled bool false (optional) If set to false, the hook payload is omitted from the hook logs accessible via the UI.
retry_on_any_non_2xx bool false (optional) If enabled, the call is retried on any non-2xx response status. If disabled, it is retried only on the response statuses 408, 429, 500, 502, 503 and 504.
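The retry options in the webhook config combine roughly as follows. This is a sketch of the documented rules (the retry_count budget and the fixed status list used when retry_on_any_non_2xx is false), not Rossum's implementation. Backoff timing and the handling of timeouts or connection errors are not specified here and are left out; retries also apply to non-interactive webhooks only.

```python
# Statuses retried when retry_on_any_non_2xx is false (from the table above)
RETRYABLE_STATUSES = {408, 429, 500, 502, 503, 504}


def should_retry(status: int, attempt: int, retry_count: int = 4,
                 retry_on_any_non_2xx: bool = False) -> bool:
    """Decide whether a failed webhook call would be retried.

    attempt is the number of retries already made (0 after the first call).
    """
    if 200 <= status < 300:
        return False  # success, nothing to retry
    if attempt >= retry_count:
        return False  # retry budget from config.retry_count is used up
    if retry_on_any_non_2xx:
        return True
    return status in RETRYABLE_STATUSES
```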
Function config
Attribute Type Default Description
runtime string Runtime used to execute the code. Allowed values: nodejs12.x, nodejs18.x, python3.8 or python3.12. Note that nodejs12.x has been deprecated.
code string String-serialized source code to be executed
third_party_library_pack string default Set of libraries to be included in the execution environment of the function. See Third party libraries below for details.
private bool false (optional) If set, the runtime, code and third_party_library_pack values become hidden and immutable once the hook is created. The value of this flag cannot be changed to false once set.
schedule obj {} Specific configuration for hooks of invocation.scheduled event and action interval. See schedule
timeout_s int 30 Function call timeout in seconds. For non-interactive functions only (min=0, max=60)
retry_count int 4 Number of times the function call is retried in case of failure. For non-interactive functions only (min=0, max=4).
app obj [BETA] (optional) Configuration of the app
payload_logging_enabled bool false (optional) If set to false, the hook payload is omitted from the hook logs accessible via the UI.
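The function config table lists allowed runtimes and numeric limits. A client can pre-check a config against them before sending it, to catch obvious mistakes early. The helper below is hypothetical; the API performs its own validation, and this sketch covers only the constraints stated in the table.

```python
ALLOWED_RUNTIMES = {"nodejs12.x", "nodejs18.x", "python3.8", "python3.12"}
DEPRECATED_RUNTIMES = {"nodejs12.x"}


def check_function_config(config: dict) -> list[str]:
    """Return a list of problems found in a function hook config (empty if none)."""
    problems = []
    runtime = config.get("runtime")
    if runtime not in ALLOWED_RUNTIMES:
        problems.append(f"unsupported runtime: {runtime!r}")
    elif runtime in DEPRECATED_RUNTIMES:
        problems.append(f"deprecated runtime: {runtime}")
    if not isinstance(config.get("code"), str):
        problems.append("code must be a string")
    # Defaults from the table: timeout_s=30 (0..60), retry_count=4 (0..4)
    if not 0 <= config.get("timeout_s", 30) <= 60:
        problems.append("timeout_s must be between 0 and 60")
    if not 0 <= config.get("retry_count", 4) <= 4:
        problems.append("retry_count must be between 0 and 4")
    return problems
```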

Schedule object

The schedule object contains the following additional event-specific attributes. The interval set by the cron expression cannot be shorter than 10 minutes.

Key Type Description
cron string Cron expression that sets the invocation interval (for example, "*/10 * * * *")
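Because the cron interval cannot be shorter than 10 minutes, a client may want to pre-check a cron expression before saving a hook. The sketch below is a deliberately simplified, hypothetical check: it supports only "*", "*/N" and comma-separated minute lists in the minute field and assumes the job fires every hour, so the result is a lower bound on the real gap and the check may reject some valid expressions. How Rossum itself measures the interval is not documented here.

```python
def min_minute_gap(cron: str) -> int:
    """Smallest gap in minutes between runs implied by the minute field of cron.

    Simplified sketch: supports only '*', '*/N' and comma-separated minutes
    in the first field and assumes the job runs every hour. For other hour
    or day fields, the result is a lower bound on the real gap.
    """
    minute_field = cron.split()[0]
    if minute_field == "*":
        minutes = list(range(60))
    elif minute_field.startswith("*/"):
        minutes = list(range(0, 60, int(minute_field[2:])))
    else:
        minutes = sorted(int(m) for m in minute_field.split(","))
    gaps = [b - a for a, b in zip(minutes, minutes[1:])]
    gaps.append(minutes[0] + 60 - minutes[-1])  # wrap-around into the next hour
    return min(gaps)


def schedule_allowed(cron: str) -> bool:
    """Conservative check of the 10-minute minimum interval."""
    return min_minute_gap(cron) >= 10
```

Note that "*/7" fails the check even though its step is 7: the last run of the hour is at minute 56, only 4 minutes before the next run at minute 0.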

App object

The App object contains the following attributes.

Key Type Default Description Required
url URL URL of the app that will be embedded in the Rossum UI. True
settings object Settings of the app that can be used to further customize the configuration app (such as a UI schema). False
display_mode string drawer Display mode of the app
Display modes

Two display modes are currently supported.

display_mode Description
drawer opens a drawer with the embedded URL
fullscreen opens an embedded URL in full-screen overlay
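The app object has one required attribute (url) and two optional ones. A small builder can enforce that shape and reject unknown display modes before the config is sent. This is a hypothetical client-side helper, not part of the API; app is marked [BETA], so the accepted values may change.

```python
from typing import Optional

DISPLAY_MODES = ("drawer", "fullscreen")


def app_config(url: str, display_mode: str = "drawer",
               settings: Optional[dict] = None) -> dict:
    """Build a config.app object: url is required, the rest is optional."""
    if display_mode not in DISPLAY_MODES:
        raise ValueError(f"display_mode must be one of {DISPLAY_MODES}, got {display_mode!r}")
    app = {"url": url, "display_mode": display_mode}
    if settings is not None:
        app["settings"] = settings  # free-form settings for the configuration app
    return app
```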
Third party libraries

Libraries available in the execution environment can be configured via the third_party_library_pack option. Note that updating the code of a function that uses third party libraries can take up to 30 seconds.

Let us know if you need any additional libraries.

Library packs for Python 3.8 and Python 3.12 runtimes
Value Type Description
null object Only the Python Standard Library is available
default string Contains additional libraries rossum, requests, jmespath, xmltodict, pydantic, pandas, httpx, boto3 and botocore
Library packs for Node.js 18 runtime
Value Type Description
null object Only Node.js Built-in Modules are available
default string Contains additional libraries node-fetch, https-proxy-agent and lodash
Function specific attributes
Attribute Type Description Read-only
status string Indicates whether the function is ready to be invoked or modified. Possible values are ready, pending and failed. While the status is pending, invocations and other API actions that operate on the function return status 400. If the status is failed, re-saving the function is recommended. True
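Since actions on a pending function return status 400, a client that has just updated a function may want to wait until its status leaves pending before invoking or modifying it again. The polling sketch below takes get_status as a hypothetical caller-supplied callable (for example, one that retrieves the hook object and returns its "status" field); the timeout and poll interval defaults are arbitrary choices.

```python
import time
from typing import Callable


def wait_until_ready(get_status: Callable[[], str], timeout_s: float = 60.0,
                     poll_interval_s: float = 2.0) -> str:
    """Poll a function hook's status until it is no longer 'pending'.

    Returns the final status ('ready' or 'failed').
    Raises TimeoutError if the status is still 'pending' after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status != "pending":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("function hook is still pending")
        time.sleep(poll_interval_s)
```

If the returned status is failed, re-save the function as recommended above instead of retrying the invocation.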
Extension sources

This value indicates where the hook has been imported from.

Value Description
custom The hook has been imported and set up by the user.
rossum_store A preconfigured hook has been imported from an extension store.
Hook secrets

The content of secrets is stored encrypted and is write-only in the API. There is an additional secrets_schema property to provide a JSON schema for secrets validation.

To get the secrets as a list of keys, see Retrieve list of secrets keys.

Hook secrets schema

JSON schema for validating the hook secrets. The schema must include additionalProperties, set either to false (as in the example below), so that the secrets field allows no properties other than those specified in the schema, or to an object with the "type": "string" property (as in the default value), so that all additional properties must be strings. More on additionalProperties can be found in the official JSON Schema documentation.

Example of Secrets schema object for validating two secrets properties

{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "description": "Target system user"
    },
    "password": {
      "type": "string",
      "description": "Target system user password"
    }
  },
  "additionalProperties": false
}
Hook secrets schema default value

Default value of secrets_schema field

{
  "type": "object",
  "additionalProperties": {
    "type": "string"
  }
}
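The two schemas above differ only in additionalProperties: false rejects any key not listed in properties, while {"type": "string"} accepts any extra key as long as its value is a string. The hypothetical helper below hand-codes just that behavior to make the difference concrete. It is not a full JSON Schema validator (it ignores required, nested schemas and other keywords); in practice a complete validator such as the jsonschema package is the better choice.

```python
def check_secrets(secrets: dict, schema: dict) -> list[str]:
    """Check secrets against the subset of JSON Schema used by secrets_schema.

    Supports only top-level "properties" entries with a "type" and
    "additionalProperties" set to false or to {"type": "string"}.
    """
    errors = []
    properties = schema.get("properties", {})
    additional = schema.get("additionalProperties", True)
    for key, value in secrets.items():
        if key in properties:
            expected = properties[key].get("type")
        elif additional is False:
            errors.append(f"{key}: additional property not allowed")
            continue
        elif isinstance(additional, dict):
            expected = additional.get("type")
        else:
            expected = None  # additionalProperties true or missing: anything goes
        if expected == "string" and not isinstance(value, str):
            errors.append(f"{key}: expected a string")
    return errors


DEFAULT_SECRETS_SCHEMA = {"type": "object", "additionalProperties": {"type": "string"}}
STRICT_SECRETS_SCHEMA = {
    "type": "object",
    "properties": {"username": {"type": "string"}, "password": {"type": "string"}},
    "additionalProperties": False,
}
```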
Hook settings schema

JSON schema for validating the hook settings.

Example of Settings schema object for validating two settings properties

{
  "type": "object",
  "properties": {
    "username": {
      "type": "string",
      "description": "Target system user"
    },
    "password": {
      "type": "string",
      "description": "Target system user password"
    }
  }
}

List all hooks

List all hooks

curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
  'https://<example>.rossum.app/api/v1/hooks'
{
  "pagination": {
    "total": 1,
    "total_pages": 1,
    "next": null,
    "previous": null
  },
  "results": [
    {
      "id": 1500,
      "type": "webhook",
      "url": "https://<example>.rossum.app/api/v1/hooks/1500",
      "name": "Some Hook",
      "metadata": {},
      "queues": [
        "https://<example>.rossum.app/api/v1/queues/8199",
        "https://<example>.rossum.app/api/v1/queues/8191"
      ],
      "run_after": [],
      "active": true,
      "events": [
        "annotation_status.changed"
      ],
      "config": {
        "url": "https://myq.east-west-trading.com/api/hook1?strict=true",
        "schedule": {"cron": "*/10 * * * *"}
      },
      "test": {
        "saved_input": {...}
      },
      "settings": {},
      "settings_schema": null,
      "description": "This hook does...",
      "extension_source": "custom",
      "guide": "Here we explain how the extension should be used.",
      "read_more_url": "https://github.com/rossumai/simple-vendor-matching-webhook-python",
      "extension_image_url": "https://rossum.ai/wp-content/themes/rossum/static/img/logo.svg",
      "modified_by": "https://<example>.rossum.app/api/v1/users/1",
      "modified_at": "2020-01-01T10:08:03