Getting Started
Introduction
The Rossum API allows you to programmatically access and manage your organization's Rossum data and account information. With the API, you can:
- Manage your organization, users, workspaces and queues
- Configure captured data: select extracted fields
- Integrate Rossum with other systems: import, export, extensions
On this page, you will find an introduction to the API usage from a developer perspective, and a reference to all the API objects and methods.
Developer Resources
There are several other key resources related to implementing, integrating and extending the Rossum platform:
- Refer to the Rossum Developer Portal for guides, tutorials, news and a community Q&A section.
- Subscribe to the rossum-api-announcements group to stay up to date on the API and platform updates.
- For managing and configuring your account, use the rossum command-line tool (see its setup instructions).
- For a management overview of Rossum implementation options, see the Rossum Integration Whitepaper.
- If you are an RPA developer, refer to our UiPath or BluePrism guides.
Quick API Tutorial
For a quick tutorial on how to authenticate, upload a document and export extracted data, see the sections below. If you want to skip this quick tutorial, continue directly to the Overview section.
It is a good idea to go through the introduction to the Rossum platform on the Developer Portal first to make sure you are up to speed on the basic Rossum concepts.
If you run into trouble, feel free to contact us at support@rossum.ai.
Install curl tool
Test that curl is installed properly
curl https://<example>.rossum.app/api/v1
{
"organizations":"https://<example>.rossum.app/api/v1/organizations",
"workspaces":"https://<example>.rossum.app/api/v1/workspaces",
"schemas":"https://<example>.rossum.app/api/v1/schemas",
"connectors":"https://<example>.rossum.app/api/v1/connectors",
"inboxes":"https://<example>.rossum.app/api/v1/inboxes",
"queues":"https://<example>.rossum.app/api/v1/queues",
"documents":"https://<example>.rossum.app/api/v1/documents",
"users":"https://<example>.rossum.app/api/v1/users",
"groups":"https://<example>.rossum.app/api/v1/groups",
"annotations":"https://<example>.rossum.app/api/v1/annotations",
"pages":"https://<example>.rossum.app/api/v1/pages"
}
All code samples included in this API documentation use curl, the command-line data transfer tool. On MS Windows 10, MacOS X and most Linux distributions, curl should already be pre-installed. If not, please download it from curl.haxx.se.
Optionally, use the jq tool to pretty-print JSON output
curl https://<example>.rossum.app/api/v1 | jq
{
"organizations": "https://<example>.rossum.app/api/v1/organizations",
"workspaces": "https://<example>.rossum.app/api/v1/workspaces",
"schemas": "https://<example>.rossum.app/api/v1/schemas",
"connectors": "https://<example>.rossum.app/api/v1/connectors",
"inboxes": "https://<example>.rossum.app/api/v1/inboxes",
"queues": "https://<example>.rossum.app/api/v1/queues",
"documents": "https://<example>.rossum.app/api/v1/documents",
"users": "https://<example>.rossum.app/api/v1/users",
"groups": "https://<example>.rossum.app/api/v1/groups",
"annotations": "https://<example>.rossum.app/api/v1/annotations",
"pages": "https://<example>.rossum.app/api/v1/pages"
}
You may also want to install the jq tool to make curl output human-readable.
Use the API on Windows
This API documentation is written for use in command-line interpreters running on UNIX-based operating systems (Linux and Mac). Windows users may need to use the following substitutions when working with the API:
Character used in this documentation | Meaning/usage | Substitute character for Windows users |
---|---|---|
' | single quotes | " |
" | double quotes | "" or \" |
\ | continue the command on the next line | ^ |
Example of API call on UNIX-based OS
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"target_queue": "https://<example>.rossum.app/api/v1/queues/8236", "target_status": "to_review"}' \
'https://<example>.rossum.app/api/v1/annotations/315777/copy'
Examples of API call on Windows
curl -H "Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03" -H "Content-Type: application/json" ^
-d "{""target_queue"": ""https://<example>.rossum.app/api/v1/queues/8236"", ""target_status"": ""to_review""}" ^
"https://<example>.rossum.app/api/v1/annotations/315777/copy"
curl -H "Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03" -H "Content-Type: application/json" ^
-d "{\"target_queue\": \"https://<example>.rossum.app/api/v1/queues/8236\", \"target_status\": \"to_review\"}" ^
"https://<example>.rossum.app/api/v1/annotations/315777/copy"
Create an account
In order to interact with the API, you need an account. If you do not have one, you can create one via our self-service portal.
Login to the account
Fill in your username and password (the API login credentials are the same as those you use to log into your account). Call the login endpoint to obtain a key (token) that can be used in subsequent calls.
curl -s -H 'Content-Type: application/json' \
-d '{"username": "east-west-trading-co@example.com", "password": "aCo2ohghBo8Oghai"}' \
'https://<example>.rossum.app/api/v1/auth/login'
{"key": "db313f24f5738c8e04635e036ec8a45cdd6d6b03"}
This key remains valid for the default expiration time (currently 162 hours) or until you log out.
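To avoid copying the key by hand, you can capture it in a shell variable for the following steps. The sketch below assumes bash, curl and jq; the rossum_login function and the ROSSUM_API variable are hypothetical names used only for illustration.

```bash
# ROSSUM_API (assumed name) holds the base URL including the /v1 version prefix.
: "${ROSSUM_API:=https://<example>.rossum.app/api/v1}"

# Hypothetical helper: log in and print the access key, or nothing on failure.
rossum_login() {
  local username=$1 password=$2 payload
  # jq -n --arg builds the JSON body and escapes quotes or backslashes safely.
  payload=$(jq -n --arg u "$username" --arg p "$password" \
    '{username: $u, password: $p}') || return 1
  curl -s -H 'Content-Type: application/json' -d "$payload" \
    "$ROSSUM_API/auth/login" | jq -r '.key // empty'
}
```

For example, TOKEN=$(rossum_login 'east-west-trading-co@example.com' 'aCo2ohghBo8Oghai') stores the key, and -H "Authorization: Bearer $TOKEN" (note the double quotes, so the variable expands) can then replace the literal key in the commands below.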
Upload a document
In order to upload a document (PDF, image, XLSX, XLS, DOCX, DOC) through the API, you need to obtain the id of a queue first.
curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/queues?page_size=1' | jq -r '.results[0].url'
https://<example>.rossum.app/api/v1/queues/8199
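The upload call below expects the numeric queue id (8199 here), while the list returns the full queue URL. A minimal bash sketch that extracts the trailing id; object_id_from_url is a hypothetical helper name.

```bash
# Hypothetical helper: print the trailing id of an object URL, e.g. 8199 for
# https://<example>.rossum.app/api/v1/queues/8199 (a trailing slash is ignored).
object_id_from_url() {
  local url=${1%/}
  printf '%s\n' "${url##*/}"
}
```

For example, QUEUE_ID=$(object_id_from_url "$QUEUE_URL"), where QUEUE_URL (a hypothetical variable) holds the output of the previous command.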
Then you can upload the document to the queue. Alternatively, you can send documents to a queue-related inbox. See upload for more information about importing files.
curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-F content=@document.pdf 'https://<example>.rossum.app/api/v1/uploads?queue=8199' | jq -r .url
https://<example>.rossum.app/api/v1/tasks/9231
Wait for document to be ready and review extracted data
As soon as a document is uploaded, it will show up in the queue and the data extraction will begin.
It may take a few seconds to several minutes to process a document. You can check the status of the annotation and wait until it changes to to_review.
curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668' | jq .status
"to_review"
After that, you can open the Rossum web interface example.rossum.app to review and confirm extracted data.
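Instead of re-running the status check by hand, you can poll it in a loop. The sketch below assumes bash, curl, jq and a TOKEN variable holding your access key; poll_until and annotation_status are hypothetical helper names, and the retry count and interval are arbitrary choices, not limits set by Rossum.

```bash
# Hypothetical helper: run a command repeatedly until it prints the expected
# value. Returns 0 on success, 1 after max_tries unsuccessful attempts.
poll_until() {
  local expected=$1 max_tries=$2 interval=$3
  shift 3
  local attempt value
  for (( attempt = 1; attempt <= max_tries; attempt++ )); do
    value=$("$@")
    [ "$value" = "$expected" ] && return 0
    # Do not sleep after the final attempt.
    (( attempt < max_tries )) && sleep "$interval"
  done
  return 1
}

# Print the status of one annotation (TOKEN must hold a valid access key).
annotation_status() {
  curl -s -H "Authorization: Bearer $TOKEN" "$1" | jq -r .status
}
```

For example, poll_until to_review 60 5 annotation_status 'https://<example>.rossum.app/api/v1/annotations/319668' checks every 5 seconds for up to about 5 minutes and returns a non-zero status if the annotation has not reached to_review by then. Passing the status command as arguments keeps the loop independent of curl, so it can be reused for other long-running operations.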
Download reviewed data
Now you can export the extracted data using the export endpoint of the queue. You can select XML, CSV, XLSX or JSON format. For CSV, use a URL like:
curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/queues/8199/export?status=exported&format=csv&id=319668'
Invoice number,Invoice Date,PO Number,Due date,Vendor name,Vendor ID,Customer name,Customer ID,Total amount,
2183760194,2018-06-08,PO2231233,2018-06-08,Alza.cz a.s.,02231233,Rossum,05222322,500.00
Logout
Finally, you can safely dispose of the token using the logout endpoint:
curl -s -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/auth/logout'
{"detail":"Successfully logged out."}
Overview
HTTP and REST
The Rossum API is organized around REST. Our API has predictable, resource-oriented URLs, and uses HTTP response codes to indicate API errors. We use built-in HTTP features, like HTTP authentication and HTTP verbs, which are understood by off-the-shelf HTTP clients.
HTTP Verbs
Call the API using the following standard HTTP methods:
- GET to retrieve an object or multiple objects in a specific category
- POST to create an object
- PUT to modify an entire object
- PATCH to modify selected fields of an object
- DELETE to delete an object
We support cross-origin resource sharing, allowing you to interact securely with our API from a client-side web application. JSON is returned by API responses, including errors (except when another format is requested, e.g. XML).
Base URL
The base API endpoint URL depends on the account type, deployment and location. The default URL is https://<example>.rossum.app/api, where <example> is the domain selected during account creation. URLs of companies using a dedicated deployment may look like https://acme.rossum.app/api.
If you are not sure about the correct URL, you can navigate to https://app.rossum.ai and use your email address to receive your account information via email.
Please note that we previously recommended using the https://api.elis.rossum.ai endpoint to interact with the Rossum API, but it is now deprecated. For new integrations, use the https://<example>.rossum.app/api endpoint.
For accounts created before November 2022, use https://elis.rossum.ai/api.
Authentication
Most of the API endpoints require the user to be authenticated. To log in to the Rossum API, post an object with username and password fields. Login returns an access key to be used for token authentication.
Our API also makes it possible to authenticate with a one-time token, which is returned after registration. This token allows users to authenticate against our API, but it is invalidated after one call. It can be exchanged for a regular access token, which is limited only by its validity period. To exchange the token, use the /auth/token endpoint.
A token is deleted either when the user calls the logout endpoint or automatically after a configured time (the default expiration time is 162 hours). The expiration time can be lowered using the max_token_lifetime_s field. When the token expires, a 401 status is returned, and users are expected to log in again to obtain a new token.
Rossum's API also supports session authentication, where a user session is stored in cookies after login. If enabled, the session lasts 1 day, or until logout.
While the session is valid, there is no need to send the authentication token with every request, but "unsafe" requests (POST, PUT, PATCH, DELETE) whose MIME type is different from application/json must include the X-CSRFToken header with a valid CSRF token, which is returned in a cookie at login.
When a session expires, a 401 status is returned, as with token authentication, and users are expected to log in again to start a new session.
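A minimal sketch of session authentication with curl's cookie jar. It assumes that session authentication is enabled for your account and that the CSRF cookie is named csrftoken, a common convention that this section does not confirm; cookie_value is a hypothetical helper. The example upload is a multipart request, so its MIME type is not application/json and it carries the X-CSRFToken header.

```bash
# Cookie jar file used by curl -c (write) and -b (read); the name is arbitrary.
COOKIE_JAR=${COOKIE_JAR:-cookies.txt}

# Print the value of a named cookie from a Netscape-format cookie jar, as
# written by curl -c. Fields are tab-separated: the 6th is the cookie name and
# the 7th its value.
cookie_value() {
  local jar=$1 name=$2
  awk -F '\t' -v n="$name" '$6 == n { print $7 }' "$jar"
}

# Example flow (commented out so that defining the helper has no side effects):
# curl -s -c "$COOKIE_JAR" -H 'Content-Type: application/json' \
#   -d '{"username": "east-west-trading-co@example.com", "password": "aCo2ohghBo8Oghai"}' \
#   'https://<example>.rossum.app/api/v1/auth/login'
# CSRF=$(cookie_value "$COOKIE_JAR" csrftoken)   # assumed cookie name
# curl -s -b "$COOKIE_JAR" -H "X-CSRFToken: $CSRF" -F content=@document.pdf \
#   'https://<example>.rossum.app/api/v1/uploads?queue=8199'
```

If the server also validates the Referer header (as some CSRF implementations do for HTTPS requests), add -e 'https://<example>.rossum.app/' to the unsafe request.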
Login
Login user using username and password
curl -H 'Content-Type: application/json' \
-d '{"username": "east-west-trading-co@<example>.rossum.app", "password": "aCo2ohghBo8Oghai"}' \
'https://<example>.rossum.app/api/v1/auth/login'
{
"key": "db313f24f5738c8e04635e036ec8a45cdd6d6b03",
"domain": "acme-corp.app.rossum.ai"
}
POST /v1/auth/login
Use token key in requests
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/organizations/406'
Note: The Token authorization scheme is also supported for compatibility with earlier versions.
curl -H 'Authorization: Token db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/organizations/406'
Login user expiring after 1 hour
curl -H 'Content-Type: application/json' \
-d '{"username": "east-west-trading-co@<example>.rossum.app", "password": "aCo2ohghBo8Oghai", "max_token_lifetime_s": 3600}' \
'https://<example>.rossum.app/api/v1/auth/login'
{
"key": "ltcg2p2w7o9vxju313f04rq7lcc4xu2bwso423b3",
"domain": null
}
Attribute | Type | Required | Description |
---|---|---|---|
username | string | true | Username of the user to be logged in. |
password | string | true | Password of the user. |
origin | string | false | For internal use only. Using this field may affect throttling of your API requests. |
max_token_lifetime_s | integer | false | Duration (in seconds) for which the token will be valid. Default is 162 hours, which is also the maximum. |
Response
Status: 200
Returns an object with "key", which is an access token, and the user's domain.
Attribute | Type | Description |
---|---|---|
key | string | Access token. |
domain | string | The domain the token was issued for. |
Logout
Logout user
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/auth/logout'
{
"detail": "Successfully logged out."
}
POST /v1/auth/logout
Logout user, discard auth token.
Response
Status: 200
Token Exchange
Exchange a one-time authentication token for a longer-lived access token.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/auth/token'
{
"key": "ltcg2p2w7o9vxju313f04rq7lcc4xu2bwso423b3",
"domain": "<example>.rossum.app",
"scope": "default"
}
POST /v1/auth/token
Attribute | Type | Required | Description |
---|---|---|---|
scope | string | false | Supported values are default, approval (for internal use only) |
max_token_lifetime_s | float | false | Duration (in seconds) for which the token will be valid (default: lifetime of the current token or 162 hours if the current token is one-time). Can be set to a maximum of 583200 seconds (162 hours). |
origin | string | false | For internal use only. Using this field may affect throttling of your API requests. |
This endpoint enables the exchange of a one-time token for a longer-lived access token.
It accepts either one-time tokens provided after registration or JWT tokens, if you have such a setup configured. The token must be provided in the Bearer authorization header.
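A minimal bash sketch of the exchange call, assuming curl is installed; rossum_exchange_token and ROSSUM_API are hypothetical names. The optional second argument sets max_token_lifetime_s; this sketch accepts whole seconds only (although the field is a float) and rejects values above the 583200-second maximum before sending anything.

```bash
: "${ROSSUM_API:=https://<example>.rossum.app/api/v1}"

# Hypothetical helper: exchange a one-time (or JWT) token for an access token.
# $1 = token to exchange; $2 = optional lifetime in whole seconds.
rossum_exchange_token() {
  local token=$1 lifetime=${2:-}
  local args=(-s -X POST -H "Authorization: Bearer $token")
  if [ -n "$lifetime" ]; then
    # Reject anything that is not a plain number of seconds within the maximum.
    [[ $lifetime =~ ^[0-9]+$ ]] || return 1
    (( 10#$lifetime <= 583200 )) || return 1
    args+=(-H 'Content-Type: application/json'
      -d "{\"max_token_lifetime_s\": $lifetime}")
  fi
  curl "${args[@]}" "$ROSSUM_API/auth/token"
}
```

For example, TOKEN=$(rossum_exchange_token "$ONE_TIME_TOKEN" 3600 | jq -r .key) requests a key valid for one hour; ONE_TIME_TOKEN is a hypothetical variable holding the token to exchange.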
JWT authentication
Short-lived JWT tokens can be exchanged for access tokens. A typical use case is logging in your users via SSO in your own application and displaying the Rossum app to them, embedded in that application.
To enable JWT authentication, you need to provide Rossum with the public key that will be used to decode and verify the tokens.
Currently, only tokens with EdDSA (signed using Ed25519 and Ed448 curves) and RS512 signatures are allowed, and token validity should be at most 60 seconds.
The expected formats of the header and encoded payload of the JWT token are as follows:
Decoded JWT Header Format
Example format of a decoded JWT token header
{
"alg":"EdDSA",
"kid":"urn:rossum.ai:organizations:100",
"typ":"JWT"
}
Example format of a decoded JWT token payload
{
"ver":"1.0",
"iss":"ACME Corporation",
"aud":"https://<example>.rossum.app",
"sub":"john.doe@rossum.ai",
"exp":1514764800,
"email":"john.doe@rossum.ai",
"name":"John F. Doe",
"rossum_org":"100",
"roles": ["annotator"]
}
Attribute | Type | Required | Description |
---|---|---|---|
kid | string | true | Identifier. Must end with :{your Rossum org ID}, e.g. "urn:rossum.ai:organizations:123" |
typ | string | false | Type of the token. |
alg | string | true | Signature algorithm to be used for decoding the token. Only EdDSA or RS512 values are allowed. |
Decoded JWT Payload Format
Attribute | Type | Required | Description |
---|---|---|---|
ver | string | true | Version of the payload format. Available versions: 1.0. |
iss | string | true | Name of the issuer of the token (e.g. company name). |
aud | string | true | Target domain used for API queries (e.g. https://<example>.rossum.app). |
sub | string | true | User email that will be matched against username in Rossum. |
exp | int | true | UNIX timestamp of the JWT token expiration. Must be at most 60 seconds after the current UTC time. |
email | string | true | User email. |
name | string | true | User's first name and last name separated by a space. Will be used for creation of new users if auto-provisioning is enabled. |
rossum_org | string | true | Rossum organization id. |
roles | list[string] | false | Names of the user roles that will be assigned to users created by auto-provisioning. Must be a subset of the roles stated in the auto-provisioning configuration for the organization. |
Response
Status: 200
Attribute | Type | Description |
---|---|---|
key | string | Access token. |
domain | string | The domain the token was issued for. |
scope | string | Supported values are default, approval (for internal use only) |
Single Sign-On (SSO)
Rossum allows customers to integrate with their own identity provider, such as Google, Azure AD or any other provider using OAuth2 OpenID Connect protocol (OIDC). Rossum then acts as a service provider.
When SSO is enabled for an organization, users are redirected to the login page of the configured identity provider and are only allowed to access the Rossum application after successful authentication.
An identity provider user claim (e.g. email (default), sub, preferred_username, unique_name) is used to match a user account in Rossum. If auto-provisioning is enabled for the organization, Rossum user accounts are created automatically for users who do not have one yet.
Required setup of the OIDC identity provider:
- Redirect URI (also known as Reply URL): https://<example>.rossum.app/api/v1/oauth/code
Required information to allow OIDC setup for the Rossum service provider:
- OIDC endpoint, such as https://accounts.google.com. It should support openid configuration, e.g. https://accounts.google.com/.well-known/openid-configuration
- client id
- client secret
- claim that should be matched in Rossum
- Rossum organization id
If you need to set up SSO for your organization, please contact support@rossum.ai.
Pagination
All object list operations are paged by default, so you may need several API calls to obtain all objects of a given type.
Parameter | Default | Maximum | Description |
---|---|---|---|
page_size | 20 | 100 | Number of results per page |
page | 1 | | Page of results |
The maximum page_size is higher for the following operations:
- 1,000 for exporting data in CSV format via POST on /annotations
- 500 for searching in annotations via annotations/search (if sideload=content is not included)
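A minimal bash sketch that walks all pages of a list, assuming curl, jq and a TOKEN variable. It relies only on the page and page_size parameters and on the results array seen in the tutorial above, and stops as soon as a page returns fewer than page_size objects (a response for a page past the end is treated as empty); rossum_list_all is a hypothetical name.

```bash
# Hypothetical helper: print every object of a paged list, one JSON per line.
# $1 = list URL (may already contain filters), $2 = page size (default 100).
rossum_list_all() {
  local url=$1 page_size=${2:-100} page=1 sep='?' body count
  # Append the paging parameters with "&" if the URL already has a query string.
  [[ $url == *\?* ]] && sep='&'
  while :; do
    body=$(curl -s -H "Authorization: Bearer $TOKEN" \
      "${url}${sep}page_size=${page_size}&page=${page}") || return 1
    printf '%s\n' "$body" | jq -c '.results[]?'
    # A body without results (e.g. an error for a page past the end) counts as 0.
    count=$(printf '%s\n' "$body" | jq '.results | length' 2>/dev/null)
    [[ $count =~ ^[0-9]+$ ]] || count=0
    (( count < page_size )) && break
    page=$(( page + 1 ))
  done
}
```

For example, rossum_list_all 'https://<example>.rossum.app/api/v1/queues' | jq -r .url prints the URLs of all queues, using the maximum page size of 100 to keep the number of calls low.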
Filters and ordering
List queues of workspace 7540, with locale en_US, and order results by name.
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/queues?workspace=7540&locale=en_US&ordering=name'
Lists may be filtered using various attributes. Multiple attributes are combined with & in the URL, which results in a more specific response. Please refer to the particular object description for the supported filters.
Ordering of results may be enforced by the ordering parameter and one or more keys delimited by a comma. Preceding a key with a minus sign (-) enforces descending order.
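Because filters and ordering are plain query parameters, a small bash helper can assemble them. This is a sketch: rossum_list_url and ROSSUM_API are hypothetical names, and values are not URL-encoded, so pass only URL-safe values (or encode them beforehand).

```bash
: "${ROSSUM_API:=https://<example>.rossum.app/api/v1}"

# Hypothetical helper: build a list URL from a resource name and any number of
# key=value filters, joined with "&" in the order given.
rossum_list_url() {
  local resource=$1
  shift
  if (( $# == 0 )); then
    printf '%s/%s\n' "$ROSSUM_API" "$resource"
    return
  fi
  # "$*" joins the arguments with the first character of IFS.
  local IFS='&'
  printf '%s/%s?%s\n' "$ROSSUM_API" "$resource" "$*"
}
```

For example, curl -H "Authorization: Bearer $TOKEN" "$(rossum_list_url queues workspace=7540 locale=en_US ordering=-name)" lists the same queues as above, in descending order by name.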
Metadata
Example metadata in a document object
{
"id": 319768,
"url": "https://<example>.rossum.app/api/v1/documents/319768",
"s3_name": "05feca6b90d13e389c31c8fdeb7fea26",
"annotations": [
"https://<example>.rossum.app/api/v1/annotations/319668"
],
"mime_type": "application/pdf",
"arrived_at": "2019-02-11T19:22:33.993427Z",
"original_file_name": "document.pdf",
"content": "https://<example>.rossum.app/api/v1/documents/319768/content",
"metadata": {
"customer-id": "f205ec8a-5597-4dbb-8d66-5a53ea96cdea",
"source": 9581,
"authors": ["Joe Smith", "Peter Doe"]
}
}
When working with API objects, it may be useful to attach some information to the object (e.g. a customer id to a document). You can store a custom JSON object in the metadata section available in most objects.
List of objects with metadata support: organization, workspace, user, queue, schema, connector, inbox, document, annotation, page, survey.
Total metadata size may be up to 4 kB per object.
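A bash sketch for attaching metadata with PATCH, assuming curl, a TOKEN variable, and that the object accepts a metadata field in a PATCH request. It treats the 4 kB limit as 4096 bytes of serialized JSON, which is an assumption about how the limit is counted; metadata_fits and patch_metadata are hypothetical names. The sketch sends the complete metadata object, so check whether a partial object replaces or merges existing metadata before relying on it.

```bash
# Hypothetical helper: succeed if a metadata JSON string fits the size limit
# (assumed here to be 4096 bytes).
metadata_fits() {
  local bytes
  bytes=$(printf '%s' "$1" | wc -c)
  (( bytes <= 4096 ))
}

# Hypothetical helper: PATCH the metadata of an object given by its URL.
patch_metadata() {
  local object_url=$1 metadata=$2
  metadata_fits "$metadata" || { echo 'metadata exceeds 4 kB' >&2; return 1; }
  curl -s -X PATCH \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Content-Type: application/json' \
    -d "{\"metadata\": $metadata}" \
    "$object_url"
}
```

For example, patch_metadata 'https://<example>.rossum.app/api/v1/documents/319768' '{"customer-id": "f205ec8a-5597-4dbb-8d66-5a53ea96cdea"}'.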
Versioning
The API version is part of the URL, e.g. https://<example>.rossum.app/api/v1/users.
To allow the API to evolve, we consider the addition of a field to a JSON object, as well as the addition of a new item to an enum, to be backward-compatible changes that may be introduced at any time. Clients are expected to handle such changes.
Dates
All date fields are represented as ISO 8601 formatted strings, e.g. 2018-06-01T21:36:42.223415Z. All returned dates are in the UTC timezone.
Errors
Our API uses conventional HTTP response codes to indicate the success or failure of an API request.
Code | Status | Meaning |
---|---|---|
400 | Bad Request | Invalid input data or error from connector. |
401 | Unauthorized | The username/password is invalid or token is invalid (e.g. expired). |
403 | Forbidden | Insufficient permission, missing authentication, invalid CSRF token and similar issues. |
404 | Not Found | Entity not found (e.g. already deleted). |
405 | Method Not Allowed | You tried to access an endpoint with an invalid method. |
409 | Conflict | Trying to change an annotation that was not started by the current user. |
413 | Payload Too Large | The payload is too large (especially for uploaded files). |
429 | Too Many Requests | The allowed number of requests per minute has been exceeded. Please wait before sending more requests. |
500 | Internal Server Error | We had a problem with the server. Try again later. |
503 | Service Unavailable | We're temporarily offline for maintenance. Please try again later. |
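The table suggests a simple client policy: retry 429, 500 and 503 responses after a pause, and fix the request (or, for 401, log in again) for the other codes. A bash sketch of that policy, assuming curl and a TOKEN variable; the helper names, the number of attempts and the doubling delay are illustrative choices, not values prescribed by Rossum.

```bash
# Retry only the codes that the table marks as temporary.
rossum_is_retryable() {
  case $1 in
    429|500|503) return 0 ;;
    *) return 1 ;;
  esac
}

# GET a URL with the access key in $TOKEN, retrying temporary failures with a
# doubling delay. Prints the body on success; on failure prints the last HTTP
# status to stderr and returns 1 (a 401 means you need to log in again).
rossum_get() {
  local url=$1 max_tries=${2:-5} delay=1 attempt code body_file
  body_file=$(mktemp) || return 1
  for (( attempt = 1; attempt <= max_tries; attempt++ )); do
    code=$(curl -s -o "$body_file" -w '%{http_code}' \
      -H "Authorization: Bearer $TOKEN" "$url")
    if [[ $code == 2* ]]; then
      cat "$body_file"
      rm -f "$body_file"
      return 0
    fi
    rossum_is_retryable "$code" || break
    if (( attempt < max_tries )); then
      sleep "$delay"
      delay=$(( delay * 2 ))
    fi
  done
  echo "request to $url failed with HTTP $code" >&2
  rm -f "$body_file"
  return 1
}
```

With the default of 5 attempts, a request that keeps receiving 429 waits 1, 2, 4 and 8 seconds between tries before giving up.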
Import and Export
Documents may be imported into Rossum using the REST API and the email gateway.
Supported file formats are PDF, PNG, JPEG, TIFF, XLSX/XLS and DOCX/DOC.
The maximum supported file size is 40 MB (this limit also applies to the uncompressed size of the files within a .zip archive).
In order to get the best results from Rossum the documents should be in A4 format of at least 150 DPI (in case of scans/photos). Read more about import recommendations.
Importing non-standard mime types
Support for other MIME types can be added by handling the upload.created webhook event. With this setup, users can pre-process uploaded files (e.g. XML or JSON formats) into a form that Rossum understands. These MIME types usually need to be enabled at the queue level first (by adding the appropriate MIME type to accepted_mime_types in the queue settings attribute). If your document MIME types are not supported, please contact the Rossum support team for more information.
Upload API
You can upload a document to the queue using the upload endpoint with one or more files to be uploaded. You can also specify additional field values in the upload endpoint, e.g. your internal document id. As soon as a document is uploaded, data extraction is started.
The upload endpoint supports basic authentication to enable easy integration with third-party systems.
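A sketch of an upload that uses HTTP basic authentication through curl's -u option instead of a token, assuming bash and curl; rossum_upload_basic and ROSSUM_API are hypothetical names.

```bash
: "${ROSSUM_API:=https://<example>.rossum.app/api/v1}"

# Hypothetical helper: upload one file to a queue with basic authentication.
# $1 = username, $2 = password, $3 = queue id, $4 = path of the file to upload.
rossum_upload_basic() {
  local user=$1 password=$2 queue_id=$3 file=$4
  curl -s -u "$user:$password" \
    -F "content=@$file" \
    "$ROSSUM_API/uploads?queue=$queue_id"
}
```

Passing the password on the command line can expose it in the process list; curl can also prompt for it (-u with the username only) or read credentials from a .netrc file (--netrc).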
Import by Email
It is also possible to send documents by email using a properly configured inbox that is associated with a queue. Users then only need to know the email address to forward emails to.
For every incoming email, Rossum extracts PDF documents, images and zip files, stores them in the queue and starts the data extraction process.
The size limit for incoming emails is 50 MB (the raw email message with base64 encoded attachments).
All the files from the root of a zip archive are extracted. If the root contains only one directory (and no other files), the whole directory is extracted. The zip files and all extracted files must be allowed in accepted_mime_types (see queue settings) and must pass inbox filtering rules (see document rejection conditions) in order for annotations to be created.
Small images (up to 100x100 pixels) are ignored, see inbox for reference.
You can use selected email header data (e.g. Subject) to initialize additional field values; see the rir_field_names attribute description for details.
Zip attachment limits:
- the uncompressed size of the files within a .zip archive may not exceed 40 MB
- only archives containing fewer than 100 files are processed
- only files in the root of the archive are processed (or files inside a first-level directory if it is the only one there)
Export
In order to export extracted and confirmed data, you can call the export endpoint. You can specify status, time-range filters and an annotation id list to limit the returned results.
The export endpoint supports basic authentication to enable easy integration with third-party systems.
Auto-split of document
It is possible to process a single PDF file that contains several invoices. Just insert a special separator page between the documents. You can print this page and insert it between documents while scanning.
Rossum will recognize a QR code on the page and split the PDF into individual documents automatically. The produced documents are imported to the queue, while the original document is set to a split state.
Document Schema
Every queue has an associated schema that specifies which fields will be extracted from documents as well as the structure of the data sent to connector and exported from the platform.
The Rossum schema supports data fields with single values (datapoint), fields with multiple values (multivalue) or tuples of fields (tuple). At the topmost level, each schema consists of sections, which may either directly contain actual data fields (datapoints) or use nested multivalues and tuples as containers for single datapoints.
While a schema may theoretically consist of an arbitrary number of nested containers, the Rossum UI supports only certain combinations of datapoint types. The supported shapes are:
- simple: atomic datapoints of type number, string, date or enum
- list: simple datapoint within a multivalue
- tabular: simple datapoint within a "multivalue tuple" (a multivalue list containing a tuple for every row)
Schema content
Schema content consists of a list of section objects.
Common attributes
The following attributes are common for all schema objects:
Attribute | Type | Description | Required |
---|---|---|---|
category | string | Category of an object, one of section, multivalue, tuple or datapoint. | yes |
id | string | Unique identifier of an object. Maximum length is 50 characters. | yes |
label | string | User-friendly label for an object, shown in the user interface. | yes |
hidden | boolean | If set to true, the object is not visible in the user interface, but remains stored in the database and may be exported. Default is false. Note that a section is hidden if all its children are hidden. | no |
disable_prediction | boolean | Can be set to true to disable field extraction, while still preserving the data shape. Ignored by aurora engines. | no |
Section
Example of a section
{
"category": "section",
"id": "amounts_section",
"label": "Amounts",
"children": [...],
"icon": ""
}
Section represents a logical part of the document, such as amounts or vendor info. It is allowed only at the top level. Schema allows multiple sections, and there should be at least one section in the schema.
Attribute | Type | Description | Required |
---|---|---|---|
children | list[object] | Specifies objects grouped under a given section. It can contain multivalue or datapoint objects. | yes |
icon | string | The icon that appears on the left panel in the UI for a given section (not yet supported on UI). | |
Datapoint
A datapoint represents a single value, typically a field of a document or some global document information. Fields common to all datapoint types:
Attribute | Type | Description | Required |
---|---|---|---|
type | string | Data type of the object, must be one of the following: string, number, date, enum, button | yes |
can_export | boolean | If set to false, the datapoint is not exported through the export endpoint. Default is true. | |
can_collapse | boolean | If set to true, a tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false. | |
rir_field_names | list[string] | List of references used to initialize an object value. See below for the description. Must be empty for schemas connected to queues with aurora engines. | |
default_value | string | Default value used either for fields that do not use hints from AI engine predictions (i.e. rir_field_names are not specified), or when the AI engine does not return any data for the field. | |
constraints | object | A map of various constraints for the field. See Value constraints. | |
ui_configuration | object | A group of settings affecting behaviour of the field in the application. See UI configuration. | |
width | integer | Width of the column (in characters). Default widths are: number: 8, string: 20, date: 10, enum: 20. Only supported for table datapoints. | |
stretch | boolean | If the total width of columns doesn’t fill up the screen, datapoints with stretch set to true will be expanded proportionally to other stretching columns. Only supported for table datapoints. | |
width_chars | integer | (Deprecated) Use width and stretch properties instead. | |
score_threshold | float [0;1] | Threshold used to automatically validate field content based on AI confidence scores. If not set, queue.default_score_threshold is used. | |
formula | string[0;500] | Formula definition, required only for fields of type formula, see Formula Fields. rir_field_names should also be empty for these fields. | |
The rir_field_names attribute allows you to specify the source of the initial value of the object. List items may be:
- one of the extracted field types, to use the AI engine prediction value
- upload:id to identify a value specified while uploading the document
- email_header:<id> to use a value extracted from email headers. Supported email headers: from, to, reply-to, subject, message-id, date.
- email_body:<id> to select the email body. Supported values are text_html for the HTML body or text_plain for the plain text body.
- email:<id> to identify a value specified in the email.received hook response
If more than one item is specified in rir_field_names, the first available value is used.
String type
Example string datapoint
{
"category": "datapoint",
"id": "document_id",
"label": "Invoice ID",
"type": "string",
"default_value": null,
"rir_field_names": ["document_id"],
"constraints": {
"length": {
"exact": null,
"max": 16,
"min": null
},
"regexp": {
"pattern": "^INV[0-9]+$"
},
"required": false
}
}
String datapoint does not have any special attribute.
Date type
Example date datapoint
{
"id": "item_delivered",
"type": "date",
"label": "Item Delivered",
"format": "MM/DD/YYYY",
"category": "datapoint"
}
Attributes specific to Date datapoint:
Attribute | Type | Description | Required |
---|---|---|---|
format | string | Enforces a format for date datapoint on the UI. See Date format below for more details. Default is YYYY-MM-DD. | |
Supported date formats (see available tokens):
- D/M/YYYY: e.g. 23/1/2019
- MM/DD/YYYY: e.g. 01/23/2019
- YYYY-MM-DD: e.g. 2019-01-23 (ISO date format)
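To make the tokens concrete, this bash function renders an ISO date in each supported format. It is an illustration only, not the code Rossum uses; format_date is a hypothetical name.

```bash
# Illustration only: render an ISO date (YYYY-MM-DD) in a supported format.
format_date() {
  local iso=$1 fmt=$2 y m d
  IFS=- read -r y m d <<< "$iso"
  # D and M drop leading zeros; 10# forces base 10 so "08" is not read as octal.
  case $fmt in
    YYYY-MM-DD) printf '%s-%s-%s\n' "$y" "$m" "$d" ;;
    MM/DD/YYYY) printf '%s/%s/%s\n' "$m" "$d" "$y" ;;
    D/M/YYYY) printf '%d/%d/%s\n' "$((10#$d))" "$((10#$m))" "$y" ;;
    *) echo "unsupported format: $fmt" >&2; return 1 ;;
  esac
}
```

For example, format_date 2019-01-23 D/M/YYYY prints 23/1/2019, matching the example in the list.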
Number type
Example number datapoint
{
"id": "item_quantity",
"type": "number",
"label": "Quantity",
"format": "#,##0.#",
"category": "datapoint"
}
Attributes specific to Number datapoint:
Attribute | Type | Default | Description | Required |
---|---|---|---|---|
format | string | # ##0.# | Available number formats are shown in the table below. A null value is allowed. | |
aggregations | object | | A map of various aggregations for the field. See aggregations. | |
The following table shows numeric formats with their examples.
Format | Example |
---|---|
# ##0,# | 1 234,5 or 1234,5 |
# ##0.# | 1 234.5 or 1234.5 |
#,##0.# | 1,234.5 or 1234.5 |
#'##0.# | 1'234.5 or 1234.5 |
#.##0,# | 1.234,5 or 1234,5 |
# ##0 | 1 234 or 1234 |
#,##0 | 1,234 or 1234 |
#'##0 | 1'234 or 1234 |
#.##0 | 1.234 or 1234 |
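The patterns differ only in their thousands and decimal separators. This bash function illustrates that reading of the table; it is not Rossum's own parser, format_number is a hypothetical name, it expects input such as 1234.5 with a dot as the decimal point, and it truncates (does not round) the fraction for formats without a decimal part.

```bash
# Illustration only: apply one of the number formats above to a plain number.
# The character at index 1 of the format is the thousands separator; a format
# ending in "0<sep>#" uses <sep> as its decimal separator.
format_number() {
  local value=$1 fmt=$2
  local group=${fmt:1:1} decimal="" sign="" int frac="" out=""
  case $fmt in
    *0?\#) decimal=${fmt: -2:1} ;;
  esac
  if [[ $value == -* ]]; then sign=-; value=${value#-}; fi
  int=${value%%.*}
  [[ $value == *.* ]] && frac=${value#*.}
  # Insert the thousands separator every three digits, from the right.
  while (( ${#int} > 3 )); do
    out="$group${int: -3}$out"
    int=${int:0:${#int}-3}
  done
  out="$sign$int$out"
  [[ -n $frac && -n $decimal ]] && out="$out$decimal$frac"
  printf '%s\n' "$out"
}
```

For example, format_number 1234.5 '# ##0,#' prints 1 234,5, the first example in the table.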
Aggregations
Example aggregations
{
"id": "quantity",
"type": "number",
"label": "Quantity",
"category": "datapoint",
"aggregations": {
"sum": {
"label": "Total"
}
},
"default_value": null,
"rir_field_names": []
}
Aggregations allow computation of some informative values, e.g. a sum of a table column with numeric values. These are returned among messages when the /v1/annotations/{id}/content/validate endpoint is called.
Aggregations can be computed only for tables (multivalues of tuples).
Attribute | Type | Description | Required |
---|---|---|---|
sum | object | Sum of values in a column. Default label: "Sum". | |
All aggregation objects can have a label attribute that will be shown in the UI.
Enum type
Example enum datapoint with options and enum_value_type
{
"id": "document_type",
"type": "enum",
"label": "Document type",
"hidden": false,
"category": "datapoint",
"options": [
{
"label": "Invoice Received",
"value": "21"
},
{
"label": "Invoice Sent",
"value": "22"
},
{
"label": "Receipt",
"value": "23"
}
],
"default_value": "21",
"rir_field_names": [],
"enum_value_type": "number"
}
Attributes specific to Enum datapoint:
Attribute | Type | Description | Required |
---|---|---|---|
options | object | See object description below. | yes |
enum_value_type | string | Data type of the option's value attribute. Must be one of the following: string, number, date. | no |
Every option consists of an object with keys:
Attribute | Type | Description | Required |
---|---|---|---|
value | string | Value of the option. | yes |
label | string | User-friendly label for the option, shown in the UI. | yes |
Enum datapoint values are matched case-insensitively. For example, the currency value EUR returned by the AI Core Engine is successfully matched against the {"value": "eur", "label": "Euro"} option.
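The Python sketch below approximates this case-insensitive matching on the client side. `match_enum_option` is a hypothetical helper and the option list is illustrative. Using `str.casefold` is an assumption, because this section does not specify the exact comparison Rossum uses.

```python
from typing import Optional


def match_enum_option(extracted: str, options: list[dict]) -> Optional[dict]:
    """Return the first option whose value equals `extracted`, ignoring case."""
    needle = extracted.casefold()
    for option in options:
        if option["value"].casefold() == needle:
            return option
    return None


# Illustrative options only.
currency_options = [
    {"value": "eur", "label": "Euro"},
    {"value": "usd", "label": "US Dollar"},
]
print(match_enum_option("EUR", currency_options))  # {'value': 'eur', 'label': 'Euro'}
print(match_enum_option("CZK", currency_options))  # None
```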
Button type
Specifies a button shown in the Rossum UI. For more details, please refer to custom UI extension.
Example button datapoint
{
"id": "show_email",
"type": "button",
"category": "datapoint",
"popup_url": "http://example.com/show_customer_data",
"can_obtain_token": true
}
Buttons cannot be direct children of multivalues. Simple multivalues with buttons are not allowed, and in tables, buttons are children of tuples. Although a button is a datapoint object, it currently cannot hold any value. Therefore, the set of available Button datapoint attributes is limited to:
Attribute | Type | Description | Required |
---|---|---|---|
type | string | Data type of the object; must be one of the following: string, number, date, enum, button. | yes |
can_export | boolean | If set to false, the datapoint is not exported through the export endpoint. Default is true. | |
can_collapse | boolean | If set to true, a tabular (multivalue-tuple) datapoint may be collapsed in the UI. Default is false. | |
popup_url | string | URL of a popup window to be opened when the button is pressed. | |
can_obtain_token | boolean | If set to true, the popup window is allowed to ask the main Rossum window for an authorization token. | |
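You can check the button placement rule before uploading a schema. The Python sketch below walks a schema node and reports buttons placed directly inside a multivalue. `find_misplaced_buttons` is a hypothetical helper. It assumes, as in the examples in this section, that a multivalue's `children` is a single object while sections and tuples hold lists.

```python
def find_misplaced_buttons(node: dict) -> list[str]:
    """Return ids of buttons that are direct children of a multivalue (not allowed)."""
    problems = []
    children = node.get("children")
    if node.get("category") == "multivalue" and isinstance(children, dict):
        if children.get("type") == "button":
            problems.append(children.get("id", "<no id>"))
    # A multivalue holds a single child object; sections and tuples hold lists.
    if isinstance(children, dict):
        children = [children]
    for child in children or []:
        problems.extend(find_misplaced_buttons(child))
    return problems


bad = {
    "category": "multivalue",
    "id": "buttons",
    "children": {"category": "datapoint", "id": "show_email", "type": "button"},
}
print(find_misplaced_buttons(bad))  # ['show_email']
```

To check a whole schema, call the function on every section of the schema content and concatenate the results.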
Value constraints
Example value constraints
{
"id": "document_id",
"type": "string",
"label": "Invoice ID",
"category": "datapoint",
"constraints": {
"length": {
"max": 32,
"min": 5
},
"required": false
},
"default_value": null,
"rir_field_names": [
"document_id"
]
}
Constraints limit the allowed values. When constraints are not satisfied, the annotation is considered invalid and cannot be exported.
Attribute | Type | Description | Required |
---|---|---|---|
length | object | Defines the minimum, maximum or exact length of the datapoint value. By default, minimum and maximum are 0 and infinity, respectively. Supported attributes: min, max and exact. | |
regexp | object | When specified, the content must match a regular expression. Supported attributes: pattern. To ensure that the entire value matches, surround your regular expression with ^ and $. | |
required | boolean | Specifies if the datapoint is required by the schema. Default value is true. | |
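The Python sketch below pre-checks a value against these constraints on the client side, using the constraints from the example above. `check_constraints` is a hypothetical helper. Two behaviors are assumptions. First, `re.search` is used because the text implies that a pattern matches anywhere unless it is anchored with `^` and `$`. Second, length and regexp are skipped for an empty optional value. Rossum's own validation may differ in both respects.

```python
import re
from typing import Optional


def check_constraints(value: Optional[str], constraints: dict) -> list[str]:
    """Return a list of constraint violations (an empty list means the value is valid)."""
    errors = []
    if value is None or value == "":
        # "required" defaults to true, as documented above.
        if constraints.get("required", True):
            errors.append("value is required")
        # Assumption: length and regexp are not evaluated for an empty value.
        return errors

    length = constraints.get("length", {})
    if "exact" in length and len(value) != length["exact"]:
        errors.append(f"length must be exactly {length['exact']}")
    if "min" in length and len(value) < length["min"]:
        errors.append(f"length must be at least {length['min']}")
    if "max" in length and len(value) > length["max"]:
        errors.append(f"length must be at most {length['max']}")

    regexp = constraints.get("regexp")
    # re.search matches anywhere in the value; anchor the pattern with ^...$ for a full match.
    if regexp and re.search(regexp["pattern"], value) is None:
        errors.append(f"value must match {regexp['pattern']!r}")
    return errors


invoice_id_constraints = {"length": {"max": 32, "min": 5}, "required": False}
print(check_constraints("INV-2019-0001", invoice_id_constraints))  # []
print(check_constraints("123", invoice_id_constraints))  # ['length must be at least 5']
```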
UI configuration
Example UI configuration
{
"id": "document_id",
"type": "string",
"label": "Invoice ID",
"category": "datapoint",
"ui_configuration": {
"type": "captured",
"edit": "disabled"
},
"default_value": null,
"rir_field_names": [
"document_id"
]
}
UI configuration provides a group of settings that alter the behaviour of the field in the application. These settings do not affect the behaviour of the field via the API. For example, disabling edit prevents changing the value of the datapoint in the application, but the value can still be modified through the API.
Attribute | Type | Description | Required |
---|---|---|---|
type | string | Logical type of the datapoint. Possible values are: captured, data, manual, formula or null. Default value is null. | false |
edit | string | When set to disabled, the value of the datapoint is not editable via the UI. When set to enabled_without_warning, no warnings about editing this field are displayed in the UI. Default value is enabled: field editing is allowed, but the user receives dismissible warnings when doing so. | false |
Logical types
- Captured field represents information retrieved by the OCR model. If combined with the edit option disabled, the user can't overwrite the field's value, but can redraw the field's bounding box and thereby select another value from the document.
- Data field represents information filled by extensions. This field is not mapped to the AI model, so it does not have a bounding box, and it is not possible to create one. If combined with the edit option disabled, the field can't be modified from the UI.
- Manual field behaves exactly like a Data field, but without the mapping to extensions. This field should be used for sharing information between users or for transferring information to downstream systems.
- Formula field is updated according to its formula definition, see Formula Fields. If the edit option is not disabled, the field value can be overridden from the UI (see no_recalculation).
- null value is displayed in the UI as Unset and behaves similarly to the Captured field.
Multivalue
Example of a multivalue:
{
"category": "multivalue",
"id": "line_item",
"label": "Line Item",
"children": {
...
},
"show_grid_by_default": false,
"min_occurrences": null,
"max_occurrences": null,
"rir_field_names": null
}
Example of a multivalue with grid row-types specification:
{
"category": "multivalue",
"id": "line_item",
"label": "Line Item",
"children": {
...
},
"grid": {
"row_types": [
"header", "data", "footer"
],
"default_row_type": "data",
"row_types_to_extract": [
"data"
]
},
"min_occurrences": null,
"max_occurrences": null,
"rir_field_names": ["line_items"]
}
A multivalue is a list of datapoints or tuples of the same type. It represents a container for data with multiple occurrences (such as line items) and can contain only objects with the same id.
Attribute | Type | Description | Required |
---|---|---|---|
children | object | Object specifying the type of children. It can contain only objects with categories tuple or datapoint. | yes |
min_occurrences | integer | Minimum number of occurrences of nested objects. If the min_occurrences condition is violated, the corresponding fields should be reviewed manually. The minimum allowed value for this attribute is 0. If not specified, it is set to 0 by default. | |
max_occurrences | integer | Maximum number of occurrences of nested objects. All additional rows above max_occurrences are removed by the extraction process. The minimum allowed value for this attribute is 1. If not specified, it is set to 1000 by default. | |
grid | object | Configures magic-grid feature properties, see below. | |
show_grid_by_default | boolean | If set to true, the magic-grid is opened instead of the footer upon entering the multivalue. Default false. Applied only in the UI. Useful when annotating documents for custom training. | |
rir_field_names | list[string] | List of names used to initialize content from the AI engine predictions. If specified, the value of the first field from the array is used; otherwise, the default name line_items is used. This attribute can be set only for a multivalue containing objects with category tuple. | no |
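The occurrence limits can be mirrored on the client side, for example to predict how many rows will survive extraction. In this Python sketch, `apply_occurrence_limits` is a hypothetical helper. It treats `null`, as used in the examples above, the same as an unspecified value, and applies the documented defaults of 0 and 1000.

```python
def apply_occurrence_limits(rows: list, multivalue_schema: dict) -> tuple[list, bool]:
    """Trim rows above max_occurrences and report whether min_occurrences is violated.

    Returns (kept_rows, needs_review).
    """
    # Documented defaults; assumption: null is treated as "not specified".
    min_occ = multivalue_schema.get("min_occurrences")
    max_occ = multivalue_schema.get("max_occurrences")
    min_occ = 0 if min_occ is None else min_occ
    max_occ = 1000 if max_occ is None else max_occ
    kept = rows[:max_occ]  # rows above max_occurrences are dropped
    needs_review = len(kept) < min_occ  # too few rows: fields should be reviewed manually
    return kept, needs_review


kept, needs_review = apply_occurrence_limits(
    ["row1", "row2", "row3", "row4"], {"min_occurrences": 1, "max_occurrences": 3}
)
print(kept, needs_review)  # ['row1', 'row2', 'row3'] False
```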
Multivalue grid object
The multivalue grid object lets you specify a row type for each row of the grid. For the data representation of actual grid data rows, see Grid object description.
Attribute | Type | Description | Default | Required |
---|---|---|---|---|
row_types | list[string] | List of allowed row type values. | ["data"] | yes |
default_row_type | string | Row type to be used by default. | data | yes |
row_types_to_extract | list[string] | Types of rows to be extracted to the related table. | ["data"] | yes |
For example, to distinguish two header types and a footer in the validation interface, the following row types may be used: header, subsection_header, data and footer.
Currently, data extraction classifies every row as either data or header (additional row types may be introduced in the future). Rows returned by data extraction that are not in the row_types list (e.g. header by default) are removed when they are at the top or bottom of the table. When such rows are in the middle of the table, we mark them as skipped (null).
There are three visual modes, based on the number of row_types defined:
- More than two row types defined: the user can freely assign any non-default row type to a row. Clearing the row type resets it to the default row type. We support up to 6 colors to distinguish row types visually.
- Two row types defined (header and default): the user only marks header and skipped rows. Clearing the row type resets it to the default row type.
- One row type defined: the user is only able to mark a row as skipped (null value in data). This is also the default behavior when no grid row types configuration is specified in the schema.
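The Python sketch below implements the rule for removing and skipping rows. `filter_extracted_rows` is a hypothetical helper that works only on row type strings. The real grid data holds richer row objects. `allowed` stands for the schema's `row_types` list, and `None` stands for a skipped (`null`) row.

```python
from typing import Optional


def filter_extracted_rows(row_types: list[str], allowed: list[str]) -> list[Optional[str]]:
    """Drop disallowed rows at the top/bottom; mark disallowed rows in the middle as skipped (None)."""
    is_allowed = [rt in allowed for rt in row_types]
    if not any(is_allowed):
        return []
    first = is_allowed.index(True)
    last = len(is_allowed) - 1 - is_allowed[::-1].index(True)
    # Rows outside [first, last] are removed; disallowed rows inside that span become None.
    inner = zip(row_types[first:last + 1], is_allowed[first:last + 1])
    return [rt if ok else None for rt, ok in inner]


extracted = ["header", "data", "header", "data", "data"]
print(filter_extracted_rows(extracted, ["data"]))  # ['data', None, 'data', 'data']
```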
Tuple
Example of a tuple:
{
"category": "tuple",
"id": "tax_details",
"label": "Tax Details",
"children": [
...
],
"rir_field_names": [
"tax_details"
]
}
Container representing tabular data with related values, such as tax details. A tuple must be nested within a multivalue object, but unlike a multivalue, it may consist of objects with different ids.
Attribute | Type | Description | Required |
---|---|---|---|
children | list[object] | Array specifying objects that belong to a given tuple. It can contain only objects with category datapoint. | yes |
rir_field_names | list[string] | List of names used to initialize content from the AI engine predictions. If specified, the value of the first extracted field from the array is used; otherwise, no AI engine initialization is done for the object. | |
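The Python sketch below checks the nesting rules for tuples and multivalues in a schema subtree. `check_nesting` is a hypothetical helper. It encodes only three rules from this section: a tuple must sit inside a multivalue, a tuple holds only datapoints, and a multivalue holds only a tuple or a datapoint.

```python
from typing import Optional


def check_nesting(node: dict, parent_category: Optional[str] = None) -> list[str]:
    """Report violations of the tuple/multivalue nesting rules in a schema subtree."""
    errors = []
    category = node.get("category")
    node_id = node.get("id", "<no id>")
    children = node.get("children")

    if category == "tuple":
        if parent_category != "multivalue":
            errors.append(f"tuple {node_id!r} must be nested in a multivalue")
        for child in children or []:
            if child.get("category") != "datapoint":
                errors.append(f"tuple {node_id!r} may contain only datapoints, found {child.get('category')!r}")
    if category == "multivalue" and isinstance(children, dict):
        if children.get("category") not in ("tuple", "datapoint"):
            errors.append(f"multivalue {node_id!r} may contain only a tuple or a datapoint")

    # A multivalue holds a single child object; sections and tuples hold lists.
    child_list = [children] if isinstance(children, dict) else (children or [])
    for child in child_list:
        errors.extend(check_nesting(child, category))
    return errors


misplaced = {
    "category": "section",
    "id": "amounts_section",
    "children": [{"category": "tuple", "id": "tax_details", "children": []}],
}
print(check_nesting(misplaced))  # ["tuple 'tax_details' must be nested in a multivalue"]
```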
Updating Schema
As a project evolves, it is common practice to enhance or change the set of extracted fields. This is done by updating the schema object.
By design, Rossum supports multiple schema versions at the same time. However, each document annotation is related to only one of those schemas. If the schema is updated, all related document annotations are updated accordingly. See preserving data on schema change below for limitations of schema updates.
In addition, every queue is linked to a schema, which is used for all newly imported documents.
When updating a schema, there are two possible approaches:
- Update the schema object (PUT/PATCH). All related annotations will be updated to match the current schema shape (even exported and deleted documents).
- Create a new schema object (POST) and link it to the queue. In this case, only newly created objects will use the current schema. All previously created objects will remain in the shape of their linked schema.
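The two approaches translate into different API calls. The Python sketch below only builds the requests with the standard library and sends nothing. Several details are assumptions for illustration: the helper names, the placeholder token, the numeric ids, and the `content` (schema) and `schema` (queue) payload attributes. Check them against the schema and queue object references before use. The Bearer header follows the usage report example in this document.

```python
import json
import urllib.request

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host, as elsewhere in these docs
TOKEN = "<token>"  # hypothetical placeholder; use your own API token


def _json_request(method: str, url: str, payload: dict) -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


def update_schema_in_place(schema_id: int, content: list) -> urllib.request.Request:
    """Approach 1: PATCH the existing schema; all related annotations follow the new shape."""
    return _json_request("PATCH", f"{BASE_URL}/schemas/{schema_id}", {"content": content})


def create_schema(name: str, content: list) -> urllib.request.Request:
    """Approach 2, step 1: POST a new schema object."""
    return _json_request("POST", f"{BASE_URL}/schemas", {"name": name, "content": content})


def link_schema_to_queue(queue_id: int, schema_url: str) -> urllib.request.Request:
    """Approach 2, step 2: point the queue at the new schema (URL taken from the POST response)."""
    return _json_request("PATCH", f"{BASE_URL}/queues/{queue_id}", {"schema": schema_url})


# Requests are only built here; send one with urllib.request.urlopen(req) when ready.
req = update_schema_in_place(123, [])
print(req.get_method(), req.full_url)  # PATCH https://<example>.rossum.app/api/v1/schemas/123
```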
Use case 1 - Initial setting of a schema
- Situation: User is initially setting up the schema. This might be an iterative process.
- Recommendation: Edit the existing schema by updating schema (PUT) or updating part of a schema (PATCH).
Use case 2 - Updating attributes of a field (label, constraints, options, etc.)
- Situation: User is updating a field, e.g. changing the label, number format, constraints, enum options, hidden flag, etc.
- Recommendation: Edit existing schema (see Use case 1).
Use case 3 - Adding new field to a schema, even for already imported documents.
- Situation: User is extending a production schema by adding a new field. Moreover, the user would like to see all annotations (to_review, postponed, exported, deleted, etc. states) in the shape of the newly extended schema.
- Recommendation: Edit existing schema (see Use case 1). Data of already created annotations will be transformed to the shape of the updated schema. New fields of annotations in to_review and postponed state that are linked to extracted field types will be filled by the AI Engine, if available. New fields of already exported or deleted annotations (also purged, exporting and failed_export) will be filled with empty or default values.
Use case 4 - Adding new field to schema, only for newly imported documents
- Situation: User is extending a production schema by adding a new field. However, the user does not want to see the newly added field on previously created annotations.
- Recommendation: Create a new schema object (POST) and link it to the queue. Annotation data of previously created annotations will be preserved according to the original schema. This approach is recommended if there is an organizational need to keep different field sets before and after the schema update.
Use case 5 - Deleting schema field, even for already imported documents.
- Situation: User is changing a production schema by removing a field that was used previously. The user would like to see all annotations (to_review, postponed, exported, deleted, etc. states) in the shape of the updated schema. There is no need to see the original fields in already exported annotations.
- Recommendation: Edit existing schema (see Use case 1).
Use case 6 - Deleting schema field, only for newly imported documents
- Situation: User is changing a production schema by removing a field that was used previously. However, the user should still be able to see the removed fields on previously created annotations.
- Recommendation: Create a new schema object (see Use case 4). Annotation data of previously created annotations will be preserved according to the original schema. This approach is recommended if there is an organizational need to retrieve the data in the original state.
Preserving data on schema change
To transfer annotation field values properly during a schema update, a datapoint's category and schema_id must be preserved.
Supported operations that preserve field values are:
- adding a new datapoint (filled from AI Engine, if available)
- reordering datapoints on the same level
- moving datapoints from one section to another
- moving datapoints to and from a tuple
- reordering datapoints inside a tuple
- making datapoint a multivalue (original datapoint is the only value in a new multivalue container)
- making datapoint non-multivalue (only first datapoint value is preserved)
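Because values carry over only when a datapoint keeps its category and schema_id, you can compare the old and new schema content before applying an update. The Python sketch below does that comparison. Here schema_id means the `id` of each schema object. `collect_categories` and `fields_at_risk` are hypothetical helpers. They check only these two conditions and do not model every operation listed above. For example, they do not model that only the first value is kept when a multivalue is turned back into a single datapoint.

```python
def collect_categories(content: list) -> dict[str, str]:
    """Map every schema id in a schema content tree to its category."""
    found = {}
    stack = list(content)
    while stack:
        node = stack.pop()
        found[node["id"]] = node["category"]
        children = node.get("children")
        if isinstance(children, dict):  # multivalue: a single child object
            stack.append(children)
        elif isinstance(children, list):  # section or tuple: a list of children
            stack.extend(children)
    return found


def fields_at_risk(old_content: list, new_content: list) -> dict[str, str]:
    """List schema ids whose values would not carry over: removed ids or changed categories."""
    old, new = collect_categories(old_content), collect_categories(new_content)
    risks = {}
    for schema_id, category in old.items():
        if schema_id not in new:
            risks[schema_id] = "removed"
        elif new[schema_id] != category:
            risks[schema_id] = f"category changed: {category} -> {new[schema_id]}"
    return risks


old = [{"category": "section", "id": "basic", "children": [
    {"category": "datapoint", "id": "document_id"},
    {"category": "datapoint", "id": "terms"},
]}]
# "terms" moves to another section and is wrapped in a multivalue: both are supported operations.
new = [
    {"category": "section", "id": "basic", "children": [{"category": "datapoint", "id": "document_id"}]},
    {"category": "section", "id": "payment", "children": [
        {"category": "multivalue", "id": "terms_list", "children": {"category": "datapoint", "id": "terms"}},
    ]},
]
print(fields_at_risk(old, new))  # {}
```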
Extracted field types
The AI engine currently extracts the following fields automatically at the all endpoint. The list is subject to ongoing expansion.
Identifiers
Example of a schema with different identifiers:
[
{
"category": "section",
"children": [
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"id": "document_id",
"label": "Invoice number",
"rir_field_names": [
"document_id"
],
"type": "string"
},
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"format": "D/M/YYYY",
"id": "date_issue",
"label": "Issue date",
"rir_field_names": [
"date_issue"
],
"type": "date"
},
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"id": "terms",
"label": "Terms",
"rir_field_names": [
"terms"
],
"type": "string"
}
],
"icon": null,
"id": "invoice_info_section",
"label": "Basic information"
}
]
Attr. rir_field_names | Field label | Description |
---|---|---|
account_num | Bank Account | Bank account number. Whitespaces are stripped. |
bank_num | Sort Code | Sort code. Numerical code of the bank. |
iban | IBAN | Bank account number in IBAN format. |
bic | BIC/SWIFT | Bank BIC or SWIFT code. |
const_sym | Constant Symbol | Statistical code on payment order. |
spec_sym | Specific Symbol | Payee id on the payment order, or similar. |
var_sym | Variable symbol | In some countries used by the supplier to match the payment received against the invoice. Possible non-numeric characters are stripped. |
terms | Terms | Payment terms as written on the document (e.g. "45 days", "upon receipt"). |
payment_method | Payment method | Payment method defined on a document (e.g. 'Cheque', 'Pay order', 'Before delivery') |
customer_id | Customer Number | The number by which the customer is registered in the system of the supplier. Whitespaces are stripped. |
date_due | Date Due | The due date of the invoice. |
date_issue | Issue Date | Date of issue of the document. |
date_uzp | Tax Point Date | The date of taxable event. |
document_id | Document Identifier | Document number. Whitespaces are stripped. |
order_id | Order Number | Purchase order identification (Order Numbers not captured as "sender_order_id"). Whitespaces are stripped. |
recipient_address | Recipient Address | Address of the customer. |
recipient_dic | Recipient Tax Number | Tax identification number of the customer. Whitespaces are stripped. |
recipient_ic | Recipient Company ID | Company identification number of the customer. Possible non-numeric characters are stripped. |
recipient_name | Recipient Name | Name of the customer. |
recipient_vat_id | Recipient VAT Number | VAT number of the customer. |
recipient_delivery_name | Recipient Delivery Name | Name of the recipient to whom the goods will be delivered. |
recipient_delivery_address | Recipient Delivery Address | Address of the recipient where the goods will be delivered. |
sender_address | Supplier Address | Address of the supplier. |
sender_dic | Supplier Tax Number | Tax identification number of the supplier. Whitespaces are stripped. |
sender_ic | Supplier Company ID | Business/organization identification number of the supplier. Possible non-numeric characters are stripped. |
sender_name | Supplier Name | Name of the supplier. |
sender_vat_id | Supplier VAT Number | VAT identification number of the supplier. |
sender_email | Supplier Email | Email of the sender. |
sender_order_id | Supplier's Order ID | Internal order ID in the supplier's system. |
delivery_note_id | Delivery Note ID | Delivery note ID defined on the invoice. |
supply_place | Place of Supply | Place of supply (the name of the city or state where the goods will be supplied). |
Document attributes
Attr. rir_field_names | Field label | Description |
---|---|---|
currency | Currency | The currency in which the invoice is to be paid. Possible values: CZK, DKK, EUR, GBP, NOK, SEK, HUF, USD, AUD, INR, CHF, CNY, JPY, PLN, RON, RUB or other. May also be in lowercase. |
document_type | Document Type | Possible values: credit_note, debit_note, tax_invoice (most typical), proforma, receipt, delivery_note, order or other. |
language | Language | The language which the document was written in. Possible values: ces, deu, eng, fra, slk, esp, hun, swe, dan, fin, ital, nor, pol, por or other. |
payment_method_type | Payment Method Type | Payment method used for the transaction. Possible values: card, cash. |
Amounts
Attr. rir_field_names | Field label | Description |
---|---|---|
amount_due | Amount Due | Final amount including tax to be paid after deducting all discounts and advances. |
amount_rounding | Amount Rounding | Remainder after rounding amount_total. |
amount_total | Total Amount | Subtotal over all items, including tax. |
amount_paid | Amount paid | Amount paid already. |
amount_total_base | Tax Base Total | Base amount for tax calculation. |
amount_total_tax | Tax Total | Total tax amount. |
Typical relations (may depend on local laws):
amount_total = amount_total_base + amount_total_tax
amount_rounding = amount_total - round(amount_total)
amount_due = amount_total - amount_paid + amount_rounding
All amounts are in the main currency of the invoice (as identified in the currency response field). Amounts in other currencies are generally excluded.
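The Python sketch below checks the first and third relations on a dict of extracted amounts keyed by `rir_field_names`. It does not recompute `amount_rounding` and uses any extracted value as-is. `check_amount_relations` and the tolerance are hypothetical. Treating missing `amount_paid` and `amount_rounding` as 0 is an assumption. As noted above, these relations are typical and may not hold under every local law.

```python
import math


def check_amount_relations(fields: dict, tolerance: float = 0.01) -> dict[str, bool]:
    """Check the total and due relations listed above on a dict of extracted amounts."""
    results = {}
    if all(k in fields for k in ("amount_total", "amount_total_base", "amount_total_tax")):
        results["total = base + tax"] = math.isclose(
            fields["amount_total"],
            fields["amount_total_base"] + fields["amount_total_tax"],
            abs_tol=tolerance,
        )
    if "amount_total" in fields and "amount_due" in fields:
        # Assumption: a missing amount_paid or amount_rounding is treated as 0;
        # an extracted amount_rounding is used as-is, not recomputed.
        expected_due = fields["amount_total"] - fields.get("amount_paid", 0.0) + fields.get("amount_rounding", 0.0)
        results["due = total - paid + rounding"] = math.isclose(fields["amount_due"], expected_due, abs_tol=tolerance)
    return results


# Illustrative values: 100.00 base + 21.00 tax, of which 21.00 was already paid.
invoice = {"amount_total_base": 100.0, "amount_total_tax": 21.0, "amount_total": 121.0, "amount_paid": 21.0, "amount_due": 100.0}
print(check_amount_relations(invoice))
# {'total = base + tax': True, 'due = total - paid + rounding': True}
```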
Tables
At the moment, the AI engine automatically extracts 2 types of tables. To pick one of the possible choices, set the rir_field_names attribute on the multivalue.
Attr. rir_field_names | Table |
---|---|
tax_details | Tax details |
line_items | Line items |
Tax details
Example of a tax details table:
{
"category": "section",
"children": [
{
"category": "multivalue",
"children": {
"category": "tuple",
"children": [
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"format": "# ##0.#",
"id": "vat_detail_rate",
"label": "VAT rate",
"rir_field_names": [
"tax_detail_rate"
],
"type": "number",
"width": 15
},
...
],
"id": "vat_detail",
"label": "VAT detail"
},
"default_value": null,
"id": "vat_details",
"label": "VAT details",
"max_occurrences": null,
"min_occurrences": null,
"rir_field_names": [
"tax_details"
]
}
],
"icon": null,
"id": "amounts_section",
"label": "Amounts section"
}
Tax details table and breakdown by tax rates.
Attr. rir_field_names | Field label | Description |
---|---|---|
tax_detail_base | Tax Base | Sum of tax bases for items with the same tax rate. |
tax_detail_rate | Tax Rate | One of the tax rates in the tax breakdown. |
tax_detail_tax | Tax Amount | Sum of taxes for items with the same tax rate. |
tax_detail_total | Tax Total | Total amount including tax for all items with the same tax rate. |
tax_detail_code | Tax Code | [BETA] Text on the document describing the tax code of the tax rate (e.g. 'GST', 'CGST', 'DPH', 'TVA'). If multiple tax rates belong to one tax code on the document, the tax code will be assigned only to the first tax rate. (In the future, such a tax code will be distributed to all matching tax rates.) |
Line items
Example of a line items table:
{
"category": "section",
"children": [
{
"category": "multivalue",
"children": {
"category": "tuple",
"children": [
{
"category": "datapoint",
"constraints": {
"required": true
},
"default_value": null,
"id": "item_desc",
"label": "Description",
"rir_field_names": [
"table_column_description"
],
"type": "string",
"stretch": true
},
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"format": "# ##0.#",
"id": "item_quantity",
"label": "Quantity",
"rir_field_names": [
"table_column_quantity"
],
"type": "number",
"width": 15
},
{
"category": "datapoint",
"constraints": {
"required": false
},
"default_value": null,
"format": "# ##0.#",
"id": "item_amount_total",
"label": "Price w tax",
"rir_field_names": [
"table_column_amount_total"
],
"type": "number"
}
],
"id": "line_item",
"label": "Line item",
"rir_field_names": []
},
"default_value": null,
"id": "line_items",
"label": "Line item",
"max_occurrences": null,
"min_occurrences": null,
"rir_field_names": [
"line_items"
]
}
],
"icon": null,
"id": "line_items_section",
"label": "Line items"
}
The AI engine currently extracts line item table content automatically and recognizes row and column types as detailed below. Invoice line items come in a wide variety of shapes and forms. The current implementation can deal with (or learn) most layouts, with or without borders, different spacings, header rows, etc. We currently make two further assumptions:
- The table generally follows a grid structure - that is, columns and rows may be represented as rectangle spans. In practice, this means that we may currently cut off text that overlaps from one cell to the next column. We are also not optimizing for table rows that are wrapped to multiple physical lines.
- The table contains just a flat structure of line items, without subsection headers, nested tables, etc.
We plan to gradually remove both assumptions in the future.
Attribute rir_field_names | Field label | Description |
---|---|---|
table_column_code | Item Code/Id | Can be the SKU, EAN, a custom code (string of letters/numbers) or even just the line number. |
table_column_description | Item Description | Line item description. Can be multi-line with details. |
table_column_quantity | Item Quantity | Quantity of the item. |
table_column_uom | Item Unit of Measure | Unit of measure of the item (kg, container, piece, gallon, ...). |
table_column_rate | Item Rate | Tax rate for the line item. |
table_column_tax | Item Tax | Tax amount for the line. Rule of thumb: tax = rate * amount_base . |
table_column_amount_base | Amount Base | Unit price without tax. (This is the primary unit price extracted.) |
table_column_amount | Amount | Unit price with tax. Rule of thumb: amount = amount_base + tax . |
table_column_amount_total_base | Amount Total Base | The total amount to be paid for all the items excluding the tax. Rule of thumb: amount_total_base = amount_base * quantity . |
table_column_amount_total | Amount Total | The total amount to be paid for all the items including the tax. Rule of thumb: amount_total = amount * quantity . |
table_column_other | Other | Unrecognized data type. |
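The Python sketch below applies the rules of thumb to one extracted row to flag suspicious values. `check_line_item`, the tolerance, and the short row keys (the `rir_field_names` without the `table_column_` prefix) are hypothetical. The sketch leaves out the `tax = rate * amount_base` rule, because this section does not say whether the rate is a percentage or a fraction.

```python
import math


def check_line_item(row: dict, tolerance: float = 0.01) -> dict[str, bool]:
    """Check the line-item rules of thumb that do not involve the tax rate."""
    q, base, tax, amount = (row.get(k) for k in ("quantity", "amount_base", "tax", "amount"))
    checks = {}
    if None not in (base, tax, amount):
        checks["amount = amount_base + tax"] = math.isclose(amount, base + tax, abs_tol=tolerance)
    if None not in (base, q) and row.get("amount_total_base") is not None:
        checks["amount_total_base = amount_base * quantity"] = math.isclose(
            row["amount_total_base"], base * q, abs_tol=tolerance
        )
    if None not in (amount, q) and row.get("amount_total") is not None:
        checks["amount_total = amount * quantity"] = math.isclose(
            row["amount_total"], amount * q, abs_tol=tolerance
        )
    return checks


# Illustrative row: 2 units at 10.00 without tax, 2.10 tax per unit.
row = {"quantity": 2, "amount_base": 10.0, "tax": 2.1, "amount": 12.1, "amount_total_base": 20.0, "amount_total": 24.2}
print(all(check_line_item(row).values()))  # True
```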
Annotation Lifecycle
When a document is submitted to Rossum within a given queue, an annotation object is assigned to it. An annotation goes through a variety of states as it is processed, and eventually exported.
State | Description |
---|---|
created | Annotation was created manually via POST to annotations endpoint. Annotation created this way may be switched to importing state only at the end of the upload.created event (this happens automatically). |
importing | Document is being processed by the AI Engine for data extraction. |
failed_import | Import failed e.g. due to a malformed document file. |
split | Annotation was split in user interface or via API and new annotations were created from it. |
to_review | Initial extraction step is done and the annotation is waiting for user validation. |
reviewing | Annotation is undergoing validation in the user interface. |
in_workflow | Annotation is being processed in a workflow. Annotation content cannot be modified while in this state. Please note that any manual interaction with this status may introduce conflicts with Rossum automated workflows. Read more about Rossum Workflows here. |
confirmed | Annotation is validated and confirmed by the user. This status must be explicitly enabled on the queue to be present. |
rejected | Annotation was rejected by user. This status must be explicitly enabled on the queue to be present. You can read about when a rejection is possible here. |
exporting | Annotation is validated and is now awaiting the completion of connector save call. See connector extension for more information on this status. |
exported | Annotation is validated and successfully passed all hooks; this is the typical terminal state of an annotation. |
failed_export | The connector returned an error. |
postponed | Operator has chosen to postpone the annotation instead of exporting it. |
deleted | Annotation was deleted by the user. |
purged | Only metadata was preserved after a deletion. This status is terminal and cannot be further changed. See purge deleted if you want to know how to purge an annotation. |
The diagram shows the exact flow between the annotation states while working with the UI.
Usage report
To obtain an overview of Rossum usage, you can download a CSV file with basic Rossum statistics.
The statistics contain the following attributes:
- Username (may be empty if the document was not modified by any user)
- Workspace name
- Queue name
- User URL
- Queue URL
- Workspace URL
- Imported: count of all documents that were imported during the time period
- Confirmed: count of all documents that were confirmed during the time period
- Rejected: count of all documents that were rejected during the time period
- Rejected automatically: count of all documents that were automatically rejected during the time period
- Rejected manually: count of all documents that were manually rejected during the time period
- Deleted: count of documents that were deleted during the time period
- Exported: count of documents that were successfully exported during the time period
- Net time: total time spent by a user validating the successfully exported documents
Download usage statistics (January 2019).
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/usage_report?from=2019-01-01&to=2019-01-31'
A CSV file may be downloaded from https://<example>.rossum.app/api/v1/annotations/usage_report?format=csv.
You may specify a date range using the from and to parameters (inclusive). If not specified, a report for the last 12 months is generated.
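The Python sketch below builds the GET URL with the standard library. `usage_report_url` is a hypothetical helper and the host is the documentation's placeholder. Combining `from`/`to` with `format=csv` in one request is an assumption, because the text describes the two options separately.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host, as in the examples above


def usage_report_url(date_from: Optional[str] = None, date_to: Optional[str] = None, as_csv: bool = False) -> str:
    """Build the usage report URL; omitted dates fall back to the last 12 months on the server."""
    params = {}
    if date_from:
        params["from"] = date_from  # inclusive
    if date_to:
        params["to"] = date_to  # inclusive
    if as_csv:
        params["format"] = "csv"  # assumption: combinable with from/to
    query = urlencode(params)
    return f"{BASE_URL}/annotations/usage_report" + (f"?{query}" if query else "")


print(usage_report_url("2019-01-01", "2019-01-31", as_csv=True))
# https://<example>.rossum.app/api/v1/annotations/usage_report?from=2019-01-01&to=2019-01-31&format=csv
```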
Request
POST /v1/annotations/usage_report
Attribute | Type | Description |
---|---|---|
filter | object | Filters to be applied on documents used for the computation of usage report |
filter.users | list[URL] | Filter documents modified by the specified users (not applied to imported_count ) |
filter.queues | list[URL] | Filter documents from the specified queues |
filter.begin_date | datetime | Filter documents whose date (arrived_at for imported_count; deleted_at for deleted_count; rejected_at for rejected_count; or exported_at for the rest) is greater than the specified value. |
filter.end_date | datetime | Filter documents whose date (arrived_at for imported_count; deleted_at for deleted_count; rejected_at for rejected_count; or exported_at for the rest) is lower than the specified value. |
exported_on_time_threshold_s | float | Threshold (in seconds) under which documents are denoted as on_time. |
group_by | list[string] | List of attributes by which the series is to be grouped. Possible values: user , workspace , queue , month , week , day . |
{
"filter": {
"users": [
"https://<example>.rossum.app/api/v1/users/173"
],
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"begin_date": "2019-12-01",
"end_date": "2020-01-31"
},
"exported_on_time_threshold_s": 86400,
"group_by": [
"user",
"workspace",
"queue",
"month"
]
}
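The Python sketch below builds the same request body and a POST request with the standard library, without sending anything. It also rejects `group_by` values outside the documented list. `usage_report_request` and the placeholder token are hypothetical. Leaving out the `users` and `queues` filters when they are not given is an assumption about how the endpoint treats missing filters.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://<example>.rossum.app/api/v1"  # placeholder host, as in the examples above
TOKEN = "<token>"  # hypothetical placeholder; use your own API token
ALLOWED_GROUP_BY = {"user", "workspace", "queue", "month", "week", "day"}


def usage_report_request(
    begin_date: str,
    end_date: str,
    users: Optional[list[str]] = None,
    queues: Optional[list[str]] = None,
    threshold_s: float = 86400,
    group_by: tuple[str, ...] = ("user", "workspace", "queue", "month"),
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/annotations/usage_report request."""
    unknown = set(group_by) - ALLOWED_GROUP_BY
    if unknown:
        raise ValueError(f"unsupported group_by values: {sorted(unknown)}")
    report_filter = {"begin_date": begin_date, "end_date": end_date}
    # Assumption: omitting users/queues means "no filter" on that attribute.
    if users:
        report_filter["users"] = users
    if queues:
        report_filter["queues"] = queues
    payload = {
        "filter": report_filter,
        "exported_on_time_threshold_s": threshold_s,
        "group_by": list(group_by),
    }
    return urllib.request.Request(
        f"{BASE_URL}/annotations/usage_report",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    )


req = usage_report_request(
    "2019-12-01",
    "2020-01-31",
    users=[f"{BASE_URL}/users/173"],
    queues=[f"{BASE_URL}/queues/8199"],
)
# Send with urllib.request.urlopen(req) once TOKEN and BASE_URL point at a real account.
```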
Response
Status: 200
{
"series": [
{
"begin_date": "2019-12-01",
"end_date": "2020-01-01",
"queue": "https://<example>.rossum.app/api/v1/queues/8199",
"workspace": "https://<example>.rossum.app/api/v1/workspaces/7540",
"values": {
"imported_count": 2,
"confirmed_count": 6,
"rejected_count": 2,
"rejected_automatically_count": 1,
"rejected_manually_count": 1,
"deleted_count": null,
"exported_count": null,
"turnaround_avg_s": null,
"corrections_per_document_avg": null,
"exported_on_time_count": null,
"exported_late_count": null,
"time_per_document_avg_s": null,
"time_per_document_active_avg_s": null,
"time_and_corrections_per_field": []
}
},
{
"begin_date": "2020-01-01",
"end_date": "2020-02-01",
"queue": "https://<example>.rossum.app/api/v1/queues/8199",
"workspace": "https://<example>.rossum.app/api/v1/workspaces/7540",
"user": "https://<example>.rossum.app/api/v1/users/173",
"values": {
"imported_count": null,
"confirmed_count": 6,
"rejected_count": 3,
"rejected_automatically_count": 2,
"rejected_manually_count": 1,
"deleted_count": 2,
"exported_count": 2,
"turnaround_avg_s": 1341000,
"corrections_per_document_avg": 1.0,
"exported_on_time_count": 1,
"exported_late_count": 1,
"time_per_document_avg_s": 70.0,
"time_per_document_active_avg_s": 50.0,
"time_and_corrections_per_field": [
{
"schema_id": "date_due",
"label": "Date due",
"total_count": 1,
"corrected_ratio": 0.0,
"time_spent_avg_s": 0.0
},
...
]
}
},
...
],
"totals": {
"imported_count": 7,
"confirmed_count": 6,
"rejected_count": 5,
"rejected_automatically_count": 3,
"rejected_manually_count": 2,
"deleted_count": 2,
"exported_count": 3,
"turnaround_avg_s": 894000,
"corrections_per_document_avg": 1.0,
"exported_on_time_count": 2,
"exported_late_count": 1,
"time_per_document_avg_s": 70.0,
"time_per_document_active_avg_s": 50.0
}
}
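The Python sketch below derives two figures from such a response: the share of exported documents that were on time, and the sum of one counter across series. Series values can be `null`, as in the example, so the sketch counts them as 0. `on_time_ratio` and `sum_series` are hypothetical helpers. The example response is truncated ("..."), so the series sums need not match the totals.

```python
from typing import Optional


def on_time_ratio(report: dict) -> Optional[float]:
    """Share of exported documents whose turnaround was under the threshold."""
    totals = report["totals"]
    exported = totals.get("exported_count") or 0
    on_time = totals.get("exported_on_time_count") or 0
    return on_time / exported if exported else None


def sum_series(report: dict, key: str) -> float:
    """Sum one counter over all series items; null values count as 0."""
    return sum(item["values"].get(key) or 0 for item in report["series"])


# Abbreviated copy of the example response above (the "..." entries are omitted).
report = {
    "series": [
        {"values": {"imported_count": 2, "exported_count": None}},
        {"values": {"imported_count": None, "exported_count": 2}},
    ],
    "totals": {"exported_count": 3, "exported_on_time_count": 2, "turnaround_avg_s": 894000},
}
print(on_time_ratio(report))  # 0.6666666666666666
print(sum_series(report, "exported_count"))  # 2
```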
The response consists of two parts: totals and series.
Totals
Totals contain summary information for the whole period (between begin_date and end_date).
Attribute | Type | Description |
---|---|---|
imported_count | int | Count of documents that were uploaded to Rossum |
confirmed_count | int | Count of documents that were confirmed |
rejected_count | int | Count of documents that were rejected |
rejected_automatically_count | int | Count of documents that were automatically rejected |
rejected_manually_count | int | Count of documents that were manually rejected |
deleted_count | int | Count of documents that were deleted |
exported_count | int | Count of documents that were successfully exported |
turnaround_avg_s | float | Average time (in seconds) that a document spends in Rossum (computed as time exported_at - arrived_at ) |
corrections_per_document_avg | float | Average count of corrections on documents |
exported_on_time_count | int | Number of documents whose turnaround was under exported_on_time_threshold |
exported_late_count | int | Number of documents whose turnaround was above exported_on_time_threshold |
time_per_document_avg_s | float | Average time (in seconds) that users spent validating documents. Based on the time_spent_overall metric, see annotation processing duration |
time_per_document_active_avg_s | float | Average active time (in seconds) that users spent validating documents. Based on the time_spent_active metric, see annotation processing duration |
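The turnaround metrics above can be reproduced from individual annotations. The following Python sketch is illustrative only: the helper names parse_ts and turnaround_totals, the input format of (arrived_at, exported_at) string pairs and the threshold value are assumptions. The table does not say how a turnaround exactly equal to exported_on_time_threshold is counted, so the sketch treats it as on time.

```python
from datetime import datetime, timedelta


def parse_ts(value: str) -> datetime:
    # API timestamps look like "2019-02-06T23:04:00.933658Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def turnaround_totals(pairs, threshold: timedelta) -> dict:
    """Compute turnaround statistics from (arrived_at, exported_at) ISO strings."""
    turnarounds = [
        (parse_ts(exported) - parse_ts(arrived)).total_seconds()
        for arrived, exported in pairs
    ]
    # Assumption: a turnaround equal to the threshold counts as on time
    on_time = sum(1 for t in turnarounds if t <= threshold.total_seconds())
    return {
        "turnaround_avg_s": sum(turnarounds) / len(turnarounds) if turnarounds else None,
        "exported_on_time_count": on_time,
        "exported_late_count": len(turnarounds) - on_time,
    }
```

For example, one document exported after 1 hour and another after 24 hours, with a 2-hour threshold, give an average of 45000 seconds, one on-time and one late document.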
Series
Series contain information grouped by the fields defined in group_by. The data (see above) are wrapped in a values object and accompanied by the values of the attributes that were used for grouping.
Attribute | Type | Description |
---|---|---|
user | URL | User who modified the documents within the group |
workspace | URL | Workspace containing the documents within the group |
queue | URL | Queue containing the documents within the group |
begin_date | date | Start date of the documents within the group |
end_date | date | Final date of the documents within the group |
values | object | Contains the same data as totals plus the time_and_corrections_per_field list (for details see below). |
In addition to the totals data, series contain a time_and_corrections_per_field list that provides detailed statistics for each field.
Series details
The detail object contains statistics grouped per field (schema_id).
Attribute | Type | Description |
---|---|---|
schema_id | string | Reference mapping of the data object to the schema tree |
label | string | Label of the data object (taken from schema). |
total_count | int | Number of data objects |
corrected_ratio* | float [0;1] | Ratio of data objects that had to be corrected after automatic extraction. |
time_spent_avg_s | float | Average time (in seconds) spent on validating the data objects |
* The corrected ratio is calculated from human corrections. If any kind of automation (hook, webhook, etc.) runs on the datapoints, even after a human correction took place, corrected_ratio is not calculated and is set to 0.
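When a report is grouped into many series entries, a single per-field view can be handy. The following Python sketch merges time_and_corrections_per_field across all entries, weighting each corrected_ratio by total_count; the function name and the weighting are our own choices, not part of the API. Keep the footnote above in mind: fields touched by automation report a ratio of 0, so a low value does not necessarily mean that nothing was corrected.

```python
def field_correction_ranking(stats: dict, min_count: int = 1) -> list:
    """Rank schema_ids by their corrected ratio across all series entries."""
    totals = {}
    for entry in stats.get("series", []):
        per_field = entry.get("values", {}).get("time_and_corrections_per_field", [])
        for field in per_field:
            agg = totals.setdefault(field["schema_id"], {"count": 0, "corrected": 0.0})
            # corrected_ratio is relative to total_count, so turn it back into a count
            agg["count"] += field["total_count"]
            agg["corrected"] += field["corrected_ratio"] * field["total_count"]
    ranking = [
        (schema_id, agg["corrected"] / agg["count"])
        for schema_id, agg in totals.items()
        if agg["count"] >= max(min_count, 1)  # also avoids division by zero
    ]
    # Highest corrected ratio first
    return sorted(ranking, key=lambda item: item[1], reverse=True)
```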
Extensions
The Rossum platform may be extended via third-party, externally running services or custom serverless functions. These extensions are registered to receive callbacks from the Rossum platform on various occasions and allow you to modify the platform's behavior. Currently, we support these callback extensions: Webhooks, Serverless Functions, and Connectors.
Webhooks and connectors require a third-party service accessible through an HTTP endpoint. This may incur additional operational and implementation costs. User-defined serverless functions, on the contrary, are executed within the Rossum platform and no additional setup is necessary. They share the interface (input and output data format, error handling) with webhooks.
See the Building Your Own Extension set of guides in Rossum's developer portal for an introduction to Rossum extensions.
Webhook Extension
The webhook component is the most flexible extension. It covers all the most frequent use cases:
- displaying messages to users in validation screen,
- applying custom business rules to check, change, or autofill missing values,
- reacting to document status change in your workflow,
- sending reviewed data to external systems.
Implement a webhook
Webhooks follow a push model over the common HTTPS protocol. When an event is triggered, the webhook endpoint is called with the relevant request payload. The webhook must be deployed with a public IP address so that the Rossum platform can call its endpoints; for testing, a tool like ngrok or serveo may come in handy.
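As a starting point, a webhook can be any endpoint that accepts a JSON POST and answers with JSON. The following standard-library Python sketch assumes the hook is subscribed only to annotation_status and annotation_content events. handle_event, HookHandler and serve are hypothetical names, TLS is expected to be terminated in front of the server (for example by a reverse proxy or a tunnel), and signature verification (see Validating payloads from Rossum) is omitted for brevity.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload: dict) -> dict:
    """Build the response body for one webhook call (hypothetical logic)."""
    if payload.get("event") == "annotation_content":
        # annotation_content hooks may answer with messages and operations
        return {"messages": [], "operations": []}
    # annotation_status.changed expects no particular response body
    return {}


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and parse the JSON payload sent by Rossum
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(handle_event(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Plain HTTP on all interfaces; put HTTPS termination in front of it
    HTTPServer(("0.0.0.0", port), HookHandler).serve_forever()
```

Call serve() to start listening; keeping the decision logic in handle_event makes it easy to test without a running server.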
Webhook vs. Connector
Webhook extensions are similar to connectors, but they are more flexible and easier to use. A webhook is notified when a defined type of webhook event occurs for a related queue.
Advantages of webhooks over connectors:
- There may be multiple webhooks defined for a single queue
- No hard-coded endpoint names (/validate, /save)
- Robust retry mechanism in case of webhook failure
- If webhooks are connected via the run_after parameter, the results from the predecessor webhook are passed to its successor
Webhooks are defined using a hook object of type webhook. For a description of how to create and manage hooks, see the Hook API.
Webhook Events
Example data sent for the annotation_status event to hook.config.url when the status of the annotation changes
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5514b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
"hook": "https://<example>.rossum.app/api/v1/hooks/789",
"settings": {
"example_target_service_type": "SFTP",
"example_target_hostname": "sftp.elis.rossum.ai"
},
"secrets": {
"username": "my-rossum-importer",
"password": "secret-importer-user-password"
},
"action": "changed",
"event": "annotation_status",
"annotation": {
"document": "https://<example>.rossum.app/api/v1/documents/314621",
"id": 314521,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
"schema": "https://<example>.rossum.app/api/v1/schemas/223",
"pages": [
"https://<example>.rossum.app/api/v1/pages/551518"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"export_failed_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "to_review",
"previous_status": "importing",
"rir_poll_id": "54f6b91cfb751289e71ddf12",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/314521",
"content": "https://<example>.rossum.app/api/v1/annotations/314521/content",
"time_spent": 0,
"metadata": {},
"organization": "https://<example>.rossum.app/api/v1/organizations/1"
},
"document": {
"id": 314621,
"url": "https://<example>.rossum.app/api/v1/documents/314621",
"s3_name": "272c2f41ae84a4f19a422cb432a490bb",
"mime_type": "application/pdf",
"arrived_at": "2019-02-06T23:04:00.933658Z",
"original_file_name": "test_invoice_1.pdf",
"content": "https://<example>.rossum.app/api/v1/documents/314621/content",
"metadata": {}
}
}
Example data sent for the annotation_content event to hook.config.url when a user updates a value in the UI
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
"hook": "https://<example>.rossum.app/api/v1/hooks/781",
"settings": {
"example_target_hostname": "sftp.elis.rossum.ai"
},
"secrets": {
"password": "secret-importer-user-password"
},
"action": "updated",
"event": "annotation_content",
"annotation": {
"document": "https://<example>.rossum.app/api/v1/documents/314621",
"id": 314521,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
"schema": "https://<example>.rossum.app/api/v1/schemas/223",
"pages": [
"https://<example>.rossum.app/api/v1/pages/551518"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"export_failed_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "to_review",
"previous_status": "importing",
"rir_poll_id": "54f6b91cfb751289e71ddf12",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/314521",
"organization": "https://<example>.rossum.app/api/v1/organizations/1",
"content": [
{
"id": 1123123,
"url": "https://<example>.rossum.app/api/v1/annotations/314521/content/1123123",
"schema_id": "basic_info",
"category": "section",
"children": [
{
"id": 20456864,
"url": "https://<example>.rossum.app/api/v1/annotations/1/content/20456864",
"content": {
"value": "18 492.48",
"normalized_value": "18492.48",
"page": 2,
...
},
"category": "datapoint",
"schema_id": "number",
"validation_sources": [
"checks",
"score"
],
"time_spent": 0
}
]
}
],
"time_spent": 0,
"metadata": {}
},
"document": {
"id": 314621,
"url": "https://<example>.rossum.app/api/v1/documents/314621",
"s3_name": "272c2f41ae84a4f19a422cb432a490bb",
"mime_type": "application/pdf",
"arrived_at": "2019-02-06T23:04:00.933658Z",
"original_file_name": "test_invoice_1.pdf",
"content": "https://<example>.rossum.app/api/v1/documents/314621/content",
"metadata": {}
},
"updated_datapoints": [11213211, 11213212]
}
Example of a response for the annotation_content hook
{
"messages": [
{
"content": "Invalid invoice number format",
"id": 197467,
"type": "error"
}
],
"operations": [
{
"op": "replace",
"id": 198143,
"value": {
"content": {
"value": "John",
"position": [103, 110, 121, 122],
"page": 1
},
"hidden": false,
"options": [],
"validation_sources": ["human"]
}
},
{
"op": "remove",
"id": 884061
},
{
"op": "add",
"id": 884060,
"value": [
{
"schema_id": "item_description",
"content": {
"page": 1,
"position": [162, 852, 371, 875],
"value": "Bottle"
}
}
]
}
]
}
Example data sent for the email event to hook.config.url when an email is received
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
"hook": "https://<example>.rossum.app/api/v1/hooks/781",
"settings": {
"example_target_hostname": "sftp.elis.rossum.ai"
},
"secrets": {
"password": "secret-importer-user-password"
},
"action": "received",
"event": "email",
"email": "https://<example>.rossum.app/api/v1/emails/987",
"queue": "https://<example>.rossum.app/api/v1/queues/41",
"files": [
{
"id": "1",
"filename": "image.png",
"mime_type": "image/png",
"n_pages": 1,
"height_px": 100.0,
"width_px": 150.0,
"document": "https://<example>.rossum.app/api/v1/documents/427"
},
{
"id": "2",
"filename": "MS word.docx",
"mime_type": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
"n_pages": 1,
"height_px": null,
"width_px": null,
"document": "https://<example>.rossum.app/api/v1/documents/428"
},
{
"id": "3",
"filename": "A4 pdf.pdf",
"mime_type": "application/pdf",
"n_pages": 3,
"height_px": 3510.0,
"width_px": 2480.0,
"document": "https://<example>.rossum.app/api/v1/documents/429"
},
{
"id": "4",
"filename": "unknown_file",
"mime_type": "text/xml",
"n_pages": 1,
"height_px": null,
"width_px": null,
"document": "https://<example>.rossum.app/api/v1/documents/430"
}
],
"headers": {
"from": "test@example.com",
"to": "east-west-trading-co-a34f3a@<example>.rossum.app",
"reply-to": "support@example.com",
"subject": "Some subject",
"date": "Mon, 04 May 2020 11:01:32 +0200",
"message-id": "15909e7e68e4b5f56fd78a3b4263c4765df6cc4d",
"authentication-results": "example.com;\n dmarc=pass d=example.com"
},
"body": {
"body_text_plain": "Some body",
"body_text_html": "<div dir=\"ltr\">Some body</div>"
}
}
Example of a response for the email hook
{
"files": [
{
"id": "3",
"values": [
{
"id": "email:invoice_id",
"value": "INV001234"
},
{
"id": "email:customer_name",
"value": "John Doe"
}
]
}
]
}
Example data sent for the invocation.scheduled event and action
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5514b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
"hook": "https://<example>.rossum.app/api/v1/hooks/789",
"settings": {
"example_target_service_type": "SFTP",
"example_target_hostname": "sftp.elis.rossum.ai"
},
"secrets": {
"username": "my-rossum-importer",
"password": "secret-importer-user-password"
},
"action": "scheduled",
"event": "invocation"
}
Example data sent for the upload event to hook.config.url when documents are uploaded (either through the API or as an email attachment)
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"rossum_authorization_token": "1024873d424a007d8eebff7b3684d283abdf7d0d",
"hook": "https://<example>.rossum.app/api/v1/hooks/781",
"settings": {},
"secrets": {},
"action": "created",
"event": "upload",
"email": "https://<example>.rossum.app/api/v1/emails/987",
"upload": "https://<example>.rossum.app/api/v1/uploads/2046",
"metadata": {},
"files": [
{
"document": "https://<example>.rossum.app/api/v1/documents/427",
"prevent_importing": false,
"values": [],
"queue": "https://<example>.rossum.app/api/v1/queues/41",
"annotation": null
},
{
"document": "https://<example>.rossum.app/api/v1/documents/428",
"prevent_importing": true,
"values": [],
"queue": "https://<example>.rossum.app/api/v1/queues/41",
"annotation": "https://<example>.rossum.app/api/v1/annotations/1638"
}
],
"documents": [
{
"id": 427,
"url": "https://<example>.rossum.app/api/v1/documents/427",
"mime_type": "application/pdf",
...
},
{
"id": 428,
"url": "https://<example>.rossum.app/api/v1/documents/428",
"mime_type": "application/json",
...
}
]
}
Example of a response for the upload hook
{
"files": [
{
"document": "https://<example>.rossum.app/api/v1/documents/427",
"prevent_importing": false
},
{
"document": "https://<example>.rossum.app/api/v1/documents/428",
"prevent_importing": true
},
{
"document": "https://<example>.rossum.app/api/v1/documents/429"
},
{
"document": "https://<example>.rossum.app/api/v1/documents/430"
}
]
}
Webhook events specify when the hook should be notified. They can be defined either:
- as a whole event, covering all supported actions for its type (for example annotation_status), or
- as separately named actions for a specified event (for example annotation_status.changed).
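A hook's list of events can mix both forms. The following Python sketch illustrates the matching rule described above, which is also handy when one endpoint receives several events and has to dispatch on them. It is not Rossum's implementation; subscribes_to is a hypothetical helper that takes the configured event names as a list of strings.

```python
def subscribes_to(hook_events: list, event: str, action: str) -> bool:
    """Return True if a hook configured with hook_events is notified of event.action."""
    # A bare event name ("annotation_status") covers every action of that event;
    # a dotted name ("annotation_status.changed") covers only that one action.
    return event in hook_events or f"{event}.{action}" in hook_events
```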
Supported events and their actions
Event and Action | Payload (outside default attributes) | Response | Description | Retry on failure |
---|---|---|---|---|
annotation_status.changed | annotation, document | N/A | Annotation status change occurred | yes |
annotation_content.initialize | annotation + content, document, updated_datapoints | operations, messages | Annotation was initialized (data extracted) | yes |
annotation_content.started | annotation + content, document, updated_datapoints (empty) | operations, messages | User entered validation screen | no (interactive) |
annotation_content.user_update | annotation + content, document, updated_datapoints | operations, messages | (Deprecated in favor of annotation_content.updated) Annotation was updated by the user | no (interactive) |
annotation_content.updated | annotation + content, document, updated_datapoints | operations, messages | Annotation data was updated by the user | no (interactive) |
annotation_content.confirm | annotation + content, document, updated_datapoints (empty) | operations, messages | User confirmed validation screen | no (interactive) |
annotation_content.export | annotation + content, document, updated_datapoints (empty) | operations, messages | Annotation is being moved to exported state | yes |
upload.created | files, documents, metadata, email, upload | files | Upload object was created | yes |
email.received | files, headers, body, email, queue | files (*) | Email with attachments was received | yes |
invocation.scheduled | N/A | N/A | Hook was invoked at the scheduled time | yes |
invocation.manual | custom payload fields | forwarded hook response | Event for manual hook triggering | no |
(*) May also contain other optional attributes - read more in this section.
- For each webhook call there is a 30-second timeout by default (this applies to all events and actions). It can be modified in config with the attribute timeout_s (min=0, max=60, only for non-interactive webhooks).
- When the annotation_content.export action fails, the annotation is switched to the failed_export state.
- When the response from the webhook on annotation_content.export contains a message of type error, the annotation is switched to the failed_export state.
- In case a non-interactive webhook call fails (check the configuration of the retry_on_any_non_2xx attribute of the webhook to see which statuses this includes), it is retried within 30 seconds by default. Up to 5 attempts are performed. This number can be modified in config with the attribute retry_count (min=0, max=4, only for non-interactive webhooks).
- For a non-interactive webhook call that results in HTTP status 202 (asynchronous result), the URL provided in the response's Location header is polled the same number of times as the webhook URL itself, until it receives a response with status code 201, whose body is taken as the hook call result.
- The updated_datapoints list is never empty for annotation_content.updated, which is only triggered by interactive data changes.
- The updated_datapoints list is always empty for the annotation_content.export action.
- The updated_datapoints list is empty for the annotation_content.initialize action if run_after=[], but it can contain data from its predecessor if chained via run_after.
- The updated_datapoints list may also be empty for annotation_content.user_update in case of an action triggered interactively by a user but with no data changes (e.g. after opening the validation screen or at its confirmation issued by the Rossum UI).
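For long-running non-interactive hooks, the 202 mechanism lets your service reply immediately and deliver the result later. The following framework-agnostic Python sketch shows one way to track pending work on the webhook side. AsyncHookResults, its /results/<job_id> URL layout and the (status, headers, body) return tuples are hypothetical, and the sketch assumes that answering a poll with a non-201 status (202 here) simply means the result is not ready yet. Because the Location URL is polled only as many times as the webhook itself would be retried, the work should finish within that window.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor


class AsyncHookResults:
    """Hypothetical webhook-side helper for the 202 + Location pattern."""

    def __init__(self, base_url: str, worker):
        self.base_url = base_url.rstrip("/")
        self.worker = worker  # callable(payload) -> response dict; may be slow
        self.executor = ThreadPoolExecutor(max_workers=4)
        self.jobs = {}  # job_id -> Future

    def accept(self, payload: dict):
        """Initial hook call: start the work and answer 202 with a Location header."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = self.executor.submit(self.worker, payload)
        return 202, {"Location": f"{self.base_url}/results/{job_id}"}, None

    def poll(self, job_id: str):
        """Poll of the Location URL: 201 with the hook result once it is ready."""
        future = self.jobs.get(job_id)
        if future is None:
            return 404, {}, None
        if not future.done():
            # Assumption: a non-201 status here just means "not ready yet"
            return 202, {}, None
        del self.jobs[job_id]
        if future.exception() is not None:
            return 500, {}, None
        return 201, {"Content-Type": "application/json"}, future.result()
```

Wire accept to the webhook URL and poll to the URL you return in Location, using whatever HTTP framework you already run.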
Webhook Events Occurrence Diagram
This diagram gives an overview of the hook events and when they occur.
Hooks common attributes
Key | Type | Description |
---|---|---|
request_id | UUID | Hook call request ID |
timestamp | datetime | Timestamp when the hook was called |
hook | URL | Hook's URL |
action | string | Hook's action |
event | string | Hook's event |
settings | object | Copy of hook.settings attribute |
Annotation status event data format
The annotation_status event contains the following additional event-specific attributes.
Key | Type | Description |
---|---|---|
annotation | object | Annotation object (enriched with attribute previous_status) |
document | object | Document object (attribute annotations is excluded) |
queues* | list[object] | list of related queue objects |
modifiers* | list[object] | list of related modifier objects |
schemas* | list[object] | list of related schema objects |
emails* | list[object] | list of related email objects (for annotations created after email ingestion) |
related_emails* | list[object] | list of related emails objects (other related emails) |
relations* | list[object] | list of related relation objects |
child_relations* | list[object] | list of related child_relation objects |
suggested_edits* | list[object] | list of related suggested_edits objects |
assignees* | list[object] | list of related assignee objects |
pages* | list[object] | list of related pages objects |
notes* | list[object] | list of related notes objects |
labels* | list[object] | list of related labels objects |
automation_blockers* | list[object] | list of related automation_blockers objects |
* Attribute is only included in the request when specified in hook.sideload. Please note that sideloading of modifier objects from a different organization is not supported and that sideloading can decrease performance. See also the annotation sideloading section.
Example data sent to hook with sideloaded queue objects
{
"request_id": "ae7bc8dd-73bd-489b-a3d2-f5214b209591",
"timestamp": "2020-01-01T00:00:00.000000Z",
"base_url": "https://<example>.rossum.app",
"hook": "https://<example>.rossum.app/api/v1/hooks/781",
"action": "changed",
"event": "annotation_status",
...,
"queues": [
{
"id": 8198,
"name": "Received invoices",
"url": "https://<example>.rossum.app/api/v1/queues/8198",
...,
"metadata": {},
"use_confirmed_state": false,
"settings": {}
}
]
}
Annotation content event data format
The annotation_content event contains the following additional event-specific attributes.
Key | Type | Description |
---|---|---|
annotation | object | Annotation object. Content is pre-loaded with annotation data. Annotation data are enriched with normalized_value (see example). |
document | object | Document object (attribute annotations is excluded) |
updated_datapoints** | list[int] | List of IDs of datapoints that were changed by last or all predecessor events. |
queues* | list[object] | list of related queue objects |
modifiers* | list[object] | list of related modifier objects |
schemas* | list[object] | list of related schema objects |
emails* | list[object] | list of related email objects (for annotations created after email ingestion) |
related_emails* | list[object] | list of related emails objects (other related emails) |
relations* | list[object] | list of related relation objects |
child_relations* | list[object] | list of related child_relation objects |
suggested_edits* | list[object] | list of related suggested_edits objects |
assignees* | list[object] | list of related assignee objects |
pages* | list[object] | list of related pages objects |
notes* | list[object] | list of related notes objects |
labels* | list[object] | list of related labels objects |
automation_blockers* | list[object] | list of related automation_blockers objects |
* Attribute is only included in the request when specified in hook.sideload. Please note that sideloading of modifier objects from a different organization is not supported and that sideloading can decrease performance. See also the annotation sideloading section.
** If the run_after attribute chains the hooks, updated_datapoints will contain a list of all datapoint IDs that were updated by any of the preceding hooks. Moreover, in case of an add operation on a multivalue table, updated_datapoints will contain the id of the multivalue, the id of the new tuple datapoint and the ids of all the newly created cell datapoints.
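To act only on what actually changed, a hook can look up the IDs from updated_datapoints in the annotation content tree. This Python sketch (helper names are hypothetical) walks sections, multivalues and tuples through their children lists and returns the matching datapoint objects. IDs of multivalues or tuples, which updated_datapoints may also contain after an add operation, are not returned because the walk yields datapoints only.

```python
def iter_datapoints(nodes):
    """Yield every datapoint in an annotation content tree."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        # Sections, multivalues and tuples keep their descendants in "children"
        yield from iter_datapoints(node.get("children", []))


def updated_datapoint_objects(payload: dict) -> list:
    """Return the datapoint objects whose IDs are listed in updated_datapoints."""
    wanted = set(payload.get("updated_datapoints", []))
    content = payload["annotation"]["content"]
    return [dp for dp in iter_datapoints(content) if dp["id"] in wanted]
```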
Annotation content event response format
All of the annotation_content events expect a JSON object with the following optional lists in the response: messages and operations.
The message object contains attributes:
Key | Type | Description |
---|---|---|
id | integer | Optional unique id of the relevant datapoint; omit for document-wide issues |
type | enum | One of: error, warning or info. |
content | string | A descriptive message to be shown to the user |
detail | object | Detail object that enhances the response from a hook. (For more info refer to message detail) |
For example, you may use error for fatal issues like a missing required field, whereas info is suitable to decorate a supplier company id with its name as looked up in the suppliers database.
The operations object describes operations to be performed on the annotation data (replace, add, remove). The format of the operations key is the same as for bulk update of annotations; please refer to the annotation data API for a complete description.
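The following Python sketch builds a response body with one message and a couple of operations, mirroring the example response above. The helper names are hypothetical, and replace_value sends only content.value, assuming that the replace operation accepts such a partial value; the full format is described in the annotation data API.

```python
from typing import Optional

MESSAGE_TYPES = ("error", "warning", "info")


def make_message(msg_type: str, content: str, datapoint_id: Optional[int] = None) -> dict:
    """Build a message object; omit the id for document-wide messages."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unsupported message type: {msg_type}")
    message = {"type": msg_type, "content": content}
    if datapoint_id is not None:
        message["id"] = datapoint_id
    return message


def replace_value(datapoint_id: int, value: str) -> dict:
    """Build a replace operation changing only the datapoint's value (assumed partial update)."""
    return {"op": "replace", "id": datapoint_id, "value": {"content": {"value": value}}}


def remove_datapoint(datapoint_id: int) -> dict:
    """Build a remove operation for the given datapoint."""
    return {"op": "remove", "id": datapoint_id}
```

A hook would then return, for example, {"messages": [make_message("error", "Invalid invoice number format", 197467)], "operations": [replace_value(198143, "John")]}.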
Parsable error response format
It's possible to use the same format even with non-2XX response codes. In this type of response, operations are not considered.
Example of a parsable error response
{
"messages": [
{
"id": "all",
"type": "error",
"content": "custom error message to be displayed in the UI"
}
]
}
The initialize action of the annotation_content event additionally accepts a list of automation_blockers objects. This allows manual creation of automation blockers of type extension, which stops the automation without the need to create an error message.
The automation_blockers object contains attributes:
Key | Type | Description |
---|---|---|
id | integer | Optional unique id of the relevant datapoint; omit for document-wide issues |
content | str | A descriptive message to be stored as an automation blocker |
Example of a response for an annotation_content.initialize hook creating automation blockers
{
"messages": [...],
"operations": [...],
"automation_blockers": [
{
"id": 1357,
"content": "Unregistered vendor"
},
{
"content": "PO not found in the master data!"
}
]
}
Email received event data format
The email event contains the following additional event-specific attributes.
Key | Type | Description |
---|---|---|
files | list[object] | List of objects with metadata of each attachment contained in the arriving email. |
headers | object | Headers extracted from the arriving email. |
body | object | Body extracted from the arriving email. |
email | URL | URL of the arriving email. |
queue | URL | URL of the arriving email's queue. |
The files object contains attributes:
Key | Type | Description |
---|---|---|
id | string | Some arbitrary identifier. |
filename | string | Name of the attachment. |
mime_type | string | MIME type of the attachment. |
n_pages | integer | Number of pages (defaults to 1 if it could not be acquired). |
height_px | float | Height in pixels (300 DPI is assumed for PDF files, defaults to null if it could not be acquired). |
width_px | float | Width in pixels (300 DPI is assumed for PDF files, defaults to null if it could not be acquired). |
document | URL | URL of related document object. |
The headers object contains the same values as are available for initialization of values in email_header:<id> (namely: from, to, reply-to, subject, message-id, date).
The body object contains body_text_plain and body_text_html.
Email received event response format
All of the email events expect a JSON object with the following attributes in the response: files, additional_files and extracted_original_sender.
The files object contains attributes:
Key | Type | Description |
---|---|---|
id | int | ID of the file that will be used for creating an annotation |
values | list[object] | This is used to initialize datapoint values. See values object description below |
The values object consists of the following:
Key | Type | Description |
---|---|---|
id | string | Id of value - must start with email: prefix (to use this value refer to it in rir_field_names field in the schema similarly as described here). |
value | string | String value to be used when annotation content is being constructed |
This is useful for filtering out unwanted files using criteria that are not available in Rossum by default.
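As an example of such filtering, this Python sketch answers an email.received call by listing only attachments with an allowed MIME type (as we read the table above, attachments left out of files are not used to create annotations) and passes the sender's domain as an email: value. ALLOWED_MIME_TYPES, the function name and the email:sender_domain value ID are hypothetical; the value is only used if a schema field references it in rir_field_names.

```python
from email.utils import parseaddr

ALLOWED_MIME_TYPES = {"application/pdf", "image/png", "image/jpeg"}  # hypothetical policy


def rossum_email_response(payload: dict) -> dict:
    """Keep only allowed attachments and pass the sender's domain as an email: value."""
    _, sender = parseaddr(payload.get("headers", {}).get("from", ""))
    domain = sender.rpartition("@")[2] if "@" in sender else ""
    files = []
    for attachment in payload.get("files", []):
        if attachment["mime_type"] not in ALLOWED_MIME_TYPES:
            continue  # attachments left out of the response are filtered out
        files.append({
            "id": attachment["id"],
            # Hypothetical value; a schema field must reference it via rir_field_names
            "values": [{"id": "email:sender_domain", "value": domain}],
        })
    return {"files": files}
```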
The additional_files object contains attributes:
Key | Type | Description |
---|---|---|
document | URL | URL of the document object that should be included; it must be from the same queue. Documents without an annotation will be skipped. |
values | list[object] | This is used to initialize datapoint values. See values object description above |
The extracted_original_sender object looks as follows:
Key | Type | Description |
---|---|---|
extracted_original_sender | email_address_object | Information about the sender, containing the keys email and name. |
This is useful for updating the email address field on the email object with a new sender name and email address.
Upload created event data format
The upload event contains the following additional event-specific attributes.
Key | Type | Description |
---|---|---|
files | list[object] | List of objects with metadata of each uploaded document. |
documents | list[object] | List of document objects corresponding with the files object. |
upload | object | Object representing the upload. |
metadata | object | Client data passed in through the upload resource to create annotations with. |
email | URL | URL of the arriving email, or null if the document was uploaded via the API. |
The files object contains attributes:
Key | Type | Description |
---|---|---|
document | URL | URL of the uploaded document object. |
prevent_importing | bool | If set, no annotation is going to be created for the document or, if one already exists, it is not going to be switched to the importing status. |
values | list[object] | This is used to initialize datapoint values. See values object description below |
queue | URL | URL of the queue the document is being uploaded to. |
annotation | URL | URL of the document's annotation, or null if it doesn't exist. |
The values object consists of the following:
Key | Type | Description |
---|---|---|
id | string | Id of value (to use this value refer to it in rir_field_names field in the schema similarly as described here). |
value | string | String value to be used when annotation content is being constructed |
Upload created event response format
All of the upload events expect a JSON object with a files list in the response.
The files object contains attributes:
Key | Type | Description |
---|---|---|
document | URL | URL of the uploaded document object. |
prevent_importing | bool | If set, no annotation is going to be created for the document or, if one already exists, it is not going to be switched to the importing status. Optional, default false. |
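The following Python sketch answers an upload.created call by setting prevent_importing for documents of an unwanted MIME type, looking up mime_type in the documents list of the payload. SKIPPED_MIME_TYPES and the function name are hypothetical, and the sketch keeps prevent_importing set when the incoming files entry already has it set.

```python
SKIPPED_MIME_TYPES = {"application/json"}  # hypothetical policy


def rossum_upload_response(payload: dict) -> dict:
    """Set prevent_importing for uploaded documents with an unwanted MIME type."""
    # The payload's "documents" list holds the document objects, including mime_type
    mime_by_url = {doc["url"]: doc.get("mime_type") for doc in payload.get("documents", [])}
    return {
        "files": [
            {
                "document": item["document"],
                "prevent_importing": bool(item.get("prevent_importing"))
                or mime_by_url.get(item["document"]) in SKIPPED_MIME_TYPES,
            }
            for item in payload.get("files", [])
        ]
    }
```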
Validating payloads from Rossum
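Example of a hook receiver that verifies the validity of a Rossum request
import hashlib
import hmac
from flask import Flask, request, abort
app = Flask(__name__)
# Value of hook.config.secret; in production, load it from the environment or a secret store
SECRET_KEY = "<Your secret key stored in hook.config.secret>"
@app.route("/test_hook", methods=["POST"])
def test_hook():
    # HMAC SHA1 digest of the raw request body, computed the same way as by Rossum
    digest = hmac.new(SECRET_KEY.encode(), request.data, hashlib.sha1).hexdigest()
    try:
        prefix, signature = request.headers["X-Elis-Signature"].split("=", 1)
    except (KeyError, ValueError):
        # Missing header or a header value without "="
        abort(401, "Incorrect header format")
    # Constant-time comparison of the received and expected signatures
    if not (prefix == "sha1" and hmac.compare_digest(signature, digest)):
        abort(401, "Authorization failed.")
    # A Flask view must return a response; put the event-specific response body here
    return {}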
For authorization of payloads, the shared secret method is used. When a secret token is set in hook.config.secret, Rossum uses it to create a hash signature with each payload. This hash signature is passed along with each request in the X-Elis-Signature header. The goal is to compute a hash using hook.config.secret and the request body, and ensure that it matches the signature produced by Rossum. Rossum uses an HMAC SHA1 signature.
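If you are not using Flask, or want to test your receiver locally, the same check can be written as two framework-independent functions. In this Python sketch, sign reproduces the header value format used in the example above (sha1=<hex digest>) and is_valid_signature checks it; both names are hypothetical. Always compute the digest over the raw request body bytes exactly as received, since re-serializing the parsed JSON may change them.

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """Compute an X-Elis-Signature header value for a request body."""
    return "sha1=" + hmac.new(secret.encode(), body, hashlib.sha1).hexdigest()


def is_valid_signature(secret: str, body: bytes, header: str) -> bool:
    """Check an X-Elis-Signature header value against the raw request body."""
    prefix, _, signature = header.partition("=")
    expected = hmac.new(secret.encode(), body, hashlib.sha1).hexdigest()
    # compare_digest avoids leaking timing information
    return prefix == "sha1" and hmac.compare_digest(signature, expected)
```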
Webhook requests may be authenticated using a client SSL certificate; see the Hook API for reference.
Access to Rossum API
You can access the Rossum API from the webhook. Each execution gets a unique API key. The key is valid for 10 minutes or until Rossum receives a response from the webhook. You can set token_lifetime_s up to 2 hours to keep the token valid longer. The API key and the environment's base URL are passed to webhooks as the first-level attributes rossum_authorization_token and base_url within the webhook payload.
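For example, a webhook can fetch a related object using the token from the payload. This Python sketch uses only the standard library; api_request and fetch_json are hypothetical helpers, and sending the token with the Bearer scheme in the Authorization header is an assumption, so check the authentication section of this reference for the accepted header format.

```python
import json
import urllib.request


def api_request(payload: dict, path_or_url: str) -> urllib.request.Request:
    """Build an authenticated Rossum API request from a webhook payload."""
    base_url = payload["base_url"].rstrip("/")
    if path_or_url.startswith("http"):
        url = path_or_url  # already a full API URL, e.g. payload["annotation"]["queue"]
    else:
        url = f"{base_url}/api/v1/{path_or_url.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={
            # Assumption: the per-call token is accepted with the Bearer scheme
            "Authorization": f"Bearer {payload['rossum_authorization_token']}",
            "Accept": "application/json",
        },
    )


def fetch_json(payload: dict, path_or_url: str, timeout: float = 10.0) -> dict:
    """GET an API resource and decode its JSON body."""
    with urllib.request.urlopen(api_request(payload, path_or_url), timeout=timeout) as resp:
        return json.load(resp)
```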
Serverless Function Extension
Serverless functions allow you to extend Rossum's functionality without setting up and maintaining additional services.
Webhooks and Serverless functions share a basic setup: input and output data format and error handling. They are both configured using a hook API object.
Unlike webhooks, serverless functions do not send the event and action notifications to a specific URL. Instead, the function's code snippet is executed within the Rossum platform. See the function API description for details about how to set up a serverless function and connect it to the queue.
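Because the interface is shared, a serverless function is essentially a handler that takes the webhook payload and returns the webhook response. This minimal Python sketch uses the rossum_hook_request_handler entry point from the examples in this section and assumes the hook is subscribed only to annotation_content events; the message text is arbitrary.

```python
def rossum_hook_request_handler(payload: dict) -> dict:
    """Entry point receiving the same payload a webhook would receive."""
    if payload.get("event") != "annotation_content":
        # Other events are not expected for this hook
        return {}
    # Document-wide info message (no "id") shown on the validation screen
    return {
        "messages": [{"type": "info", "content": f"Hook ran on action {payload.get('action')}"}],
        "operations": [],
    }
```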
Supported events and their actions
For a description of supported events, actions and input/output data examples, please refer to the Webhook Extension section.
Supported runtimes
Currently, Rossum supports the NodeJS 18 runtime (nodejs18.x) and the NodeJS 22 runtime (nodejs22.x) to execute JavaScript functions, and python3.12 to execute Python. If you would like to use another runtime, please let us know at product@rossum.ai.
Please be aware that we may eventually deprecate and remove runtimes in the future.
The python3.8 runtime is being phased out by the serverless vendors, so it has been scheduled to be discontinued by Feb 28, 2025. From this date, creating hooks with the python3.8 runtime will return an error response. From March 31, it will no longer be possible to update such hooks without switching to a newer runtime. The recommended action is to upgrade to the up-to-date python3.12 runtime.
Python Runtime and Rossum Transaction Scripts
The python3.12 runtime includes a txscript module that provides convenience functionality for working with Rossum objects, in particular in the context of the annotation_content event.
Implementation
Example serverless function usable for the annotation_content event (Python implementation).
'''
This custom serverless function example demonstrates showing messages to the
user on the validation screen, updating values of specific fields, and
executing actions on the annotation.
See https://elis.rossum.ai/api/docs/#rossum-transaction-scripts for more examples.
'''
from txscript import TxScript, default_to, substitute
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    for row in t.field.line_items:
        if default_to(row.item_amount_base, 0) >= 1000000:
            t.show_warning('Value is too big', row.item_amount_base)
    # Remove dashes from document_id
    # Note: This type of operation is strongly discouraged in serverless
    # functions, since the modification is non-transparent to the user and
    # it is hard to trace down which hook modified the field.
    # Always prefer making a formula field!
    t.field.document_id = substitute(r'-', '', t.field.document_id)
    if default_to(t.field.amount_total, 0) > 1000000:
        print("postponing")
        t.annotation.action("postpone")
        return t.hook_response()
    return t.hook_response()
Example serverless function usable for the annotation_content event (JavaScript/NodeJS implementation).
// This serverless function example can be used for annotation_content events
// (e.g. the updated action). annotation_content events provide the annotation
// content tree as the input.
//
// The function below shows how to:
// 1. Display a warning message to the user if the "item_amount_base" field of
//    a line item exceeds a predefined threshold
// 2. Remove all dashes from the "invoice_id" field
//
// item_amount_base and invoice_id should be fields defined in a schema.
// --- ROSSUM HOOK REQUEST HANDLER ---
// The rossum_hook_request_handler is a mandatory main function that accepts
// input and produces output of the Rossum serverless function hook.
// @param {Object} payload - see https://example.rossum.app/api/docs/#annotation-content-event-data-format
// @returns {Object} - the messages and operations that update the annotation content
exports.rossum_hook_request_handler = async (payload) => {
  const content = payload.annotation.content;
  try {
    // Values over the threshold trigger a warning message
    const TOO_BIG_THRESHOLD = 1000000;
    // List of all datapoints of item_amount_base schema id
    const amountBaseColumnDatapoints = findBySchemaId(
      content,
      'item_amount_base',
    );
    const messages = [];
    for (let i = 0; i < amountBaseColumnDatapoints.length; i++) {
      // Use normalized_value for comparing values of Date and Number fields (https://example.rossum.app/api/docs/#content-object)
      if (amountBaseColumnDatapoints[i].content.normalized_value >= TOO_BIG_THRESHOLD) {
        messages.push(
          createMessage(
            'warning',
            'Value is too big',
            amountBaseColumnDatapoints[i].id,
          ),
        );
      }
    }
    // There should be only one datapoint of invoice_id schema id
    const [invoiceIdDatapoint] = findBySchemaId(content, 'invoice_id');
    // "Replace" operation is returned to update the invoice_id value
    const operations = [
      createReplaceOperation(
        invoiceIdDatapoint,
        invoiceIdDatapoint.content.value.replace(/-/g, ''),
      ),
    ];
    // Return messages and operations to be used to update current annotation data
    return {
      messages,
      operations,
    };
  } catch (e) {
    // In case of exception, create and return error message. This may be useful for debugging.
    const messages = [
      createMessage('error', 'Serverless Function: ' + e.message)
    ];
    return {
      messages,
    };
  }
};
// --- HELPER FUNCTIONS ---
// Return datapoints matching a schema id.
// @param {Object} content - the annotation content tree (see https://example.rossum.app/api/docs/#annotation-data)
// @param {string} schemaId - the field's ID as defined in the extraction schema (see https://example.rossum.app/api/docs/#document-schema)
// @returns {Array} - the list of datapoints matching the schema ID
const findBySchemaId = (content, schemaId) =>
  content.reduce(
    (results, dp) =>
      dp.schema_id === schemaId ? [...results, dp] :
      dp.children ? [...results, ...findBySchemaId(dp.children, schemaId)] :
      results,
    [],
  );
// Create a message which will be shown to the user
// @param {String} type - the type of the message, any of {info|warning|error}. Errors prevent confirmation in the UI.
// @param {String} content - the message shown to the user
// @param {number} datapointId - the id of the datapoint where the message will appear (null for "global" messages).
// @returns {Object} - the JSON message definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)
const createMessage = (type, content, datapointId = null) => ({
  content: content,
  type: type,
  id: datapointId,
});
// Replace the value of the datapoint with a new value.
// @param {Object} datapoint - the datapoint to update
// @param {string} newValue - the new value of the datapoint
// @returns {Object} - the JSON replace operation definition (see https://example.rossum.app/api/docs/#annotation-content-event-response-format)
const createReplaceOperation = (datapoint, newValue) => ({
  op: 'replace',
  id: datapoint.id,
  value: {
    content: {
      value: newValue,
    },
  },
});
To implement a serverless function, create a hook object of type function. Use the code attribute of the config object to specify the serialized source code. You can use the code editor built into the Rossum UI, which also allows you to test and debug the function before updating its code.
See the Python and NodeJS examples of a serverless function implementation in this section, or check out this article (and others in the relevant section).
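In practice, "serialized source code" means the function's source is stored as a plain string, as this sketch shows. Only the hook type function and the code config attribute come from the text above. The name, queues, events and config.runtime keys, the event name annotation_content.updated, and the queue URL are assumptions for illustration. Check the hook object reference before sending such a body to the API.

```python
import json

# The function's Python source, kept as a plain string for config.code.
FUNCTION_SOURCE = '''
from txscript import TxScript

def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    return t.hook_response()
'''


def build_function_hook(name: str, queue_urls: list, source: str) -> dict:
    """Return a hypothetical request body for creating a function hook."""
    return {
        "name": name,
        "type": "function",  # from the text: hooks of type "function"
        # Assumption: keys and event name below are illustrative only.
        "queues": queue_urls,
        "events": ["annotation_content.updated"],
        "config": {
            "runtime": "python3.12",  # assumption: runtime is set in config
            "code": source,  # from the text: serialized source code
        },
    }


body = json.dumps(build_function_hook("Demo", ["https://<example>.rossum.app/api/v1/queues/63"], FUNCTION_SOURCE))
```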
If there is an issue with the extension code itself, it is displayed as a CallFunctionException in the annotation view. This exception usually indicates issues such as:
- undefined variables being referenced in the code
- the code raising an exception instead of returning a proper response
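To avoid an uncaught exception surfacing as CallFunctionException, you can catch it and return a proper response that carries an error message. This mirrors the try/catch in the NodeJS example. The Python sketch below assumes the {"type", "content", "id"} message format used by the examples in this document. safe_handler and greet_logic are hypothetical names. Keep in mind that error messages prevent confirmation in the UI.

```python
from typing import Callable

Handler = Callable[[dict], dict]


def safe_handler(logic: Handler) -> Handler:
    """Wrap hook logic so that any exception becomes an error message in the response."""

    def handler(payload: dict) -> dict:
        try:
            return logic(payload)
        except Exception as exc:  # deliberately broad: report every failure to the user
            return {
                "messages": [
                    {"type": "error", "content": "Serverless Function: " + str(exc), "id": None}
                ]
            }

    return handler


def greet_logic(payload: dict) -> dict:
    # Hypothetical logic that raises KeyError when "greeting" is missing.
    return {"messages": [{"type": "info", "content": payload["greeting"], "id": None}]}


# The platform calls this name; with the wrapper, no exception escapes.
rossum_hook_request_handler = safe_handler(greet_logic)
```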
Testing
To write, test and debug a serverless function, you can refer to this guide.
Limitations
By default, no internet access is allowed from a serverless function, except to the Rossum API. If your functions require internet access to work properly, e.g. when exporting data over an API to an ERP system, please let us know at product@rossum.ai.
Access Rossum API
Access to the Rossum API is granted through a proxy server; use the HTTPS_PROXY environment variable to get its URL. The examples below show how to access the Rossum API from a serverless function. Python's urllib.request picks up the HTTPS proxy from the environment variable on its own. For Node.js, https.globalAgent is set to an https-proxy-agent instance if that package is present in the selected library pack.
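urllib.request already honours HTTPS_PROXY on its own, so the serverless Python snippet below needs no proxy code. If you prefer to make the routing explicit, for example to reuse one opener for several calls, this sketch builds an opener whose ProxyHandler points at the HTTPS_PROXY URL. build_rossum_opener is a hypothetical helper. The environment mapping is passed in as a parameter so the function is easy to test.

```python
import urllib.request
from collections.abc import Mapping


def build_rossum_opener(env: Mapping) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTPS requests through the HTTPS_PROXY URL."""
    proxy_url = env["HTTPS_PROXY"]
    # An explicit ProxyHandler replaces urllib's default, environment-based one.
    return urllib.request.build_opener(urllib.request.ProxyHandler({"https": proxy_url}))
```

In a function, you would call build_rossum_opener(os.environ) and then opener.open(request) with the same Request object as in the snippet below.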
Python code snippet that accesses the Rossum API to get a list of queue names
import json
import urllib.request
def rossum_hook_request_handler(payload):
    request = urllib.request.Request(
        "https://<example>.rossum.app/api/v1/queues",
        headers={"Authorization": "Bearer " + payload["rossum_authorization_token"]}
    )
    with urllib.request.urlopen(request) as response:
        queues = json.loads(response.read())
    queue_names = (q["name"] for q in queues["results"])
    return {"messages": [{"type": "info", "content": ", ".join(queue_names)}]}
NodeJS code snippet that accesses the Rossum API to get a list of queue names
exports.rossum_hook_request_handler = async (payload) => {
  const token = payload.rossum_authorization_token;
  const queues = JSON.parse(await getFromRossumApi("https://<example>.rossum.app/api/v1/queues", token));
  const queueNames = queues.results.map(q => q.name).join(", ");
  return { "messages": [{"type": "info", "content": queueNames}] };
};
// Send a GET request to the Rossum API through the proxy from HTTPS_PROXY
// and resolve with the response body.
const getFromRossumApi = async (url, token) => {
  const http = require('http');
  const proxy = new URL(process.env.HTTPS_PROXY);
  const options = {
    hostname: proxy.hostname,
    port: proxy.port,
    path: url, // the proxy receives the full target URL as the request path
    method: 'GET',
    headers: {
      'Authorization': 'token ' + token,
    },
  };
  const response = await new Promise((resolve, reject) => {
    let dataString = '';
    const req = http.request(options, (res) => {
      res.on('data', chunk => {
        dataString += chunk;
      });
      res.on('end', () => {
        // Report the real status code instead of assuming success
        resolve({
          statusCode: res.statusCode,
          body: dataString,
        });
      });
    });
    req.on('error', reject);
    req.end();
  });
  if (response.statusCode < 200 || response.statusCode >= 300) {
    throw new Error('Rossum API request failed with HTTP ' + response.statusCode);
  }
  return response.body;
};
Connector Extension
The connector component is aimed at two main use-cases: applying custom business rules during data validation, and direct integration of Rossum with downstream systems.
The connector component receives two types of callbacks: an on-the-fly validation callback on every update of captured data, and an on-export save callback when the document capture is finalized.
Custom business rules rely chiefly on the on-the-fly validation callback. The connector can auto-validate and transform both the initial AI-based extractions and each edit an operator makes in the validation screen; based on the input, it can push user-visible messages and value updates back to Rossum. This allows for both simple tweaks (like verifying that two amounts sum together or transforming decimal points to thousand separators) and complex functionality like intelligent PO match.
Integration with downstream systems, on the other hand, relies mainly on the save callback: at the same moment a document is exported from Rossum, it can be imported into a downstream system. Since downstream systems typically impose constraints on the captured data, these constraints can be enforced already within the validation callback.
Implement a connector
Connectors are designed to be implemented using a push model over the common HTTPS protocol. When annotation data is changed, or when data export is triggered, a specific connector endpoint is called with the annotation data as the request payload. The connector must be deployed with a public IP address so that the Rossum platform can call its endpoints; for testing, a middleware like ngrok or serveo may come in handy.
Example of a valid no-op (empty) validate response
{"messages": [], "updated_datapoints": []}
Example of a valid no-op (empty) save response
{}
The connector API consists of two endpoints, validate and save, described below. A connector must always implement both endpoints, though they need not perform any function in a particular connector (see the empty reply examples above); the platform raises an error if it is not able to call an endpoint.
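Here is a minimal sketch of a connector server built on Python's standard library. It answers validate and save with the empty replies shown above and returns 404 for anything else. It assumes the endpoint can be recognised from the last path segment, so a path prefix from service_url is tolerated. It deliberately skips authorization and TLS. A real deployment should check the Authorization header and serve the endpoints over HTTPS. dispatch, ConnectorHandler and serve are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def dispatch(path: str, payload: dict) -> tuple:
    """Route a connector request; returns (status_code, response_body)."""
    # Assumption: the endpoint name is the last path segment (query string ignored).
    endpoint = urlsplit(path).path.rstrip("/").rsplit("/", 1)[-1]
    # payload is unused by these no-op endpoints.
    if endpoint == "validate":
        return 200, {"messages": [], "updated_datapoints": []}
    if endpoint == "save":
        return 200, {}
    return 404, {"error": "Unknown endpoint: " + endpoint}


class ConnectorHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        status, body = dispatch(self.path, payload)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    # Blocks forever; put TLS termination (HTTPS) in front of it in production.
    HTTPServer(("", port), ConnectorHandler).serve_forever()
```

Calling serve(8000) starts a blocking server. For local testing, you could expose it through a tunnel such as ngrok, as suggested above.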
Set up a connector
The next step after implementing the first version of a connector is configuring it in the Rossum platform.
In Rossum, a connector object defines service_url and params, used to construct HTTPS requests, and authorization_token, which is passed in every request to authenticate the caller as the actual Rossum server. The token may also uniquely identify the organization when multiple Rossum organizations share the same connector server.
To set up a connector for a queue, create a connector object using either our API or the rossum tool (follow these instructions). A connector object may be associated with one or more queues. One queue can only have one connector object associated with it.
Connector API
Example data sent to the connector (validate, save)
{
  "meta": {
    "document_url": "https://<example>.rossum.app/api/v1/documents/6780",
    "arrived_at": "2019-01-30T07:55:13.208304Z",
    "original_file": "https://<example>.rossum.app/api/v1/original/bf0db41937df8525aa7f3f9b18a562f3",
    "original_filename": "Invoice.pdf",
    "queue_name": "Invoices",
    "workspace_name": "EU",
    "organization_name": "East West Trading Co",
    "annotation": "https://<example>.rossum.app/api/v1/annotations/4710",
    "queue": "https://<example>.rossum.app/api/v1/queues/63",
    "workspace": "https://<example>.rossum.app/api/v1/workspaces/62",
    "organization": "https://<example>.rossum.app/api/v1/organizations/1",
    "modifier": "https://<example>.rossum.app/api/v1/users/27",
    "updated_datapoint_ids": ["197468"],
    "modifier_metadata": {},
    "queue_metadata": {},
    "annotation_metadata": {},
    "rir_poll_id": "54f6b9ecfa751789f71ddf12",
    "automated": false
  },
  "content": [
    {
      "id": "197466",
      "category": "section",
      "schema_id": "invoice_info_section",
      "children": [
        {
          "id": "197467",
          "category": "datapoint",
          "schema_id": "invoice_number",
          "page": 1,
          "position": [916, 168, 1190, 222],
          "rir_position": [916, 168, 1190, 222],
          "rir_confidence": 0.97657,
          "value": "FV103828806S",
          "validation_sources": ["score"],
          "type": "string"
        },
        {
          "id": "197468",
          "category": "datapoint",
          "schema_id": "date_due",
          "page": 1,
          "position": [938, 618, 1000, 654],
          "rir_position": [940, 618, 1020, 655],
          "rir_confidence": 0.98279,
          "value": "12/22/2018",
          "validation_sources": ["score"],
          "type": "date"
        },
        {
          "id": "197469",
          "category": "datapoint",
          "schema_id": "amount_due",
          "page": 1,
          "position": [1134, 1050, 1190, 1080],
          "rir_position": [1134, 1050, 1190, 1080],
          "rir_confidence": 0.74237,
          "value": "55.20",
          "validation_sources": ["human"],
          "type": "number"
        }
      ]
    },
    {
      "id": "197500",
      "category": "section",
      "schema_id": "line_items_section",
      "children": [
        {
          "id": "197501",
          "category": "multivalue",
          "schema_id": "line_items",
          "children": [
            {
              "id": "198139",
              "category": "tuple",
              "schema_id": "line_item",
              "children": [
                {
                  "id": "198140",
                  "category": "datapoint",
                  "schema_id": "item_desc",
                  "page": 1,
                  "position": [173, 883, 395, 904],
                  "rir_position": null,
                  "rir_confidence": null,
                  "value": "Red Rose",
                  "validation_sources": [],
                  "type": "string"
                },
                {
                  "id": "198142",
                  "category": "datapoint",
                  "schema_id": "item_net_unit_price",
                  "page": 1,
                  "position": [714, 846, 768, 870],
                  "rir_position": null,
                  "rir_confidence": null,
                  "value": "1532.02",
                  "validation_sources": ["human"],
                  "type": "number"
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
All connector endpoints, representing particular points in the document lifetime, are simple verbs that receive a JSON payload via POST and may expect JSON to be returned in turn.
The authorization type and authorization token are passed in the Authorization HTTP header. The authorization type may be secret_key (shared secret) or Basic for HTTP basic authentication.
Please note that for Basic authentication, authorization_token is passed as-is, so it must be set to a correct base64-encoded value. For example, username connector and password secure123 are encoded as the authorization token Y29ubmVjdG9yOnNlY3VyZTEyMw== (the base64 encoding of connector:secure123).
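For Basic authentication, the stored token is simply the base64 encoding of username:password. This sketch computes that token and also shows a connector-side check of the incoming header. It assumes the header value has the form "<type> <token>", for example Basic or secret_key followed by a space and the token. The helper names are hypothetical.

```python
import base64
import hmac


def basic_authorization_token(username: str, password: str) -> str:
    """Return the base64 value to store as the connector's authorization_token."""
    raw = f"{username}:{password}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def is_authorized(header_value: str, expected_type: str, expected_token: str) -> bool:
    """Check an incoming Authorization header on the connector side."""
    # Assumption: the header has the form "<type> <token>".
    auth_type, _, token = header_value.partition(" ")
    # compare_digest avoids leaking how many characters matched via timing.
    return auth_type == expected_type and hmac.compare_digest(
        token.encode("utf-8"), expected_token.encode("utf-8")
    )
```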
Connector requests may be authenticated using a client SSL certificate; see Connector API for reference.
Errors
If a connector does not implement an endpoint, it may return HTTP status 404. An endpoint may fail, returning either HTTP 4xx or HTTP 5xx; for some endpoints (like validate and save), this may trigger a message in the user interface, using either the error key of a JSON response or, if the response is not JSON, the response body itself. The connector endpoint save can be called in asynchronous (default) as well as synchronous mode (useful for embedded mode).
Data format
The received JSON object contains two keys: meta, carrying the metadata, and content, carrying endpoint-specific content.
The metadata identify the document concerned and contain the following attributes:
Key | Type | Description |
---|---|---|
document_url | URL | document URL |
arrived_at | timestamp | A time of document arrival in Rossum (ISO 8601) |
original_file | URL | Permanent URL for the document original file |
original_filename | string | Filename of the document on arrival in Rossum |
queue_name | string | Name of the document's queue |
workspace_name | string | Name of the document's workspace |
organization_name | string | Name of the document's organization |
annotation | URL | Annotation URL |
queue | URL | Document's queue URL |
workspace | URL | Document's workspace URL |
organization | URL | Document's organization URL |
modifier | URL | Modifier URL |
modifier_metadata | object | Metadata attribute of the modifier, see metadata |
queue_metadata | object | Metadata attribute of the queue, see metadata |
annotation_metadata | object | Metadata attribute of the annotation, see metadata |
rir_poll_id | string | Internal extractor processing id |
updated_datapoint_ids | list[string] | Ids of objects that were recently modified by user |
automated | bool | Flag whether annotation was automated |
A common class of content is the annotation tree, which is a JSON object that can contain nested datapoint objects, and matches the schema datapoint tree.
Intermediate nodes have the following structure:
Key | Type | Description |
---|---|---|
id | integer | A unique id of the given node |
schema_id | string | Reference mapping the node to the schema tree |
category | string | One of section, multivalue, tuple |
children | list | A list of other nodes |
Datapoint (leaf) nodes contain the actual data:
Key | Type | Description |
---|---|---|
id | integer | A unique id of the given node |
schema_id | string | Reference mapping the node to the schema tree |
category | string | datapoint |
type | string | One of string, date or number, as specified in the schema |
value | string | The datapoint value, represented as a string but normalized so that it is machine readable: ISO format for dates, a decimal for numbers |
page | integer | A 1-based integer index of the page, optional |
position | list[float] | List of four floats describing the x1, y1, x2, y2 bounding box coordinates |
rir_position | list[float] | Bounding box of the value as detected by the data extractor. Format is the same as for position. |
rir_confidence | float | Confidence (estimated probability) that this field was extracted correctly. |
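The annotation tree nests datapoints inside sections, multivalues and tuples, so a connector usually needs a recursive walk to reach the leaves. The sketch below is a Python counterpart of the NodeJS findBySchemaId helper. Unlike that helper, it only matches leaf nodes whose category is datapoint. The function names are hypothetical.

```python
from typing import Iterator


def iter_datapoints(nodes: list) -> Iterator[dict]:
    """Yield every leaf datapoint in an annotation tree, depth-first."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        # Sections, multivalues and tuples carry their nodes in "children".
        yield from iter_datapoints(node.get("children", []))


def find_by_schema_id(content: list, schema_id: str) -> list:
    """Return all leaf datapoints whose schema_id matches."""
    return [dp for dp in iter_datapoints(content) if dp["schema_id"] == schema_id]
```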
Annotation lifecycle with a connector
If an asynchronous connector is deployed to a queue, the annotation status changes from reviewing to exporting and subsequently to exported or failed_export. If no connector extension is deployed to a queue, or if the attribute asynchronous is set to false, the annotation status changes from reviewing to exported (or failed_export) directly.
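The status transitions described above can be summarized in a small helper. The function and parameter names are hypothetical.

```python
def export_status_path(connector_deployed: bool, asynchronous: bool, export_ok: bool) -> list:
    """Return the annotation statuses passed through on export."""
    final = "exported" if export_ok else "failed_export"
    if connector_deployed and asynchronous:
        # Asynchronous connector: an intermediate "exporting" state.
        return ["reviewing", "exporting", final]
    # No connector, or a connector with asynchronous set to false.
    return ["reviewing", final]
```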
Endpoint: validate
This endpoint is called after document processing has finished, when an operator opens a document in the Rossum verification interface, and then every time the operator updates a field. The initial validate call after processing has finished is marked with the initial=true URL parameter; the other calls go to /validate without the parameter.
The request path is fixed to /validate and cannot be changed.
It may:
- validate the given annotation tree and return a list of messages commenting on it (e.g. pointing out errors or showing matched suppliers).
- update the annotation tree by returning a list of replace, add and remove operations
Both the messages and the updated data are shown in the verification interface. Moreover, the messages may block confirmation in the case of errors.
This endpoint should be fast as it is part of an interactive workflow.
It receives an annotation tree as content.
It returns a JSON object with the lists messages, operations and updated_datapoints.
Keys messages, operations (optional)
The description of these keys has moved to the Hook Extension section. See the description there.
Key updated_datapoints (optional, deprecated)
We also support a simplified version of updates using the updated_datapoints response key. It only supports updates (no add or remove operations) and is now deprecated. The updated datapoint object contains the following attributes:
Key | Type | Description |
---|---|---|
id | string | A unique id of the relevant datapoint, currently only datapoints of category datapoint can be updated |
value | string | New value of the datapoint. Value is formatted according to the datapoint type (e.g. date is string representation of ISO 8601 format). |
hidden | boolean | Toggle for hiding/showing of the datapoint, see datapoint |
options | list[object] | Options of the datapoint, valid only for type=enum; see enum options |
position | list[float] | New position of the datapoint, list of four numbers. |
The validate endpoint should always return the 200 OK status.
An error message returned from the connector prevents user from confirming the document.
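Here is a minimal sketch of validate logic in Python. It walks the annotation tree and returns an error message for an empty invoice_number datapoint, using the schema id from the example payload. That error blocks confirmation. The {"type", "content", "id"} message shape follows the save examples in this document. Treating an empty or missing value as an error is an assumption.

```python
def validate(payload: dict) -> dict:
    """Return a validate response that flags an empty invoice_number."""
    messages = []

    def walk(nodes: list) -> None:
        for node in nodes:
            # Assumption: an empty string or missing value counts as "not filled in".
            if node.get("schema_id") == "invoice_number" and not node.get("value"):
                messages.append(
                    {"type": "error", "content": "Invoice number is required.", "id": node["id"]}
                )
            walk(node.get("children", []))

    walk(payload.get("content", []))
    # The HTTP status stays 200 OK; problems are reported through messages.
    return {"messages": messages, "operations": []}
```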
Endpoint: save
This endpoint is called when the document transitions to the exported state.
The connector may process the final document annotation and save it to the target system. It receives an annotation tree as content. The request path is fixed to /save and cannot be changed.
The save endpoint is called asynchronously (unless synchronous mode is set in the related connector object). The timeout of the save endpoint is 60 seconds.
For a successful export, the response should have a 2xx status.
Example of successful save response without messages in the UI (any of the following)
HTTP/1.1 204 No Content

HTTP/1.1 200 OK
Content-Type: text/plain
this response body is ignored

HTTP/1.1 200 OK
Content-Type: application/json
{
"messages": []
}
When messages are expected to be displayed in the UI, they should be sent in the same format as in the validate endpoint.
Example of successful save response with messages in the UI
HTTP/1.1 200 OK
Content-Type: application/json
{
"messages": [
{
"content": "Everything is OK.",
"id": null,
"type": "info"
}
]
}
If the endpoint fails with an HTTP error and/or a message of type error is received, the document transitions to the failed_export state. It is then available to the operators for manual review and re-queuing to the to_review state in the user interface. Re-queuing may also be done programmatically via the API, using a PATCH call to set the to_review annotation status. Patching the annotation status to exporting triggers an export retry.
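Re-queuing or retrying an export through the API could look like this sketch. It builds, but does not send, a PATCH request for an annotation URL. The {"status": ...} request body and the Bearer authorization header are assumptions based on the description above and the API snippets in this document. build_status_patch is a hypothetical helper.

```python
import json
import urllib.request


def build_status_patch(annotation_url: str, token: str, status: str) -> urllib.request.Request:
    """Build a PATCH request setting an annotation's status.

    Use status="to_review" to re-queue, or "exporting" to retry the export.
    """
    # Assumption: the status is sent as {"status": ...} with a Bearer token.
    body = json.dumps({"status": status}).encode("utf-8")
    return urllib.request.Request(
        annotation_url,
        data=body,
        method="PATCH",
        headers={"Authorization": "Bearer " + token, "Content-Type": "application/json"},
    )


request = build_status_patch("https://<example>.rossum.app/api/v1/annotations/4710", "<token>", "to_review")
```

Send it with urllib.request.urlopen(request) using a valid API token.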
Example of unsuccessful save response with messages in the UI
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/json
{
"messages": [
{
"content": "Even though this message is info, the export will fail due to the status code.",
"id": null,
"type": "info"
}
]
}
HTTP/1.1 500 Internal Server Error
Content-Type: text/plain
An error message "Export failed." will show up in the UI
HTTP/1.1 200 OK
Content-Type: application/json
{
"messages": [
{
"content": "Proper status code could not be set.",
"id": null,
"type": "error"
}
]
}
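The rules that decide whether a save response counts as a successful export can be written as a small predicate, which is handy for unit-testing a connector against the example responses above. This is a sketch of the documented rules, not Rossum's own implementation. export_succeeded is a hypothetical name, and the function assumes a JSON body is recognised by its application/json content type.

```python
import json


def export_succeeded(status_code: int, body: str, content_type: str) -> bool:
    """Decide whether a save response counts as a successful export."""
    # Any non-2xx status fails the export, whatever the messages say.
    if not 200 <= status_code < 300:
        return False
    # Assumption: non-JSON bodies of 2xx responses are ignored.
    if not content_type.startswith("application/json") or not body:
        return True
    messages = json.loads(body).get("messages", [])
    # A message of type "error" also fails the export.
    return all(message.get("type") != "error" for message in messages)
```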
Custom UI Extension
Sometimes users may want to extend the behavior of the validation view in the UI with custom functionality; this is what custom UI extensions are for.
Buttons
Currently, there are two different ways of using a custom button:
- Popup Button - opens a specific URL in the web browser
- Validate Button - triggers a standard validate call to connector
If you would like to read more about how to create a button, see the Button schema.
Popup Button
Popup Button opens a website completely managed by the user in a separate tab. It runs in parallel to the validation interface session in the app. Such a website can be used for any interface that assists operators in the reviewing process.
Example Use Cases of Popup Button:
- opening an email linked to the annotated document
- creating a new item in external database according to extracted data
Communication with the Validation Interface
You can communicate with the validation interface directly using the standard browser API window.postMessage. You will need to use window.addEventListener to receive messages from the validation interface:
window.addEventListener('message', ({ data: { type, result } }) => {
  // logic
});
The shape of the result key is the same as the top-level content attribute of the annotation data response.
Once the listener is in place, you can post one of the supported message types:
GET_DATAPOINTS - returns the same tree structure you’d get by requesting annotation data
window.opener.postMessage(
  { type: 'GET_DATAPOINTS' },
  'https://<example>.rossum.app'
);
UPDATE_DATAPOINT - sends an updated value to a Rossum datapoint. Only one datapoint value can be updated at a time.
window.opener.postMessage(
  {
    type: 'UPDATE_DATAPOINT',
    data: {id: DATAPOINT_ID, value: "Updated value"}
  },
  'https://<example>.rossum.app'
);
FINISH - informs the Rossum app that the popup process is ready to be closed. After this message is posted, the popup is closed and the Rossum app triggers a validate call.
window.opener.postMessage(
  { type: 'FINISH' },
  'https://<example>.rossum.app'
);
Providing the message type to postMessage lets the Rossum interface know which operation the user requests; it also determines the type of the answer, which can be used to match the appropriate response.
Validate button
If the popup_url key is missing in the button’s schema, clicking the button triggers a standard validate call to the connector. In such a call, updated_datapoint_ids contains the ID of the pressed button.
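A validate button only triggers an ordinary validate call, so the connector has to recognise the button itself by looking at updated_datapoint_ids. Here is a minimal sketch that assumes you know in advance which IDs belong to your buttons. button_ids and the example ID btn_lookup are hypothetical.

```python
from typing import Iterable, Optional


def pressed_button(payload: dict, button_ids: Iterable[str]) -> Optional[str]:
    """Return the ID of the pressed button, or None for an ordinary validate call."""
    # Assumption: button_ids lists the IDs your schema's buttons appear under.
    known = set(button_ids)
    for datapoint_id in payload.get("meta", {}).get("updated_datapoint_ids", []):
        if datapoint_id in known:
            return datapoint_id
    return None
```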
Note: if you’re missing some annotation data that you’d like to receive in a similar way, do contact our support team. We’re collecting feedback to further expand this list.
Extension Logs
To make extension development easier and more efficient, our backend logs requests, responses (if enabled) and additional information whenever a hook is called.
Hook Log
Hook log objects consist of the following attributes, which also differ depending on the hook event:
Base Hook Log object
These attributes are included in all the logs independent of the hook event
Key | Type | Description | Optional |
---|---|---|---|
timestamp* | str | Timestamp of the log-record | |
request_id | UUID | Hook call request ID | |
event | string | Hook's event | |
action | string | Hook's action | |
organization_id | int | ID of the associated Organization. | |
queue_id | int | ID of the associated Queue. | true |
hook_id | int | ID of the associated Hook. | |
hook_type | str | Hook type. Possible values: webhook, function | |
log_level | str | Log level. Possible values: INFO, ERROR, WARNING | |
message | str | A log-message | |
request | str | Raw request sent to the Hook | true |
response | str | Raw response received from the Hook | true |
*The timestamp is in ISO 8601 format in the UTC timezone, e.g. 2023-04-21T07:58:49.312655
Annotation Content or Annotation Status Hook Events
In addition to the Base Hook Log object, the annotation content and annotation status event hook logs contain the following attributes:
Key | Type | Description | Optional |
---|---|---|---|
annotation_id | int | ID of the associated Annotation. | true |
Email Hook Events
In addition to the Base Hook Log object, the email event hook logs contain the following attributes:
Key | Type | Description | Optional |
---|---|---|---|
email_id | int | ID of the associated Email. | true |
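Log timestamps are ISO 8601 values in UTC without an offset, so they parse directly with datetime.fromisoformat and can then be marked as UTC. The sketch below does that and selects the ERROR records of one hook. It assumes you already hold the log records as dicts with the attributes from the tables above; how the records are fetched is not covered here. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def parse_log_timestamp(value: str) -> datetime:
    """Parse a hook log timestamp (ISO 8601, UTC, no offset) into an aware datetime."""
    return datetime.fromisoformat(value).replace(tzinfo=timezone.utc)


def error_logs(records: list, hook_id: int) -> list:
    """Return the ERROR records of one hook, oldest first."""
    # Assumption: records are dicts using the attribute names from the tables above.
    selected = [r for r in records if r["hook_id"] == hook_id and r["log_level"] == "ERROR"]
    return sorted(selected, key=lambda r: parse_log_timestamp(r["timestamp"]))
```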
Source IP Address ranges
Rossum will use these source IP addresses for outgoing connections to your services (e.g. when sending requests to a webhook URL):
Europe (Ireland):
- 34.254.110.123
- 52.209.175.153
- 54.217.193.239
- 54.246.127.143
Europe 2 (Frankfurt):
- 3.75.26.254
- 3.126.211.68
- 3.126.98.96
- 3.76.159.143
US (N. Virginia):
- 3.222.161.192
- 50.19.104.88
- 52.2.120.212
- 18.213.174.191
JP (Tokyo):
- 3.115.38.171
- 35.74.141.62
- 35.75.49.12
- 52.194.128.167
You can use the list to limit incoming connections on a firewall. The list may change over time, so please review your configuration at least once every three months.
If you have a customer-specific deployment, contact Rossum support for a specific IP list.
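If you check the source address in application code rather than on a firewall, a sketch like this one compares the caller's address against the list above using Python's ipaddress module. The region keys are hypothetical labels. Behind a reverse proxy or load balancer, the remote address you see may belong to the proxy rather than to Rossum. Run the check wherever the original client address is known.

```python
import ipaddress

# Source addresses listed above; the region keys are hypothetical labels.
ROSSUM_SOURCE_IPS = {
    "eu": ["34.254.110.123", "52.209.175.153", "54.217.193.239", "54.246.127.143"],
    "eu2": ["3.75.26.254", "3.126.211.68", "3.126.98.96", "3.76.159.143"],
    "us": ["3.222.161.192", "50.19.104.88", "52.2.120.212", "18.213.174.191"],
    "jp": ["3.115.38.171", "35.74.141.62", "35.75.49.12", "52.194.128.167"],
}

ALLOWED = {ipaddress.ip_address(ip) for ips in ROSSUM_SOURCE_IPS.values() for ip in ips}


def is_rossum_source(remote_addr: str) -> bool:
    """Return True if remote_addr is one of the listed Rossum source addresses."""
    try:
        return ipaddress.ip_address(remote_addr) in ALLOWED
    except ValueError:  # not a valid IP address at all
        return False
```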
Rossum Transaction Scripts
The Rossum platform can evaluate snippets of Python code that manipulate business transactions processed by Rossum: Transaction Scripts (or TxScripts). The principal use of these TxScript snippets is to automatically fill in computed values of formula type fields. The code can also be evaluated as a serverless function based extension that is hooked to the annotation_content event.
The TxScript Python environment is based on Python 3.12 or newer and additionally includes a variety of predefined functions and variables. The environment has been designed so that code operating on Rossum objects is very short and easy to read and write for both humans and LLMs, and so that many simple tasks can be done even by non-programmers (for example, people who could build an Excel spreadsheet).
The environment is special in the following ways:
- Predefined variables allowing easy access to Rossum objects.
- Some environment-specific helper functions and aliases.
- The specific way code is evaluated in the formula field context to yield a computed value.
The TxScript environment provides accessors to Rossum objects associated with the event that triggered the code evaluation. The event context is generally available through a txscript.TxScript object; calling the object's methods and modifying its attributes (such as raising messages or modifying field values) controls the event hook response.
Example of a no-op serverless function instantiating the TxScript object
from txscript import TxScript
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    print(t)
    return t.hook_response()
In serverless functions, this object must be explicitly imported and instantiated using TxScript.from_payload(). The .hook_response() method yields a dict representing the prescribed event hook response (with keys such as "messages", "operations" etc.) that can be directly returned from the handler.
Meanwhile, in formula fields it is instantiated automatically and its existence is entirely transparent to the developer as the object's attributes and methods are directly available as globals of the formula fields code.
Pythonized Rossum objects
The TxScript environment provides instances of several pertinent Rossum objects. These instances are directly available in the global namespace in formula fields, and as attributes of the TxScript instance within serverless functions.
Fields Object
A field object is provided that allows access to the fields of the annotation content.
Attributes
Object attributes correspond to annotation fields, e.g. field.amount_total evaluates to the value of the amount_total field. The attributes behave specially:
- The field value types are pythonized. String fields are of type str, number fields are of type float, and date fields are datetime.date instances. Since number fields are floats, they should always be rounded when tested for equality (because e.g. 0.1 + 0.2 isn't exactly 0.3 in floating-point arithmetic):
round(field.amount_total, 2) == round(field.amount_total_base, 2)
- You can access all in-multivalue field ids (table columns or simple multivalues) via the .all_values property (e.g. field.item_amount.all_values). Its value is a special sequence object, TableColumn, that behaves similarly to a list, but with operators applying elementwise, or distributively to scalars (NumPy-like). Outside a single-row context, the .all_values property is the only legal way to work with these field ids.
In other words, this expression referencing table columns will behave intuitively:
if all(not is_empty(field.item_amount_base.all_values)):
    sum(default_to(field.item_amount_tax.all_values, 0) * 0.9 + field.item_amount_base.all_values)
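To picture what "operators applying elementwise or distributively to scalars" means, the toy class below mimics that behaviour for + and * only. ToyColumn is a hypothetical illustration, not TxScript's TableColumn, which offers more operations and handles empty values. The usage also shows why floats should be rounded before they are compared.

```python
class ToyColumn(list):
    """Toy stand-in for a table column: + and * apply elementwise (NOT txscript's TableColumn)."""

    def _apply(self, other, op):
        if isinstance(other, (list, tuple)):
            # Column with column: combine element by element.
            if len(other) != len(self):
                raise ValueError("columns differ in length")
            return ToyColumn(op(a, b) for a, b in zip(self, other))
        # Column with scalar: distribute the scalar over every element.
        return ToyColumn(op(a, other) for a in self)

    def __add__(self, other):
        return self._apply(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._apply(other, lambda a, b: a * b)

    # Numeric + and * commute, so the reflected and in-place forms reuse them
    # (this also stops list's concatenation/repetition from kicking in).
    __radd__ = __iadd__ = __add__
    __rmul__ = __imul__ = __mul__


tax = ToyColumn([10.0, 20.0])
base = ToyColumn([100.0, 200.0])
total = tax * 0.9 + base  # elementwise, like the .all_values expression above
assert 0.1 + 0.2 != 0.3  # why equality checks on floats should round first
assert round(0.1 + 0.2, 2) == round(0.3, 2)
```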
Example of iterating over rows in a formula field
for row in field.line_items:
    if not is_empty(row.item_amount) and row.item_amount < 0:
        show_warning("Negative amount", row.item_amount)
Example of iterating over rows in serverless function hook
from txscript import TxScript, is_empty
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    for row in t.field.line_items:
        if not is_empty(row.item_amount) and row.item_amount < 0:
            t.show_warning("Negative amount", row.item_amount)
    return t.hook_response()
- You can access individual multivalue tuple rows by accessing the multivalue or tuple field id. This provides a list of field-like objects that expose the in-row tuple field members as attributes named by their field id.
- While field.amount_total evaluates to a float-like value (or other types), the value also provides an attr attribute that gives access to all field schema, field object value and field object value content API object attributes (i.e. one can write field.amount_total.attr.rir_confidence). The attributes position, page, validation_sources, hidden and options are read-write.
- Fields that are not set (or are in an error state due to an invalid value) evaluate to a None-like value (except strings, which evaluate to ""). Because of the above, however, they are not pure Python Nones. Therefore, they must not be tested for using is None. Instead, use the convenience helpers is_empty(field.amount_total) and default_to(field.amount_total, 0). These helpers also behave correctly on string fields.
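An identity test recognizes only the single None object. The sketch below models an unset field as a hypothetical EmptyValue wrapper (falsy and carrying an attr dict) and pairs it with is_empty_sketch() and default_to_sketch(). These are simplified stand-ins written under that assumption, not the real TxScript helpers.

```python
class EmptyValue:
    """Hypothetical None-like field value: falsy and carrying .attr, yet not None itself."""

    def __init__(self, attr=None):
        self.attr = attr if attr is not None else {}

    def __bool__(self):
        return False

    def __repr__(self):
        return "EmptyValue()"


def is_empty_sketch(value) -> bool:
    # Unset means: real None, the None-like wrapper, or an empty string.
    return value is None or isinstance(value, EmptyValue) or value == ""


def default_to_sketch(value, fallback):
    return fallback if is_empty_sketch(value) else value


amount_total = EmptyValue(attr={"rir_confidence": None})
print(amount_total is None)                # False: the identity test misses the unset value
print(is_empty_sketch(amount_total))       # True
print(default_to_sketch(amount_total, 0))  # 0
print(default_to_sketch("", "INVALID"))    # INVALID
```

The sketch deliberately avoids plain truthiness. A legitimate amount of 0.0 is falsy, but it is still a set value.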
Example of updating field values in a serverless function hook
from txscript import TxScript, is_empty, default_to
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    if not is_empty(t.field.amount_tax_base):
        # Note: This type of operation is strongly discouraged in serverless
        # functions, since the modification is non-transparent to the user and
        # it is hard to trace down which hook modified the field.
        # Always prefer making amount_total a formula field!
        t.field.amount_total = t.field.amount_tax_base + default_to(t.field.amount_tax, 0)
    # Merge po_number_external into the po_numbers multivalue
    if not is_empty(t.field.po_number_external):
        t.field.po_numbers.all_values.append(t.field.po_number_external)
        t.automation_blocker("External PO", t.field.po_numbers)
    else:
        t.field.po_number_external.attr.hidden = True
    # Filter out empty line items and add a discount line item
    t.field.line_items = [row for row in t.field.line_items if not is_empty(row.item_amount)]
    if "10% discount" in t.field.terms and not is_empty(t.field.amount_total):
        t.field.line_items.append({"item_amount": -t.field.amount_total * 0.1, "item_description": "10% discount"})
        t.field.line_items[-1].item_amount.attr.validation_sources.append("connector")
        t.field.line_items[-1].item_description.attr.validation_sources.append("connector")
    # Offer the PO numbers, followed by the default options, as enum options
    t.field.po_match.attr.options = [{"label": f"PO: {po}", "value": po} for po in t.field.po_numbers.all_values]
    t.field.po_match.attr.options += t.field.default_po_enum.attr.options
    # Update the currently selected enum option if the value fell out of the list
    if (
        len(t.field.po_match.attr.options) > 0
        and t.field.po_match not in [po.value for po in t.field.po_match.attr.options]
    ):
        t.field.po_match = t.field.po_match.attr.options[0].value
    return t.hook_response()
You can assign values to the field attributes and modify the multivalue lists, which will be reflected back in the app once your hook finishes. (This is not permitted in the read-only context of formula fields.) You may construct values of tuple rows as dicts indexed by column schema ids.
You can modify the field.*.attr.validation_sources list, and the change will be reflected back in the app once your hook finishes. It is not recommended to perform any operation except .append("connector") (which automates the field).
For enum type fields, you can modify the field.*.attr.options list, and the change will be reflected back in the app once your hook finishes. Elements of the list are objects with the label and value attributes. You may construct new elements as dicts with the label and value keys.
Annotation Object
An annotation object is provided, representing the pertinent annotation. (Note that this object is not available in Formula Fields.)
Attributes
The status and previous_status attributes contain the annotation status string.
The url attribute contains the API URL of the annotation object.
The raw_data attribute is a dict containing all attributes of the annotation API object.
Methods
Example of rejecting an annotation
from txscript import TxScript
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    if round(t.field.amount_total, 2) != round(t.field.amount_total_base + t.field.amount_tax, 2):
        t.annotation.action("reject", note_content="Amounts do not match")
    elif t.field.amount_total > 100000:
        t.annotation.action("postpone")
    return t.hook_response()
The action(verb: str, **args) method issues a POST on the annotation API object for a given verb, in the form POST /v1/annotations/{id}/{verb}, passing additional arguments as specified. (Notable verbs are reject, postpone and delete.)
Note that Rossum authorization token passing must be enabled on the hook.
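For reference, the HTTP call behind action() can be sketched with the Python standard library. The helper build_annotation_action() is hypothetical, and example.rossum.app and TOKEN are placeholders. Sending the additional arguments as a JSON body is an assumption. The function only builds the request and does not send it.

```python
import json
import urllib.request


def build_annotation_action(base_url: str, annotation_id: int, verb: str, token: str, **args) -> urllib.request.Request:
    """Build (but do not send) POST {base_url}/v1/annotations/{id}/{verb}."""
    url = f"{base_url.rstrip('/')}/v1/annotations/{annotation_id}/{verb}"
    # Assumption: additional arguments travel as a JSON object in the request body.
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


req = build_annotation_action(
    "https://example.rossum.app/api", 319668, "reject", "TOKEN", note_content="Amounts do not match"
)
print(req.get_method(), req.full_url)
```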
TxScript Functions
Several functions are provided that map 1:1 to common extension hook return values. These functions are directly available in the globals namespace in formula fields, and as methods of the TxScript instance within serverless functions.
Example of raising a message in a formula field
if field.date_issue < date(2024, 1, 1):
    show_warning("Issue date long in the past", field.date_issue)
Example of raising a message in serverless function hook
from datetime import date
from txscript import TxScript
def rossum_hook_request_handler(payload: dict) -> dict:
    t = TxScript.from_payload(payload)
    if t.field.date_issue < date(2024, 1, 1):
        t.show_warning("Issue date long in the past", t.field.date_issue)
    return t.hook_response()
The show_error(), show_warning() and show_info() functions raise a message, either document-wide or attached to a particular field. As arguments, they take the message text (content key) and, optionally, the field to attach the message to (converted to the id key). If no field is passed, a document-level message is created.
For example, you may use show_error() for fatal issues like a missing required field, whereas show_info() is suitable for decorating a supplier company id with its name as looked up in the suppliers database.
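The show_*() functions correspond to entries in the messages list of a hook response. The sketch below assumes each message has a type key (error, warning or info) alongside the content key and the optional id key described above. The helper make_message() and the datapoint id 1234 are hypothetical.

```python
def make_message(kind: str, content: str, datapoint_id=None) -> dict:
    """Sketch of one hook-response message; kind is "error", "warning" or "info"."""
    if kind not in ("error", "warning", "info"):
        raise ValueError(f"unsupported message type: {kind}")
    message = {"type": kind, "content": content}
    if datapoint_id is not None:
        # Field-level message: attach it to the given datapoint id.
        message["id"] = datapoint_id
    return message


response = {
    "messages": [
        make_message("error", "Missing required field", datapoint_id=1234),
        make_message("info", "Document received"),  # document-level: no id key
    ]
}
print(response)
```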
Example of a formula raising an automation blocker
if not is_empty(field.amount_total) and field.amount_total < 0:
    automation_blocker("Total amount is negative", field.amount_total)
The automation_blocker() function analogously raises an automation blocker. It creates automation blockers of type extension, which stops the automation without the need to create an error message. The function signature is the same as for the show_*() functions above.
Helper Functions and Aliases
Whenever a helper function is available, it should be used preferentially. This makes the code easier to follow for admin users, and these functions are also designed to work seamlessly with TableColumn instances, for example.
All identifiers below are directly available in the globals namespace in formula fields. Within serverless functions, they can be imported as from txscript import ... (or all of them obtained via from txscript import *).
Helper Functions
The is_empty(field.amount_total) boolean function returns True if the given field has no value set. Use this instead of testing for None.
The default_to(field.order_id, "INVALID") function returns either the field value, or a fallback value (the string INVALID in this example) in case it is not set.
Convenience Aliases
All string manipulations should be performed using substitute(...), which is an alias for re.sub.
These identifiers are automatically imported:
from datetime import date, timedelta
import re
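Since substitute is re.sub, the usual re.sub arguments apply: a pattern, a replacement (which may use group references such as \1) and the input string. The sketch recreates the alias locally so that it runs outside Rossum. The input strings are made up for illustration.

```python
import re

substitute = re.sub  # local re-creation of the alias, for running outside Rossum

# Keep digits only.
digits_only = substitute(r"[^0-9]", r"", "PO-2024/00123")
print(digits_only)  # 202400123

# Reorder an ISO date into DD.MM.YYYY using group references.
reordered = substitute(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", "2024-01-31")
print(reordered)  # 31.01.2024
```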
Formula Fields
The Rossum Transaction Scripts can be evaluated in the context of a formula-type field to automatically compute its value.
In this context, the field object is read-only, i.e. side-effects on values of other fields are prohibited (though you can still attach a message or automation blocker to another field). The annotation object is not available.
This example sets the formula field value to either 0 or the output of the specified regex substitution.
if field.order_id == "INVALID":
    show_warning("Falling back to zero", field.order_id)
    "0"
else:
    substitute(r"[^0-9]", r"", field.order_id)
The Python code is evaluated just as Python's interactive mode would run it, using the last would-be printed value as the formula field value. In other words, the value of the last evaluated expression in the code is used as the new value of the field.
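The "last would-be printed value" rule can be approximated with Python's own interactive compile mode. The sketch below compiles each top-level statement in "single" mode, which routes the value of every expression statement through sys.displayhook, and keeps the last non-None value. It only illustrates the rule and is not Rossum's evaluator. The field, show_warning and substitute names are minimal stand-ins.

```python
import ast
import re
import sys
from types import SimpleNamespace


def evaluate_like_interactive(code: str, namespace: dict):
    """Run code statement by statement in "single" mode and return the last displayed value."""
    displayed = []

    def capture(value):
        # Like the REPL, ignore None (e.g. the result of a show_warning() call).
        if value is not None:
            displayed.append(value)

    previous_hook = sys.displayhook
    sys.displayhook = capture
    try:
        for statement in ast.parse(code).body:
            interactive = ast.Interactive(body=[statement])
            exec(compile(interactive, "<formula>", "single"), namespace)
    finally:
        sys.displayhook = previous_hook
    return displayed[-1] if displayed else None


FORMULA = '''
if field.order_id == "INVALID":
    show_warning("Falling back to zero", field.order_id)
    "0"
else:
    substitute(r"[^0-9]", r"", field.order_id)
'''

warnings = []
namespace = {
    "field": SimpleNamespace(order_id="INVALID"),
    "show_warning": lambda content, field=None: warnings.append(content),
    "substitute": re.sub,
}
print(evaluate_like_interactive(FORMULA, namespace))  # 0
```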
In case the field is within a multivalue tuple, it is evaluated for each cell of that column, i.e. within each row. Referring to other fields within the row via the field object accesses the value of the respective single row cell (just like the row object when iterating over multivalue tuple rows). Referring to fields outside the multivalue tuple via the field object still works as usual.
Thus, in a definition of the field.item_amount formula, field.item_quantity refers to the quantity value of the current row, while you can still also access the field.amount_total header field. Further, field._index provides the row number.
Field dependencies of formula fields are determined automatically. The only caveat is that in case you iterate over line item rows within the formula field code, you must name your iterator row.
Automation
All imported documents are processed by the data extraction process to obtain values of fields specified in the schema. Extracted values are then available for validation in the UI.
Using per-queue automation settings, it is possible to skip the manual UI validation step and automatically switch a document to the confirmed state or proceed with its export. Whether a document is exported or switched to the confirmed state is determined by the queue settings.
Currently, there are three levels of automation:
No automation: User has to review all documents in the UI to validate extracted data (default).
Confidence automation: User only reviews documents with low data extraction confidence ("tick" icon is not displayed for one or more fields) or validation errors. By default, we automate documents that are duplicates and do not automate documents for which an edit (split) is proposed. You can change this in per-queue automation settings.
Full automation: All documents with no validation errors are exported or switched to confirmed state, but only if they do not contain a suggested edit (split). You can change this in per-queue automation settings.
An error triggered by a schema field constraint or connector validation blocks auto-export even at the full automation level. In such a case, non-required fields with validation errors are cleared and validation is performed again. If the error persists, the document must be reviewed manually; otherwise, it is exported or switched to confirmed state.
Read more about the Automation framework on our developer hub.
Sources of field validation
Low-confidence fields are marked in the UI by an "eye" icon; we consider them to be not validated. On the API level, they have an empty validation_sources list.
Validation of a field may be introduced by various sources: data extraction confidence above a threshold, computation of various checksums (e.g. VAT rate, net amount and gross amount) or a human review. These validations are recorded in the validation_sources list. The data extraction confidence threshold may be adjusted; see validation sources for details.
AI Confidence Scores
While there are multiple ways to automatically pre-validate fields, the most prominent one is score-based validation based on AI Core Engine confidence scores.
The confidence score predicted for each AI-extracted field is stored in the rir_confidence attribute. The score is a number between 0 and 1, and is calibrated so that it corresponds to the probability of the given value being correct. In other words, a field with a score of 0.80 is expected to be correct 4 out of 5 times.
The value of the score_threshold attribute (which can be set on the queue, or individually per datapoint in the schema; the default is 0.8) represents the minimum score that triggers automatic validation. Because of what the score means, this directly corresponds to the achieved accuracy. For example, if the score threshold for validation is set at 0.8, that gives an expected error rate of at most 20% for that field.
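Because the scores are calibrated probabilities, the number of errors expected among auto-validated fields is the sum of 1 - score over those fields. The following sketch applies a threshold to a hypothetical set of rir_confidence scores and computes that expectation. The field ids and scores are made up for illustration.

```python
def split_by_threshold(scores: dict, threshold: float = 0.8):
    """Return (auto-validated field ids, field ids left for review) for rir_confidence scores."""
    validated = [name for name, score in scores.items() if score >= threshold]
    review = [name for name, score in scores.items() if score < threshold]
    return validated, review


def expected_errors(scores: dict, field_ids) -> float:
    # With calibrated scores, a field with score p is wrong with probability 1 - p.
    return sum(1 - scores[name] for name in field_ids)


scores = {"invoice_id": 0.99, "date_issue": 0.85, "amount_total": 0.80, "iban": 0.42}
validated, review = split_by_threshold(scores, threshold=0.8)
print(validated, review)
print(round(expected_errors(scores, validated), 2))  # 0.36
```

Every auto-validated field here has an error probability of at most 0.2, which is where the "at most 20%" bound comes from.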
Autopilot
Autopilot is an automatic process that removes the "eye" icon from fields. This process is based on past occurrences of a field value on documents that have already been processed in the same queue.
Read more about this Automation component on our developer hub.
Autopilot configuration
Default Autopilot configuration
{
"autopilot": {
"enabled": true,
"search_history":{
"rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
"matching_fields_threshold": 2
},
"automate_fields":{
"rir_field_names": [
"account_num",
"bank_num",
"iban",
"bic",
"sender_dic",
"sender_ic",
"recipient_dic",
"recipient_ic",
"const_sym"
],
"field_repeated_min": 3
}
}
}
Autopilot configuration can be modified in Queue.settings, where you can set rules for each queue.
Autopilot is enabled unless it is explicitly disabled by setting enabled to false.
Configuration is divided into two sections:
History search
This section configures the process of finding documents from the same sender as the document that is currently being processed. An annotation is considered to be from the same sender if it contains fields with the same rir_field_name and value as the current document.
In the example below, when at least two fields listed in rir_field_names match the values of the current document, the document is considered to have the same sender.
{
"search_history":{
"rir_field_names": ["sender_ic", "sender_dic", "account_num"],
"matching_fields_threshold": 2
}
}
Attribute | Type | Description |
---|---|---|
rir_field_names | list | List of rir_field_names used to find annotations from the same sender. This should contain fields that are unique for each sender, for example sender_ic or sender_dic. Please note that, for technical reasons, it is not possible to use document_type in this field; it will be ignored. |
matching_fields_threshold | int | At least matching_fields_threshold fields must match the current annotation for an annotation to be considered from the same sender. See the example above. |
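The same-sender rule can be sketched as a simple count of matching values. The function below is a hypothetical illustration, not Rossum's implementation. It assumes each document is a plain dict mapping rir_field_name to the extracted value, and it skips document_type as described in the table above.

```python
def same_sender_documents(current: dict, history: list, rir_field_names: list,
                          matching_fields_threshold: int = 2) -> list:
    """Return past documents sharing at least matching_fields_threshold listed values with current."""
    # document_type is ignored, as noted in the table above.
    names = [name for name in rir_field_names if name != "document_type"]
    same_sender = []
    for past in history:
        matching = sum(
            1
            for name in names
            if current.get(name) is not None and past.get(name) == current.get(name)
        )
        if matching >= matching_fields_threshold:
            same_sender.append(past)
    return same_sender


current = {"sender_ic": "123", "sender_dic": "CZ123", "account_num": "999"}
history = [
    {"sender_ic": "123", "sender_dic": "CZ123", "account_num": "111"},  # 2 matches: same sender
    {"sender_ic": "123", "sender_dic": "CZ999", "account_num": "111"},  # 1 match: different sender
]
found = same_sender_documents(current, history, ["sender_ic", "sender_dic", "account_num"], 2)
print(len(found))  # 1
```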
Automate fields
This section describes rules that will be applied to annotations found in the previous step, History search.
A field will have its "eye" icon removed if we have found at least field_repeated_min fields with the same rir_field_name and value on documents found in the History search step.
Attribute | Type | Description |
---|---|---|
rir_field_names | list | List of rir_field_names which can be validated based on past occurrence |
field_repeated_min | int | Number of times field must be repeated in order to be validated |
If any config section is missing, the default value shown in the default Autopilot configuration above is applied.
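The rule for automated fields, together with the fallback to defaults, can be sketched as follows. Both helpers are hypothetical. The sketch assumes that a configuration section provided in Queue.settings replaces the corresponding default section as a whole, and that same_sender_docs is the result of the History search step.

```python
import copy
from collections import Counter

# Defaults transcribed from the default Autopilot configuration example above.
DEFAULT_AUTOPILOT = {
    "enabled": True,
    "search_history": {
        "rir_field_names": ["sender_ic", "sender_dic", "account_num", "iban", "sender_name"],
        "matching_fields_threshold": 2,
    },
    "automate_fields": {
        "rir_field_names": ["account_num", "bank_num", "iban", "bic", "sender_dic",
                            "sender_ic", "recipient_dic", "recipient_ic", "const_sym"],
        "field_repeated_min": 3,
    },
}


def effective_config(queue_autopilot: dict) -> dict:
    """Use the default for every top-level key missing from the queue settings (sections replace wholesale)."""
    config = copy.deepcopy(DEFAULT_AUTOPILOT)
    config.update(copy.deepcopy(queue_autopilot))
    return config


def fields_to_automate(current: dict, same_sender_docs: list, automate_fields: dict) -> list:
    """Names of fields whose current value repeats on at least field_repeated_min same-sender documents."""
    repeats = Counter()
    for doc in same_sender_docs:
        for name in automate_fields["rir_field_names"]:
            value = current.get(name)
            if value is not None and doc.get(name) == value:
                repeats[name] += 1
    return [
        name for name in automate_fields["rir_field_names"]
        if repeats[name] >= automate_fields["field_repeated_min"]
    ]


current = {"iban": "CZ65-EXAMPLE", "bic": "BIC-EXAMPLE"}
same_sender_docs = [{"iban": "CZ65-EXAMPLE"}, {"iban": "CZ65-EXAMPLE"}, {"iban": "OTHER"}]
custom = effective_config({"automate_fields": {"rir_field_names": ["iban"], "field_repeated_min": 2}})
print(fields_to_automate(current, same_sender_docs, custom["automate_fields"]))  # ['iban']
print(fields_to_automate(current, same_sender_docs, effective_config({})["automate_fields"]))  # []
```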
Using Triggers
Trigger REST operations can be found here.
When an event occurs, all triggers of that type will perform actions of their related objects:
Related object | Action | Description |
---|---|---|
Email template | Send email with the template to the event triggerer if automate=true | Automatically respond to document vendors based on the document's content. The document has to come from an email |
Delete recommendations | Stop automation if one of the validation rules applies to the processed document | Based on the user's rules for delete recommendations, stop automation for documents that match these rules. The document requires manual evaluation |
Trigger Event Types
Trigger objects can have one of the following event types:
Trigger Event type | Description (Trigger for an event of) |
---|---|
email_with_no_processable_attachments | An Email has been received without any processable attachments |
annotation_created | Processing of the Annotation started (Rossum received the Annotation) |
annotation_imported | Annotation data have been extracted by Rossum |
annotation_confirmed | Annotation was checked and confirmed by user (or automated) |
annotation_exported | Annotation was exported |
validation | Document is being validated |
Trigger Events Occurrence Diagram
This diagram gives an overview of the trigger events and when they occur.
Trigger Condition
Simple condition validating the presence of vendor_id with a value matching Meat ltd.
{
"$and": [
{
"field.vendor_id": {
"$and": [
{"$exists": true},
{"$regex": "Meat ltd\\."}
]
}
}
]
}
Any required field is missing
{
"$and": [
{"required_field_missing": true}
]
}
At least one of the iban, date_due, and sender_vat_id fields is missing
{
"$and": [
{
"missing_fields": {
"$elemMatch": {
"$in": ["iban", "date_due", "sender_vat_id"]
}
}
}
]
}
Will match if a required field is missing in the annotation, and the annotation contains a vendor_id field with a value that matches the Milk( inc\.)? regex. In other words, the trigger will activate if the Milk company sent us an invoice with missing data.
{
"$and": [
{
"field.vendor_id": {
"$and": [
{"$exists": true},
{"$regex": "Milk( inc\\.)?"}
]
}
},
{"required_field_missing": true}
]
}
Will match if at least one of the document_type (Receipt, Other), language (CZ, EN, CH), or currency (USD, CZK) fields matches.
{
"$or": [
{"field.document_type": {"$in": ["Receipt", "Other"]}},
{"field.language": {"$in": ["CZ", "EN", "CH"]}},
{"field.currency": {"$in": ["CZK", "USD"]}}
]
}
Will match if the filename matches the specified regular expression.
{
"$or": [
{
"filename": {"$regex": "Milk( inc\\.)?"}
}
]
}
Will match if the filename matches one of the specified regular expressions.
{
"$or": [
{
"filename": {
"$or": [
{"$regex": "Milk( inc\\.)?"},
{"$regex": "Barn( inc\\.)?"}
]
}
}
]
}
Will match if the number of pages in the processed document is higher than the specified threshold.
{
"$or": [
{
"number_of_pages": {
"$gt": 10
}
}
]
}
A subset of MongoDB Query Language. The annotation will get converted into JSON records behind the scenes. The trigger gets activated if at least one such record matches the condition according to the MQL query rules. A null condition matches any record, just like {}. Record format:
{ "field": { "{schema_id}": string | null, }, "required_field_missing": boolean, "missing_fields": string[], }
Supported MQL subset based on the trigger event type:
All trigger event types:
{}
Only annotation_imported, annotation_confirmed, and annotation_exported trigger event types:
{ "$and": [ {"field.{schema_id}": {"$and": [{"$exists": true}, REGEX]}} ] }
Only annotation_imported trigger event type:
{ "$and": [ {"field.{schema_id}": {"$and": [{"$exists": true}, REGEX]}}, {"required_field_missing": true}, {"missing_fields": {"$elemMatch": {"$in": list[str[schema_id]]}}} ] }
Only validation trigger event type:
{ "$or": [ {"field.document_type": {"$in": list[str[document_type]]}}, {"field.language": {"$in": list[str[language]]}}, {"field.currency": {"$in": list[str[currency]]}}, {"number_of_pages": {"$gt": 10}}, {"filename": REGEX} ] }
{ "$or": [ {"field.document_type": {"$in": list[str[document_type]]}}, {"field.language": {"$in": list[str[language]]}}, {"field.currency": {"$in": list[str[currency]]}}, {"number_of_pages": {"$gt": 10}}, {"filename": {"$or": [REGEX, REGEX]}} ] }
Field | Required | Description |
---|---|---|
field.{schema_id} | | A field contained in the Annotation data. The schema_id is the schema id it got extracted under |
required_field_missing | | Any of the schema-required fields is missing. (*) Cannot be combined with missing_fields |
missing_fields | | At least one of the schema fields is missing. (*) Cannot be combined with required_field_missing |
field.{validation_field} | | A field contained in the Delete Recommendation data. The validation_field is the schema id it got extracted under |
number_of_pages | | A threshold value for the number of pages. A document with more pages is matched by the trigger. |
filename | | The filename of the document to match against the regular expression(s). |
REGEX | true | Either {"$regex": re2} or {"$not": {"$regex": re2}} (**). Uses re2 regex syntax |
(*) A field is considered missing if no value for it was extracted by the extraction engine with a rir_confidence score of at least 0.95.
(**) The $not option for REGEX is not valid for the validation trigger.
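The sketch below evaluates the documented MQL subset against a record in the format shown above, to make the matching semantics more tangible. It is a hypothetical, simplified evaluator, not Rossum's engine. It uses Python's re module in place of re2 and treats $exists the way MongoDB does (the key is present, even with a null value). It also supports the field-level $and/$or nesting used in the examples above.

```python
import re

_MISSING = object()  # sentinel for a key that is absent from the record


def _lookup(record: dict, path: str):
    """Follow a dotted path such as "field.vendor_id" through nested dicts."""
    value = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def _match_value(value, spec) -> bool:
    """Evaluate an operator expression (or a literal, meaning equality) against one value."""
    if not isinstance(spec, dict):
        return value is not _MISSING and value == spec
    results = []
    for op, arg in spec.items():
        if op == "$and":
            results.append(all(_match_value(value, sub) for sub in arg))
        elif op == "$or":
            results.append(any(_match_value(value, sub) for sub in arg))
        elif op == "$not":
            results.append(not _match_value(value, arg))
        elif op == "$exists":
            results.append((value is not _MISSING) == arg)
        elif op == "$regex":
            # Unanchored search, as in MQL; Python's re stands in for re2.
            results.append(isinstance(value, str) and re.search(arg, value) is not None)
        elif op == "$in":
            results.append(value in arg)
        elif op == "$gt":
            results.append(isinstance(value, (int, float)) and value > arg)
        elif op == "$elemMatch":
            results.append(isinstance(value, list) and any(_match_value(item, arg) for item in value))
        else:
            raise ValueError(f"unsupported operator: {op}")
    return all(results)


def matches(record: dict, condition) -> bool:
    """True if the record satisfies the condition; None and {} match any record."""
    if not condition:
        return True
    results = []
    for key, arg in condition.items():
        if key == "$and":
            results.append(all(matches(record, sub) for sub in arg))
        elif key == "$or":
            results.append(any(matches(record, sub) for sub in arg))
        else:
            results.append(_match_value(_lookup(record, key), arg))
    return all(results)  # several keys in one object are combined with AND


record = {
    "field": {"vendor_id": "Milk inc.", "iban": None},
    "required_field_missing": True,
    "missing_fields": ["iban"],
}
milk_missing_data = {
    "$and": [
        {"field.vendor_id": {"$and": [{"$exists": True}, {"$regex": "Milk( inc\\.)?"}]}},
        {"required_field_missing": True},
    ]
}
print(matches(record, milk_missing_data))  # True
```

Note the implicit AND in the last line of matches(). This is why each alternative of an $or needs its own object.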
Triggering Email Templates
Email template REST operations can be found here.
To set up email template trigger automation, link an email template object to a trigger object and set its automate attribute to true. Currently, only one trigger can be linked. To set up the recipient(s) of the automated emails, you can use built-in placeholders or direct values in the to, cc, and bcc fields in email templates.
Only some email template types and some trigger event types can be linked together:
Template type | Allowed trigger events |
---|---|
custom | * |
email_with_no_processable_attachments | email_with_no_processable_attachments |
rejection | annotation_imported |
rejection_default | annotation_imported |
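A client-side pre-check of the compatibility table above can be sketched as a simple lookup. The names ALLOWED_TRIGGER_EVENTS and can_link() are hypothetical. The data is transcribed from the table.

```python
# Allowed trigger events per email template type, transcribed from the table above.
ALLOWED_TRIGGER_EVENTS = {
    "custom": "*",  # any trigger event type
    "email_with_no_processable_attachments": {"email_with_no_processable_attachments"},
    "rejection": {"annotation_imported"},
    "rejection_default": {"annotation_imported"},
}


def can_link(template_type: str, trigger_event: str) -> bool:
    """Return True if a template of this type may be linked to a trigger with this event."""
    allowed = ALLOWED_TRIGGER_EVENTS.get(template_type)
    if allowed is None:
        return False  # unknown template type
    return allowed == "*" or trigger_event in allowed


print(can_link("rejection", "annotation_imported"))  # True
print(can_link("rejection", "annotation_exported"))  # False
```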
Email templates of type rejection and rejection_default will also reject the associated annotation when triggered.
Every newly created queue has default email templates. Some of them have a trigger linked, including an email template of type email_with_no_processable_attachments, which cannot have its trigger unlinked or linked to another trigger. To disable its automation, set its automate attribute to false.
Triggering Validation
Delete Recommendation REST operations can be found here.
To set up validation trigger automation, specify the rules for validation and set its enabled attribute to true.
This trigger is only valid for the validation trigger event.
Hooks and Triggers Workflow
To find out which triggers and hooks are run, and when, see this workflow.
Workflows
This feature must be explicitly enabled in queue settings.
Approval workflows
Approval workflows allow you to define multiple steps of an approval process.
The workflow is started when the data extraction process is done (the annotation is confirmed); the annotation then enters the in_workflow status.
Then the annotation must be approved by the defined approvers in order to be moved further (to the confirmed or exported status).
The annotation is moved to the rejected status if one of the assignees rejects it.
The current status of the workflow is stored in the workflow run object. All the events that happened during the workflow can be tracked down through workflow activity resources.
Embedded Mode
In some use-cases, it is desirable to use only the per-annotation validation view of the Rossum application. Rossum may be integrated with other systems using so-called embedded mode.
In embedded mode, a special URL is constructed and then used in an iframe or popup browser window to show the Rossum annotation view. Some view navigation widgets are hidden (such as the home, postpone and delete buttons), so that the user is only allowed to update and confirm all field values.
Embedded mode can be used to view annotations only in status to_review, reviewing, postponed, or confirmed.
Embedded mode workflow
The host application first uploads a document using the standard Rossum API. During this process, an annotation object is created. It is possible to obtain the status of the annotation object and wait for the status to become to_review (ready for checking) using the annotation endpoint.
As soon as importing of the annotation object has finished, an authenticated user may call the start_embedded endpoint to obtain a URL to be included in an iframe or popup browser window of the host application. Parameters of the call are return_url and cancel_url, which are used for browser redirects when the user finishes the annotation.
The URL contains a security token that is used by the embedded Rossum application to access the Rossum API.
When the checking of the document has finished, the user clicks on the done button and the host application is notified about the finished annotation through the save endpoint of the connector HTTP API. By default, this call is made asynchronously, which causes a lag (up to a few seconds) between the click on the done button and the call to the save endpoint. However, it is possible to switch the calls to synchronous mode by setting the connector asynchronous toggle to false (see connector for reference).
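The workflow above can be sketched in Python. To keep the example self-contained, the HTTP calls are injected as two hypothetical callables, get_status() and post_json(), which the host application would implement using an authenticated user's token. The polling interval and timeout are arbitrary assumptions, and the URLs are placeholders.

```python
import time
from typing import Callable


def open_embedded_view(
    annotation_url: str,
    get_status: Callable[[str], str],
    post_json: Callable[[str, dict], dict],
    return_url: str,
    cancel_url: str,
    poll_interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> str:
    """Wait until the annotation is ready for checking, then request an embedded URL."""
    deadline = time.monotonic() + timeout_s
    while get_status(annotation_url) != "to_review":
        if time.monotonic() >= deadline:
            raise TimeoutError("annotation did not reach to_review in time")
        time.sleep(poll_interval_s)
    response = post_json(
        f"{annotation_url}/start_embedded",
        {"return_url": return_url, "cancel_url": cancel_url},
    )
    return response["url"]


# Fake HTTP helpers standing in for real API calls.
statuses = iter(["importing", "importing", "to_review"])
calls = []


def fake_get_status(url):
    return next(statuses)


def fake_post_json(url, payload):
    calls.append((url, payload))
    return {"url": "https://example.rossum.app/embedded/document/319668#authToken=TOKEN"}


embedded_url = open_embedded_view(
    "https://example.rossum.app/api/v1/annotations/319668",
    fake_get_status,
    fake_post_json,
    return_url="https://service.com/return",
    cancel_url="https://service.com/cancel",
    poll_interval_s=0,
)
print(embedded_url)
```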
API Reference
For an introduction to the Rossum API, see Overview.
Most of the API endpoints require user to be authenticated, see Authentication for details.
Annotation
Example annotation object
{
"document": "https://<example>.rossum.app/api/v1/documents/314628",
"id": 314528,
"queue": "https://<example>.rossum.app/api/v1/queues/8199",
"schema": "https://<example>.rossum.app/api/v1/schemas/95",
"relations": [],
"pages": [
"https://<example>.rossum.app/api/v1/pages/558598"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"modified_by": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"export_failed_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "to_review",
"rir_poll_id": "54f6b9ecfa751789f71ddf12",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/314528",
"content": "https://<example>.rossum.app/api/v1/annotations/314528/content",
"time_spent": 0,
"metadata": {},
"related_emails": [],
"email": "https://<example>.rossum.app/api/v1/emails/96743",
"automation_blocker": null,
"email_thread": "https://<example>.rossum.app/api/v1/email_threads/34567",
"has_email_thread_with_replies": true,
"has_email_thread_with_new_replies": false,
"organization": "https://<example>.rossum.app/api/v1/organizations/1",
"prediction": null,
"assignees": [],
"labels": []
}
An annotation object contains all extracted and verified data related to a document. Every document belongs to a queue and is related to the schema object that defines datapoint types and the overall shape of the extracted data.
Commonly, you need to use the queue upload endpoint to create annotation instances.
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the annotation | true |
url | URL | | URL of the annotation | true |
status | enum | | Status of the document, see Document Lifecycle for the list of values. | |
document | URL | | Related document. | |
queue | URL | | Queue that the annotation belongs to. | |
schema | URL | | Schema that defines content shape. | |
relations | list[URL] | | (Deprecated) List of relations that the annotation belongs to. | |
pages | list[URL] | | List of rendered pages. | true |
creator | URL | | User that created the annotation. | true |
created_at | datetime | | Timestamp of object's creation. | true |
modifier | URL | | User that last modified the annotation. | |
modified_by | URL | | User that last modified the annotation. | |
modified_at | datetime | | Timestamp of last modification. | true |
assigned_at | datetime | | Timestamp of last assignment to a user or of when annotating the annotation started. | true |
confirmed_at | datetime | | Timestamp when the annotation was moved to status confirmed. | true |
deleted_at | datetime | | Timestamp when the annotation was moved to status deleted. | true |
exported_at | datetime | | Timestamp of finished export. | true |
export_failed_at | datetime | | Timestamp of failed export. | true |
purged_at | datetime | | Timestamp when the annotation was purged. | true |
rejected_at | datetime | | Timestamp when the annotation was moved to status rejected. | true |
confirmed_by | URL | | User that confirmed the annotation. | true |
deleted_by | URL | | User that deleted the annotation. | true |
exported_by | URL | | User that exported the annotation. | true |
purged_by | URL | | User that purged the annotation. | true |
rejected_by | URL | | User that rejected the annotation. | true |
rir_poll_id | string | | Internal. | |
messages | list[object] | [] | List of messages from the connector (save). | |
content | URL | | Link to annotation data (datapoint values), see Annotation data. | true |
suggested_edit | URL | | Link to Suggested edit object. | true |
time_spent | float | 0 | Total time spent while validating the annotation. | |
metadata | object | {} | Client data. | |
automated | boolean | false | Whether the annotation was automated | |
related_emails | list[URL] | | List of emails related to the annotation. | true |
email | URL | | Related email that the annotation was imported by (for annotations imported by email). | true |
automation_blocker | URL | | Related automation blocker object. | true |
email_thread | URL | | Related email thread object. | true |
has_email_thread_with_replies | bool | | Related email thread contains more than one incoming email. | true |
has_email_thread_with_new_replies | bool | | Related email thread contains an unread incoming email. | true |
organization | URL | | Link to related organization. | true |
automatically_rejected | bool | | Read-only field of automatically_rejected annotation | true |
prediction | object | | Internal. | true |
assignees | list[URL] | | List of assigned users (only for internal purposes). | true |
labels | list[URL] | | List of selected labels | true |
restricted_access | bool | false | Access to annotation is restricted | true |
Start annotation
Start annotation of object 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/start'
{
"annotation": "https://<example>.rossum.app/api/v1/annotations/319668",
"session_timeout": "01:00:00"
}
POST /v1/annotations/{id}/start
Start reviewing the annotation by the calling user. The call accepts an optional statuses payload that specifies the allowed statuses for starting the annotation.
Returns 409 Conflict if the annotation is not in one of the specified statuses.
Attribute | Type | Default | Description | required |
---|---|---|---|---|
statuses | list[str] | ["to_review", "reviewing", "postponed", "confirmed"] | List of allowed states for the starting annotation to be in | false |
Response
Status: 200
Returns an object with annotation and session_timeout keys.
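A minimal standard-library sketch of this call follows. The helpers build_start_request() and start_annotation() are hypothetical, and the annotation URL and TOKEN are placeholders. Returning None on a 409 response (annotation not in an allowed status) is a design choice of the sketch.

```python
import json
import urllib.error
import urllib.request


def build_start_request(annotation_url: str, token: str, statuses=None) -> urllib.request.Request:
    """Build POST {annotation_url}/start; without statuses, the documented defaults apply."""
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    if statuses is not None:
        data = json.dumps({"statuses": list(statuses)}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(f"{annotation_url}/start", data=data, headers=headers, method="POST")


def start_annotation(annotation_url: str, token: str, statuses=None):
    """Send the request; return the parsed body, or None on 409 Conflict."""
    request = build_start_request(annotation_url, token, statuses)
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 409:
            # The annotation is not in one of the allowed statuses.
            return None
        raise


request = build_start_request(
    "https://example.rossum.app/api/v1/annotations/319668", "TOKEN", statuses=["to_review", "postponed"]
)
print(request.get_method(), request.full_url)
```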
Start embedded annotation
Start embedded annotation of object 319668
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"return_url": "https://service.com/return", "cancel_url": "https://service.com/cancel"}' \
'https://<example>.rossum.app/api/v1/annotations/319668/start_embedded'
{
"url": "https://<example>.rossum.app/embedded/document/319668#authToken=1c50ae8552441a2cda3c360c1e8cb6f2d91b14a9"
}
POST /v1/annotations/{id}/start_embedded
Start embedded annotation.
Key | Description | Required |
---|---|---|
return_url | URL browser is redirected to in case of successful user validation | No |
cancel_url | URL browser is redirected to in case of user canceling the annotation | No |
postpone_url | URL browser is redirected to in case of user postponing the annotation | No |
delete_url | URL browser is redirected to in case of user deleting the annotation | No |
max_token_lifetime_s | Duration (in seconds) for which the token will be valid (default: queue's session_timeout, max: 162 hours) | No |
Response
Status: 200
Returns an object with url, which specifies the URL to be used in the browser iframe/popup window. The URL includes a token that is valid for this document only, for a limited period of time.
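Since all keys are optional and the token lifetime is capped, a client can drop unset keys and check the cap before calling the endpoint. The helper below is hypothetical. The 162-hour maximum comes from the table above, while rejecting non-positive values is an added assumption.

```python
MAX_TOKEN_LIFETIME_S = 162 * 60 * 60  # documented maximum of 162 hours, i.e. 583200 seconds


def embedded_payload(return_url=None, cancel_url=None, postpone_url=None,
                     delete_url=None, max_token_lifetime_s=None) -> dict:
    """Build the JSON body for start_embedded, leaving out keys that were not given."""
    if max_token_lifetime_s is not None and not 0 < max_token_lifetime_s <= MAX_TOKEN_LIFETIME_S:
        raise ValueError(f"max_token_lifetime_s must be positive and at most {MAX_TOKEN_LIFETIME_S}")
    payload = {
        "return_url": return_url,
        "cancel_url": cancel_url,
        "postpone_url": postpone_url,
        "delete_url": delete_url,
        "max_token_lifetime_s": max_token_lifetime_s,
    }
    # Omitted keys fall back to the server defaults (e.g. the queue's session_timeout).
    return {key: value for key, value in payload.items() if value is not None}


print(embedded_payload(return_url="https://service.com/return", max_token_lifetime_s=3600))
```

The create_embedded_url endpoint accepts the same keys, so the same payload could be reused there.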
Create embedded URL for annotation
Create embedded URL for annotation object 319668
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"return_url": "https://service.com/return", "cancel_url": "https://service.com/cancel"}' \
'https://<example>.rossum.app/api/v1/annotations/319668/create_embedded_url'
{
"url": "https://<example>.rossum.app/embedded/document/319668#authToken=1c50ae8552441a2cda3c360c1e8cb6f2d91b14a9",
"status": "exported"
}
POST /v1/annotations/{id}/create_embedded_url
Similar to the start embedded annotation endpoint, but it can be called for annotations in any status and does not switch the status.
Key | Description | Required |
---|---|---|
return_url | URL browser is redirected to in case of successful user validation | No |
cancel_url | URL browser is redirected to in case of user canceling the annotation | No |
postpone_url | URL browser is redirected to in case of user postponing the annotation | No |
delete_url | URL browser is redirected to in case of user deleting the annotation | No |
max_token_lifetime_s | Duration (in seconds) for which the token will be valid (default: queue's session_timeout, max: 162 hours) | No |
Response
Status: 200
Key | Type | Description |
---|---|---|
url | str | URL to be used in the browser iframe/popup window. URL includes a token that is valid for this document only for a limited period of time. |
status | enum | Status of annotation, see annotation lifecycle. |
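The example response carries the token in the URL fragment as authToken=<token>. The url should be passed to the iframe unchanged, but if you log it, a sketch like the following can strip the token first. It assumes the fragment format shown in the example; both function names are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def embedded_auth_token(embedded_url):
    """Return the authToken from the URL fragment, or None if it is absent."""
    fragment = urlsplit(embedded_url).fragment  # e.g. "authToken=1c50..."
    values = parse_qs(fragment).get("authToken")
    return values[0] if values else None


def redact_embedded_url(embedded_url):
    """Drop the fragment (and with it the token) so the URL can be logged."""
    return urlsplit(embedded_url)._replace(fragment="").geturl()
```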
Confirm annotation
Confirm annotation of object 319668
Key | Default | Description | Required |
---|---|---|---|
skip_workflows | False | Whether to skip workflow evaluation (see workflows). bypass_workflows_allowed must be set to true in the workflows queue settings to use this feature. | No |
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/confirm'
POST /v1/annotations/{id}/confirm
Confirm annotation, switch its status to exported (or exporting).
If the confirmed state is enabled, this call moves the annotation to the confirmed status instead.
Confirm annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
Response
Status: 204
Cancel annotation
Cancel annotation of object 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/cancel'
POST /v1/annotations/{id}/cancel
Cancel annotation, switch its status back to to_review or postponed.
Cancel annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
Response
Status: 204
Approve annotation
Approve annotation of object 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
-d '{}' \
'https://<example>.rossum.app/api/v1/annotations/319668/approve'
POST /v1/annotations/{id}/approve
Approve annotation, switch its status to exporting or confirmed, or leave it in in_workflow, depending on the evaluation of the current workflow step.
Only an admin, organization group admin, or an assigned user with the approver role can approve an annotation in this state. A workflow activity record object will be created.
Response
Status: 200
Key | Type | Description |
---|---|---|
status | string | New status of the annotation |
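A minimal sketch for approving and interpreting the response, assuming a placeholder base URL and token. The function names are hypothetical, and treating in_workflow as "approval still pending" is an interpretation of the description above.

```python
import json
import urllib.request


def approve_request(base_url, token, annotation_id):
    """Build POST /annotations/{id}/approve with an empty JSON body."""
    return urllib.request.Request(
        f"{base_url}/annotations/{annotation_id}/approve",
        data=b"{}",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def approve_outcome(response_body):
    """Return (new_status, still_in_workflow) from the approve response body."""
    status = json.loads(response_body)["status"]
    # Staying in in_workflow presumably means the current workflow step
    # did not complete the approval (e.g. further approvers are required).
    return status, status == "in_workflow"
```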
Assign annotation
Assign annotation 319668 to the user 1122
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
-d '{"annotations": ["https://<example>.rossum.app/api/v1/annotations/319668"],
"assignees": ["https://<example>.rossum.app/api/v1/users/1122"],
"note_content": "I just want to reassign as I do not care about it"}' \
'https://<example>.rossum.app/api/v1/annotations/assign'
POST /v1/annotations/assign
Change the assignees of the annotation.
Key | Type | Description | Required | Default |
---|---|---|---|---|
annotations | list[URL] | List of annotations to change the assignees of (currently, only one annotation at a time is supported) | yes | |
assignees | list[URL] | List of users to be added as annotation assignees | yes | |
note_content | string | Content of the note that will be added to the workflow activity of the reassign action (only applicable to annotations in the in_workflow state) | no | "" |
Response
Status: 204
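The request body can also be built programmatically. This hypothetical helper enforces the current one-annotation-per-call limit by taking a single annotation URL, and only adds note_content when it is non-empty.

```python
import json


def assign_body(annotation_url, assignee_urls, note_content=""):
    """Build the JSON body for POST /annotations/assign (hypothetical helper)."""
    if isinstance(assignee_urls, str):
        # list() of a bare string would split it into single characters.
        raise TypeError("assignee_urls must be a list of user URLs")
    body = {"annotations": [annotation_url], "assignees": list(assignee_urls)}
    if note_content:
        # Only applied to annotations in the in_workflow state.
        body["note_content"] = note_content
    return json.dumps(body)
```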
Reject annotation
Reject annotation of object 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
-d '{"note_content": "Rejected due to invalid due date."}' \
'https://<example>.rossum.app/api/v1/annotations/319668/reject'
POST /v1/annotations/{id}/reject
Reject annotation, switch its status to rejected.
Key | Description | Required | Default |
---|---|---|---|
note_content | Rejection note | No | "" |
automatically_rejected | For internal use only; designates whether the annotation is displayed as automatically rejected in the statistics | No | false |
Reject annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
If rejecting an annotation in the in_workflow state, annotation.workflow_run.workflow_status is also set to rejected and a workflow activity record object is created. Only an admin, organization group admin, or an assigned user can reject an annotation in this state.
Response
Status: 200
Key | Type | Description |
---|---|---|
status | string | New status of the annotation (rejected). |
note | URL | Link to Note object. |
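A hedged sketch of rejecting with a note and reading the returned note link; the base URL is a placeholder and the helper names are hypothetical.

```python
import json
import urllib.request


def reject_request(base_url, token, annotation_id, note_content=""):
    """Build POST /annotations/{id}/reject with an optional rejection note."""
    body = {"note_content": note_content} if note_content else {}
    return urllib.request.Request(
        f"{base_url}/annotations/{annotation_id}/reject",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_reject_response(response_body):
    """Return (status, note_url) from the 200 response body."""
    data = json.loads(response_body)
    return data["status"], data.get("note")
```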
Switch to postponed
Switch annotation status of object 319668 to postponed
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/postpone'
POST /v1/annotations/{id}/postpone
Switch annotation status to postponed.
Postpone annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
Response
Status: 204
Switch to deleted
Switch annotation status of object 319668 to deleted
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/delete'
POST /v1/annotations/{id}/delete
Switch annotation status to deleted. An annotation with status deleted is still available in the Rossum UI.
Delete annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
Response
Status: 204
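Confirm, cancel, postpone and delete share the same shape: a POST without a body that answers 204. The sketch below, with hypothetical helper names and a placeholder base URL, builds any of them and optionally sends it. It omits the optional time spent data, which is for internal use only.

```python
import urllib.request

# Actions described above that switch the status and answer 204 No Content.
NO_CONTENT_ACTIONS = frozenset({"confirm", "cancel", "postpone", "delete"})


def status_action_request(base_url, token, annotation_id, action):
    """Build a body-less POST /annotations/{id}/{action} request."""
    if action not in NO_CONTENT_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{base_url}/annotations/{annotation_id}/{action}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_status_action(base_url, token, annotation_id, action):
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    request = status_action_request(base_url, token, annotation_id, action)
    with urllib.request.urlopen(request) as response:
        if response.status != 204:
            raise RuntimeError(f"expected 204, got {response.status}")
```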
Rotate the annotation
Rotate the annotation 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"rotation_deg": 270}' \
'https://<example>.rossum.app/api/v1/annotations/319668/rotate'
POST /v1/annotations/{id}/rotate
Rotate a document. It requires one parameter: rotation_deg.
The status of the annotation is switched to importing and the extraction phase starts over.
After the new extraction, the value of rotation_deg is copied to the rotation_deg field of the pages.
Key | Description |
---|---|
rotation_deg | States degrees by which the document shall be rotated. Possible values: 0, 90, 180, 270. |
Response
Status: 204
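Only the four listed values are accepted. A small client-side guard, assuming you may start from arbitrary multiples of 90 (for example -90 from a rotate-left button), can normalize them first; the function names are hypothetical.

```python
import json


def normalize_rotation(degrees):
    """Map any multiple of 90 (e.g. -90 or 450) onto 0, 90, 180 or 270."""
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return degrees % 360  # Python's % always returns a non-negative result here


def rotate_body(degrees):
    """JSON body for POST /annotations/{id}/rotate."""
    return json.dumps({"rotation_deg": normalize_rotation(degrees)})
```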
Edit the annotation
Edit the annotation 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"documents": [{"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/1", "rotation_deg": 90}, {"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 90}], "metadata": {"document": {"my_info": "something I want to store here"}, "annotation": {"some_key": "some value"}}}, {"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 180}]}]}' \
'https://<example>.rossum.app/api/v1/annotations/319668/edit'
{
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/320551",
"annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
},
{
"document": "https://<example>.rossum.app/api/v1/documents/320552",
"annotation": "https://<example>.rossum.app/api/v1/annotations/320222"
}
]
}
POST /v1/annotations/{id}/edit
Edit a document. It requires the parameter documents, which describes the requested edits for the annotations that should be created from the original annotation. Each edit description contains a list of pages and their rotation degrees.
If only one document remains after the edit, the original annotation is edited. If multiple documents are to be created, the status of the original annotation is switched to split, the status of the newly created annotations is importing, and the extraction phase starts over. To split an annotation into multiple annotations, consider using the dedicated split endpoint instead.
Key | Description |
---|---|
documents | Documents that should be created from the original annotation. Each document contains list of pages and rotation degree. |
Each object in documents accepts the following parameters:
Key | Type | Description |
---|---|---|
pages | list[object] | A list of objects containing information about page (URL) and rotation_deg (integer) |
metadata | object | (optional) A dictionary with attributes document and annotation for adding/updating metadata of edited annotation and its related document. |
Response
Status: 200
Returns results with a list of objects:
Key | Type | Description |
---|---|---|
document | URL | URL to the document that was newly created after calling the edit endpoint. |
annotation | URL | URL of the annotation assigned to the document. |
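A minimal sketch of building the documents payload from page groups with per-page rotation. The helper names are hypothetical, skipping empty groups is a choice of this sketch, and splits_original simply restates the rule above that more than one resulting document switches the original annotation to split.

```python
def edit_body(page_groups):
    """Build the body for POST /annotations/{id}/edit.

    page_groups is a list of documents; each document is a list of
    (page_url, rotation_deg) pairs.
    """
    documents = [
        {"pages": [{"page": url, "rotation_deg": deg} for url, deg in group]}
        for group in page_groups
        if group  # ignore empty groups instead of sending page-less documents
    ]
    return {"documents": documents}


def splits_original(body):
    """More than one resulting document means the original becomes split."""
    return len(body["documents"]) > 1
```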
Split the annotation
Split the annotation 319668
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"documents": [{"pages": [{"page": "https://<example>.rossum.app/api/v1/pages/1", "rotation_deg": 90}, {"page": "https://<example>.rossum.app/api/v1/pages/2", "rotation_deg": 90}], "metadata": {"document": {"my_info": "something I want to store here"}, "annotation": {"some_key": "some value"}}}]}' \
'https://<example>.rossum.app/api/v1/annotations/319668/split'
{
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/320551",
"annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
}
]
}
POST /v1/annotations/{id}/split
Split a document based on editing rules. It requires the parameter documents, which describes the requested edits for the annotations that should be created from the original annotation. Each edit description contains a list of pages and their rotation degrees.
When using this endpoint, the status of the original annotation is switched to split, the status of the newly created annotations is importing, and the extraction phase starts over.
This endpoint can also be used to split annotations from a webhook listening to the annotation_content.initialize event and action.
Key | Description |
---|---|
documents | Documents that should be created from the original annotation. Each document contains list of pages and rotation degree. |
Each object in documents accepts the following parameters:
Key | Type | Description |
---|---|---|
pages | list[object] | A list of objects containing information about page (URL) and rotation_deg (integer) |
metadata | object | (optional) A dictionary with attributes document and annotation for adding/updating metadata of edited annotation and its related document. |
Edit annotation can optionally accept time spent data as described in annotation time spent, for internal use only.
Response
Status: 200
Returns results with a list of objects:
Key | Type | Description |
---|---|---|
document | URL | URL to the document that was newly created after calling the edit endpoint. |
annotation | URL | URL of the annotation assigned to the document. |
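As an illustration, the hypothetical helper below splits a document into consecutive chunks of n pages, for example a batch scan in which every invoice has the same page count (an assumed scenario). Its output has the shape of this endpoint's request body, so it could also be produced from a webhook handling annotation_content.initialize.

```python
def split_every_n_pages(page_urls, n):
    """Split a document into consecutive chunks of n pages (the last may be shorter)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return {
        "documents": [
            {"pages": [{"page": url, "rotation_deg": 0} for url in page_urls[i:i + n]]}
            for i in range(0, len(page_urls), n)
        ]
    }
```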
Edit pages Start
Start splitting the document and all its child documents.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/start'
{
"parent_annotation": "http://<example>.rossum.app/api/v1/annotations/111",
"children": [
{
"url": "http://<example>.rossum.app/api/v1/annotations/120",
"queue": "http://<example>.rossum.app/api/v1/queues/1",
"status": "reviewing",
"started": true,
"original_file_name": "large_4.pdf",
"parent_pages": [
{
"page": "http://<example>.rossum.app/api/v1/pages/142",
"rotation_deg": 0
},
{
"page": "http://<example>.rossum.app/api/v1/pages/143",
"rotation_deg": 0
},
{
"page": "http://<example>.rossum.app/api/v1/pages/144",
"rotation_deg": 0
}
]
},
{
"url": "http://<example>.rossum.app/api/v1/annotations/119",
"queue": "http://<example>.rossum.app/api/v1/queues/1",
"status": "reviewing",
"started": true,
"original_file_name": "large_3.pdf",
"parent_pages": [
{
"page": "http://<example>.rossum.app/api/v1/pages/139",
"rotation_deg": 0
},
{
"page": "http://<example>.rossum.app/api/v1/pages/140",
"rotation_deg": 0
},
{
"page": "http://<example>.rossum.app/api/v1/pages/141",
"rotation_deg": 0
}
]
}
],
"session_timeout": "01:00:00"
}
POST /v1/annotations/{id}/edit_pages/start
Starts editing the annotation and all its child documents (the documents into which the original document was split). The parent annotation must be in the to_review, split or reviewing state (for the calling user).
This call "locks" the parent and child annotations so that they cannot be edited by others. It returns basic information about the parent annotation and a list of its children. Children the current user has no rights to contain only limited information.
If the parent annotation cannot be "locked", an error is returned. If a child annotation cannot be locked, it is skipped and returned in the response with started set to false.
Response
Status: 200
Returns an object with the following keys.
Key | Type | Description |
---|---|---|
parent_annotation | URL | URL of annotation |
children | list[object] | List of child annotation objects |
session_timeout | string | timeout in format "HH:MM:SS" |
The children member object has the following keys:
Key | Type | Description |
---|---|---|
url | URL | URL of the annotation |
queue | URL | URL of the queue |
status | string | Status of the child annotation |
started | boolean | Whether the annotation was started |
original_file_name | string | File name of the original document |
parent_pages | list[object] | List of pages from the parent document with their rotation. |
The parent_pages member object has the following keys:
Key | Type | Description |
---|---|---|
page | URL | URL of the page |
rotation_deg | integer | Rotation in degrees |
Status: 403
The user does not have the right to edit the parent annotation.
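A sketch for processing the start response: it separates children that were locked (started is true) from skipped ones and converts session_timeout to seconds. The function name is hypothetical, and it reads only parent_annotation, children[].url, children[].started and session_timeout.

```python
import json


def summarize_edit_session(response_body):
    """Split the children of an edit_pages/start response into locked and skipped."""
    data = json.loads(response_body)
    locked, skipped = [], []
    for child in data["children"]:
        # started is false when the child could not be locked for editing.
        (locked if child.get("started") else skipped).append(child.get("url"))
    hours, minutes, seconds = (int(part) for part in data["session_timeout"].split(":"))
    return {
        "parent": data["parent_annotation"],
        "locked": locked,
        "skipped": skipped,
        "timeout_s": hours * 3600 + minutes * 60 + seconds,
    }
```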
Edit pages Cancel
Cancel splitting the document and its child documents.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/cancel' -d \
'{"annotations": ["http://<example>.rossum.app/api/v1/annotations/119"], "cancel_parent": false, "processing_duration": {"time_spent": 10.0}}'
POST /v1/annotations/{id}/edit_pages/cancel
Cancel multiple started child annotations at once. By default, the parent annotation is cancelled as well (see cancel_parent).
Key | Type | Description |
---|---|---|
annotations | list[URL] | List of urls of child annotations to cancel. Must be in reviewing state. |
cancel_parent | boolean | Cancel parent annotation. Optional, default true. |
processing_duration | object | Optional processing_duration object |
Response
Status: 204
Returned on success.
Status: 400
Returned when preconditions are not met.
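A minimal sketch of the request body; the helper name is hypothetical. It always sends cancel_parent explicitly, defaulting to true like the API, and wraps an optional time_spent in a processing_duration object as in the curl example.

```python
import json


def cancel_children_body(child_urls, cancel_parent=True, time_spent=None):
    """Body for POST /annotations/{id}/edit_pages/cancel."""
    body = {"annotations": list(child_urls), "cancel_parent": cancel_parent}
    if time_spent is not None:
        body["processing_duration"] = {"time_spent": float(time_spent)}
    return json.dumps(body)
```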
Edit pages
Split the document and move one of the new child documents into a different queue.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
'{"edit": [{"parent_pages": [{"page": "http://<example>.rossum.app/api/v1/pages/142", "rotation_deg": 90}]},{"parent_pages": [{"page": "http://<example>.rossum.app/api/v1/pages/141", "rotation_deg": 90}], "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}], "stop_parent": true}'
{
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/320551",
"annotation": "https://<example>.rossum.app/api/v1/annotations/320221"
},
{
"document": "https://<example>.rossum.app/api/v1/documents/320552",
"annotation": "https://<example>.rossum.app/api/v1/annotations/320222"
}
]
}
Join two child documents (784 and 785, each with one page) into a single new document.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
'{"edit": [{"parent_pages": [{"page":"https://<example>.rossum.app/api/v1/pages/1088", "rotation_deg": 0}, {"page": "https://<example>.rossum.app/api/v1/pages/1089", "rotation_deg": 0}], "document_name": "joined_pages.pdf"}],"delete": ["https://<example>.rossum.app/api/v1/annotations/784", "https://<example>.rossum.app/api/v1/annotations/785"]}'
{
"results": [
{"document": "https://<example>.rossum.app/api/v1/documents/320551","annotation": "https://<example>.rossum.app/api/v1/annotations/786"}
]
}
Move one child document into a different queue.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
'{"move": [{"annotation": "https://<example>.rossum.app/api/v1/annotations/784", "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}]}'
{
"results": [
{"document": "https://<example>.rossum.app/api/v1/documents/320551","annotation": "https://<example>.rossum.app/api/v1/annotations/784"}
]
}
Delete one child document.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages' -d \
'{"delete": ["https://<example>.rossum.app/api/v1/annotations/784"]}'
{
"results": []
}
POST /v1/annotations/{parent_id}/edit_pages
Edit document pages, split a document, or re-split an already split document.
When using this endpoint, the status of the original annotation (when not editing an existing split) is switched to split, the status of the newly created annotations is importing, and the extraction phase starts over.
This endpoint can also be used to split annotations from a webhook listening to the annotation_content.initialize event and action.
Key | Type | Description |
---|---|---|
delete | list[URL] | Optional list of urls of child annotations to delete. |
move | list[object] | Optional list of Move objects. |
edit | list[object] | Optional list of Edit objects. |
stop_reviewing | list[URL] | Optional list of urls of child annotations to stop reviewing. Must be in reviewing state. |
stop_parent | boolean | Stop also parent annotation. Optional, default true. |
edit_data_source | String | Optional source of edit data. Either automation, suggest, modified_suggest or manual. |
processing_duration | object | Optional processing_duration object. |
The Move object has the following keys:
Key | Type | Description |
---|---|---|
annotation | URL | URL of annotation. |
target_queue | URL | URL of target queue. |
The Edit object has the following keys:
Key | Type | Description |
---|---|---|
annotation | URL | Optional URL of annotation. |
target_queue | URL | Optional URL of target queue. |
document_name | String | Optional document name. When not provided, generated automatically. |
parent_pages | list[object] | List of parent pages with rotation. |
metadata | object | Metadata object. May contain the objects annotation and document, which are saved in the metadata of the created/edited annotation and document. |
The Parent page object has the following keys:
Key | Type | Description | Required | Default value |
---|---|---|---|---|
page | URL | URL of page. | yes | |
rotation_deg | int | Rotation angle in degrees with a step of 90 degrees | no | 0 |
Response
Status: 200
On success, returns results with a list of objects:
Key | Type | Description |
---|---|---|
document | URL | URL to the document that was newly created after calling the edit endpoint. |
annotation | URL | URL of the annotation assigned to the document. |
Status: 400
Returned when preconditions are not met.
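The request bodies in the examples above can be assembled with small helpers such as the following. The names are hypothetical; the multiple-of-90 check reflects the documented 90-degree step, and stop_parent is always sent explicitly (defaulting to true, as in the API).

```python
def parent_page(url, rotation_deg=0):
    """One Parent page object; rotation must be a multiple of 90 degrees."""
    if rotation_deg % 90 != 0:
        raise ValueError("rotation_deg must be a multiple of 90")
    return {"page": url, "rotation_deg": rotation_deg}


def edit_pages_body(edit=(), move=(), delete=(), stop_parent=True):
    """Body for POST /annotations/{parent_id}/edit_pages.

    edit: Edit objects (dicts with at least parent_pages);
    move: (annotation_url, target_queue_url) pairs;
    delete: child annotation URLs.
    """
    body = {"stop_parent": stop_parent}
    if edit:
        body["edit"] = list(edit)
    if move:
        body["move"] = [{"annotation": a, "target_queue": q} for a, q in move]
    if delete:
        body["delete"] = list(delete)
    return body
```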
Edit pages in-place
Edit pages of a document and move it to a different queue.
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/annotations/111/edit_pages/in_place' -d \
'{"parent_pages": [{"page": "http://<example>.rossum.app/api/v1/pages/142", "rotation_deg": 90}], "target_queue": "https://<example>.rossum.app/api/v1/queues/23"}'
{
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/2121",
"annotation": "https://<example>.rossum.app/api/v1/annotations/111"
}
]
}
POST /v1/annotations/{parent_id}/edit_pages/in_place
Edit existing document pages without creating new annotations. You can rotate pages, delete pages or move the annotation into another queue. This endpoint can be used for the embedded mode.
Key | Type | Description |
---|---|---|
parent_pages | list[object] | List of parent pages with rotation. |
target_queue | URL | Optional URL of target queue. |
metadata | object | Optional metadata object. May contain the objects annotation and document, which are saved in the metadata of the created/edited annotation and document. |
edit_data_source | String | Optional source of edit data. Either automation, suggest, modified_suggest or manual. |
processing_duration | object | Optional processing_duration object. |
The Parent page object has the following keys:
Key | Type | Description |
---|---|---|
page | URL | URL of page. |
rotation_deg | int | Rotation angle in degrees, in steps of 90 degrees. |
Response
Status: 200
On success, returns results with a list of objects:
Key | Type | Description |
---|---|---|
document | URL | URL to the document that was newly created after calling the edit endpoint. |
annotation | URL | URL of the annotation assigned to the document. |
Status: 400
Returned when preconditions are not met.
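A minimal sketch of the in-place request body. The helper name is hypothetical, and it assumes (this is not confirmed above) that pages omitted from parent_pages are the ones removed from the document.

```python
def in_place_body(pages, target_queue=None):
    """Body for POST /annotations/{parent_id}/edit_pages/in_place.

    pages: (page_url, rotation_deg) pairs for the pages to keep.
    """
    parent_pages = []
    for url, rotation_deg in pages:
        if rotation_deg % 90 != 0:
            raise ValueError("rotation_deg must be a multiple of 90")
        parent_pages.append({"page": url, "rotation_deg": rotation_deg})
    body = {"parent_pages": parent_pages}
    if target_queue is not None:
        body["target_queue"] = target_queue
    return body
```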
Search for text
Search for text in annotation 319668
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319668/search?phrase=some'
{
"results": [
{
"rectangle": [
67.15157010915198,
545.9286363906203,
87.99106633081445,
563.4617583852776
],
"page": 1
},
{
"rectangle": [
45.27717884130982,
1060.3084761056693,
66.11667506297229,
1077.8415981003266
],
"page": 1
}
],
"status": "ok"
}
GET /v1/annotations/{id}/search
Search for a phrase in the document.
Argument | Type | Description |
---|---|---|
phrase | string | A phrase to search for |
tolerance | integer | Allowed edit distance from the search phrase (the number of removal, insertion or substitution operations needed for the strings to match). Only used for OCR invoices (images, such as PNG files or PDFs with scanned images). The default value is computed as length(phrase)/4. |
Response
Status: 200
Returns results with a list of objects:
Key | Type | Description |
---|---|---|
rectangle | list[float] | Bounding box of an occurrence. |
page | integer | Page of occurrence. |
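Because phrase is passed in the query string, it should be URL-encoded. A sketch with hypothetical helper names that builds the URL and groups the returned rectangles by page; urlencode encodes spaces as +.

```python
from urllib.parse import urlencode


def text_search_url(base_url, annotation_id, phrase, tolerance=None):
    """URL for GET /annotations/{id}/search; tolerance is optional."""
    params = {"phrase": phrase}
    if tolerance is not None:
        params["tolerance"] = tolerance
    return f"{base_url}/annotations/{annotation_id}/search?{urlencode(params)}"


def occurrences_by_page(results):
    """Group the bounding boxes from the results list by page number."""
    grouped = {}
    for hit in results:
        grouped.setdefault(hit["page"], []).append(hit["rectangle"])
    return grouped
```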
Search for annotations
Supported ordering: id, arrived_at, assigned_at, assignees, automated, confirmed_at, confirmed_by__username, confirmed_by, created_at, creator__username, creator, deleted_at, deleted_by__username, deleted_by, document, exported_at, exported_by__username, exported_by, export_failed_at, has_email_thread_with_new_replies, has_email_thread_with_replies, labels, modified_at, modifier__username, modifier, original_file_name, purged_at, purged_by__username, purged_by, queue, rejected_at, rejected_by__username, rejected_by, relations__key, relations__parent, relations__type, rir_poll_id, status, workspace, email_thread, email_sender, field.<schema_id>.<format> (where format is one of number, date, string).
Obtain only annotations matching a complex filter
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' \
-d '{"query": {"$and": [{"field.vendor_name.string": {"$eq": "ACME corp"}}, {"labels": {"$in": ["https://<example>.rossum.app/api/v1/labels/12", "https://<example>.rossum.app/api/v1/labels/34"]}}]}, "query_string": {"string": "explosives"}}' \
'https://<example>.rossum.app/api/v1/annotations/search?ordering=status,confirmed_by__username,field.amount_total.number'
{
"pagination": {
"total": 101,
"total_pages": 6,
"next": "https://<example>.rossum.app/api/v1/annotations/search?search_after=eyJxdWVyeV9oYXNoIjogImM2ZWIzNjA5MDI1NWNmNTg4ODk0YWE5MGZiMjVmZjBlIiwgInNlYXJjaF9hZnRlciI6IFsxNTg2NTMwMzI0MDAwLCAyXSwgInJldmVyc2VkIjogZmFsc2V9%3A1NYBmgNCV-Ssmf7G9rd9vXnBY-BuvCZWrD95wcb2jIg",
"previous": null
},
"results": [
{
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"document": "https://<example>.rossum.app/api/v1/documents/315877",
...
}
]
}
POST /v1/annotations/search
Search for annotations matching a complex filter
Key | Type | Description |
---|---|---|
query | object | A subset of MongoDB Query Language (see query definition below) |
query_string | object | Object with configuration for full-text search (see query string definition below) |
If query_string is used together with query, the search is done as a conjunction of these expressions (query_string AND query).
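A sketch that assembles the body used in the example above; field_filter and search_body are hypothetical helpers, and the type check reflects the string, number and date types described below.

```python
FIELD_TYPES = ("string", "number", "date")


def field_filter(schema_id, value_type, condition):
    """Filter on annotation content, e.g. field.vendor_name.string."""
    if value_type not in FIELD_TYPES:
        raise ValueError(f"value_type must be one of {FIELD_TYPES}")
    return {f"field.{schema_id}.{value_type}": condition}


def search_body(filters=(), full_text=None):
    """Combine filters under $and and add an optional query_string.

    When both parts are present, the API combines them with AND.
    """
    body = {}
    if filters:
        body["query"] = {"$and": list(filters)}
    if full_text is not None:
        body["query_string"] = {"string": full_text}
    return body
```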
Search Query
A list of definitions under a $and key:
Key | Type | Description |
---|---|---|
<meta_field> | object | Matches against annotation metadata according to <meta_field>. (See definition below) |
field.<schema_id>.<type> | object | Matches against annotation content value according to <schema_id> treating it as <type>. (See definition below) |
In field.<schema_id>.<type>, <type> is one of string, number or date (in ISO 8601 format). String values can be at most 256 characters long.
<meta_field> can be one of:
Meta field name | Type |
---|---|
annotation | URL |
arrived_at | date |
assigned_at | date |
assignees | URL |
automated | bool |
automatically_rejected | bool |
confirmed_at | date |
confirmed_by__username | string |
confirmed_by | URL |
created_at | date |
creator__username | string |
creator | URL |
deleted_at | date |
deleted_by__username | string |
deleted_by | URL |
document | URL |
exported_at | date |
exported_by__username | string |
exported_by | URL |
has_email_thread_with_new_replies | bool |
has_email_thread_with_replies | bool |
labels | URL |
messages | string |
modified_at | date |
modifier__username | string |
modifier | URL |
original_file_name | string |
purged_at | date |
purged_by__username | string |
purged_by | URL |
queue | URL |
rejected_at | date |
rejected_by__username | string |
rejected_by | URL |
relations__key | string |
relations__parent | URL |
relations__type | string |
restricted_access | bool |
rir_poll_id | string |
status | string |
workspace | URL |
email_thread | URL |
email_sender | string |
Search Query Objects
Key | Type | Description |
---|---|---|
$startsWith | string | Matches the start of a value. Must be at least 2 characters long. |
$anyTokenStartsWith | string | Matches the start of each token within a string. Must be at least 2 characters long. |
$containsPrefixes | string | Same as $anyTokenStartsWith, but the query is split into tokens (words). Must be at least 2 characters long. Example: the query quick brown matches quick brown fox, but also brown quick dog or quickiest brown fox; it does not match quick dog. |
$emptyOrMissing | bool | Matches values that are empty or missing. When false, matches existing non-empty values. |
$eq, $ne | number, string, date, URL | Default MQL behavior |
$gt, $lt, $gte, $lte | number, string, date | Default MQL behavior |
$in, $nin | list[number, string, URL] | Default MQL behavior |
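Client-side checks can catch malformed query objects before a request is sent. The sketch below encodes the rules from the table (prefix operators need at least 2 characters, $emptyOrMissing takes a boolean, $in and $nin take lists) and rejects operators not listed; it is a hypothetical helper, and the server remains the authority.

```python
PREFIX_OPERATORS = {"$startsWith", "$anyTokenStartsWith", "$containsPrefixes"}
COMPARISON_OPERATORS = {"$eq", "$ne", "$gt", "$lt", "$gte", "$lte"}
LIST_OPERATORS = {"$in", "$nin"}


def check_condition(condition):
    """Raise if a search query object uses an operator incorrectly."""
    for operator, value in condition.items():
        if operator in PREFIX_OPERATORS:
            if not isinstance(value, str) or len(value) < 2:
                raise ValueError(f"{operator} needs a string of at least 2 characters")
        elif operator == "$emptyOrMissing":
            if not isinstance(value, bool):
                raise TypeError("$emptyOrMissing expects true or false")
        elif operator in LIST_OPERATORS:
            if not isinstance(value, list):
                raise TypeError(f"{operator} expects a list")
        elif operator not in COMPARISON_OPERATORS:
            raise ValueError(f"unsupported operator: {operator}")
    return condition
```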
Related objects can be sideloaded and query fields can be used in the same way as when listing annotations.
Response
Status: 200
Returns a paginated response with a list of annotation objects, as in the annotations list.
Status: 410
The value of search_after is no longer valid. Retry the search with a different value.
Search Query String
Obtain only annotations with values matching the prefix expl (such as explosive)
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' \
-d '{"query_string": {"string": "expl"}}' \
'https://<example>.rossum.app/api/v1/annotations/search?ordering=status,confirmed_by__username,field.amount_total.number'
{
"pagination": {
"total": 101,
"total_pages": 6,
"next": "https://<example>.rossum.app/api/v1/annotations/search?search_after=eyJxdWVyeV9oYXNoIjogImM2ZWIzNjA5MDI1NWNmNTg4ODk0YWE5MGZiMjVmZjBlIiwgInNlYXJjaF9hZnRlciI6IFsxNTg2NTMwMzI0MDAwLCAyXSwgInJldmVyc2VkIjogZmFsc2V9%3A1NYBmgNCV-Ssmf7G9rd9vXnBY-BuvCZWrD95wcb2jIg",
"previous": null
},
"results": [
{
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"document": "https://<example>.rossum.app/api/v1/documents/315877",
...
}
]
}
Apply full-text search to datapoint values using a chosen term. Values are matched by prefix, separately for each whitespace-separated term, in a case-insensitive way. Special characters at the end of the terms are ignored. For example, for the value Large drink, each of the following search strings gives a match: lar#, lar dri, dri.
The search also covers non-extracted page data, if the data are available.
If query_string is used together with query, the search is done as a conjunction of these expressions (query_string AND query).
Key | Type | Description |
---|---|---|
string | string | String to be used for full-text search. At least 2 characters must be passed for this search to apply; strings can be at most 256 characters long. |
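A hedged sketch: query_string_body enforces the 2 to 256 character limits, and local_prefix_match approximates the matching rules described above for a single value so you can reason about expected hits. The local matcher is an illustration built on assumptions; the server may tokenize differently and also searches non-extracted page data. Both names are hypothetical.

```python
import re


def query_string_body(text):
    """Full-text search part of the body, with the documented length limits."""
    if not 2 <= len(text) <= 256:
        raise ValueError("query_string.string must be 2 to 256 characters long")
    return {"query_string": {"string": text}}


def local_prefix_match(search, value):
    """Approximate the prefix rules locally.

    Each whitespace-separated search term is lowercased, stripped of trailing
    non-word characters, and must be a prefix of some token of value.
    """
    value_tokens = value.lower().split()
    terms = [re.sub(r"\W+$", "", term.lower()) for term in search.split()]
    return all(
        any(token.startswith(term) for token in value_tokens)
        for term in terms
        if term
    )
```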
Annotation search pagination
Pagination is controlled by query parameters of the URL. The request body and ordering must not change while paging through results; otherwise, a 400 response is returned.
Key | Default | Type | Description |
---|---|---|---|
page_size | 20 | int | Number of results per page. The maximum value is 500 (*) |
search_after | null | string | Encoded value acting as a cursor (do not modify it; for internal purposes only). |
(*) For requests that sideload content, the maximum value is limited to 100. Sideloading content for this endpoint is deprecated and will be removed in the near future.
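A sketch of paging through search results. post_json is a hypothetical, caller-supplied function that POSTs JSON and returns the decoded body; the same body is re-sent for every page, as required above, while the URL follows pagination.next. The page-size clamp mirrors the documented limits.

```python
MAX_PAGE_SIZE = 500
MAX_PAGE_SIZE_WITH_CONTENT = 100  # when sideloading content (deprecated)


def clamp_page_size(requested, sideloads_content=False):
    """Keep page_size within the documented maximum."""
    limit = MAX_PAGE_SIZE_WITH_CONTENT if sideloads_content else MAX_PAGE_SIZE
    return max(1, min(requested, limit))


def iter_search_results(post_json, first_url, body):
    """Yield annotations from every page of POST /annotations/search."""
    url = first_url
    while url:
        page = post_json(url, body)  # identical body for every page
        yield from page["results"]
        url = page["pagination"]["next"]  # carries the search_after cursor
```

If a later page answers 410, the search_after cursor has expired and the search should be restarted from the first URL.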
Convert grid to table data
Convert grid to tabular data in annotation 319623
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319623/content/37507202/transform_grid_to_datapoints'
POST /v1/annotations/{id}/content/{id of the child node}/transform_grid_to_datapoints
Transform the grid structure into tabular data of the related multivalue object.
Response
Status: 200
All tuple datapoints and their children are returned.
Add new row to multivalue datapoint
Add a row to multivalue 37507202 in annotation 319623
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/319623/content/37507202/add_empty'
POST /v1/annotations/{id}/content/{id of the child node}/add_empty
Adds a row to a multivalue table. This row will not be connected to the grid and modifications of the grid will not trigger any OCR on the cells of this row.
Response
Status: 200
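Both table operations above are body-less POSTs on a multivalue node of the annotation content. A sketch with a hypothetical helper name and a placeholder base URL:

```python
import urllib.request

CONTENT_ACTIONS = {"transform_grid_to_datapoints", "add_empty"}


def content_action_request(base_url, token, annotation_id, datapoint_id, action):
    """Build POST /annotations/{id}/content/{datapoint_id}/{action}."""
    if action not in CONTENT_ACTIONS:
        raise ValueError(f"unsupported content action: {action!r}")
    return urllib.request.Request(
        f"{base_url}/annotations/{annotation_id}/content/{datapoint_id}/{action}",
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )
```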
Validate annotation content
Validate the content of annotation 319623
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"updated_datapoint_ids": [37507204]}' \
'https://<example>.rossum.app/api/v1/annotations/319623/content/validate'
{
"messages": [
{
"id": "1038654",
"type": "error",
"content": "required",
"detail": {
"hook_id": "42345",
"hook_name": "Webhook 8365",
"request_id": "6166deb3-2f89-4fc2-9359-56cc8e3838e4",
"is_exception": true,
"timestamp": "2022-10-10T15:00:00.000000Z"
}
},
{
"id": "all",
"type": "error",
"content": "Whole document is invalid.",
"detail": {
"hook_id": "94634",
"hook_name": "Function 4934",
"request_id": "5477aeb2-8f43-3fe1-9279-23bc8e4121e5",
"is_exception": true,
"timestamp": "2022-10-10T15:00:00.000000Z"
}
},
{
"id": "1038456",
"type": "aggregation",
"content": "246.456",
"aggregation_type": "sum",
"schema_id": "vat_detail_tax2"
}
],
"updated_datapoints": [
{
"id": 37507205,
"url": "https://<example>.rossum.app/api/v1/annotations/319623/content/37507205",
"content": {
"value": "new value",
"page": 1,
"position": [
0.0,
1.0,
2.0,
3.0
],
"rir_text": null,
"rir_page": null,
"rir_position": null,
"rir_confidence": null,
"connector_position": [
0.0,
1.0,
2.0,
3.0
],
"connector_text": "new value"
},
"category": "datapoint",
"schema_id": "vat_rate",
"validation_sources": [
"connector",
"history"
],
"time_spent": 0.0,
"time_spent_overall": 0.0,
"options": [
{
"value": "value",
"label": "label"
}
],
"hidden": false
}
],
"suggested_operations": [
{
"op": "replace",
"id": "198143",
"value": {
"content": {
"value": "John",
"position": [
103,
110,
121,
122
],
"page": 1
},
"hidden": false,
"options": [],
"validation_sources": [
"human"
]
}
},
{
"op": "remove",
"id": "884061"
}
],
"matched_trigger_rules": [
{
"type": "page_count",
"value": 24,
"threshold": 10
},
{
"type": "filename",
"value": "spam.pdf",
"regex": "^spam.*"
},
{
"id": 198143,
"value": "foobar",
"type": "datapoint"
}
]
}
POST /v1/annotations/{id}/content/validate
Validate the content of an annotation.
First, the content is sent to the validate hook of the connected extension.
Then Rossum carries out its standard validations (data type and constraint checks).
Additionally, if the annotation's queue has delete recommendation conditions enabled, they are evaluated as well.
Key | Type | Description |
---|---|---|
actions | list[enum] | Validation actions. Possible values: ["user_update"], ["user_update", "updated"] or ["user_update", "started"] (default: ["user_update"]). |
updated_datapoint_ids | list[int] | List of IDs of datapoints that were changed since the last call of this endpoint. |
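The following is a minimal Python sketch (standard library only) that builds the validate request from the curl example above. BASE_URL and TOKEN are hypothetical placeholders, and build_validate_request is an illustrative helper, not part of any Rossum SDK. The helper only builds the request; sending it with urllib.request.urlopen is left to the caller.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your organization's API root and token.
BASE_URL = "https://example.rossum.app/api/v1"
TOKEN = "db313f24f5738c8e04635e036ec8a45cdd6d6b03"


def build_validate_request(annotation_id, updated_datapoint_ids, actions=None):
    """Build (but do not send) POST /annotations/{id}/content/validate."""
    payload = {"updated_datapoint_ids": list(updated_datapoint_ids)}
    if actions is not None:
        # When "actions" is omitted, the server default ["user_update"] applies.
        payload["actions"] = list(actions)
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/content/validate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_validate_request(319623, [37507204])
```

Leaving actions out of the body, rather than always sending ["user_update"], keeps the request identical to the curl example and defers to the documented default.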
Response
Status: 200
Key | Type | Description |
---|---|---|
messages | list[object] | Messages produced by the validation (see Messages). |
updated_datapoints | list[object] | Datapoints updated by an extension (see Updated datapoints). |
suggested_operations | list[object] | Datapoint operations suggested as a result of validation. |
matched_trigger_rules | list[object] | Delete Recommendation rules that matched. |
Messages
The message object contains attributes:
Key | Type | Description |
---|---|---|
id | string | ID of the concerned datapoint; "all" for document-wide issues. |
type | enum | One of: error, warning, info or aggregation. |
content | string | A message shown in the UI. Limited to 4096 characters. |
detail | object | Detail object that enhances the response from a hook. |
aggregation_type (*) | enum | Type of aggregation (currently only "sum" is supported). |
schema_id (*) | string | Identifier of the schema datapoint for which the aggregation is computed. |
(*) Attribute present only in messages with type "aggregation".
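A short sketch of how a client might sort the messages of a validate response by type. The function names are illustrative, and the sample data is trimmed from the response example above. An id of "all" marks a document-wide message rather than a datapoint ID.

```python
from collections import defaultdict


def group_messages(validation_response):
    """Group messages by their "type" (error, warning, info, aggregation)."""
    grouped = defaultdict(list)
    for message in validation_response.get("messages", []):
        grouped[message["type"]].append(message)
    return dict(grouped)


def error_messages(validation_response):
    """Return (datapoint id or "all", message text) pairs for error messages."""
    return [
        (m["id"], m["content"])
        for m in validation_response.get("messages", [])
        if m["type"] == "error"
    ]


# Trimmed copy of the response example above.
sample = {
    "messages": [
        {"id": "1038654", "type": "error", "content": "required"},
        {"id": "all", "type": "error", "content": "Whole document is invalid."},
        {"id": "1038456", "type": "aggregation", "content": "246.456",
         "aggregation_type": "sum", "schema_id": "vat_detail_tax2"},
    ]
}
```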
Message detail
The message detail object is present only in responses to annotation_content hook events and contains the following attributes:
Key | Type | Description |
---|---|---|
hook_id | int | ID of the responding hook. |
hook_name | string | Name of the responding hook. |
request_id | string | ID of the request preceding this hook's response. |
is_exception | bool | Flag signaling non-200 response from the hook. |
timestamp | string | Timestamp of the request preceding this hook's response. |
Updated datapoints
The updated datapoint object contains the subtrees of datapoints updated from an extension.
Suggested operations
Suggestions use the same format as the operations that can be specified in requests; refer to the annotation data API for a complete description.
Matched trigger rules
Every matched rule contains the base attribute below; the remaining fields depend on the value of type and are subject to change.
Key | Type | Description |
---|---|---|
type | string | One of "page_count", "filename", "datapoint". |
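The remaining fields of a matched rule are subject to change, so this sketch reads only the stable type attribute. It also assumes that a non-empty matched_trigger_rules list means a delete recommendation applies. Both helper names are hypothetical.

```python
def matched_rule_types(validation_response):
    """Return the set of matched rule types, e.g. {"page_count", "filename"}."""
    return {rule["type"] for rule in validation_response.get("matched_trigger_rules", [])}


def delete_recommended(validation_response):
    """Assumption: any matched rule counts as a delete recommendation."""
    return bool(validation_response.get("matched_trigger_rules"))


# Trimmed copy of the response example above.
sample = {
    "matched_trigger_rules": [
        {"type": "page_count", "value": 24, "threshold": 10},
        {"type": "filename", "value": "spam.pdf", "regex": "^spam.*"},
    ]
}
```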
List all annotations
List all annotations
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations'
{
"pagination": {
"total": 22,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/315877",
"id": 315777,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
"schema": "https://<example>.rossum.app/api/v1/schemas/31336",
"pages": [
"https://<example>.rossum.app/api/v1/pages/561206"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"modified_by": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"export_failed_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "to_review",
"rir_poll_id": "54f6b9ecfa751789f71ddf12",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"time_spent": 0,
"metadata": {},
...
},
{
...
}
]
}
GET /v1/annotations
Retrieve all annotation objects.
Supported ordering: document, document__arrived_at, document__original_file_name, modifier, modifier__username, modified_by, modified_by__username, creator, creator__username, queue, status, created_at, assigned_at, confirmed_at, modified_at, exported_at, export_failed_at, purged_at, rejected_at, deleted_at, confirmed_by, deleted_by, exported_by, purged_by, rejected_by, confirmed_by__username, deleted_by__username, exported_by__username, purged_by__username, rejected_by__username.
Filters
Obtain only annotations with parent annotation 1500
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations?relations__parent=1500'
{
"pagination": {
"total": 2,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/2",
"id": 2,
"queue": "https://<example>.rossum.app/api/v1/queues/1",
"schema": "https://<example>.rossum.app/api/v1/schemas/1",
"relations": [
"https://<example>.rossum.app/api/v1/relations/1"
],
...
"url": "https://<example>.rossum.app/api/v1/annotations/2",
...
},
{
"document": "https://<example>.rossum.app/api/v1/documents/3",
"id": 3,
"queue": "https://<example>.rossum.app/api/v1/queues/2",
"schema": "https://<example>.rossum.app/api/v1/schemas/2",
"relations": [
"https://<example>.rossum.app/api/v1/relations/1"
],
...
"url": "https://<example>.rossum.app/api/v1/annotations/3",
...
}
]
}
Filters may be specified to limit annotations to be listed.
Attribute | Description |
---|---|
status | Annotation status, multiple values may be separated using a comma |
id | List of ids separated by a comma |
modifier | User id |
confirmed_by | User id |
deleted_by | User id |
exported_by | User id |
purged_by | User id |
rejected_by | User id |
assignees | User id, multiple values may be separated using a comma |
labels | Label id, multiple values may be separated using a comma |
document | Document id |
queue | List of queue ids separated by a comma |
queue__workspace | List of workspace ids separated by a comma |
relations__parent | ID of parent annotation defined in related Relation object |
relations__type | Type of Relation that annotation belongs to |
relations__key | Key of Relation that annotation belongs to |
arrived_at_before | ISO 8601 timestamp (e.g. arrived_at_before=2019-11-15 ) |
arrived_at_after | ISO 8601 timestamp (e.g. arrived_at_after=2019-11-14 ) |
assigned_at_before | ISO 8601 timestamp (e.g. assigned_at_before=2019-11-15 ) |
assigned_at_after | ISO 8601 timestamp (e.g. assigned_at_after=2019-11-14 ) |
confirmed_at_before | ISO 8601 timestamp (e.g. confirmed_at_before=2019-11-15 ) |
confirmed_at_after | ISO 8601 timestamp (e.g. confirmed_at_after=2019-11-14 ) |
modified_at_before | ISO 8601 timestamp (e.g. modified_at_before=2019-11-15 ) |
modified_at_after | ISO 8601 timestamp (e.g. modified_at_after=2019-11-14 ) |
deleted_at_before | ISO 8601 timestamp (e.g. deleted_at_before=2019-11-15 ) |
deleted_at_after | ISO 8601 timestamp (e.g. deleted_at_after=2019-11-14 ) |
exported_at_before | ISO 8601 timestamp (e.g. exported_at_before=2019-11-14 22:00:00 ) |
exported_at_after | ISO 8601 timestamp (e.g. exported_at_after=2019-11-14 12:00:00 ) |
export_failed_at_before | ISO 8601 timestamp (e.g. export_failed_at_before=2019-11-14 22:00:00 ) |
export_failed_at_after | ISO 8601 timestamp (e.g. export_failed_at_after=2019-11-14 12:00:00 ) |
purged_at_before | ISO 8601 timestamp (e.g. purged_at_before=2019-11-15 ) |
purged_at_after | ISO 8601 timestamp (e.g. purged_at_after=2019-11-14 ) |
rejected_at_before | ISO 8601 timestamp (e.g. rejected_at_before=2019-11-15 ) |
rejected_at_after | ISO 8601 timestamp (e.g. rejected_at_after=2019-11-14 ) |
restricted_access | Boolean |
automated | Boolean |
has_email_thread_with_replies | Boolean (the related email thread contains more than one incoming email) |
has_email_thread_with_new_replies | Boolean (the related email thread contains an unread incoming email) |
search | String, see Annotation search |
Annotation search
When this filter is used, annotations are filtered by a full-text search over the annotation's datapoint values, original file name, modifier's email and messages. The search string is limited to 256 characters.
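This sketch builds a filtered list URL, joining multi-value filters with commas as the table describes. BASE_URL is a hypothetical placeholder, and spelling booleans as "true"/"false" is an assumption. The 256-character check mirrors the search limit stated above.

```python
from urllib.parse import urlencode

# Hypothetical placeholder for your organization's API root.
BASE_URL = "https://example.rossum.app/api/v1"


def annotations_list_url(**filters):
    """Build a GET /annotations URL from filter keyword arguments."""
    search = filters.get("search")
    if search is not None and len(search) > 256:
        raise ValueError("search is limited to 256 characters")
    params = {}
    for key, value in filters.items():
        if isinstance(value, bool):
            # Assumed spelling of boolean filter values.
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Multiple values are separated using a comma.
            value = ",".join(str(item) for item in value)
        params[key] = value
    # safe="," keeps commas literal instead of encoding them as %2C.
    return f"{BASE_URL}/annotations?{urlencode(params, safe=',')}"


url = annotations_list_url(status=["to_review", "postponed"], queue=[8236], search="ACME invoice")
```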
Query fields
Obtain only a subset of annotation attributes
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations?fields=id,url'
{
"pagination": {
"total": 22,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 320332,
"url": "https://<example>.rossum.app/api/v1/annotations/320332"
},
{
"id": 319668,
"url": "https://<example>.rossum.app/api/v1/annotations/319668"
},
...
]
}
To obtain only a subset of annotation object attributes, use the fields query parameter.
Argument | Description |
---|---|
fields | Comma-separated list of attributes to be included in the response. |
fields! | Comma-separated list of attributes to be excluded from the response. |
Sideloading
Sideload documents, modifiers and content
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations?sideload=modifiers,documents,content&content.schema_id=item_amount_total'
{
"pagination": {
"total": 22,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/320432",
"id": 320332,
...,
"modifier": "https://<example>.rossum.app/api/v1/users/10775",
"status": "to_review",
"rir_poll_id": "a898b6bdc8964721b38e0160",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/320332",
"content": "https://<example>.rossum.app/api/v1/annotations/320332/content",
"time_spent": 0,
"metadata": {}
},
...
],
"documents": [
{
"id": 320432,
"url": "https://<example>.rossum.app/api/v1/documents/320432",
...
},
...
],
"modifiers": [
{
"id": 10775,
"url": "https://<example>.rossum.app/api/v1/users/10775",
...
},
...
],
"content": [
{
"id": 19434,
"url": "https://<example>.rossum.app/api/v1/annotations/320332/content/19434",
"category": "datapoint",
"schema_id": "item_amount_total",
...
}
...
]
}
Sideload content filtered by schema_id
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations?sideload=content&content.schema_id=sender_id,vat_detail_tax'
{
"pagination": {
"total": 22,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"document": "https://<example>.rossum.app/api/v1/documents/320432",
"id": 320332,
...,
"modifier": "https://<example>.rossum.app/api/v1/users/10775",
"status": "to_review",
"rir_poll_id": "a898b6bdc8964721b38e0160",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/320332",
"content": "https://<example>.rossum.app/api/v1/annotations/320332/content",
"time_spent": 0,
"metadata": {}
},
...
],
"content": [
{
"id": 15984,
"url": "https://<example>.rossum.app/api/v1/annotations/320332/content/15984",
"category": "datapoint",
"schema_id": "sender_id",
...
},
{
"id": 15985,
"url": "https://<example>.rossum.app/api/v1/annotations/320332/content/15985",
"category": "datapoint",
"schema_id": "vat_detail_tax",
...
},
...
]
}
To reduce the number of requests needed to obtain useful information about annotations, related objects such as modifiers and documents can be sideloaded using the sideload query parameter. This parameter accepts a comma-separated list of keywords: assignees, automation_blockers, confirmed_bys, content, deleted_bys, documents, emails, exported_bys, labels, modifiers, notes, organizations, pages, purged_bys, queues, rejected_bys, related_emails, relations, child_relations, schemas, suggested_edits, workspaces.
The response is then enriched by the requested keys, which contain lists of the sideloaded objects. Sideloaded content can be filtered by schema_id to obtain only a subset of datapoints in the content part of the response; this feature is deprecated and will be removed in the future.
The filter on content is specified using the content.schema_id query parameter, which accepts a comma-separated list of required schema_ids.
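A sideloaded response references related objects by URL, so a client can index each sideloaded list by its url field and resolve the references locally. This is a minimal sketch; the helper names are illustrative, and the sample is trimmed from the sideloading example above.

```python
def index_sideloaded(response, key):
    """Index a sideloaded list (e.g. "documents", "modifiers") by object URL."""
    return {obj["url"]: obj for obj in response.get(key, [])}


def resolve_sideloaded(response, attribute, key):
    """Pair each annotation id with the sideloaded object its attribute points to."""
    index = index_sideloaded(response, key)
    return [
        (annotation["id"], index.get(annotation.get(attribute)))
        for annotation in response.get("results", [])
    ]


sample = {
    "results": [
        {"id": 320332,
         "document": "https://example.rossum.app/api/v1/documents/320432",
         "modifier": "https://example.rossum.app/api/v1/users/10775"},
    ],
    "documents": [{"id": 320432, "url": "https://example.rossum.app/api/v1/documents/320432"}],
    "modifiers": [{"id": 10775, "url": "https://example.rossum.app/api/v1/users/10775"}],
}
```

Unresolved references (for example, a modifier of null) map to None instead of raising an error.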
Response
Status: 200
Returns paginated response with a list of annotation objects.
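The paginated response carries a pagination.next URL, which is null on the last page. This sketch follows those links and yields every annotation. The fetch parameter is a hypothetical hook that lets the function be exercised without network access. By default, it performs an authenticated GET using only the standard library.

```python
import json
import urllib.request


def iter_annotations(first_url, token, fetch=None):
    """Yield annotation objects from every page by following pagination.next."""

    def default_fetch(url):
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    fetch = fetch or default_fetch
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        # "next" is null (None) on the last page, which ends the loop.
        url = page["pagination"]["next"]
```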
Create an annotation
Create an annotation
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"status": "created", "document": "https://<example>.rossum.app/api/v1/documents/315877", "queue": "https://<example>.rossum.app/api/v1/queues/8236", "content_data": [{"category": "datapoint", "schema_id": "doc_id", "content": {"value": "122"}, "validation_sources": []}], "values": {}, "metadata": {}}' \
'https://<example>.rossum.app/api/v1/annotations'
{
"document": "https://<example>.rossum.app/api/v1/documents/315877",
"id": 315777,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
"schema": "https://<example>.rossum.app/api/v1/schemas/31336",
"pages": [
"https://<example>.rossum.app/api/v1/pages/561206"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"modified_by": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "created",
"rir_poll_id": null,
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"time_spent": 0,
"metadata": {},
"related_emails": [],
"email": null,
...
}
POST /v1/annotations
Create an annotation object.
Normally, annotations are created via the upload endpoint.
This endpoint can be used to create annotation instances, including their content, with status set to an explicitly requested value. Currently only created is supported; annotations in this status are not touched by the rest of the platform and are not visible in the Rossum UI. This allows subsequent updates before the status is switched to importing, so that the annotation is passed through the rest of the upload pipeline.
The intended use case is the upload.created hook event, in which new annotations can be created; the platform runtime then switches the status of all such annotations to importing.
Key | Type | Description | Required |
---|---|---|---|
status | enum | Requested annotation status. Only created is currently supported. | Yes |
document | URL | Annotation document. | Yes |
queue | URL | Target queue. | Yes |
content_data | list[object] | Array of annotation data content objects. | No |
values | object | Values object as described in upload endpoint. | No |
metadata | object | Client data. | No |
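This sketch builds the request body for creating an annotation. It always sets status to created, the only supported value, and includes the optional keys only when they are given. The function name is illustrative.

```python
import json


def build_create_payload(document_url, queue_url, content_data=None, values=None, metadata=None):
    """Build the JSON body for POST /v1/annotations (status is always "created")."""
    payload = {"status": "created", "document": document_url, "queue": queue_url}
    # Optional keys are sent only when the caller provides them.
    for key, value in (("content_data", content_data), ("values", values), ("metadata", metadata)):
        if value is not None:
            payload[key] = value
    return json.dumps(payload)


body = build_create_payload(
    "https://example.rossum.app/api/v1/documents/315877",
    "https://example.rossum.app/api/v1/queues/8236",
    content_data=[{"category": "datapoint", "schema_id": "doc_id",
                   "content": {"value": "122"}, "validation_sources": []}],
    metadata={},
)
```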
Response
Status: 200
Returns annotation object.
Retrieve an annotation
Get annotation object 315777
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777'
{
"document": "https://<example>.rossum.app/api/v1/documents/315877",
"id": 315777,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
"schema": "https://<example>.rossum.app/api/v1/schemas/31336",
"pages": [
"https://<example>.rossum.app/api/v1/pages/561206"
],
"creator": "https://<example>.rossum.app/api/v1/users/1",
"modifier": null,
"modified_by": null,
"assigned_at": null,
"created_at": "2021-04-26T10:08:03.856648Z",
"confirmed_at": null,
"deleted_at": null,
"exported_at": null,
"export_failed_at": null,
"modified_at": null,
"purged_at": null,
"rejected_at": null,
"confirmed_by": null,
"deleted_by": null,
"exported_by": null,
"purged_by": null,
"rejected_by": null,
"status": "to_review",
"rir_poll_id": "54f6b9ecfa751789f71ddf12",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"time_spent": 0,
"metadata": {},
"related_emails": [],
"email": null,
...
}
GET /v1/annotations/{id}
Get an annotation object.
Response
Status: 200
Returns annotation object.
Update an annotation
Update annotation object 315777
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"document": "https://<example>.rossum.app/api/v1/documents/315877", "queue": "https://<example>.rossum.app/api/v1/queues/8236", "status": "postponed"}' \
'https://<example>.rossum.app/api/v1/annotations/315777'
{
"document": "https://<example>.rossum.app/api/v1/documents/315877",
"id": 315777,
"queue": "https://<example>.rossum.app/api/v1/queues/8236",
...
"status": "postponed",
"rir_poll_id": "a898b6bdc8964721b38e0160",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"time_spent": 0,
"metadata": {},
"related_emails": [],
"email": null
}
PUT /v1/annotations/{id}
Update annotation object.
Response
Status: 200
Returns updated annotation object.
Update part of an annotation
Update status of annotation object 315777
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"status": "deleted"}' \
'https://<example>.rossum.app/api/v1/annotations/315777'
{
"document": "https://<example>.rossum.app/api/v1/documents/315877",
"id": 315777,
...
"status": "deleted",
"rir_poll_id": "a898b6bdc8964721b38e0160",
"messages": null,
"url": "https://<example>.rossum.app/api/v1/annotations/315777",
"content": "https://<example>.rossum.app/api/v1/annotations/315777/content",
"time_spent": 0,
"metadata": {},
"related_emails": [],
"email": null
}
PATCH /v1/annotations/{id}
Update part of annotation object.
Response
Status: 200
Returns updated annotation object.
Copy annotation
Copy annotation 315777 to queue 8236
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"target_queue": "https://<example>.rossum.app/api/v1/queues/8236", "target_status": "to_review"}' \
'https://<example>.rossum.app/api/v1/annotations/315777/copy'
{
"annotation": "https://<example>.rossum.app/api/v1/annotations/320332"
}
POST /v1/annotations/{id}/copy
Make a copy of an annotation in another queue. All data and metadata are copied.
Key | Description |
---|---|
target_queue | URL of queue, where the copy should be placed. |
target_status | Status of copied annotation (if not set, it stays the same) |
To reimport the copied annotation directly, use the reimport=true query parameter (such an annotation will be billed).
Response
Status: 200
Returns URL of the new annotation object.
Delete annotation
Delete annotation 315777
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777'
DELETE /v1/annotations/{id}
Delete an annotation object from the database. This also deletes the related page objects.
Never call this internal API; mark the annotation as deleted instead.
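To mark an annotation as deleted rather than calling DELETE, a client can send the PATCH request shown under "Update part of an annotation". This sketch builds that request with the standard library. BASE_URL and the helper name are hypothetical.

```python
import json
import urllib.request

# Hypothetical placeholder for your organization's API root.
BASE_URL = "https://example.rossum.app/api/v1"


def build_mark_deleted_request(annotation_id, token):
    """PATCH the annotation status to "deleted" instead of removing the object."""
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}",
        data=json.dumps({"status": "deleted"}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


req = build_mark_deleted_request(315777, "db313f24f5738c8e04635e036ec8a45cdd6d6b03")
```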
Response
Status: 204
Get suggested email recipients
Get suggested email recipients for annotations 315777 and 78590
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"annotations": ["https://<example>.rossum.app/api/v1/annotations/315777", "https://<example>.rossum.app/api/v1/annotations/78590"]}' \
'https://<example>.rossum.app/api/v1/annotations/suggested_recipients'
{
"results": [
{
"source": "email_header",
"email": "don.joe@corp.us",
"name": "Don Joe"
},
...
]
}
POST /v1/annotations/suggested_recipients
Retrieves suggested email recipients for the given annotations, based on the suggested recipients settings of their queues.
Response
Status: 200
Returns a list of source objects.
Suggested recipients source object
Parameter | Description |
---|---|
source | Specifies where the email address was found; see possible sources. |
email | Email address of the suggested recipient. |
name | Name of the suggested recipient: either a value from an email header or a value parsed from the email address. |
Purge deleted annotations
Purge deleted annotations from queue 42
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"queue": "https://<example>.rossum.app/api/v1/queues/42"}' \
'https://<example>.rossum.app/api/v1/annotations/purge_deleted'
POST /v1/annotations/purge_deleted
Start the asynchronous process of purging customer data related to the selected annotations with deleted status. The following operations will happen:
- delete annotation data
- delete pages
- remove content and file names of documents
- remove annotations from relations of type duplicate
- preserve annotation objects and move them to purged status
Key | Type | Required | Description |
---|---|---|---|
annotations | list[URL] | false | List of annotations to be purged |
queue | URL | false | Queue of which the annotations should be purged. |
At least one of the annotations and queue fields must be filled in. The resulting set of annotations is the disjunction of the queue and annotations filters, i.e. annotations matching either filter are selected.
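This sketch builds the purge_deleted request body and enforces the rule that at least one of annotations and queue is given. It sends only the fields that were provided. The helper name is illustrative.

```python
import json


def build_purge_payload(annotations=None, queue=None):
    """Body for POST /v1/annotations/purge_deleted; at least one field is required."""
    if not annotations and not queue:
        raise ValueError("Provide annotations, queue, or both.")
    payload = {}
    if annotations:
        payload["annotations"] = list(annotations)
    if queue:
        payload["queue"] = queue
    return json.dumps(payload)


body = build_purge_payload(queue="https://example.rossum.app/api/v1/queues/42")
```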
Response
Status: 202
This is an asynchronous endpoint: the status of the annotations is changed to purged and related objects are deleted gradually.
Annotation time spent
Time spent information can optionally be passed to the following annotation endpoints: cancel, confirm, delete, edit, postpone, reject.
Confirm annotation 315777 and also update time spent data
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"processing_duration": {"time_spent_active": 10.0, "time_spent_overall": 20.0, "time_spent_edit": 1.0, "time_spent_blockers": 2.0, "time_spent_emails": 3.0, "time_spent_opening": 1.5}}' \
'https://<example>.rossum.app/api/v1/annotations/315777/confirm'
POST /v1/annotations/{id}/cancel
POST /v1/annotations/{id}/confirm
POST /v1/annotations/{id}/delete
POST /v1/annotations/{id}/edit
POST /v1/annotations/{id}/postpone
POST /v1/annotations/{id}/reject
See annotation processing duration object.
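This sketch builds the processing_duration body from the confirm example above. The accepted keys are limited here to the six shown in that example; the authoritative list is in the annotation processing duration object reference, which is not reproduced in this section. Rejecting any other key is a design choice of the sketch, not documented API behavior.

```python
import json

# Keys taken from the confirm example above (an assumption, not an exhaustive list).
KNOWN_KEYS = {
    "time_spent_active", "time_spent_overall", "time_spent_edit",
    "time_spent_blockers", "time_spent_emails", "time_spent_opening",
}


def processing_duration_body(**seconds):
    """Build {"processing_duration": {...}} with values in seconds as floats."""
    unknown = set(seconds) - KNOWN_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")
    return json.dumps({"processing_duration": {k: float(v) for k, v in seconds.items()}})


body = processing_duration_body(time_spent_active=10, time_spent_overall=20)
```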
Get page spatial data
Get spatial data for the first two pages of annotation 1421
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/1421/page_data?granularity=words&page_numbers=1,2'
{
"results": [
{
"page_number": 1,
"granularity": "words",
"items": [
{"position": [120, 22, 33, 44], "text": "full"},
{"position": [180, 22, 33, 44], "text": "of"},
{"position": [180, 22, 33, 44], "text": "eels"}
]
},
{
"page_number": 2,
"granularity": "words",
"items": [
{"position": [120, 22, 33, 44], "text": "it"},
{"position": [180, 22, 33, 44], "text": "is"},
{"position": [180, 22, 33, 44], "text": "scratched"}
]
}
]
}
GET /v1/annotations/{id}/page_data
Get the text content of each page, including position coordinates, at the requested granularity: lines, words, characters, or the complete page text.
Query parameters:
Key | Type | Default | Description | Required |
---|---|---|---|---|
granularity | str | | One of lines, words, chars, texts. | Yes |
page_numbers | str | First 20 pages of the document | Comma-separated page numbers. At most 20 page numbers are accepted; any additional ones are silently ignored. | No |
Response
Status: 200
Response result objects consist of the following keys:
Key | Type | Description |
---|---|---|
page_number | int | Number of the page. |
granularity | str | One of lines, words, chars, texts. |
items | list[object] | List of objects divided by the chosen granularity. |
Items consist of the following keys:
Key | Type | Description |
---|---|---|
position | list[int] | Coordinates of the item on the given page. With texts granularity, the items have no position key, since the text value is the full page text. |
text | str | Value of the item. |
Status: 404
Returned if no spatial data are available for the given annotation.
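This sketch joins the item texts of a page_data response into one string per page. It assumes the items are returned in reading order, which this section does not guarantee. It reads only text, so it also works with texts granularity, where items have no position key.

```python
def page_texts(page_data_response, separator=" "):
    """Map page_number -> item texts joined with the separator."""
    return {
        page["page_number"]: separator.join(item["text"] for item in page["items"])
        for page in page_data_response["results"]
    }


# Trimmed copy of the response example above (positions omitted).
sample = {
    "results": [
        {"page_number": 1, "granularity": "words",
         "items": [{"text": "full"}, {"text": "of"}, {"text": "eels"}]},
        {"page_number": 2, "granularity": "words",
         "items": [{"text": "it"}, {"text": "is"}, {"text": "scratched"}]},
    ]
}
```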
Annotation Data
Example annotation data
{
"content": [
{
"id": 27801931,
"url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801931",
"children": [
{
"id": 27801932,
"url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801932",
"content": {
"value": "2183760194",
"normalized_value": "2183760194",
"page": 1,
"position": [
761,
48,
925,
84
],
"rir_text": "2183760194",
"rir_position": [
761,
48,
925,
84
],
"connector_text": null,
"rir_confidence": 0.99234
},
"category": "datapoint",
"schema_id": "document_id",
"validation_sources": [
"score"
],
"time_spent": 0,
"time_spent_overall": 0,
"hidden": false
},
{
"id": 27801933,
"url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801933",
"content": {
"value": "6/8/2018",
"normalized_value": "2018-08-06",
"page": 1,
"position": [
283,
300,
375,
324
],
"rir_text": "6/8/2018",
"rir_position": [
283,
300,
375,
324
],
"connector_text": null,
"rir_confidence": 0.98279
},
"category": "datapoint",
"schema_id": "date_issue",
"validation_sources": [
"score"
],
"time_spent": 0,
"time_spent_overall": 0,
"hidden": false
},
{
"id": 27801934,
"url": "https://<example>.rossum.app/api/v1/annotations/319668/content/27801934",
"content": null,
"category": "datapoint",
"schema_id": "email_button",
"validation_sources": [
"NA"
],
"time_spent": 0,
"time_spent_overall": 0,
"hidden": false
},
...
]
}
]
}
Annotation data is used by the Rossum UI to display annotation data properly. Be aware that values in the value attribute are not normalized (e.g. numbers, dates) and that the data structure may change to accommodate UI requirements.
The top-level content contains a list of section objects. results is currently a copy of content and is deprecated.
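Sections, multivalues and tuples keep their child nodes in children, so a client can walk the tree depth-first to collect datapoints. This is a minimal sketch with illustrative function names. It returns the raw, non-normalized value strings described above, and None where content is null (e.g. buttons).

```python
def iter_datapoints(nodes):
    """Depth-first walk yielding nodes whose category is "datapoint"."""
    for node in nodes:
        if node.get("category") == "datapoint":
            yield node
        # Sections, multivalues and tuples store their child nodes in "children".
        yield from iter_datapoints(node.get("children", []))


def values_by_schema_id(annotation_data):
    """Map schema_id -> list of raw "value" strings."""
    result = {}
    for datapoint in iter_datapoints(annotation_data["content"]):
        content = datapoint.get("content") or {}
        result.setdefault(datapoint["schema_id"], []).append(content.get("value"))
    return result


sample = {"content": [{"id": 1, "category": "section", "schema_id": "basic_info", "children": [
    {"id": 2, "category": "datapoint", "schema_id": "document_id", "content": {"value": "2183760194"}},
    {"id": 3, "category": "datapoint", "schema_id": "date_issue",
     "content": {"value": "6/8/2018", "normalized_value": "2018-08-06"}},
    {"id": 4, "category": "datapoint", "schema_id": "email_button", "content": None},
    {"id": 5, "category": "multivalue", "schema_id": "line_items", "children": [
        {"id": 6, "category": "tuple", "schema_id": "line_item", "children": [
            {"id": 7, "category": "datapoint", "schema_id": "item_quantity", "content": {"value": "2"}}]}]},
]}]}
```

Use normalized_value instead of value when you need ISO 8601 dates or JSON numbers.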
Section objects:
Attribute | Type | Description | Read-only |
---|---|---|---|
id | int64 | A unique ID of a given section. | true |
url | URL | URL of the section. | true |
schema_id | string | Reference mapping the object to the schema tree. | |
category | string | section | |
children | list | Array specifying objects that belong to the section. | |
Datapoint, multivalue and tuple objects:
Attribute | Type | Description | Read-only |
---|---|---|---|
id | int64 | A unique ID of a given object. | true |
url | URL | URL of a given object. | true |
schema_id | string | Reference mapping the object to the schema tree. | |
category | string | Type of the object (datapoint, multivalue or tuple). | true |
children | list | Array specifying child objects. Only available for multivalue and tuple categories. | true |
content | object | (optional) A dictionary of the attributes of a given datapoint (only available for datapoint); see below for details. | true |
validation_sources | list[string] | Sources of validation of the extracted data; see below. | |
time_spent | float | (optional) Time spent while actively working on a given node, in seconds. | |
time_spent_overall | float | (optional) Total time spent while validating a given node, in seconds (only for internal purposes). | |
time_spent_grid | float | (optional) Total time spent while actively working on a grid, in seconds. Only available for multivalue category (only for internal purposes). | |
time_spent_grid_overall | float | (optional) Total time spent while validating a given grid, in seconds. Only available for multivalue category (only for internal purposes). | |
hidden | bool | If set to true, the datapoint is not visible in the user interface, but remains stored in the database. | |
no_recalculation | bool | If set to true, the datapoint's formula is not recalculated automatically. Only available for editable formula datapoints (datapoint category); see Formula datapoints below. | |
grid | object | Grid structure; see below for details. Only allowed for multivalue objects. | |
Time spent
Time spent on datapoints is measured in seconds and stored on the datapoint object, for the multivalue and datapoint categories. For time spent on the annotation level, see annotation processing duration.
Active time spent is stored in time_spent.
Overall time spent is stored in time_spent_overall.
Active time spent with an active magic grid is stored in time_spent_grid.
Overall time spent with an active magic grid is stored in time_spent_grid_overall.
Measuring starts when a datapoint is selected while the annotation is not in read-only mode.
Measuring ends when:
- another datapoint is selected. Selecting datapoints while automation blockers are shown neither ends nor affects the measuring.
- the user leaves the annotation (for the same reasons as measuring ends on an annotation)
- the user goes to edit mode
When measuring ends, time_spent of the previously selected datapoint is incremented by the measured time, and the result is patched together with a human validation source added to validation_sources.
Content object
Can be null for datapoints of type button.
Attribute | Type | Description | Read-only |
---|---|---|---|
value | string | The extracted data of a given node. Maximum length: 1500 UTF characters. | |
normalized_value | string | Normalized value for date (in ISO 8601 format) and number fields (in JSON number format). | |
page | int | Number of page where the data is situated (see position). | |
position | list | List of the coordinates of the label box of the given node. (left, top, right, bottom) | |
rir_text | string | The extracted text, used as a reference for data extraction models. | true |
rir_raw_text | string | Raw extracted text (only for internal purposes, may be removed in the future). | true |
rir_page | int | The extracted page, used as a reference for data extraction models. | true |
rir_position | list | The extracted position, used as a reference for data extraction models. (left, top, right, bottom) | true |
rir_confidence | float | Confidence (estimated probability) that this field was extracted correctly. | true |
connector_text | string | Text set by the connector. | true |
connector_position | list | Position set by the connector. (left, top, right, bottom) | true |
ocr_text | string | Value extracted by OCR, if applicable. (only for internal purposes, may be removed in the future) | true |
ocr_raw_text | string | Raw value extracted by OCR, if applicable. (only for internal purposes, may be removed in the future) | true |
ocr_position | string | OCR position, if applicable. (left, top, right, bottom) (only for internal purposes, may be removed in the future) | true |
When both value and normalized_value are set, normalized_value is ignored on update.
Formula datapoints
For datapoint category fields whose schema UI configuration has the type property set to formula, the datapoint content and attributes are updated automatically based on the provided formula code.
For editable formula fields (i.e. the corresponding UI configuration's edit property is not set to the disabled option), automatic recalculation can be disabled by setting the datapoint's no_recalculation flag to true.
To re-enable automatic recalculation, set the no_recalculation flag to false.
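This section does not show how a single datapoint is updated. The sketch below therefore rests on a hypothetical assumption: that a PATCH with a JSON body on the datapoint's url (the content/{id} URL seen in the examples) accepts the no_recalculation flag. Check the annotation data update endpoint in the full reference before relying on it. BASE_URL and the helper name are placeholders.

```python
import json
import urllib.request

# Hypothetical placeholder for your organization's API root.
BASE_URL = "https://example.rossum.app/api/v1"


def build_no_recalculation_request(annotation_id, datapoint_id, token, disable=True):
    """ASSUMPTION: PATCH on the datapoint URL toggles formula recalculation."""
    return urllib.request.Request(
        f"{BASE_URL}/annotations/{annotation_id}/content/{datapoint_id}",
        # True disables automatic recalculation; False re-enables it.
        data=json.dumps({"no_recalculation": disable}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )


req = build_no_recalculation_request(319668, 27801933, "db313f24f5738c8e04635e036ec8a45cdd6d6b03")
```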
Validation sources
The validation_sources property is a list of sources that verified the extracted data. When the list is non-empty, the datapoint is considered validated (and no eye icon is displayed next to it in the Rossum UI).
Currently, these are the sources of validation:
- score: confidence score coming from the AI Core Engine was higher than a preset score threshold (can be set on queue, or individually per datapoint in schema; default is 0.8).
- checks: Data extractor does several checks like summing up tax_details, which can verify that the data were extracted correctly.
- not_found: Value was not found by the AI engine. As we do not report confidence in such cases yet, we add a validation source instead. It will be removed as soon as confidence is available for fields that were not found.
- data_matching: Set by a hook when, for example, the datapoint matches data in another database.
- history: Several fields can be confirmed from historical data in exported documents (can be turned on/off on per queue basis using autopilot section in its settings).
- connector: A connector verified the validity.
- table_suggester: Used internally for the complex line items user interface.
- human: An operator visited the field in the validation interface (assumed to be just verifying the value, not necessarily making any corrections).
- non_required: Value was not found, is non-required and has no rir_field_name set.
An additional possible value, NA, signals that validation sources are "Not Applicable"; it currently occurs only for button datapoints.
The list is subject to ongoing expansion.
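Because a datapoint counts as validated exactly when validation_sources is non-empty, a client can list the fields that still need attention by walking the content tree. The sketch below assumes the tree shape used in this section (section, multivalue and tuple nodes with a children list, leaves with category datapoint); find_unvalidated is a hypothetical helper.

```python
def find_unvalidated(node):
    """Return schema_ids of datapoints whose validation_sources list is empty.

    Hypothetical helper. Accepts the top-level content list or any single node;
    container nodes (section, multivalue, tuple) are expected to carry "children".
    """
    if isinstance(node, list):
        return [schema_id for child in node for schema_id in find_unvalidated(child)]
    if node.get("category") == "datapoint":
        # An empty or missing list means the datapoint is not considered validated.
        return [] if node.get("validation_sources") else [node["schema_id"]]
    return find_unvalidated(node.get("children", []))
```

Note that NA (used for buttons) is a non-empty value, so such datapoints are treated as validated here.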
Example multivalue datapoint object with a grid
{
"id": 122852,
"schema_id": "line_items",
"category": "multivalue",
"time_spent": 3.4,
"time_spent_overall": 4.5,
"time_spent_grid": 1.2,
"time_spent_grid_overall": 2.3,
"grid": {
"parts": [
{
"page": 1,
"columns": [
{
"left_position": 348,
"schema_id": "item_description",
"header_texts": ["Description"]
},
{
"left_position": 429,
"schema_id": "item_quantity",
"header_texts": ["Qty"]
}
],
"rows": [
{
"top_position": 618,
"tuple_id": null,
"type": "header"
},
{
"top_position": 649,
"tuple_id": 123,
"type": "data"
}
],
"width": 876,
"height": 444
}
]
},
...
}
The grid object (for internal use only) stores the vertical and horizontal separators of a table and related attributes. Every grid consists of zero or more parts.
Every part object consists of the following attributes:
Attribute | Type | Description |
---|---|---|
page | int | Number of the page on which the grid part is located. |
columns | list[object] | Description of grid columns. |
rows | list[object] | Description of grid rows. |
width | float | Total width of the grid. |
height | float | Total height of the grid. |
Every column contains attributes:
Attribute | Type | Description |
---|---|---|
left_position | float | Position of the column left edge. |
schema_id | string | Reference to datapoint schema id. Used in grid-to-table conversion. |
header_texts | list[string] | Extracted texts from column headers. |
Every row contains attributes:
Attribute | Type | Description |
---|---|---|
top_position | float | Position of the row top edge. |
tuple_id | int | ID of the corresponding tuple datapoint if it exists, otherwise null. |
type | string | Row type. Allowed values are specified in the schema, see grid. If null, the row is ignored during grid-to-table conversion. |
Currently, it is only allowed to have one part per page (for a particular grid).
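The one-part-per-page rule can be checked on the client before a grid is sent back. This sketch, using the hypothetical helper check_grid, works on a dict shaped like the "grid" object in the example above and also collects the tuple IDs of data rows, skipping rows whose tuple_id is null.

```python
from collections import Counter


def check_grid(grid):
    """Enforce one part per page and return tuple IDs of data rows.

    Hypothetical helper; raises ValueError if two parts share a page.
    """
    pages = Counter(part["page"] for part in grid["parts"])
    duplicates = sorted(page for page, count in pages.items() if count > 1)
    if duplicates:
        raise ValueError(f"more than one grid part on page(s) {duplicates}")
    # Rows with a null tuple_id have no corresponding tuple datapoint.
    return [
        row["tuple_id"]
        for part in grid["parts"]
        for row in part["rows"]
        if row["type"] == "data" and row["tuple_id"] is not None
    ]
```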
Get the annotation data
Get annotation data of annotation 315777
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777/content'
GET /v1/annotations/{id}/content
Get annotation data.
Response
Status: 200
Returns annotation data.
Update annotation data
Update annotation data of annotation 315777
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"content": [{"category": "section", "schema_id": "invoice_details_section", "children": [{"category": "datapoint", "schema_id": "document_id", "content": {"value": "12345"}, "validation_sources": ["human"], "type": "string", "rir_confidence": 0.99}]}]}' \
'https://<example>.rossum.app/api/v1/annotations/315777/content'
{
"content": [
{
"category": "section",
"schema_id": "invoice_details_section",
"children": [
{
"category": "datapoint",
"schema_id": "document_id",
"content": {
"value": "12345"
},
"type": "string",
"validation_sources": ["human"]
}
]
}
]
}
PATCH /v1/annotations/{id}/content
Update annotation data. The format is the same as for GET; datapoints missing from the uploaded content are preserved.
Response
Status: 200
Returns annotation data.
Bulk update annotation data
Example of body for bulk update of annotation data
{
"operations": [
{
"op": "replace",
"id": "198143",
"value": {
"content": {
"value": "John",
"position": [103, 110, 121, 122],
"page": 1
},
"hidden": false,
"options": [],
"validation_sources": ["human"]
}
},
{
"op": "remove",
"id": "884061"
},
{
"op": "add",
"id": "884060",
"value": [
{
"schema_id": "item_description",
"content": {
"page": 1,
"position": [162, 852, 371, 875],
"value": "Bottle"
}
}
]
}
]
}
POST /v1/annotations/{id}/content/operations
Allows you to specify a sequence of operations to be performed on particular datapoint objects.
To replace a datapoint value (or another supported attribute), use the replace operation:
Key | Type | Description |
---|---|---|
op | string | Type of operation: replace |
id | integer | Datapoint id |
value | object | Updated data; the format is the same as in Annotation Data. Only value (*), position, page, validation_sources, hidden and options attributes may be updated. Note that value is parsed and formatted. |
(*) normalized_value may also be specified. When both value and normalized_value are specified, they must match; otherwise the datapoint is not modified (this may change in the future).
Note that section, multivalue and tuple datapoints should not be updated.
To add a new row to a table multivalue, use the add operation:
Key | Type | Description |
---|---|---|
op | string | Type of operation: add |
id | integer | Multivalue id (parent of new datapoint) |
value | list[object] | Added row data. A list of objects; the format of each object is the same as in Annotation Data. The schema_id attribute is required; only value, position, page, validation_sources, hidden and options attributes may be set. |
validation_sources | list[string] | (optional) List of validation sources set by default for all fields of the row (unless overridden in value). This allows adding rows without breaking automation. See Validation sources. |
The row will be appended to the current list of rows.
For simple multivalues, the add operation can be used to add one child datapoint:
Key | Type | Description |
---|---|---|
op | string | Type of operation: add |
id | integer | Multivalue id (parent of new datapoint) |
value | object | Data of the new child datapoint; the format is the same as in Annotation Data. Only value (*), position, page, validation_sources, hidden and options attributes may be set. Note that value is parsed and formatted. |
To remove a row from a multivalue, use the remove operation:
Key | Type | Description |
---|---|---|
op | string | Type of operation: remove |
id | integer | Datapoint id |
Note that only children of a multivalue may be removed.
Response
Status: 200
Returns annotation data.
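The three operation shapes can be assembled with small helpers. The helper names below are hypothetical; datapoint IDs are passed as integers, as the tables specify. Only the JSON body for POST /v1/annotations/{id}/content/operations is produced; sending it is left to the caller.

```python
import json


def replace_op(datapoint_id, **attrs):
    """Replace supported attributes (content, hidden, options, validation_sources) of one datapoint."""
    return {"op": "replace", "id": datapoint_id, "value": attrs}


def add_row_op(multivalue_id, cells, validation_sources=None):
    """Append one row to a table multivalue; `cells` maps schema_id -> content dict."""
    op = {
        "op": "add",
        "id": multivalue_id,
        "value": [{"schema_id": schema_id, "content": content} for schema_id, content in cells.items()],
    }
    if validation_sources is not None:
        # Default validation sources for every field of the new row.
        op["validation_sources"] = validation_sources
    return op


def remove_op(datapoint_id):
    """Remove a child datapoint of a multivalue."""
    return {"op": "remove", "id": datapoint_id}


def operations_body(*ops):
    """Serialize operations as the JSON body of POST .../content/operations."""
    return json.dumps({"operations": list(ops)})


body = operations_body(
    replace_op(198143, content={"value": "John", "page": 1}, validation_sources=["human"]),
    remove_op(884061),
    add_row_op(884060, {"item_description": {"value": "Bottle", "page": 1}}, ["human"]),
)
```

Operations are listed in the order they should be applied, mirroring the example body above.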
Replace annotation data by OCR
Replace annotation data value by text extracted from a rectangle
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"rectangle": [316.2, 533.9, 352.7, 556.5], "page": "https://<example>.rossum.app/api/v1/pages/12221"}' \
'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/select'
POST /v1/annotations/{id}/content/{id of child node}/select
Replace the value of an annotation data node with text extracted by OCR from a rectangle of the document page. Payload of the request:
Key | Type | Description |
---|---|---|
rectangle | list[float] | Bounding box of an occurrence. |
page | URL | Page of occurrence. |
When the rectangle size is unsuitable for OCR (any side of the rectangle is smaller than 4 px), the rectangle is extended to cover the text that overlaps with it.
Response
Status: 200
Returns annotation data.
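A sketch of the select payload that also reports whether the rectangle falls below the 4 px limit mentioned above. It assumes the rectangle uses the same (left, top, right, bottom) order as datapoint positions; select_payload is a hypothetical helper.

```python
MIN_OCR_SIDE_PX = 4  # limit quoted in the text above


def select_payload(rectangle, page_url):
    """Build the body for POST .../content/{id of child node}/select.

    Returns (payload, will_be_extended). Assumes rectangle = [left, top, right, bottom];
    will_be_extended is True when any side is below 4 px, i.e. when the API
    extends the rectangle to cover overlapping text.
    """
    left, top, right, bottom = rectangle
    if right < left or bottom < top:
        raise ValueError("rectangle must be [left, top, right, bottom]")
    small = min(right - left, bottom - top) < MIN_OCR_SIDE_PX
    return {"rectangle": [float(v) for v in rectangle], "page": page_url}, small
```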
Grid operations
Update multiple grid parts and perform OCR on created and updated grids
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"operations": [{"op": "update", "grid_index": 0, "grid": {"page": 1, "columns": [...], "rows": [...]}}]}' \
'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/grid_operations'
POST /v1/annotations/{id}/content/{id of the multivalue}/grid_operations
This endpoint applies multiple operations to multiple grids of one multivalue, performs OCR if required, and updates the multivalue with the resulting grid.
For the update operation, the position of the grid and of its rows and columns can be changed, and so can the column layout, but the row structure must remain unchanged.
Payload of the request:
Key | Type | Description |
---|---|---|
operations | list[object] | List of operations to apply to the grid |
Each operation:
Key | Type | Description | Required |
---|---|---|---|
op | str | update or delete or create | Yes |
grid_index | int | Index of the grid part | Yes |
grid | object | New grid part | For create and update operations |
The operations are applied sequentially. The grid_index refers to the index of the grid part at the time the operation is applied. Combining different types of operations is not supported.
Response
Status: 200
Returns updated multivalue content as a tree, with only updated datapoints.
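Since mixed operation types are rejected, a client may want to validate the list before sending it. This is a hypothetical client-side check (grid_operations_body is not part of any Rossum library) built from the rules in the table above.

```python
ALLOWED_OPS = {"update", "delete", "create"}


def grid_operations_body(operations):
    """Validate and wrap operations for POST .../{id of the multivalue}/grid_operations.

    Hypothetical check: every operation needs op and grid_index, create/update
    also need a grid part, and all operations must share one op type.
    """
    kinds = {op.get("op") for op in operations}
    unknown = kinds - ALLOWED_OPS
    if unknown:
        raise ValueError(f"unknown op(s): {sorted(map(str, unknown))}")
    if len(kinds) > 1:
        raise ValueError("combining different operation types is not supported")
    for op in operations:
        if "grid_index" not in op:
            raise ValueError("grid_index is required")
        if op["op"] in ("create", "update") and "grid" not in op:
            raise ValueError(f"{op['op']} requires a grid part")
    return {"operations": operations}
```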
Partial grid updates
Update a grid part and perform OCR on modified cell datapoints
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"grid_index": 0, "grid": {"page": 1, "columns": [...], "rows": [...]}, "operations": {"columns": [{"op": "update", "schema_id": "vat_rate"}], "rows": [{"op": "delete", "tuple_id": 1256}]}}' \
'https://<example>.rossum.app/api/v1/annotations/319668/content/21233223/grid_parts_operations'
POST /v1/annotations/{id}/content/{id of the multivalue}/grid_parts_operations
Applies multiple operations to a grid part, performs OCR on modified cell datapoints, and updates the multivalue with the new grid.
Query parameters
Query parameter | Type | Default | Required | Description |
---|---|---|---|---|
full_response | boolean | false | false | Use this parameter to get all datapoints in the grid part in the response |
Payload of the request:
Key | Type | Description |
---|---|---|
operations | object | Operations to apply to the grid |
grid | object | Updated grid part |
grid_index | int | Index of the grid part |
Operations are grouped into rows operations and columns operations:
Key | Type | Description |
---|---|---|
rows | list[object] | List of row operations |
columns | list[object] | List of column operations |
Individual operations use the following parameters:
Key | Type | Description |
---|---|---|
op | str | update or delete or create |
row_index | int | Required for row update and row create operations |
tuple_id | int | ID of the tuple datapoint, required for row delete and row update operations |
schema_id | string | Schema ID of the column datapoints, required for column operations |
Possible operations:
axis | op | required parameters | OCR | Result |
---|---|---|---|---|
columns | update | schema_id | Yes | Update column datapoints |
columns | delete | schema_id | No | Set content to empty for column datapoints |
rows | create | row_index | Yes | Insert a new row, create datapoints and perform OCR |
rows | update | row_index, tuple_id | Yes | Update datapoints via OCR |
rows | delete | tuple_id | No | Delete the tuple associated with this row |
OCR is performed only for rows of an extractable type, as defined by row_types_to_extract in the multivalue schema; by default, only rows of type data are extracted.
Response
Status: 200
Returns updated multivalue content as a tree. By default, only updated datapoints and the updated grid are returned. Add ?full_response=true to the URL to get all datapoints of the grid part in the response.
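The "Possible operations" table maps directly to a lookup of required parameters, which a client can use to catch malformed requests early. This sketch (REQUIRED and grid_parts_body are hypothetical names) builds the request body and checks each row and column operation against that table.

```python
# Required parameters per (axis, op), transcribed from the "Possible operations" table.
REQUIRED = {
    ("columns", "update"): {"schema_id"},
    ("columns", "delete"): {"schema_id"},
    ("rows", "create"): {"row_index"},
    ("rows", "update"): {"row_index", "tuple_id"},
    ("rows", "delete"): {"tuple_id"},
}


def grid_parts_body(grid_index, grid, rows=(), columns=()):
    """Build a body for POST .../grid_parts_operations, checking required keys."""
    operations = {"rows": list(rows), "columns": list(columns)}
    for axis, ops in operations.items():
        for op in ops:
            key = (axis, op.get("op"))
            if key not in REQUIRED:
                raise ValueError(f"unsupported operation {key}")
            missing = REQUIRED[key] - set(op)
            if missing:
                raise ValueError(f"{key} is missing {sorted(missing)}")
    return {"grid_index": grid_index, "grid": grid, "operations": operations}
```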
Send updated annotation data
Send feedback on annotation 315777
Start the annotation
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777/start'
{
"annotation": "https://<example>.rossum.app/api/v1/annotations/315777",
"session_timeout": "01:00:00"
}
Get the annotation data
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777/content'
{
"id": 37507206,
"url": "https://<example>.rossum.app/api/v1/annotations/315777/content/37507206",
"content": {
"value": "001",
"page": 1,
"position": [
302,
91,
554,
56
],
"rir_text": "000957537",
"rir_position": [
302,
91,
554,
56
],
"connector_text": null,
"rir_confidence": null
},
"category": "datapoint",
"schema_id": "document_id",
"validation_sources": [
"human"
],
"time_spent": 2.7,
"time_spent_overall": 6.1,
"hidden": false
}
Patch the annotation
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Type:application/json' -d '{"content": {"value": "#INV00011", "position": [302, 91, 554, 56]}}' \
'https://<example>.rossum.app/api/v1/annotations/315777/content/37507206'
{
"id": 37507206,
"url": "https://<example>.rossum.app/api/v1/annotations/315777/content/37507206",
"content": {
"value": "#INV00011",
"page": 1,
"position": [
302,
91,
554,
56
],
"rir_text": "",
"rir_position": null,
"rir_confidence": null,
"connector_text": null
},
"category": "datapoint",
"schema_id": "document_id",
"validation_sources": [],
"time_spent": 0,
"time_spent_overall": 0,
"hidden": false
}
Confirm the annotation
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777/confirm'
PATCH /v1/annotations/{id}/content/{id of the child node}
Update a particular annotation content node.
It is enough to pass just the updated attributes in the PATCH payload.
Response
Status: 200
Returns updated annotation data for the given node.
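The feedback flow above (start, read content, patch one node, confirm) can be summarized as an ordered request plan. This is a sketch only: feedback_requests is a hypothetical helper, paths are relative to the API root (e.g. https://<example>.rossum.app/api), and the caller is assumed to send each request in order with a Bearer token.

```python
import json


def feedback_requests(annotation_id, datapoint_id, new_value):
    """Return the feedback flow as (method, path, body) tuples, in order."""
    base = f"/v1/annotations/{annotation_id}"
    return [
        ("POST", f"{base}/start", None),    # start the annotation (response has session_timeout)
        ("GET", f"{base}/content", None),   # read the current annotation data
        ("PATCH", f"{base}/content/{datapoint_id}",
         json.dumps({"content": {"value": new_value}})),  # send only the changed attributes
        ("POST", f"{base}/confirm", None),  # confirm the annotation
    ]
```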
Annotation Processing Duration
Example annotation processing duration
{
"annotation": "https://<example>.rossum.app/api/v1/annotations/1",
"time_spent_active": 12.3,
"time_spent_overall": 23.4,
"time_spent_edit": 1.23,
"time_spent_blockers": 2.34,
"time_spent_emails": 3.45,
"time_spent_opening": 4.56
}
Annotation processing duration stores additional time-spent information for an annotation.
Annotation processing duration object:
Attribute | Type | Description | Read-only | Optional |
---|---|---|---|---|
annotation | URL | Annotation that the processing duration is related to | true | |
time_spent_active | float | Total active time spent on the annotation, in seconds | true | |
time_spent_overall | float | Total time spent on the annotation, in seconds (same value as Annotation.time_spent) | true | |
time_spent_edit | float | Time spent editing the annotation, in seconds | true | |
time_spent_blockers | float | Time spent on annotation blockers, in seconds | true | |
time_spent_emails | float | Time spent on emails, in seconds | true | |
time_spent_opening | float | Time spent opening the annotation, in seconds | true |
Time measurement starts after an annotation is successfully started and its datapoints and schema are fetched.
Measurement ends when:
- the user changes the annotation status (confirm, postpone, delete, reject)
- the user leaves validation (goes back to the dashboard or another page)
- the user goes to the next annotation
- the user confirms changes in edit mode
- the annotation time expires (checked periodically every 5 minutes if the current annotation is in the reviewing state)
- the user closes the tab
time_spent_overall is the total time spent on the annotation; time_spent_active is measured the same way, except that measurement stops after 10 seconds of inactivity (no mouse movement or keystroke, or an inactive tab).
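To illustrate how the 10-second inactivity rule separates active from overall time, here is a small model. It is a modelling assumption, not Rossum's implementation: each activity event is assumed to keep the clock running for at most 10 seconds, and measurement is assumed to resume at the next activity. active_time is a hypothetical helper.

```python
def active_time(events, end, idle_limit=10.0):
    """Approximate active seconds from sorted activity timestamps.

    Each gap between consecutive activities (and between the last activity and
    `end`) counts fully up to `idle_limit`; anything beyond is treated as inactivity.
    """
    total = 0.0
    for current, following in zip(events, list(events[1:]) + [end]):
        total += min(following - current, idle_limit)
    return total


# Activity at 0 s, 5 s and 30 s, measurement ends at 32 s:
# gaps of 5, 25 (capped to 10) and 2 seconds give 17 active seconds out of 32 overall.
```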
Get the annotation processing duration
Get annotation processing duration of annotation 315777
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/annotations/315777/processing_duration'
GET /v1/annotations/{id}/processing_duration
Get annotation processing duration.
Response
Status: 200
Returns annotation processing duration.
Update annotation processing duration
Update annotation processing duration of annotation 315777
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"time_spent_active": 10.00, "time_spent_overall": 20.0, "time_spent_edit": 1.0, "time_spent_blockers": 2.0, "time_spent_emails": 3.0, "time_spent_opening": 1.5}' \
'https://<example>.rossum.app/api/v1/annotations/315777/processing_duration'
{
"annotation": "https://<example>.rossum.app/api/v1/annotations/315777",
"time_spent_active": 10.0,
"time_spent_overall": 20.0,
"time_spent_edit": 1.0,
"time_spent_blockers": 2.0,
"time_spent_emails": 3.0,
"time_spent_opening": 1.5
}
PATCH /v1/annotations/{id}/processing_duration
Update annotation processing duration.
Response
Status: 200
Returns annotation processing duration.
Audit log
Audit log represents a log record of actions performed by users.
Only users with the admin or organization group admin role can access the log records. Logs do not include records of changes made by Rossum representatives via internal systems. Log records are retained for 1 year.
Attribute | Type | Description |
---|---|---|
organization_id | integer | ID of the organization. |
timestamp* | str | Timestamp of the log record. |
username | str | Username of the user that performed the action. |
object_id | int | ID of the object on which the action was performed. |
object_type | str | Type of the object on which the action was performed. |
action | str | Type of the action performed. |
content | object | Detailed content of the action. |
*The timestamp uses the ISO 8601 format in the UTC timezone, e.g. 2024-07-01T07:00:00.000000.
content consists of the following elements:
Attribute | Type | Description |
---|---|---|
path | str | Partial URL path of the request. |
method | str | Method of the request. |
request_id | str | ID of the request. Use this when contacting Rossum support with any related questions. |
status_code | int | Status code of the response. |
details | object | Details about the request (if available). For most cases, this field will be {} . |
details may include the following attributes:
Attribute | Type | Description |
---|---|---|
groups | list | Names of the user roles that were sent (if any) in a request on a user object. |
List all audit logs
List all audit logs for update actions on user objects
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/audit_logs?object_type=user&action=update'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"object_type": "user",
"action": "update",
"username": "john.doe@example.com",
"object_id": 131,
"timestamp": "2024-07-01T07:00:00.000000",
"details": {
"path": "api/v1/users/131",
"method": "PATCH",
"request_id": "0aadfd75-8dcz-4e62-94d9-a23811d0d0b0",
"status_code": 200,
"payload": {"groups": ["admin"]}
}
}
]
}
GET /v1/audit_logs
List audit log records for chosen objects and actions.
Using filters, you can narrow down the number of records. object_type
is a required filter.
Supported filters:
Attribute | Type | Description | Required |
---|---|---|---|
object_type | str | Type of the object on which the action was performed. Available types are document, annotation and user. | Yes |
action | str | Type of the action performed. See below. | No |
object_id | int | ID of the object on which the action was performed. | No |
timestamp_before | str | Filter for log entries before the given timestamp. | No |
timestamp_after | str | Filter for log entries after the given timestamp. | No |
username | str | Username of the user that performed the action. | No |
Depending on the object_type, you can filter the logs by action. Each object_type supports filtering by different actions:
object_type | Available actions |
---|---|
document | create |
annotation | update-status |
user | create, delete, purge, update, destroy, app_load*, reset-password, change_password |
*The app_load value represents records of calls to the api/v1/auth/user endpoint.
Response
Status: 200
Returns a paginated response with a list of audit log objects.
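The per-type action table lends itself to a client-side check before querying. This sketch (AUDIT_ACTIONS and audit_log_query are hypothetical names) builds the path and query string for GET /v1/audit_logs, always including the required object_type filter and passing other filters through unchanged.

```python
from urllib.parse import urlencode

# Actions accepted per object_type, transcribed from the table above.
AUDIT_ACTIONS = {
    "document": {"create"},
    "annotation": {"update-status"},
    "user": {"create", "delete", "purge", "update", "destroy", "app_load",
             "reset-password", "change_password"},
}


def audit_log_query(object_type, action=None, **filters):
    """Return the path and query for GET /v1/audit_logs.

    Extra keyword filters (object_id, username, timestamp_before, timestamp_after)
    are passed through as-is.
    """
    if object_type not in AUDIT_ACTIONS:
        raise ValueError(f"unsupported object_type: {object_type}")
    if action is not None and action not in AUDIT_ACTIONS[object_type]:
        raise ValueError(f"{action!r} is not available for {object_type}")
    params = {"object_type": object_type, **filters}
    if action is not None:
        params["action"] = action
    return "/v1/audit_logs?" + urlencode(params)
```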
Automation blocker
Example automation blocker object
{
"id": 1,
"url": "https://<example>.rossum.app/api/v1/automation_blockers/1",
"annotation": "https://<example>.rossum.app/api/v1/annotations/4",
"content": [
{
"level": "datapoint",
"type": "low_score",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {
"score": 0.901,
"threshold": 0.975
}
}
]
},
{
"level": "datapoint",
"type": "failed_checks",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {"validation": "bad"}
}
]
},
{
"level": "datapoint",
"type": "no_validation_sources",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234
}
]
},
{
"level": "datapoint",
"type": "error_message",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {
"message_content": ["Error 1", "Error 2"]
}
}
]
},
{
"level": "annotation",
"type": "suggested_edit_present"
},
{
"level": "annotation",
"type": "is_duplicate"
},
{
"level": "annotation",
"type": "error_message",
"details": {
"message_content": ["Error 1"]
}
}
]
}
An automation blocker stores the reasons why an annotation was not automated.
Attribute | Type | Read-only | Description |
---|---|---|---|
id | integer | yes | AutomationBlocker object ID. |
url | URL | yes | AutomationBlocker object URL. |
annotation | URL | yes | URL of related Annotation object. |
content | list[object] | no | List of reasons why automation is blocked. |
Content consists of the following elements:
Attribute | Type | Description |
---|---|---|
level | enum | Designates whether the automation blocker relates to a specific datapoint or to the whole annotation. |
type | enum | See below for possible values. |
schema_id | string | Only for datapoint level objects. |
samples | list[object] | Contains samples of specific datapoints with detailed info (only for datapoint level objects). Only the first 10 samples are listed. |
samples_truncated | bool | Whether the samples were truncated to 10 (otherwise the list contains all of them). |
details | object | Only for level : annotation with type : error_message . Contains message_content with list of error messages. |
Automation blocker types
low_score
automation blocker example
{
"level": "datapoint",
"type": "low_score",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {
"score": 0.901,
"threshold": 0.975
}
},
{
"datapoint_id": 1235,
"details": {
"score": 0.968,
"threshold": 0.975
}
}
]
}
failed_checks
automation blocker example
{
"level": "datapoint",
"type": "failed_checks",
"schema_id": "schema_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 43,
"details": {
"validation": "bad"
}
}
]
}
no_validation_sources
automation blocker example
{
"level": "datapoint",
"type": "no_validation_sources",
"schema_id": "schema_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 412
}
]
}
error_message
automation blocker example
[
{
"level": "annotation",
"type": "error_message",
"details": {
"message_content": ["annotation error"]
}
},
{
"level": "datapoint",
"type": "error_message",
"schema_id": "schema_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 45,
"details": {
"message_content": ["longer than 3 characters"]
}
}
]
}
]
delete_recommendations
automation blocker example
[
{
"level": "annotation",
"type": "delete_recommendation_filename | delete_recommendation_page_count",
"details": {
"message_content": ["annotation error"]
}
},
{
"level": "datapoint",
"type": "delete_recommendation_field",
"schema_id": "document_type",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 45
}
]
}
]
extension
automation blocker example
[
{
"level": "annotation",
"type": "extension",
"details": {
"content": ["PO not found in the master data!"]
}
},
{
"level": "datapoint",
"type": "extension",
"schema_id": "sender_name",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1357,
"details": {
"content": ["Unregistered vendor"]
}
}
]
}
]
The automation blocker types are:
Type | Level | Description |
---|---|---|
automation_disabled | annotation only | Automation is disabled by queue settings: the automation level is set to never or the automation_enabled queue setting is false. |
is_duplicate | annotation only | The annotation is a duplicate of another one (a relation of the duplicate type exists) and the automate_duplicate queue setting is false. |
suggested_edit_present | annotation only | There is a suggested edit by the AI engine and the automate_suggested_edit queue setting is false. |
low_score | datapoint only | The AI confidence score is lower than the score_threshold set for the given datapoint. |
failed_checks | datapoint only | A schema field constraint or connector validation failed. |
no_validation_sources | datapoint only | The validation source list was reset (e.g. by a hook), so automation was blocked. |
error_message | annotation and datapoint | Messages of the error type were received from the connector. |
delete_recommendation_filename, delete_recommendation_page_count | annotation only | Deletion was recommended based on the filename or page count condition of a validation trigger that matched the document. |
delete_recommendation_field | datapoint only | Deletion was recommended based on the value of the given field (defined in the trigger condition). |
extension | annotation and datapoint | The automation blocker was created by an extension. |
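A client that wants to highlight blocked fields can invert an automation blocker object into a per-datapoint view. This is a hypothetical helper (blocked_datapoints); it defensively accepts a single sample object as well as a list. Because samples are capped at 10, the per-datapoint map can be incomplete when samples_truncated is true.

```python
def blocked_datapoints(blocker):
    """Return ({datapoint_id: [blocker types]}, [annotation-level blocker types])."""
    per_datapoint, annotation_level = {}, []
    for item in blocker["content"]:
        if item["level"] == "annotation":
            annotation_level.append(item["type"])
            continue
        samples = item.get("samples", [])
        if isinstance(samples, dict):  # tolerate a single sample object
            samples = [samples]
        for sample in samples:
            per_datapoint.setdefault(sample["datapoint_id"], []).append(item["type"])
    return per_datapoint, annotation_level
```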
List all automation blockers
List all automation blockers
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/automation_blockers'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1,
"url": "https://<example>.rossum.app/api/v1/automation_blockers/1",
"annotation": "https://<example>.rossum.app/api/v1/annotations/4",
"content": [
{
"level": "datapoint",
"type": "low_score",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {
"score": 0.901,
"threshold": 0.975
}
}
]
},
{
"level": "datapoint",
"type": "failed_checks",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {"validation": "bad"}
}
]
},
{
"level": "datapoint",
"type": "error_message",
"schema_id": "invoice_id",
"samples_truncated": false,
"samples": [
{
"datapoint_id": 1234,
"details": {
"message_content": ["Error 1", "Error 2"]
}
}
]
},
{
"level": "annotation",
"type": "suggested_edit_present"
},
{
"level": "annotation",
"type": "is_duplicate"
},
{
"level": "annotation",
"type": "error_message",
"details": {
"message_content": ["Error 1"]
}
}
]
}
]
}
GET /v1/automation_blockers
List all automation blocker objects.
Supported filters: annotation
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of automation blocker objects.
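List endpoints such as this one return pages linked through pagination.next. A minimal sketch for iterating over all results, assuming a caller-supplied fetch(url) function (for example, one wrapping urllib with the Bearer header) that returns the decoded JSON page; iter_results is a hypothetical name.

```python
def iter_results(fetch, first_url):
    """Yield every object of a paginated list endpoint, following pagination.next."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["pagination"]["next"]  # null (None) on the last page
```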
Retrieve automation blocker
Get automation blocker 12
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/automation_blockers/12'
{
"id": 12,
"url": "https://<example>.rossum.app/api/v1/automation_blockers/12",
"annotation": "https://<example>.rossum.app/api/v1/annotations/481",
"content": [
{
"level": "annotation",
"type": "automation_disabled"
}
]
}
GET /v1/automation_blockers/{id}
Get an automation blocker object.
Response
Status: 200
Returns automation blocker object.
Connector
Example connector object
{
"id": 1500,
"name": "MyQ Connector",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.east-west-trading.com",
"params": "strict=true",
"client_ssl_certificate": "-----BEGIN CERTIFICATE-----\n...",
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": "https://<example>.rossum.app/api/v1/users/1",
"modified_at": "2020-01-01T10:08:03.856648Z"
}
A connector is an extension of Rossum that allows you to validate and modify data during validation and to export data to an external system. A connector object is used to configure the external or internal endpoint of such an extension service. For more information, see Extensions.
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | ID of the connector | true |
name | string | | Name of the connector (not visible in UI) | |
url | URL | | URL of the connector | true |
queues | list[URL] | | List of queues that use the connector object. | |
service_url | URL | | URL of the connector endpoint | |
params | string | | Query params appended to the service_url | |
client_ssl_certificate | string | | Client SSL certificate used to authenticate requests. Must be PEM encoded. | |
client_ssl_key | string | | Client SSL key (write only). Must be PEM encoded. The key must not be encrypted. | |
authorization_type | string | secret_key | String sent in the HTTP Authorization header; can be set to secret_key or Basic. For details, see Connector API. | |
authorization_token | string | | Token sent to the connector in the Authorization header to ensure the connector was contacted by Rossum (displayed only to admin users). | |
asynchronous | bool | true | Affects exporting: when true, the confirm endpoint returns immediately and the connector's save endpoint is called asynchronously later on. | |
metadata | object | {} | Client data. | |
modified_by | URL | null | URL of the last connector modifier | true |
modified_at | datetime | null | Date of last modification | true |
List all connectors
List all connectors
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/connectors'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1500,
"name": "MyQ Connector",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.east-west-trading.com",
"params": "strict=true",
"client_ssl_certificate": null,
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": "https://<example>.rossum.app/api/v1/users/1",
"modified_at": "2020-01-01T10:08:03.856648Z"
}
]
}
GET /v1/connectors
Retrieve all connector objects.
Supported filters: id, name, service_url
Supported ordering: id, name, service_url
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of connector objects.
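A sketch of building a filtered, ordered connector listing URL. It assumes the ordering query parameter is named ordering and that a leading "-" requests descending order (common in filter/ordering conventions, but not stated in this section); connectors_query is a hypothetical helper.

```python
from urllib.parse import urlencode

CONNECTOR_FIELDS = {"id", "name", "service_url"}  # same set for filters and ordering


def connectors_query(ordering=None, **filters):
    """Build the path and query for GET /v1/connectors."""
    unknown = set(filters) - CONNECTOR_FIELDS
    if unknown:
        raise ValueError(f"unsupported filter(s): {sorted(unknown)}")
    params = dict(filters)
    if ordering is not None:
        # Assumed convention: "-name" sorts by name in descending order.
        if ordering.lstrip("-") not in CONNECTOR_FIELDS:
            raise ValueError(f"unsupported ordering: {ordering}")
        params["ordering"] = ordering
    return "/v1/connectors" + ("?" + urlencode(params) if params else "")
```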
Create a new connector
Create a new connector related to queue 8199 with endpoint URL https://myq.east-west-trading.com
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"name": "MyQ Connector", "queues": ["https://<example>.rossum.app/api/v1/queues/8199"], "service_url": "https://myq.east-west-trading.com", "authorization_token":"wuNg0OenyaeK4eenOovi7aiF"}' \
'https://<example>.rossum.app/api/v1/connectors'
{
"id": 1500,
"name": "MyQ Connector",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.east-west-trading.com",
"params": null,
"client_ssl_certificate": null,
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": "https://<example>.rossum.app/api/v1/users/1",
"modified_at": "2020-01-01T10:08:03.856648Z"
}
POST /v1/connectors
Create a new connector object.
Response
Status: 201
Returns created connector object.
Retrieve a connector
Get connector object 1500
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/connectors/1500'
{
"id": 1500,
"name": "MyQ Connector",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.east-west-trading.com",
"params": null,
"client_ssl_certificate": null,
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": null,
"modified_at": null
}
GET /v1/connectors/{id}
Get a connector object.
Response
Status: 200
Returns connector object.
Update a connector
Update connector object 1500
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"name": "MyQ Connector (stg)", "queues": ["https://<example>.rossum.app/api/v1/queues/8199"], "service_url": "https://myq.stg.east-west-trading.com", "authorization_token":"wuNg0OenyaeK4eenOovi7aiF"}' \
'https://<example>.rossum.app/api/v1/connectors/1500'
{
"id": 1500,
"name": "MyQ Connector (stg)",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.stg.east-west-trading.com",
"params": null,
"client_ssl_certificate": null,
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": "https://<example>.rossum.app/api/v1/users/1",
"modified_at": "2020-01-01T10:08:03.856648Z"
}
PUT /v1/connectors/{id}
Update connector object.
Response
Status: 200
Returns updated connector object.
Update part of a connector
Update service URL of connector object 1500
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"service_url": "https://myq.stg2.east-west-trading.com"}' \
'https://<example>.rossum.app/api/v1/connectors/1500'
{
"id": 1500,
"name": "MyQ Connector",
"queues": [
"https://<example>.rossum.app/api/v1/queues/8199"
],
"url": "https://<example>.rossum.app/api/v1/connectors/1500",
"service_url": "https://myq.stg2.east-west-trading.com",
"params": null,
"client_ssl_certificate": null,
"authorization_token": "wuNg0OenyaeK4eenOovi7aiF",
"asynchronous": true,
"metadata": {},
"modified_by": "https://<example>.rossum.app/api/v1/users/1",
"modified_at": "2020-01-01T10:08:03.856648Z"
}
PATCH /v1/connectors/{id}
Update part of connector object.
Response
Status: 200
Returns updated connector object.
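PUT sends the full object, as in the update example, while PATCH only needs the attributes that change. A small sketch that derives a PATCH body by comparing the current object with the desired state is shown below. The helper names are hypothetical. Note that the comparison is shallow: a changed nested value (for example `metadata`) is sent whole.

```python
import json
import urllib.request


def changed_fields(current: dict, desired: dict) -> dict:
    """Return only the keys of `desired` whose values differ from `current` (shallow comparison)."""
    return {key: value for key, value in desired.items() if current.get(key) != value}


def build_patch_request(object_url: str, token: str, current: dict, desired: dict):
    """Prepare a PATCH request that carries only the changed attributes."""
    body = changed_fields(current, desired)
    return urllib.request.Request(
        object_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="PATCH",
    )
```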
Delete a connector
Delete connector 1500
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/connectors/1500'
DELETE /v1/connectors/{id}
Delete connector object.
Response
Status: 204
Dedicated Engine
Example engine object
{
"id": 3000,
"name": "Dedicated engine 1",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"queues": []
}
A Dedicated Engine object holds the specification and the current state of the training setup for a Dedicated Engine.
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the engine | true |
name | string | | Name of the engine | |
description | string | | Description of the engine | |
url | URL | | URL of the engine | true |
status | enum | draft | Current status of the engine, see below | true |
schema | URL | null | Related dedicated engine schema | |
Dedicated Engine Status
Can be one of draft, schema_review, annotating_initial, annotating_review, annotating_training, training_started, training_finished, and retraining.
If status is not draft, the whole engine and its schema become read-only.
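Because a non-draft engine and its schema are read-only, a client can check the status before sending a PUT or PATCH. The sketch below shows such a client-side guard. The exception class and helper names are hypothetical, and the server remains the authority on whether an update is accepted.

```python
class EngineReadOnlyError(Exception):
    """Hypothetical client-side error for attempts to modify a non-draft engine."""


def is_editable(engine: dict) -> bool:
    """Only engines in the draft status (and their schemas) can be modified."""
    return engine.get("status") == "draft"


def ensure_editable(engine: dict) -> None:
    """Raise before sending an update that the API would reject."""
    if not is_editable(engine):
        raise EngineReadOnlyError(
            f"engine {engine.get('id')} has status {engine.get('status')!r}; "
            "only draft engines can be modified"
        )
```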
Request a new Dedicated Engine
Request a new Dedicated Engine using a form (multipart/form-data)
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-F document_type="Custom invoice" -F document_language="en-US" -F volume="9" \
-F sample_uploads=@document1.pdf -F sample_uploads=@document2.pdf \
'https://<example>.rossum.app/api/v1/dedicated_engines/request'
{
"id": 3001,
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3001",
"name": "Requested engine - Custom invoice",
"status": "sample_review",
"description": "AI engine trained to recognize customer-provided data for the customer's specific data capture requirements",
"schema": null
}
POST /v1/dedicated_engines/request
Request training of a new Dedicated Engine
Field | Type | Description | Required |
---|---|---|---|
document_type | str | Type of the document the engine should predict | True |
document_language | str | Language of the documents | True |
volume | int | Estimated volume per year | True |
sample_uploads | list[FILE] | Multiple sample files of the documents. |
Response
Status: 200
Returns created dedicated engine object.
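The request endpoint takes multipart/form-data, and several samples are sent by repeating the `sample_uploads` field (as the repeated `-F sample_uploads=@...` flags do in the curl example). The sketch below encodes such a form using only the standard library. It is simplified and does not escape quotes in field names or file names. In practice, an HTTP library with multipart support may be more convenient. The function names are hypothetical.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(fields, files):
    """Encode form data as multipart/form-data.

    fields: list of (name, str value); files: list of (name, filename, bytes).
    Repeating a name (e.g. sample_uploads) sends several files under one field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode("utf-8")
        )
    for name, filename, content in files:
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_engine_request(base_url, token, document_type, document_language, volume, samples):
    """Prepare POST /v1/dedicated_engines/request; samples is a list of (filename, bytes)."""
    body, content_type = encode_multipart(
        [("document_type", document_type), ("document_language", document_language), ("volume", str(volume))],
        [("sample_uploads", filename, data) for filename, data in samples],
    )
    return urllib.request.Request(
        f"{base_url}/dedicated_engines/request",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
```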
List all dedicated engines
List all dedicated engines
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engines'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 3000,
"name": "Dedicated engine 1",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}
]
}
GET /v1/dedicated_engines
Retrieve all dedicated engine objects.
Response
Status: 200
Returns paginated response with a list of dedicated engine objects.
Create a new dedicated engine
Create a new dedicated engine
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"name": "Dedicated engine 1", "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001"}' \
'https://<example>.rossum.app/api/v1/dedicated_engines'
{
"id": 3001,
"name": "Dedicated engine 1",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3001",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001"
}
POST /v1/dedicated_engines
Create a new dedicated engine object.
Response
Status: 201
Returns created dedicated engine object.
Retrieve a dedicated engine
Get dedicated engine object 3000
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
"id": 3000,
"name": "Dedicated engine 1",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}
GET /v1/dedicated_engines/{id}
Get a dedicated engine object.
Response
Status: 200
Returns dedicated engine object.
Update a dedicated engine
Update dedicated engine object 3000
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"name": "New name", "schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"}' \
'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
"id": 3000,
"name": "New name",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}
PUT /v1/dedicated_engines/{id}
Update dedicated engine object.
Response
Status: 200
Returns updated dedicated engine object.
Update part of a dedicated engine
Update name of dedicated engine object 3000
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"name": "New name"}' \
'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
{
"id": 3000,
"name": "New name",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"url": "https://<example>.rossum.app/api/v1/dedicated_engines/3000",
"status": "draft",
"schema": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000"
}
PATCH /v1/dedicated_engines/{id}
Update part of a dedicated engine object.
Response
Status: 200
Returns updated dedicated engine object.
Delete a dedicated engine
Delete dedicated engine 3000
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engines/3000'
DELETE /v1/dedicated_engines/{id}
Delete dedicated engine object.
Response
Status: 204
Dedicated Engine Schema
Example dedicated engine schema object
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"content": {
"training_queues": [
"https://<example>.rossum.app/api/v1/queues/123",
"https://<example>.rossum.app/api/v1/queues/200",
"https://<example>.rossum.app/api/v1/queues/321"
],
"fields": [
{
"category": "datapoint",
"engine_output_id": "document_id",
"type": "string",
"label": "Document ID",
"description": "Document number",
"trained": true,
"sources": [
{
"queue": "https://<example>.rossum.app/api/v1/queues/123",
"schema_id": "document_id"
},
{
"queue": "https://<example>.rossum.app/api/v1/queues/200",
"schema_id": "custom_name_document_id"
}
]
},
{
"category": "multivalue",
"children": {
"category": "datapoint",
"engine_output_id": "order_id",
"type": "string",
"label": "Order Number",
"description": "Purchase order identification (Order Numbers not captured as 'sender_order_id')",
"trained": false,
"sources": [
{
"queue": "https://<example>.rossum.app/api/v1/queues/200",
"schema_id": "custom_name_order_id"
},
{
"queue": "https://<example>.rossum.app/api/v1/queues/321",
"schema_id": "order_id"
}
]
}
},
{
"category": "multivalue",
"engine_output_id": "line_items",
"type": "grid",
"label": "Line Items",
"description": "Line item column types.",
"trained": true,
"children": {
"category": "tuple",
"children": [
{
"category": "datapoint",
"engine_output_id": "table_column_tax",
"type": "number",
"label": "Item Tax",
"description": "Tax amount for the line",
"trained": true,
"sources": [
{
"queue": "https://<example>.rossum.app/api/v1/queues/123",
"schema_id": "table_column_tax"
},
{
"queue": "https://<example>.rossum.app/api/v1/queues/200",
"schema_id": "custom_table_column_tax"
}
]
},
{
"category": "datapoint",
"engine_output_id": "table_column_rate",
"type": "number",
"label": "Item Rate",
"description": "Tax rate for the line item",
"trained": true,
"sources": [
{
"queue": "https://<example>.rossum.app/api/v1/queues/321",
"schema_id": "table_column_rate"
}
]
}
]
}
}
]
}
}
An engine schema is an object which describes what fields are available in the engine. Do not confuse engine schema with Document Schema.
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the engine schema | true |
url | URL | | URL of the engine schema | true |
content | object | | See below for a description of the engine schema content | |
A schema can be edited only if its Dedicated Engine has status draft.
Content structure
Top-level
Attribute | Type | Description |
---|---|---|
training_queues | list[URL] | List of Queues that will be used for the training. The queues must not have the delete_after field set; otherwise, a validation error is raised (see queue fields). |
fields | list[object] | Container for field declarations. It may contain only objects of category multivalue or datapoint. |
Multivalue
Attribute | Type | Description | Read-only |
---|---|---|---|
category | string | Category of the object, multivalue | |
engine_output_id | string | Unique name of the new extracted field in the trained Dedicated Engine | |
label | string | User-friendly label for an object, shown in the user interface | |
trained | bool | Whether the field was successfully trained | true |
type | enum | Type of the trained field. One of: grid and freeform. | |
description | string | Description of field attribute | |
children | object | Object specifying type of children. It may contain only objects with categories tuple or datapoint. | |
Multivalue objects with datapoint children do not have engine_output_id, label, trained, type, or description attributes.
Tuple
Attribute | Type | Description |
---|---|---|
category | string | Category of the object, tuple |
children | list[object] | Array specifying objects that belong to a given tuple. It may contain only objects with category datapoint. |
Datapoint
Attribute | Type | Description | Read-only |
---|---|---|---|
category | string | Category of the object, datapoint | |
engine_output_id | string | Name of the new extracted field in the trained Dedicated Engine | |
label | string | User-friendly label for an object, shown in the user interface | |
trained | bool | Whether the field was successfully trained | true |
type | enum | Type of the trained field. One of: number, string, date, and enum. | |
description | string | Description of field attribute | |
sources | list[Sources] | Mapping describing the source Queues and their fields to train this field from | |
Sources
Attribute | Type | Description |
---|---|---|
queue | URL | Queue to map the field from. Only one Queue per engine output is allowed |
schema_id | string | Id of the field to map. The id must exist in the mapped Queue's schema |
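Datapoints can sit at the top level, directly inside a multivalue, or inside a tuple within a multivalue. A multivalue's `children` is a single object, and a tuple's `children` is a list. The sketch below walks that structure and flattens the source mapping into `(engine_output_id, queue, schema_id)` triples, which makes it easier to review which queue fields feed which engine output. The function names are hypothetical.

```python
from typing import Iterator


def iter_datapoints(node) -> Iterator[dict]:
    """Yield every datapoint in an engine schema content subtree.

    Multivalue nodes have a single `children` object; tuple nodes have a list.
    """
    if isinstance(node, list):
        for item in node:
            yield from iter_datapoints(item)
        return
    if node["category"] == "datapoint":
        yield node
    else:  # multivalue or tuple: descend into children
        yield from iter_datapoints(node["children"])


def source_mapping(content: dict) -> list:
    """Flatten fields into (engine_output_id, queue URL, schema_id) tuples."""
    return [
        (dp["engine_output_id"], src["queue"], src["schema_id"])
        for dp in iter_datapoints(content["fields"])
        for src in dp.get("sources", [])
    ]
```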
Validate a dedicated engine schema
Validate content and integrity of dedicated engine schema object
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"content":{"training_queues":["https://<example>.rossum.app/api/v1/queues/123"],"fields":[{"engine_output_id":"document_id","category":"datapoint","type":"string","label":"ID","description":"Document ID","sources":[{"queue":"https://<example>.rossum.app/api/v1/queues/123","schema_id":"document_id"}]}]}}' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/validate'
POST /v1/dedicated_engine_schemas/validate
Validate a dedicated engine schema object and check it for errors. In addition to the basic checks done by the CRUD endpoints, this endpoint checks that:
- The declared engine_output_ids are unique across the whole schema
- The mapped Queue datapoints (via schema_ids) are of the same type as the declared type
- The mapped Queue datapoints of enum type have exactly the same option values declared
- Different shapes of datapoints are not mixed together
- The mapped Queue datapoints of Multivalue-Tuple fields are of the same grid/freeform type
- When mapping to a single Multivalue-Tuple field, all the datapoints mapped from one Queue come from a single tabular datapoint
- Multiple fields do not link to the same Queue Datapoint
- A mapped field either maps a Queue Datapoint with null/empty rir_field_names, or the engine_output_id matches one of the mapped rir-namespaced rir_field_names (prefixed by rir: or by nothing)
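Two of these checks depend only on the schema content itself, so a client can run them locally before calling the endpoint: engine_output_id uniqueness, and no Queue Datapoint linked by more than one field. The sketch below covers only those two checks. The type, enum, shape, and rir_field_names checks need the queues' schemas and are left to the endpoint. The function names are hypothetical.

```python
from collections import Counter


def iter_nodes(node):
    """Yield every schema node (multivalue, tuple, datapoint) depth-first."""
    if isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)
        return
    yield node
    if "children" in node:
        yield from iter_nodes(node["children"])


def precheck(content):
    """Return problems found locally; covers only two of the endpoint's checks."""
    nodes = list(iter_nodes(content["fields"]))
    # Multivalues with datapoint children carry no engine_output_id, so skip missing ones.
    ids = Counter(n["engine_output_id"] for n in nodes if "engine_output_id" in n)
    problems = [f"duplicate engine_output_id: {i}" for i, count in ids.items() if count > 1]
    links = Counter(
        (src["queue"], src["schema_id"])
        for n in nodes if n["category"] == "datapoint"
        for src in n.get("sources", [])
    )
    problems += [
        f"queue datapoint mapped more than once: {queue} {schema_id}"
        for (queue, schema_id), count in links.items() if count > 1
    ]
    return problems
```

The full validation still requires POST /v1/dedicated_engine_schemas/validate; a local pre-check only shortens the feedback loop.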
Response
Status: 200
Returns 200 and error description in case of validation failure.
Predict a dedicated engine schema
Predict a dedicated engine schema
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"training_queues":["https://<example>.rossum.app/api/v1/queues/123", "https://<example>.rossum.app/api/v1/queues/200", "https://<example>.rossum.app/api/v1/queues/321"]}' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/predict'
{
"content": {
"training_queues": [
"https://<example>.rossum.app/api/v1/queues/123",
"https://<example>.rossum.app/api/v1/queues/200",
"https://<example>.rossum.app/api/v1/queues/321"
],
"fields": [...]
}
}
POST /v1/dedicated_engine_schemas/predict
Predict a dedicated engine schema based on the schemas of the provided training queues. The predicted schema is guaranteed to pass only the checks done when an engine schema is saved; it is not guaranteed to pass the /v1/dedicated_engine_schemas/validate check.
Response
Status: 200
Returns 200 and predicted dedicated engine schema
List all dedicated engine schemas
List all dedicated engine schemas
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"content": {
"training_queues": [...],
"fields": [...]
}
}
]
}
GET /v1/dedicated_engine_schemas
Retrieve all dedicated engine schema objects.
Response
Status: 200
Returns paginated response with a list of dedicated engine schema objects.
Create a new dedicated engine schema
Create a new dedicated engine schema
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"content": {"fields": [...], "training_queues": [...]}}' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas'
{
"id": 6001,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6001",
"content": {
"training_queues": [...],
"fields": [...]
}
}
POST /v1/dedicated_engine_schemas
Create a new dedicated engine schema object.
Response
Status: 201
Returns created dedicated engine schema object.
Retrieve a dedicated engine schema
Retrieve dedicated engine schema object 6000
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"content": {
"training_queues": [...],
"fields": [...]
}
}
GET /v1/dedicated_engine_schemas/{id}
Get a dedicated engine schema object.
Response
Status: 200
Returns dedicated engine schema object.
Update a dedicated engine schema
Update dedicated engine schema object 6000
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"content": {"fields": [...], "training_queues": [...]}}' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"content": {
"training_queues": [...],
"fields": [...]
}
}
PUT /v1/dedicated_engine_schemas/{id}
Update dedicated engine schema object.
Response
Status: 200
Returns updated dedicated engine schema object.
Update part of a dedicated engine schema
Update content of dedicated engine schema object 6000
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"content": {"fields": [...], "training_queues": [...]}}' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000",
"content": {
"training_queues": [...],
"fields": [...]
}
}
PATCH /v1/dedicated_engine_schemas/{id}
Update part of a dedicated engine schema object.
Response
Status: 200
Returns updated dedicated engine schema object.
Delete a dedicated engine schema
Delete dedicated engine schema 6000
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/dedicated_engine_schemas/6000'
DELETE /v1/dedicated_engine_schemas/{id}
Delete a dedicated engine schema object.
Response
Status: 204
Delete Recommendation
Example delete-recommendation object
{
"id": 1244,
"enabled": true,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"triggers": [
"https://<example>.rossum.app/api/v1/triggers/500"
]
}
Attribute | Type | Required | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the delete recommendation. | true |
enabled | boolean | | Whether the associated triggers' rules should be active | |
url | URL | | URL of the delete recommendation. | true |
organization | URL | | URL of the associated organization. | true |
queue | URL | | URL of the associated queue. | |
triggers | list[URL] | | URLs of the associated triggers. | |
A delete recommendation is an object that binds together triggers that fire when a document meets a queue's criteria for a deletion recommendation. Currently, binding to only a single trigger is supported, and the trigger bound to a delete recommendation must belong to the same queue.
List all delete recommendations
List all delete recommendations
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/delete_recommendations'
{
"pagination": {
"total": 2,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"triggers": [
"https://<example>.rossum.app/api/v1/triggers/500"
]
},
...
]
}
GET /v1/delete_recommendations
Retrieve all delete recommendations objects.
Supported filters
Delete recommendations currently support the following filters:
Filter name | Type | Description |
---|---|---|
queue | integer | Filter only delete recommendations associated with given queue id (or multiple ids). |
Supported ordering
Delete recommendations currently support the following ordering: id, queue
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of delete recommendation objects.
Retrieve a delete recommendation
Get the delete recommendation object with ID 1244
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
"id": 1244,
"enabled": true,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"triggers": [
"https://<example>.rossum.app/api/v1/triggers/500"
]
}
GET /v1/delete_recommendations/{id}
Get a delete recommendation object.
Response
Status: 200
Returns a delete recommendation object.
Create a delete recommendation
Create a new delete recommendation
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"organization": "https://<example>.rossum.app/api/v1/organizations/132", "triggers": ["https://<example>.rossum.app/api/v1/triggers/5000"], "queue": "https://<example>.rossum.app/api/v1/queues/4857", "enabled": true}' \
'https://<example>.rossum.app/api/v1/delete_recommendations/'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"enabled": true,
"triggers": ["https://<example>.rossum.app/api/v1/triggers/5000"]
}
POST /v1/delete_recommendations/
Create a new delete recommendation
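When the request body is built programmatically, serializing a native boolean avoids sending the string "True" for the `enabled` flag. The sketch below builds the body for this endpoint and enforces the current single-trigger limit on the client side. The function name is hypothetical. The empty trigger list is allowed because the update example sets `"triggers": []`.

```python
import json


def delete_recommendation_payload(organization_url, queue_url, trigger_urls, enabled=True):
    """Build the JSON body for POST /v1/delete_recommendations/.

    json.dumps serializes Python True/False as JSON true/false, so `enabled`
    is sent as a real boolean rather than the string "True".
    """
    triggers = list(trigger_urls)
    # Currently only binding to a single trigger is supported.
    if len(triggers) > 1:
        raise ValueError("only a single trigger per delete recommendation is supported")
    return json.dumps({
        "organization": organization_url,
        "queue": queue_url,
        "triggers": triggers,
        "enabled": bool(enabled),
    })
```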
Update a delete recommendation
Update the delete recommendation object with ID 1244
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"triggers": [], "enabled": false}' \
'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"enabled": false,
"triggers": [],
...
}
PUT /v1/delete_recommendations/{id}
Update a delete recommendation
Update a part of a delete recommendation
Update the enabled flag of delete recommendation object 1244
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"enabled": false}' \
'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/delete_recommendations/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"enabled": false,
...
}
PATCH /v1/delete_recommendations/{id}
Update a part of a delete recommendation
Remove a delete recommendation
Remove the delete recommendation object 1244
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/delete_recommendations/1244'
DELETE /v1/delete_recommendations/{id}
Remove a delete recommendation.
Document
Example document object
{
"id": 314628,
"url": "https://<example>.rossum.app/api/v1/documents/314628",
"s3_name": "272c2f01ae84a4e19a421cb432e490bb",
"parent": "https://<example>.rossum.app/api/v1/documents/203517",
"email": "https://<example>.rossum.app/api/v1/emails/987654",
"annotations": [
"https://<example>.rossum.app/api/v1/annotations/314528"
],
"mime_type": "application/pdf",
"creator": "https://<example>.rossum.app/api/v1/users/1",
"created_at": "2019-10-13T23:04:00.933658Z",
"arrived_at": "2019-10-13T23:04:00.933658Z",
"original_file_name": "test_invoice_1.pdf",
"content": "https://<example>.rossum.app/api/v1/documents/314628/content",
"attachment_status": null,
"metadata": {}
}
A document object contains information about one input file. To create it, one can:
- Use the upload endpoint
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the document | true |
url | URL | | URL of the document | true |
s3_name | string | | Internal | true |
parent | URL | null | URL of the parent document (e.g. the zip file it was extracted from) | true |
email | URL | | URL of the email object that the document was imported by (only for documents imported by email). | true |
annotations | list[URL] | | List of annotations related to the document. Usually there is only one annotation. | true |
mime_type | string | | MIME type of the document (e.g. application/pdf) | true |
creator | URL | | User that created the annotation. | true |
created_at | datetime | | Timestamp of document upload or incoming email attachment extraction. | true |
arrived_at | datetime | | (Deprecated) See created_at | true |
original_file_name | string | | File name of the attachment or upload. | true |
content | URL | | Link to the document's raw content (e.g. PDF file). May be null if there is no file associated. | true |
attachment_status | string | null | Reason why the Document got filtered out on Email ingestion. See attachment status | true |
metadata | object | {} | Client data. | |
Attachment status
Possible values: filtered_by_inbox_resolution, filtered_by_inbox_size, filtered_by_inbox_mime_type, filtered_by_inbox_file_name, filtered_by_hook_custom, filtered_by_queue_mime_type, hook_additional_file, filtered_by_insecure_mime_type, extracted_archive, failed_to_extract, processed, password_protected_archive, broken_image and null
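When reviewing email ingestion, it can help to group documents by attachment_status. The sketch below groups document objects by status and lists those whose status begins with `filtered_by_`. Treating that prefix as "filtered out" is a heuristic based on the value names above. Other values, such as password_protected_archive or failed_to_extract, may also deserve attention. The function names are hypothetical.

```python
from collections import defaultdict


def group_by_attachment_status(documents):
    """Map each attachment_status (None for null) to the ids of documents with it."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.get("attachment_status")].append(doc["id"])
    return dict(groups)


def filtered_out(documents):
    """Ids of documents whose status starts with "filtered_by_" (heuristic)."""
    return [
        doc["id"]
        for doc in documents
        if (doc.get("attachment_status") or "").startswith("filtered_by_")
    ]
```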
List all documents
List all documents
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/documents'
{
"pagination": {
"total": 2,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 314628,
"url": "https://<example>.rossum.app/api/v1/documents/314628",
...
},
{
"id": 315609,
"url": "https://<example>.rossum.app/api/v1/documents/315609",
...
}
]
}
GET /v1/documents
Retrieve all document objects.
Supported filters: id, email, creator, arrived_at, created_at, original_file_name, attachment_status
Supported ordering: id, arrived_at, created_at, original_file_name, mime_type, attachment_status
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of document objects.
Retrieve a document
Get document object 314628
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/documents/314628'
{
"id": 314628,
"url": "https://<example>.rossum.app/api/v1/documents/314628",
...
}
GET /v1/documents/{id}
Get a document object.
Response
Status: 200
Returns document object.
Create document
Create new document using a form (multipart/form-data)
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-F content=@document.pdf \
'https://example.app.rossum.ai/api/v1/documents'
Create new document by sending file in a request body
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H 'Content-Disposition: attachment; filename=document.pdf' --data-binary @file.pdf \
'https://example.app.rossum.ai/api/v1/documents'
Create new document by sending file in a request body (UTF-8 filename must be URL encoded)
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-H "Content-Disposition: attachment; filename*=utf-8''document%20%F0%9F%8E%81.pdf" --data-binary @file.pdf \
'https://example.app.rossum.ai/api/v1/documents'
Create documents using basic authentication
curl -u 'east-west-trading-co@example.app.rossum.ai:secret' \
-F content=@document.pdf \
'https://example.app.rossum.ai/api/v1/documents'
Create document with metadata and a parent document
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-F content=@document.pdf \
-F metadata='{"project":"Market ABC"}' \
-F parent='https://example.app.rossum.ai/api/v1/documents/456700' \
'https://example.app.rossum.ai/api/v1/documents'
{
"id": 314628,
"url": "https://example.app.rossum.ai/api/v1/documents/314628",
...
}
POST /v1/documents
Create a new document object.
Use this API call to create a document without an annotation. It is suitable for creating documents of MIME types that cannot be extracted by Rossum. Only one document can be created per request. Allowed attributes for a creation request:
Attribute | Type | Description |
---|---|---|
content | bytes | The file to be uploaded. |
metadata | object | Client data. |
parent | URL | URL of the parent document (e.g. the original file based on which the uploaded content was created) |
Response
Status: 201
Returns created document object.
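When a file is sent as the raw request body, its name travels in the Content-Disposition header. As the UTF-8 example above shows, a non-ASCII name must be URL-encoded in the `filename*=utf-8''...` form. The sketch below builds that header value. Sending simple names (letters, digits, `.`, `_`, `-`) as plain `filename=` and encoding everything else is a conservative choice made for this sketch, not a documented API rule.

```python
import re
from urllib.parse import quote


def content_disposition(filename):
    """Build a Content-Disposition value for uploading a raw file body.

    Simple ASCII names are sent as filename=...; any other name is sent in the
    RFC 5987 form filename*=utf-8''<percent-encoded UTF-8>.
    """
    if re.fullmatch(r"[A-Za-z0-9._-]+", filename):
        return f"attachment; filename={filename}"
    return f"attachment; filename*=utf-8''{quote(filename, safe='')}"
```

For example, `content_disposition("document \U0001F381.pdf")` yields the same header value as the UTF-8 curl example.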
Update part of a document
Update metadata of document object 314628
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"metadata": {"translation_file_name": "Rechnung.pdf"}}' \
'https://<example>.rossum.app/api/v1/documents/314628'
{
"id": 314628,
"url": "https://<example>.rossum.app/api/v1/documents/314628",
"metadata": {"translation_file_name": "Rechnung.pdf"},
...
}
PATCH /v1/documents/{id}
Update part of a document object.
Document content
Download document original
To download multiple documents in one archive, refer to the documents download object.
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/documents/314628/content'
GET /v1/documents/{id}/content
Get original document content (e.g. PDF file).
Response
Status: 200
Returns original document file.
Permanent URL
Download document original from a permanent URL
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/original/272c2f01ae84a4e19a421cb432e490bb'
GET /v1/original/272c2f01ae84a4e19a421cb432e490bb
Get original document content (e.g. PDF file).
Response
Status: 200
Returns original document file.
Delete a document
Delete document 314628
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/documents/314628'
DELETE /v1/documents/{id}
Delete a document object from the database. It also deletes the related annotation and page objects.
Do not call this internal API; mark the annotation as deleted instead.
Response
Status: 204
Document Relation
Example document relation object
{
"id": 1,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": [
"https://<example>.rossum.app/api/v1/documents/124",
"https://<example>.rossum.app/api/v1/documents/125"
],
"url": "https://<example>.rossum.app/api/v1/document_relations/1"
}
A document relation object introduces additional relations between annotations and documents. An annotation can be related to one or more documents, and it may belong to several relations of different types at the same time. These relations are in addition to the main relation between an annotation and the document from which it was created (see annotation).
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the document relation | true |
type | string | export | Type of relationship. Possible values are export. See below. | |
key | string | | Key used to distinguish several relationships of the same type | |
annotation | URL | | Annotation | |
documents | list[URL] | | List of related documents | |
url | URL | | URL of the relation | true |
Document relation types:
- export: Related documents are exports of the annotation data (e.g. in XML or JSON formats).
List all document relations
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/document_relations'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1500,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": [
"https://<example>.rossum.app/api/v1/documents/456",
"https://<example>.rossum.app/api/v1/documents/457"
],
"url": "https://<example>.rossum.app/api/v1/document_relations/1500"
}
]
}
GET /v1/document_relations
Retrieve all document relation objects.
Supported filters:
Attribute | Description |
---|---|
id | ID of the document relation. Multiple values may be separated using a comma. |
type | Relation type. Multiple values may be separated using a comma. |
annotation | ID of annotation. Multiple values may be separated using a comma. |
key | Document relation key |
documents | ID of related document. Multiple values may be separated using a comma. |
Default ordering is by id in descending order. Other supported orderings are: id, type, annotation.
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of document relation objects.
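Multiple filter values are separated by commas, as described in the filter table. The following sketch builds such a list URL in Python. The function name document_relations_url is hypothetical, and passing the ordering as an ordering query parameter (with "-" for descending order) is an assumption about the general filters and ordering conventions, not something stated in this section.

```python
from urllib.parse import urlencode

API_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def document_relations_url(ordering=None, **filters):
    """Build a list URL for GET /v1/document_relations.

    List or tuple values are joined with commas, which the id, type,
    annotation and documents filters accept for multiple values.
    """
    params = {}
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    if ordering is not None:
        # Assumption: the ordering is sent as an "ordering" query parameter,
        # with a leading "-" for descending order.
        params["ordering"] = ordering
    query = urlencode(params, safe=",")  # keep commas unescaped
    return f"{API_URL}/document_relations" + (f"?{query}" if query else "")


# Export relations of annotations 123 and 124, highest id first.
url = document_relations_url(type="export", annotation=[123, 124], ordering="-id")
```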
Create a new document relation
Create a new document relation of type export
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"type": "export", "annotation": "https://<example>.rossum.app/api/v1/annotations/123", "documents": ["https://<example>.rossum.app/api/v1/documents/124"]}' \
'https://<example>.rossum.app/api/v1/document_relations'
{
"id": 789,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": ["https://<example>.rossum.app/api/v1/documents/124"],
"url": "https://<example>.rossum.app/api/v1/document_relations/789"
}
POST /v1/document_relations
Create a new document relation object.
Response
Status: 201
Returns created document relation object.
Retrieve a document relation
Get document relation object with id 1500
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/document_relations/1500'
{
"id": 1500,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": ["https://<example>.rossum.app/api/v1/documents/124", "https://<example>.rossum.app/api/v1/documents/125"],
"url": "https://<example>.rossum.app/api/v1/document_relations/1500"
}
GET /v1/document_relations/{id}
Get a document relation object.
Response
Status: 200
Returns document relation object.
Update a document relation
Update the document relation object with id 1500
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"type": "export", "key": null, "annotation": "https://<example>.rossum.app/api/v1/annotations/123", "documents": ["https://<example>.rossum.app/api/v1/documents/124"]}' \
'https://<example>.rossum.app/api/v1/document_relations/1500'
{
"id": 1500,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": ["https://<example>.rossum.app/api/v1/documents/124"],
"url": "https://<example>.rossum.app/api/v1/document_relations/1500"
}
PUT /v1/document_relations/{id}
Update document relation object.
Response
Status: 200
Returns updated document relation object.
Update part of a document relation
Update related documents on document relation object with ID 1500
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"documents": ["https://<example>.rossum.app/api/v1/documents/124", "https://<example>.rossum.app/api/v1/documents/125"]}' \
'https://<example>.rossum.app/api/v1/document_relations/1500'
{
"id": 1500,
"type": "export",
"key": null,
"annotation": "https://<example>.rossum.app/api/v1/annotations/123",
"documents": ["https://<example>.rossum.app/api/v1/documents/124", "https://<example>.rossum.app/api/v1/documents/125"],
"url": "https://<example>.rossum.app/api/v1/document_relations/1500"
}
PATCH /v1/document_relations/{id}
Update part of a document relation object.
Response
Status: 200
Returns updated document relation object.
Delete a document relation
Delete empty document relation 1500
curl -X DELETE -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/document_relations/1500'
DELETE /v1/document_relations/{id}
Delete a document relation object that has no related documents. If some documents still participate in the relation, the caller must first delete those documents or update the document relation before deleting it.
Response
Status: 204
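A relation that still lists documents has to be emptied before it can be deleted. The sketch below prepares the two requests for that sequence using only the standard library. The function name is hypothetical, and it assumes that a PATCH with an empty documents list is an accepted way to update the relation; check this against your account before relying on it.

```python
import json
import urllib.request

API_URL = "https://<example>.rossum.app/api/v1"  # placeholder host
TOKEN = "db313f24f5738c8e04635e036ec8a45cdd6d6b03"  # placeholder token


def detach_and_delete_requests(relation_id, token=TOKEN):
    """Return the requests needed to delete a relation that still has documents.

    1. PATCH the relation with an empty "documents" list (assumed accepted).
    2. DELETE the now-empty relation (the API answers 204).
    """
    url = f"{API_URL}/document_relations/{relation_id}"
    auth = {"Authorization": f"Bearer {token}"}
    patch = urllib.request.Request(
        url,
        data=json.dumps({"documents": []}).encode("utf-8"),
        headers={**auth, "Content-Type": "application/json"},
        method="PATCH",
    )
    delete = urllib.request.Request(url, headers=auth, method="DELETE")
    return [patch, delete]


# Send them in order, for example:
# for req in detach_and_delete_requests(1500):
#     urllib.request.urlopen(req).close()
```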
Documents Download
Example download object
{
"id": 105,
"url": "https://<example>.rossum.app/api/v1/documents/downloads/105",
"file_name": "test_invoice_1.pdf",
"expires_at": "2023-09-13T23:04:00.933658Z",
"content": "https://<example>.rossum.app/api/v1/documents/downloads/105/content"
}
Set of endpoints enabling download of multiple documents at once. The workflow is as follows:
1. Create a download object via POST on /documents/downloads. The response contains a task URL.
2. Call GET on the task URL and watch the task status to see when the task is ready. The result_url of a successful task contains the URL of the download object.
3. Either call GET on the download object to get its metadata, or call GET on the download object's content endpoint to download the archive directly.
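Step 2 of the workflow is a polling loop. The following is a minimal Python sketch of it. The function name is hypothetical, fetch_json stands for any caller-supplied function that performs an authenticated GET and returns decoded JSON, and the task status values "succeeded" and "failed" are assumptions about the task object rather than values documented in this section.

```python
import time


def wait_for_download(task_url, fetch_json, poll_interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll a download task until it finishes; return the download object URL.

    fetch_json(url) is supplied by the caller: it performs an authenticated
    GET and returns the decoded JSON body. The status values "succeeded" and
    "failed" are assumptions, not taken from this section.
    """
    waited = 0.0
    while True:
        task = fetch_json(task_url)
        status = task.get("status")
        if status == "succeeded":
            return task["result_url"]  # URL of the download object
        if status == "failed":
            raise RuntimeError(f"download task failed: {task!r}")
        if waited >= timeout:
            raise TimeoutError(f"task {task_url} not ready after {timeout} s")
        sleep(poll_interval)
        waited += poll_interval
```

The returned URL points at the download object; its content attribute (or the /content endpoint) then serves the archive. Injecting sleep keeps the loop easy to test without real waiting.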
A download object contains information about a downloadable archive in .zip format.
Attribute | Type | Description | Read-only |
---|---|---|---|
id | integer | Id of the download object | true |
url | URL | URL of the download object | true |
expires_at | datetime | Time until which the download object and its content are guaranteed to be available: the archive creation time plus 2 hours. Expired downloads are deleted periodically. | true |
file_name | string | Name of the archive to be downloaded. | true |
content | URL | Link to the download's raw content. May be null if no archive is associated yet. | true |
Retrieve a download
GET /v1/documents/downloads/{id}
Get a download object.
Response
Status: 200
Returns download object.
Create new download
Create new download object
curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-d '{"documents": ["https://<example>.rossum.app/api/v1/documents/123000", "https://<example>.rossum.app/api/v1/documents/123001"], "file_name": "monday_invoices.zip"}' \
'https://<example>.rossum.app/api/v1/documents/downloads'
{
"url": "https://<example>.rossum.app/api/v1/tasks/301"
}
POST /v1/documents/downloads
Create a new download object.
Argument | Type | Required | Default | Description |
---|---|---|---|---|
documents | list[URL] | true | | List of document URLs to be included in the resulting archive. Max. 500 documents. |
file_name | string | | documents.zip | The filename of the resulting archive. Must include a .zip extension. |
type | enum | | document | One of: document and source_document. |
zip | boolean | | true | Use application/zip to bundle the download contents. |
- A zip value of false is only applicable to single-document downloads. In that case, if file_name is omitted from the request, it is taken from the document being downloaded.
- A type of source_document means that, for each of the documents, its most distant non-empty parent document is put into the download.
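The constraints on the request body can be checked client-side before calling the endpoint. This is a sketch under stated assumptions: the function name is hypothetical, the messages are illustrative, and the .zip extension check is applied only when zip is true, which is one reading of the notes above (a single unzipped document keeps its own file name).

```python
def validate_download_request(payload):
    """Check a POST /v1/documents/downloads body against the documented rules.

    Returns a list of human-readable problems (empty if none were found).
    """
    problems = []
    documents = payload.get("documents") or []
    zip_enabled = payload.get("zip", True)  # documented default: true
    file_name = payload.get("file_name")
    if not documents:
        problems.append("documents is required")
    if len(documents) > 500:
        problems.append("at most 500 documents can be downloaded at once")
    if payload.get("type", "document") not in ("document", "source_document"):
        problems.append("type must be document or source_document")
    if not zip_enabled and len(documents) != 1:
        problems.append("zip=false is only allowed for a single document")
    # Interpretation: the .zip rule only applies when an archive is produced.
    if zip_enabled and file_name is not None and not file_name.endswith(".zip"):
        problems.append("file_name must have a .zip extension")
    return problems
```

For example, the request body from the curl example below passes without problems.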
Response
Status: 202
The response Location header provides the task URL (same as in the JSON body of the response).
Returns created task object.
Retrieve download content
Download archive with original document files
curl -s -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -o documents.zip \
'https://<example>.rossum.app/api/v1/documents/downloads/100/content'
GET /v1/documents/downloads/{id}/content
Get archive with original document files.
Response
Status: 200
Returns an archive with original document files.
Example email object
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/emails/1234",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
"documents": [
"https://<example>.rossum.app/api/v1/documents/5678"
],
"parent": "https://<example>.rossum.app/api/v1/emails/1230",
"children": [
"https://<example>.rossum.app/api/v1/emails/1244"
],
"created_at": "2021-03-26T14:31:46.993427Z",
"last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
"subject": "Some email subject",
"from": {"email": "company@east-west.com", "name": "Company East"},
"to": [{"email": "east-west-trading-co-a34f3a@<example>.rossum.app", "name": "East West Trading"}],
"cc": [],
"bcc": [],
"body_text_plain": "Some body",
"body_text_html": "<div dir=\"ltr\">Some body</div>",
"metadata": {},
"type": "incoming",
"annotation_counts": {
"annotations": 3,
"annotations_processed": 1,
"annotations_purged": 0,
"annotations_unprocessed": 1,
"annotations_rejected": 1
},
"annotations": [
"https://<example>.rossum.app/api/v1/annotations/1",
"https://<example>.rossum.app/api/v1/annotations/2",
"https://<example>.rossum.app/api/v1/annotations/4"
],
"related_annotations": [],
"related_documents": [
"https://<example>.rossum.app/api/v1/documents/3"
],
"filtered_out_document_count": 2,
"labels": ["rejected"]
}
An email object represents an email received by a Rossum inbox or sent from Rossum.
Attribute | Type | Required | Description | Read-only |
---|---|---|---|---|
id | integer | | Id of the email | true |
url | URL | | URL of the email | true |
queue | URL | true | URL of the associated queue | |
inbox | URL | true | URL of the associated inbox | |
parent | URL | | URL of the parent email | |
email_thread | URL | | URL of the associated email thread | true |
children | list[URL] | | List of URLs of the child emails | |
documents | list[URL] | | List of documents attached to the email | true |
created_at | datetime | | Timestamp of the incoming email | true |
last_thread_email_created_at | datetime | | (Deprecated) Timestamp of the most recent email in this email thread | true |
subject | string | | Email subject | |
from | email_address_object | | Information about the sender, containing keys email and name. | true |
to | list[email_address_object] | | List that contains information about recipients. | true |
cc | list[email_address_object] | | List that contains information about recipients of carbon copy. | true |
bcc | list[email_address_object] | | List that contains information about recipients of blind carbon copy. | true |
body_text_plain | string | | Plain-text email section (truncated to 4 kB). | |
body_text_html | string | | HTML email section (truncated to 4 kB). | |
metadata | object | | Client data. | |
type | string | | Email type. Can be incoming or outgoing. | true |
annotation_counts | object | | This attribute is intended for INTERNAL use only and may be changed in the future. Information about how many annotations were extracted from email attachments and which state they are currently in. | true |
annotations | list[URL] | | List of URLs of annotations that arrived via email | true |
related_annotations | list[URL] | | List of URLs of annotations that are related to the email (e.g. rejected by it, added as an attachment, etc.) | true |
related_documents | list[URL] | | List of URLs of documents related to the email (e.g. by forwarding an email containing the document as an attachment, etc.) | true |
creator | URL | | User that sent the email. null if the email was received via SMTP. | true |
filtered_out_document_count | integer | | This attribute is intended for INTERNAL use only and may be changed in the future without notice. Number of documents automatically filtered out by the Rossum smart inbox (this feature can be configured in inbox settings). | true |
labels | list[string] | | List of email labels. Possible values are rejection, automatic_rejection, rejected, automatic_status_changed_info, forwarded, reply. | false |
content | URL | | URL of the email's content | true |
Email address object
Attribute | Type | Default | Description | Required |
---|---|---|---|---|
email | string | | Email address | true |
name | string | | Name of the email recipient | |
Annotation counts object
This object stores the number of annotations extracted from email attachments, broken down by their current status.
Attribute | Type | Description | Annotation status |
---|---|---|---|
annotations | integer | Total number of annotations | Any |
annotations_processed | integer | Number of processed annotations | exported , deleted , purged , split |
annotations_purged | integer | Number of purged annotations | purged |
annotations_unprocessed | integer | Number of not yet processed annotations | importing , failed_import , to_review , reviewing , confirmed , exporting , postponed , failed_export |
annotations_rejected | integer | Number of rejected annotations | rejected |
related_annotations | integer | Total number of related annotations | Any |
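The table maps annotation statuses to counters. Rossum computes these counts itself; the following Python sketch only illustrates the mapping, for example to reproduce the counts from a list of annotation statuses you fetched yourself. The names are hypothetical, and related_annotations is left out because it counts related annotations rather than extracted ones. Note that a purged annotation is counted both as processed and as purged.

```python
# Annotation statuses counted by each attribute of the annotation counts object,
# transcribed from the table above. "annotations" counts every status.
COUNTED_STATUSES = {
    "annotations_processed": {"exported", "deleted", "purged", "split"},
    "annotations_purged": {"purged"},
    "annotations_unprocessed": {
        "importing", "failed_import", "to_review", "reviewing",
        "confirmed", "exporting", "postponed", "failed_export",
    },
    "annotations_rejected": {"rejected"},
}


def annotation_counts(statuses):
    """Compute annotation counts from a list of annotation statuses."""
    counts = {"annotations": len(statuses)}
    for key, allowed in COUNTED_STATUSES.items():
        counts[key] = sum(1 for status in statuses if status in allowed)
    return counts
```

Three annotations in the statuses exported, to_review and rejected give the same counts as the example email object above.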
Email labels
Any number of labels can be assigned to an email object.
Label name | Description |
---|---|
rejection | Outgoing informative email sent by Rossum after an email was manually rejected. |
automatic_rejection | Informative automatic email sent by Rossum when no document was extracted from an incoming email. |
automatic_status_changed_info | Informative automatic email sent by Rossum about a document status change. |
rejected | Incoming email rejected together with all attached documents. |
forwarded | Outgoing email sent by forwarding another email. |
reply | Outgoing email sent by replying to another email. |
List all emails
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/emails'
{
"pagination": {
"total": 2,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/emails/1234",
"inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"documents": [
"https://<example>.rossum.app/api/v1/documents/5678"
],
...
}
]
}
GET /v1/emails
Retrieve all email objects.
Supported filters: id, created_at, subject, queue, inbox, documents, from__email, from__name, to, last_thread_email_created_at_before, last_thread_email_created_at_after, type, email_thread, has_documents
Supported ordering: id, created_at, subject, queue, inbox, from__email, from__name, last_thread_email_created_at
For additional info please refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of email objects.
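List endpoints return one page at a time, with the URL of the following page in pagination.next (null on the last page). The generator below is a minimal Python sketch that walks all pages. Its name is hypothetical, and fetch_json stands for any caller-supplied function that performs an authenticated GET and returns the decoded JSON body.

```python
def iter_results(first_url, fetch_json):
    """Yield every object from a paginated list endpoint.

    Pages are followed through pagination.next until it is null (None).
    """
    url = first_url
    while url is not None:
        page = fetch_json(url)
        yield from page["results"]
        url = page["pagination"]["next"]
```

For example, iter_results("https://<example>.rossum.app/api/v1/emails?type=incoming", fetch_json) yields incoming emails across all pages, using the type filter listed above.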
Retrieve an email
Get email object 1244
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/emails/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/emails/1244",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"inbox": "https://<example>.rossum.app/api/v1/inboxes/8199",
"documents": ["https://<example>.rossum.app/api/v1/documents/5678"],
"parent": "https://<example>.rossum.app/api/v1/emails/1230",
"children": [],
"created_at": "2021-03-26T14:31:46.993427Z",
"last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
"subject": "Some email subject",
"from": {"email": "company@east-west.com"},
"to": [{"email": "east-west-trading-co-a34f3a@<example>.rossum.app"}],
"cc": [],
"bcc": [],
"body_text_plain": "",
"body_text_html": "",
"metadata": {},
"type": "outgoing",
"labels": [],
...
}
GET /v1/emails/{id}
Get an email object.
Response
Status: 200
Returns email object.
Update an email
Update email object 1244
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "inbox": "https://<example>.rossum.app/api/v1/inboxes/8236", "subject": "Some subject", "to": [{"email": "jack@east-west-trading.com"}]}' \
'https://<example>.rossum.app/api/v1/emails/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/emails/1244",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"inbox": "https://<example>.rossum.app/api/v1/inboxes/8236",
"documents": [],
"parent": null,
"children": [],
"created_at": "2021-03-26T14:31:46.993427Z",
"last_thread_email_created_at": "2021-03-27T14:29:48.665478Z",
"subject": "Some subject",
"from": null,
"to": [{"email": "jack@east-west-trading.com"}],
"body_text_plain": "",
"body_text_html": "",
"metadata": {},
"type": "outgoing",
"labels": [],
...
}
PUT /v1/emails/{id}
Update email object.
Response
Status: 200
Returns updated email object.
Update part of an email
Update subject of email object 1244
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"subject": "Some subject"}' \
'https://<example>.rossum.app/api/v1/emails/1244'
{
"id": 1244,
"subject": "Some subject",
...
}
PATCH /v1/emails/{id}
Update part of email object.
Response
Status: 200
Returns updated email object.
Send email
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"to": [{"email": "jack@east-west-trading.com"}], "queue": "https://<example>.rossum.app/api/v1/queues/145300", "template_values": {"subject": "Some subject", "message": "<b>Hello!</b>"}}' \
'https://<example>.rossum.app/api/v1/emails/send'
POST /v1/emails/send
Send an email to the specified recipients. The number of emails that can be sent is limited (10 for trial accounts).
Key | Type | Required | Description |
---|---|---|---|
to | list[email_address_object] | | List that contains information about recipients. |
cc | list[email_address_object] | | List that contains information about recipients of carbon copy. |
bcc | list[email_address_object] | | List that contains information about recipients of blind carbon copy. |
template_values | object | false | Values to fill in the email template; it should always contain the subject and message keys. See below for description. |
queue | URL | true | Link to the email-related queue. |
related_annotations | list[URL] | false | List of links to email-related annotations. |
related_documents | list[URL] | false | List of URLs of email-related documents (in addition to the documents of related_annotations, which are linked automatically). |
attachments | object | false | Keys are attachment types (currently only the documents key is supported); values are lists of URLs. |
parent_email | URL | false | Link to parent email. |
labels | list[string] | false | List of email labels. |
At least one of to, cc, or bcc must contain an email address.
The email address object consists of a name and an email address:
Key | Type | Required | Description |
---|---|---|---|
email | string | true | Email address, e.g. john.doe@example.com |
name | string | false | Name related to the email, e.g. John Doe |
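The rules above (a required queue, at least one recipient, subject and message in template_values) can be encoded in a small payload builder. The following Python sketch is illustrative only: the function name and its keyword-argument style are hypothetical, recipients are plain addresses without names, and any extra keyword arguments become additional template values for custom placeholders.

```python
def build_send_email_payload(queue, subject, message, to=(), cc=(), bcc=(), **extra_values):
    """Build a POST /v1/emails/send body.

    Recipients are plain email addresses; extra keyword arguments become
    additional template values usable as {{ name }} placeholders.
    """
    if not (to or cc or bcc):
        raise ValueError("at least one of to, cc or bcc must contain an address")
    payload = {
        "queue": queue,
        "template_values": {"subject": subject, "message": message, **extra_values},
    }
    for key, addresses in (("to", to), ("cc", cc), ("bcc", bcc)):
        if addresses:
            payload[key] = [{"email": address} for address in addresses]
    return payload
```

Called with the queue, subject, message and recipient from the curl example, it produces the same request body.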
Template values
The template_values object is used to create an outgoing email. The subject key fills in the email subject, and the message key fills in the email body (which may contain a subset of HTML).
Values may contain other placeholders that are either built-in (see below) or specified in the template_values object as well. For placeholders referring to annotations, the annotations from the related_annotations attribute are used to fill in the values.
Example of template_values
{
...
"template_values": {
"subject": "Document processed",
"message": "<p>The document was processed.<br>{{user_name}}<br>Additional notes: {{note}}</p>",
"note": "No issues found"
}
...
}
List of built-in placeholders
Placeholder | Description | Can be used in automation |
---|---|---|
organization_name | Name of the organization. | True |
app_url | App root URL | True |
user_name | Username of the user sending the email. | False |
current_user_fullname | Full name of user sending the email. | False |
current_user_email | Email address of the user sending the email. | False |
parent_email_subject | Subject of the email we are replying to. | True |
sender_email | Email address of the author of the incoming email. | True |
annotation.document.original_file_name | Filenames of the documents belonging to the related annotation(s) | True |
annotation.content.value.{schema_id} | Content value of datapoints from email related annotation(s) | True |
annotation.id | Ids of the related annotation(s) | True |
annotation.url | URLs of the related annotation(s) | True |
annotation.assignee_email | Email addresses of the users assigned to the related annotation(s) | True |
Example request data
{
"to": [{"name": "John Doe", "email": "john.doe@rossum.ai"}],
"template_values": {
"subject": "Rejected!: {{parent_email_subject}}",
"message": "<p>Dear user,<br>Error occurred!<br><br>Note: {{rejection_note}}. Occurred on your document issued at {{ annotation.content.value.date_issue }}.<br>Yours, Rossum</p>",
"rejection_note": "There is no invoice id!"
},
"queue": "https://<example>.rossum.app/api/v1/queues/145300",
"related_annotations": ["https://<example>.rossum.app/api/v1/annotations/123"],
"attachments": {
"documents": ["https://<example>.rossum.app/api/v1/documents/123"]
}
}
Response
Status: 200
Returns created email link.
Get email counts
curl -X GET -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/emails/counts'
{
"incoming": {
"total": 12,
"no_documents": 5,
"recent_with_no_documents_not_replied": 2,
"rejected": 1,
"recent_with_filtered_out_documents": 2
}
}
GET /v1/emails/counts
Retrieve counts of emails grouped based on status of extracted annotations.
Supports the same filters as list emails endpoint.
Response
Status: 200
Returns an object whose incoming key contains email counts computed from the status of the extracted documents:
Attribute | Type | Description |
---|---|---|
total | integer | Total number of emails |
no_documents | integer | Number of emails with no attachment processed by Rossum |
recent_with_no_documents_not_replied | integer | Number of emails that arrived in the last 14 days with no attachment processed by Rossum, without the rejected label, and without any reply (i.e. the email has no related child emails; see email docs). |
rejected | integer | Number of emails containing at least one document in rejected status (see document lifecycle) or with the rejected label. |
recent_with_filtered_out_documents | integer | Number of emails that arrived in the last 14 days containing one or more attachments automatically rejected by the Rossum smart inbox (email attachment filtering rules are configured in the inbox settings). |
Email content
GET /v1/emails/{id}/content
Retrieve the content of an email.
Response
Status: 200
Email notifications management
Unsubscribe from automatic email notifications
curl -X GET 'https://<example>.rossum.app/api/v1/emails/subscription?content=eyJldmVudCI6ImRvY3VtZW50X3JlY2VpdmVkIiwiZW1haWwiOiJqaXJpLmJhdWVyQHJvc3N1bS5haSIsIm9yZ2FuaXphdGlvbiI6Imh0dHA6Ly9sb2NhbGhvc3Q6ODAwMC92MS9vcmdhbml6YXRpb25zLzEifQ&signature=LhgMR01vQ9NAsvAtOKifZpaYBi20vkhOK-Cm7HT1Cqs&subscribe=false'
<!DOCTYPE html>
...
</html>
GET /v1/emails/subscription?subscribe=false
Enable or disable subscription to automatic email notifications sent by Rossum.
Query parameter | Type | Default | Required | Description |
---|---|---|---|---|
signature | string | | true | Signature used to sign the content (generated by our backend). |
content | string | | true | Signed content of the payload (generated by our backend). |
subscribe | boolean | true | false | Designates whether the subscription is enabled or disabled. |
Response
Status: 200
Renders HTML page.
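The content and signature values are generated by the Rossum backend and have to be passed through unchanged, but they still need URL encoding when a link is assembled. A minimal Python sketch follows; the function name is hypothetical and the inputs are treated as opaque strings.

```python
from urllib.parse import urlencode

API_URL = "https://<example>.rossum.app/api/v1"  # placeholder host


def subscription_url(content, signature, subscribe):
    """Build the link that enables or disables automatic email notifications.

    content and signature are opaque values generated by the Rossum backend;
    urlencode only escapes them for safe use in a query string.
    """
    query = urlencode({
        "content": content,
        "signature": signature,
        "subscribe": "true" if subscribe else "false",
    })
    return f"{API_URL}/emails/subscription?{query}"
```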
Email tracking events
Email tracking events
curl -X POST -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"payload": "ORSXG5DTFVZHVZLSFZQG6Y3BNQ5DC===", "signature": "nGoqalaYlSMFiCPmJDPWaiN3FLEm_cPbxA4mrgqodpk", "link": "https://rossum.ai", "event": "click"}' \
'https://<example>.rossum.app/api/v1/email_tracking_events'
POST /v1/email_tracking_events
Rossum can track the following events for sent emails: send, delivery, open, click, and bounce.
Key | Type | Required | Description |
---|---|---|---|
payload | string | True | Encrypted email, domain and organization ID. |
event | string | True | Actions performed on the sent email: bounce, send, delivery, open, click. |
link | URL | False | The link from the email body that the user clicked on. |
signature | string | True | Signature used to sign the encrypted domain (generated by our backend). |
Response
Status: 201
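A tracking event body combines the backend-generated payload and signature with one of the five event names. The Python sketch below builds and checks such a body. The function name is hypothetical, and attaching link only when it is given reflects the table above (the link the user clicked on), which presumably matters mainly for click events.

```python
TRACKED_EVENTS = {"send", "delivery", "open", "click", "bounce"}


def build_tracking_event(payload, signature, event, link=None):
    """Build a POST /v1/email_tracking_events body.

    payload and signature are opaque values generated by the Rossum backend.
    """
    if event not in TRACKED_EVENTS:
        raise ValueError(f"unsupported event {event!r}")
    body = {"payload": payload, "signature": signature, "event": event}
    if link is not None:
        body["link"] = link  # the link from the email body that was clicked
    return body
```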
Email Template
Example email template object
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "My Email Template",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"organization": "https://<example>.rossum.app/api/v1/organizations/210",
"triggers": [
"https://<example>.rossum.app/api/v1/triggers/500",
"https://<example>.rossum.app/api/v1/triggers/600"
],
"type": "custom",
"subject": "My Email Template Subject",
"message": "<p>My Email Template Message</p>",
"automate": true
}
An email template object represents a template that can be chosen when sending an email from Rossum.
Attribute | Type | Default | Required | Description | Read-only |
---|---|---|---|---|---|
id | integer | | | Id of the email template | true |
url | URL | | | URL of the email template | |
name | string | | true | Name of the email template | |
queue | URL | | true | URL of the associated queue | |
organization | URL | | | URL of the associated organization | |
triggers | list[URL] | | | URLs of the linked triggers | |
type | string | custom | | Type of the email template (see email template types) | |
subject | string | "" | | Email subject | |
message | string | "" | | Email body (may contain a subset of HTML) | |
enabled | bool | true | | (Deprecated) Use automate instead | |
automate | bool | true | | If true, the email is sent automatically when the triggering action occurs (see email template types) | |
to | list[email_address_object] | [] | | List that contains information about recipients. | |
cc | list[email_address_object] | [] | | List that contains information about recipients of carbon copy. | |
bcc | list[email_address_object] | [] | | List that contains information about recipients of blind carbon copy. | |
Email Template Types
Email template objects can have one of the following types. Only templates of type rejection and custom can be created and deleted manually.
Template type name | Description |
---|---|
rejection | Template for a rejection email |
rejection_default | Default template for a rejection email |
email_with_no_processable_attachments | Template for a reply to an email with no processable attachments |
custom | Custom email template |
Default Email templates
Creating a new queue automatically creates five default email templates with default subjects and messages:
[
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "Annotation status change - confirmed",
"queue": "https://<example>.rossum.app/api/v1/queues/501",
"organization": "https://<example>.rossum.app/api/v1/organizations/123",
"subject": "Verified documents: {{ parent_email_subject }}",
"message": "<p>Dear sender,<br><br>Your documents have been checked by annotator.<br><br>{{ document_list }}<br><br>Regards</p>",
"type": "custom",
"triggers": ["https://<example>.rossum.app/api/v1/triggers/456"],
"automate": false,
"to": [{"email": "{{sender_email}}"}]
},
{
"id": 1235,
"url": "https://<example>.rossum.app/api/v1/email_templates/1235",
"name": "Annotation status change - exported",
"queue": "https://<example>.rossum.app/api/v1/queues/501",
"organization": "https://<example>.rossum.app/api/v1/organizations/123",
"subject": "Documents exported: {{ parent_email_subject }}",
"message": "<p>Dear sender,<br><br>Your documents have been successfully exported.<br><br>{{ document_list }}<br><br>Regards</p>",
"type": "custom",
"triggers": ["https://<example>.rossum.app/api/v1/triggers/457"],
"automate": false,
"to": [{"email": "{{sender_email}}"}]
},
{
"id": 1236,
"url": "https://<example>.rossum.app/api/v1/email_templates/1236",
"name": "Annotation status change - received",
"queue": "https://<example>.rossum.app/api/v1/queues/501",
"organization": "https://<example>.rossum.app/api/v1/organizations/123",
"subject": "Documents received: {{ parent_email_subject }}",
"message": "<p>Dear sender,<br><br>Your documents have been successfully received.<br><br>{{ document_list }}<br><br>Regards</p>",
"type": "custom",
"triggers": ["https://<example>.rossum.app/api/v1/triggers/458"],
"automate": false,
"to": [{"email": "{{sender_email}}"}]
},
{
"id": 1237,
"url": "https://<example>.rossum.app/api/v1/email_templates/1237",
"name": "Default rejection template",
"queue": "https://<example>.rossum.app/api/v1/queues/501",
"organization": "https://<example>.rossum.app/api/v1/organizations/123",
"subject": "Rejected document {{parent_email_subject}}",
"message": "<p>Dear sender,<br><br>The attached document has been rejected.<br><br><br>Best regards,<br>{{ user_name }}</p>",
"type": "rejection_default",
"triggers": [],
"automate": true,
"to": [{"email": "{{sender_email}}"}]
},
{
"id": 1238,
"url": "https://<example>.rossum.app/api/v1/email_templates/1238",
"name": "Email with no processable attachments",
"queue": "https://<example>.rossum.app/api/v1/queues/501",
"organization": "https://<example>.rossum.app/api/v1/organizations/123",
"subject": "No processable documents: {{ parent_email_subject }}",
"message": "<p>Dear sender,<br><br>Unfortunately, we have not received any document in the email that we can process. Please send a corrected version if appropriate.<br><br>Regards</p>",
"type": "email_with_no_processable_attachments",
"triggers": ["https://<example>.rossum.app/api/v1/triggers/459"],
"automate": false,
"to": [{"email": "{{sender_email}}"}]
}
]
Email template rendering
Email templates support Django template variables.
Only simple variables are supported; filters and the . lookup are not, and template tags such as {% if %} are left in the output unevaluated. A template such as:
{% if subject %} The subject is {{ subject }}. {% endif %} The message is {{ message|lower }}.
with template values such as:
{'subject': 'Hello', 'message': 'World'}
will render as:
{% if subject %} The subject is Hello. {% endif %} The message is .
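To preview a template offline, the documented behavior can be approximated in a few lines of Python. This is a rough sketch, not Rossum's renderer: the function name is hypothetical, it treats every variable expression (including dotted built-ins such as annotation.id) as a plain key in values, and it assumes that anything else, whether an unknown name or a filtered expression such as message|lower, renders as an empty string. {% %} tags are not matched at all, so they stay in the output, which reproduces the example above.

```python
import re

# Matches {{ ... }} variables; {% ... %} tags are deliberately not matched.
VARIABLE = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def render_simple(template, values):
    """Approximate the documented rendering of email templates.

    A variable is replaced only when its whole expression is a key in values;
    otherwise (assumption) it renders as an empty string.
    """
    return VARIABLE.sub(lambda m: str(values.get(m.group(1), "")), template)
```

Placeholders may be written with or without inner spaces, e.g. {{user_name}} and {{ user_name }} both match.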
List all email templates
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/email_templates'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "My Email Template",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"organization": "https://<example>.rossum.app/api/v1/organizations/210",
"subject": "My Email Template Subject",
"message": "<p>My Email Template Message</p>",
"type": "custom",
"automate": true
}
]
}
GET /v1/email_templates
Retrieve all email template objects.
Supported filters: id, queue, type, name
Supported ordering: id, name
For additional information, refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of email template objects.
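Because list endpoints return paginated responses, a client typically keeps following pagination.next until it is null. The following Python sketch relies only on the pagination structure shown in the example above; iter_results, http_fetch and the injectable fetch callable are hypothetical helpers, and the default fetcher simply sends the same Bearer token header as the curl examples.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def http_fetch(url: str, token: str) -> dict:
    """Hypothetical helper: GET a URL with a Bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_results(first_url: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every object from a paginated list by following pagination.next."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page["pagination"]["next"]  # null (None) on the last page
```

For example, iter_results("https://<example>.rossum.app/api/v1/email_templates?queue=4321", lambda url: http_fetch(url, token)) would walk all templates of queue 4321, with the filter passed as a query parameter as in the stats example below. Injecting fetch keeps the pagination logic testable without network access.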
Create new email template object
Create new email template in queue 4321
curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "name": "My Email Template", "subject": "My Email Template Subject", "message": "<p>My Email Template Message</p>", "type": "custom"}' \
'https://<example>.rossum.app/api/v1/email_templates'
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "My Email Template",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"organization": "https://<example>.rossum.app/api/v1/organizations/210",
"subject": "My Email Template Subject",
"message": "<p>My Email Template Message</p>",
"type": "custom"
}
POST /v1/email_templates
Create new email template object.
Response
Status: 201
Returns new email template object.
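The curl call above can also be expressed with Python's standard library. This is a sketch, not an official client: build_create_request is a hypothetical helper, base_url stands for your https://<example>.rossum.app/api/v1 endpoint, and only the fields used in the curl example are sent.

```python
import json
import urllib.request


def build_create_request(base_url: str, token: str, queue_id: int, name: str,
                         subject: str, message: str,
                         template_type: str = "custom") -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) POST /v1/email_templates."""
    payload = {
        "queue": f"{base_url}/queues/{queue_id}",  # the queue is referenced by URL
        "name": name,
        "subject": subject,
        "message": message,
        "type": template_type,
    }
    return urllib.request.Request(
        f"{base_url}/email_templates",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

Sending the prepared request with urllib.request.urlopen(request) should return status 201 together with the new object, as described above.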
Retrieve an email template object
Get email template object 1234
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/email_templates/1234'
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "My Email Template",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"organization": "https://<example>.rossum.app/api/v1/organizations/210",
"subject": "My Email Template Subject",
"message": "<p>My Email Template Message</p>",
"type": "custom",
"automate": true
}
GET /v1/email_templates/{id}
Get an email template object.
Response
Status: 200
Returns email template object.
Update an email template
Update email template object 1234
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"queue": "https://<example>.rossum.app/api/v1/queues/4321", "subject": "Some new subject"}' \
'https://<example>.rossum.app/api/v1/email_templates/1234'
{
"id": 1234,
"url": "https://<example>.rossum.app/api/v1/email_templates/1234",
"name": "My Email Template",
"queue": "https://<example>.rossum.app/api/v1/queues/4321",
"organization": "https://<example>.rossum.app/api/v1/organizations/210",
"subject": "Some new subject",
"message": "<p>My Email Template Message</p>",
"type": "custom",
"automate": true
}
PUT /v1/email_templates/{id}
Update email template object.
Response
Status: 200
Returns updated email template object.
Update part of an email template
Update subject of email template object 1234
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"subject": "Some new subject"}' \
'https://<example>.rossum.app/api/v1/email_templates/1234'
{
"id": 1234,
"subject": "Some new subject",
...
}
PATCH /v1/email_templates/{id}
Update part of an email template object.
Response
Status: 200
Returns updated email template object.
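Unlike PUT, a PATCH request only needs to carry the attributes being changed. A minimal sketch of the partial update above using the Python standard library; build_patch_request is a hypothetical helper, and the URL and token are the placeholders from the curl example.

```python
import json
import urllib.request


def build_patch_request(object_url: str, token: str, changes: dict) -> urllib.request.Request:
    """Hypothetical helper: prepare a PATCH request that sends only the changed attributes."""
    return urllib.request.Request(
        object_url,
        data=json.dumps(changes).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="PATCH",
    )


# Same change as the curl example: only the subject is sent.
request = build_patch_request(
    "https://<example>.rossum.app/api/v1/email_templates/1234",
    "db313f24f5738c8e04635e036ec8a45cdd6d6b03",
    {"subject": "Some new subject"},
)
```

Keeping the body minimal avoids accidentally overwriting attributes that another client changed in the meantime.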
Delete an email template
Delete email template object 1234
curl -X DELETE 'https://<example>.rossum.app/api/v1/email_templates/1234' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03'
DELETE /v1/email_templates/{id}
Delete an email template object.
Response
Status: 204
Get email templates stats
Get stats for all email templates from queue with id 478
curl -X GET 'https://<example>.rossum.app/api/v1/email_templates/stats?queue=478' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03'
{
"pagination": {
"total": 6,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"url": "https://<example>.rossum.app/api/v1/email_templates/2",
"manual_count": 12,
"automated_count": 190
},
{
"url": "https://<example>.rossum.app/api/v1/email_templates/3",
"manual_count": 87,
"automated_count": 0
},
...
]
}
GET /v1/email_templates/stats
Get stats for email templates.
Response
Status: 200
Returns a paginated response with a list of the following objects:
Attribute | Type | Description |
---|---|---|
url | URL | Link of the email template. |
manual_count | integer | Number of manually sent emails in the last 90 days based on the given email template. |
automated_count | integer | Number of automatically sent emails in the last 90 days based on the given email template. |
Supports the same filters as the list email templates endpoint.
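The stats objects lend themselves to simple aggregation. For instance, the following sketch (automation_share is a hypothetical helper) computes, per template URL, the share of emails sent automatically over the 90-day window; templates with no sent emails map to None to avoid dividing by zero.

```python
from typing import Dict, Iterable, Optional


def automation_share(stats: Iterable[dict]) -> Dict[str, Optional[float]]:
    """Hypothetical helper: map each template URL to automated / (manual + automated)."""
    shares: Dict[str, Optional[float]] = {}
    for item in stats:
        total = item["manual_count"] + item["automated_count"]
        # No emails sent in the window: the share is undefined.
        shares[item["url"]] = item["automated_count"] / total if total else None
    return shares
```

With the example response above, template 2 would map to 190 / 202 (about 0.94) and template 3 to 0.0.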
Render email template
Render email template 221
curl -s -X POST -H 'Content-Type: application/json' -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
-d '{"parent_email": "https://<example>.rossum.app/api/v1/emails/1234", "document_list": ["https://<example>.rossum.app/api/v1/documents/2314"], "to": [{"email": "{{ current_user_email }}"}]}' \
'https://<example>.rossum.app/api/v1/email_templates/221/render'
{
"to": [{"email": "satisfied.customer@rossum.ai"}],
"cc": [],
"bcc": [],
"subject": "My Email Template Subject: Rendered Parent Email Subject",
"message": "<p>My Email Template Message from user@example.com</p>"
}
POST /v1/email_templates/{id}/render
The rendered email template can be requested via the render endpoint with the following attributes:
Attribute | Type | Default | Required | Description |
---|---|---|---|---|
to* | list[email_address_object] | [] | false | List that contains information about recipients to be rendered. |
cc* | list[email_address_object] | [] | false | List that contains information about recipients of carbon copy to be rendered. |
bcc* | list[email_address_object] | [] | false | List that contains information about recipients of blind carbon copy to be rendered. |
parent_email | URL | | false | Link to the parent email. |
document_list | list[URL] | [] | false | List of document URLs used to simulate sending documents into Rossum over email. |
annotation_list | list[URL] | [] | false | List of annotation URLs used to render values for annotation.content placeholders. |
template_values | object | {} | false | Values to fill in the email template (see Email template rendering). |
* Template variables can be used inside the to, cc and bcc attributes. The following placeholders may be used in place of the email field of the email_address_object:
Placeholder | Description | Can be used in automation |
---|---|---|
current_user_email | Email address of the user sending the email. | False |
sender_email | Email address of the author of the incoming email. | True |
annotation.document.original_file_name | Filename of the documents passed under annotation_list. | True |
annotation.content.value.{schema_id} | Email address from a datapoint value of a related annotation. | True |
Render an email template object.
Response
Status: 200
Returns the rendered recipients, subject and message of the email template:
Attribute | Type | Description |
---|---|---|
to | list[email_address_object] | List that contains rendered information about recipients. |
cc | list[email_address_object] | List that contains rendered information about recipients of carbon copy. |
bcc | list[email_address_object] | List that contains rendered information about recipients of blind carbon copy. |
message | string | Rendered email template's message. |
subject | string | Rendered email template's subject. |
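Because the placeholder table marks current_user_email as unusable in automation, a client preparing recipients for automated templates may want to reject it up front. A minimal sketch under that reading of the table; build_render_payload and check_automation_recipients are hypothetical helpers, and omitting attributes left at their defaults is a design choice rather than an API requirement.

```python
import re
from typing import Iterable, List, Optional

# Placeholders the table above marks as not usable in automation.
NOT_AUTOMATABLE = {"current_user_email"}
_PLACEHOLDER = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def check_automation_recipients(recipients: Iterable[dict]) -> None:
    """Hypothetical helper: raise ValueError for placeholders not allowed in automation."""
    for recipient in recipients:
        for name in _PLACEHOLDER.findall(recipient.get("email", "")):
            if name in NOT_AUTOMATABLE:
                raise ValueError(f"{name} cannot be used in automation")


def build_render_payload(parent_email: Optional[str] = None,
                         document_list: Optional[List[str]] = None,
                         annotation_list: Optional[List[str]] = None,
                         template_values: Optional[dict] = None,
                         to: Optional[List[dict]] = None,
                         cc: Optional[List[dict]] = None,
                         bcc: Optional[List[dict]] = None) -> dict:
    """Hypothetical helper: body for POST /v1/email_templates/{id}/render."""
    candidate = {
        "parent_email": parent_email,
        "document_list": document_list,
        "annotation_list": annotation_list,
        "template_values": template_values,
        "to": to,
        "cc": cc,
        "bcc": bcc,
    }
    # Drop None and empty values; the endpoint defaults them to [] / {}.
    return {key: value for key, value in candidate.items() if value}
```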
Email Thread
Example email thread object
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/email_threads/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"root_email": "https://<example>.rossum.app/api/v1/emails/5432",
"has_replies": false,
"has_new_replies": false,
"root_email_read": false,
"last_email_created_at": "2021-11-01T18:02:24.740600Z",
"subject": "Root email subject",
"from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
"created_at": "2021-06-10T12:38:44.866180Z",
"labels": [],
"annotation_counts": {
"annotations": 4,
"annotations_processed": 2,
"annotations_purged": 0,
"annotations_rejected": 1,
"annotations_unprocessed": 1
}
}
An email thread object represents a thread of related emails in Rossum's inbox.
Attribute | Type | Required | Description | Read-only |
---|---|---|---|---|
id | integer | | ID of the email thread. | true |
url | URL | | URL of the email thread. | |
organization | URL | | URL of the associated organization. | true |
queue | URL | | URL of the associated queue. | true |
root_email | URL | | URL of the associated root email (first incoming email in the thread). | true |
has_replies | boolean | | True if the thread has more than one incoming email. | true |
has_new_replies | boolean | | True if the thread has unread incoming emails. | |
root_email_read | boolean | | True if the root email has been opened in the Rossum UI at least once. | true |
created_at | datetime | | Timestamp of the creation of the email thread (inherited from the arrived_at timestamp of the root email). | true |
last_email_created_at | datetime | | Timestamp of the most recent email in this email thread. | true |
subject | string | | Subject of the root email. | true |
from | object | | Information about the sender of the root email, containing the keys email and name. | true |
labels | list[string] | | This attribute is intended for INTERNAL use only and may be changed without notice. List of email thread labels set by the root email. If the root email is rejected and there are no other incoming emails in the thread, labels is set to [rejected]. In all other cases, labels is an empty list. | true |
annotation_counts | object | | This attribute is intended for INTERNAL use only and may be changed without notice. Information about how many annotations were extracted from all emails in the thread and which state they are currently in. | true |
Thread Annotation counts object
This object stores the numbers of annotations extracted from all emails in the given email thread.
Attribute | Type | Description | Annotation status |
---|---|---|---|
annotations | integer | Total number of annotations | Any |
annotations_processed | integer | Number of processed annotations | exported, deleted, purged, split |
annotations_purged | integer | Number of purged annotations | purged |
annotations_unprocessed | integer | Number of not yet processed annotations | importing, failed_import, to_review, reviewing, confirmed, exporting, postponed, failed_export |
annotations_rejected | integer | Number of rejected annotations | rejected |
related_annotations | integer | Total number of related annotations | Any |
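The Annotation status column can be read as a mapping from annotation status to counter. The following sketch rebuilds the counters from a list of annotation statuses; count_thread_annotations is a hypothetical helper, and related_annotations is left out because it cannot be derived from statuses alone. A purged annotation increments both annotations_processed and annotations_purged, since the table lists purged under both.

```python
from collections import Counter
from typing import Dict, Iterable

# Status groups as listed in the "Annotation status" column above.
PROCESSED = {"exported", "deleted", "purged", "split"}
UNPROCESSED = {"importing", "failed_import", "to_review", "reviewing",
               "confirmed", "exporting", "postponed", "failed_export"}


def count_thread_annotations(statuses: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: compute thread annotation counters from statuses."""
    by_status = Counter(statuses)  # missing statuses count as 0
    return {
        "annotations": sum(by_status.values()),
        "annotations_processed": sum(by_status[s] for s in PROCESSED),
        "annotations_purged": by_status["purged"],
        "annotations_unprocessed": sum(by_status[s] for s in UNPROCESSED),
        "annotations_rejected": by_status["rejected"],
    }
```

For example, the statuses exported, deleted, rejected and to_review yield the same counters as the example email thread object above.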
List all email threads
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/email_threads'
{
"pagination": {
"total": 2,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/email_threads/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"root_email": "https://<example>.rossum.app/api/v1/emails/5432",
"has_replies": false,
"has_new_replies": false,
"root_email_read": false,
"last_email_created_at": "2021-11-01T18:02:24.740600Z",
"subject": "Root email subject",
"from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
"created_at": "2021-06-10T12:38:44.866180Z",
...
},
...
]
}
GET /v1/email_threads
Retrieve all email thread objects.
Supported filters
Email threads support the following filters:
Filter name | Type | Description |
---|---|---|
has_root_email | boolean | Filter only email threads with a root email. |
has_replies | boolean | Filter only email threads with two or more emails of type incoming. |
queue | integer | Filter only email threads associated with given queue id (or multiple ids). |
has_new_replies | boolean | Filter only email threads with unread emails of type incoming. |
created_at_before | datetime | Filter only email threads with root email created before given timestamp. |
created_at_after | datetime | Filter only email threads with root email created after given timestamp. |
last_email_created_at_before | datetime | Filter only email threads with the last email in the thread created before given timestamp. |
last_email_created_at_after | datetime | Filter only email threads with the last email in the thread created after given timestamp. |
recent_with_no_documents_not_replied | boolean | Filter only email threads with root email that arrived in the last 14 days with no attachment processed by Rossum, excluding those: with rejected label, without any reply and when root email has been read. |
Supported ordering
Email threads support the following ordering: id, created_at, last_email_created_at, subject, from__email, from__name, queue
For additional information, refer to filters and ordering.
Response
Status: 200
Returns paginated response with a list of email thread objects.
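Filters and ordering are passed as query parameters, as in the stats example (?queue=478). The following sketch builds a list URL for email threads with a few of the filters above; thread_list_url is a hypothetical helper, and two encodings are assumptions not stated in this section: booleans are sent as lowercase true/false, and ordering uses an ordering parameter with a leading - for descending order.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


def thread_list_url(base_url: str, *, queue: Optional[int] = None,
                    has_new_replies: Optional[bool] = None,
                    created_at_after: Optional[datetime] = None,
                    ordering: Optional[str] = None) -> str:
    """Hypothetical helper: build a GET /v1/email_threads URL."""
    params = {}
    if queue is not None:
        params["queue"] = queue
    if has_new_replies is not None:
        # Assumed boolean encoding.
        params["has_new_replies"] = "true" if has_new_replies else "false"
    if created_at_after is not None:
        params["created_at_after"] = created_at_after.isoformat()
    if ordering is not None:
        # Assumed parameter name, e.g. "-last_email_created_at" for newest first.
        params["ordering"] = ordering
    query = urlencode(params)
    return f"{base_url}/email_threads" + (f"?{query}" if query else "")


print(thread_list_url("https://<example>.rossum.app/api/v1", queue=4857,
                      has_new_replies=True,
                      created_at_after=datetime(2021, 6, 1, tzinfo=timezone.utc),
                      ordering="-last_email_created_at"))
```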
Retrieve an email thread
Get email thread object 1244
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/email_threads/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/email_threads/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"root_email": "https://<example>.rossum.app/api/v1/emails/5432",
"has_replies": false,
"has_new_replies": false,
"root_email_read": false,
"last_email_created_at": "2021-11-01T18:02:24.740600Z",
"subject": "Root email subject",
"from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
"created_at": "2021-06-10T12:38:44.866180Z",
...
}
GET /v1/email_threads/{id}
Get an email thread object.
Response
Status: 200
Returns email thread object.
Update an email thread
Update email thread object 1244
curl -X PUT -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"root_email": "https://<example>.rossum.app/api/v1/emails/5432", "has_new_replies": true}' \
'https://<example>.rossum.app/api/v1/email_threads/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/email_threads/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"root_email": "https://<example>.rossum.app/api/v1/emails/5432",
"has_replies": false,
"has_new_replies": true,
"root_email_read": true,
"last_email_created_at": "2021-11-01T18:02:24.740600Z",
"subject": "Root email subject",
"from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
"created_at": "2021-06-10T12:38:44.866180Z",
...
}
PUT /v1/email_threads/{id}
Update email thread object.
Response
Status: 200
Returns updated email thread object.
Update part of an email thread
Update flag has_new_replies of email thread object 1244
curl -X PATCH -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
-d '{"has_new_replies": true}' \
'https://<example>.rossum.app/api/v1/email_threads/1244'
{
"id": 1244,
"url": "https://<example>.rossum.app/api/v1/email_threads/1244",
"organization": "https://<example>.rossum.app/api/v1/organizations/132",
"queue": "https://<example>.rossum.app/api/v1/queues/4857",
"root_email": "https://<example>.rossum.app/api/v1/emails/5432",
"has_replies": false,
"has_new_replies": true,
"root_email_read": true,
"last_email_created_at": "2021-11-01T18:02:24.740600Z",
"subject": "Root email subject",
"from": {"email": "satisfied.customer@rossum.ai", "name": "Satisfied Customer"},
"created_at": "2021-06-10T12:38:44.866180Z",
...
}
PATCH /v1/email_threads/{id}
Update part of an email thread object.
Response
Status: 200
Returns updated email thread object.
Get email thread counts
curl -X GET -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' -H 'Content-Type: application/json' \
'https://<example>.rossum.app/api/v1/email_threads/counts'
{
"with_replies": 5,
"with_new_replies": 3,
"recent_with_no_documents_not_replied": 2
}
GET /v1/email_threads/counts
Retrieve counts of email threads.
Supports the same filters as the list email threads endpoint.
Response
Status: 200
Returns object with email thread counts.
Attribute | Type | Description |
---|---|---|
with_replies | integer | Number of email threads containing two or more incoming emails. |
with_new_replies | integer | Number of email threads containing unread incoming replies. |
recent_with_no_documents_not_replied | integer | Number of email threads with root email that arrived in the last 14 days without any attachments processed by Rossum, excluding those: with rejected label, without any reply (email thread contains only this email) and when root email has been read. |
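To illustrate how the first two counters relate to the thread attributes, the sketch below recomputes them from already fetched email thread objects, reading with_replies as threads with has_replies set and with_new_replies as threads with has_new_replies set. This mapping is an interpretation of the descriptions above rather than a documented equivalence; local_thread_counts is a hypothetical helper, and recent_with_no_documents_not_replied is omitted because it depends on data not present in the thread object.

```python
from typing import Dict, Iterable


def local_thread_counts(threads: Iterable[dict]) -> Dict[str, int]:
    """Hypothetical helper: count threads flagged has_replies / has_new_replies."""
    counts = {"with_replies": 0, "with_new_replies": 0}
    for thread in threads:
        # Missing flags are treated as false.
        counts["with_replies"] += bool(thread.get("has_replies"))
        counts["with_new_replies"] += bool(thread.get("has_new_replies"))
    return counts
```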
Generic Engine
Example generic engine object
{
"id": 3000,
"url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
"name": "Generic engine",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
"schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
}
A generic engine object holds the specification of the training setup for a Rossum-trained engine.
Attribute | Type | Default | Description | Read-only |
---|---|---|---|---|
id | integer | | ID of the generic engine | true |
url | URL | | URL of the generic engine | true |
name | string | | Name of the generic engine | |
description | string | | Description of the generic engine | |
documentation_url | URL | null | URL of the generic engine's documentation | |
schema | URL | null | Related generic engine schema | |
List all generic engines
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/generic_engines'
{
"pagination": {
"total": 1,
"total_pages": 1,
"next": null,
"previous": null
},
"results": [
{
"id": 3000,
"url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
"name": "Generic engine",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
"schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
}
]
}
GET /v1/generic_engines
Retrieve all generic engine objects.
Response
Status: 200
Returns paginated response with a list of generic engine objects.
Retrieve a generic engine
Get generic engine object 3000
curl -H 'Authorization: Bearer db313f24f5738c8e04635e036ec8a45cdd6d6b03' \
'https://<example>.rossum.app/api/v1/generic_engines/3000'
{
"id": 3000,
"url": "https://<example>.rossum.app/api/v1/generic_engines/3000",
"name": "Generic engine",
"description": "AI engine trained to recognize data for the specific data capture requirement",
"documentation_url": "https://rossum.ai/help/faq/generic-ai-engine/",
"schema": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000"
}
GET /v1/generic_engines/{id}
Get a generic engine object.
Response
Status: 200
Returns generic engine object.
Generic Engine Schema
Example generic engine schema object
{
"id": 6000,
"url": "https://<example>.rossum.app/api/v1/generic_engine_schemas/6000",
"content": {
"training_queues": [],
"fields": [
{
"category": "datapoint",
"engine_output_id": "document_id",
"type": "string",
"label": "label text",
"description": "description text",
"trained": true,
"sources": []
},
{
"category": "multivalue",
"engine_output_id": "my_cool_ids",
"label": "label text",
"description": "description text",
"type": "freeform",
"trained": false,
"children": {
"category": "datapoint",
"engine_output_id": "my_cool_id",
"type": "enum",
"label": "label text",
"description": "description text",
"trained": false,
"sources": []
}
},
{
"category": "multivalue",
"engine_output_id": "date_timezone_table",
"label": "label text",
"description": "description text",
"type": "grid",
"trained": true,
"children": {
"category": "tuple",
"children": [
{
"category": "datapoint",
"engine_output_id": "date",
"type": "date",
"label": "label text",
"description": "description text",
"trained": true,
"sources": []
},
{
"category": "datapoint",
"engine_output_id": "timezone",
"type": "string",
"label": "label text",
"description"<