How to use Email Spam Filter

The Email Spam Filter lets your services handle inbound emails in a reliable and structured way. Instead of writing your own parser or classifier, you can send raw EML (RFC 822) files to Sentinel’s HTTP API and receive back a parsed email object along with a detailed classification report.

This guide walks you through the setup, API usage, and configuration options to help you integrate the Email Spam Filter into your applications and workflows.

Setup

Ensure you have Sentinel installed.
Create a new API key and assign it to a Security Group with the Restricted access level (the EML API is non-public).

Receiving Emails

Sentinel does not include an SMTP server. You need to handle email reception separately and then forward the messages to Sentinel for analysis. Common options include:

Fetching new emails directly from your mail server
Using a cloud provider like AWS SES for inbound email

In all cases, emails are exchanged in EML (RFC 822) format, which is the raw format you must submit to Sentinel’s API for parsing and classification.
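To try the API without a live mail server, a sample EML file can be generated locally. The following is a minimal sketch using only Python's standard library; the helper name `build_sample_eml` and the addresses are illustrative and not part of Sentinel.

```python
from email import policy
from email.message import EmailMessage


def build_sample_eml(sender: str, recipient: str, subject: str, body: str) -> bytes:
    """Return a minimal plain-text message as raw EML bytes."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # policy.SMTP serializes with CRLF line endings, as used on the wire.
    return msg.as_bytes(policy=policy.SMTP)


eml = build_sample_eml("test@example.com", "me@example.com", "Test email", "Hello World")
```

Writing `eml` to a file such as email.eml gives you an input suitable for the endpoint described in the next section.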

Parsing and Classifying Emails

To process an email in EML format, use the POST /v1/eml endpoint:

Request

curl -X POST http://localhost:8080/v1/eml \
  -H "Content-Type: application/octet-stream" \
  -H "Authorization: Bearer {API_KEY}" \
  --data-binary @email.eml

Response

{
  "authentication": {
    "ARC": {
      "comment": "i=1 spf=pass dkim=pass dkdomain=example.com dmarc=pass fromdomain=example.com",
      "result": "pass"
    },
    "DKIM": [
      {
        "comment": "invalid public key",
        "result": "neutral",
        "signingDomain": "example.com"
      }
    ],
    "SPF": {
      "comment": "example.host: domain of hello@example.com designates 1.2.3.4 as permitted sender",
      "result": "pass"
    }
  },
  "classification": {
    "classification": "GOOD",
    "score": 0.5,
    "email": {
      "rules": {
        "DISPOSABLE": {
          "score": 0
        },
        "DMARC": {
          "score": 0
        },
        "FREE_PROVIDER": {
          "score": 0
        },
        "MX": {
          "score": 0
        }
      },
      "score": 0,
      "time": 0.334,
      "triggeredRules": []
    },
    "ip": null,
    "location": null,
    "rateLimit": null,
    "similarity": null,
    "text": {
      "classifier": "en",
      "language": "en",
      "rules": {
        "CAPITALIZATION": {
          "score": 0
        },
        "CURRENCY": {
          "score": 0
        },
        "EMOJI": {
          "score": 0
        },
        "EXCLAMATION": {
          "score": 0
        },
        "HASH_TAGS": {
          "score": 0
        },
        "HTML": {
          "score": 0
        },
        "HTML_INJECTION": {
          "score": 0
        },
        "NUMBERS_ONLY": {
          "score": 0
        },
        "PROFANITY": {
          "score": 0
        },
        "RANDOM_CHARS": {
          "score": 0
        },
        "SHORT_TEXT": {
          "score": 1
        },
        "SPAM_WORDS": {
          "score": 0
        },
        "SPECIAL_CHARS": {
          "score": 0
        },
        "SQL_INJECTION": {
          "score": 0
        },
        "UNEXPECTED_LANGUAGE": {
          "score": 0
        },
        "URL": {
          "score": 0
        }
      },
      "score": 1,
      "time": 0.173,
      "triggeredRules": [
        "SHORT_TEXT"
      ]
    },
    "triggeredRules": [
      "SHORT_TEXT"
    ]
  },
  "mail": {
    "attachments": [],
    "cc": null,
    "from": [
      {
        "address": "test@example.com",
        "name": ""
      }
    ],
    "headers": [
      {
        "name": "subject",
        "value": "Test email"
      },
      {
        "name": "from",
        "value": {
          "address": "test@example.com",
          "name": ""
        }
      },
      {
        "name": "content-type",
        "params": {
          "boundary": "aaaaa"
        },
        "value": "multipart/mixed"
      }
    ],
    "html": "<h1>Heading 1</h1>\n\n<p>Paragraph</p>",
    "inReplyTo": null,
    "messageId": null,
    "priority": null,
    "replyTo": null,
    "subject": "Test email",
    "text": "Hello World",
    "to": null
  },
  "rules": {
    "ARC": {
      "score": null
    },
    "CLASSIFICATION": {
      "score": 0.5
    },
    "DELIVERED_TO_MISMATCH": {
      "score": 0
    },
    "DKIM": {
      "score": null
    },
    "FROM_SPOOFING": {
      "score": 0
    },
    "NO_SUBJECT": {
      "score": 0
    },
    "NO_TEXT": {
      "score": 0
    },
    "REPLY_TO_SPOOFING": {
      "score": 0
    },
    "SPF": {
      "score": null
    },
    "UNDISCLOSED_RECIPIENTS": {
      "score": 1
    }
  },
  "score": 1.5,
  "spam": false,
  "time": 5.189
}

Submit the raw EML file as the request body. In the request above, the --data-binary @email.eml option sends the file email.eml from your current working directory without modifying its contents.
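The same request can be built from application code. The sketch below uses Python's standard `urllib.request` and mirrors the curl command; the helper name `build_eml_request` and its `options` parameter are assumptions made for illustration, and the base URL and API key are placeholders.

```python
import urllib.request
from typing import Optional


def build_eml_request(base_url: str, api_key: str, eml: bytes,
                      options: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/eml request carrying raw EML bytes."""
    headers = {
        "Content-Type": "application/octet-stream",
        "Authorization": f"Bearer {api_key}",
    }
    # Optional processing headers (for example X-Authenticate) are merged in.
    headers.update(options or {})
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/eml",
        data=eml,
        headers=headers,
        method="POST",
    )


request = build_eml_request(
    "http://localhost:8080", "API_KEY", b"Subject: Test email\r\n\r\nHello World\r\n"
)

# To send the request and decode the JSON report:
# import json
# with urllib.request.urlopen(request) as response:
#     report = json.load(response)
```

Building the request separately from sending it keeps the function easy to test without a running Sentinel instance.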

Options

Processing configuration can be provided using HTTP headers:

X-Authenticate: Whether to perform ARC, DKIM, and SPF authentication (boolean, defaults to false).
X-Attachments-Upload: Whether to upload attachments to the configured upload storage. If enabled, the content property is null and contentUri is returned instead (boolean, defaults to false).
X-Attachments-Size-Limit: The maximum size of an attachment to be processed, in bytes (integer, defaults to 5000000, i.e. 5MB).
X-Disable-Rules: A comma-separated list of rules to disable.
X-Similarity-Groups: A comma-separated list of similarity group names (training data) which should be checked.
X-Mail-From: The sender address received from MAIL FROM (defaults to Return-Path value).
X-Smtp-Ip: The IP address of the remote SMTP relay or client.
X-Smtp-Helo: The hostname provided in the HELO/EHLO command.
X-Smtp-Mta: Hostname of the server performing the authentication (defaults to the host’s own hostname).
X-Trust-Authentication: Whether to parse and trust the last Authentication-Results header for authentication (boolean, defaults to true).
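As an illustration of how these headers might be assembled, the sketch below maps Python arguments to a few of the header names listed above. The function `build_option_headers` is hypothetical, and the lowercase true/false encoding of booleans is an assumption based on the X-Authenticate: true form used in this guide.

```python
from typing import Iterable, Optional


def build_option_headers(
    authenticate: Optional[bool] = None,
    attachments_upload: Optional[bool] = None,
    attachments_size_limit: Optional[int] = None,
    disable_rules: Iterable[str] = (),
    similarity_groups: Iterable[str] = (),
) -> dict:
    """Return only the headers that were explicitly set, so server defaults apply otherwise."""
    headers = {}
    if authenticate is not None:
        headers["X-Authenticate"] = "true" if authenticate else "false"
    if attachments_upload is not None:
        headers["X-Attachments-Upload"] = "true" if attachments_upload else "false"
    if attachments_size_limit is not None:
        headers["X-Attachments-Size-Limit"] = str(attachments_size_limit)
    # List-valued options are sent as comma-separated strings.
    rules = ",".join(disable_rules)
    if rules:
        headers["X-Disable-Rules"] = rules
    groups = ",".join(similarity_groups)
    if groups:
        headers["X-Similarity-Groups"] = groups
    return headers


options = build_option_headers(
    attachments_upload=True,
    disable_rules=["NO_SUBJECT", "UNDISCLOSED_RECIPIENTS"],
)
```

Omitting unset options, rather than sending explicit defaults, keeps the request aligned with whatever defaults the server applies.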

Authentication

Email authentication refers to the verification of mechanisms such as ARC, DKIM, and SPF, which confirm the authenticity of the sender and the content.

By default, authentication is performed by checking the last Authentication-Results header, which should be added by your receiving mail server.

Alternatively, you can enable full authentication by sending X-Authenticate: true, in which case Sentinel performs the ARC, DKIM, and SPF verifications itself. This requires DNS lookups, which can slow down processing.

When using X-Authenticate: true, include the X-Mail-From and X-Smtp-* headers to enable proper SPF authentication.
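A receiving service usually knows the SMTP envelope details at the time it accepts a message. The sketch below shows one way to turn them into the headers named above; the function `build_smtp_context_headers` is hypothetical, and validating the IP address on the client side is an extra precaution, not something Sentinel is documented to require.

```python
import ipaddress


def build_smtp_context_headers(mail_from: str, smtp_ip: str, smtp_helo: str) -> dict:
    """Headers for full authentication, including the SMTP context used for SPF."""
    # Raises ValueError early if the value is not a valid IPv4 or IPv6 address.
    ip = ipaddress.ip_address(smtp_ip)
    return {
        "X-Authenticate": "true",
        "X-Mail-From": mail_from,
        "X-Smtp-Ip": str(ip),
        "X-Smtp-Helo": smtp_helo,
    }


smtp_headers = build_smtp_context_headers("hello@example.com", "1.2.3.4", "example.host")
```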

The results of the authentication are returned in the authentication property of the response:

{
  "authentication": {
    "ARC": {
      "comment": "i=1 spf=pass dkim=pass dkdomain=example.com dmarc=pass fromdomain=example.com",
      "result": "pass"
    },
    "DKIM": [
      {
        "comment": "invalid public key",
        "result": "neutral",
        "signingDomain": "example.com"
      }
    ],
    "SPF": {
      "comment": "example.host: domain of hello@example.com designates 1.2.3.4 as permitted sender",
      "result": "pass"
    }
  }
}
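The authentication object mixes single results (ARC, SPF) with a list (DKIM). A minimal sketch for normalizing it is shown below; the helper names are illustrative, and treating any result other than "pass" as not passing is an assumption made for this example.

```python
def authentication_results(authentication: dict) -> dict:
    """Map each mechanism name to the list of result strings reported for it."""
    results = {}
    for mechanism, entry in authentication.items():
        # DKIM is a list (one entry per signature); ARC and SPF are single objects.
        entries = entry if isinstance(entry, list) else [entry]
        results[mechanism] = [item["result"] for item in entries if item]
    return results


def not_passing(authentication: dict) -> list:
    """Names of mechanisms with at least one result other than "pass"."""
    results = authentication_results(authentication)
    return sorted(name for name, values in results.items()
                  if any(value != "pass" for value in values))


sample = {
    "ARC": {"comment": "i=1 spf=pass dkim=pass", "result": "pass"},
    "DKIM": [{"comment": "invalid public key", "result": "neutral",
              "signingDomain": "example.com"}],
    "SPF": {"comment": "designates 1.2.3.4 as permitted sender", "result": "pass"},
}
failing = not_passing(sample)
```

For the sample above, `failing` contains only DKIM, because its single signature was reported as neutral.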

Parsing

The response from the POST /v1/eml endpoint contains the mail property, which holds the parsed email data together with a list of attachments. Services that process inbound email can consume this JSON directly, without implementing their own parsing logic.

{
  "mail": {
    "attachments": [],
    "date": "2025-09-01T10:16:09.000Z",
    "from": [{
      "address": "hello@example.com",
      "name": "Hello"
    }],
    "headers": [],
    "html": null,
    "subject": "Test email",
    "text": "Hello world...",
    "to": [{
      "address": "me@example.com",
      "name": "Me"
    }]
  }
}

For the full response schema, see the POST /v1/eml endpoint documentation.
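A consumer typically needs only a few of these fields. The sketch below extracts them from the mail object while tolerating null values, which appear in the sample responses in this guide (for example to: null). The function `summarize_mail` and the shape of its return value are illustrative choices.

```python
def summarize_mail(mail: dict) -> dict:
    """Pick commonly used fields from the parsed mail object."""

    def addresses(field: str) -> list:
        # Address fields may be null when the corresponding header is absent.
        return [entry["address"] for entry in (mail.get(field) or [])]

    return {
        "from": addresses("from"),
        "to": addresses("to"),
        "subject": mail.get("subject") or "",
        # Prefer the plain-text body and fall back to HTML.
        "body": mail.get("text") or mail.get("html") or "",
        "attachment_count": len(mail.get("attachments") or []),
    }


summary = summarize_mail({
    "attachments": [],
    "from": [{"address": "hello@example.com", "name": "Hello"}],
    "html": None,
    "subject": "Test email",
    "text": "Hello world...",
    "to": [{"address": "me@example.com", "name": "Me"}],
})
```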

Attachments

Sentinel automatically parses attachments included in the EML file.

By default, the contents of the attachments are returned as Base64-encoded strings. This method is not suitable for large attachments, and it is recommended to upload attachments to the upload storage instead.

To upload attachments to the upload storage, send the X-Attachments-Upload: true header. Each attachment in the response will then contain a contentUri property, and the attachment content can be downloaded using the GET /v1/blobs/{key} API endpoint.

By configuring the X-Attachments-Size-Limit header, you can control the maximum size of attachments that will be processed. If an attachment is larger than the limit, its metadata will still appear in the parsed response under attachments, but its content will be ignored: the API will return content: null and contentUri: null.
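The three outcomes described above (inline Base64 content, uploaded content, or skipped content) can be told apart from the content and contentUri properties. The following sketch shows one way to do that; the function names and the state labels are illustrative.

```python
import base64
from typing import Optional


def attachment_state(attachment: dict) -> str:
    """Classify an attachment as "inline", "uploaded", or "skipped"."""
    if attachment.get("content") is not None:
        return "inline"    # Base64 string returned in the response
    if attachment.get("contentUri") is not None:
        return "uploaded"  # must be fetched from the upload storage
    return "skipped"       # larger than the size limit, content ignored


def inline_bytes(attachment: dict) -> Optional[bytes]:
    """Decode inline Base64 content, or return None if there is none."""
    content = attachment.get("content")
    return None if content is None else base64.b64decode(content)


decoded = inline_bytes({"content": "aGVsbG8=", "contentUri": None})
```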

Downloading Attachments

When using X-Attachments-Upload: true, attachments will be uploaded to configured upload storage and the parameter contentUri will be returned with each attachment in the following format:

blob://uploads/eml/attachments/2025-09-01/2c39908c3aa32a6484fd405a7c2f782e.png?size=11998&type=image%2Fpng&filename=image.png

To download an attachment using contentUri, use the GET /v1/blobs/{key} endpoint and pass the path returned in contentUri as the key parameter.

For example, from the blob URI above, the download URL will be:

GET /v1/blobs/eml/attachments/2025-09-01/2c39908c3aa32a6484fd405a7c2f782e.png?size=11998&type=image%2Fpng&filename=image.png
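The conversion from a blob URI to a download path can be automated. The sketch below uses `urllib.parse` from the standard library and assumes, based on the single example above, that the key is the URI path after the uploads segment and that the query string is passed through unchanged. The function names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit


def blob_download_path(content_uri: str) -> str:
    """Turn a blob:// contentUri into the path for GET /v1/blobs/{key}."""
    parts = urlsplit(content_uri)
    if parts.scheme != "blob":
        raise ValueError(f"not a blob URI: {content_uri!r}")
    # "uploads" is parsed as the authority; the remaining path is the key.
    key = parts.path.lstrip("/")
    path = f"/v1/blobs/{key}"
    return f"{path}?{parts.query}" if parts.query else path


def blob_metadata(content_uri: str) -> dict:
    """Read size, type, and filename from the URI query string."""
    query = parse_qs(urlsplit(content_uri).query)
    return {name: values[0] for name, values in query.items()}


uri = ("blob://uploads/eml/attachments/2025-09-01/"
       "2c39908c3aa32a6484fd405a7c2f782e.png?size=11998&type=image%2Fpng&filename=image.png")
download_path = blob_download_path(uri)
metadata = blob_metadata(uri)
```

The metadata helper is a convenience for reading the file name and media type before downloading.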

Classification

The email text is classified using the built-in Classifier, and the result is provided in the classification property of the response.

The response contains an overall score and a spam flag. An email with a score of 2 or higher is classified as spam and returned with spam: true.

In addition to the Classifier’s rules, there are several email-specific rules listed below.

Rules

ARC: This rule matches if the ARC authentication does not successfully pass.
CLASSIFICATION: The overall score of the text-based classification (see classification property).
DELIVERED_TO_MISMATCH: This rule matches if the Delivered-To and To headers do not match.
DKIM: This rule matches if the DKIM authentication does not successfully pass.
FROM_SPOOFING: This rule matches if the From address includes a different address in the name field.
NO_SUBJECT: This rule matches if the Subject is empty.
NO_TEXT: This rule matches if the email does not contain any text or HTML message.
REPLY_TO_SPOOFING: This rule matches if the Reply-To address does not match the sender.
SPF: This rule matches if the SPF authentication does not successfully pass.
UNDISCLOSED_RECIPIENTS: This rule matches if there is no valid To address.
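To see which of these rules contributed to a result, the rules object of the response can be filtered for non-zero scores. The sketch below is illustrative; treating a null score as "not evaluated" is an assumption drawn from the sample response, where ARC, DKIM, and SPF report score: null.

```python
def triggered_email_rules(report: dict) -> dict:
    """Return the email-level rules with a non-zero score."""
    return {
        name: rule["score"]
        for name, rule in (report.get("rules") or {}).items()
        # Skips both 0 (rule did not match) and null (assumed: not evaluated).
        if rule.get("score")
    }


report = {
    "rules": {
        "ARC": {"score": None},
        "CLASSIFICATION": {"score": 0.5},
        "NO_SUBJECT": {"score": 0},
        "UNDISCLOSED_RECIPIENTS": {"score": 1},
    },
    "score": 1.5,
    "spam": False,
}
triggered = triggered_email_rules(report)
```

In the sample response these scores happen to add up to the overall score of 1.5, though this guide does not state that the overall score is always a plain sum. Use the spam flag of the response for the final decision.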

Detecting Phishing

There are two rules that indicate phishing attempts with a high degree of certainty:

FROM_SPOOFING
REPLY_TO_SPOOFING

Spoofing the sender’s address, the reply-to address, or both is a common practice in phishing emails. It lets the attacker pose as a legitimate, known, or high-profile identity while directing replies to their own address.

Additionally, when a verified phishing URL is detected with the Phishing Detection feature, the Classifier triggers the URL_PHISHING rule, providing a strong indication of a phishing attempt.
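The three indicators can be collected from a response as sketched below. The two spoofing rules are read from the top-level rules object, as in the sample response. Looking for URL_PHISHING in classification.triggeredRules is an assumption, based on where the Classifier rule SHORT_TEXT appears in that sample. The function name is illustrative.

```python
SPOOFING_RULES = ("FROM_SPOOFING", "REPLY_TO_SPOOFING")


def phishing_indicators(report: dict) -> list:
    """List the phishing-related rules that were triggered for this email."""
    rules = report.get("rules") or {}
    indicators = [
        name for name in SPOOFING_RULES
        if (rules.get(name) or {}).get("score")
    ]
    # Assumption: Classifier rules are reported under classification.triggeredRules.
    classification = report.get("classification") or {}
    if "URL_PHISHING" in (classification.get("triggeredRules") or []):
        indicators.append("URL_PHISHING")
    return indicators


indicators = phishing_indicators({
    "rules": {"FROM_SPOOFING": {"score": 1}, "REPLY_TO_SPOOFING": {"score": 0}},
    "classification": {"triggeredRules": ["URL_PHISHING", "SHORT_TEXT"]},
})
```

A non-empty result could be used to quarantine a message for review even when the overall score stays below the spam threshold.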

Learning Spam

Email classification works out of the box without training data, but detection accuracy can be enhanced by using the Similarity and Training Data feature. This allows Sentinel to detect unwanted phrases or text segments in emails.

To enable similarity matching with training data, set the X-Similarity-Groups header and specify the groups to check against (in “partial” mode of the similarity detection).

This feature makes it possible to apply a more traditional approach to spam detection, using samples of known spam or user-reported messages. To add such examples, use the Training Data API.

Server Configuration

The body size limit for the POST /v1/eml endpoint is set by the environment variable EML_BODY_LIMIT, which defaults to 5MB. The API returns an error if the EML file is larger than this limit. To allow submission of larger EML files, increase the limit.