Classifier

The Classifier lets you classify text and other information to filter spam and identify legitimate messages. It analyzes textual and contextual data using a built-in natural language processing engine and machine learning, and returns a numeric score: lower scores indicate legitimate messages, higher scores indicate spam.

Note: The Classifier is an improved version of the Spam Filter previously offered as SaaS.

Feature Highlights

  • Comprehensive analysis of text, email addresses, device information, and IP addresses
  • Spam detection through pattern recognition and phrase analysis
  • Security protection against HTML/SQL injection patterns
  • Language detection (supports 160+ languages)
  • Geo-location identification from IP addresses or time zones
  • Data matching against training datasets
  • Full support for 19 languages (partial support for others)
  • High performance suitable for real-time classification (~10ms HTTP round-trip for 10KB texts)

Use Cases

  • Comprehensive anti-spam: Detect spam submitted through online forms or APIs by analyzing text and validating factors like email addresses and IP addresses
  • Email address validation: Identify fake or suspicious email addresses and distinguish between “free” and “work” emails
  • IP address validation: Determine if an IP address is associated with a proxy or TOR exit, and check against blocklists
  • Security firewall: Protect against common HTML and SQL injection attempts in text
  • Language detection: Automatically detect up to 160 languages from provided text
  • Geo-location: Detect user location, commonly spoken languages, currency, and other information from IP addresses or time zones
  • Geo-fencing: Block specific countries, regions, or continents from accessing your website or APIs

Implementation Guide

The classifier endpoint is designed for back-end services. Below are common use cases:

Text Classification

Provide the text payload (or fields as name-value pairs) to classify input text for spam and security patterns.

POST /v1/classifier
Content-Type: application/json
{
  "text": "Example text to classify"
}
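
As a minimal sketch of how a back-end service might prepare this call, the snippet below builds the request with Python's standard library without sending it. The base URL is a hypothetical placeholder, and any authentication headers are omitted because this page does not describe them.

```python
import json
import urllib.request

# Hypothetical base URL (an assumption); replace it with your own deployment's host.
BASE_URL = "https://classifier.example.com"


def build_text_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/classifier request for a text payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/classifier",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text_request("Example text to classify")
# To send it, pass `request` to urllib.request.urlopen() and parse the JSON response.
```

Building the request separately from sending it keeps the payload easy to inspect and unit-test before any network call is made.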

IP Address Classification

Provide the user’s IP address to resolve geo-location and other properties.

POST /v1/classifier
Content-Type: application/json
{
  "ip": "10.0.0.1"
}
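
Before sending an address, it can help to validate it locally. The sketch below uses Python's `ipaddress` module; the helper names are illustrative, not part of the API. Note that 10.0.0.1 in the sample above is a private (RFC 1918) address, which carries no geographic information on its own; how the service responds to such addresses is not stated here.

```python
import ipaddress


def build_ip_payload(raw_ip: str) -> dict:
    """Validate the address and return the JSON body for POST /v1/classifier."""
    # ip_address() raises ValueError if the string is not a valid IPv4/IPv6 address.
    address = ipaddress.ip_address(raw_ip.strip())
    return {"ip": str(address)}


def is_publicly_routable(raw_ip: str) -> bool:
    """Return True only for globally routable addresses (not private or loopback)."""
    return ipaddress.ip_address(raw_ip.strip()).is_global


payload = build_ip_payload(" 10.0.0.1 ")
```

In a real deployment you would typically pass the client address observed by your server or trusted proxy, then check `is_publicly_routable` to decide whether a geo-location lookup is meaningful.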

Email Address Classification

Provide the email domain to verify DNS records and check against known disposable or free email providers.

POST /v1/classifier
Content-Type: application/json
{
  "email": "@gmail.com"
}
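
The sample above sends only the domain part. A minimal sketch for deriving that value from a full address follows; the helper name is hypothetical, and whether the endpoint also accepts complete addresses is not covered on this page.

```python
def build_email_payload(address: str) -> dict:
    """Return the JSON body containing only the '@domain' part of an address."""
    # rpartition splits on the last '@', so the local part is discarded.
    _local, separator, domain = address.strip().rpartition("@")
    if not separator or not domain:
        raise ValueError(f"no domain found in {address!r}")
    return {"email": "@" + domain.lower()}


payload = build_email_payload("Jane.Doe@Gmail.com")
```

Dropping the local part means the user's mailbox name never leaves your server when only the domain needs to be checked.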

Device Classification

Provide HTTP headers from the user’s device to verify device information.

POST /v1/classifier
Content-Type: application/json
{
  "headers": {
    "Accept": "...",
    "Accept-Language": "...",
    "User-Agent": "..."
  },
  "ip": "10.0.0.1"
}
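
One way to assemble this payload is to copy the relevant headers from the incoming request your server received. The sketch below forwards only the three headers shown in the sample; the helper name is illustrative, and whether the service uses additional headers is not stated here.

```python
# Headers shown in the sample request; other headers (cookies, auth) are left out.
FORWARDED_HEADERS = ("Accept", "Accept-Language", "User-Agent")


def build_device_payload(incoming_headers: dict, client_ip: str) -> dict:
    """Pick the device-related headers (case-insensitively) and attach the client IP."""
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    headers = {
        name: lowered[name.lower()]
        for name in FORWARDED_HEADERS
        if name.lower() in lowered
    }
    return {"headers": headers, "ip": client_ip}


payload = build_device_payload(
    {"user-agent": "ExampleBrowser/1.0", "ACCEPT": "text/html", "Cookie": "session=abc"},
    "10.0.0.1",
)
```

Using an explicit allow-list keeps cookies and authorization headers from being forwarded by accident.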

Rate Limiter

Send a custom rateLimit object to limit user requests.

POST /v1/classifier
Content-Type: application/json
{
  "rateLimit": {
    "limit": "10/1h"
  }
}

For more about rate limiters, see the Rate-Limiters guide.

Response Format

The response includes the classification result with overall score and triggered rules.

{
  "classification": "BAD",
  "score": 3.4,
  "triggeredRules": ["PROFANITY"],
  ...
}

Response details:

  • classification - GOOD (< 1), NEUTRAL (1-2), or BAD (> 2)
  • score - Numeric score (scores > 2 indicate spam)
  • triggeredRules - Array of matching rules, sorted by score
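
The thresholds above can be expressed as a small helper. This is a sketch of how a client might interpret a response, not the service's own logic; the function names are illustrative, and the sample dictionary contains only the three fields documented here.

```python
def classification_for(score: float) -> str:
    """Map a score to a label using the documented thresholds."""
    if score < 1:
        return "GOOD"
    if score <= 2:
        return "NEUTRAL"
    return "BAD"


def should_reject(response: dict) -> bool:
    """Treat only BAD results (score > 2) as spam."""
    return response["classification"] == "BAD"


# Sample response limited to the fields described above.
sample = {"classification": "BAD", "score": 3.4, "triggeredRules": ["PROFANITY"]}
label = classification_for(sample["score"])
```

A client would normally rely on the returned `classification` field directly; recomputing the label from `score` is mainly useful if you want stricter or looser thresholds of your own.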

For details, see the API Documentation.

Alternative: Widget Integration

To enable form field classification in the ALTCHA Widget:

  1. Open the Security Group configuration and go to the Advanced tab under Rules
  2. Create a Set-type rule
  3. Select Classify Form Fields and set it to true

Language Support

Fully supported languages:

  • Bulgarian
  • Czech
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Slovak
  • Spanish
  • Swedish

For languages not listed above, the system falls back to English-based analysis with basic functionality.