How to Detect Spam with Similarity Matching
The Similarity Detection feature in ALTCHA Sentinel lets you compare new messages against known spam-like examples using semantic similarity. Because it compares meaning rather than exact wording, it catches variations that exact matching would miss. This guide walks you through how to set it up, train it, and use it effectively to moderate content in chats, forums, and user-generated platforms.
How It Works
The similarity engine uses cosine similarity on text embeddings generated by the open-source model all-MiniLM-L6-v2. This model is optimized for encoding sentences and short paragraphs. Input text longer than 256 word pieces is automatically truncated.
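Cosine similarity measures how closely two embedding vectors point in the same direction: their dot product divided by the product of their lengths. The sketch below computes it in plain Python on made-up three-dimensional vectors. Real all-MiniLM-L6-v2 embeddings have 384 dimensions and come from the model itself, so treat this as an illustration of the metric, not of Sentinel's internals.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (illustrative values only).
message_me = [0.9, 0.1, 0.2]
text_me = [0.85, 0.15, 0.25]
weather = [0.1, 0.9, 0.3]

print(round(cosine_similarity(message_me, text_me), 3))   # close in meaning
print(round(cosine_similarity(message_me, weather), 3))   # unrelated
```

Phrases with related meaning, such as "message me" and "text me", should end up with vectors that point in similar directions and therefore score higher than unrelated pairs.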
Rather than checking for exact word matches, the model evaluates semantic similarity. For example, phrases like “message me”, “text me”, and “contact us” are considered closely related in meaning.
Although primarily trained on English data, the model supports multiple languages with varying accuracy.
Setup
- Ensure you have Sentinel installed.
- Create a new API key and assign it to a Security Group with Restricted access level (the Similarity API is non-public).
Training Data
The Similarity API accepts either:
- Direct examples (an array of strings), or
- groups (names of predefined training data sets).
To create or manage training data, use the Training Data API, or add it manually through the interface.
Weights and Thresholds
Training data items can include:
- threshold: Minimum similarity score required to count as a match (e.g., 0.7 = 70% similarity).
- weight: Multiplier applied to the similarity score (e.g., a similarity score of 0.75 × weight = final score).
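The arithmetic for a single item can be sketched as below. This guide does not say whether threshold is compared against the raw or the weighted score; the sketch assumes the weighted score. TrainingItem and weighted_match are hypothetical local helpers, not Sentinel types.

```python
from dataclasses import dataclass


@dataclass
class TrainingItem:
    """Hypothetical local model of a training data item (not a Sentinel type)."""
    text: str
    threshold: float = 0.0  # minimum score required to count as a match
    weight: float = 1.0     # multiplier applied to the similarity score


def weighted_match(similarity: float, item: TrainingItem) -> tuple[float, bool]:
    """Return (final score, is_match) for one raw similarity score.

    Assumption: the threshold is checked against the weighted score.
    """
    final_score = similarity * item.weight
    return final_score, final_score >= item.threshold


item = TrainingItem(text="message me", threshold=0.7, weight=0.8)
score, matched = weighted_match(0.75, item)
print(f"final score {score:.2f}, match: {matched}")  # prints: final score 0.60, match: False
```

A weight below 1.0 lets you keep a broad or noisy example in the training data while reducing how strongly it counts.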
Match Against Examples
The most basic use case is to compare a given input text against a list of examples.
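A minimal sketch of such a request body, assuming the JSON field names text and examples used in this guide. The Similarity API endpoint path is not given here, so the sketch only builds and serializes the body; build_examples_request is a hypothetical helper.

```python
import json


def build_examples_request(text: str, examples: list[str]) -> str:
    """Serialize a similarity request that compares text against inline examples."""
    if not examples:
        raise ValueError("at least one example is required")
    body = {"text": text, "examples": examples}
    return json.dumps(body)


payload = build_examples_request(
    "DM me for a cheap deal on followers",
    ["message me", "buy followers", "contact us for discounts"],
)
print(payload)
```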
Partial Matching
To match phrases within longer text, enable partial matching by setting "partial": true. In this mode, examples should be short (e.g., 1–5 words).
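The sketch below sets partial to true and checks the 1–5 word guideline before the request is sent. Splitting on whitespace is only a rough stand-in for the model's word pieces, and build_partial_request is a hypothetical helper.

```python
import json

MAX_PARTIAL_WORDS = 5  # guideline from this guide: 1–5 words per example


def build_partial_request(text: str, examples: list[str]) -> dict:
    """Build a partial-matching request body, rejecting overly long examples."""
    too_long = [e for e in examples if len(e.split()) > MAX_PARTIAL_WORDS]
    if too_long:
        raise ValueError(f"examples too long for partial matching: {too_long}")
    return {"text": text, "examples": examples, "partial": True}


body = build_partial_request(
    "Great post! Anyway, text me on WhatsApp for crypto tips.",
    ["text me", "crypto tips", "whatsapp"],
)
print(json.dumps(body))  # Python True serializes as JSON true
```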
Using Predefined Training Data
Instead of passing examples directly, you can reference training groups. For example, you can match the input text against the chat_spam training group.
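A sketch of a request body that references the chat_spam group instead of inline examples. It assumes the field name groups from this guide and that it takes a list of group names.

```python
import json

body = {
    "text": "Earn $500 a day from home, message me now!",
    "groups": ["chat_spam"],  # name of a predefined training data group
}
print(json.dumps(body, indent=2))
```

Keeping examples in a named group means you can update the training data without changing the code that sends requests.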
Creating Training Data via API
You can submit new training examples through the API, for instance when a user clicks a “Report Spam” button. Use the POST /v1/training-data endpoint.
Always review submitted data to ensure quality. High-quality training examples result in more accurate spam detection.
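A sketch of a “Report Spam” handler that prepares the POST /v1/training-data request with Python's standard urllib.request. The base URL, the Authorization header format, and the body fields (text, group) are assumptions for illustration; check the Training Data API reference for the real schema. The request is built but not sent, so the report can be reviewed before it reaches the training data.

```python
import json
import urllib.request

SENTINEL_URL = "https://sentinel.example.com"  # hypothetical base URL
API_KEY = "YOUR_API_KEY"  # key assigned to a Restricted security group


def build_report_request(message: str, group: str = "chat_spam") -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/training-data request.

    The body fields below are assumed, not taken from the Sentinel schema.
    """
    body = json.dumps({"text": message, "group": group}).encode("utf-8")
    return urllib.request.Request(
        f"{SENTINEL_URL}/v1/training-data",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_report_request("Click here to claim your free prize")
print(req.get_method(), req.full_url)
# After a moderator approves the example: urllib.request.urlopen(req)
```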
Evaluating Similarity Scores
The Similarity API returns a list of matched examples and their corresponding similarity score, a value between 0.0 and 1.0. A higher score means the input text is more semantically similar to the example.
However, the API does not label content as “spam” automatically. It’s up to you to evaluate the response and decide what threshold is appropriate for your use case.
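A sketch of reading the response, assuming it is a JSON object with a matches list whose entries carry example and score fields. These names are placeholders, so adapt them to the actual response.

```python
import json

# Hypothetical response body; real field names may differ.
raw_response = """
{
  "matches": [
    {"example": "message me", "score": 0.82},
    {"example": "contact us", "score": 0.64}
  ]
}
"""


def best_score(response_text: str) -> float:
    """Return the highest similarity score in the response, or 0.0 if none matched."""
    matches = json.loads(response_text).get("matches", [])
    return max((m["score"] for m in matches), default=0.0)


print(best_score(raw_response))  # prints: 0.82
```

Reducing the response to its highest score gives you a single number to compare against the thresholds you choose.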
How to Detect Spam
You should treat a message as spam if its score meets or exceeds a threshold that you define. A good starting point is:
- 0.7 and above → likely spam
- 0.4 – 0.7 → possibly suspicious, depending on context
- below 0.4 → usually safe
These values are not strict rules. You can adjust them based on your tolerance for false positives.
For partial matches, you may want to use a lower threshold (e.g., 0.6), since matches tend to be fuzzier.
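The starting bands above can be expressed as a small classifier. The cut-offs are the suggested defaults from this guide; keeping the 0.4 suspicious cut-off unchanged for partial matches is an assumption. The function name and labels are illustrative, so tune the constants to your own tolerance for false positives.

```python
SPAM_CUTOFF = 0.7          # suggested starting point for full matches
PARTIAL_SPAM_CUTOFF = 0.6  # lower cut-off suggested for fuzzier partial matches
SUSPICIOUS_CUTOFF = 0.4    # assumed to apply to both modes


def classify(score: float, partial: bool = False) -> str:
    """Map a similarity score to a moderation label using the starting bands."""
    spam_cutoff = PARTIAL_SPAM_CUTOFF if partial else SPAM_CUTOFF
    if score >= spam_cutoff:
        return "likely spam"
    if score >= SUSPICIOUS_CUTOFF:
        return "possibly suspicious"
    return "usually safe"


for s in (0.82, 0.65, 0.30):
    print(s, classify(s), "|", classify(s, partial=True))
```

Keeping the cut-offs as named constants makes it easy to adjust them as you review real traffic.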
Tips
- Use Partial Matching for Short Phrases
  Enable partial matching when scanning for short phrases within longer messages. Keep your examples concise, ideally 1 to 5 words, for the most accurate results.
- Keep Texts Short
  To ensure optimal performance and accuracy, limit the length of text, examples, and training data to 256 tokens (word pieces). For partial matching, keep examples especially brief and targeted.
- Stop on First Match
  For performance or early spam detection, consider using the stopOnMatch parameter. It sets a threshold (0.0–1.0): once a match's score exceeds it, the system stops processing further matches.
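To make the stopOnMatch behavior concrete, the sketch below imitates it locally. It walks through precomputed example scores and stops as soon as one exceeds the cut-off, as the tip describes. It does not call Sentinel; the scores dict and scan_until_match are hypothetical, and the request body only shows where the parameter would go.

```python
import json


def scan_until_match(scores: dict[str, float], stop_on_match: float) -> list[tuple[str, float]]:
    """Collect (example, score) pairs in order, stopping after the first score above stop_on_match."""
    results = []
    for example, score in scores.items():
        results.append((example, score))
        if score > stop_on_match:
            break  # later examples are not evaluated
    return results


# Precomputed scores standing in for the engine's comparisons (illustrative only).
scores = {"buy now": 0.35, "message me": 0.91, "contact us": 0.88}
print(scan_until_match(scores, stop_on_match=0.8))

# Request body with the same cut-off; field names follow this guide.
request_body = {
    "text": "Hey, message me for a special offer",
    "examples": list(scores),
    "stopOnMatch": 0.8,
}
print(json.dumps(request_body))
```

Skipping the remaining comparisons saves work when you only need to know that some example matched, at the cost of not seeing every score.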