Logging to ClickHouse

ALTCHA Sentinel supports logging to ClickHouse, enabling high-performance data storage and real-time analytics.

What is ClickHouse?

ClickHouse is a high-performance, columnar database management system designed for real-time analytical queries. It excels at handling large volumes of log data efficiently, enabling fast aggregation and reporting without impacting your primary operational databases.

Why Enable ClickHouse?

Storing request logs in ClickHouse enables scalable, high-speed analytics and reporting while reducing the load on your primary database. This separation improves overall system performance and allows you to leverage powerful analytics tools.

For detailed tuning tips, see Performance Tuning.

Key Benefits

  • Offloads logging workload from the primary database, improving its responsiveness
  • Provides efficient querying and aggregation optimized for analytics
  • Reduces disk I/O on container volumes by using ClickHouse’s optimized storage engine
  • Enables integration with external tools like ClickHouse clients or Business Intelligence (BI) platforms for advanced analysis

Configuration

Requirements

  • ClickHouse version 22.6 or later

Database Setup

  1. Create the ClickHouse database:
CREATE DATABASE IF NOT EXISTS altcha_sentinel
ENGINE = Atomic;
  2. Create the logs table with appropriate schema and indexes:
CREATE TABLE altcha_sentinel.logs (
    accountId LowCardinality(String),
    apiKeyId LowCardinality(String),
    time DateTime,
    browser UInt8,
    context Map(String, String),
    countryCode LowCardinality(FixedString(2)),
    device UInt8,
    endpoint UInt8,
    ip IPv6,
    method UInt8,
    network UInt8,
    path LowCardinality(String),
    referrer LowCardinality(String),
    triggeredRules Array(UInt8),
    serverLatency UInt32,
    statusCode UInt16,
    verified Bool,
    INDEX idx_statusCode statusCode TYPE set(1000) GRANULARITY 1,
    INDEX idx_countryCode countryCode TYPE set(1000) GRANULARITY 1,
    INDEX idx_method method TYPE set(1000) GRANULARITY 1
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (accountId, toStartOfHour(time), apiKeyId)
TTL time + INTERVAL 5 YEAR DELETE
SETTINGS index_granularity = 8192, flatten_nested = 0;
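
Once the table exists, it can be queried with any ClickHouse client. The following is a minimal sketch, not part of ALTCHA Sentinel, that builds (without sending) a request to the ClickHouse HTTP interface to count requests per status code for one account over the last 24 hours. The function name build_request, the credentials, and the account ID are hypothetical placeholders. Filtering on accountId and time matches the table's ORDER BY key.

```python
import urllib.parse
import urllib.request

# Requests per status code for one account over the last 24 hours.
# {account:String} is a ClickHouse query parameter, filled from the
# "param_account" URL parameter below.
QUERY = """
SELECT statusCode, count() AS requests
FROM logs
WHERE accountId = {account:String}
  AND time >= now() - INTERVAL 1 DAY
GROUP BY statusCode
ORDER BY requests DESC
FORMAT JSONEachRow
"""


def build_request(base_url, database, user, password, account_id):
    """Build a POST request for the ClickHouse HTTP interface (hypothetical helper)."""
    params = urllib.parse.urlencode(
        {"database": database, "param_account": account_id}
    )
    return urllib.request.Request(
        f"{base_url}/?{params}",
        data=QUERY.encode("utf-8"),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send the query; with FORMAT JSONEachRow, each line of the response body is one JSON object.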

Retention Policy

The TTL clause defines how long data is kept in the table. In this example, logs older than 5 years are automatically deleted by ClickHouse to control storage size and comply with data retention requirements. You can adjust the retention period by modifying the interval in the TTL expression, for example:

TTL time + INTERVAL 1 YEAR DELETE

to keep logs for 1 year only.

Enabling ClickHouse Integration

Set the CLICKHOUSE_URL environment variable to enable ClickHouse logging. For advanced configurations, review related environment variables.

Example connection URL format:

http://user:password@localhost:8123/altcha_sentinel
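
The parts of this URL can be illustrated with a short sketch. It assembles a URL in the format above and splits it back into its components using Python's standard library. The helper name build_clickhouse_url and the sample password are hypothetical. Percent-encoding the credentials is general URL practice; whether Sentinel decodes percent-encoded credentials is an assumption that should be verified before relying on it.

```python
from urllib.parse import quote, unquote, urlsplit


def build_clickhouse_url(user, password, host, port, database, scheme="http"):
    """Assemble scheme://user:password@host:port/database (hypothetical helper)."""
    # Percent-encode credentials so characters such as "@" or "/" cannot be
    # mistaken for URL delimiters.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{scheme}://{user_enc}:{password_enc}@{host}:{port}/{database}"


url = build_clickhouse_url("user", "p@ss/word", "localhost", 8123, "altcha_sentinel")
parts = urlsplit(url)

print(url)                      # the value one would assign to CLICKHOUSE_URL
print(parts.hostname)           # host
print(parts.port)               # HTTP interface port
print(parts.path.lstrip("/"))   # database name
print(unquote(parts.password))  # original password, decoded
```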

Batching Configuration

To optimize performance, the ClickHouse client batches log entries before sending them. Adjust these environment variables as needed:

  • CLICKHOUSE_BATCH_MAX: Maximum number of logs per batch
  • CLICKHOUSE_BATCH_INTERVAL: Maximum time between batch flushes
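
How a size limit and a time limit typically interact can be sketched as follows. This is an illustration of the general batching technique, not Sentinel's actual implementation: the class LogBatcher, its parameters, and the use of seconds as the interval unit are all assumptions made for the example.

```python
import time


class LogBatcher:
    """Buffer entries; flush when the batch is full or its oldest entry is too old."""

    def __init__(self, send, batch_max, batch_interval, clock=time.monotonic):
        self._send = send                      # callable that receives a list of entries
        self._batch_max = batch_max            # role of CLICKHOUSE_BATCH_MAX
        self._batch_interval = batch_interval  # role of CLICKHOUSE_BATCH_INTERVAL (seconds assumed)
        self._clock = clock
        self._buffer = []
        self._oldest = None                    # time the first buffered entry arrived

    def add(self, entry):
        if not self._buffer:
            self._oldest = self._clock()
        self._buffer.append(entry)
        self._maybe_flush()

    def tick(self):
        """Call periodically so a partly filled batch is still sent."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self._buffer:
            return
        full = len(self._buffer) >= self._batch_max
        stale = self._clock() - self._oldest >= self._batch_interval
        if full or stale:
            self.flush()

    def flush(self):
        # Swap the buffer out before sending so new entries start a fresh batch.
        batch, self._buffer = self._buffer, []
        if batch:
            self._send(batch)
```

In this sketch, a larger batch size means fewer, bigger inserts, while a shorter interval bounds how long an entry can wait in memory before it is written.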

Identifier Mappings

For performance reasons, certain fields are stored as integers. Use the following mappings to decode these identifiers:

browser

Number  Browser
1       chrome
2       firefox
3       edge
4       safari
5       brave
6       vivaldi
7       opera

device

Number  Device
1       desktop
2       console
3       mobile
4       tablet
5       smarttv
6       wearable
7       embedded
8       bot

network

Number  Network
1       fixed
2       mobile
3       hosting
4       proxy
5       tor

method

Number  HTTP Method
1       GET
2       POST
3       PATCH
4       PUT
5       DELETE
6       OPTIONS
7       QUERY
8       HEAD

endpoint

Number  Endpoint
1       challenge
2       verify

rule (elements of the triggeredRules array)

Number  Rule
1       CAPITALIZATION
2       CURRENCY
3       DMARC
4       EMOJI
5       EXCLAMATION
6       FREE_PROVIDER
7       HASH_TAGS
8       HIGH_RISK_COUNTRY
9       HOSTING
10      HTML
11      HTML_INJECTION
12      MALICIOUS
13      MX
14      NUMBERS_ONLY
15      PROFANITY
16      PROXY
17      RANDOM_CHARS
18      SHORT_TEXT
19      SPAM_WORDS
20      SPECIAL_CHARS
21      SQL_INJECTION
22      TOR
23      UNEXPECTED_LANGUAGE
24      UNKNOWN_LANGUAGE
25      URL
26      BOT
27      ACCEPT_HEADER_MISSING
28      ACCEPT_LANGUAGE_HEADER_MISSING
29      USER_AGENT_HEADER_MISSING
30      DISPOSABLE
31      LOCATION_DISTANCE
32      TIMEZONE_MISMATCH
33      RATE_LIMIT
34      SIMILARITY
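
The mappings above can be applied on the client side after fetching rows. The sketch below transcribes the tables into Python dictionaries and decodes one row. The names decode_row and UNKNOWN are hypothetical, and treating any number not listed in the tables (including 0) as "unknown" is an assumption of this example.

```python
BROWSERS = dict(enumerate(
    ["chrome", "firefox", "edge", "safari", "brave", "vivaldi", "opera"], start=1))
DEVICES = dict(enumerate(
    ["desktop", "console", "mobile", "tablet", "smarttv", "wearable",
     "embedded", "bot"], start=1))
NETWORKS = dict(enumerate(
    ["fixed", "mobile", "hosting", "proxy", "tor"], start=1))
METHODS = dict(enumerate(
    ["GET", "POST", "PATCH", "PUT", "DELETE", "OPTIONS", "QUERY", "HEAD"], start=1))
ENDPOINTS = {1: "challenge", 2: "verify"}
RULES = dict(enumerate([
    "CAPITALIZATION", "CURRENCY", "DMARC", "EMOJI", "EXCLAMATION",
    "FREE_PROVIDER", "HASH_TAGS", "HIGH_RISK_COUNTRY", "HOSTING", "HTML",
    "HTML_INJECTION", "MALICIOUS", "MX", "NUMBERS_ONLY", "PROFANITY",
    "PROXY", "RANDOM_CHARS", "SHORT_TEXT", "SPAM_WORDS", "SPECIAL_CHARS",
    "SQL_INJECTION", "TOR", "UNEXPECTED_LANGUAGE", "UNKNOWN_LANGUAGE", "URL",
    "BOT", "ACCEPT_HEADER_MISSING", "ACCEPT_LANGUAGE_HEADER_MISSING",
    "USER_AGENT_HEADER_MISSING", "DISPOSABLE", "LOCATION_DISTANCE",
    "TIMEZONE_MISMATCH", "RATE_LIMIT", "SIMILARITY",
], start=1))

UNKNOWN = "unknown"  # assumption: label for numbers not listed in the tables


def decode_row(row):
    """Return a copy of a log row with integer identifiers replaced by names."""
    decoded = dict(row)
    decoded["browser"] = BROWSERS.get(row["browser"], UNKNOWN)
    decoded["device"] = DEVICES.get(row["device"], UNKNOWN)
    decoded["network"] = NETWORKS.get(row["network"], UNKNOWN)
    decoded["method"] = METHODS.get(row["method"], UNKNOWN)
    decoded["endpoint"] = ENDPOINTS.get(row["endpoint"], UNKNOWN)
    decoded["triggeredRules"] = [RULES.get(r, UNKNOWN) for r in row["triggeredRules"]]
    return decoded


example = decode_row({
    "browser": 2, "device": 3, "network": 4, "method": 2,
    "endpoint": 2, "triggeredRules": [16, 21],
})
print(example)
```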

Implementation Notes

  • context — A custom key-value map containing additional metadata set during the verification process.
  • endpoint and path — When endpoint = 0, the full request path is recorded in the path field.
  • verified — Present only when endpoint = 2 (i.e., the /v1/verify endpoint). Indicates whether the verification attempt was successful (1 equals success).
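
These notes can be expressed as a small helper when post-processing rows. The following is a sketch under the rules stated above; the function name describe_request and the shape of its return value are hypothetical.

```python
def describe_request(row):
    """Summarize which target a log row refers to and whether it was verified."""
    endpoint = row["endpoint"]
    if endpoint == 0:
        # No endpoint identifier: the full request path is in the path field.
        target = row["path"]
    else:
        target = {1: "challenge", 2: "verify"}.get(endpoint, f"endpoint {endpoint}")
    # verified is only meaningful for the verify endpoint (endpoint = 2).
    verified = bool(row["verified"]) if endpoint == 2 else None
    return {"target": target, "verified": verified}


print(describe_request({"endpoint": 2, "path": "", "verified": 1}))
print(describe_request({"endpoint": 0, "path": "/custom", "verified": 0}))
```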