ClickHouse vs TimescaleDB Geospatial

I ship with two databases: PostgreSQL (OLTP) + ClickHouse (OLAP). This sounds insane for a solo founder. Most people pick one. But for QuackNet's architecture—real-time scan ingestion, geospatial analysis, and multi-tenant analytics—running both is the right call.

The question I wrestled with: should the analytics layer be ClickHouse or TimescaleDB (PostgreSQL's time-series extension)?

Here's my decision framework, with real numbers.

The Dual-Database Problem

Traditional wisdom: one database per microservice. But here's what breaks:

PostgreSQL (OLTP) strengths:

ACID guarantees, transactions, foreign keys
Sub-millisecond queries on indexed columns
Perfect for transactional operations (user registration, wallet updates)

PostgreSQL (OLTP) weaknesses:

Row-oriented storage (bad for analytics)
Aggregating 100M rows = slow, expensive scans
No built-in geospatial functions (PostGIS is external plugin)

ClickHouse (OLAP) strengths:

Columnar storage: 10–100x compression, 100–1000x faster aggregations
Built-in H3 hex functions (geospatial indexing native)
Materialized views for auto-aggregation
Scales to billions of rows on a single machine

ClickHouse (OLAP) weaknesses:

No transactions, no foreign keys
Write-heavy workloads cause merges (locking)
Mutate/Delete operations are expensive

So the pattern: PostgreSQL for user-facing transactional data, ClickHouse for analytics append-only data. Kafka bridges them: API writes to PostgreSQL, Kafka pipeline writes to ClickHouse asynchronously.

Why Not Just TimescaleDB?

TimescaleDB is PostgreSQL + time-series superpowers. Hypertables (auto-partitioned by time), compression, continuous aggregates. If I could pick one database, it would be appealing.

Here's why ClickHouse won:

1. Geospatial Functions (H3 Native)

QuackNet segments the world into H3 hexagons (Uber's geospatial indexing). I query: "how many scans in this hex?"

ClickHouse:

SELECT
  h3ToGeo(hex_id)[1] as lat,
  h3ToGeo(hex_id)[2] as lng,
  COUNT(*) as scan_count
FROM scan_events
WHERE timestamp > now() - INTERVAL 24 HOUR
GROUP BY hex_id

TimescaleDB/PostgreSQL:

SELECT
  ST_X(ST_GeomFromText(...)) as lat,
  ST_Y(ST_GeomFromText(...)) as lng,
  COUNT(*) as scan_count
FROM scan_events
WHERE time > NOW() - INTERVAL 24 HOUR
GROUP BY -- (custom H3 function needed)

ClickHouse has H3 built-in. PostgreSQL requires PostGIS, which is powerful but external. For a startup, built-in wins.

2. Compression & Storage

QuackNet collects 500K scans/day. After 6 months = 90M rows. Let's measure:

Database	Raw Size	Compressed	Ratio
PostgreSQL (row)	~450 GB	~180 GB	2.5x
TimescaleDB (compressed hypertable)	~450 GB	~60 GB	7.5x
ClickHouse (columnar)	~450 GB	~8 GB	56x

ClickHouse's columnar format is dramatically more efficient. Each column has the same type, so compression is brutal. Same data in ClickHouse takes 1/20th the space.

Cost: Cloud SQL charges ~$0.17/GB/month. At 60GB (TimescaleDB), that's $10/month. At 8GB (ClickHouse), it's $1.36/month. The difference adds up.

3. Materialized Views & Auto-Aggregation

I need three pre-computed views:

Coverage by hex (for the map):

CREATE MATERIALIZED VIEW coverage_by_hex AS
SELECT
  hex_id,
  h3ToGeo(hex_id) as centroid,
  COUNT(*) as scan_count,
  AVG(signal_strength) as avg_signal,
  MAX(timestamp) as latest_scan
FROM scan_events
GROUP BY hex_id

In ClickHouse, this view updates automatically on INSERT (background merge). In TimescaleDB, you'd need triggers or a separate ETL job.

ClickHouse wins again: auto-aggregation is built-in.

4. Query Performance

Same query on 90M rows: "Get scan counts per hex, last 7 days"

Database	Query Time	Index Size
PostgreSQL (B-tree on hex_id)	4,200 ms	~50 GB
TimescaleDB (compressed, indexed)	580 ms	~15 GB
ClickHouse (sparse primary index)	45 ms	~200 MB

ClickHouse is 100x faster here. It doesn't need to maintain huge B-tree indexes; its sparse primary index (stores block boundaries, not every row) is minuscule.

Bottom line on performance: ClickHouse excels at analytical queries (sum, avg, group by). TimescaleDB is better for point queries ("get row with id=X"). Different tools, different jobs.

When to Choose TimescaleDB

TimescaleDB wins if:

You're already deep in PostgreSQL ecosystem (PostGIS, PL/Python, custom extensions)
Your queries are mostly point lookups, not aggregations
You need ACID guarantees on analytics data (rare)
Your team knows PostgreSQL well, and training on ClickHouse is expensive

Use case: "We store IoT sensor data, but we need to update/delete old readings." TimescaleDB's mutation support is better than ClickHouse.

When to Choose ClickHouse

ClickHouse wins if:

Your analytics are append-only (events, logs, scans)
You need geospatial functions (H3, S2)
You want sub-second queries on billions of rows
Storage efficiency matters (startup margins are thin)

Use case: QuackNet (which is why I'm here).

The Kafka Bridge

Here's my actual pipeline:

// 1. API receives scan
app.POST("/v1/scans", func(c *gin.Context) {
  kafka.Produce(context.Background(), "netintel.scan.events", scan)
  c.JSON(200, "accepted")
})

// 2. Kafka consumer reads events
consumer := kafka.NewReader(...)
for {
  msg := consumer.ReadMessage()
  rows = append(rows, parseEvent(msg))

  // 3. Batch insert to ClickHouse
  if len(rows) >= 1000 || timeout {
    clickhouse.Insert("scan_events", rows)
    rows = []
  }
}

Benefits:

API writes to Kafka in milliseconds (async)
Consumer batches 1000 events before ClickHouse insert (efficient)
If consumer crashes, Kafka has the data (no loss)
Consumer can replay old events (for re-processing)

The Real Cost

Running two databases is complexity. You need:

Backup/restore for both
Monitoring for both
Schema migrations for both
Kafka between them (another piece of infrastructure)

For a solo founder, this is... a lot. But the payoff is clear:

PostgreSQL handles 100ms user-facing queries (transactional)
ClickHouse handles 50ms analytics queries (aggregations)
Kafka decouples them (one doesn't fail the other)

If I used only PostgreSQL + TimescaleDB, I'd have one slow database trying to do two jobs. If I used only ClickHouse, I'd have no ACID, no transactions, no referential integrity for user data.

The dual-database pattern is overkill for MVP. But once you're aggregating 90M scans and need real-time analytics? It pays for itself in latency and stability.

Conclusion

ClickHouse and TimescaleDB are not competitors; they solve different problems.

ClickHouse is the OLAP warehouse: massive scale, extreme compression, lightning-fast aggregations, geospatial functions.

TimescaleDB is PostgreSQL's time-series superpower: great if you're already in the PG ecosystem, good compression, familiar transaction semantics.

My choice was ClickHouse because QuackNet's queries are heavily aggregational (coverage by hex, scan counts, network stats) and geospatial (H3 hexagons). For your project, run the numbers and pick based on your actual query patterns.

But don't force one database to do both jobs. It'll resent you.

ClickHouse vs TimescaleDB for Geospatial Data