TL;DR. A Hypertab table with smart columns is a directed acyclic graph. Each smart column is a node. Every record flows through the graph in topological order. Columns in the same dependency layer run in parallel. Cross-record parallelism is plan-gated. A single edit cascades only to dependent cells. This post walks through how the engine is built, what trade-offs we made, and why the table shape beats a workflow graph once you cross a few thousand records.

The problem with workflow graphs at scale

If you have ever built something in n8n, Zapier, or Make, the model is familiar. You draw a graph. Each node is a step. The node fires, passes output to the next node, and the graph walks forward. For one execution that is perfect. For ten thousand it is not.

Every workflow execution pays the full setup cost. Node by node. Retry by retry. Queue by queue. Running fifty thousand records through a five step pipeline is fifty thousand separate walks, and each walk has its own error surface. You end up babysitting runs, resuming from the middle, and paying per step regardless of whether the work is identical across records.

The table shape is different. The records are the iteration. The columns are the steps. The engine plans once, runs many, and the retry surface collapses to a single cell rather than a whole run. That is the core bet behind Hypertab.

Columns that do work

Every column in Hypertab has a kind field. Seven kinds exist today, including static fields.

static holds data. Text, number, date, anything.
http makes an HTTP call per record and extracts a field.
formula computes from other columns.
integration pushes a record to an external service.
waterfall tries sources in order and uses the first match.
lookup does a cross table VLOOKUP.
extract pulls a field from an upstream JSON result at zero external cost.

The last one matters more than it looks. Once you have an HTTP column that returns a JSON blob, an extract column pulls any field out of that blob without a second API call. Most tables end up with one or two HTTP columns and many extracts. That is how the op cost stays flat even when the schema is wide.
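
To make the extract idea concrete, here is a minimal sketch of pulling a nested field out of an upstream result. It assumes a dot-separated path such as company.name; the function name extractField and the path syntax are illustrative, not Hypertab's actual config format.

```typescript
// Hypothetical sketch: the name `extractField` and the dot-path syntax are
// assumptions for illustration, not the engine's real API.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function extractField(source: Json, path: string): Json | undefined {
  let current: Json | undefined = source;
  for (const key of path.split(".")) {
    // Stop as soon as the path leaves plain-object territory.
    if (current === null || typeof current !== "object" || Array.isArray(current)) {
      return undefined;
    }
    current = current[key];
  }
  return current;
}

// One HTTP result feeds many extracts with no further API calls.
const httpResult: Json = { company: { name: "Acme", size: 120 }, score: 0.93 };
const companyName = extractField(httpResult, "company.name");
const companySize = extractField(httpResult, "company.size");
const missingField = extractField(httpResult, "company.ceo.name");
```

Each extract is a pure in-memory read of a value the record already holds, which is why adding more of them does not add external calls.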

The dependency graph

Smart columns reference other columns inside their config. An HTTP URL can template https://api.example.com/companies/{{domain}}, an HTTP body can include other field values, and a formula can add two numbers. Every time you save a smart column, the engine parses every config field, pulls out the {{column_name}} tokens, and records them as edges.

The edges live in a system table called _ht_column_dependencies. One row per edge. Source column, target column, table id, created at. A small table that changes only when columns change, which is rare. We index by target column id because the common read is “what depends on this column”, used during cascade and validation.

Two dep sources feed the graph.

Implicit. Anything between {{ }} is an edge. You do not have to declare anything. This covers roughly 95% of real columns.
Explicit. config.depends_on: ["col_name"] lets a formula reference a bare identifier without double braces. Rare, but needed for some formula shapes.

Extract columns are a special case. They always depend on their source_column_id. The DAG builder wires that automatically so you cannot forget.
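
The three dependency sources can be sketched in a few lines. This is an illustration under assumptions, not the engine's parser: the function names are made up, the token pattern assumes identifier-style column names with optional whitespace inside the braces, and column names and ids are treated as one key space for brevity.

```typescript
// Illustrative sketch. Function names and the token pattern are assumptions;
// the config field names (depends_on, source_column_id) follow the post.
type ColumnConfig = { [field: string]: unknown };

// Find every {{column_name}} token in one string.
function templateTokens(text: string): string[] {
  const pattern = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) names.push(match[1]);
  return names;
}

// Walk nested config values (URL, headers, body, ...) and collect tokens.
function collectTokens(value: unknown, out: Set<string>): void {
  if (typeof value === "string") {
    for (const name of templateTokens(value)) out.add(name);
  } else if (Array.isArray(value)) {
    for (const item of value) collectTokens(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      collectTokens((value as Record<string, unknown>)[key], out);
    }
  }
}

function extractDependencies(config: ColumnConfig): string[] {
  const deps = new Set<string>();
  collectTokens(config, deps); // implicit edges: anything between {{ }}
  const explicit = config["depends_on"];
  if (Array.isArray(explicit)) {
    for (const name of explicit) {
      if (typeof name === "string") deps.add(name); // explicit edges
    }
  }
  const source = config["source_column_id"];
  if (typeof source === "string") deps.add(source); // extract columns
  return Array.from(deps);
}

const httpColumnDeps = extractDependencies({
  url: "https://api.example.com/companies/{{domain}}",
  body: { note: "{{company_name}} at {{domain}}" },
  depends_on: ["region"],
});
const extractColumnDeps = extractDependencies({ source_column_id: "enriched_company" });
```

Collecting into a set means a column referenced twice still produces a single edge.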

Cycle detection and the DAG_CYCLE_DETECTED error

The moment a graph lands, we topo sort with Kahn’s algorithm. Build an in-degree map, queue the nodes with in-degree zero, pop one, decrement its dependents, repeat. If the sort visits fewer nodes than the graph has, there is a cycle. We find the cycle path by walking back from the unvisited set and return it in the error message.

DAG_CYCLE_DETECTED: Column "score" depends on "rank" which depends on "score".
Cycle: score -> rank -> score.
Suggestion: break the cycle by removing one of the {{template}} references.

We reject the column save. You never ship a broken graph. This matters because a cycle in a smart column table is not just incorrect, it is infinite work.
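
A compact sketch of that check, assuming the graph is held as a map from each column to the columns it depends on. The function name topoSort and its return shape are illustrative. The cycle walk relies on one fact: every node Kahn's algorithm fails to visit still has at least one unvisited dependency, so following those dependencies must eventually loop.

```typescript
// Illustrative sketch; `topoSort` and its result shape are assumptions.
// Every referenced column is assumed to be a key in `deps`
// (unknown references are rejected by a separate check).
type DepGraph = Map<string, string[]>;
type SortResult = { ok: true; order: string[] } | { ok: false; cycle: string[] };

function topoSort(deps: DepGraph): SortResult {
  const inDegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const [node, upstream] of deps) {
    inDegree.set(node, upstream.length);
    for (const u of upstream) {
      const list = dependents.get(u) ?? [];
      list.push(node);
      dependents.set(u, list);
    }
  }

  // Kahn's algorithm: start from zero in-degree nodes, pop and decrement.
  const queue = Array.from(deps.keys()).filter((n) => inDegree.get(n) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const node = queue.shift()!;
    order.push(node);
    for (const d of dependents.get(node) ?? []) {
      const remaining = inDegree.get(d)! - 1;
      inDegree.set(d, remaining);
      if (remaining === 0) queue.push(d);
    }
  }
  if (order.length === deps.size) return { ok: true, order };

  // Walk back through unvisited dependencies until a node repeats.
  const visited = new Set(order);
  const seenAt = new Map<string, number>();
  const path: string[] = [];
  let current = Array.from(deps.keys()).find((n) => !visited.has(n))!;
  while (!seenAt.has(current)) {
    seenAt.set(current, path.length);
    path.push(current);
    current = deps.get(current)!.find((u) => !visited.has(u))!;
  }
  return { ok: false, cycle: [...path.slice(seenAt.get(current)!), current] };
}

const cyclic = topoSort(
  new Map<string, string[]>([
    ["email", []],
    ["score", ["rank"]],
    ["rank", ["score"]],
  ]),
);
```

For the score and rank columns above, the returned cycle reads score, rank, score, which is the path the error message prints.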

Unknown references and fuzzy matching

If a column references {{emial}} and no column named emial exists, we do not silently drop it. We compute Levenshtein distance against every other column in the table. If the closest match is within 2 edits, we return a suggestion:

DAG_UNKNOWN_REF: Column "enriched_company" references {{emial}}, which does not exist.
Did you mean "email"? (distance 1)

This is the same pattern we use on the REST API. Errors tell you what was attempted, what went wrong, and what to do. The AI agent on the other end reads the suggestion and fixes the prompt without a human round trip.
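
The lookup can be sketched with a textbook edit-distance routine. The names levenshtein and suggestColumn are illustrative, and the threshold of 2 comes from the text above. One caveat on the metric: plain Levenshtein scores the swapped letters in emial as two substitutions (distance 2), while a transposition-aware variant such as Damerau-Levenshtein would score the same typo as 1.

```typescript
// Illustrative sketch: plain Levenshtein distance with a two-row table.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest column name within `maxDistance` edits, or null.
// `suggestColumn` is a hypothetical helper, not a Hypertab function.
function suggestColumn(
  ref: string,
  columns: string[],
  maxDistance = 2,
): { name: string; distance: number } | null {
  let best: { name: string; distance: number } | null = null;
  for (const name of columns) {
    const distance = levenshtein(ref, name);
    if (distance <= maxDistance && (best === null || distance < best.distance)) {
      best = { name, distance };
    }
  }
  return best;
}

const suggestion = suggestColumn("emial", ["domain", "email", "company"]);
```

A tight threshold keeps the suggestion useful: a reference that is nowhere near any column name returns null instead of a misleading guess.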

Topological layers, not a single queue

Once the graph is valid we compute layers. Layer 0 is every node with no deps. Layer 1 is every node whose deps are all in layer 0. And so on: a node sits one layer after its deepest dependency. Layers are the parallelism boundary. Nodes in the same layer can run at the same time. Nodes in later layers must wait.

For one record this is cheap. You walk the layers and run the columns. For ten thousand records it is also cheap, because the graph is shared. Every record uses the same layers. You do not recompute anything per record.
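
As a sketch, layering falls out of one rule: a node's layer is one more than the deepest layer among its dependencies. The function name computeLayers is illustrative, and the code assumes the graph has already passed cycle detection.

```typescript
// Illustrative sketch; assumes `deps` is acyclic and maps each smart column
// to the smart columns it depends on.
function computeLayers(deps: Map<string, string[]>): string[][] {
  const layerOf = new Map<string, number>();

  const resolve = (node: string): number => {
    const known = layerOf.get(node);
    if (known !== undefined) return known;
    const upstream = deps.get(node) ?? [];
    // No deps means layer 0; otherwise one past the deepest dependency.
    const layer = upstream.length === 0 ? 0 : 1 + Math.max(...upstream.map(resolve));
    layerOf.set(node, layer);
    return layer;
  };

  const layers: string[][] = [];
  for (const node of deps.keys()) {
    const layer = resolve(node);
    while (layers.length <= layer) layers.push([]);
    layers[layer].push(node);
  }
  return layers;
}

const exampleLayers = computeLayers(
  new Map<string, string[]>([
    ["company", []],
    ["weather", []],
    ["company_name", ["company"]],
    ["greeting", ["company_name", "weather"]],
  ]),
);
// exampleLayers: [["company", "weather"], ["company_name"], ["greeting"]]
```

The result depends only on the column graph, so it can be computed once per table and reused for every record.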

The engine caches the compiled graph in a Durable Object keyed by table id. When a column mutates we invalidate the cache, rebuild, and push the new version to any active record run. The record run picks up the new graph on the next layer boundary. Mid run changes are rare but safe.

Executing a record

The function that runs a single record lives in smart-columns/row-engine.ts and is called executeRowDAG. The implementation still uses legacy row names internally, but the product-level unit is a record. The signature is roughly:

async function executeRowDAG(
  table: TableRef,
  row: Row,
  ctx: ProcessorContext,
  options: { maxParallelPerRow: number } = { maxParallelPerRow: 6 },
): Promise<RowRunResult>

Inside it walks the layers. For each layer it builds an array of smart column promises. Each promise fetches the column processor (one of http, formula, integration, waterfall, lookup, extract), calls it with the row and the accumulated computed map, writes the result back into computed, and updates the cell state in the database.

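for (const layer of layers) {
  // Split the layer into groups of at most maxParallelPerRow columns.
  const chunks = chunk(layer, options.maxParallelPerRow)
  for (const group of chunks) {
    // Columns in a group run concurrently; the next group waits for all of them.
    await Promise.all(group.map((col) => runOneColumn(col, row, ctx)))
  }
}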

maxParallelPerRow defaults to 6 because Cloudflare Workers cap a single request at 6 concurrent outbound connections. Some work can be delegated to the configured job runtime, but every path remains bounded by the active plan, provider limits, and deployed runtime capacity. Hypertab makes no claim of unlimited concurrency.

The computed map threads downstream. If column A is an HTTP call that returns a JSON body, column B is an extract that pulls company.name from that body, and column C is a formula that combines the extracted name with an original row field, the formula sees both upstream results and the original record. One context object, every value resolved.

Upstream failure propagation

If column A errors, what happens to B and C that depend on A? They are marked skipped with the reason “Upstream A did not complete”. They do not run. They do not bill ops. They do not add noise to your error log.

This is the part most workflow graphs get wrong. A failed step in n8n often halts the whole run. A failed step in Hypertab halts the subgraph rooted at that step, for that record only, and every other independent column on that record continues. One record with a bad field does not poison the whole batch.
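
A simplified sketch of that skip logic for one record. It is synchronous for brevity (real processors are async), and the CellState shape and the name runRecordSketch are assumptions. Upstreams with no recorded state are treated as static data that is already available.

```typescript
// Illustrative sketch; types and names are assumptions, not the engine's.
type CellState =
  | { status: "done" }
  | { status: "error"; message: string }
  | { status: "skipped"; root: string; reason: string };

function runRecordSketch(
  layers: string[][],
  deps: Map<string, string[]>,
  run: (column: string) => void,
): Map<string, CellState> {
  const state = new Map<string, CellState>();
  for (const layer of layers) {
    for (const column of layer) {
      // Find the root failure behind any upstream that did not complete.
      let root: string | undefined;
      for (const upstream of deps.get(column) ?? []) {
        const s = state.get(upstream);
        if (s === undefined || s.status === "done") continue; // static or finished
        root = s.status === "skipped" ? s.root : upstream;
        break;
      }
      if (root !== undefined) {
        // Never invoke the processor, so no call is made for this cell.
        state.set(column, { status: "skipped", root, reason: `Upstream ${root} did not complete` });
        continue;
      }
      try {
        run(column);
        state.set(column, { status: "done" });
      } catch (err) {
        state.set(column, { status: "error", message: err instanceof Error ? err.message : String(err) });
      }
    }
  }
  return state;
}

// A fails; B and C sit downstream of A; D is independent.
const cellStates = runRecordSketch(
  [["A", "D"], ["B"], ["C"]],
  new Map<string, string[]>([["B", ["A"]], ["C", ["B", "name"]]]),
  (column) => {
    if (column === "A") throw new Error("HTTP 500");
  },
);
```

In this run A ends in error, B and C are skipped with A named as the root cause, and D completes normally.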

Cross-record parallelism

Records are independent. The engine can run N records at once, each record walking the graph in topological order. N is plan-gated.

Current Free beta: up to 5 record DAGs in flight per Table, additionally bounded by provider and runtime controls.
Pro and higher-capacity tiers: concurrency figures are planning information only; those plans are not currently available.
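
One common way to hold N records in flight is a fixed set of worker lanes pulling from a shared cursor. The sketch below shows that pattern; runWithLimit is a hypothetical helper, and the limit of 5 in the usage note simply mirrors the Free beta figure above.

```typescript
// Illustrative sketch of bounded concurrency; not the engine's scheduler.
async function runWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane processes one item at a time, so at most `limit` run at once.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  };

  const lanes = Array.from({ length: Math.min(limit, items.length) }, () => lane());
  await Promise.all(lanes);
  return results;
}
```

A call such as runWithLimit(records, 5, runOneRecord), where runOneRecord is whatever walks the graph for one record, keeps five record walks active until the input is drained. In this sketch a worker that rejects fails the whole call, so per-record errors should be caught inside the worker.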

Rate control uses a token bucket that is global per external account. If two Tables share one API account, their configured work shares the same rate-control state. The token bucket and adaptive backoff reduce request pressure after service throttling, but they do not guarantee that an external API limit will never be exceeded. Throughput depends on the service and deployed runtime.
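
A generic token bucket keyed by account id illustrates the sharing. The capacity, refill rate, and the names TokenBucket and bucketFor are assumptions made for this sketch; adaptive backoff is left out, and the clock is passed in so the behavior is easy to reason about.

```typescript
// Illustrative sketch of a token bucket; parameters are made up.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSecond: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Refill for the elapsed time, then spend one token if available.
  tryTake(nowMs: number): boolean {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per external account, shared by every table that uses it.
const buckets = new Map<string, TokenBucket>();

function bucketFor(accountId: string, nowMs: number): TokenBucket {
  let bucket = buckets.get(accountId);
  if (bucket === undefined) {
    bucket = new TokenBucket(2, 1, nowMs); // capacity 2, one token per second
    buckets.set(accountId, bucket);
  }
  return bucket;
}

const tableABucket = bucketFor("acct_1", 0);
const tableBBucket = bucketFor("acct_1", 0);
```

Because both tables resolve to the same bucket object, a burst from one table leaves fewer tokens for the other until the bucket refills.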

Cascade on edit

When a human (or agent) edits one cell, we do not rerun the whole record. We find the transitive dependents of the edited column, and rerun only those. The function is cascadeFromEdit in smart-columns/cascade.ts.

Concretely:

Diff old vs new row, compute the set of changed column ids.
For each changed id, look up transitive dependents via _ht_column_dependencies.
Union the dependents. Topo sort. Run the subgraph for that one record.
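
The first two steps can be sketched as a shallow diff plus a graph walk. Both function names are illustrative, the in-memory map stands in for reads against _ht_column_dependencies, and the diff compares values with strict equality only.

```typescript
// Illustrative sketch; names are assumptions and the comparison is shallow.
function changedColumns(
  oldRow: Record<string, unknown>,
  newRow: Record<string, unknown>,
): string[] {
  const keys = new Set([...Object.keys(oldRow), ...Object.keys(newRow)]);
  return Array.from(keys).filter((key) => oldRow[key] !== newRow[key]);
}

// `dependentsOf` maps a column to the columns that reference it directly.
function transitiveDependents(
  changed: string[],
  dependentsOf: Map<string, string[]>,
): Set<string> {
  const found = new Set<string>();
  const stack = [...changed];
  while (stack.length > 0) {
    const column = stack.pop()!;
    for (const dependent of dependentsOf.get(column) ?? []) {
      if (!found.has(dependent)) {
        found.add(dependent);
        stack.push(dependent);
      }
    }
  }
  return found;
}

const dependentsOf = new Map<string, string[]>([
  ["domain", ["company"]],
  ["company", ["company_name", "size"]],
  ["company_name", ["greeting"]],
  ["email", ["greeting"]],
]);
const edited = changedColumns(
  { domain: "a.com", email: "x@a.com" },
  { domain: "b.com", email: "x@a.com" },
);
const toRerun = transitiveDependents(edited, dependentsOf);
```

Editing domain here marks company, company_name, size, and greeting for rerun, while editing only email would mark greeting alone.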

A column can opt out with config.recompute_on_upstream_change: false. You would do that for an expensive one shot enrichment that should only fire once even if the upstream name is edited by hand. That is a policy choice. Most columns keep the default and rerun.

When the DAG fires

Four triggers exist.

Record insert. If any smart column on the table has auto_run: true, afterRowsInserted fires the full graph for every new record. This is the default for most pipelines. You drop records in via webhook or MCP, and they enrich themselves.
Cell edit. The cascade above.
Manual column run. hypertab_run_column enqueues a run for one smart column against specified records. Useful for retries.
Manual table run. hypertab_run_dag runs the whole graph over specified records, or every record if none are specified. Useful for full refreshes.

All four go through the same record engine. No separate code paths. No “retry logic” that works differently from “first run logic”. That bought us a lot of correctness.

Why we wrote it this way

We weighed three designs, including the one we shipped.

Per column queues. Every smart column gets its own queue and workers. Simple. Fails on cross column data flow because downstream workers cannot see upstream output without a database round trip per step.
Per-record workflow engine. Think Temporal. Rich, but overkill for a stateless batch, and every record pays workflow overhead. We measured 100ms to 300ms per record in engine overhead alone. Unacceptable at 50k records.
Record DAG with shared graph compile. What we shipped. Compile once per table, walk once per record, cache across runs. Engine overhead is roughly 3ms per record. Most of the per-record latency is the external API call, which is the part you actually want to spend time on.
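
The gap is easier to see as totals. The arithmetic below multiplies the per-record overheads quoted above by 50,000 records; the helper name is made up, and the results are summed engine overhead rather than wall-clock time, since records run in parallel.

```typescript
// Worked arithmetic using the figures quoted in the list above.
function totalOverheadSeconds(records: number, overheadMsPerRecord: number): number {
  return (records * overheadMsPerRecord) / 1000;
}

const recordCount = 50000;
const workflowLow = totalOverheadSeconds(recordCount, 100); // 5,000 s, about 83 minutes
const workflowHigh = totalOverheadSeconds(recordCount, 300); // 15,000 s, about 4.2 hours
const sharedGraph = totalOverheadSeconds(recordCount, 3); // 150 s, 2.5 minutes
```

At the quoted figures, the shared graph spends roughly 33 to 100 times less on engine overhead than the per-record workflow engine.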

Where we are going

The next step is partial record execution. Today a DAG run touches every smart column layer. Once we ship partial runs you will be able to say “run just this column and its descendants for these records”, which closes the last gap where a workflow graph beats us: interactive exploration of a single pipeline stage.

After that, branch merges. If two HTTP columns can produce the same field, a merge node picks the first non null. Today you would model that as a waterfall column, which works but is heavy. A lightweight merge will make waterfalls optional for the simple case.

We will write up both when they ship. Until then, the code is source available on GitHub for anyone who wants to read the real thing.

If you have a pipeline that is too big for Zapier and too rigid for a queue, open Hypertab with a workspace key. The graph compiles the same way whether you have ten records or a hundred thousand.