
Shards — S3 cluster

Shards is the name of Katafract’s object-storage cluster. It speaks the S3 API, runs Garage under the hood, and replicates every object across physically distinct zones.

Node   | Mesh IP      | S3 port | Zone       | Capacity
-------|--------------|---------|------------|------------
fury   | 100.64.0.4   | 3901    | us-central | 1.5 TB SSD
atlas  | 100.64.0.32  | 3901    | us-vin     | 14.8 TB HDD
hades  | 100.64.0.30  | 3901    | ca-bhs     | 14.8 TB HDD
  • Replication factor: 2 (every object exists on two distinct zones).
  • Total raw: ~31 TB. Usable at rf=2: ~15 TB.
  • Layout version: 12.
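The usable figure follows from rf=2 plus the distinct-zone constraint: usable capacity is the smaller of raw/2 and raw minus the largest zone (a sanity-check formula, not something Garage reports directly). Quick check with the zone sizes from the table above:

```shell
# Usable capacity at rf=2: each object lives in two distinct zones, so
# usable = min(raw / 2, raw - largest_zone).
zones="1.5 14.8 14.8"   # TB per zone, from the node table
echo "$zones" | tr ' ' '\n' | awk '
  { raw += $1; if ($1 > max) max = $1 }
  END {
    usable = raw / 2
    if (raw - max < usable) usable = raw - max
    printf "raw: %.1f TB, usable at rf=2: %.2f TB\n", raw, usable
  }'
# → raw: 31.1 TB, usable at rf=2: 15.55 TB
```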

External consumers reach Shards over:

https://<bucket>.s3.objstore.io

Requests are proxied through nginx on argus to the Garage nodes with least-connections load balancing. TLS terminates at the Cloudflare edge; the origin is nginx on argus.
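The proxy layer can be sketched roughly as below. The upstream name and directives are illustrative, not the real config on argus; only the node IPs, port, and least-connections policy come from this page:

```nginx
# Sketch of the argus nginx proxy in front of the Garage S3 API.
upstream garage_s3 {
    least_conn;                       # least-connections balancing
    server 100.64.0.4:3901;           # fury
    server 100.64.0.32:3901;          # atlas
    server 100.64.0.30:3901;          # hades
}

server {
    listen 443 ssl;
    server_name s3.objstore.io *.s3.objstore.io;  # path-style and vhost-style

    client_max_body_size 0;           # don't cap object upload size

    location / {
        proxy_pass http://garage_s3;
        proxy_set_header Host $host;  # Garage resolves vhost-style buckets from Host
    }
}
```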

Internal (service-to-service) access uses the mesh IPs directly on port 3901.

S3 credentials are per-consumer and stored in Infisical under prod/objstore. Obtain a key by contacting the ops team — we do not currently expose a self-service key provisioning endpoint.

Example access:

export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
aws s3 ls --endpoint-url https://s3.objstore.io
aws s3 cp file.bin s3://my-bucket/ --endpoint-url https://s3.objstore.io

Garage is S3-compatible but supports only a subset of the full S3 API. Notably absent:

  • Multi-part upload
  • S3 Select / Object Lambda
  • Cross-region replication (Garage does its own at the cluster layer)

Notably present:

  • List / Put / Get / Delete / Head / Copy
  • Presigned URLs
  • Bucket policies + CORS
  • Server-side encryption (disabled by default; we encrypt client-side in Vaultyx anyway)
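Since presigned URLs are supported, the standard CLI works for generating them; a sketch with a hypothetical bucket and key (signing happens locally, so no request is sent):

```shell
# Generate a presigned GET URL, valid for one hour, against the Shards endpoint.
# Bucket and key are illustrative.
aws s3 presign s3://my-bucket/file.bin \
  --endpoint-url https://s3.objstore.io \
  --expires-in 3600
```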

For customers who want their data on hardware they operate (Founder tier, enterprise pilots), we run the “standalone cluster” pattern: a Garage deployment with its own rpc_secret, its own admin token, its own zones, and no connection to the shared Shards cluster. See project_tartarus_standalone_cluster.md for the operational notes — this is the pattern we want to productize as “Sovereign node” eventually.
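A standalone deployment's config differs from Shards mainly in its secrets and topology. Roughly, as a garage.toml sketch — all values are placeholders, port assignments mirror the Shards convention (S3 on 3901, admin on 3903), the RPC port is illustrative, and `replication_factor` assumes a recent Garage release:

```toml
# Sketch of a standalone-cluster garage.toml. Every value here is a
# placeholder; real secrets live with the deployment, never in this doc.
metadata_dir = "/var/lib/garage/meta"
data_dir     = "/var/lib/garage/data"

replication_factor = 2                  # same rf as Shards, its own zones
rpc_bind_addr = "[::]:3900"
rpc_secret    = "<per-cluster secret, never shared with Shards>"

[s3_api]
api_bind_addr = "[::]:3901"
s3_region     = "garage"

[admin]
api_bind_addr = "[::]:3903"
admin_token   = "<per-cluster admin token>"
```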

Failure modes:

  • One zone offline — writes continue at rf=2 as long as the two remaining zones stay healthy. Reads continue from any surviving replica.
  • Two zones offline — writes stall (not enough distinct zones for two replicas). Reads succeed only for objects that have a replica in the surviving zone.
  • Primary DB (argus) offline — Vaultyx metadata lookups fail; object GETs that don’t need metadata keep working.
  • Admin API (port 3903) compromised — attacker could rewrite cluster layout. Mitigation: admin token held only on artemis; port 3903 not exposed beyond mesh.

Prometheus scrapes each Garage node’s metrics endpoint. Grafana dashboard at https://grafana.katafract.io/d/33ee85f3.../katafract-fleet includes cluster health, per-zone used-space, and replication lag.
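The scrape job looks roughly like this. The job name is hypothetical; the targets assume the metrics endpoint is Garage's admin port (3903, mesh-only, per the failure-mode notes above), and any metrics-token auth is omitted:

```yaml
# Sketch of the Prometheus scrape job for the Garage nodes.
scrape_configs:
  - job_name: garage-shards
    metrics_path: /metrics
    static_configs:
      - targets:
          - 100.64.0.4:3903    # fury
          - 100.64.0.32:3903   # atlas
          - 100.64.0.30:3903   # hades
```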