Setting up NucliaDB in the Cloud
This document focuses on setting up NucliaDB to work on the following cloud providers:
- AWS
- GCP
However, for all cloud providers you need, at a minimum, a PostgreSQL database and a VM with an attached persistent disk in order to run NucliaDB.
AWS
Requirements:
- RDS (or Aurora)
  - PostgreSQL version 12+
- S3 Bucket Creation Access
  - NucliaDB needs access to create new S3 buckets
  - Access key id and secret (see the sanity check below)
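A quick way to confirm the credentials can create buckets is a create/delete round trip with the AWS CLI. This is only a sketch: the bucket name and region below are placeholders, not values NucliaDB uses.

```sh
# Illustrative sanity check: create and remove a throwaway bucket.
export AWS_ACCESS_KEY_ID=<AWS_CRED_CLIENT_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_CRED_CLIENT_SECRET>
aws s3api create-bucket --bucket nucliadb-access-check --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
aws s3api delete-bucket --bucket nucliadb-access-check --region eu-west-1
```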
NucliaDB Environment Variable Configuration:
- DATA_PATH=/mnt/data: Path to the mounted persistent disk where indexes are stored
- DRIVER=pg: Configure NucliaDB to use PostgreSQL for metadata storage
- DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string
- FILE_BACKEND=s3: Configure NucliaDB to use S3 blob storage
- S3_CLIENT_ID=<AWS_CRED_CLIENT_ID>: S3 client id
- S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET>: S3 client secret
- S3_REGION_NAME=<AWS_REGION>: S3 region
- NUA_API_KEY=<API_KEY_VALUE>: Agentic RAG Understanding API key. This is the authentication key for using Agentic RAG's processing engine. Check out this page to learn how to obtain one.
- CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
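As a minimal sketch of how these variables fit together on a single VM (the nuclia/nucliadb image name and the 8080 port mapping are assumptions, not specified on this page):

```sh
# Sketch only: run NucliaDB on one VM with the S3/PostgreSQL settings above.
# Substitute the placeholders with your own values.
docker run -d --name nucliadb \
  -p 8080:8080 \
  -v /mnt/data:/mnt/data \
  -e DATA_PATH=/mnt/data \
  -e DRIVER=pg \
  -e DRIVER_PG_URL="postgresql://postgres:password@HOSTNAME:5432/postgres" \
  -e FILE_BACKEND=s3 \
  -e S3_CLIENT_ID=<AWS_CRED_CLIENT_ID> \
  -e S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET> \
  -e S3_REGION_NAME=<AWS_REGION> \
  -e NUA_API_KEY=<API_KEY_VALUE> \
  -e CORS_ORIGINS='["http://localhost:8080"]' \
  nuclia/nucliadb
```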
GCP
Requirements:
- SQL (PostgreSQL)
  - PostgreSQL version 12+
  - We recommend either using the managed PostgreSQL from GCP or installing it via Helm with this helm chart.
- GCS (Google Cloud Storage) Bucket Creation Access
  - NucliaDB needs access to create new GCS buckets: each KnowledgeBox will have its own bucket where the binaries of the pushed data will be stored.
  - Service credential file: needed to grant NucliaDB access to the GCS service. It must be base-64 encoded and configured in GCS_BASE64_CREDS (see the encoding sketch after this list).
- Storage class
  - If you install NucliaDB in Kubernetes with the helm chart, you will need to create a storage class on your GCP project and reference it in the values.yaml file of the chart. Check out the GCP documentation on how to create a storage class (a sketch follows this list).
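For the service credentials, a key file can be fetched and encoded along these lines. This is illustrative only: the service account name, project id, and key path are placeholders.

```sh
# Download a service account key and base64-encode it for GCS_BASE64_CREDS.
gcloud iam service-accounts keys create /tmp/nucliadb-sa.json \
  --iam-account=nucliadb@<PROJECT_ID>.iam.gserviceaccount.com
# -w 0 disables line wrapping (GNU coreutils); macOS base64 does not wrap by default.
export GCS_BASE64_CREDS=$(base64 -w 0 /tmp/nucliadb-sa.json)
```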
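As a sketch of the storage class itself (the class name and disk type are assumptions; this uses GKE's Compute Engine CSI driver):

```sh
# Hypothetical StorageClass for GKE; adjust the disk type to your needs.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nucliadb-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
EOF
```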
NucliaDB Environment Variable Configuration:
- DATA_PATH=/mnt/data: Path to the mounted persistent disk where indexes are stored
- DRIVER=pg: Configure NucliaDB to use PostgreSQL for metadata storage
- DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string. Make sure your password does not contain URL-invalid characters. To check that the string is valid, you can run python -c 'from urllib.parse import urlparse; urlparse("<YOUR-DSN-HERE>")'
- FILE_BACKEND=gcs: Configure NucliaDB to use GCS blob storage
- GCS_PROJECT=<PROJECT ID>: Project ID for your Google Cloud account
- GCS_LOCATION=<GOOGLE CLOUD REGION>: Google Cloud region
- GCS_BUCKET=nucliadb_{kbid}: GCS bucket naming format
- GCS_BASE64_CREDS=<B64_ENCODED_CREDS>: Base-64 encoded GCS credentials
- NUA_API_KEY=<API_KEY_VALUE>: Nuclia Understanding API key. This is the authentication key for using Agentic RAG's processing engine. Check out this page to learn how to obtain one.
- CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
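As a sketch of running the service directly on the VM (this assumes the nucliadb PyPI package and its nucliadb entrypoint; all values are placeholders):

```sh
# Sketch: install NucliaDB and start it with the GCS/PostgreSQL settings above.
pip install nucliadb
export DATA_PATH=/mnt/data
export DRIVER=pg
export DRIVER_PG_URL="postgresql://postgres:password@HOSTNAME:5432/postgres"
export FILE_BACKEND=gcs
export GCS_PROJECT=<PROJECT_ID>
export GCS_LOCATION=<GOOGLE_CLOUD_REGION>
export GCS_BUCKET="nucliadb_{kbid}"
export GCS_BASE64_CREDS=<B64_ENCODED_CREDS>
export NUA_API_KEY=<API_KEY_VALUE>
export CORS_ORIGINS='["http://localhost:8080"]'
nucliadb
```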
Check out the tutorial on how to install NucliaDB on GCP for a more in-depth walkthrough of the process.
Cluster Support
If you are manually setting up multiple nodes, you will need to configure them so they can communicate with each other:
- CLUSTER_DISCOVERY_MODE=manual: Manually specify the addresses of the nodes in the cluster
- CLUSTER_DISCOVERY_MANUAL_ADDRESSES: JSON-compatible list of node addresses in the cluster (example below)
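For example (the IP addresses and port below are placeholders for your nodes' internal addresses):

```sh
# Illustrative two-node manual discovery configuration.
export CLUSTER_DISCOVERY_MODE=manual
export CLUSTER_DISCOVERY_MANUAL_ADDRESSES='["10.0.0.10:10009", "10.0.0.11:10009"]'
```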
If you are installing NucliaDB via Kubernetes, make sure that the values.yaml file has the following values set:
```yaml
replicas: 2
env:
  cluster_discovery_mode: kubernetes
  cluster_discovery_kubernetes_namespace: nucliadb
  cluster_discovery_kubernetes_selector: "app.kubernetes.io/name=node"
```
so that each node is able to automatically join the cluster.
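To verify, something like the following should show both replicas running (the namespace and label selector come from the values above):

```sh
# Check that both node pods are up, then inspect their logs for cluster membership.
kubectl get pods -n nucliadb -l app.kubernetes.io/name=node
kubectl logs -n nucliadb -l app.kubernetes.io/name=node --tail=50
```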