Setting up NucliaDB in the Cloud
This document focuses on setting up NucliaDB to work on the following cloud providers:
- AWS
- GCP
However, for all cloud providers you need, at a minimum, a PostgreSQL database and a VM with an attached persistent disk in order to run NucliaDB.
AWS
Requirements:
- RDS (or Aurora)
  - PostgreSQL version 12+
- S3 Bucket Creation Access
  - NucliaDB needs access to create new S3 buckets
  - Access key id and secret (see the sanity check below)
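A quick way to confirm the credentials can create buckets is a create/delete round trip with the AWS CLI. This is only a sketch: the bucket name and region below are placeholders, not values NucliaDB uses.

```sh
# Illustrative sanity check: create and remove a throwaway bucket.
export AWS_ACCESS_KEY_ID=<AWS_CRED_CLIENT_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_CRED_CLIENT_SECRET>
aws s3api create-bucket --bucket nucliadb-access-check --region eu-west-1 \
  --create-bucket-configuration LocationConstraint=eu-west-1
aws s3api delete-bucket --bucket nucliadb-access-check --region eu-west-1
```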
NucliaDB Environment Variable Configuration:
- DATA_PATH=/mnt/data: Path to the mounted persistent disk where indexes are stored
- DRIVER=pg: Configure NucliaDB to use PostgreSQL for metadata storage
- DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string
- FILE_BACKEND=s3: Configure NucliaDB to use S3 blob storage
- S3_CLIENT_ID=<AWS_CRED_CLIENT_ID>: S3 client id
- S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET>: S3 client secret
- S3_REGION_NAME=<AWS_REGION>: S3 region
- NUA_API_KEY=<API_KEY_VALUE>: Agentic RAG Understanding API key. This is the authentication key for using Agentic RAG's processing engine. Check out this page to learn how to obtain one.
- CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
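As a minimal sketch of how these variables fit together on a single VM (the nuclia/nucliadb image name and the 8080 port mapping are assumptions, not specified on this page):

```sh
# Sketch only: run NucliaDB on one VM with the S3/PostgreSQL settings above.
# Substitute the placeholders with your own values.
docker run -d --name nucliadb \
  -p 8080:8080 \
  -v /mnt/data:/mnt/data \
  -e DATA_PATH=/mnt/data \
  -e DRIVER=pg \
  -e DRIVER_PG_URL="postgresql://postgres:password@HOSTNAME:5432/postgres" \
  -e FILE_BACKEND=s3 \
  -e S3_CLIENT_ID=<AWS_CRED_CLIENT_ID> \
  -e S3_CLIENT_SECRET=<AWS_CRED_CLIENT_SECRET> \
  -e S3_REGION_NAME=<AWS_REGION> \
  -e NUA_API_KEY=<API_KEY_VALUE> \
  -e CORS_ORIGINS='["http://localhost:8080"]' \
  nuclia/nucliadb
```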
GCP
Requirements:
- SQL (PostgreSQL)
  - PostgreSQL version 12+
  - We recommend either using the managed PostgreSQL from GCP or installing it via Helm with this helm chart.
- GCS (Google Cloud Storage) Bucket Creation Access
  - NucliaDB needs access to create new GCS buckets: each KnowledgeBox will have its own bucket where the binaries of the pushed data will be stored.
  - Service credential file: needed to grant NucliaDB access to the GCS service. It must be base-64 encoded and configured in GCS_BASE64_CREDS (see the encoding sketch after this list).
- Storage class
  - If you install NucliaDB in Kubernetes with the helm chart, you will need to create a storage class on your GCP project and reference it in the values.yaml file of the chart. Check out the GCP documentation on how to create a storage class (a sketch follows this list).
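For the service credentials, a key file can be fetched and encoded along these lines. This is illustrative only: the service account name, project id, and key path are placeholders.

```sh
# Download a service account key and base64-encode it for GCS_BASE64_CREDS.
gcloud iam service-accounts keys create /tmp/nucliadb-sa.json \
  --iam-account=nucliadb@<PROJECT_ID>.iam.gserviceaccount.com
# -w 0 disables line wrapping (GNU coreutils); macOS base64 does not wrap by default.
export GCS_BASE64_CREDS=$(base64 -w 0 /tmp/nucliadb-sa.json)
```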
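As a sketch of the storage class itself (the class name and disk type are assumptions; this uses GKE's Compute Engine CSI driver):

```sh
# Hypothetical StorageClass for GKE; adjust the disk type to your needs.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nucliadb-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
volumeBindingMode: WaitForFirstConsumer
EOF
```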
NucliaDB Environment Variable Configuration:
- DATA_PATH=/mnt/data: Path to the mounted persistent disk where indexes are stored
- DRIVER=pg: Configure NucliaDB to use PostgreSQL for metadata storage
- DRIVER_PG_URL=postgresql://postgres:password@HOSTNAME:5432/postgres: PostgreSQL connection string. Make sure your password does not contain URL-invalid characters. To check that the string is valid, you can run python -c 'from urllib.parse import urlparse; urlparse("<YOUR-DSN-HERE>")'
- FILE_BACKEND=gcs: Configure NucliaDB to use GCS blob storage
- GCS_PROJECT=<PROJECT ID>: Project ID for your Google Cloud account
- GCS_LOCATION=<GOOGLE CLOUD REGION>: Google Cloud region
- GCS_BUCKET=nucliadb_{kbid}: GCS bucket naming format
- GCS_BASE64_CREDS=<B64_ENCODED_CREDS>: Base-64 encoded GCS credentials
- NUA_API_KEY=<API_KEY_VALUE>: Nuclia Understanding API key. This is the authentication key for using Agentic RAG's processing engine. Check out this page to learn how to obtain one.
- CORS_ORIGINS=["http://localhost:8080"]: CORS configuration for your service
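As a sketch of running the service directly on the VM (this assumes the nucliadb PyPI package and its nucliadb entrypoint; all values are placeholders):

```sh
# Sketch: install NucliaDB and start it with the GCS/PostgreSQL settings above.
pip install nucliadb
export DATA_PATH=/mnt/data
export DRIVER=pg
export DRIVER_PG_URL="postgresql://postgres:password@HOSTNAME:5432/postgres"
export FILE_BACKEND=gcs
export GCS_PROJECT=<PROJECT_ID>
export GCS_LOCATION=<GOOGLE_CLOUD_REGION>
export GCS_BUCKET="nucliadb_{kbid}"
export GCS_BASE64_CREDS=<B64_ENCODED_CREDS>
export NUA_API_KEY=<API_KEY_VALUE>
export CORS_ORIGINS='["http://localhost:8080"]'
nucliadb
```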
Check out the tutorial on how to install NucliaDB on GCP for a more in-depth walkthrough of the process.
Cluster Support
If you are manually setting up multiple nodes, you will need to configure them so they can communicate with each other:
- CLUSTER_DISCOVERY_MODE=manual: Manually specify the addresses of the nodes in the cluster
- CLUSTER_DISCOVERY_MANUAL_ADDRESSES: JSON-compatible list of node addresses in the cluster (example below)
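For example (the IP addresses and port below are placeholders for your nodes' internal addresses):

```sh
# Illustrative two-node manual discovery configuration.
export CLUSTER_DISCOVERY_MODE=manual
export CLUSTER_DISCOVERY_MANUAL_ADDRESSES='["10.0.0.10:10009", "10.0.0.11:10009"]'
```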
If you are installing NucliaDB via Kubernetes, make sure that the values.yaml file has the following values set:
```yaml
replicas: 2
env:
  cluster_discovery_mode: kubernetes
  cluster_discovery_kubernetes_namespace: nucliadb
  cluster_discovery_kubernetes_selector: "app.kubernetes.io/name=node"
```
so that each node is able to automatically join the cluster.
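To verify, something like the following should show both replicas running (the namespace and label selector come from the values above):

```sh
# Check that both node pods are up, then inspect their logs for cluster membership.
kubectl get pods -n nucliadb -l app.kubernetes.io/name=node
kubectl logs -n nucliadb -l app.kubernetes.io/name=node --tail=50
```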