Custom resources

SaunaFS Operator introduces the following custom resources: SaunafsCluster, SaunafsMetadataVolume, SaunafsChunkVolume, and SaunafsExport.

All of these resources are namespaced. A single SaunaFS cluster can consist only of a SaunafsCluster, SaunafsMetadataVolumes, SaunafsChunkVolumes, and persistent volume claims from the same namespace.

DANGER

SaunaFS Operator does not verify that persistent volume claims are empty before use, nor does it empty them when their Saunafs...Volumes are removed. Be careful when adding new volumes, especially for metadata: you might lose data if the volume you introduce already contains metadata from another cluster.

SaunafsCluster

This resource represents a single SaunaFS cluster; you can deploy more than one SaunaFS cluster in a single Kubernetes cluster.

Example manifest:

apiVersion: saunafs.sarkan.io/v1beta1
kind: SaunafsCluster
metadata:
  name: example-cluster
  namespace: saunafs-operator
spec:
  # Decides whether the cluster should be exposed externally (i.e. outside the Kubernetes cluster).
  #
  # If true, then:
  # - metadata servers will be exposed using a LoadBalancer service,
  # - chunkservers will be exposed using host ports of Kubernetes nodes.
  exposeExternally: true

  # List of replication goals possible for this cluster. Up to 40 goals can be configured.
  replicationGoal:
  - name: "1"
    replication: "_" # One copy for each file or directory chunk across chunk servers.
  - name: "2"
    replication: "_ _" # Two copies anywhere
  - name: "3"
    replication: "_ _ _"
  - name: "4"
    replication: "_ _ _ _"
  - name: "5"
    replication: "_ _ _ _ _"
  - name: ec21
    replication: $ec(2,1) # Erasure coding with 2 data and 1 parity on all servers.
  - name: ec31
    replication: $ec(3,1)
  - name: ec32
    replication: $ec(3,2)

  # Optional PVC selectors for metadata and chunk storage. When present, the operator will
  # watch for PVCs with these labels and create the corresponding objects automatically.
  pvcSelectors:
    # Create a SaunafsMetadataVolume object automatically for each PVC with this label.
    # Required if the PV selector for metadata storage is specified.
    metadataStorage: example-cluster=metadata

    # Create a SaunafsChunkVolume object automatically for each PVC with this label.
    # Required if the PV selector for chunk storage is specified.
    chunkStorage: example-cluster=chunks
  
  # Optional PV selectors for metadata and chunk storage. When present, the operator will
  # watch for PVs with these labels and create the corresponding PVCs automatically.
  pvSelectors:
    # The created PVCs will automatically be assigned the PVC metadata storage label.
    metadataStorage: example-cluster=metadata

    # The created PVCs will automatically be assigned the PVC chunk storage label.
    chunkStorage: example-cluster=chunks

  # Optional container image overrides.
  images:
    metadataServer: ""
    chunkServer: ""
    elector: ""

  # Optional desired minimum count of chunkservers in a single cluster.
  desiredMinimumChunkserverCount: 5

  # Optional resource requests and limits for SaunaFS components.
  # If not specified, containers will be created without CPU/memory constraints.
  #
  # The resource settings below are designed for workloads of roughly 1 million files
  # and provide enough CPU and memory headroom to handle moderate traffic.
  # For large-scale deployments, consider increasing the master memory limits accordingly.
  #
  # These settings should be adjusted based on the target environment,
  # file count, and expected workload characteristics.
  #
  # Estimated memory consumption for metadata is approximately 500 bytes per stored file.
  # This translates to:
  # - 1 million files ~= 500 MiB
  # - 100 million files ~= 50 GiB
  # - 1 billion files ~= 500 GiB
  resources:
    metadataServer:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "1"
        memory: "2Gi"
    chunkserver:
      requests:
        cpu: "250m"
        memory: "512Mi"
      limits:
        cpu: "2"
        memory: "4Gi"

  # SaunaFS metadata server configurable options.
  metadataConfiguration:
    # Limit glibc malloc arenas to a specific value to reduce virtual memory usage (Linux only). (default is 0)
    limitGlibcMallocArenas: 4

    # Prioritize the local chunkserver for local chunk data when available. (default is `true`)
    preferLocalChunkserver: true

    # Keep a specified number of previous metadata files. (default is 1)
    backMetaKeepPrevious: 1

    # Enable automatic recovery of metadata after crashes. (default is `false`)
    autoRecovery: true

    # Set the initial delay in seconds for starting chunk operations. (default is 300)
    operationsDelayInit: 300

    # Set the delay in seconds after chunkserver disconnection for chunk operations. (default is 3600)
    operationsDelayDisconnect: 300

    # Limit the chunks loop to check no more chunks per second than specified. (default is 100000)
    chunksLoopMaxCps: 100000

    # Ensure the chunks loop checks all chunks within a specified time. (default is 300)
    chunksLoopMinTime: 300

    # Set a hard limit on CPU usage for the chunks loop. (percentage value, default is 60%)
    chunksLoopMaxCPU: 60

    # Define a soft maximum number of chunks to delete on one chunkserver. (default is 10)
    chunksSoftDelLimit: 10

    # Define a hard maximum number of chunks to delete on one chunkserver. (default is 25)
    chunksHardDelLimit: 25

    # Set the maximum number of chunks to replicate to one chunkserver. (default is 2)
    chunksWriteRepLimit: 2

    # Set the maximum number of chunks to replicate from one chunkserver. (default is 10)
    chunksReadRepLimit: 10

    # Set the percentage of endangered chunks to replicate with high priority. (percentage value, default is 0%)
    endangeredChunksPriority: 0

    # Define the maximum capacity of the endangered chunks queue. (default is "1Mi")
    endangeredChunksMaxCapacity: "1Mi"

    # Set the allowable disk usage difference before triggering rebalancing. (percentage value, default is 10%)
    acceptableDifference: 10

    # Allow chunk movement between servers with different labels for balancing. (default is `false`)
    chunksRebalancingBetweenLabels: false

    # Reject clients older than version 1.6.0. (default is `false`)
    rejectOldClients: true

    # Specify the period for bandwidth allocation renegotiation. (in milliseconds, default is 100ms)
    globalIoLimitsRenegotiationPeriodMs: 100

    # Allow data flow after inactivity without waiting, up to the specified milliseconds. (default is 10ms)
    globalIoLimitsAccumulateMs: 250

    # Set the frequency for sending metadata checksums to backups. (default is every 50 metadata updates)
    metadataChecksumInterval: 50

    # Define the speed for recalculating metadata checksums in the background. (default is 100 objects per function call)
    metadataChecksumRecalculationSpeed: 100

    # Disable checksum verification while applying the changelog. (default is `false`)
    disableMetadataChecksumVerification: false

    # Prevent inode access time updates on each access. (default is `true`)
    noAtime: true

    # Set the minimum time between metadata dump requests from shadow masters. (in seconds, default is 1800)
    metadataSaveRequestMinPeriod: 1800

    # Retain client session data on the master server for the specified time. Values between 60 and 604800 (one week) are accepted. (in seconds, default is 86400s, i.e., 24 hours)
    sessionSustainTime: 86400

    # Avoid selecting chunkservers with the same IP. (default is `false`)
    avoidSameIpChunkservers: true

    # Specify the redundancy level for the minimum chunk part loss before endangerment. (default is 0)
    redundancyLevel: 0

    # Define the number of snapshotted nodes to clone before batch execution. (default is 1000)
    snapshotInitialBatchSize: 1000

    # Set the maximum batch size for snapshot requests. (default is 10000)
    snapshotInitialBatchSizeLimit: 10000

    # Ensure the test files loop checks all files within a specified time. (in seconds, default is 3600s, i.e., 1 hour)
    fileTestLoopMinTime: 3600

    # Set the delay before attempting reconnection to the metadata server. (in seconds, default is 1s)
    masterReconnectionDelay: 1

    # Define the timeout for metadata server connections. (in seconds, default is 60s)
    masterTimeout: 10

    # Add a disk usage load penalty to reduce frequent heavy chunkserver selections. Values between 0% and 50% are accepted. (percentage value, default is 0%)
    loadFactorPenalty: 0

    # Prioritize data parts to chunkservers with more space, clustering parities on imbalance. (default is `true`)
    prioritizeDataParts: true

    # Set the maximum polling wait time in milliseconds for events, balancing latency and CPU usage. (in milliseconds, default is 50)
    pollTimeoutMs: 50

    # Whether to perform mlockall() to avoid swapping out the sfsmaster process. (default is `false`)
    lockMemory: false

    # Interval for periodic cleaning of reserved files, in milliseconds. (default is 0, i.e. reserved file deletion is disabled)
    emptyReservedFilesPeriodMs: 0

    # Set the log level. Valid levels: "trace", "debug", "info", "warning", "error", "critical", "off". (default is "trace")
    logLevel: "trace"

  # SaunaFS chunkserver configurable options.
  chunksConfiguration:
    # Call fsync() after a chunk is modified. (default is `true`)
    performFsync: true

    # Set the number of threads that handle connections with clients. (default is 4)
    nrOfNetworkWorkers: 4

    # Set the number of threads that the connection to the master may use to process operations on chunks. (default is 10, minimum is 2)
    masterNrOfWorkers: 10

    # Determine whether to remove each chunk from the page cache when closing it. (default is `true`)
    hddAdviseNoCache: true

    # Set the log level. Valid levels: 'trace', 'debug', 'info', 'warning', 'error', 'critical', 'off'. (default is "trace")
    logLevel: "trace"

    # Limit glibc malloc arenas to a specific value to reduce virtual memory usage (Linux only). (default is 0)
    limitGlibcMallocArenas: 4

    # Whether to perform mlockall() to avoid swapping out sfschunkserver process. (default is `false`)
    lockMemory: false

    # Set the free space threshold to mark a volume as 100% utilized. (default is "4GiB")
    hddLeaveSpaceDefault: "4GiB"

    # Enable CRC checking when reading data from disk. (default is `true`)
    hddCheckCrcWhenReading: true

    # Enable CRC checking when writing data to disk. (default is `true`)
    hddCheckCrcWhenWriting: true

    # Enable chunkserver to detect zero values in chunk data and free corresponding file blocks. (default is `false`)
    hddPunchHoles: false

    # Enable chunkserver to send periodic reports of its I/O load to the master. (default is `false`)
    enableLoadFactor: false

    # Set the number of threads that each network worker may use for disk operations. (default is 4)
    nrOfHddWorkersPerNetworkWorker: 4

    # Set the maximum number of jobs that each network worker may use for disk operations. (default is 1000)
    bgJobsCntPerNetworkWorker: 1000

    # Verify that chunk metadata and data parts exist during a disk scan. (default is `true`)
    statChunksAtDiskScan: true

    # Maximum amount of time in milliseconds that the polling operation will wait for events. Smaller values could reduce latency at the cost of CPU usage. (default is 50)
    pollTimeoutMs: 50
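
With the pvcSelectors shown above, the operator can create the SaunafsMetadataVolume and SaunafsChunkVolume objects for you: any PVC in the cluster's namespace that carries the matching label is picked up automatically. Below is a minimal sketch of such a claim for metadata storage; the claim name, storage class, and size are placeholders and should be adapted to your environment. A claim labeled example-cluster: chunks would be handled the same way for chunk storage.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metadata-pvc-1            # placeholder name
  namespace: saunafs-operator     # must be the same namespace as the SaunafsCluster
  labels:
    # Matches spec.pvcSelectors.metadataStorage ("example-cluster=metadata"),
    # so the operator creates a SaunafsMetadataVolume for this claim.
    example-cluster: metadata
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard      # placeholder; use a storage class available in your cluster
  resources:
    requests:
      storage: 10Gi               # placeholder size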

SaunafsMetadataVolume

This resource represents a single PVC for metadata storage. An instance of this resource belongs to a SaunafsCluster instance.

Example manifest:

apiVersion: saunafs.sarkan.io/v1beta1
kind: SaunafsMetadataVolume
metadata:
  name: example-metadata-volume
  namespace: saunafs-operator
spec:
  # Name of the SaunaFS cluster this metadata volume belongs to.
  clusterName: example-cluster

  # Name of the persistent volume claim to use.
  persistentVolumeClaimName: pvc-1

TIP

A SaunaFS Metadata Volume must be in the same namespace as the SaunaFS Cluster it belongs to.
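
For reference, a claim backing the manifest above could look like the sketch below. The name has to match spec.persistentVolumeClaimName and the namespace has to match the cluster's namespace; the storage class and size are placeholders.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-1                     # referenced by spec.persistentVolumeClaimName above
  namespace: saunafs-operator
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: standard      # placeholder; use a storage class available in your cluster
  resources:
    requests:
      storage: 10Gi               # placeholder size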

SaunafsChunkVolume

This resource represents a single PVC for chunk storage. An instance of this resource belongs to a SaunafsCluster instance.

Example manifest:

apiVersion: saunafs.sarkan.io/v1beta1
kind: SaunafsChunkVolume
metadata:
  name: example-chunks-volume
  namespace: saunafs-operator
spec:
  # Name of the SaunaFS cluster this chunk volume belongs to.
  clusterName: example-cluster

  # Name of the persistent volume claim to use.
  persistentVolumeClaimName: pvc-2

TIP

A SaunaFS Chunk Volume must be in the same namespace as the SaunaFS Cluster it belongs to.
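
Instead of creating SaunafsChunkVolume objects by hand, pre-provisioned persistent volumes can be picked up through the pvSelectors of the cluster spec: the operator creates a PVC for a labeled PV, assigns it the PVC chunk storage label, and the SaunafsChunkVolume is then created automatically via the pvcSelectors. Below is a minimal sketch of such a volume, assuming a locally attached disk; the volume name, capacity, storage class, path, and node name are placeholders.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: chunks-pv-1                 # placeholder name
  labels:
    # Matches spec.pvSelectors.chunkStorage ("example-cluster=chunks"),
    # so the operator claims this volume for chunk storage.
    example-cluster: chunks
spec:
  capacity:
    storage: 1Ti                    # placeholder size
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage   # placeholder
  local:
    path: /mnt/disks/chunk-disk-1   # placeholder path to a locally attached disk
  nodeAffinity:                     # required for local volumes
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - worker-1                # placeholder node name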

SaunafsExport

This resource represents a SaunaFS export. An instance of this resource belongs to a SaunafsCluster instance. It serves as access control for sfsmounts.

Example manifest:

apiVersion: saunafs.sarkan.io/v1beta1
kind: SaunafsExport
metadata:
  name: example-saunafs-export
  namespace: saunafs-operator
spec:
  # Name of the SaunaFS cluster this export belongs to.
  clusterName: example-cluster

  # Path to be exported relative to your SaunaFS root.
  path: "/my-password-protected-export"

  # Comma-separated list of export options; refer to the SaunaFS documentation for the list of possible options. (default is 'readonly')
  options: "rw"

  # Kubernetes Secret with the password protecting the export. The Secret must contain a field with the key 'saunafs-export-password' and must be in the same namespace as the SaunaFS Cluster.
  exportSecretName: saunafs-export
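
A matching Secret could look like the sketch below. The only requirements stated above are the saunafs-export-password key and that the Secret lives in the same namespace as the SaunaFS Cluster; the Secret name must match spec.exportSecretName, and the password value here is a placeholder.

apiVersion: v1
kind: Secret
metadata:
  name: saunafs-export              # referenced by spec.exportSecretName above
  namespace: saunafs-operator
type: Opaque
stringData:
  saunafs-export-password: "change-me"   # placeholder password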