Introduction

I was setting up a NextCloud instance on my Raspberry Pi 4 using k3s and found that, while there are quite a few step-by-step guides on how to do that, none of them fully addressed all the issues I ran into. So I decided to write yet another guide, mostly for myself, but maybe it will be useful for someone else. In particular, I faced the following issues:

  • MetalLB switching to CRDs instead of config maps.
  • Raspberry Pi not working well with MetalLB when using Wi-Fi.
  • Longhorn on a single node.
  • NextCloud HTTPS issue on Android client.
  • And also resolving some of the warnings that NextCloud itself reports.

Why Raspberry?

Because I have one and it is very energy efficient.

Why k3s?

Because it is very lightweight and easy to set up. And when you have multiple projects running on the same machine, managing them via Docker or Docker Compose becomes less convenient.

I mostly followed these guides, except where I ran into the specific problems listed above.

Outline

  1. Install dependencies
  2. Install k3s and set up access from a remote (client) machine
  3. Install and set up Helm on a client machine
  4. Install MetalLB
  5. Install Nginx ingress controller
  6. Install cert-manager
  7. Set up persistent storage with Longhorn
  8. Install NextCloud

1. Install dependencies

General

At minimum, you’ll need curl and ssh, which should both already be installed and set up on your Raspberry Pi. However, it is probably a good idea to update ca-certificates, install open-iscsi (Longhorn, which we set up later, needs it), install nfs-common in case you decide to use NFS for storage, and install wireguard if you are going to use WireGuard as the flannel backend. The last two should not matter on a single-node cluster, yet it does not hurt to install them:

sudo apt update && sudo apt install -y \
  curl \
  ca-certificates \
  open-iscsi \
  wireguard \
  nfs-common

Debian/Ubuntu specific

The OS on my Raspberry Pi is Ubuntu-based, so it makes sense to follow the recommended steps for Ubuntu/Debian distributions, which boil down to updating the default firewall rules (if ufw is enabled):

ufw allow 6443/tcp # apiserver
ufw allow from 10.42.0.0/16 to any # pods
ufw allow from 10.43.0.0/16 to any # services
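
Note that these commands need root privileges, and they only matter if ufw is actually enabled. Depending on your firewall defaults, you may also need to open the HTTP/HTTPS ports that the ingress controller will serve later (this part is my own addition, not from the k3s docs):

sudo ufw status verbose   # check whether ufw is enabled at all
sudo ufw allow 80/tcp     # HTTP for the ingress controller (if needed)
sudo ufw allow 443/tcp    # HTTPS for the ingress controller (if needed)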

Raspberry Pi specific

Append the cgroup options as suggested here. But unlike what is mentioned there, on my system the file is /boot/firmware/cmdline.txt, not /boot/cmdline.txt. Also, vim did not work, so I used nano.

Thus, add

cgroup_memory=1 cgroup_enable=memory

to /boot/firmware/cmdline.txt.

The resulting file can look something like this:

console=serial0,115200 dwc_otg.lpm_enable=0 console=tty1 root=LABEL=writable rootfstype=ext4 rootwait fixrtc quiet splash cgroup_memory=1 cgroup_enable=memory
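
If you prefer to do it from the shell, here is a small sketch that appends the flags only if they are not already present (assuming the file is /boot/firmware/cmdline.txt, as on my system):

# cmdline.txt is a single line; append the cgroup flags to it, idempotently
sudo sed -i '/cgroup_enable=memory/! s/$/ cgroup_memory=1 cgroup_enable=memory/' /boot/firmware/cmdline.txt
cat /boot/firmware/cmdline.txt   # verify the result
sudo reboot                      # the change takes effect after a reboot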

There is also a Pi-specific dependency for VXLAN (although it is probably better to use WireGuard as the backend):

sudo apt install linux-modules-extra-raspi

2. Install k3s and set up remote access

Install k3s

Install k3s as suggested here, but with some options:

curl -sfL https://get.k3s.io | sh -s - \
  server \
  --cluster-init \
  --disable servicelb \
  --disable traefik \
  --flannel-backend wireguard-native \
  --write-kubeconfig-mode 644 \
  --tls-san [public IP address or hostname]

The command above installs and sets up a k3s server, but

  • Disables ServiceLB because we will use MetalLB instead.
  • Disables Traefik because we will use the Nginx ingress controller instead.
  • Sets WireGuard as the flannel backend. This should not really matter as long as you are using a single-node cluster.
  • Adds the public IP address or hostname to the API server’s TLS certificate (--tls-san), so that remote access over that address works.
  • Sets /etc/rancher/k3s/k3s.yaml to 644 mode (-rw-r--r--), so it can be read by non-root users.

Check that k3s server is up and running:

systemctl status k3s

Check that nodes and pods are running, and verify that there is no Traefik or ServiceLB:

kubectl get nodes -A -o wide
kubectl get pods -A -o wide

Set up remote access

Copy the /etc/rancher/k3s/k3s.yaml file to your client machine:

scp raspberry.example.com:/etc/rancher/k3s/k3s.yaml ~/.kube/config

where raspberry.example.com is the public IP address or the hostname of your Raspberry Pi.

Then change the clusters[0].server (there should be just one cluster at this point) from https://127.0.0.1:6443 to https://raspberry.example.com:6443. If you are trying to access it outside of your network, make sure you have set up the port forwarding to port 6443 on the Raspberry Pi.
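
For example, with GNU sed (on macOS, use sed -i '' instead):

# point kubectl at the Pi instead of the loopback address used on the server itself
sed -i 's|https://127.0.0.1:6443|https://raspberry.example.com:6443|' ~/.kube/config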

Alternatively, you can copy k3s.yaml to any other location, but then you’ll need to set the KUBECONFIG environment variable to point to that file, or explicitly use the --kubeconfig parameter with kubectl.
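
For example (the file name here is arbitrary):

scp raspberry.example.com:/etc/rancher/k3s/k3s.yaml ~/k3s-raspberry.yaml
export KUBECONFIG=~/k3s-raspberry.yaml
kubectl get nodes
# or, without exporting anything:
kubectl --kubeconfig ~/k3s-raspberry.yaml get nodes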

Check that you can access the cluster:

kubectl get nodes -A -o wide

3. Install Helm

Install it via an official installation script:

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

or using a package manager, e.g.,

brew install helm

or

sudo pacman -S helm

or any other suitable package manager.
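
Whichever way you install it, verify that the client works:

helm version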

4. Install MetalLB

Here, I am mostly following Greg’s tutorial, but some things have changed since it was written. In particular, MetalLB is now configured through CRDs (IPAddressPool and L2Advertisement) instead of a config map.

Add and update MetalLB repo:

helm repo add metallb https://metallb.github.io/metallb
helm repo update

Install MetalLB:

helm install metallb metallb/metallb --namespace kube-system

Wait until all the pods are initialized and running:

kubectl get pods -A -o wide

Then create config.yaml, which specifies the address pool and the L2 advertisement. The address pool should be:

  • inside your subnet,
  • but outside of the DHCP server address pool, otherwise there might be conflicts.

For example, if your subnet is 192.168.97.0/24 and your DHCP server address pool is 192.168.97.200-192.168.97.250, you could use a pool like 192.168.97.10-192.168.97.50.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default
  namespace: kube-system
spec:
  addresses:
  - 192.168.97.10-192.168.97.50
status: {}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default
  namespace: kube-system
spec:
  ipAddressPools:
  - default
status: {}

Apply these configs:

kubectl apply -f config.yaml

Check that everything is running:

kubectl get pods -n kube-system -o wide
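
You can also confirm that the custom resources from config.yaml were created (these CRDs replace the old ConfigMap-based configuration):

kubectl get ipaddresspools.metallb.io -n kube-system
kubectl get l2advertisements.metallb.io -n kube-system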

Fix MetalLB on Raspberry Pi when using Wi-Fi

Most tutorials set up MetalLB in ARP (layer 2) mode. It seems that most of them also use a wired connection, because they never mention the issue with unstable connectivity over Wi-Fi.

In summary, the issue is this: when everything is set up, you can always access your applications from the Raspberry Pi itself (using the local IP), often from another machine in the same network (using the local IP), and only sometimes from outside the network using the public IP. To make matters worse, when you try to debug it with tcpdump, the issue disappears, giving the impression that there are some quantum processes involved which behave differently when observed. In reality, of course, everything is simpler: tcpdump and other packet sniffers put the network interface into promiscuous mode. And the solution is to keep the network interface in promiscuous mode all the time.

Here are some links discussing the issue and offering potential solutions:

  • Reddit thread discussing the issue.
  • MetalLB issue on GitHub: layer2 mode doesn’t receive broadcast packets on VM unless promiscuous mode is enabled.
  • MetalLB issue on GitHub: L2 mode multiple replicas is not working with the bridge until promiscuous mode enabled.
  • Raspberry Pi OS issue on GitHub: Wifi interface replies on arp requests only in promiscuous mode.
  • MetalLB issue on GitHub: Layer 2 mode on RPi.

My solution was to put the network interface in promiscuous mode, as suggested in some of the links above:

ip link set wlan0 promisc on

However, this does not persist across reboots, so I also added a crontab entry (to root’s crontab, since setting promiscuous mode requires root):

@reboot ip link set wlan0 promisc on
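
Alternatively, a small oneshot systemd unit does the same job and is a bit easier to manage than cron. This is just a sketch I have not used myself; the unit name is arbitrary:

sudo tee /etc/systemd/system/wlan0-promisc.service >/dev/null <<'EOF'
[Unit]
Description=Enable promiscuous mode on wlan0 for MetalLB L2 mode
After=network.target

[Service]
Type=oneshot
# adjust the path if ip lives elsewhere (check with: command -v ip)
ExecStart=/usr/sbin/ip link set wlan0 promisc on

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now wlan0-promisc.service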

5. Install Nginx ingress controller

Following these steps, add and update the repo:

helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

Create values.yaml overriding some default values:

defaultBackend:
  enabled: false
controller:
  allowSnippetAnnotations: true
  service:
    loadBalancerIP: 192.168.97.10

The config above:

  • Disables the default backend,
  • Allows snippet annotations, to be able to specify nginx.ingress.kubernetes.io/server-snippet annotations in ingress configs,
  • Sets the IP address of the controller for easier port-forwarding on the router (this is, of course, optional, but makes things easier).

Note that loadBalancerIP should come from the address pool defined in MetalLB. You can skip it and get a random allocation, but then the port forwarding on the router has to be adjusted to whatever IP was actually assigned. E.g., if MetalLB uses the pool 192.168.97.10-192.168.97.50, then we can use 192.168.97.10 as the loadBalancerIP.

Then install using a Helm chart with overridden values:

helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace kube-system \
  -f values.yaml

Check that everything was deployed:

kubectl get pods -n kube-system -o wide
kubectl get services  -n kube-system -o wide

Check that it’s working by running curl against the external IP of the ingress controller service, which you can find with:

kubectl get service ingress-nginx-controller -n kube-system -o wide

E.g., in this case this should be 192.168.97.10:

curl 192.168.97.10

It should return an Nginx 404 error page, since no ingress rules are defined yet.

It makes sense, at this stage, to set up port forwarding on your router for ports 80 and 443, pointing them to your ingress controller (i.e., not to the Raspberry Pi’s own IP, but to the load balancer IP we have just used).
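
Once the forwarding is in place, a quick sanity check from outside your network (the hostname below is a placeholder for your public address):

# e.g., from a phone hotspot or a remote machine
curl -i http://your-public-hostname.example.com/
# an Nginx 404 response means the router forwarding and the ingress controller line up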

6. Install cert-manager

Add the repo

helm repo add jetstack https://charts.jetstack.io
helm repo update

Install

helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.13.1 \
  --set installCRDs=true

Check that pods are running

kubectl get pods --namespace cert-manager

Create a cluster issuer following these steps. It makes sense to create two cluster issuers, staging and prod, because Let’s Encrypt has different rate limits for its staging and production services (staging is far more lenient, so use it while testing). config.yaml:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: [email protected] # change this
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging
    solvers:
    - http01:
        ingress:
          class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: [email protected] # change this
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Apply this config:

kubectl apply -f config.yaml

Check that it worked (ClusterIssuers are cluster-scoped, so no namespace flag is needed):

kubectl describe clusterissuer letsencrypt-staging
kubectl describe clusterissuer letsencrypt-prod

7. Set up persistent storage with Longhorn

Using Longhorn is definitely overkill for a single-node cluster: it adds some complexity, but it also removes some other pain points, so we’ll set it up. Alternatively, one can use persistent volumes and persistent volume claims directly, as sketched below.
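
For reference, k3s already ships with the local-path provisioner, so the simpler route is just a PVC against the built-in local-path storage class. A minimal sketch (the claim name and size are made up):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data        # hypothetical name
  namespace: default
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF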

Follow these steps to install it using Helm. First, add the repo

helm repo add longhorn https://charts.longhorn.io
helm repo update

Then create values.yaml with the following overrides:

defaultSettings:
  defaultReplicaCount: 1
  defaultDataPath: [path on the host machine]
  replicaSoftAntiAffinity: false
  replicaDiskSoftAntiAffinity: false
ingress:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-staging" # maybe, change later
  host: longhorn.nextcloud.example.com # change this
  ingressClassName: nginx
  tls: true
  tlsSecret: longhorn-tls
longhornUI:
  replicas: 1
persistence:
  defaultClassReplicaCount: 1
csi:
  attacherReplicaCount: 1
  provisionerReplicaCount: 1
  resizerReplicaCount: 1
  snapshotterReplicaCount: 1

The config above does the following things:

  • Sets all replicas to 1, because we are using a single node cluster with a single hard drive.
  • Turns off anti-affinity. Anti-affinity tries to place replicas on different nodes or drives, but since we have only one node, there is no point in this.
  • Specifies the path on the host machine (our Raspberry Pi).
  • Sets up ingress and TLS using the staging issuer; we’ll switch to the prod issuer once the whole setup is working.

Then install

helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --version 1.5.1 \
  -f values.yaml

and check that it’s working

kubectl get pods -n longhorn-system
kubectl get services -n longhorn-system -o wide

You can also check out the GUI by creating a port forward:

kubectl port-forward service/longhorn-frontend 8081:80 -n longhorn-system

and accessing it on localhost:8081.

One issue I ran into: sometimes Longhorn volumes cannot be mounted into pods, with an error message saying that a Longhorn device “is apparently in use by the system; will not make a filesystem here!”

This seems to be caused by multipathd, and the suggested solution is to blacklist /dev/sd* devices from it. Create /etc/multipath.conf if it does not exist already, and add the blacklist:

blacklist {
  devnode "^sd[a-z0-9]+"
}

Then restart the service using

systemctl restart multipathd.service

8. Install NextCloud

Mostly following k3s.rocks here.

First, add the repo

helm repo add nextcloud https://nextcloud.github.io/helm/
helm repo update

And edit values.yaml (run helm show values nextcloud/nextcloud > values.yaml to get the default values):

replicaCount: 1

ingress:
  enabled: true
  className: nginx
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: 4G
    kubernetes.io/tls-acme: "true"
    cert-manager.io/cluster-issuer: letsencrypt-staging # change later
    nginx.ingress.kubernetes.io/server-snippet: |-
      server_tokens off;
      proxy_hide_header X-Powered-By;
      rewrite ^/.well-known/webfinger /index.php/.well-known/webfinger last;
      rewrite ^/.well-known/nodeinfo /index.php/.well-known/nodeinfo last;
      rewrite ^/.well-known/host-meta /public.php?service=host-meta last;
      rewrite ^/.well-known/host-meta.json /public.php?service=host-meta-json last;
      location = /.well-known/carddav {
        return 301 $scheme://$host/remote.php/dav;
      }
      location = /.well-known/caldav {
        return 301 $scheme://$host/remote.php/dav;
      }
      location = /robots.txt {
        allow all;
        log_not_found off;
        access_log off;
      }
      location ~ ^/(?:build|tests|config|lib|3rdparty|templates|data)/ {
        deny all;
      }
      location ~ ^/(?:autotest|occ|issue|indie|db_|console) {
        deny all;
      }
  tls:
    - secretName: nextcloud-tls
      hosts:
        - nextcloud.example.com # change this
  labels: {}
  path: /
  pathType: Prefix

phpClientHttpsFix:
  enabled: true
  protocol: https

nextcloud:
  host: nextcloud.example.com # change this
  username: admin # change this
  password: changeme # change this
  configs:
    proxy.config.php: |-
      <?php
      $CONFIG = array (
        'trusted_proxies' => array(
          0 => '127.0.0.1',
          1 => '10.0.0.0/8',
        ),
        'forwarded_for_headers' => array('HTTP_X_FORWARDED_FOR'),
      );

persistence:
  enabled: true
  storageClass: "longhorn"
  accessMode: ReadWriteOnce
  size: 10Gi

  nextcloudData:
    enabled: true
    storageClass: "longhorn"
    accessMode: ReadWriteOnce
    size: 50Gi

The config above does the following things:

  • Sets the replica count to 1.
  • Sets up ingress with TLS using the staging issuer.
  • Fixes the HTTPS issue (which I noticed only when trying to set up the connection on an Android device).
  • Fixes the untrusted proxy warning.
  • Creates a default user with a password.
  • Sets up persistent storage claims using Longhorn: a disk for NextCloud, and a disk for data.

Then, create a namespace and install:

kubectl create namespace nextcloud
helm install nextcloud nextcloud/nextcloud \
  --namespace nextcloud \
  --values values.yaml

Check that pods are running:

kubectl get pods -n nextcloud
kubectl get services -n nextcloud -o wide

And check that certificates were issued (this might take a few minutes):

kubectl get certificaterequest -n nextcloud -o wide
kubectl get certificate -n nextcloud -o wide
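
If a certificate gets stuck in a pending state, these are useful for debugging (the secret name nextcloud-tls matches the values.yaml above):

kubectl describe certificate nextcloud-tls -n nextcloud
kubectl get challenges -n nextcloud              # pending ACME HTTP-01 challenges, if any
kubectl logs -n cert-manager deploy/cert-manager --tail 50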

If the certificates were issued successfully and you can access NextCloud at nextcloud.example.com (assuming you set up all the port forwarding on your NAT correctly), you can switch to the prod issuer by changing letsencrypt-staging to letsencrypt-prod in values.yaml and then running:

helm upgrade nextcloud nextcloud/nextcloud \
  --namespace nextcloud \
  --values values.yaml
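
If cert-manager does not reissue the certificate on its own after the switch, deleting the old staging secret forces a fresh order:

kubectl delete secret nextcloud-tls -n nextcloud
kubectl get certificate -n nextcloud -w   # wait until READY becomes True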

You can also do the same for the Longhorn UI, if you are planning to access it remotely.

Enjoy

It should be up and running.