Ceph S3 Backups Part 4: Kopia Integration and Tuning

Too Long; Didn't Read

In the last part, we explored how to configure the S3 interface and set up authentication in Ceph. With credentials in hand, it’s time to connect our legacy backup servers. We are going with Kopia for this!

This is where most administrators hit their first real-world challenge. The storage is ready, the S3 endpoint is up, but our backup tools may be older than the idea of object storage itself. Bacula, Amanda, Duplicity, or even rsync-based scripts were not designed with S3 in mind. Yet, replacing them overnight isn’t realistic. That’s why bridging these tools to Ceph S3 is so important.

If you missed the earlier parts, check out:

Ceph RGW Setup Guide

Getting servers Ready for Ceph backups

Setting up credentials for Ceph

Why Old Servers Make Perfect Ceph Candidates

Before diving into the deployment, let’s talk about why your aging hardware is actually ideal for this project.

Ceph is designed to handle hardware failures gracefully. Those older servers that might fail more frequently? Ceph expects that and plans for it. Plus, you’ll learn more about distributed systems when you occasionally need to handle node failures.

Think of it this way: Instead of buying expensive new hardware to learn Ceph, you’re getting hands-on experience with real-world scenarios where hardware isn’t perfect.

Step 1: Connect Kopia to Ceph S3

Now that Ceph S3 is ready with authentication, we’ll use Kopia as our backup client. Kopia is a modern, open source backup solution that supports snapshots, encryption, deduplication, and most importantly for us, direct S3 storage backends. Unlike many legacy tools, Kopia can talk to custom S3 endpoints without extra proxies.

Other candidates exist:

Bacula requires editing its configuration files to set the S3 endpoint.
Duplicity accepts an environment variable like AWS_ENDPOINT_URL.

For ease and clarity, though, this guide walks through the Kopia setup.

Kopia is a lightweight, fast backup tool built around deduplication, compression, and encryption. Ceph is a popular distributed object store that scales horizontally with your storage needs. Together, they handle snapshots, backup routines, and restore operations, whether you’re protecting office documents or container volumes.

Create the Repository on Ceph S3

On your backup server, install Kopia (binaries are available for Linux). We used Ubuntu, so our installation followed these steps:

# Add Kopia APT repository
curl -s https://kopia.io/signing-key | sudo gpg --dearmor -o /usr/share/keyrings/kopia-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/kopia-keyring.gpg] https://packages.kopia.io/apt/ stable main" | sudo tee /etc/apt/sources.list.d/kopia.list

# Update and install
sudo apt update
sudo apt install kopia

On RHEL-based systems the process is similar:

# Add Kopia YUM repository
sudo rpm --import https://kopia.io/signing-key
cat <<EOF | sudo tee /etc/yum.repos.d/kopia.repo
[kopia]
name=Kopia Repository
baseurl=http://packages.kopia.io/rpm/stable/\$basearch/
enabled=1
gpgcheck=1
gpgkey=https://kopia.io/signing-key
EOF

# Install
sudo yum install kopia

Once installed, initialize a repository directly in a Ceph S3 bucket. Kopia prompts for a repository password, which it uses to encrypt everything it stores:

kopia repository create s3 \
  --bucket=legacy-backups \
  --access-key=YOUR_ACCESS_KEY \
  --secret-access-key=YOUR_SECRET_KEY \
  --endpoint=rgw.yourdomain.com \
  --region=us-east-1

bucket: The bucket name you created in Ceph (for example, legacy-backups).
access-key / secret-access-key: The credentials you generated earlier with radosgw-admin.
endpoint: The Ceph RGW host name (plus port, if non-standard), without the https:// scheme. Kopia uses TLS by default; pass --disable-tls only if RGW is served over plain HTTP.
region: Ceph doesn’t enforce AWS regions, so us-east-1 works as a placeholder.

Creating a repository also connects the current user to it. To test the connection later, or to attach from another session, use:

kopia repository connect s3 \
  --bucket=legacy-backups \
  --access-key=YOUR_ACCESS_KEY \
  --secret-access-key=YOUR_SECRET_KEY \
  --endpoint=rgw.yourdomain.com \
  --region=us-east-1
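Passing keys as flags leaves them in shell history and in the process list. Here is a minimal sketch of an alternative, assuming Kopia picks up the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables when the flags are omitted, and KOPIA_PASSWORD for the repository password. The file path /etc/kopia/s3.env and the function name connect_repo are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: connect to the repository without secrets on the command line.
# Assumption (not from the article): /etc/kopia/s3.env is a hypothetical,
# owner-readable file with lines such as
#   AWS_ACCESS_KEY_ID=...
#   AWS_SECRET_ACCESS_KEY=...
#   KOPIA_PASSWORD=...

connect_repo() {
  local bucket="$1" endpoint="$2" env_file="${3:-/etc/kopia/s3.env}"

  if [ ! -r "$env_file" ]; then
    echo "credentials file $env_file is not readable" >&2
    return 1
  fi

  # The subshell keeps the exported secrets out of the calling shell.
  (
    set -a            # export every variable the file defines
    . "$env_file"
    set +a
    kopia repository connect s3 \
      --bucket="$bucket" \
      --endpoint="$endpoint" \
      --region=us-east-1
  )
}

# Usage (endpoint given as host[:port]):
#   connect_repo legacy-backups rgw.yourdomain.com
```

Keep the file readable only by the backup user (chmod 600), since it now holds everything needed to read and delete your backups.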

With the repository connected, create the first snapshot:

kopia snapshot create /etc

This creates a snapshot of your /etc directory and uploads it to the S3 bucket.

List the existing snapshots with:

kopia snapshot list

This is why we love Kopia: it feels as simple as the AWS CLI or an rsync script, but does far more.

Restoring is just as easy:

kopia snapshot restore <snapshot-id> /tmp/restore-test

This restores the snapshot to /tmp/restore-test. From there you can inspect the files or copy them over the originals.
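A backup is only proven once a restore has been checked. The sketch below restores a snapshot into a scratch directory and compares it with the live source. verify_restore is a hypothetical helper of ours, not a Kopia command, and it assumes the source has not changed since the snapshot was taken:

```shell
#!/usr/bin/env bash
# Sketch: restore a snapshot into a scratch directory and diff it against
# the live source. verify_restore is a hypothetical helper name.

verify_restore() {
  local snapshot_id="$1" source_dir="$2" scratch
  scratch="$(mktemp -d)" || return 1

  if ! kopia snapshot restore "$snapshot_id" "$scratch"; then
    echo "restore of $snapshot_id failed" >&2
    rm -rf "$scratch"
    return 1
  fi

  # diff -r exits non-zero when any file differs or is missing.
  if diff -r "$source_dir" "$scratch" >/dev/null; then
    echo "restore of $snapshot_id matches $source_dir"
    rm -rf "$scratch"
  else
    echo "restore of $snapshot_id differs from $source_dir (kept in $scratch)" >&2
    return 1
  fi
}

# Usage: verify_restore <snapshot-id> /etc
```

Files that legitimately changed after the snapshot will show up as differences, so this works best right after a backup or on directories that rarely change.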

Best Practice

Create a dedicated bucket per backup server. This makes it easier to manage policies, clean up old data, and isolate access. Kopia repositories aren’t designed to be casually shared between servers, so separation avoids accidental corruption.

Automating Backups with Kopia

Manually running kopia snapshot create is fine for testing, but real-world backups need to run automatically and consistently. Kopia doesn’t run as a background daemon by default, so we rely on system schedulers like cron or systemd timers to handle automation.

Option 1: Using Cron

Cron is simple and works across most Linux distributions. For example, to snapshot /home every night at 2 AM, edit the crontab of the backup user:

0 2 * * * kopia snapshot create /home >> /var/log/kopia-backup.log 2>&1
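A bare cron line has two gaps: a slow run can overlap with the next one, and the log carries no timestamps. A minimal wrapper sketch follows, with hypothetical paths for the lock directory and log file, and a simple mkdir-based lock rather than anything Kopia provides:

```shell
#!/usr/bin/env bash
# Sketch of a cron wrapper: one run at a time, timestamped log lines.
# LOCK_DIR and LOG_FILE are hypothetical paths; adjust to your layout.
LOCK_DIR="${LOCK_DIR:-/tmp/kopia-backup.lock}"
LOG_FILE="${LOG_FILE:-/var/log/kopia-backup.log}"

log() {
  printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$*" >> "$LOG_FILE"
}

run_backup() {
  local target="$1" status=0

  # mkdir is atomic, so it doubles as a lock against overlapping runs.
  if ! mkdir "$LOCK_DIR" 2>/dev/null; then
    log "skipped $target: previous run still holds $LOCK_DIR"
    return 0
  fi

  log "starting snapshot of $target"
  kopia snapshot create "$target" >> "$LOG_FILE" 2>&1 || status=$?
  rmdir "$LOCK_DIR"

  if [ "$status" -eq 0 ]; then
    log "finished snapshot of $target"
  else
    log "snapshot of $target failed with exit code $status"
  fi
  return "$status"
}

# Assuming the script is saved as /usr/local/bin/kopia-backup.sh and ends
# with the line `run_backup /home`, the crontab entry becomes:
#   0 2 * * * /usr/local/bin/kopia-backup.sh
```

If the machine is killed mid-run, the lock directory stays behind and has to be removed by hand before the next backup will start.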

Option 2: Using Systemd Timers

If you prefer systemd’s built-in scheduling, create a service file at /etc/systemd/system/kopia-backup.service with the following contents:

[Unit]
Description=Kopia Backup Job

[Service]
Type=oneshot
ExecStart=/usr/bin/kopia snapshot create /home
User=backup

With the service defined, create the timer at /etc/systemd/system/kopia-backup.timer:

[Unit]
Description=Run Kopia Backup Daily

[Timer]
OnCalendar=02:00
Persistent=true

[Install]
WantedBy=timers.target

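Reload systemd, then enable and start the timer. The service itself does not need to be enabled, because the timer triggers it:

sudo systemctl daemon-reload
sudo systemctl enable --now kopia-backup.timer

Check the status with: systemctl list-timers | grep kopia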

That takes care of running backups on schedule. Next, set a retention policy:

kopia policy set /home --keep-daily=7 --keep-weekly=4 --keep-monthly=6

This keeps 7 daily, 4 weekly, and 6 monthly snapshots. Adjust the numbers to suit your needs.

As with any backup software, keep an eye on Kopia’s log files.
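If you went with the systemd timer, the journal already holds the output of every run. The sketch below reports whether the last run failed and prints the most recent log lines. The unit name matches the service file above; check_backup_unit is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: report whether the last run of the backup unit failed.
# check_backup_unit is a hypothetical helper name.

check_backup_unit() {
  local unit="${1:-kopia-backup.service}"

  # `systemctl is-failed` exits 0 only when the unit is in the failed state.
  if systemctl is-failed --quiet "$unit"; then
    echo "FAILED: $unit"
    # Show the most recent log lines to explain the failure.
    journalctl -u "$unit" -n 20 --no-pager
    return 1
  fi
  echo "OK: $unit"
}

# Usage: check_backup_unit || echo "backup needs attention"
```

The non-zero exit code makes it easy to hook into whatever alerting you already run.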

Tuning Ceph S3 for Kopia Workloads

At this stage, your backups are running, but performance and efficiency depend on how Ceph is tuned. Legacy-style file backups can stress S3 storage differently depending on file sizes, frequency, and concurrency. Kopia handles much of the optimization internally, but you should still configure Ceph S3 to match your workload.

Optimize Multipart Uploads

Objects larger than 5 GB cannot be uploaded in a single S3 PUT and must use multipart uploads. Kopia handles the upload mechanics automatically, but make sure multipart uploads work against your Ceph RGW and test them. If multipart upload limits are too strict, backups may stall or fail mid-transfer.

Consider adjusting rgw_max_chunk_size in Ceph’s RGW configuration if you consistently handle very large files.
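To get a feel for multipart limits, it helps to work out how many parts an upload needs at a given part size. The sketch below uses the S3 API cap of 10,000 parts per upload; the limits on your RGW are configurable and may differ, and the object and part sizes are only examples:

```shell
#!/usr/bin/env bash
# Sketch: how many multipart parts an object needs at a given part size.
# 10000 parts per upload is the S3 API limit; sizes below are illustrative.
MAX_PARTS=10000

parts_needed() {
  local object_bytes="$1" part_bytes="$2"
  # Ceiling division with integer arithmetic.
  echo $(( (object_bytes + part_bytes - 1) / part_bytes ))
}

fits_multipart() {
  [ "$(parts_needed "$1" "$2")" -le "$MAX_PARTS" ]
}

GiB=$((1024 * 1024 * 1024))
MiB=$((1024 * 1024))

parts_needed $((5 * GiB)) $((16 * MiB))     # prints 320
parts_needed $((100 * GiB)) $((5 * MiB))    # prints 20480, over the limit
```

The second case shows why a small part size combined with a very large object fails: the client has to raise the part size to stay under the part-count cap.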

Handle Small Files Efficiently

If your datasets contain thousands of tiny files (logs, configs, reports), Kopia deduplicates them, but Ceph can still be stressed by per-object overhead.

Options to help:

Group files before backup (archive into tarballs for especially noisy directories).
Enable compression in Kopia so smaller objects are packed efficiently.
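Both options can be sketched in a few lines. bundle_dir is a hypothetical helper and the staging path is an assumption of ours. The tarball is left uncompressed on the assumption that Kopia’s own compression and deduplication then still have something to work with:

```shell
#!/usr/bin/env bash
# Sketch: bundle a noisy directory of tiny files into one dated tarball
# that is snapshotted instead of the raw directory.
# bundle_dir and the staging path are hypothetical.

bundle_dir() {
  local source_dir="$1" staging_dir="$2" name archive
  name="$(basename "$source_dir")"
  archive="$staging_dir/$name-$(date '+%Y%m%d').tar"

  mkdir -p "$staging_dir" || return 1
  # -C keeps paths inside the archive relative to the parent directory.
  tar -cf "$archive" -C "$(dirname "$source_dir")" "$name" || return 1
  echo "$archive"
}

# Usage:
#   bundle_dir /var/log/myapp /var/backups/staging
#   kopia policy set --global --compression=zstd
#   kopia snapshot create /var/backups/staging
```

Remember to clear old tarballs out of the staging directory, otherwise every snapshot carries all of them.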

Enable versioning on the bucket so that deleted or overwritten objects are kept as noncurrent versions until cleanup:

# enable versioning on the bucket
aws --endpoint-url https://rgw.yourdomain.com \
    s3api put-bucket-versioning \
    --bucket legacy-backups \
    --versioning-configuration Status=Enabled

Plan Your Bucket Layout

Buckets are logical boundaries in Ceph. The safest approach is one bucket per repository. This avoids mixing unrelated data and makes cleanup easier if you retire a server.

One bucket per server is the simplest layout.
Shared buckets are possible, but always give each server its own prefix (server1/, server2/) to avoid accidental overwrites.
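With one bucket per server, a predictable naming scheme saves guesswork later. The sketch below derives a bucket name from a hostname while staying inside common S3 naming rules (lowercase letters, digits, hyphens, at most 63 characters). The backup- prefix and the function name are hypothetical conventions:

```shell
#!/usr/bin/env bash
# Sketch: derive one bucket name per server from its hostname.
# The "backup-" prefix is a hypothetical convention.

bucket_for_host() {
  local host="$1" name
  # Lowercase, then replace anything outside [a-z0-9-] with a hyphen.
  name="$(printf '%s' "$host" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9-' '-')"
  # Add the prefix, cap at 63 characters, and trim trailing hyphens.
  name="$(printf 'backup-%s' "$name" | cut -c1-63 | sed 's/-*$//')"
  echo "$name"
}

bucket_for_host "Web01.Example.COM"   # prints backup-web01-example-com
```

The result can be passed straight to the --bucket flag when creating the repository for that server.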

Leverage Lifecycle Policies

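Ceph supports lifecycle policies similar to AWS S3, which let you delete old objects automatically or move them to another storage class. Pair this with Kopia’s retention policies for a two-layer cleanup system:

Kopia prunes old snapshots and, during repository maintenance, deletes the blobs that nothing references any more.
With versioning enabled, Ceph keeps those deleted blobs as noncurrent versions, and a lifecycle rule removes them for good after a grace period.

This reduces storage costs and keeps buckets lean.

Do not expire current objects by age. Kopia deduplicates, so a blob written months ago may still hold data that recent snapshots depend on, and deleting it would corrupt the repository.

Create lifecycle.json with noncurrent-version expiration and multipart cleanup, and adjust the day counts to your policy:

{
  "Rules": [
    {
      "ID": "expire-noncurrent-versions",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "NoncurrentVersionExpiration": { "NoncurrentDays": 90 }
    },
    {
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 7 }
    }
  ]
}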

Apply the policy to the bucket:

aws --endpoint-url https://rgw.yourdomain.com \
    s3api put-bucket-lifecycle-configuration \
    --bucket legacy-backups \
    --lifecycle-configuration file://lifecycle.json

With Kopia now tied into Ceph S3, you’ve completed your first full cycle of modern backups: setup, repository creation, scheduling, and finally tuning for performance. This bridges the gap between old-style backup servers and modern object storage without needing to abandon legacy infrastructure.

Ceph S3 isn’t just a storage pool; it’s the backbone for a flexible backup ecosystem. Kopia makes it practical, efficient, and secure.

In the next part, we’ll explore advanced optimizations and scaling strategies: how to handle very large deployments, integrate multiple backup servers, and keep performance steady under heavy load.

FAQ

Q: Can I run multiple Kopia repositories in the same bucket?
Not recommended. Kopia repositories expect exclusive control. Use one bucket per repository for safety.

Q: Do I need to worry about multipart uploads with Kopia?
No, Kopia handles this transparently. Just ensure Ceph’s RGW is not restricted in upload chunk size.

Q: What happens if I lose the Kopia repository password?
Backups become unreadable. Always store the password in a secure vault or password manager.

Q: Why do my backups feel slow even though Ceph is healthy?
Often it’s due to many small files. Bundle them before backup, or review Kopia compression and parallelism settings.

Q: Is it safe to back up directly as root?
Not ideal. Use a dedicated non-root backup user and grant specific read permissions. Root backups can lead to accidental overwrites and unsafe access patterns.
