Enable Collabora/WOPI/office suite #1

Open
scott wants to merge 1 commit from feature/office-suite into prod
Owner

This adds support for WOPI and Collabora office suite integration, based on the [example configuration](https://github.com/owncloud/ocis/blob/master/deployments/examples/ocis_wopi/docker-compose.yml).

Most of the bugs are worked out, but we're still getting an error:

<details>
<summary>hard to read raw output, or...</summary>

```
ocis-app-provider-1  | {"level":"error","pid":1,"error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.18.0.12:9142: i/o timeout\"","time":"2023-09-19T16:25:18.957971967Z","caller":"github.com/cs3org/reva/v2@v2.16.1-0.20230911153145-a2e2320f3448/internal/grpc/services/appprovider/appprovider.go:164","message":"error registering app provider: error calling add app provider"}
```

</details>

...reformatted to be easier to read:

```json
{
  "level": "error",
  "pid": 1,
  "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.18.0.12:9142: i/o timeout\"",
  "time": "2023-09-19T16:25:18.957971967Z",
  "caller": "github.com/cs3org/reva/v2@v2.16.1-0.20230911153145-a2e2320f3448/internal/grpc/services/appprovider/appprovider.go:164",
  "message": "error registering app provider: error calling add app provider"
}
```
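The interesting part of that log entry is the address the gRPC client is dialing. As a rough sketch (not part of the deployment itself), the host and port can be pulled out of the log line with a few lines of Python; the helper name `dial_target` is made up for illustration, and the regex assumes the `dial tcp <ip>:<port>` wording seen in this particular error.

```python
import json
import re

# Log line copied from the app-provider output, including the
# "ocis-app-provider-1  | " prefix that docker compose prepends.
RAW_LOG = r'''ocis-app-provider-1  | {"level":"error","pid":1,"error":"rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.18.0.12:9142: i/o timeout\"","time":"2023-09-19T16:25:18.957971967Z","caller":"github.com/cs3org/reva/v2@v2.16.1-0.20230911153145-a2e2320f3448/internal/grpc/services/appprovider/appprovider.go:164","message":"error registering app provider: error calling add app provider"}'''

# Assumption: the error text contains "dial tcp <ipv4>:<port>".
DIAL_RE = re.compile(r"dial tcp (?P<host>[0-9.]+):(?P<port>\d+)")


def dial_target(log_line):
    """Return (host, port) from the 'error' field of a log line, or None."""
    # Drop everything before the first "{" (the compose service prefix).
    payload = log_line[log_line.index("{"):]
    entry = json.loads(payload)
    match = DIAL_RE.search(entry.get("error", ""))
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


print(dial_target(RAW_LOG))
```

For the log line above this prints `('172.18.0.12', 9142)`, which is the address to compare against the container network configuration.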

If we inspect the `app-provider` network configuration...

```console
$ docker inspect ocis-app-provider-1 | jq -r '.[] | .NetworkSettings.Networks | .[] | .IPAddress'
172.26.0.5
```

We can see the problem: the address that `app-provider` is dialing to reach the `ocis` container (172.18.0.12) is not on the subnet that the `app-provider` container itself is on (172.26.0.5). Sure enough, if we inspect the `ocis` container:

```console
$ docker inspect ocis-ocis-1 | jq -r '.[] | .NetworkSettings.Networks | .[] | .IPAddress'
172.26.0.4
172.27.0.3
172.18.0.12
```
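The `jq` filter above prints only the addresses, so it is not obvious which address belongs to which network. A minimal sketch of pairing them up from `docker inspect` output follows. The sample data is a trimmed stand-in, not real output: the IP addresses are the ones shown above, but the `/16` prefix lengths, the placeholder name `other-net`, and the exact network names (compose usually prefixes them with the project name) are assumptions.

```python
import json

# Trimmed, hypothetical stand-in for `docker inspect ocis-ocis-1`.
# Only the IP addresses come from the real output; the network names
# and prefix lengths are assumptions made for this example.
SAMPLE_INSPECT = json.loads("""
[{"NetworkSettings": {"Networks": {
    "app-provider-net": {"IPAddress": "172.26.0.4",  "IPPrefixLen": 16},
    "other-net":        {"IPAddress": "172.27.0.3",  "IPPrefixLen": 16},
    "web":              {"IPAddress": "172.18.0.12", "IPPrefixLen": 16}
}}}]
""")


def addresses_by_network(inspect_output):
    """Map each network name to the container's address in CIDR form."""
    result = {}
    for container in inspect_output:
        networks = container["NetworkSettings"]["Networks"]
        for name, cfg in networks.items():
            result[name] = f'{cfg["IPAddress"]}/{cfg["IPPrefixLen"]}'
    return result


for name, addr in addresses_by_network(SAMPLE_INSPECT).items():
    print(f"{name}\t{addr}")
```

Against a live container, the same function could be fed `json.load(sys.stdin)` with `docker inspect ocis-ocis-1` piped in, instead of the sample.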

We can see that, sure, the `ocis` container is on the `app-provider-net` network, but it's also on the `web` network, which is the subnet the `app-provider` container is trying to reach it on. This suggests that either the mDNS/service registry system[^1] is only reporting the IP address on the `web` network, or the client is only trying the first IP that it gets in response to the mDNS query and discarding the rest. I don't really know that much about how mDNS works, but I did a bit of spelunking in the code: the relevant code is [here](https://github.com/owncloud/ocis/tree/master/ocis-pkg/registry), which seems to be just a bit of glue to tie in [Go Micro](https://github.com/go-micro/go-micro), a microservice framework. The portion of that relevant to the mDNS registry is [here](https://github.com/go-micro/go-micro/blob/master/registry/mdns_registry.go).[^2]
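To make the two hypotheses concrete, here is a small sketch of the check a client would need in order to pick a usable address: keep only the advertised addresses that fall inside one of the caller's own subnets. This is purely illustrative and is not how Go Micro or oCIS select addresses (that is exactly the open question); the function name and the `/16` prefix length are assumptions.

```python
import ipaddress


def reachable_addresses(caller_interfaces, advertised):
    """Return the advertised IPs that lie inside one of the caller's subnets.

    caller_interfaces: addresses in CIDR form, e.g. "172.26.0.5/16"
    advertised: plain IP strings reported for the target service
    """
    subnets = [ipaddress.ip_interface(i).network for i in caller_interfaces]
    return [
        ip for ip in advertised
        if any(ipaddress.ip_address(ip) in net for net in subnets)
    ]


# app-provider's only address (prefix length assumed to be /16).
app_provider = ["172.26.0.5/16"]

# If all three ocis addresses were advertised, one of them is usable.
print(reachable_addresses(app_provider, ["172.26.0.4", "172.27.0.3", "172.18.0.12"]))

# If only the web address is advertised, nothing is reachable.
print(reachable_addresses(app_provider, ["172.18.0.12"]))
```

The first call prints `['172.26.0.4']` and the second prints `[]`. The second case matches the i/o timeout in the log: the only address the client tries is one it has no route to.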

Unfortunately, on a first pass I didn't understand the code well enough to track down why it isn't working. Depending on our discussion here, I may come back to it and spend more time digging into that code, if that's what we want to do.

Another option I considered, [out of the list of available registries](https://github.com/owncloud/ocis/blob/b0ac9840dff00a2527b2e8df86bebcd12632104c/ocis/README.md), is `etcd`, but that doesn't seem designed to fit this use-case, and distribution is...odd. It's a Cloud Native Computing Foundation project that originated at CoreOS, which is now part of Red Hat, which is Going Through Some Shit right now... To make matters worse, the official Docker image is distributed [from quay.io](https://quay.io/repository/coreos/etcd?tab=tags) rather than Docker Hub (which is fine, I use quay.io, it's no better or worse than Docker Hub), but is [only documented for a very old version](https://etcd.io/docs/v2.3/docker_guide/) and doesn't seem to have semver aliases on the tags or a `latest` tag (so we'd have to watch closely for point releases). They seem to expect it to be deployed as a microservice as part of a Cloud Native K8s web application, rather than the sort of dockerised environment we're working with here. There's also a distribution by VMware on [Docker Hub](https://hub.docker.com/r/bitnami/etcd), but...

...I don't know, at this point I'm just not sure what to do. On top of all that, we have no guarantee that we won't see the same issue under a different service registry backend. The example configuration this was all based on expects everything to be on a single network; in our case that would have to be the `web` network, which is shared across the physical system by all services that Traefik proxies directly out to the public internet. I prefer to keep dependent services (e.g. postgres, redis, or in this case, Tika, the app-provider itself, and possibly etcd) on separate subnets, segmented as finely as possible, as a security measure. That goes double for things like redis or Tika, which don't have authentication (at least not as configured) and *rely* on network segmentation as basically the only thing preventing their data from being exposed to the internet.


[^1]: There's some relevant documentation on what the service registry is and how it's configured [here](https://github.com/owncloud/ocis/blob/b0ac9840dff00a2527b2e8df86bebcd12632104c/ocis/README.md).

[^2]: I'm mostly just leaving these code links here so I can reference them later.

scott added 1 commit 2023-09-19 13:07:52 -04:00
Author
Owner

Oh, this is frustrating. In both the `ocis` and `app-provider` containers, if I run `dig ocis` and `dig app-provider`, it gives me IPs on the same subnet!

```console
$ docker compose run -u 0 --entrypoint sh ocis
[+] Building 0.0s (0/0)
[+] Creating 1/0
 ✔ Container ocis-search-engine-1  Running                                                                   0.0s
[+] Building 0.0s (0/0)
# apk add --quiet bind-tools
# dig +short ocis
172.26.0.4
# dig +short app-provider
172.26.0.5
```
Author
Owner

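I created a [forum post](https://central.owncloud.org/t/grpc-connection-happens-on-the-wrong-subnet-when-using-network-segmentation/45348) about this yesterday.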

Reference: TWS/ocis-deployment#1