Why Cloud Development

    • Developer onboarding acceleration (migrate developer tools to the cloud & store development assets in the cloud)
    • Standardized workspace: immutable dev / runtime environment
    • Collaborative dev solution & troubleshooting (share & attach remote workspaces)
    • Security
    • Scalable: the entire development workspace can be replicated and distributed on-premises
    • Integrated approach to DevOps
      • Create containerized Dev / Test / Staging environments (hosted on a shared-resource cloud)
      • Running the tests in an exact copy of production
      • Integrated CI / CD pipelines

    Our Solutions

    • Remote workspaces with native toolchains
      • Using a thin client to connect to cloud-based containers / VMs (X Windows)
      • A compromise between the “cloud-native” way and the traditional “VM” way
    • WebIDE + Cloud-based workspaces (multi-year effort)

    Infrastructure

    • k8s cluster:
      • xx+ machines (most of them retired physical machines with no SLAs)
      • providing xx RAM & xx CPU cores
      • every host node is running CentOS
    • Workspace:
      • Multiple containers run within a single workspace encapsulation
      • Workspace configuration:
        • adaptable templates (called stacks) for creating new workspaces
        • resource management (quotas / limits)
      • k8s-friendly application stack definition (Docker image, kubernetes.yaml, Helm Chart)
      • The workspace engine interprets an application stack definition and generates the workspace (see the sketch after this list)
    • CloudIDE container:
      • IDE Container (IDE services): a fat single-container app with an init system
      • Dev Container (apps): CentOS based container with tini as the top-level process
      • Containers talk to each other over the network and form a complete cloud-dev system
    • Overlay network, routing & services
    • Stacks: backed by an in-house Docker registry; Dockerfiles are kept in the VCS
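
    As a rough illustration, the workspace engine could interpret a stack definition along these lines. This is a minimal Go sketch; the Stack type, its fields, and the sample values are hypothetical, not the actual in-house schema:

        package main

        import (
            "encoding/json"
            "fmt"
        )

        // Stack is a hypothetical application stack definition: a base image
        // plus resource limits, which the engine turns into a workspace.
        type Stack struct {
            Name     string `json:"name"`
            Image    string `json:"image"`    // Docker image the workspace is based on
            CPULimit string `json:"cpuLimit"` // e.g. "2"
            MemLimit string `json:"memLimit"` // e.g. "4Gi"
        }

        func main() {
            raw := `{"name":"java-dev","image":"registry.local/java:11","cpuLimit":"2","memLimit":"4Gi"}`
            var s Stack
            if err := json.Unmarshal([]byte(raw), &s); err != nil {
                panic(err)
            }
            // The real engine would now render a Pod spec / Helm release from s.
            fmt.Printf("creating workspace %q from %s (cpu=%s, mem=%s)\n",
                s.Name, s.Image, s.CPULimit, s.MemLimit)
        }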

    Challenges:

    • Availability & Stability (SLA 99%, RTO < 30min)
    • System utilization is low (7% CPU overall)
    • Start-up speed is slow (~30s)
    • Provision / Scheduler
      • Resource allocation is handled by the in-house container platform, Sigma (k8s)
      • Orchestration system: an Eclipse Che-inspired scheduler
    • Distributed Storage
      • Local PV, backup / sync to block storage
      • GlusterFS for stateful services that need persistent RWX (ReadWriteMany) volumes
    • Developer Experience on IDE
      • Code / Debug / Language Service
      • Build log
      • Real-time collaboration: last-write-wins policy / multi-cursor editing (see the sketch after this list)
      • Desktop IDE sync: FUSE-based mount and sync (e.g., sshfs)
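
    Last-write-wins is the simplest way to converge concurrent edits: every update carries a timestamp, and whichever value has the newest timestamp survives. A minimal Go sketch of such a register (illustrative only, not the actual IDE implementation):

        package main

        import (
            "fmt"
            "time"
        )

        // lwwRegister keeps the value with the newest timestamp; writes that
        // arrive with an older timestamp are simply discarded.
        type lwwRegister struct {
            value string
            ts    time.Time
        }

        func (r *lwwRegister) apply(value string, ts time.Time) {
            if ts.After(r.ts) {
                r.value, r.ts = value, ts
            }
        }

        func main() {
            var line lwwRegister
            t0 := time.Now()
            line.apply("func main() {}", t0.Add(2*time.Second)) // later edit by user A
            line.apply("fun main() {}", t0.Add(time.Second))    // earlier edit by user B: lost
            fmt.Println(line.value)                             // prints the later edit
        }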

    GitPod Arch

    Challenges

    1. Allow arbitrary code execution on GitPod (workspaces run arbitrary, untrusted user code)
    2. The GitPod service needs to scale from 1 user to 5k concurrent users

    Arch: meta clusters + ws clusters

    “Meta” cluster

              [dashboard]
                /
    ---> proxy / -----> [server] (xN) ---> database
                         /
                    messagebus (rabbitMQ)
                     /
                [ws-manager-bridge]
    
    • Dashboard: the GitPod.io frontend, written in TypeScript (React + Tailwind)
      • Key components such as the “Start Workspace” page.
      • Determines the contextUrl (gitpod.io/#/<contextUrl>) and asks the server to assemble the configuration needed to start the WS.
    • Server: serves the dashboard; a web app written in TypeScript running on Node.
      • Server exposes a JSON-RPC-over-WebSocket API, which the dashboard uses to talk to it (a sketch of such a call follows this list).
      • Server talks to the database.
      • Server is stateless and can scale out horizontally across many instances.
      • Server also bundles concerns such as the login flow & auth, talking to code-hosting services (GitHub/GitLab), and preparing the configuration required to start workspaces.
    • ws-manager-bridge: on status updates (from the workspace clusters), notifies the server via the message bus (a publishing sketch also follows this list)
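
    As a sketch of what a JSON-RPC-over-WebSocket call can look like, here is a Go client built from the stdlib net/rpc/jsonrpc codec on top of golang.org/x/net/websocket. The endpoint URL, method name, and payload are placeholders; GitPod’s actual API (and its TypeScript dashboard client) differ:

        package main

        import (
            "fmt"
            "log"
            "net/rpc/jsonrpc"

            "golang.org/x/net/websocket"
        )

        func main() {
            // Open the WebSocket the JSON-RPC calls travel over. URL and
            // origin are placeholders, not GitPod's real endpoint.
            ws, err := websocket.Dial("ws://localhost:3000/api/v1", "", "http://localhost/")
            if err != nil {
                log.Fatal(err)
            }
            defer ws.Close()

            // A *websocket.Conn is an io.ReadWriteCloser, so the stdlib
            // JSON-RPC client can run directly on top of it.
            client := jsonrpc.NewClient(ws)

            // Hypothetical method name and argument.
            var reply map[string]any
            if err := client.Call("Server.GetWorkspace", "ws-1234", &reply); err != nil {
                log.Fatal(err)
            }
            fmt.Println(reply)
        }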
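
    And a hedged sketch of how a bridge component might publish a status update to RabbitMQ, using the github.com/rabbitmq/amqp091-go client. The exchange name, routing key, and payload shape are hypothetical:

        package main

        import (
            "context"
            "encoding/json"
            "log"

            amqp "github.com/rabbitmq/amqp091-go"
        )

        func main() {
            conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
            if err != nil {
                log.Fatal(err)
            }
            defer conn.Close()

            ch, err := conn.Channel()
            if err != nil {
                log.Fatal(err)
            }
            defer ch.Close()

            // Hypothetical status payload; the real bridge forwards the
            // ws-manager status translated into GitPod's own structures.
            body, _ := json.Marshal(map[string]string{
                "workspaceId": "ws-1234",
                "phase":       "running",
            })

            // Publish to a (hypothetical) "workspace-status" exchange so
            // every server instance sees the update.
            err = ch.PublishWithContext(context.Background(),
                "workspace-status", // exchange
                "ws.status.update", // routing key
                false, false,
                amqp.Publishing{ContentType: "application/json", Body: body})
            if err != nil {
                log.Fatal(err)
            }
        }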

    “Workspace” cluster

    The server selects a cluster based on such things as cluster availability, health, and region, then talks to that cluster’s ws-manager to start the WS.
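
    A minimal Go sketch of such a selection policy (the Cluster fields and the scoring rule are illustrative, not GitPod’s actual logic):

        package main

        import "fmt"

        // Cluster describes a candidate workspace cluster (hypothetical fields).
        type Cluster struct {
            Name      string
            Healthy   bool
            Available bool    // accepting new workspaces
            Region    string
            Load      float64 // 0..1, lower is better
        }

        // pickCluster filters out unhealthy/unavailable clusters, prefers the
        // user's region, and breaks ties within a region by load.
        func pickCluster(cands []Cluster, region string) *Cluster {
            var best *Cluster
            for i := range cands {
                c := &cands[i]
                if !c.Healthy || !c.Available {
                    continue
                }
                if best == nil ||
                    (c.Region == region && best.Region != region) ||
                    (c.Region == best.Region && c.Load < best.Load) {
                    best = c
                }
            }
            return best
        }

        func main() {
            clusters := []Cluster{
                {"eu-1", true, true, "eu", 0.7},
                {"eu-2", true, true, "eu", 0.3},
                {"us-1", true, false, "us", 0.1},
            }
            fmt.Println(pickCluster(clusters, "eu").Name) // eu-2
        }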

    • ws-manager: the core service.
      • Talks directly to k8s to create the workspace Pod (a client-go sketch follows this list).
      • Talks to ws-daemon on that node to initialize the WS content (git clone the source code, etc.)
    • ws-scheduler: a k8s scheduler that finds a node on which to place the Pod.
    • ws-proxy: a Go application that acts as a reverse proxy towards workspaces, and also serves static content.
      • During startup (after the response from ws-manager), you are redirected to your WS URL, so your requests hit ws-proxy directly.
      • Normally the proxy would simply route the request to the WS Pod, but there’s a very good chance that the WS Pod isn’t running yet, so some requests are served statically by blobserver (a fallback sketch also follows this list).
    • blobserver: serves static content (it can serve content directly out of OCI / Docker images)
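
    For the ws-manager step above, creating a workspace Pod through the k8s API looks roughly like this with client-go. The Pod spec is heavily simplified and the names are hypothetical; the real Pod carries many more labels, annotations, and security settings:

        package main

        import (
            "context"
            "log"

            corev1 "k8s.io/api/core/v1"
            metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
            "k8s.io/client-go/kubernetes"
            "k8s.io/client-go/rest"
        )

        func main() {
            // Runs in-cluster, as ws-manager itself would.
            cfg, err := rest.InClusterConfig()
            if err != nil {
                log.Fatal(err)
            }
            client, err := kubernetes.NewForConfig(cfg)
            if err != nil {
                log.Fatal(err)
            }

            // Hypothetical, heavily simplified workspace Pod.
            pod := &corev1.Pod{
                ObjectMeta: metav1.ObjectMeta{
                    Name:   "ws-1234",
                    Labels: map[string]string{"component": "workspace"},
                },
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "workspace",
                        Image: "registry-facade:5000/ws-1234", // hypothetical image ref
                    }},
                },
            }
            _, err = client.CoreV1().Pods("workspaces").Create(
                context.Background(), pod, metav1.CreateOptions{})
            if err != nil {
                log.Fatal(err)
            }
        }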
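
    And for ws-proxy’s fallback behaviour, a sketch of “route to the WS Pod if it is reachable, otherwise serve from blobserver”, using the stdlib reverse proxy. The readiness check and the addresses are illustrative, not the actual ws-proxy logic:

        package main

        import (
            "log"
            "net"
            "net/http"
            "net/http/httputil"
            "net/url"
            "time"
        )

        // proxyOrBlobserve routes to the workspace Pod when it is reachable
        // and falls back to the blobserver for static content otherwise.
        func proxyOrBlobserve(wsURL, blobURL *url.URL) http.Handler {
            ws := httputil.NewSingleHostReverseProxy(wsURL)
            blob := httputil.NewSingleHostReverseProxy(blobURL)
            return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                // Cheap readiness probe: can we open a TCP connection to the Pod?
                conn, err := net.DialTimeout("tcp", wsURL.Host, 100*time.Millisecond)
                if err != nil {
                    blob.ServeHTTP(w, r) // Pod not up yet: serve static content
                    return
                }
                conn.Close()
                ws.ServeHTTP(w, r)
            })
        }

        func main() {
            ws, _ := url.Parse("http://10.0.3.7:23000")    // hypothetical Pod address
            blob, _ := url.Parse("http://blobserver:8080") // hypothetical service
            log.Fatal(http.ListenAndServe(":80", proxyOrBlobserve(ws, blob)))
        }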

    Node

    • On each node we operate two of our own services as DaemonSets:
      • registry-facade: the registry containerd pulls workspace images from. The facade takes the configured image (from gitpod.yml) and dynamically adds a bunch of layers on top; it manipulates the OCI image configuration and manifest, for example to include the IDE (a manifest sketch follows this list)
      • ws-daemon: similar to kubelet; a process we use to initialize content within a WS, to back up content from a WS, and to assist in setting up the user namespace that each WS runs in
    • each WS itself is a k8s Pod:
      • workspace-kit: the entrypoint of the container. Sets up the user, PID, and mount namespaces to better isolate the workload running within the WS from the node and from other WSs; this also provides Docker (DinD) or root within a WS (a namespace sketch follows this list).
      • supervisor: supervises the processes that run within the WS (it is PID 1: workspace-kit creates a PID namespace in which supervisor is the root process). Supervisor also starts the IDE, provides integration services such as detecting whether a port is being served from within the WS, and watches the IDE to keep it running.
      • IDE
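
    To make the registry-facade idea concrete: appending a layer to an OCI image manifest is, at its core, JSON manipulation. A self-contained Go sketch with hypothetical digests (the real facade must also rewrite the image configuration, e.g. the rootfs diff IDs, to match):

        package main

        import (
            "encoding/json"
            "fmt"
        )

        // Minimal slice of an OCI image manifest: just enough to append layers.
        type descriptor struct {
            MediaType string `json:"mediaType"`
            Digest    string `json:"digest"`
            Size      int64  `json:"size"`
        }

        type manifest struct {
            SchemaVersion int          `json:"schemaVersion"`
            MediaType     string       `json:"mediaType"`
            Config        descriptor   `json:"config"`
            Layers        []descriptor `json:"layers"`
        }

        func main() {
            // Manifest of the user-configured base image (from gitpod.yml).
            base := manifest{
                SchemaVersion: 2,
                MediaType:     "application/vnd.oci.image.manifest.v1+json",
                Layers: []descriptor{
                    {"application/vnd.oci.image.layer.v1.tar+gzip", "sha256:aaaa...", 52_000_000},
                },
            }

            // Dynamically append an extra layer, e.g. one that ships the IDE.
            ideLayer := descriptor{
                MediaType: "application/vnd.oci.image.layer.v1.tar+gzip",
                Digest:    "sha256:bbbb...", // hypothetical digest
                Size:      180_000_000,
            }
            base.Layers = append(base.Layers, ideLayer)

            out, _ := json.MarshalIndent(base, "", "  ")
            fmt.Println(string(out))
        }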
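
    And for the workspace-kit namespace setup, a Linux-only Go sketch that starts a shell in fresh user, PID, and mount namespaces, mapping the current user to root inside the new user namespace. This is the general mechanism behind “root within a WS” without root on the node; the real workspace-kit does considerably more:

        package main

        import (
            "log"
            "os"
            "os/exec"
            "syscall"
        )

        func main() {
            // Run a shell inside new user, PID, and mount namespaces.
            cmd := exec.Command("/bin/sh")
            cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
            cmd.SysProcAttr = &syscall.SysProcAttr{
                Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWPID |
                    syscall.CLONE_NEWNS,
                // Map the current (unprivileged) user to root inside the
                // new user namespace.
                UidMappings: []syscall.SysProcIDMap{
                    {ContainerID: 0, HostID: os.Getuid(), Size: 1},
                },
                GidMappings: []syscall.SysProcIDMap{
                    {ContainerID: 0, HostID: os.Getgid(), Size: 1},
                },
            }
            if err := cmd.Run(); err != nil {
                log.Fatal(err)
            }
        }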

    Status update route:

    • All operations trigger status updates, predominantly through k8s. ws-manager listens for those status updates, translates them into GitPod’s structures, and talks to ws-manager-bridge (a translation sketch follows this list). ws-manager-bridge then persists the state into the database and also forwards it to the message bus (which forwards it to Server).
    • The dashboard in turn listens to those updates through the JSON-RPC WebSocket – this is how you see status updates on the dashboard and on the WS startup screen.
    • Only in the “meta” cluster do we keep any state that lives beyond a single instance of a workspace.
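
    The “translates them into GitPod’s structures” step can be pictured as a small mapping from k8s Pod phases to workspace phases. A Go sketch with hypothetical phase names, not GitPod’s actual state machine:

        package main

        import (
            "fmt"

            corev1 "k8s.io/api/core/v1"
        )

        // toWorkspacePhase maps a k8s Pod phase to a (hypothetical)
        // workspace phase, as ws-manager does before handing the update
        // to ws-manager-bridge.
        func toWorkspacePhase(p corev1.PodPhase) string {
            switch p {
            case corev1.PodPending:
                return "creating"
            case corev1.PodRunning:
                return "running"
            case corev1.PodSucceeded, corev1.PodFailed:
                return "stopped"
            default:
                return "unknown"
            }
        }

        func main() {
            fmt.Println(toWorkspacePhase(corev1.PodRunning)) // running
        }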