Why Cloud Development:

    • Faster developer onboarding (developer tools and development assets move to the cloud)
    • Standardized workspace: immutable dev / runtime environment
    • Collaborative development & troubleshooting (share & attach to remote workspaces)
    • Security
    • Scalable: replicate and distribute the entire development workspace on-premises
    • Integrated approach to DevOps
      • Create containerized Dev / Test / Staging environments (hosted on a shared-resource cloud)
      • Run tests in an exact copy of production
      • Integrated CI / CD pipelines

    Solution:

    • Remote workspaces with native toolchains
      • Using a thin client to connect with cloud-based containers / VMs (X Windows)
      • A compromise between the “Cloud-Native” way and the traditional “VM” way
    • WebIDE + Cloud-based workspaces (multi-year effort)

    Infrastructure:

    • k8s cluster:
      • xx+ machines (mostly retired physical machines with no SLAs)
      • providing xx RAM & xx CPU cores
      • every host node is running CentOS
    • Workspace:
      • Multiple containers run within a single workspace encapsulation
      • Workspace configuration:
        • adaptable templates (called stacks) for creating new workspaces
        • resource management (quota / limit)
      • k8s-friendly application stack definition (Docker image, kubernetes.yaml, Helm Chart)
      • The workspace engine will interpret an application stack definition and generate the workspace from it
    • CloudIDE container:
      • IDE Container (IDE services): fat single-container apps, with an init system
      • Dev Container (apps): CentOS-based container with tini as the top-level process
      • Containers talk to each other over the network and form a complete cloud-dev system
    • Overlay Network & routing & service
    • Stack: In-house Docker registry. Dockerfiles are kept in the VCS
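
The Workspace bullets above describe an engine that reads a stack definition and produces a multi-container workspace with resource limits. The following is a minimal sketch of that idea, not the actual engine. The names `Stack`, `ContainerSpec`, `build_pod_manifest`, and the registry host are hypothetical, and it assumes one workspace maps to one Kubernetes Pod, which the source does not state.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContainerSpec:
    """One container in a workspace (hypothetical shape)."""
    name: str
    image: str
    cpu_limit: str = "1"        # Kubernetes CPU quantity
    memory_limit: str = "1Gi"   # Kubernetes memory quantity


@dataclass
class Stack:
    """A reusable workspace template, called a 'stack' in the outline."""
    name: str
    containers: List[ContainerSpec] = field(default_factory=list)


def build_pod_manifest(stack: Stack, workspace_id: str) -> dict:
    """Render a stack as a plain Kubernetes Pod manifest (assumed mapping)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"ws-{workspace_id}",
            "labels": {"stack": stack.name},
        },
        "spec": {
            "containers": [
                {
                    "name": c.name,
                    "image": c.image,
                    "resources": {
                        "limits": {"cpu": c.cpu_limit, "memory": c.memory_limit}
                    },
                }
                for c in stack.containers
            ]
        },
    }


if __name__ == "__main__":
    # Hypothetical stack: one IDE container plus one dev container.
    stack = Stack(
        name="java-dev",
        containers=[
            ContainerSpec("ide", "registry.example.internal/ide:latest"),
            ContainerSpec("dev", "registry.example.internal/centos-dev:latest", "2", "4Gi"),
        ],
    )
    print(json.dumps(build_pod_manifest(stack, "demo"), indent=2))
```

Keeping the stack as data and rendering the manifest in a single function means the same template could feed other targets, such as a Helm chart, without changes to the stack itself.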

    Challenges:

    • Availability & Stability
      • SLA 99%, RTO < 30 min
    • System utilization is low (7% CPU overall)
    • Start-up is slow (~30 s)
    • Provision / Scheduler
      • Resource allocation is handled by the in-house container platform, Sigma (k8s)
      • Orchestration system: Che-inspired scheduler
    • Distributed Storage
      • Local PV, backup / sync to block storage
      • GlusterFS for persistent stateful services RWX
    • Developer Experience on IDE
      • Code / Debug / Language Service
      • Build log
      • Real-time collaboration: last-write-wins policy / multi-cursor editing
      • Desktop IDE sync: FUSE-based mount and sync, sshfs
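
The real-time collaboration bullet names a last-write-wins policy. As a rough illustration of how such a policy resolves concurrent edits, here is a minimal sketch of a last-write-wins register. `Edit` and `LWWRegister` are hypothetical names, and the sketch assumes each edit carries a comparable timestamp, with the client ID as a tie-breaker. The outline does not say how timestamps are assigned or at what granularity (file, line, or character) the policy applies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Edit:
    timestamp: float  # assumed: comparable across clients (e.g. server-assigned)
    client_id: str    # assumed tie-breaker when timestamps are equal
    text: str


class LWWRegister:
    """Holds one value; the edit with the greatest (timestamp, client_id) wins."""

    def __init__(self) -> None:
        self._current: Optional[Edit] = None

    def apply(self, edit: Edit) -> bool:
        """Apply an edit; return True if it replaced the current value."""
        if self._current is None or (
            (edit.timestamp, edit.client_id)
            > (self._current.timestamp, self._current.client_id)
        ):
            self._current = edit
            return True
        return False  # stale edit: an equal or later write is already stored

    @property
    def value(self) -> Optional[str]:
        return None if self._current is None else self._current.text


if __name__ == "__main__":
    reg = LWWRegister()
    reg.apply(Edit(1.0, "alice", "foo()"))
    reg.apply(Edit(2.0, "bob", "bar()"))
    reg.apply(Edit(1.5, "alice", "baz()"))  # arrives late, is discarded
    print(reg.value)
```

Because the outcome depends only on the (timestamp, client ID) ordering, replicas that receive the same edits in different orders end up with the same value. The trade-off is that the losing concurrent edit is silently dropped.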