How JetBrains Gateway works - beyond the black box

2022-11-13

Tech

JetBrains Gateway is becoming increasingly popular. Although it is still in beta, it already has first-party providers such as Gitpod, JetBrains Space, Google Cloud Workstations, and GitHub Codespaces. There are also numerous plugins built for custom in-house remote workspace orchestration.

There is an official diagram introducing the architecture and definitions relating to JetBrains’ remote development. The split of the monolithic IDE architecture raises a couple of interesting questions:

How are different components launched and connected with each other in such a distributed way?
Is the network traffic routed through a relay? How should I configure the network policy if I’m hosting my remote servers in an enterprise environment?

The official documentation covers both usage and some security concerns in detail. Please check it out first: https://www.jetbrains.com/help/idea/remote-development-overview.html. In this post, I will try to answer these questions from an end-user perspective by watching the process tree and monitoring the traffic between different components. Please note that many conclusions are based only on logs and packet captures, so there could be mistakes.

Here is a quick summary of the terminology of JetBrains remote development:

Client (“Local machine”) is the developer’s PC where JetBrains (Thin) Client runs.
Server (“Remote host”) is a Linux server where the IDE backend runs.

The experiment uses a one-hop client/server setup, where the Client (my laptop) has direct access to the Server (a virtual machine hosted on Tencent Cloud). The server only allows inbound SSH connections on port 22. This is the simplest form of a remote development setup. If you are using Gitpod or JetBrains Space, the Server could be a Kubernetes Pod or a Docker container, and the actual traffic routing could be more complicated.

From the experiment, we can draw a diagram of the key components of JetBrains Gateway remote development:

Fig: Components of the JetBrains Gateway remote development stack and the network traffic flow

How connection is established

There are two types of workflow:

client → server: JetBrains Gateway acts as a supervisor and launches both the remote IDE backend and the local IDE thin client. It then sets up the communication tunnel between the two sides.
server → client: the IDE backend server is first started as a standalone process, and waits for clients to bind. This fits well with the scheduling and provisioning of internal infrastructure for enterprise scenarios.

Most of the technical details we want to investigate are the same in both workflows. Let’s start with the client-to-server workflow.

client-to-server workflow

After finishing the SSH server setup in JetBrains Gateway, let’s check the ports it has opened:

# ====== client side ======
$ jps -v | grep -i gateway
56786 ...
$ lsof -Pan -i -p 56786
... TCP 127.0.0.1:6942 (LISTEN)
... TCP 127.0.0.1:63342 (LISTEN)
... TCP 192.168.1.5:60025->119.100.3.19:22 (ESTABLISHED)

We can see that an SSH connection is established with the remote end. We can verify this by checking the sshd processes on the remote host, where there is a root@notty session:

# ====== server side ======
$ ps auxf | grep -i ssh
... /usr/sbin/sshd -D
...  \_ sshd: root@notty
...  \_ sshd: root@pts/0
...          \_ grep --color=auto -i ssh

In the client-to-server workflow, JetBrains Gateway needs to specify the IDE backend version and download the IDE if necessary. My guess is that these pre-checks and version negotiations are done via SSH commands. How can we find out which commands are sent?

We can try using Wireshark on the client side to help us analyze the network traffic.

Capture packets from the en0 interface. Use the tcp.port == 22 filter to find the SSH packets.

Follow the TCP stream to locate the SSH handshake packets. From the SSH protocol version exchange message, we can see that JetBrains Gateway is using SSHJ 0.33.0.

 02 00 00 00 45 00 00 6b 00 00 40 00 40 06 f0 3d   ....E..k..@.@..=
 c0 a8 ff 0a 09 86 81 16 ef e0 8c a0 be d8 d5 6d   ...............m
 85 29 d0 12 80 18 18 eb b1 e5 00 00 01 01 08 0a   .)..............
 16 8a 2a ff aa 50 09 12 53 53 48 2d 32 2e 30 2d   ..*..P..SSH-2.0-
 49 6e 74 65 6c 6c 69 4a 5f 5f 47 61 74 65 77 61   IntelliJ__Gatewa
 79 5f 47 57 2d 32 32 32 2e 34 34 35 39 2e 31 31   y_GW-222.4459.11
 5f 5f 53 53 48 4a 5f 30 2e 33 33 2e 30 0d 0a      __SSHJ_0.33.0..
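The tail of that packet can be decoded by hand. The following is a minimal sketch, assuming the identification line follows the `SSH-protoversion-softwareversion SP comments CR LF` layout from RFC 4253; the function name `parse_ssh_ident` is my own and is not part of any JetBrains or SSHJ API.

```python
import re

# Bytes copied from the end of the captured packet above (TCP payload only).
PAYLOAD_HEX = (
    "53 53 48 2d 32 2e 30 2d 49 6e 74 65 6c 6c 69 4a 5f 5f 47 61 74 65 77 61 "
    "79 5f 47 57 2d 32 32 32 2e 34 34 35 39 2e 31 31 5f 5f 53 53 48 4a 5f 30 "
    "2e 33 33 2e 30 0d 0a"
)

# RFC 4253, section 4.2: "SSH-protoversion-softwareversion SP comments CR LF"
IDENT_RE = re.compile(
    r"^SSH-(?P<proto>[^-\s]+)-(?P<software>\S+)(?: (?P<comments>.*))?$"
)


def parse_ssh_ident(payload: bytes) -> dict:
    """Split an SSH identification line into its named parts."""
    line = payload.decode("ascii").rstrip("\r\n")
    match = IDENT_RE.match(line)
    if match is None:
        raise ValueError(f"not an SSH identification string: {line!r}")
    return match.groupdict()


ident = parse_ssh_ident(bytes.fromhex(PAYLOAD_HEX))
print(ident["proto"])     # protocol version
print(ident["software"])  # software version, here Gateway build + SSHJ version
```

The software version field is free-form, so reading it as "Gateway build plus SSHJ version" is an interpretation of this particular string rather than a documented format.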

However, Wireshark does not support SSH payload decryption, so the packets after the key exchange won’t give us any further information.

Luckily, JetBrains Gateway (as well as the JetBrains Client and the IDE backend) prints detailed logs, which give us some hints about what is going on.

$ cd <path-to-jetbrains-client-logs>
$ grep -E 'sshj|SshCommandExecutor' idea.log
2022-11-11 23:06:49,729 [   7384]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'uname -sm'. Parameters: redirectErrorStream=false, usePty=true
2022-11-11 23:06:49,930 [   7585]   INFO - #c.i.s.i.s.sshj - Client identity string: SSH-2.0-IntelliJ__Gateway_GW-222.4459.11__SSHJ_0.33.0
2022-11-11 23:06:50,131 [   7786]   INFO - #c.i.s.i.s.sshj - Server identity string: SSH-2.0-OpenSSH_7.4
2022-11-11 23:06:50,603 [   8258]   INFO - #c.i.s.i.s.sshj - Authentication log: SSH connection to root@119.100.3.19:22
2022-11-11 23:06:50,973 [   8628]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'echo $HOME'. Parameters: redirectErrorStream=false, usePty=true
2022-11-11 23:06:51,291 [   8946]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'test -f /root/.cache/JetBrains/RemoteDev/remote-dev-worker/remote-dev-worker_8be48ed1f7d9d149a663842f0df2735b7433265d1654774596e54ce70af850f2'. Parameters: redirectErrorStream=false, usePty=true
2022-11-11 23:06:51,609 [   9264]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'test -x /root/.cache/JetBrains/RemoteDev/remote-dev-worker/remote-dev-worker_8be48ed1f7d9d149a663842f0df2735b7433265d1654774596e54ce70af850f2'. Parameters: redirectErrorStream=false, usePty=true
2022-11-11 23:06:51,951 [   9606]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'echo $SHELL'. Parameters: redirectErrorStream=true, usePty=true
2022-11-11 23:06:52,265 [   9920]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: '/bin/bash -lc echo\ REMOTE_EXEC_OUTPUT_MARKER_\ \&\&\ /root/.cache/JetBrains/RemoteDev/remote-dev-worker/remote-dev-worker_8be48ed1f7d9d149a663842f0df2735b7433265d1654774596e54ce70af850f2\ active-projects'. Parameters: redirectErrorStream=false, usePty=true
...
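To read only the command sequence out of a long idea.log, the lines can be filtered with a small script. This is a rough sketch that assumes the log line layout seen in the excerpt above; the regular expression and the function name `remote_commands` are mine, and the layout may differ between Gateway versions.

```python
import re

# Three lines copied from the excerpt above.
SAMPLE_LOG = """\
2022-11-11 23:06:49,729 [   7384]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'uname -sm'. Parameters: redirectErrorStream=false, usePty=true
2022-11-11 23:06:49,930 [   7585]   INFO - #c.i.s.i.s.sshj - Client identity string: SSH-2.0-IntelliJ__Gateway_GW-222.4459.11__SSHJ_0.33.0
2022-11-11 23:06:50,973 [   8628]   INFO - #c.j.g.s.d.i.SshCommandExecutor - Executing on remote host: 'echo $HOME'. Parameters: redirectErrorStream=false, usePty=true
"""

# Assumed layout: "<timestamp> ... SshCommandExecutor - Executing on remote host: '<cmd>'. Parameters: ..."
COMMAND_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"SshCommandExecutor - Executing on remote host: '(?P<cmd>.*)'\. Parameters: "
)


def remote_commands(log_text):
    """Return (timestamp, command) pairs for every remote command in the log."""
    result = []
    for line in log_text.splitlines():
        match = COMMAND_RE.match(line)
        if match:
            result.append((match.group("ts"), match.group("cmd")))
    return result


for ts, cmd in remote_commands(SAMPLE_LOG):
    print(ts, cmd)
```

In practice you would pass the contents of the real idea.log instead of `SAMPLE_LOG`.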

Client-to-server communication does not consist entirely of plain SSH commands. Some more complicated operations are implemented in Go, cross-compiled for different architectures, and packaged as binaries that are bundled under the JetBrains Gateway installation directory. JetBrains Gateway uploads the matching binary to the remote host and invokes this agent with sub-commands, in the form /bin/bash -lc echo REMOTE_EXEC_OUTPUT_MARKER_ && <remote_dev_worker> <subcommands> <args>.
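The wrapped invocation seen in the log can be reproduced with a few lines of code. This is only a sketch of my reading of it: I assume the backslash escaping exists so that the whole script reaches bash -lc as a single argument, and that the marker lets the caller discard anything a login shell prints before the real output. Neither assumption is confirmed by JetBrains. The helper names and the /opt/worker path are hypothetical.

```python
import re

MARKER = "REMOTE_EXEC_OUTPUT_MARKER_"


def escape_for_remote_shell(script):
    """Backslash-escape everything except plain path and word characters."""
    return re.sub(r"([^A-Za-z0-9_./-])", r"\\\1", script)


def wrap_worker_call(worker_path, *args):
    """Build a command line shaped like the one in the Gateway log."""
    script = f"echo {MARKER} && " + " ".join([worker_path, *args])
    return "/bin/bash -lc " + escape_for_remote_shell(script)


def output_after_marker(raw_output):
    """Drop everything up to and including the marker line (assumed purpose)."""
    _, found, tail = raw_output.partition(MARKER)
    if not found:
        raise ValueError("marker not found in remote output")
    return tail.lstrip("\r\n")


# /opt/worker is a hypothetical path used only for illustration.
print(wrap_worker_call("/opt/worker", "active-projects"))
```

With usePty=true the remote side typically returns CRLF line endings, which is why the sketch strips both characters after the marker.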

We can try running the worker locally (choosing the binary that matches your machine’s architecture). From its command list, a rough conclusion is that its main duties are remote file system operations and monitoring the IDE backend status.

$ ./remote-dev-worker-darwin-arm64
Available commands:
  * exists: checks if file/directory exists
  * pid-alive: Checks whether the requested process is alive
  * readlink: return resolved path
  * homedir: return homedir
  * env: Returns env variables
  * available-space: requires --path. Returns available space in bytes
  * create-dir: requires --path. Creates directory
  * create-file: requires --path. Creates file
  * remove-path: requires --path
  * product-code: requires --ide-path. Return product code.
  * host-status: return host-status
  * backend-pid: backend pid
  * backend-host-alive: requires --project-path. Return true/false
  * backend-status-alive: requires --project-path. Return true/false
  * kill-backend: kill backend
  * installed-ides: Returns installed IDEs
  * expand-archive: requires --archive-path, --destination.
  * get-jstack: return jstack
  * recent-projects: get recent projects
  * active-projects: get active projects
  * list-files: Lists files; requires --root, --max-depth. Optional flags: --dirs-only, --files-only, --exclude-pattern
  * installed-ides-with-projects: Returns installed IDEs and opened project
  * remove-ide: Close projects and remove installed IDE
  * simple-json-output: Provides simple json output with data provided by --output-string
  * stdin-gather-json-output: Provides multiline json output with data read line-by-line from stdin
  * multi-json-output: Provides multi-result json output with data provided by --input-string, split by character provided by --separator

Following the logs, we can reconstruct the boot sequence of the different parts of the JetBrains remote development stack.

Server Side

Check remote IDE installations and recent projects.
Start remote IDE backend with the specified project root.

The Go agent reports the IDE backend status (PID, IDE version, JBR version) and product code, and most importantly the joinLink / httpLink / gatewayLink for opened projects, back to JetBrains Gateway.

 {
   "appPid": 31352,
   "appVersion": "IU-222.4345.14",
   "runtimeVersion": "17.0.4.1b469.62",
   "unattendedMode": true,
   "backendUnresponsive": false,
   "modalDialogIsOpened": false,
   "idePath": "/root/.cache/JetBrains/RemoteDev/dist/bcbc13aa201fb_ideaIU-2022.2.3",
   "projects": [
     {
       "projectName": "test",
       "projectPath": "/root/workspace/test",
       "joinLink": "tcp://127.0.0.1:5990#jt=4ca271f2-6c7b-4a7e-8b40-7d82a15b7fe4&p=IU&fp=A92C884087BB0C2549B6E208D61C532A16C2E67B8EA0F0C815461B54BEC3BF03&cb=222.4345.14&jb=17.0.4.1b469.62",
       "httpLink": "https://code-with-me.jetbrains.com/remoteDev#idePath=%2Froot%2F.cache%2FJetBrains%2FRemoteDev%2Fdist%2Fbcbc13aa201fb_ideaIU-2022.2.3&projectPath=%2Froot%2Fworkspace%2Ftest&host=ubuntu&port=22&user=root&type=ssh&deploy=false",
       "gatewayLink": "jetbrains-gateway://connect#idePath=%2Froot%2F.cache%2FJetBrains%2FRemoteDev%2Fdist%2Fbcbc13aa201fb_ideaIU-2022.2.3&projectPath=%2Froot%2Fworkspace%2Ftest&host=ubuntu&port=22&user=root&type=ssh&deploy=false",
       "controllerConnected": false,
       "secondsSinceLastControllerActivity": 65546,
       "backgroundTasksRunning": false,
       "users": [
         "root"
       ]
     }
   ]
 }
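The links in this status report are ordinary URLs whose parameters live in the fragment, so they can be taken apart with the standard library. The sketch below only splits the links; the meaning of the individual keys is not documented in this post. From the values, p looks like the product code, cb matches the build number in appVersion, and jb matches runtimeVersion, while jt and fp are presumably a join token and a fingerprint (my guess).

```python
from urllib.parse import parse_qs, urlsplit

# Links copied from the status JSON above.
JOIN_LINK = (
    "tcp://127.0.0.1:5990#jt=4ca271f2-6c7b-4a7e-8b40-7d82a15b7fe4&p=IU"
    "&fp=A92C884087BB0C2549B6E208D61C532A16C2E67B8EA0F0C815461B54BEC3BF03"
    "&cb=222.4345.14&jb=17.0.4.1b469.62"
)
GATEWAY_LINK = (
    "jetbrains-gateway://connect#idePath=%2Froot%2F.cache%2FJetBrains%2FRemoteDev"
    "%2Fdist%2Fbcbc13aa201fb_ideaIU-2022.2.3&projectPath=%2Froot%2Fworkspace%2Ftest"
    "&host=ubuntu&port=22&user=root&type=ssh&deploy=false"
)


def split_link(link):
    """Return (scheme, host, port, fragment parameters) for a remote-dev link."""
    parts = urlsplit(link)
    # parse_qs also percent-decodes the values, e.g. %2F becomes "/".
    params = {key: values[0] for key, values in parse_qs(parts.fragment).items()}
    return parts.scheme, parts.hostname, parts.port, params


scheme, host, port, params = split_link(JOIN_LINK)
print(scheme, host, port, sorted(params))

_, _, _, gateway_params = split_link(GATEWAY_LINK)
print(gateway_params["projectPath"], gateway_params["host"], gateway_params["port"])
```

Note that the joinLink points at 127.0.0.1:5990 on the server, which is why a port forward is needed before a client on another machine can use it.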

At this point, the IDE backend is alive and waiting for the IDE client to “attend”. Let’s check again which ports the IDE backend is listening on:

 # ====== server side ======
 $ lsof -Pan -i -p 31352
 ... TCP 127.0.0.1:6942 (LISTEN)
 ... TCP 127.0.0.1:63342 (LISTEN)
 ... TCP 127.0.0.1:5990 (LISTEN)
 ... TCP 127.0.0.1:5990->127.0.0.1:55784 (ESTABLISHED)
 ... # Ignore other TCP connections with child process such as MavenRemoteServer, as well as socket wrapper for IDE backend ports
 # Note that we are just listing TCP ports (lsof -i), however SSH tunnel works with UNIX sockets as well.

Here we see that the joinLink port (5990) already has an established connection. Checking the process on the other end shows that it is sshd, which forwards the TCP port through the SSH connection on port 22.

 $ lsof -Pan -i:55784
 ld-linux- 15838 ...  TCP 127.0.0.1:5990->127.0.0.1:55784 (ESTABLISHED)
 sshd      24548 ...  TCP 127.0.0.1:55784->127.0.0.1:5990 (ESTABLISHED)
 # check sshd (pid=24548) opened ports
 $ lsof -Pan -i -p 24548
 ... TCP 119.100.3.19:22-><my-pub-ip>:29495 (ESTABLISHED)
 ... TCP 127.0.0.1:55784->127.0.0.1:5990 (ESTABLISHED)

Client Side

Download and extract JetBrains Client if it is not already present.
Start the JetBrains Client process: open [-n, -W, -a, <path-to-JetBrains Client.app>, --args, thinClient, gwws://<gatewayLink>]
Start local port forwarding. (We will skip the verification steps with lsof on the client side.)

server-to-client workflow

If we start the IDE backend manually on the remote host, we can connect to it from JetBrains Gateway using the httpLink (Code With Me) or the gatewayLink.

One interesting thing to note is that we can skip JetBrains Gateway entirely in such a workflow:

(Server side) Start the IDE backend manually using the remote-dev-server.sh wrapper script, and copy the joinLink URL from its output.
(Client side) Establish local SSH port forwarding manually via ssh -L localhost:5990:localhost:5990 user@remote
(Client side) Start JetBrains Client using the command from the previous log (open '<path-to-JBClient.app>' --args thinClient 'tcp://<joinLink>' ).
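The client-side part of these manual steps can be scripted. The following is a minimal sketch, not what Gateway does internally: it builds the ssh command from step 2 (adding -N, the standard OpenSSH flag for "forward only, run no remote command") and waits until the local end of the tunnel accepts connections before you launch JetBrains Client. The helper names are my own.

```python
import socket
import time


def tunnel_command(remote, port=5990):
    """Build the local port-forwarding command from step 2 as an argv list."""
    # -N: do not execute a remote command, only forward ports.
    return ["ssh", "-N", "-L", f"localhost:{port}:localhost:{port}", remote]


def wait_for_port(host, port, timeout=10.0):
    """Poll until something accepts TCP connections on host:port."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not listening yet, retry shortly
    return False


print(" ".join(tunnel_command("user@remote")))
```

A typical use would be to start the command with subprocess.Popen(tunnel_command("user@remote")), call wait_for_port("127.0.0.1", 5990), and only then start the thin client. Keep in mind that the check only proves the local end of the tunnel is listening; ssh accepts the local connection even if nothing is listening on the remote port.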

Gateway uses about 500 MB of memory on my laptop, so this approach saves some local resources.

But it does come at a cost. According to the JetBrains documentation, Gateway is also in charge of handling reconnection, among other things. We can see an established WebSocket connection on the loopback interface between JetBrains Client and JetBrains Gateway, over which JetBrains Client periodically sends its liveness state to an endpoint provided by the JetBrains Gateway built-in web server.

Security

Security is a big concern these days, and developer tools are a large and valuable target, especially when all your source code is stored remotely and transferred to the local IDE across the network. Splitting the IDE into distributed components also increases the attack surface. The JetBrains documentation covers security in depth for both the Code With Me model and JetBrains Gateway remote development: https://www.jetbrains.com.cn/help/idea/security-model.html

I’m not an expert in the security domain. I tried to decrypt the messages between JetBrains Client and Gateway with Wireshark’s TLS dissector, without success. The closest I got was enabling the JSSE debug flag (adding the VM option -Djavax.net.debug=all to the JetBrains Client). The JetBrains Client log then contains a hex dump of the plaintext, but the printable ASCII characters reveal nothing useful. My only guess is that the binary format is based on the JetBrains/RD protocol.

2022-11-13 17:45:55,497 [  79089]   INFO - STDERR - javax.net.ssl|DEBUG|F2|Flush:tcp://127.0.0.1:56116|2022-11-13 17:45:55.497 CST|SSLEngineOutputRecord.java:280|WRITE: TLSv1.3 application_data, length = 18
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR - javax.net.ssl|DEBUG|F2|Flush:tcp://127.0.0.1:56116|2022-11-13 17:45:55.498 CST|SSLCipher.java:2066|Plaintext before ENCRYPTION (
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR -   0000: 02 02 24 F5 3F 11 32 48   EA 7F 43 45 01 00 00 00  ..$.?.2H..CE....
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR -   0010: FF FF 17 00 00 00 00 00   00 00 00 00 00 00 00 00  ................
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR -   0020: 00 00 00                                           ...
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR - )
2022-11-13 17:45:55,498 [  79090]   INFO - STDERR - javax.net.ssl|DEBUG|F2|Flush:tcp://127.0.0.1:56116|2022-11-13 17:45:55.498 CST|SSLEngineOutputRecord.java:296|Raw write (
2022-11-13 17:45:55,499 [  79091]   INFO - STDERR -   0000: 17 03 03 00 33 E8 85 BA   38 5E 32 68 C0 44 72 F4  ....3...8^2h.Dr.
2022-11-13 17:45:55,499 [  79091]   INFO - STDERR -   0010: DC F6 58 6F 68 57 35 6B   8A 6C C0 66 42 65 9C F5  ..XohW5k.l.fBe..
2022-11-13 17:45:55,499 [  79091]   INFO - STDERR -   0020: 8F AE BF 9A B5 5B 4D 2C   AB 36 00 30 92 7F F1 B4  .....[M,.6.0....
2022-11-13 17:45:55,499 [  79091]   INFO - STDERR -   0030: BC D5 BB 06 19 FF 9F 7A                            .......z
2022-11-13 17:45:55,499 [  79091]   INFO - STDERR - )
2022-11-13 17:45:55,581 [  79173]   INFO - STDERR - javax.net.ssl|DEBUG|03|ProtocolFromChan:tcp://127.0.0.1:56116|2022-11-13 17:45:55.581 CST|SSLEngineInputRecord.java:176|Raw read (
2022-11-13 17:45:55,581 [  79173]   INFO - STDERR -   0000: 17 03 03 00 31 AB 64 7E   99 E1 CA CD 30 D1 F4 38  ....1.d.....0..8
2022-11-13 17:45:55,581 [  79173]   INFO - STDERR -   0010: F6 5B 7C 9B 3A B4 B4 F0   87 5E 97 91 A2 3B 57 06  .[..:....^...;W.
2022-11-13 17:45:55,581 [  79173]   INFO - STDERR -   0020: 61 A2 4E 5C 94 5F CE 36   00 51 29 EC 99 64 67 72  a.N\._.6.Q)..dgr
2022-11-13 17:45:55,581 [  79173]   INFO - STDERR -   0030: CF 76 A6 32 14 93                                  .v.2..
2022-11-13 17:45:55,582 [  79174]   INFO - STDERR - )
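Even without understanding the payload, the outer framing of the “Raw write” dump is readable. The sketch below parses the 5-byte TLS record header (content type, legacy version, length) as defined by the TLS record layer. In TLS 1.3, encrypted records are always sent with outer type application_data and legacy version 0x0303, which matches the dump. The function name is my own.

```python
import struct

# "Raw write" bytes copied from the JSSE debug log above.
RAW_WRITE_HEX = """
17 03 03 00 33 E8 85 BA 38 5E 32 68 C0 44 72 F4
DC F6 58 6F 68 57 35 6B 8A 6C C0 66 42 65 9C F5
8F AE BF 9A B5 5B 4D 2C AB 36 00 30 92 7F F1 B4
BC D5 BB 06 19 FF 9F 7A
"""

CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}


def parse_record_header(record: bytes):
    """Decode the 5-byte TLS record header: type (1), version (2), length (2)."""
    content_type, version, length = struct.unpack("!BHH", record[:5])
    return CONTENT_TYPES.get(content_type, "unknown"), version, length


record = bytes.fromhex(RAW_WRITE_HEX)
kind, version, length = parse_record_header(record)
# The declared length should equal the number of bytes after the header.
print(kind, hex(version), length, len(record) - 5)
```

The header declares 51 bytes, which is exactly what follows it in the dump, so each “Raw write” block here is one complete TLS record.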

Summary

The experiment confirms that, in this setup, the JetBrains Client and the IDE backend are connected directly without a relay, and that the traffic is encrypted end to end through a TLS socket inside an SSH tunnel. Although JetBrains Gateway is not open source, we can infer some of its functionality by looking through its extension points and plugin development API.

JetBrains has also done a lot of solid but undocumented work on production-ready features for remote development. For example, I found an indexing diagnostics report for each project, which is useful for performance analysis.

Thank you so much for reading. This post is also a part of a remote development series:

1. Gitpod workspace with JetBrains Gateway
2. Gitpod Self-hosted installation on Tencent Cloud
3. How JetBrains Gateway works - beyond the black box