Juniper Switch SSH 'Expecting SSH2_MSG_KEX_ECDH_REPLY' Error

I was trying to SSH into a Juniper switch running JunOS 12.3, but the connection would fail with Connection closed by <switch IP>. Running ssh -vvv showed where it stopped: debug1: expecting SSH2_MSG_KEX_ECDH_REPLY. This is a nice rabbit hole to go down, as we’re going to use SSH’s verbose mode, a packet capture and nmap to debug the problem.

Into the archives we must go

Looking at the switch’s logs after an SSH attempt shows: sshd[1521]: fatal: ssh_dispatch_run_fatal: Connection to <client ip address>: unexpected internal error [preauth]

Ok, so we have both the client and server pieces of the puzzle. The fix doesn’t immediately stand out to me, so we’re going to take the well-trodden path of googling it and mixing in past experience with nmap.

At face value it looks like some sort of key exchange error. Since this Juniper is running an older version of SSH, we could assume that the switch and the client can’t agree on a key exchange method or cipher to use; most likely the client’s SSH has rotated old encryption algorithms out. But that would normally give an error like Unable to negotiate with <IP>: no matching key exchange method found. Their offer: ..., whereas here we just get Connection closed. So this goes a bit deeper.

We can use SSH in verbose mode to see exactly what each end offers and where the hang-up is:

SSH verbose mode

Client to Server (local client KEXINIT proposal):
ssh -vvv username@192.168.1.1

debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: ciphers ctos: aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,aes256-ctr,aes128-gcm@openssh.com,aes128-ctr
<snipped>

Server to Client response:

debug2: KEX algorithms: curve25519-sha256@libssh.org
debug2: peer server KEXINIT proposal
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,rijndael-cbc@lysator.liu.se
<snipped>

Then the agreed upon encryption:

debug1: kex: algorithm: curve25519-sha256@libssh.org
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: aes256-gcm@openssh.com MAC: <implicit> compression: none
debug1: kex: curve25519-sha256@libssh.org need=32 dh_need=32
debug1: kex: curve25519-sha256@libssh.org need=32 dh_need=32
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
Connection closed by Switch port 22

This all looks above board for a normal SSH connection so far. But notice the last message: the remote end closed the connection just when it was the switch’s turn to reply.
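The cipher in the kex lines above is not a random pick. Per RFC 4253 (section 7.1), the chosen cipher is the first one in the client's list that also appears in the server's list. Here is a minimal sketch of that rule, using the two "ciphers ctos" lists from the debug output (the server list is shortened, and the function name negotiate is just for illustration):

```python
def negotiate(client_prefs, server_offer):
    """Return the first client algorithm that the server also lists, or None."""
    offered = set(server_offer.split(","))
    for name in client_prefs.split(","):
        if name in offered:
            return name
    return None


# "ciphers ctos" lists copied from the debug2 lines (server list shortened).
client = ("aes256-gcm@openssh.com,chacha20-poly1305@openssh.com,"
          "aes256-ctr,aes128-gcm@openssh.com,aes128-ctr")
server = ("chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,"
          "aes128-gcm@openssh.com,aes256-gcm@openssh.com")

print(negotiate(client, server))  # aes256-gcm@openssh.com
```

Because the client's order decides the outcome, changing what the client offers changes the cipher without touching the switch.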

For completeness’ sake, let’s look at what a working connection shows at this point so we can compare:

debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug3: receive packet: type 31
debug1: SSH2_MSG_KEX_ECDH_REPLY received

We need to dig deeper on this…

Let’s look at the network with a packet capture. Interestingly, the start of the session is all in plain text (this is the part where the shared secret is being negotiated).

Wireshark debug

Packet capture from the start of initiating an SSH connection:

Source           Destination        Protocol Length Sequence Number Info
Client           Switch             TCP      74     0               50604 → 22 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=4271094762 TSecr=0 WS=128
Switch           Client             TCP      78     0               22 → 50604 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=2 TSval=2740708411 TSecr=4271094762 SACK_PERM=1
Client           Switch             TCP      66     1               50604 → 22 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4271094766 TSecr=2740708411
Client           Switch             SSHv2    87     1               Client: Protocol (SSH-2.0-OpenSSH_8.8)
Switch           Client             TCP      66     1               22 → 50604 [ACK] Seq=1 Ack=22 Win=66586 Len=0 TSval=2740708516 TSecr=4271094767
Switch           Client             SSHv2    87     1               Server: Protocol (SSH-2.0-OpenSSH_6.9)
Client           Switch             TCP      66     22              50604 → 22 [ACK] Seq=22 Ack=22 Win=64256 Len=0 TSval=4271095015 TSecr=2740708659
Client           Switch             SSHv2    1426   22              Client: Key Exchange Init
Switch           Client             TCP      1514   22              22 → 50604 [ACK] Seq=22 Ack=1382 Win=65248 Len=1448 TSval=2740708692 TSecr=4271095015 [TCP segment of a reassembled PDU]
Client           Switch             TCP      66     1382            50604 → 22 [ACK] Seq=1382 Ack=1470 Win=64128 Len=0 TSval=4271095046 TSecr=2740708692
Switch           Client             SSHv2    266    1470            Server: Key Exchange Init
Client           Switch             TCP      66     1382            50604 → 22 [ACK] Seq=1382 Ack=1670 Win=64128 Len=0 TSval=4271095047 TSecr=2740708692
Client           Switch             SSHv2    114    1382            Client: Elliptic Curve Diffie-Hellman Key Exchange Init (Type 30)
>Switch           Client             TCP      66     1670            22 → 50604 [FIN, ACK] Seq=1670 Ack=1430 Win=66608 Len=0 TSval=2740708705 TSecr=4271095048
Client           Switch             TCP      66     1430            50604 → 22 [FIN, ACK] Seq=1430 Ack=1671 Win=64128 Len=0 TSval=4271095062 TSecr=2740708705
Switch           Client             TCP      66     1671            22 → 50604 [ACK] Seq=1671 Ack=1431 Win=66606 Len=0 TSval=2740708710 TSecr=4271095062

In the capture above, the line marked with > is where the switch sends a [FIN, ACK], which coincides with the point where the SSH debug output stops.

Below is another capture, this time of a working session. It shows the Server: Elliptic Curve Diffie-Hellman Key Exchange Reply, New Keys packet that we’re not receiving in the failing session.

Source           Destination         Protocol Length Sequence Number Info
Client           Switch              TCP      74     0               50606 → 22 [SYN] Seq=0 Win=64240 Len=0 MSS=1460 SACK_PERM=1 TSval=4272246787 TSecr=0 WS=128
Switch           Client              TCP      78     0               22 → 50606 [SYN, ACK] Seq=0 Ack=1 Win=65535 Len=0 MSS=1460 WS=2 TSval=2741860439 TSecr=4272246787 SACK_PERM=1
Client           Switch              TCP      66     1               50606 → 22 [ACK] Seq=1 Ack=1 Win=64256 Len=0 TSval=4272246789 TSecr=2741860439
Client           Switch              SSHv2    87     1               Client: Protocol (SSH-2.0-OpenSSH_8.8)
Switch           Client              TCP      66     1               22 → 50606 [ACK] Seq=1 Ack=22 Win=66586 Len=0 TSval=2741860543 TSecr=4272246790
Switch           Client              SSHv2    87     1               Server: Protocol (SSH-2.0-OpenSSH_6.9)
Client           Switch              TCP      66     22              50606 → 22 [ACK] Seq=22 Ack=22 Win=64256 Len=0 TSval=4272247042 TSecr=2741860691
Client           Switch              SSHv2    1258   22              Client: Key Exchange Init
Switch           Client              TCP      1514   22              22 → 50606 [ACK] Seq=22 Ack=1214 Win=65416 Len=1448 TSval=2741860723 TSecr=4272247042 [TCP segment of a reassembled PDU]
Switch           Client              SSHv2    266    1470            Server: Key Exchange Init
Client           Switch              TCP      66     1214            50606 → 22 [ACK] Seq=1214 Ack=1470 Win=64128 Len=0 TSval=4272247074 TSecr=2741860723
Client           Switch              TCP      66     1214            50606 → 22 [ACK] Seq=1214 Ack=1670 Win=64000 Len=0 TSval=4272247074 TSecr=2741860724
Client           Switch              SSHv2    114    1214            Client: Elliptic Curve Diffie-Hellman Key Exchange Init (Type 30)
Switch           Client              TCP      66     1670            22 → 50606 [ACK] Seq=1670 Ack=1262 Win=66608 Len=0 TSval=2741860830 TSecr=4272247075
Switch           Client              SSHv2    274    1670            Server: Elliptic Curve Diffie-Hellman Key Exchange Reply, New Keys (Type 31)
Client           Switch              TCP      66     1262            50606 → 22 [ACK] Seq=1262 Ack=1878 Win=64128 Len=0 TSval=4272247252 TSecr=2741860902
Client           Switch              SSHv2    82     1262            Client: New Keys
Switch           Client              TCP      66     1878            22 → 50606 [ACK] Seq=1878 Ack=1278 Win=66608 Len=0 TSval=2741861008 TSecr=4272247255
Client           Switch              SSHv2    134    1278            Client: Encrypted packet (len=68)

Time to start digging into how an SSH handshake works by looking at the RFC docs, section 3. There is also a good post about the handshake (1).

So briefly, an SSH handshake has the following steps (see link 1 above for more details):

  1. SSH version exchange (simple: SSH-2.0-OpenSSH_8.8)
  2. Key Exchange (client and server exchange all info like algorithms, referred to as a KEXINIT. See sshdebug: SSH2_MSG_KEXINIT sent. Also viewable in a packet capture!)
  3. Elliptic Curve Diffie-Hellman Init (client sends its public key. Compare the debug output and packet capture, see sshdebug: send packet: type 30)
  4. Elliptic Curve Diffie-Hellman Reply (server generates keys using the client’s key plus a lot more. This is the reply we expect from the switch, see sshdebug: receive packet: type 31)
  5. Client/Server New Keys (generate new distinct keys for encryption and integrity)

Off the bat, we can now see that the switch doesn’t reply at step 4; instead it just closes the connection with a FIN. We’re expecting the switch to reply with the information needed to set up the connection. Step 4 is very important, as there are a lot of moving parts required to create keys and secrets. If that fails, it’s game over.
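The packet type numbers in the debug3 lines map directly onto these steps: 20 and 21 are KEXINIT and NEWKEYS (RFC 4253), 30 and 31 are the ECDH init and reply (RFC 5656). As a rough sketch, a few lines of Python can pull the last packet out of saved ssh -vvv output to show where a handshake stopped. The helper name last_packet is made up for this example, and it assumes the debug3 line format shown earlier:

```python
import re

# SSH message numbers relevant to an ECDH key exchange.
MESSAGE_NAMES = {
    20: "SSH2_MSG_KEXINIT",
    21: "SSH2_MSG_NEWKEYS",
    30: "SSH2_MSG_KEX_ECDH_INIT",
    31: "SSH2_MSG_KEX_ECDH_REPLY",
}

PACKET_RE = re.compile(r"debug3: (send|receive) packet: type (\d+)")


def last_packet(debug_output):
    """Return (direction, type, name) for the last packet line, or None."""
    last = None
    for line in debug_output.splitlines():
        match = PACKET_RE.search(line)
        if match:
            number = int(match.group(2))
            last = (match.group(1), number, MESSAGE_NAMES.get(number, "unknown"))
    return last


failing = """debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
Connection closed by Switch port 22"""

print(last_packet(failing))  # ('send', 30, 'SSH2_MSG_KEX_ECDH_INIT')
```

If the last packet is a sent type 30 with no received type 31 after it, the session died at step 4.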

Try different algorithms?

So, next we know that other clients such as PuTTY can still connect to the switch (I used one for the working excerpts above), but a Linux (Fedora 36) SSH client using OpenSSH can’t. This leads me to look at which algorithms each of them uses.

I repeated the debug process from a Windows 10 box using PuTTY, but this time starting with a packet capture because PuTTY’s debug logging is bad. After the switch replied (step 4), Wireshark shows the SSH info as: SSH Version 2 (encryption:aes256-ctr mac:hmac-sha2-256 compression:none). In the capture from the Linux box that field is empty: the switch never sent its Elliptic Curve Diffie-Hellman Reply (step 4), so Wireshark had nothing to show. So it looks like aes256-ctr works and aes256-gcm@openssh.com doesn’t.

Trying a different cipher from the Fedora box results in a working connection: ssh -c chacha20-poly1305@openssh.com user@IP

Running ssh -Q cipher lists the ciphers your client supports, which is handy here if you don’t want to dig through the SSH debug output.
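Trying ciphers one at a time by hand gets tedious, so here is a hedged sketch that automates it. It forces each cipher with -c and checks the debug output for the SSH2_MSG_KEX_ECDH_REPLY received line seen in the working session. The function names and the target are placeholders, and the check assumes an ECDH-style key exchange (such as curve25519) is negotiated, as it was on this switch:

```python
import subprocess


def probe_command(target, cipher, timeout=5):
    """Build an ssh command that forces one cipher and never prompts."""
    return [
        "ssh", "-v",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout}",
        "-c", cipher,
        target, "exit",
    ]


def kex_completed(stderr_text):
    """True if the debug output shows the server's ECDH reply arriving."""
    return "SSH2_MSG_KEX_ECDH_REPLY received" in stderr_text


def probe(target, ciphers):
    """Try each cipher in turn and record whether the key exchange completed."""
    results = {}
    for cipher in ciphers:
        proc = subprocess.run(probe_command(target, cipher),
                              capture_output=True, text=True)
        results[cipher] = kex_completed(proc.stderr)
    return results


def main(target):
    # `ssh -Q cipher` prints one supported cipher per line.
    local = subprocess.run(["ssh", "-Q", "cipher"],
                           capture_output=True, text=True)
    for cipher, ok in probe(target, local.stdout.split()).items():
        print(f"{cipher}: {'ECDH reply received' if ok else 'no ECDH reply'}")


# Example with a placeholder target: main("user@192.0.2.10")
```

A cipher the switch doesn’t offer at all also shows up as "no ECDH reply", so compare the results against the server’s cipher list before blaming a particular cipher.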

Conclusion

Ultimately, to connect via SSH from a Linux box I now use the modified ssh parameters above as the workaround. You can also set that cipher in your ssh_config for this particular host.
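For the ssh_config route, a per-host stanza with a Ciphers line does the job. The sketch below simply prints such a stanza so it can be pasted into ~/.ssh/config. The alias ex-switch is a made-up name, the address is the example one from earlier, and the two ciphers are the ones that worked in the tests above:

```python
def host_block(alias, hostname, ciphers):
    """Render an ssh_config Host stanza that pins the cipher list for one host."""
    lines = [
        f"Host {alias}",
        f"    HostName {hostname}",
        f"    Ciphers {','.join(ciphers)}",
    ]
    return "\n".join(lines) + "\n"


print(host_block("ex-switch", "192.168.1.1",
                 ["chacha20-poly1305@openssh.com", "aes256-ctr"]), end="")
# Host ex-switch
#     HostName 192.168.1.1
#     Ciphers chacha20-poly1305@openssh.com,aes256-ctr
```

With that in place, ssh username@ex-switch should negotiate chacha20-poly1305 without needing the -c flag, and other hosts keep the client’s default cipher order.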

Why it’s failing at step 4 isn’t clear, and it might be something on Juniper’s side. While digging around, I came across a mailing list post about this exact same issue. They came to the same conclusion: using the aes128/256-gcm@openssh.com ciphers is causing the drama. Interesting to note that apparently JunOS 12.3R12-S12 doesn’t include the aes128/256-gcm ciphers, while JunOS 12.3R12-S13.1 and above do. (This switch is running S15.)

If it is on the Juniper side, that also makes it harder to debug, but it’s worth a further look down the line after getting more familiar with JunOS. It does have a CLI and is based on FreeBSD. From a quick look, there’s what appears to be the sshd_conf in /var/etc, which might accept the LogLevel option. But the switch is in production, and just changing the ciphers on the client side isn’t a bad workaround.

This was still a good rabbit hole to go down in the end. I learnt how to debug SSH, how it works, and what the client and server expect at each step, with debug output and a packet capture to go along with it.

Bonus content

We can also use the handy nmap to show all the available keys, ciphers and MACs offered by the target device.

nmap -sV --script ssh2-enum-algos -p 22 <IP>

Nmap scan report for <IP>
Host is up (0.0030s latency).
PORT   STATE SERVICE VERSION
22/tcp open  ssh     OpenSSH 6.9 (protocol 2.0)
| ssh2-enum-algos: 
|   kex_algorithms: (8)
|       curve25519-sha256@libssh.org
|       ecdh-sha2-nistp256
|       ecdh-sha2-nistp384
|       ecdh-sha2-nistp521
|       diffie-hellman-group-exchange-sha256
|       diffie-hellman-group14-sha1
|   server_host_key_algorithms: (4)
|       ssh-rsa
|       ecdsa-sha2-nistp256
|       ssh-ed25519
|   encryption_algorithms: (16)
|       chacha20-poly1305@openssh.com
|       aes192-ctr
|       aes256-ctr
|       aes128-gcm@openssh.com
|       aes256-gcm@openssh.com
|   mac_algorithms: (19)
|       hmac-sha2-256-etm@openssh.com
|       hmac-sha2-512-etm@openssh.com
|       umac-64@openssh.com
|       umac-128@openssh.com
|       hmac-sha2-256
|       hmac-sha2-512
|       hmac-md5-etm@openssh.com
|   compression_algorithms: (2)
|       none
|_      zlib@openssh.com

Excellent how SSH works
Unable to SSH or Telnet to EX switch - Juniper Knowledge base
A handy comparison chart site I found for compatibility of ciphers
Mozilla OpenSSH recommendations

A note for future devices: there are a lot of older legacy algorithms worth disabling, leaving only trusted ones enabled.

For JunOS EX2200c, the commands are:
edit system services ssh
set ciphers [ "chacha20-poly1305@openssh.com" aes256-ctr aes192-ctr aes128-ctr ]

set macs [ hmac-sha2-512 "hmac-sha2-512-etm@openssh.com" hmac-sha2-256 "hmac-sha2-256-etm@openssh.com" ]

set key-exchange [ curve25519-sha256 ecdh-sha2-nistp521 ecdh-sha2-nistp384 ecdh-sha2-nistp256 ]
