If the author is reading this, I think a single repository would be more appropriate than multiple repos [1]. It would be nice to set things up so we can simply git pull, docker run, and execute the benchmarks for each language sequentially.
Something that stood out to me is the author’s conclusion that "Node.js wins." However, both the Node.js and C++ versions use the same library, uWebSockets! I suspect the actual takeaway is this:
"uWebSockets wins, and the uWebSockets authors know their library well enough that even their JavaScript wrapper outperforms my own implementation in plain C++ using the same library!" :-p
Makes me wonder if there’s something different that could be done in Go to achieve better performance. Alternatively, this may highlight which language/library makes it easier to do the right thing out of the box (for example, it seems easier to use uWebSockets in Node.js than in C++). TechEmpower controversies also come to mind, where "winning" implementations often don’t reflect how developers typically write code in a given language, framework, or library.
--
1: https://github.com/matttomasetti?tab=repositories&q=websocke...
Their explanation for why Go performs badly didn't make any sense to me. I'm not sure if they don't understand how goroutines work, if I don't understand how goroutines work or if I just don't understand their explanation.
Also, in the end, they didn't use the JSON payload. It would have been interesting if they had just written a static string. I'm curious how much of this is really measuring JSON [de]serialization performance.
Finally, it's worth pointing out that WebSocket is a standard. It's possible that some of these implementations follow the standard better than others. For example, WebSocket requires that a text message be valid UTF-8. Personally, I think that's a dumb requirement (and in my own websocket server implementation for Zig, I don't enforce this - if the application wants to, it can). But it's completely possible that some implementations enforce this and others don't, and that (along with every other check) could make a difference.
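To make the UTF-8 point concrete, here is a minimal Go sketch of what a strict versus lenient server might do with a text payload. `acceptTextPayload` and its `strict` flag are names I made up for illustration; they are not from any of the benchmarked libraries, and a real server would also have to validate messages that arrive split across fragments.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// acceptTextPayload reports whether payload may be delivered as a WebSocket
// text message. With strict set, it applies the UTF-8 requirement from
// RFC 6455; without it, any bytes are accepted (the lenient behaviour).
// Hypothetical helper: it checks a whole, already reassembled message.
func acceptTextPayload(payload []byte, strict bool) bool {
	if !strict {
		return true
	}
	return utf8.Valid(payload)
}

func main() {
	good := []byte("héllo")
	bad := []byte{0xff, 0xfe, 0xfd}

	fmt.Println(acceptTextPayload(good, true)) // true
	fmt.Println(acceptTextPayload(bad, true))  // false
	fmt.Println(acceptTextPayload(bad, false)) // true
}
```

The strict path scans every byte of every text message, so a server that skips it does strictly less work per message. Whether that shows up in these numbers is a guess on my part.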
> Their explanation for why Go performs badly didn't make any sense to me.
To me, the whole paper is full of misunderstanding, at least the analysis. There's just speculation based on caricatures of the language, like "node is async", "c++ is low level" etc. The fact that their C++ impl using uWebSocket was significantly slower than Node, which used uWebSocket bindings, should have led them to question the test setup (they probably used threads, which defeats the purpose of uWebSocket).
Anyway.. The "connection time" is just HTTP handshake. It could be included as a side note. What's important in WS deployments are:
- Unique message throughput (the only thing measured afaik).
- Broadcast/"multicast" throughput, i.e. say you have 1k subscribers you wanna send the same message.
- Idle memory usage (for say chat apps that have low traffic - how many peers can a node maintain)
To me, the champion is uWebSocket. That's the entire reason why "Node" wins - those language bindings were written by the same genius who wrote that lib. Note that uWebSocket doesn't have TLS support, so whatever reverse proxy you put in front is gonna dominate usage because all of them have higher overheads, even nginx.
Interesting to note is that uWebSocket perf (especially memory footprint) can't be achieved even in Go, because of the goroutine overhead (there's no way in Go to read/write from multiple sockets from a single goroutine, so you have to spend 2 goroutines for realtime r/w). It could probably be achieved with Tokio though.
> I'm curious how much of this is really measuring JSON [de]serialization performance.
Well, they did use the standard library for that, so quite a bit, I suppose. That thing is slow. I've got no idea how fast those functions are in other languages, but you're right that it would ruin the idea behind the benchmark.
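If anyone wants to check how much of the per-message cost is serialization, a rough Go sketch like the one below would do it. The `reply` struct and its fields are hypothetical (I have not matched them to the benchmark's actual payload), and a plain timing loop like this is only a ballpark; `testing.B` would be the proper tool.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// reply is a hypothetical payload shape; the benchmark's real fields may differ.
type reply struct {
	Count int   `json:"c"`
	Time  int64 `json:"ts"`
}

// staticReply is the pre-encoded alternative: no per-message serialization.
var staticReply = []byte(`{"c":1,"ts":0}`)

// encodeReply builds the response through encoding/json on every call.
func encodeReply(count int, ts int64) ([]byte, error) {
	return json.Marshal(reply{Count: count, Time: ts})
}

// timeLoop runs f n times and returns the elapsed wall-clock time.
func timeLoop(n int, f func()) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		f()
	}
	return time.Since(start)
}

func main() {
	const n = 1000000
	var sink []byte // holds the last result so the work is not discarded

	withJSON := timeLoop(n, func() { sink, _ = encodeReply(1, 0) })
	withStatic := timeLoop(n, func() { sink = staticReply })

	fmt.Println("encoding/json:", withJSON)
	fmt.Println("static bytes: ", withStatic)
	_ = sink
}
```

The gap between the two lines is the upper bound on what the benchmark could gain in Go by sending a static string instead of encoding on every message.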
They didn’t use goroutines, which explains the poor perf. https://github.com/matttomasetti/Go-Gorilla_Websocket-Benchm...
Also, this paper is from Feb 2021.
I was under the impression that the underlying net/http library uses a new goroutine for every connection, so each websocket gets its own goroutine. Or is there somewhere else you were expecting goroutines in addition to the one per connection?
Which is perfectly fine. However, you will be able to process only a single message per connection at once.
What you would do in go is:
- either a new goroutine per message
- or installing a worker pool with a predefined number of goroutines accepting messages for processing
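A minimal sketch of the worker-pool variant, using only the standard library. A slice of strings stands in for the messages read off one connection, and `handle` is a placeholder for the real per-message work; both are my assumptions, not code from the benchmark repo.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// handle stands in for per-message work (hypothetical; the benchmark's
// handler does JSON decoding and encoding instead).
func handle(msg string) string {
	return strings.ToUpper(msg)
}

// processWithPool fans messages out to a fixed number of worker goroutines
// and returns the results indexed like the input, so ordering is preserved
// even though the work itself runs concurrently.
func processWithPool(msgs []string, workers int) []string {
	if workers < 1 {
		workers = 1 // at least one worker, otherwise the sends below would block forever
	}
	type job struct {
		idx int
		msg string
	}
	jobs := make(chan job)
	out := make([]string, len(msgs))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				out[j.idx] = handle(j.msg) // each index is written by exactly one worker
			}
		}()
	}

	for i, m := range msgs {
		jobs <- job{idx: i, msg: m}
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	fmt.Println(processWithPool([]string{"a", "b", "c"}, 2)) // [A B C]
}
```

This only pays off when `handle` is expensive or blocks on IO. For a cheap handler, the channel hand-off can easily cost more than it saves.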
Another option is to have a read-pump and a write-pump goroutine associated with each gorilla ws client. I found this useful for gateways wss <--> *.
http.ListenAndServe is implemented under the hood with a new goroutine per incoming connection. You don't have to explicitly use goroutines here, it's the default behaviour.
Yes _however_ the nodejs benchmark at least is handling each message asynchronously, whereas the go implementation is only handling connections asynchronously.
The client fires off all the requests before waiting for a response: https://github.com/matttomasetti/NodeJS_Websocket-Benchmark-... so the comparison isn't quite apples to apples.
Edit to add: looks like the same goes for the C++ and Rust implementations. So I think what we might be seeing in this benchmark (particularly Node vs C++, since it is the same library) is that asynchronously handling each message is beneficial, and the Go standard library's JSON parser is slow.
Edit 2: Actually I think the C++ version is async for each message! Don't know how to explain that then.
Well, tcp streams are purely sequential. It’s the ideal use case for a single process, since messages can’t be received out of order. There’s no computational advantage to “handling each message asynchronously” unless the message handling code itself does IO or something. And that’s not the responsibility of the websocket library.
Yeah I thought this looked familiar.. I went through this article about a year and a half ago when exploring WebSockets in Python for work. With some tuning and using different libraries + libuv we were easily able to get similar performance to NodeJS.
I had a blog post somewhere to show the testing and results, but can't seem to find it at the moment though.
Interesting that https://github.com/uNetworking/uWebSockets.js (which is C++ with node bindings) outperforms the raw C++ uWebSockets implementation.
It's also interesting that https://github.com/websockets/ws does not appear in this study, given that in the node ecosystem it is ~3x more likely to be used (not a perfect measurement but ws has 28k github stars vs uWebSockets 8k stars)
Thanks for the free access links. I did read through a bit.
The title is misleading because exactly one implementation was chosen for each of the tested languages. They conclude “do not use Python” because the Python websockets library performs pretty poorly.
Each language is scored based on the library chosen. I have to believe there are more options for some of these languages.
As someone who is implementing an Elixir LiveView app right now, I was particularly curious to see how Elixir performed given LiveView's reliance on websockets, but Elixir didn’t make the cut.
Was also surprised they omitted Elixir/Erlang from the list of languages. Crazy considering how many messaging apps use OTP on the backend.
Is this a peer reviewed paper? It does not seem to be. At a first glance, the researchgate URI and the way the title was formulated made me think it would be the case.
(2021) Was surprised it used a deprecated Rust crate until I noticed how out of date it is
Not including Swift in such research seems to be a big oversight to me.
Too bad that uWebsockets was used for Node because a lot of higher level libraries are built on top of https://www.npmjs.com/package/ws
I was able to make a uWebsockets adapter for NestJS pretty easily. It's a bit sensitive of a library to integrate though, a single write when the connection is gone and you get a segfault, which means a lot of checking before writing if you've yielded since you last checked. This was a few years ago, perhaps they fixed that.
I'd like to know why Elixir and Erlang were excluded.
Seems the author went silent after this, maybe he decided to run a café or something instead.
Was this published as-is to some sort of prominent CS journal? I honestly can't tell from the link. If that's the case, I'm very disappointed and would have a few choice words about the state of "academia".
Yes, that would be concerning indeed...
The author couldn't tell why he didn't manage to get the C or Python program to run, but figured the language is probably to blame for some obscure reason.
He also mentioned that he should have implemented multithreading in C++ to be comparable with Node, but meh, that's probably not his concern either, let's compare them as is ^^
Also he doesn't mention the actual language of the library used, but that would have voided the interest of the article, so I can quite understand that omission :P
But in the end, nothing can be learned from this, and it is hard to believe this is what "research" can produce.
Yeah it’s a rubbish paper. It’s just a comparison of some websocket implementations at some particular point in time. It tells you how fast some of the fastest WS implementations are in absolute terms, but there are no broad conclusions you can make other than the fact that there’s more room for optimisation in a few libraries. Whoopty doo. News at 11.
The SSRN link doesn’t have a login-wall: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3778525
Thanks! Here's the direct link to the ungated PDF: https://download.ssrn.com/21/02/03/ssrn_id3778525_code456891...
TL;DR: NodeJS is the clear winner, and Python far and away the worst of the bunch.
The DX for websockets in Go (gorilla) is horrible. But I do not believe these numbers one bit.