Inspect ANSI control codes and escape sequences

96 points by webpro 4 days ago

SpaceL10n 11 hours ago

The things they don't prepare you for in school...

I was working at my first job and we had a ColdFusion app that was displaying some data from the database. I get a ticket one day saying our search page would crash when searching for a very specific document. The other 1 million+ documents all loaded fine to our knowledge, so why this one?

I was pretty junior back then and feeling mighty defeated as to why I couldn't figure it out. I debugged every single line and condition, trying to find some reason. After ruling out the code as a culprit, I took the data we were loading and placed it into Notepad++. Don't remember why exactly. I was wracking my brain trying to come up with an explanation and lazily moving the text cursor left and right through the text, mostly out of boredom and despair.

That's when I noticed that I had pressed the right arrow key on my keyboard and the text cursor position hadn't changed! I pressed it again and nothing. Again, nothing. It took eight key presses to move the text cursor from one letter in a word to the adjacent letter. I was utterly bamboozled. Why was the text cursor getting stuck in the middle of this word?!

Shortly thereafter, I discovered the "Show all hidden characters" setting in the menu. I toggled it and, sure enough, there were little black boxes with weird three-letter strings in them: NUL, ESC, and others, right where my cursor was getting hung up.

That was the day I learned about ANSI control characters and the importance of data sanitization.
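
For anyone who wants that "show hidden characters" view outside an editor, here is a minimal Python sketch. The function name `reveal_controls` and the sample string are made up for illustration; it simply maps C0 control characters and DEL onto the Unicode "control pictures" block so they become visible.

```python
def reveal_controls(text: str) -> str:
    """Replace C0 control characters and DEL with visible Unicode control pictures."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x20:
            # U+2400..U+241F mirror the C0 controls (NUL, ..., ESC, ..., US)
            out.append(chr(0x2400 + code))
        elif code == 0x7F:
            out.append("\u2421")  # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical sample: a word with NUL and an escape sequence hiding inside it
sample = "docu\x00\x1b[0mment"
print(reveal_controls(sample))  # docu␀␛[0mment
```

Note that this also turns tabs and newlines into pictures, which may or may not be what you want when eyeballing multi-line data.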

wpm 6 hours ago

Similarly, I once copied a shell script out of Slack and saw a bunch of red errors from my IDE when I pasted it in. The errors were on every line that had a new line on it. The error was "â: Command not found", despite there being no such character on the line.
Pasted it into a hex editor, tracked down the bytes, and while I can't currently remember specifically what the encoding problem was, it was something to do with going between UTF-8 > ISO-8859 > UTF-8 again.
I've since aliased `pbpaste | xxd` (macOS; Linux has similar CLI tools for working with the clipboard, depending on your distro/DE), because weird shit like this comes up more often than I'd care to admit. The last rabbit hole was discovering that in macOS 15, Apple changed one of the "space" characters in the default Screenshot file names, but only if your Mac is set to use 12-hour time: the space between the time and AM/PM went from a normal ASCII 0x20 space to a Unicode U+202F NNBSP (narrow no-break space), which was causing S3 uploads to fail.
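
One plausible way a stray "â" can appear (an assumption, since the exact cause above was not recalled): a multi-byte UTF-8 character whose first byte is 0xE2 gets misread as ISO-8859-1. The sketch below uses U+202F as the example character; the helper names `mojibake` and `repair` are hypothetical.

```python
def mojibake(text: str) -> str:
    """Encode as UTF-8, then misread those bytes as ISO-8859-1 (Latin-1)."""
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading, assuming exactly one UTF-8 -> Latin-1 round trip."""
    return garbled.encode("latin-1").decode("utf-8")


nnbsp = "\u202f"  # NARROW NO-BREAK SPACE
garbled = mojibake(nnbsp)

print(nnbsp.encode("utf-8").hex(" "))        # e2 80 af
print([f"U+{ord(c):04X}" for c in garbled])  # ['U+00E2', 'U+0080', 'U+00AF']
```

U+00E2 is "â", followed by an invisible C1 control and a macron, which roughly matches an error message that complains about a character you cannot see on the line.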
txdv 8 hours ago

There are also zero-width space characters; yeah, UTF is a rabbit hole
- webpro 7 hours ago
  
  Emojis and other unicode characters may or may not be rendered as a single-width character. I've been splitting hairs and strings.
  The tool currently counts any unicode character as a single-width one.
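
To illustrate why "one character, one cell" breaks down, here is a rough Python sketch of a width estimate using only the standard `unicodedata` module. The names `cell_width` and `text_width` are made up, the rules are a heuristic rather than what any particular terminal or the tool above does, and it is not grapheme-cluster aware (ZWJ emoji sequences will be overcounted).

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Heuristic terminal cell width of a single code point: 0, 1 or 2."""
    # Combining marks and format characters (e.g. U+200B ZERO WIDTH SPACE)
    # are treated as occupying no cell of their own.
    if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide / Fullwidth characters usually take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def text_width(text: str) -> int:
    """Sum of the per-code-point estimates."""
    return sum(cell_width(ch) for ch in text)


print(text_width("abc"))        # 3
print(text_width("a\u200bb"))   # 2, the zero-width space adds nothing
print(text_width("e\u0301"))    # 1, base letter plus combining accent
print(text_width("\u65e5\u672c"))  # 4, two wide CJK characters
```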
ddd34drf3 5 hours ago

CudaText is better than Notepad++ in this regard. It shows ASCII control chars always. The option for "unprinted chars" only hides "arrows" over spaces/tabs.

webpro 4 days ago

Working with and debugging ANSI control codes and escape sequences can be a challenge.

This free web-based tool helps to inspect the input, visualize colors and styling, and list control codes. By using a proper tokenizer and parser (not just regex hacks), it supports all sorts of control codes. The parser is open source and available too (find links in "about").

Type or paste text in the black text area, or try out the examples. Use the lookup table to filter & find specific codes.

Feedback welcome, I’d love to know what’s confusing, missing, or especially useful.

michaelmior 12 hours ago

Very cool! Seems like this should be a Show HN post.

tronster 11 hours ago

This is a fantastic web util; bookmarked for the future.

I wish I had this when I was making [Dragon's Oven](https://tronster.itch.io/dragon). It was a lot of nights and weekends of tinkering with ANSI codes in TypeScript. I learned a lot that surprised me, such as: most modern OSes still don't support 16m colors out of the box, and the default Linux shell doesn't support beyond 16 colors. Also, there are no really good modern ANSI editors out there. I tried bringing back "TheDraw" in DOSBox for some art, but ended up using a mishmash of more modern utilities, false-starting one of my own, and working on an image-to-ASCII/ANSI converter.

Maybe it's growing up in the BBS days, but something about ANSI is really charming.

prometheus76 10 hours ago

TheDraw was a cornerstone of my teenage years. I would log into different BBSs just to see their ANSI welcome screens, then I would try and re-create them to learn the art. It was a unique form of animation and I was hoping you had figured out how to get TheDraw working.
I also later used ANSI to make my own cool command line prompts in DOS and later, Linux.
- ForOldHack 5 hours ago
  
  Recreate them? We would steal the stream, save it, run it through a hex editor, while watching it draw in a separate window. It got to be just a work of wonder what people came up with, and then my friend got an Amiga, and those splash screens... omg...

112233 10 hours ago

"\u001b[0m — reset" ... what? Why is SGR not called by name, while, e.g., CUU is? Strange... According to which terminal or standard does it interpret sequences?

Is this tool really helpful? It does look nice! But it does not help with the corneriest cases that would benefit from such a tool the most.

webpro 8 hours ago

Got to start somewhere! Didn't see many examples to get inspired by either. Here's the full table: https://ansi.tools/lookup. This is my initial take on it. Please bring in the corneriest cases! It's open source so bug reports, RFCs and pull requests are most welcome.
- 112233 6 hours ago
  
  This thing is made out of corner cases: https://www.invisible-island.net/vttest/
  I am sure capturing its output will provide an endless source of amusement and despair.
  There are sequences from real terminals (e.g. stuff documented at vt100.net), sequences from ECMA-48 and friends (most of it likely never implemented), and the de facto behaviour of different software. Infamous examples being the original Windows terminal, rxvt (ugh), the Linux console, and the Emacs terminal.
  Most vexing behaviour is background fill on newline, incorrect characters in terminal reports, broken scroll region, inability to write in bottom-right position etc.
  This project looks fun! But it leads to endless narrow abandoned places. Hopefully you will enjoy the experience!
- wonger_ 6 hours ago
  
  Ghostty the terminal emulator has a cell inspector feature along these same lines

mnurzia 12 hours ago

Neat tool, I could see this being handy for debugging TUI tools.

I noticed that it works with _escaped_ ESC characters ("\x1b", "\u001b", "\033") but it didn't recognize raw ESC characters that I had in my clipboard. It might be useful to support those (maybe highlight them similarly to how VS Code highlights whitespace characters). The characters show up as numbered unicode error glyphs (I'm on Firefox, if that helps)
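
A minimal Python sketch of accepting both forms, assuming the three escaped notations listed above are the only ones of interest. The name `unescape_esc` is hypothetical and this is not how the tool itself does it: escaped spellings are rewritten to a raw ESC (0x1B), and input that already contains raw ESC passes through unchanged.

```python
import re

# Matches the literal text \x1b, \u001b or \033 (backslash included).
_ESCAPED_ESC = re.compile(r"\\(?:x1[bB]|u001[bB]|033)")


def unescape_esc(text: str) -> str:
    """Rewrite escaped spellings of ESC into the raw control character."""
    return _ESCAPED_ESC.sub("\x1b", text)


print(repr(unescape_esc(r"\x1b[31mred\033[0m")))  # '\x1b[31mred\x1b[0m'
print(repr(unescape_esc("\x1b[1mraw")))           # '\x1b[1mraw'
```

A downstream parser then only has to deal with one representation.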

webpro 8 hours ago

Thanks, this is great feedback. I'll see what I can do, stay tuned.
- webpro 4 hours ago
  
  Raw input should be cool now (there's "raw" in the examples as well)

taviso 9 hours ago

I've used the tool sequin in the past to debug issues: https://github.com/charmbracelet/sequin

It worked great for me, seems much easier to debug logs directly in the terminal.

webpro 8 hours ago

Thanks for sharing, haven't seen that one yet. Will see if I can borrow ideas from it.

ryan-c 12 hours ago

This is really cool - I've been experimenting with terminal escape sequences recently, and they go deep. Thanks for sharing! Get in touch (email in profile) if you'd like to collaborate.

webpro 8 hours ago

Thanks! It's all open source (including the tokenizer/parser), so feel free to come collaborate on GitHub.

gwbas1c 9 hours ago

I was a teenager when BBS's were popular. I still sometimes think I would enjoy writing an ANSI parser.

webpro 5 hours ago

What would prevent you from starting? Could be fun :)
- Xss3 an hour ago
  
  Probably their free time budget
- gwbas1c 3 hours ago
  
  Time: There's other nerd projects I'd like to do.

JdeBP 12 hours ago

The revealing shibboleth is when people call it "ANSI". (-: "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works, from Microsoft's old ANSI.SYS appendix to its MS-DOS user manual, to innumerable modern WWW sites all repeating received wisdom.

The thing to remember is that the "E" in "ECMA" does not stand for "ANSI".

* https://ecma-international.org/publications-and-standards/st...

* https://www.itu.int/rec/T-REC-T.416-199303-I

If you read ECMA-35, you'll find that there's actually a whole system to escape sequences and control sequences. As I pointed out last month, it's often the case that people who haven't read ECMA-35 don't realize that parameter characters can be more than digits, don't handle intermediate characters, and don't grasp how DEC's question mark and SCO's equals sign fit into the overall picture. People who haven't read ECMA-48 and traced its history don't realize that there's subtlety to missing parameters in control sequences. And people who haven't read ITU/IEC T.416 do what many of us did years ago and get 24-bit colour wrong. Twice over.

* https://github.com/tattoy-org/tattoy/issues/105#issuecomment...
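
To make the points about missing parameters and sub-parameters concrete, here is a small Python sketch. The name `split_params` is made up, and it assumes the parameter string contains only digits, `;` and `:` (no private markers such as `?` or `=`). Parameters are separated by `;`, sub-parameters by `:`, and an empty field is kept as `None` (meaning "default") instead of being silently turned into 0.

```python
from typing import List, Optional


def split_params(param_bytes: str) -> List[List[Optional[int]]]:
    """Split a CSI parameter string into parameters and their sub-parameters."""
    params = []
    for field in param_bytes.split(";"):
        # An empty field or sub-field means "use the default", not zero.
        params.append([int(sub) if sub else None for sub in field.split(":")])
    return params


# Colon form with an omitted colour space identifier: one parameter, six sub-parameters
print(split_params("38:2::255:0:0"))  # [[38, 2, None, 255, 0, 0]]
# Widespread semicolon form: five separate parameters
print(split_params("38;2;255;0;0"))   # [[38], [2], [255], [0], [0]]
# A missing middle parameter stays visible as None
print(split_params("1;;4"))           # [[1], [None], [4]]
```

The two colour spellings come out structurally different, which is the sort of thing a decoder has to decide how to handle.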

Other common errors include missing out on all of the other 7-bit aliases for C1 characters. Or not realising that the ECMA-35/ECMA-48 syntax allows for any control sequence to have sub-parameters, not just SGR. Or using regular expressions and pattern matching instead of a state machine. Only a state machine truly handles the fact that in the real world terminals allowed, and enacted, various C0 and C1 control characters in the middle of control sequences, as well as had ways of cancelling or restarting control sequences mid-sequence.

* https://github.com/jdebp/nosh/blob/trunk/source/ECMA48Decode...
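
As a rough illustration of the state-machine approach (a simplified sketch, not the decoder linked above): the Python below only understands CSI sequences, collapses every other escape sequence to ESC plus one character, and uses made-up token names. It does show C0 controls being acted on mid-sequence, CAN/SUB cancelling, and ESC restarting.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, payload)


def tokenize(stream: str) -> Iterator[Token]:
    state = "ground"  # ground | escape | csi
    params = inter = ""
    for ch in stream:
        code = ord(ch)
        if state == "ground":
            if ch == "\x1b":
                state = "escape"
            elif ch == "\x9b":  # 8-bit C1 CSI
                state, params, inter = "csi", "", ""
            elif code < 0x20 or code == 0x7F:
                yield ("control", ch)
            else:
                yield ("text", ch)
        elif state == "escape":
            if ch == "[":  # ESC [ is the 7-bit alias for CSI
                state, params, inter = "csi", "", ""
            elif ch == "\x1b":
                pass  # another ESC simply restarts the escape sequence
            elif ch in "\x18\x1a":  # CAN / SUB cancel
                state = "ground"
            elif code < 0x20:
                yield ("control", ch)  # acted on; the sequence continues
            else:
                yield ("escape", ch)  # simplification: one-character escapes only
                state = "ground"
        else:  # csi
            if ch in "\x18\x1a":  # CAN / SUB cancel the control sequence
                state = "ground"
            elif ch == "\x1b":  # ESC abandons it and starts a new one
                state = "escape"
            elif code < 0x20:
                yield ("control", ch)  # acted on; parsing continues
            elif 0x30 <= code <= 0x3F and not inter:
                params += ch  # parameter bytes: 0-9 : ; < = > ?
            elif 0x20 <= code <= 0x2F:
                inter += ch  # intermediate bytes: space ! " # $ ... /
            elif 0x40 <= code <= 0x7E:
                yield ("csi", params + inter + ch)  # final byte ends the sequence
                state = "ground"
            else:
                state = "ground"  # treated as malformed in this sketch


print(list(tokenize("a\x1b[1;31mb")))  # [('text', 'a'), ('csi', '1;31m'), ('text', 'b')]
print(list(tokenize("\x1b[1\n0m")))    # [('control', '\n'), ('csi', '10m')]
print(list(tokenize("\x1b[1\x18m")))   # [('text', 'm')]
```

A regular expression that matches `ESC [ ... final` as one unit would mishandle the second and third inputs, since the newline and the CAN sit inside what looks like a single sequence.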

But it gets even worse for a real world control sequence decoder.

In the real world, not only do terminals interpret the same control sequences, and their parameters, differently depending on whether the terminal is sending or receiving them; but several terminal emulators like the one in Interix, rxvt, the one built in to Linux, and even XTerm, send control sequences that not only break ECMA-35 but also conflict with received control sequences. So if one wants to be comprehensive and be capable of decoding real data, one needs a switch to tell the program whether to decode the character stream as if it is being received by the terminal or as if it is being sent by the terminal.

* https://jdebp.uk/Softwares/nosh/guide/commands/console-decod...

Microsoft Terminal tries to do things properly, which many modern terminal emulators and tools do not, and handles this with two distinct entire state machines, one for input and one for output.

* https://github.com/microsoft/terminal/tree/main/src/terminal...

I handled it with a few goto statements and a handful of flags. (-:

* https://github.com/jdebp/nosh/blob/trunk/source/console-deco...

j4_james 3 hours ago

> "ANSI" is what people call it when they are working from paltry and incomplete samizdat doco of how this stuff works
People just use "ANSI" as a shorthand for ANSI X3.64-1979. And that was the standard that DEC used for their VT100+ range of terminals, which in turn became the de facto standard from which most modern terminal emulators are derived. If you read the DEC documentation, you'll find many references to "ANSI standard", "ANSI controls", "ANSI colors", etc. I don't think this is because they were ignorant of the subject matter, considering that they were members of the committee that produced that standard.
And ECMA-48 is essentially just the European equivalent of ANSI X3.64, and was developed in parallel. But obviously an American company like DEC or Microsoft would more likely be working from the American version of the standard rather than the European one.
blueflow 12 hours ago

I think this rant is out of place here; type "\x1b[:<=>$t" and check for yourself. It parses correctly. You do learn about the allowed character ranges for CSI sequences from ECMA-48 only, not from the Microsoft docs, so I guess the author did their homework.
- JdeBP 11 hours ago
  
  That tells me that you are writing from ignorance, as for starters that's a truly pathetic test that even misses one of the characters that I explicitly mentioned above, let alone thoroughly tests the full range that the specs define. I had an actual poke around the parser code, in contrast to your superficial experimentation. (-: One can, with knowledge, actually find the point where the only three unusual characters that you in fact tested are special cased.
  - blueflow 10 hours ago
    
    They are not special cased:
    https://github.com/webpro/ANSI.tools/blob/main/packages/parser/src/parsers/csi.ts#L12
    The comment correctly identifies the 0x30-0x3f range as parameter bytes and the following as intermediate bytes. Both the range and the names for the bytes are matching ECMA-48 Chapter 5.4.
    But you seem to think that everyone except yourself is incompetent, are you trying to make up for something?
    
    webpro 8 hours ago
    
    Thanks. Agreed. The way I see it, ignore the noise and there might be something in there.
    
    JdeBP 9 hours ago
    
    Of course they are. There's a file with all of the special cased constants in, named constants.ts.
    Your superficial test tested all three of the special cases in the PRIVATE_OPENERS array, which is what the parser.ts code actually checks. DEC's question mark, which is special cased yet further off on its own, is in reality another "private opener", too, and it isn't limited to DEC (e.g. XQTMODKEYS), and neither does DEC not use the other non-digit parameter characters (e.g. DECDA3).
    (There's a hypothesis that DEC's own state machine didn't care where these marker characters were, as it was a simple state machine that had to fit in ROM and probably just set a bitflag. A mistake that we're probably all still making is assuming that they only take effect when in the very first position.)
    STRING_OPENERS is another widespread special casing that people do, treating ESC plus a few characters as special rather than handling all of the 7-bit aliases for the C1 characters as the general case.
    You seem to think that people who share what the mistakes are and where they themselves have made these very mistakes over the years, to help other people not make them and so that the world continues to remember this hard-learned stuff, is somehow worthy of ad hominems, straw men, insults, and vilification right off the bat. That's a very poor show and you should be ashamed.
    
    clucas 8 hours ago
    
    > people who share what the mistakes are and where they themselves have made these very mistakes over the years, to help other people not make them and so that the world continues to remember this hard-learned stuff
    But then we have this in your post:
    > That tells me that you are writing from ignorance, as for starters that's a truly pathetic test
    and
    > I had an actual poke around the parser code, in contrast to your superficial experimentation.
    Perhaps you really did intend for these lines to be helpful and informative? If so, I encourage you to have a moment of empathy for your interlocutor and ask yourself if talking this way is actually the best way to communicate and pass on this hard-earned knowledge.
    > ad hominems, straw men, insults, and vilification
    I didn't see this from the other poster. I did see it from you. As a disinterested third party, I'm just telling you, you come off way worse in this exchange. Good luck out there buddy.
    
    webpro 8 hours ago
    
    That's some interesting feedback, thanks for sharing. I'll see what I can extract and apply from it. Please bear with me, this is only my initial take on the whole concept (and as you point out, it isn't that trivial). Didn't have many examples to be inspired by, but we're on our way anyway.
ForOldHack 5 hours ago

I'll never forget the comments in termcap: "Brain dead", "Very Brain dead" and "Brain? What brain!" I think most of that was about terminals whose CTEOL (clear to end of line) was just garbage.
We just knew that at some point in time, all the Hazeltine terminals were going to end up in the garbage, which is what they deserved, and no one would rescue them.
https://www.shallowsky.com/linux/noaltscreen.html
The parent post is SOLID GOLD.

FlyingAvatar 8 hours ago

I would have loved this in 1993. Not that I don't now, but I would have had a real use for it then.

webpro 8 hours ago

At least I tried to make it look like a 1993 website

imran9m 6 hours ago

Nice. This is helpful for making Jenkins CI output colorful!!

codesnik 11 hours ago

I wonder how many languages have nice looking "\e" for "\u001b". ruby, perl, bash, anything else?

teddyh 8 hours ago

No support for blinking text.

webpro 8 hours ago

The parser supports it, but the HTML renderer indeed does not. Using a third-party lib for that currently, but noticed the limitations too. Might replace it with my own!
- teddyh 8 hours ago
  
  Great! Next step: torturetest.vt
  - webpro 8 hours ago
    
    There is https://github.com/webpro/ANSI.tools/blob/main/packages/pars... and others
  - ForOldHack 5 hours ago
    
    Does it crash the tester?
    https://invisible-island.net/ncurses/tctest.htm
    
    webpro 4 hours ago
    
    There is some recovery in the lexer, but would love to learn what would make it crash! The url you provide gives a 404.
    
    teddyh 3 hours ago
    
    s/htm/html/
    I.e. <https://invisible-island.net/ncurses/tctest.html>
ForOldHack 5 hours ago

Blinking text?
You do not know me, but believe me, I have special skills that I have developed over many years to deal with people like you. And if I find you I will CREOL you.
- teddyh 3 hours ago
  
  Bring it on. I’ll feed you your own form, you insignificant NUL character.