Welcome! Please see the About page for a little more info on how this works.

+5 votes
in Errors by

Background

I saw reddit discussion about error messages and one user's comment about signal-to-noise ratio resonated with me and got me thinking about how stack traces are presented to the user at the REPL.

Context

I had a look at various developments in this area, including:
- Stuart Holloway's opinion that stack traces are always the right length
- clj-stacktrace library
- pretty library
- clojure archive page with design iterations around stack traces, including this snippet

My thoughts on this is that existing libraries are either try to hide too much (pretty) or mostly concerned with presentation/formatting and not the actual contents (clj-stacktrace). I generally agree with Stuart that stack traces are the right length because every stack element is truth, an unambiguous description of the stack of methods that were executing when exception happened.

Scope

This discussion is about stack traces presented to programmer at the REPL, think Throwable->map and clojure.core/print-throwable, not re-wrapping all thrown exceptions.

Problem

The purpose of stack trace is to present useful information about executed methods that give context to exception. I think that one aspect that makes this information useful is ability to see/navigate to code that led to an exception. In case of functions defined in Clojure code, there are no methods/enclosing classes to see/navigate to since bytecode for those is emitted dynamically from function definitions. In addition to that, stack traces of what semantically is a single function call usually contain multiple elements where some of those elements point to a line where the function is defined, which is also not useful.

Illustration

Given a file repro.clj (I included line numbers):

1 (defn foo []
2   (/ 1 0))
3 
4 (defn bar []
5   (foo))
6 
7 (bar)

Loading this file in the REPL and showing the error will show the following stack trace:

clj
Clojure 1.10.1
user=> (load-file "repro.clj")
Syntax error (ArithmeticException) compiling at (C:\Users\Vlaaad\Projects\vlaaad.github.io\repro.clj:7:1).
Divide by zero
user=> *e
#error {
 :cause "Divide by zero"
 :via
 [{:type clojure.lang.Compiler$CompilerException
   :message "Syntax error compiling at (C:\\Users\\Vlaaad\\Projects\\vlaaad.github.io\\repro.clj:7:1)."
   :data #:clojure.error{:phase :compile-syntax-check, :line 7, :column 1, :source "C:\\Users\\Vlaaad\\Projects\\vlaaad.github.io\\repro.clj"}
   :at [clojure.lang.Compiler load "Compiler.java" 7648]}
  {:type java.lang.ArithmeticException
   :message "Divide by zero"
   :at [clojure.lang.Numbers divide "Numbers.java" 188]}]
 :trace
 [[clojure.lang.Numbers divide "Numbers.java" 188]
  [clojure.lang.Numbers divide "Numbers.java" 3901]
  [user$foo invokeStatic "repro.clj" 2]
  [user$foo invoke "repro.clj" 1]
  [user$bar invokeStatic "repro.clj" 5]
  [user$bar invoke "repro.clj" 4]
  [user$eval140 invokeStatic "repro.clj" 7]
  [user$eval140 invoke "repro.clj" 7]
  [clojure.lang.Compiler eval "Compiler.java" 7177]
  [clojure.lang.Compiler load "Compiler.java" 7636]
  [clojure.lang.Compiler loadFile "Compiler.java" 7574]
  [clojure.lang.RT$3 invoke "RT.java" 327]
  [user$eval1 invokeStatic "NO_SOURCE_FILE" 1]
  [user$eval1 invoke "NO_SOURCE_FILE" 1]
  [clojure.lang.Compiler eval "Compiler.java" 7177]
  [clojure.lang.Compiler eval "Compiler.java" 7132]
  [clojure.core$eval invokeStatic "core.clj" 3214]
  [clojure.core$eval invoke "core.clj" 3210]
  [clojure.main$repl$read_eval_print__9086$fn__9089 invoke "main.clj" 437]
  [clojure.main$repl$read_eval_print__9086 invoke "main.clj" 437]
  [clojure.main$repl$fn__9095 invoke "main.clj" 458]
  [clojure.main$repl invokeStatic "main.clj" 458]
  [clojure.main$repl_opt invokeStatic "main.clj" 522]
  [clojure.main$main invokeStatic "main.clj" 667]
  [clojure.main$main doInvoke "main.clj" 616]
  [clojure.lang.RestFn invoke "RestFn.java" 397]
  [clojure.lang.AFn applyToHelper "AFn.java" 152]
  [clojure.lang.RestFn applyTo "RestFn.java" 132]
  [clojure.lang.Var applyTo "Var.java" 705]
  [clojure.main main "main.java" 40]]}

Let's focus on the stack trace elements that belong to repro.clj:

[clojure.lang.Numbers divide "Numbers.java" 188]
[clojure.lang.Numbers divide "Numbers.java" 3901]
[user$foo invokeStatic "repro.clj" 2]
[user$foo invoke "repro.clj" 1]
[user$bar invokeStatic "repro.clj" 5]
[user$bar invoke "repro.clj" 4]
[user$eval140 invokeStatic "repro.clj" 7]
[user$eval140 invoke "repro.clj" 7]

Every Clojure function call in code has 2 corresponding stack elements: invoke and invokeStatic. Method invoke for classes user$foo and user$bar point to lines 1 and 4. If you look at the repro.clj, you'll notice these are the lines where function foo and bar are defined — this information is not useful, and generally stack traces for java code don't point to lines where methods are defined, only to lines inside the method bodies that invoke methods higher on the stack. While it is possible to use interop to call .invoke directly as this is defined in IFn, invokeStatic is a method emitted by the compiler and arguably is an implementation detail.

Possible solutions?

Do nothing

Stuart might be right — if I am debugging bytecode-related issue, I would certainly be interested in these details. Also, some future version of Clojure might emit function bytecode differently, leading to a fewer amount of more readable stack traces.

Merge stack elements

This approach merges stack elements of a same function that keeps only public invoke methods visible and reports lines that actually did the call to next method in the stack. This is a pretty minimal filtering that will make stacktrace for the repro exception look like that:

[clojure.lang.Numbers divide "Numbers.java" 188]
[clojure.lang.Numbers divide "Numbers.java" 3901]
[user$foo invoke "repro.clj" 2]
[user$bar invoke "repro.clj" 5]
[user$eval140 invoke "repro.clj" 7]
[clojure.lang.Compiler eval "Compiler.java" 7177]
[clojure.lang.Compiler load "Compiler.java" 7636]
[clojure.lang.Compiler loadFile "Compiler.java" 7574]
[clojure.lang.RT$3 invoke "RT.java" 327]
[user$eval1 invoke "NO_SOURCE_FILE" 1]
[clojure.lang.Compiler eval "Compiler.java" 7177]
[clojure.lang.Compiler eval "Compiler.java" 7132]
[clojure.core$eval invoke "core.clj" 3214]
[clojure.main$repl$read_eval_print__9086$fn__9089 invoke "main.clj" 437]
[clojure.main$repl$fn__9095 invoke "main.clj" 458]
[clojure.main$repl invoke "main.clj" 458]
[clojure.main$repl_opt invoke "main.clj" 522]
[clojure.main$main invoke "main.clj" 667]
[clojure.lang.RestFn invoke "RestFn.java" 397]
[clojure.lang.AFn applyToHelper "AFn.java" 152]
[clojure.lang.RestFn applyTo "RestFn.java" 132]
[clojure.lang.Var applyTo "Var.java" 705]
[clojure.main main "main.java" 40]

I think this already does a decent job at reducing signal-to-noise ratio, and if you debug bytecode-related issues, you can still get the full stack trace using .getStackTrace on an exception object.

Make Clojure stack elements resemble code

A part of Clojure philosophy is to prefer functions over methods, perhaps stack elements pointing to Clojure functions should look like functions and not methods? Most of the time there is no .invoke interop in code. I would prefer to act upon user/foo, not user$foo invoke, because this is something I can actually see in my code. One might argue that even stack elements like clojure.lang.AFn applyToHelper and clojure.lang.RestFn invoke are also unnecessary since they are implementation details that make any Clojure functions invokable with variadic arguments (there is some filtering of those details already!). There are many unknowns in this direction that I didn't try to answer because I'm not sure if it's worth pursuing, but here is an illustration of how stack traces might look in that case:

[clojure.lang.Numbers.divide "Numbers.java" 188]
[clojure.lang.Numbers.divide "Numbers.java" 3901]
[user/foo "repro.clj" 2]
[user/bar "repro.clj" 5]
[user/eval140 "repro.clj" 7]
[clojure.lang.Compiler.eval "Compiler.java" 7177]
[clojure.lang.Compiler.load "Compiler.java" 7636]
[clojure.lang.Compiler.loadFile "Compiler.java" 7574]
[clojure.lang.RT$3.invoke "RT.java" 327]
[user/eval1 "NO_SOURCE_FILE" 1]
[clojure.lang.Compiler.eval "Compiler.java" 7177]
[clojure.lang.Compiler.eval "Compiler.java" 7132]
[clojure.core/eval "core.clj" 3214]
[clojure.main/repl/read-eval-print--9086/fn--9089 "main.clj" 437]
[clojure.main/repl/fn--9095 "main.clj" 458]
[clojure.main/repl "main.clj" 458]
[clojure.main/repl-opt "main.clj" 522]
[clojure.main/main "main.clj" 667]
[clojure.main.main "main.java" 40]

Some open questions:
- representing Clojure function calls as single symbol instead of a pair class+method asks to represent java method calls as single symbols, how should those look like? clojure.lang.Compiler/eval vs clojure.lang.Compiler.eval vs something else?
- how to show method calls for deftypes/reifies? those are both Clojure and methods...
- are Var/AFn/RestFn stack elements truly irrelevant?

Something else

There might be other options...

What do you think?

2 Answers

+1 vote
by
selected by
 
Best answer

Per Kevin's comment, I do really want to see the actual stack trace so I don't think that should go away. But it's also useful to think about tools that can be applied to stack traces to give you better insight. That said, a lot of people have poured time into this and the results are not (imo) better than the original. I say that not to throw water on it, but just to note that it seems easy but the details are complicated.

I should mention that you happen to have chosen a scenario that is tickling a known bug in load https://clojure.atlassian.net/browse/CLJ-2529 where the phase is being incorrectly reported (here reached from a top-level form eval). While this is obviously possible, you would more typically have gotten this by having no top-level call of bar and putting this code in a namespace:

(ns repro)

(defn foo []
  (/ 1 0))

(defn bar []
  (foo))

and then put repro in your sources and called:

% clj
Clojure 1.10.3
user=> (require 'repro)
nil
user=> (repro/bar)
Execution error (ArithmeticException) at repro/foo (repro.clj:4).
Divide by zero

I think you should also note that Clojure provides a function to print stack traces (pst) that does some level of clean up for you (in some cases it will even elide top-level frames involved in the error reporting as well - that was tweaked in 1.10.0 to improve a number of cases):

user=> (pst *e)
ArithmeticException Divide by zero
	clojure.lang.Numbers.divide (Numbers.java:188)
	clojure.lang.Numbers.divide (Numbers.java:3901)
	repro/foo (repro.clj:4)
	repro/foo (repro.clj:3)
	repro/bar (repro.clj:7)
	repro/bar (repro.clj:6)
	user/eval150 (NO_SOURCE_FILE:1)
	user/eval150 (NO_SOURCE_FILE:1)
	clojure.lang.Compiler.eval (Compiler.java:7181)
	clojure.lang.Compiler.eval (Compiler.java:7136)
	clojure.core/eval (core.clj:3202)
	clojure.core/eval (core.clj:3198)

Since this is a "printing" method, it might be a place where we could do more to help, like merging invoke frames. Detecting that these are the same frame might be challenging, I don't know. The tools for emitting line/col debug symbols in bytecode are weird and I don't think that stuff has been touched in forever, there may even be options we could use that didn't exist originally. (Look for visitSource and visitLineNumber in the Compiler and the source debug extension stuff like https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Compiler.java#L4326-L4340 - see JSR 45 for that).

I believe JRuby has spent a lot of time on this same problem (showing you a ruby stack trace that came from a jruby jvm stack trace), that might be an interesting thing to learn more about (I have not studied it).

+5 votes
by

My experience with tools that munge stacktraces is they seem fine until one day you encounter a weird bug that doesn't make sense, spend a few days trying to debug it, then discover the munging is hiding some crucial information, then swear and never trust those things again.

Even the paired invoke and invokeStatic frames tell you something. The existence of invokeStatic tells you something.

  1. If you see both invoke and invokeStatic for the same function it means the function doesn't close over anything, and the calling code was not compiled with direct linking on.
  2. If you just see invokeStatic it means the function doesn't close over anything and the calling code was compiled with direct linking on.
  3. if you just see invoke it tells you the function closes over something, and doesn't tell you if direct linking was on.

If you haven't seen https://github.com/stuartsierra/stacktrace.raw it is invaluable. It is some code that tries to undo the effects of all kinds of exception "prettiers" that exist. The readme for it has a nice "Why does this exist?" section (https://github.com/stuartsierra/stacktrace.raw#why-this-exists).

Alex has done a lot to try and improve clojure errors and error reporting, because it has been a perennial complaint, but to my knowledge has done nothing to interfere with the contents of stacktraces. It seems unlikely that clojure would start now.

by
Good points about all the info you can extract from the presence/absence of `invokeStatic`, TIL.

I read the "Why it exists?" section of stacktrace.raw and it seems it's mostly concerned with bugs in the existing exception prettifiers and not with the problem I tried to identify.
by
These are indeed good points and in general, we consider it important to have access to the real stack trace. However, there is no reason not to also have other tools that interpret traces.
...