I like egal fine if we decide to tell the type system about immutability, but that seems like a major design decision. I'm not sure this one feature is sufficient reason.
I don't understand the paper very well (thanks for the pointer). I'm not sure why he cares about identity so much if most of the time we're treating lists as immutable. And if a list is mutable equality is not forever, all it returns is whether these two lists are equal right now.
The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time.
I'm going to go reread it, but so far I think it suffers from being written 10 moore's law generations ago, and complects performance considerations with API design.
"Most languages have only eq and no equal: the == operator in Ruby/Python compares by object reference."
$ python
>>> [0, 1, 2] == [0, 1, 2]
True
I'm pretty sure those two literals turn into objects with different 'identities'.
Most of the time I don't care what 'identity' an object has. If I'm doing weird stuff with assignment and mutable structures I'm happy for the language to push back and make me go back to the straight and narrow asap.
"The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time."
Just as a point of interest, Racket handles this with its built-in types:
I wasn't able to post this earlier, due to the Arc Forum being all wonky, so I'll post it now.
---
"I'm pretty sure those two literals turn into objects with different 'identities'."
Weird, I distinctly remember Python returning false for that... I guess I was wrong about == in Python/Ruby.
Oh, I see, I was thinking about the "is" operator, which does return False, but is rarely used. So you were right: Python uses equal by default for arrays/dictionaries. However, that only applies to the built-in types:
>>> class Foo:
... pass
>>> Foo() == Foo()
False
User-created types can add an __eq__ method to implement "equal" testing, but it's not the default.
JavaScript, however, uses object reference for both "==" and "===":
[1, 2, 3] == [1, 2, 3]
[1, 2, 3] === [1, 2, 3]
Which, as far as I'm concerned, is the correct semantic, because arrays are mutable.
"I think I don't understand why object reference is worth a special operator (besides performance reasons)."
The point of "egal" is that you don't have a special operator: you use the same operator for both mutable and immutable objects. It's wart (and Arc/Common Lisp/Scheme) that has a special operator for reference purposes. Just because you call it "addr" rather than "eq" doesn't make it any less of a special operator. And I doubt you put it into wart for performance reasons.
The "egal" article actually defines equality as "operational equivalence":
Two objects are "operationally equivalent" if and only if there is no way
that they can be distinguished, using ... primitives other than [equality
primitives]. It is guaranteed that objects maintain their operational
identity despite being named by variables or fetched from or stored into
data structures. [Rees86,6.2]
In other words, it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object. But this is not possible with immutable objects.
So my stance is simple: immutable objects should be compared by recursing on their components, like how "iso" works in Arc. Mutable objects should compare by reference equality, because the whole point of mutation is that you can change the object without affecting other objects which have the same subcomponents. That has to do with the conceptual model of how the objects are used and has nothing to do with performance.
The reason the article talks about performance at all is because immutability is much preferred in parellel computing, and most Lisps treat cons cells as immutable, but they're not actually immutable, so the parallel system has to have extra infrastructure to support mutable conses, even though they're almost never mutated. In any case, the point the article is making is valid even if you ignore performance:
There are a several problems with EQUAL. First, it may diverge in the
presence of directed cycles (loops) in one of its arguments, although some
(e.g., [Pacini78]) have suggested more sophisticated predicates capable of
detecting such cycles. Secondly, it is not referentially transparent; two
calls to EQUAL on the "same" (i.e., EQ) arguments can produce different
results at different times. Neither of these possibilities is to be desired
in a programming language primitive equality test because we would like such
a test to always return and we would like object identity to be preserved.
Yet EQUAL is an extremely valuable operation, because the vast majority of
Lisp lists are side-effect free--i.e., "pure". Without side-effects, loops
cannot be constructed and sublists cannot be modified, and within these
restrictions EQUAL becomes a well-defined and useful operation.[8]
This tension between the simplicity of EQ and the usefulness of EQUAL has
led to a great deal of confusion. This confusion has now lasted for 30
years, with additional confusion having been recently added by Common Lisp.
Since neither EQ nor EQUAL has the desired semantics for the multiplicity of
Common Lisp datatypes, Common Lisp added six more--EQL, EQUALP, =, CHAR=,
STRING=, and STRING-EQUAL, yet these also lack the correct semantics.[9]
Scheme continues this confusion by offering the three predicates (eq?, eqv?
and equal?) which are roughly equivalent to Common Lisp's EQ, EQL and
EQUALP.
In other words, if you have mutable cons cells that are almost always treated as immutable, you need both eq and equal. But if you create two datatypes: mutable cons and immutable cons, then you only need "egal". With immutable conses, you don't want to use "eq" on them because they're functional: letting you grab the reference of an immutable cons would be an abstraction leak.
But with mutable conses, you do need "eq", because two cons cells that are equal may not be eq. So, if your language distinguishes between mutable and immutable conses, you only need "egal". This not only prevents abstraction leaks, but it also preserves operational equivalence, which means that if "egal" says that two objects are equal, then it's impossible to distinguish them in any way. This is not true with eq/equal.
As a practical example of why you would want that... look at Racket. It has three kinds of hash tables: hash-eq, hash-equal, and hash-eqv. And it has all kinds of issues if you use a mutable object as a key in an "equal" hash:
Caveat concerning mutable keys: If a key in an equal?-based hash table is
mutated (e.g., a key string is modified with string-set!), then the hash
table’s behavior for insertion and lookup operations becomes unpredictable.
So you, the programmer, have to decide every time you create a hash table which kind of equality it should use. And if you choose wrong, you end up with issues. And it also means you can't mix mutable and immutable keys in a hash table. This is ridiculous. With "egal" this isn't a problem.
Notice that none of this has to do with performance. It has to do with the conceptual model of how objects behave.
Thanks for all that detail! Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy.
I think the crux is this: "..it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object."
This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality.
However I have a lot more respect after your comment for how egal can simplify lots of thorny concurrency issues.
This is a matter where Pauan and I agree on the ideal semantics completely. I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal").
That said, I don't care if this functionality is provided as a two-argument function or as (addr ...).
---
"Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy."
I built Cairntaker on the idealistic notion that every single first-class value is a mutable or immutable weak table. That includes the table keys. Cairntaker does make compromises for performance and FFI, but none that betray this ideal.
Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me.
"This is a matter where Pauan and I agree on the ideal semantics completely."
Quite the rare thing indeed.
---
"I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal")."
I personally haven't. I only recently started to actually care about egal because of Nulan using immutable objects.
I do remember us having discussions before about making "iso" the default equality in Arc, which is what wart does, but I don't recall us talking about cons mutability.
However, I can't find any more than that. Maybe I just took it for granted at that point.
It is the policy I applied in Cairntaker. Cairntaker interns all immutable tables so they can be compared for deep equality even after some of their entries have been garbage-collected. Equality support is essential in Cairntaker since any value can be used as a table key.
"Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me."
You silly. Neither JavaScript nor Arc have immutable lists. Well, I guess you could use Object.freeze on an array in JS, but... even if you did that, it would just serialize the array to a string.
Lol, yep. The way I "implement" immutable lists in practical programming is by not modifying them. I rationalize that their static type would enforce this if only there were a type system. :-p
More specifically...
In Arc I can't really rely on using lists as keys, since (IIRC) Jarc converts table keys to strings using the same behavior as 'write. However, in practice I have used lists of alphanumeric symbols, since those have a predictable 'write representation on Jarc, a predictable Racket 'equal? behavior on pg-Arc, etc.
Similarly, in JavaScript I tend to use Arrays of strings, storing them as their ECMAScript 5 JSON representation.
Yeah I take that back about immutable list keys :)
This is all most interesting. I'm going to keep an eye out for nice programs that exploit structural equality on mutable structures. Y'all should do the same!
"This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality."
Then make the objects immutable. Then equal makes sense. Then the compiler/interpreter knows what your intent is. Then you avoid all these issues. Then you don't even need a distinction between eq/equal: you can just use egal.
This seems to me to be the obvious choice, but it seems we disagree.
Yeah, I didn't fully understand egal when I said I agreed with it if the language distinguishes immutable objects. I want my equality to be more lenient in comparing mutable objects. Even with mutable objects it seems a lot more common that I just care about structural equality rather than identity.