Arc Forum | "In Racket, you don't exactly have access to the low bits of a cons cell for tag...

Arc Forum

2 points by rocketnia 5320 days ago | link | parent

"In Racket, you don't exactly have access to the low bits of a cons cell for tagging purposes. Instead we have to make a compound data structure--tag things with a vector--and we need a "rep" function to extract the object itself. But this is an implementation detail, and it can and should be hidden."

Isn't it the other way around? Why are "low bits" not an implementation detail?

---

"If you pass "car" a tagged cons, or you pass "+" a tagged number, it just strips off the tag, finds that this object is indeed something it knows how to add, and just adds it."

If I define a new type, it makes little sense for it to inherit the qualities of the particular old type I'm using to implement it. That idea does nothing but expose implementation details.

---

"The arbitrarily nested tagging is technically possible here, but I think people haven't run with it very much because the interface sucks."

The interface indeed sucks a bit, 'cause [annotate 'foo _] doesn't wrap something that's already a 'foo. That means if you want to make a new type which wraps a single arbitrary value, then you need to do something like [annotate 'foo list._] so that you don't get mysterious bugs wrapping things of type 'foo. I end up having to do this all the time.

A long time ago (http://arclanguage.org/item?id=12076) I didn't like 'annotate and 'rep because I found them wordy, but I got over it. :-p Ssyntax goes a long way; once I started accessing compound types using things like (let (a b c) rep.foo ...), rep.foo.1, and rep.foo!fieldname, the "rep." part didn't seem so bad. I said something similar recently at http://arclanguage.org/item?id=14373, but I don't expect you to read that thread. ^^;

---

Speaking of that thread, though, this part might be relevant:

One could go all the way down, making a (deftype wrap-foo unwrap-foo isa-foo) form that eliminates the need for 'annotate, 'rep, 'isa, and 'type altogether. In this example, 'isa-foo would be a function that indicates support for 'unwrap-foo, and 'wrap-foo would be a simple data constructor that makes an encapsulated value with nothing but innate support for 'unwrap-foo. (http://arclanguage.org/item?id=14265)

I consider that to be a great way to get rid of implementation details, since someone can extend 'unwrap-foo and 'isa-foo for a new type if they want to. However, I'd also like to leave in 'annotate and friends, just so people who'd really like to dig into the details can do so.

---

"Hmm, one possible flaw occurs to me. If something is (tag a (tag b x)), a user-defined function won't be able to extract the "b" using 'rep. However, I don't think that's much of a problem: if the thing that tagged the (tag b x) as an "a" wanted to allow its callers to do inheritance or whatever on the "b" tag, it could instead have returned (tag (list a b) x)."

As I've mentioned recently (http://arclanguage.org/item?id=14176), I only expect 'type to determine the concrete representation of something, since it naturally arises (in my experience) that a value has one and only one concrete representation.

With the way I code, saying (tag a (tag b x)) means I'm making an "a"-typed value that has a "b"-typed value that has "x". I don't consider them all to be the same value, so the flaw you're talking about is relevant to me, and your example workaround solves a different problem.

1 point by waterhouse 5320 days ago | link

> Isn't it the other way around? Why are "low bits" not an implementation detail?

The implementation detail is that "you can't tag an object without wrapping it in a compound data structure that old functions will have to be told to reach inside to get the original object".

"Hiding implementation details" isn't something to be done for its own sake. You want to hide implementation details when they're uglier than the semantics you want to convey; e.g. it's impossible to use a MUL instruction to check if the result of a multiplication is a bignum and allocate one and return a tagged pointer to it if necessary, but keeping track of all that is a pain; so instead we have a generic * function that hides all this scrambling around, and we can think of all integers as a single type of object. Exposing that stuff to a user who isn't extremely concerned with performance just sucks. It's bad because the interface sucks, and we'd identify it as "exposing implementation details" because that was the reason it happened.

On the other hand, does anyone want to hide the fact that lists are implemented as conses and that you can take their car and cdr? Not me. That stuff is useful.

> If I define a new type, it makes little sense for it to inherit the qualities of the particular old type I'm using to implement it. That idea does nothing but expose implementation details.

When you first define a new type, no preexisting functions will know how to handle the new type. You have two choices for what should happen in the absence of such instructions. Either they can give errors and fail (unless you use "rep", in which case they do their job as usual), the way they do now, or they can do their job as usual, the way it would be under my proposal. I think the latter choice is clearly--I would say strictly, unless you find the error messages useful--superior.

Now, if you define a new type, the first thing you'll probably do is define some functions that handle the new type, and they will likely need to use old functions that manipulate the objects that your new type is made of. (Perhaps the functions you're defining will be intelligent, wrapped versions of the old ones.) As mentioned, either they have to use "rep", or they don't.

And if you want to define a second new type that's supposed to be a subtype of the first one, then you can still build that on top of the provided "tag" and "type". (I think this answers your last point.) For example:

  ;Single inheritance. The path of inheritances is an improper list.
  ;Builtins will have atomic types; you could give something an
  ;atomic type to pretend it's a builtin.
  (def inherit-tag (typ x)
    (tag (cons typ (type x)) x))
  (def inherit-type (x)
    (if (acons (type x))
        (car (type x))
        (type x)))
  (def inherit-rep (x) ;more like "cast to supertype"
    (if (acons (type x))
        (tag (cdr (type x)) x)
        x))
  (def is-a (typ x)
    (if (atom (type x))
        (is typ (type x))
        (or (is typ (car (type x)))
            (is-a typ (inherit-rep x)))))

If you wanted "has-a" type semantics or something, you could figure out a similar way to define those. (Also, it occurs to me that you could use "tag" to identify your tagged objects, as in (tag 'inheritance-tagged (list (list 'type1 'type2 ...) 'implementation-type x)) or something.) I think you can build things on the type system just as much as before, and meanwhile the initial implementation is cleaner.

[Hmm. Come to think of it, given that the user doesn't have access to the internal "rep", and the built-in functions don't use it, it's meaningless (and useless) for (tag 'a (tag 'b x)) to return nested vectors; I'll make it return the same thing as (tag 'a x). So in the language primitives, everything has a single type--which would likely be just a symbol, but conceivably an Arc standard library might have things that do inheritance with lists of symbols. Anyway, this doesn't change anything I've said above.]

-----

2 points by rocketnia 5319 days ago | link

"The implementation detail is that "you can't tag an object without wrapping it in a compound data structure that old functions will have to be told to reach inside to get the original object"."

When I use 'annotate, what I want is to do this: Take a value that's sufficient for the implementation of a new type, and produce a different value such that I can extract the old one using an operation (in this case 'rep) that I can't confuse for part of the type's stable API.

Any "tag" semantics that overwrites the original type is useless to me for that.

I do think "wrap" is a better name for what I want than "tag" or "annotate," and that's the word I reach for when naming a related utility or recreating 'annotate in a new language. I don't mind if you say that tagging is something different. ^_^

---

"You want to hide implementation details when they're uglier than the semantics you want to convey"

From a minimalistic point of view, any accidental feature is ugly complexity, right? But I grant that having easy access to accidental features is useful for a programmer who's writing experimental code.

---

"On the other hand, does anyone want to hide the fact that lists are implemented as conses and that you can take their car and cdr? Not me. That stuff is useful."

I consider Arc lists to be conses by design, not just by implementation. As you say, conses are a positive feature.

On the other hand, the core list utilities would be easier for custom types to support if they relied on a more generic sequence interface, rather than expecting a type of 'cons. This isn't hiding the implementation so much as deciding to rely on it as little as possible, but it's somewhat causally related: If I don't rely on the implementation, I don't care whether or not it's hidden. If the implementation kinda hides itself (the way the body of a 'fn is hidden in official Arc), then I tend to leave it that way, and it can come in handy as a sanity check.

---

"And if you want to define a second new type that's supposed to be a subtype of the first one, then you can still build that on top of the provided "tag" and "type". (I think this answers your last point.)"

No... I said this:

With the way I code, saying (tag a (tag b x)) means I'm making an "a"-typed value that has a "b"-typed value that has "x". I don't consider them all to be the same value[...]

There's no subtype relationship going on here. For comparison's sake, if I say (list (list 4)), I'm making a cons-typed value that has a cons-typed value that has 4. At no point should my conses inherit from 4, and at no point should my "a" inherit from "x". That's just not what I'm trying to do.

---

In any case, just because I like the status quo (or my own spin on it) doesn't mean I shouldn't give your approach a chance too.

So, as long as (tag 'foo (tag 'bar "hello")) makes something nobody can tell is a 'bar, it's consistent for them not to know it's a 'string either. But then what will core utilities which can handle strings and tables (like 'each) do when they get that value? Do they just pick 'string or 'table and assume that type by default? If they pick 'string, won't that make it seem strangely inconsistent when someone comes along and expects (each (k v) (tag 'baz (table)) ...) to work?

-----