Why I think Arc should use packages
11 points by cchooper 5938 days ago | 16 comments
Packages solve a lot of problems in Lisp. They not only allow you to modularize code, but also avoid problems like this: http://arclanguage.com/item?id=7701 and many more besides.

I didn't like the idea of packages originally. As pg said in ANSI Common Lisp, people often get confused about packages, which suggests they aren't the right abstraction. It isn't easy to explain about maps from names to symbols, implicit interning, when two symbols are equal and so on. It's a very complex system, and everyone gets confused the first time (I know I did :)

But they solve a lot of problems, especially tricky problems with macros, so I'm reconsidering them. I've come to realise that it's not packages that are the problem, but CL symbols. If you simplify symbols, then packages also become symbols. It's also possible to simplify packages a little bit, which also helps. To demonstrate, I'll explain how I'd like Arc packages to work.

How Arc Packages Should Work
============================

A package should act like an implicit prefix for symbols. So if the current package is 'foo and you type in 'bar, Arc treats this as the symbol 'foo::bar. This is a full name of the symbol, while 'bar is the abbreviated name. A symbol can have multiple full names by importing it into different packages:

  (import 'foo::bar 'quux)
  
  (is 'foo::bar 'quux::bar)
  => t
Every symbol has at least one full name (i.e. all symbols are interned somewhere). There is no #: syntax for accessing uninterned symbols, because they don't exist.

Packages should be created when they're first used, just like symbols. For example

  'quux::foo
creates a package called 'quux containing the symbol 'foo. There should also be an Arc equivalent of CL's 'in-package which can also create a package automatically, and can be used for modules.

There should be no such thing as internal vs. external symbols. All symbols are accessed using :: (so that : can continue to be used for composition). There should also be no such thing as one package 'using' another. You just import the symbols you want, and hopefully the module creator will provide a nice list of symbols that you can pass to 'import to get all the symbols you need.

There should be no such thing as 'make-symbol, as this would allow you to create uninterned symbols. To create a symbol from a string, use 'read, which already has the correct behaviour for creating interned symbols.

Finally, instead of calling them packages, it would be better to call them prefixes, so that they don't sound like modules.

As you can see, it's not difficult to understand or use packages this way. You don't have to worry about them being maps from names to symbols. You don't have to worry about what 'interning' is. You just treat them like prefixes.
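
The prefix model described in this section can be sketched in a few lines. This is a toy Python model rather than Arc code, and the `SymbolTable` class with its `resolve` and `import_symbol` methods are hypothetical names invented for illustration; nothing like them exists in Arc today.

```python
class SymbolTable:
    """Toy model: every symbol has at least one full name 'prefix::name'."""

    def __init__(self):
        # Maps each full name to a canonical identity (the first full name given).
        self.identity = {}

    def resolve(self, name, current_prefix):
        """Turn an abbreviated or full name into a canonical symbol identity."""
        full = name if "::" in name else f"{current_prefix}::{name}"
        # Prefixes and symbols spring into existence on first use.
        return self.identity.setdefault(full, full)

    def import_symbol(self, full_name, prefix):
        """Give an existing symbol a second full name under `prefix`."""
        target = self.resolve(full_name, prefix)
        short = full_name.split("::", 1)[1]
        self.identity[f"{prefix}::{short}"] = target


table = SymbolTable()
table.import_symbol("foo::bar", "quux")      # models (import 'foo::bar 'quux)
same = table.resolve("foo::bar", "user") == table.resolve("quux::bar", "user")
print(same)                                  # True, like (is 'foo::bar 'quux::bar)
print(table.resolve("bar", "foo"))           # foo::bar
```

The point of the sketch is that the user only ever sees prefixes and names; the map from names to identities stays an implementation detail.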

Gensyms
=======

Gensyms in Arc are always interned, so there is a possibility of them not being unique. Putting them in a different package would reduce the chance of this happening. Alternatively, the 'gs package could be a special magic package where symbols always act like they are uninterned.

  (is 'foo::bar 'foo::bar)
  => t

  (is 'gs::bar 'gs::bar)
  => nil
This could be explained by saying that gensyms aren't real symbols; they are special magic tokens that you can use instead of symbols, and are always unique. Note that you don't need to use #: to refer to these symbols, because the 'gs prefix tells you that they're uninterned.
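
One way to picture the "magic" 'gs package is a symbol constructor that interns everything except names under the gs prefix. The sketch below is a Python model under that assumption; the `Symbol` class is hypothetical and only mimics the two `is` comparisons shown above.

```python
class Symbol:
    """Toy symbol: interned symbols are shared, 'gs' symbols are always fresh."""

    _interned = {}

    def __new__(cls, full_name):
        prefix = full_name.split("::", 1)[0]
        if prefix == "gs":
            # Always hand back a new object, so no two reads are identical.
            obj = super().__new__(cls)
            obj.full_name = full_name
            return obj
        if full_name not in cls._interned:
            obj = super().__new__(cls)
            obj.full_name = full_name
            cls._interned[full_name] = obj
        return cls._interned[full_name]


print(Symbol("foo::bar") is Symbol("foo::bar"))   # True
print(Symbol("gs::bar") is Symbol("gs::bar"))     # False
```

A macro would still hold on to the one object it created, so repeated uses inside a single expansion keep referring to the same token.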

Obviously, implementing all this is very difficult, especially on Scheme, which is why I haven't even tried. But I think the benefits would be huge. What do you guys think?



4 points by almkglor 5938 days ago | link

I've been bashing my head against this somewhat, in SNAP. The problem is always with the reader function: we cannot use a single reader function, since two different processes could be using the reader and expecting it to be in two different packages. Obviously we would have to create separate monadic readers for each process.

The other problem then becomes: how do we implement, say, (import ...) or (in-package ...) ? Should the reader silently filter them out? If not, how does 'eval notify the reader that the package is changed? If someone defines foo::in-package and bar::in-package, which one does the reader return, and how will 'eval know if the package should be changed?

Yet another problem would be intrasymbol syntax: should the reader leave foo::bar!nitz, or should it process it into (foo::bar 'foo::nitz) or even (foo::bar 'quux::nitz) if (import quux::nitz foo::nitz) is used?

Yet another problem is the "standard" package, i.e. CL-USER in CL. Obviously all symbols from this package are imported into all packages... or should they be? The underlying issue is one that PG ignored in ArcN: backwards compatibility. If I write code today which uses the symbol "convoke" as, say, a table in my function and tomorrow PG decides to create a macro "convoke" which does something else, kaboom! my code dies.

In my opinion we should separate packages into several interfaces.

Basically, suppose I release a package, AlmkglorSuperSupremePanPizza, and I define a "version 1" of this interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)
Then someone else can use:

  (import AlmkglorSuperSupremePanPizza::v1)
This imports the 'eat, 'drink, and 'be-merry symbols from AlmkglorSuperSupremePanPizza. Importantly, once I publish the interface, I cannot change it. If I realize that my interface is lacking, I will have to define a new interface:

  (in-package AlmkglorSuperSupremePanPizza)
  (interface v1
    eat drink be-merry)
  (interface v2
    ;include v1 interface elements
    v1 for-tomorrow-we-die)
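
The "publish once, never change" rule for interfaces could be enforced mechanically. Here is a rough Python sketch of that idea; the `Package` class and its `interface` method are hypothetical, and the choice to raise an error on a conflicting redefinition is an assumption about how strict an implementation would want to be.

```python
class Package:
    """Toy package whose published interfaces are frozen once defined."""

    def __init__(self, name):
        self.name = name
        self.interfaces = {}

    def interface(self, version, *items):
        symbols = []
        for item in items:
            if item in self.interfaces:
                # Naming an earlier interface includes all of its symbols.
                symbols.extend(self.interfaces[item])
            else:
                symbols.append(f"{self.name}::{item}")
        symbols = tuple(symbols)
        if version in self.interfaces and self.interfaces[version] != symbols:
            raise ValueError(f"interface {self.name}::{version} is already published")
        self.interfaces[version] = symbols
        return symbols


pizza = Package("AlmkglorSuperSupremePanPizza")
pizza.interface("v1", "eat", "drink", "be-merry")
v2 = pizza.interface("v2", "v1", "for-tomorrow-we-die")
print(len(v2))   # 4
```

Re-declaring v1 with the same symbols is harmless in this sketch, which is what lets a file repeat the v1 declaration next to v2 as in the example above.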

-----

4 points by almkglor 5938 days ago | link

continued:

Anyway, I've been thinking rather deeply about packages and module systems and the like. One major reason for using a symbol-based package/module system is types: for example, if packageA defines a 'gaz type, and packageB defines a 'gaz type also, obviously the types are incompatible and they shouldn't be the same.

Also, for a potential solution from within the Scheme implementation, consider instead moving the problem from the reader to the evaller.

Instead of having symbol packages be handled by the reader, we might instead define a new builtin type, 'eval-cxt. An 'eval-cxt object is simply an evaluator, but understands packages and the (in-package ...) and (import ...) etc. syntaxes.

The reader simply reads in symbols blindly, without caring about the exact package they should go to. This simplifies the reader and allows us to continue using the reader and writer for saving plain Arc data. Instead, any particular context for evaluation is put into the 'eval-cxt object. 'eval-cxt will understand 'import etc. forms, and will perform the translation of all symbols into their qualified, package-based counterparts, i.e.:

  (= evaller (eval-cxt))
  (= tmp (read))
  user input> '(hello world)
  => (quote (hello world))

  (evaller tmp)
  => (arc-user::hello arc-user::world)
The Arc REPL would then be something like:

  (def tl ()
    (let my-eval (eval-cxt)
      ((afn ()
         (pr "arc> ")
         (write:my-eval:read)
         (self)))))
This simplifies 'read, EXCEPT: intrasymbol syntax must, must be absolutely expanded by the reader. Why? Because otherwise macros whose expansions use intrasymbol syntax won't work properly:

  (in-package sample)
  (= private-table
     (table 'foo 42
            'bar 99))
  (mac foo ()
    `(private-table!foo))
The problem is the 'foo symbol above: the macroexpansion must express both private-table as sample::private-table and 'foo as sample::foo.
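
To make the ordering issue concrete, here is a small Python model. It assumes symbols are plain strings, that `a!b` is the only intrasymbol form, and that `quote` is left unqualified for simplicity; `expand_ssyntax` and `qualify` are hypothetical helper names.

```python
def expand_ssyntax(token):
    """Expand 'a!b' into the list form (a (quote b)); leave other tokens alone."""
    if "!" in token:
        head, key = token.split("!", 1)
        return [head, ["quote", key]]
    return token


def qualify(expr, package):
    """Prefix every unqualified symbol in a nested list with `package`."""
    if isinstance(expr, list):
        return [qualify(e, package) for e in expr]
    if expr == "quote" or "::" in expr:
        return expr
    return f"{package}::{expr}"


expansion = qualify(expand_ssyntax("private-table!foo"), "sample")
print(expansion)   # ['sample::private-table', ['quote', 'sample::foo']]

# Without the expansion step, the whole token is treated as one symbol:
print(qualify("private-table!foo", "sample"))   # sample::private-table!foo
```

In the second call the key never gets its own package, which is the failure mode described above.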

Incidentally, the 'eval function can still be expressed rather simply by this manner:

  (def eval (e)
    ((eval-cxt) e))
Alternatively we can give an optional package name:

  (w/uniq no-param
    (def eval (e (o p no-param))
      (let evaller (eval-cxt)
        (unless (is p no-param)
          (evaller `(in-package ,p)))
        (evaller e))))
This keeps maximum backward compatibility with ArcN: a simple reader and a simple eval function.

-----

3 points by cchooper 5937 days ago | link

Hmm... interesting stuff.

I think I'll keep your interfaces idea. It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

I don't really understand the problem with 'in-package. Is this a SNAP-specific problem or does it affect Arc generally? Won't things work the same way as in CL?

Some other thoughts I had about this which may be useful: modules are usually kept in a file, so a 'load-in-package function which takes a package argument might be useful.

Also, I thought it would be good to have a read-macro to switch packages. I'll reuse #: as it's not needed in my system:

  #:foo (some expressions)
This would read the expressions in package 'foo before executing them. That might solve your problem as the package is passed to the reader explicitly.

I like the idea of symbols being read in without a package, but then getting a package at eval time. One way to implement this may be to store all the symbols in a special package when they are read, then eval can move these to a new package when it evaluates them. This makes packages very dynamic.

One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc. That's just a bit crazy, so I think strings should be used to name packages instead. Alternatively, package names should also be interned in a special package that's treated differently. Seeing as packages are just mappings from strings to symbols, that doesn't really make much difference.

-----

3 points by almkglor 5937 days ago | link

> It's much more elegant than a list of symbols (although it isn't necessarily much different underneath).

Which is the point, of course ^^

The other point is disciplining package makers to make package interfaces constant even as newer versions of the package are made. This helps preserve backward compatibility. In fact, if the ac.scm and arc.arc functions are kept in their own package, we can even allow effective backward compatibility of much of Arc by separating them by version, i.e.

  (using arc v3)
  (using arc v4)
  (using arc v5)
> I don't really understand the problem with 'in-package.

  ; tell the reader that package 'foo has a symbol 'in-package
  (= foo::in-package t)
  ; enter package foo
  (in-package foo)
  ; now: does the reader parse this as (in-package ...) or (foo::in-package ...)
  (in-package bar)
> Is this a SNAP-specific problem or does it affect Arc generally?

It's somewhat SNAP-specific, since we cannot have a stateful, shared reader, but I suspect that any Arc implementation that supports concurrency of any form will have similar problems with having readers keep state across invocations. The alternative would be having a monadic reader.

> Won't things work the same way as in CL?

Not sure: I never grokked anything except the basics of CL packages.

> #:foo (some expressions)

How about in a module file? It might get inconvenient to have to keep typing #:foo for each expression I want to invoke in the foo package, which means we really should think deeply about how in-package should be properly implemented.

> One other thought I've had: package names should be strings. Otherwise, 'foo::bar actually becomes 'foo::foo::bar, which is really 'foo::foo::foo...::bar etc.

If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package, i.e. foo::bar is always foo::bar, as long as :: exists in the symbol.

Of course, hierarchical packages are nice too ^^

-----

1 point by cchooper 5935 days ago | link

> #:foo (some expressions)

I'm assuming you can use it like this:

  #:foo 
  ((def (x) (+ 1 x))
   (def (y) (expt y))
   (= bar 123))
or like this:

  #:foo ((load "filename"))
so you only have to type it once (although you would at the top level!) With this syntax, #:foo x could expand to something like (read-with-package "foo" x), so you wouldn't need a stateful read. Well, unless you called 'read within the file. So I guess you do. :)

> If we don't allow packages to have sub-packages, then a name that is at all qualified will quite simply directly belong to that package

True, but it will be a bit confusing:

  (import 'foo::bar 'foo::quux)
  (import 'foo::bar 'baz::quux)
Are 'foo::quux and 'baz::quux the same package? If so, it's a bit strange that you can refer to the same thing by different symbols. That's why I think strings are better. Not sure what I think about nested packages. I'll have to ponder on that.

-----

1 point by almkglor 5935 days ago | link

load is currently defined as:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (whilet e (read f)
          (eval (hook e))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))
What magic needs to be inserted here to make 'load use the correct 'read, keeping in mind that even plain Arc supports threads and those threads share global variables?

It still looks like a stateful 'read to me, and I don't want a stateful 'read at all, because a file might want to directly use 'read:

  $ cat getconfig.arc
  (= configuration (w/infile f "my.cfg" (read f)))
This is one good reason to try to keep 'read stupid: one of Arc's idioms is to simply dump data as s-expressions and read them in later as list structures. If 'read is too smart, this idiom might have some subtle gotchas.

For that matter I'd prefer to keep the package definitions in the file itself, rather than have to remember to put the file in a package:

  $ cat mine.arc
  (in-package mine)

  (def mine ()
    (prn "this is my mine!!"))
> (import 'foo::bar 'foo::quux)

Okay, I have to ask: what does 'import mean?

-----

1 point by cchooper 5934 days ago | link

I would define 'load-in-package as

  - (def load (file (o hook))
  + (def load-in-package (package file (o hook))
  - (whilet e (read f)
  + (whilet e (read-in-package package f)
That's the best I can do. I think that if packages are involved then read is inherently stateful, so even threads are a problem. I have no idea how CL implementations deal with threads and *package*, because the spec makes no provision for it. :(

> (import 'foo::bar 'foo::quux)

Oops, that should be

  (import 'foo::quux 'baz::quux)

-----

1 point by almkglor 5934 days ago | link

Here's my proposal:

We move all state information into a new object type called a "context". It can be constructed without parameters via the 'cxt function:

  (cxt)
  => <implementation-specific>
  (type (cxt))
  => arc::cxt
The REPL becomes an RCEPL, a read-contexter-eval-print loop. For convenience, we also provide an 'eval-cxt object:

  (eval-cxt)
  => <implementation-specific>
  (type (eval-cxt))
  => arc::eval-cxt
'eval-cxt objects are callable, and their call is equivalent to:

  (let ob (eval-cxt)
    (ob x))
  ==>
  (let ob (cxt)
    (eval:ob x))
The implementation is free to define 'cxt and/or 'eval-cxt objects in terms of Arc axioms or by adding them as implementation-specific axioms.

The context object accepts a plain read expression (with unpackaged symbols) and emits an s-expression where all symbols are packaged symbols.

It is the context object which keeps track of the current package, so you might have some accessor functions to manipulate the context object (e.g. destructure it into the current package, etc.).

The read function is stateless and simply emits unpackaged symbols, and emits packaged symbols if and only if the given plaintext specifically includes a package specification.

A package object is a stateful, synchronized (as in safely accessible across different threads, and whose basic operations are assuredly atomic) object. A context is a stateful object intended for thread- and function- local usage.

context objects
===============

A context object is callable (and has an entry in the axiom::call* table) and has the following form:

  (let ob (cxt)
    (ob expression))
The return value of the context is either of the following:

1. If the expression is one of the following forms (the first symbol in each form is unpackaged, 'symbol here is a variable symbol):

  (in-package symbol)
  (interface symbol . symbols)
  (using symbol)
  (import symbol symbol)
...then the return value is axiom::t, and either the context's state is changed, or the state of a package (specifically the current package of the context) is changed.

2. For all other forms, it returns an equivalent expression, but containing only packaged symbols. The state of the context is not changed.

The forms in number 1 above have the following changes in the context or current package of the context:

  (in-package symbol)
Changes the current package of the context to the package represented by the unpackaged symbol. The implementation is free to throw an error if the given symbol is packaged.

  (interface symbol . symbols)
Defines an interface. All symbols are first applied to the current package to translate them into packaged symbols, if they are unpackaged (this translation by itself may change the package's state, and also a packaged symbol will simply be passed as-is by the package object; see section "package objects" below). It then modifies the package of the first symbol to have an interface whose symbols are the given symbols.

If the interface already exists, the given list of symbols is compared with the existing list. If they differ, the implementation is free to throw an error.

  (using symbol)
The given symbol must be a packaged symbol. It must name an interface of its package; if the interface does not exist on the package, the implementation must throw an error. For each symbol in the interface, this changes the current package's state, creating or modifying the mapping from the unpackaged symbol of the same name to the symbol in the interface.

For conflicting package interfaces: let us suppose that the context is in package 'User, and there exist two package interfaces, A::v1 and B::v1. A::v1 is composed of (A::foo A::bar) while B::v1 is composed of (B::bar B::quux). If the context receives (using A::v1), the User package contains the mapping {foo => A::foo, bar => A::bar}. Then if the context receives (using B::v1), the User package afterwards contains the mapping {foo => A::foo, bar => B::bar, quux => B::quux}.

  (import symbol symbol)
Forces the current package to have a specific mapping. The first symbol must be a packaged symbol and the second symbol must be unpackaged. The implementation must throw an error if this invariant is violated.

Continuing the example above, after (import A::bar A-bar), this changes the package to {foo => A::foo, bar => B::bar, A-bar => A::bar, quux => B::quux}
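
The worked example above can be replayed with a toy context object. This Python sketch assumes symbols are strings and forms are nested lists, handles only `in-package`, `using` and `import` (the `interface` form is left out for brevity), and takes the interfaces as a prebuilt dictionary; the `Context` class is hypothetical.

```python
class Context:
    """Toy context: tracks a current package and each package's name mappings."""

    def __init__(self, interfaces):
        self.interfaces = interfaces      # e.g. {"A::v1": ["A::foo", "A::bar"]}
        self.mappings = {}                # package name -> {short: qualified}
        self.current = "User"

    def _table(self):
        return self.mappings.setdefault(self.current, {})

    def __call__(self, expr):
        head = expr[0] if isinstance(expr, list) and expr else None
        if head == "in-package":
            self.current = expr[1]
            return True
        if head == "using":
            if expr[1] not in self.interfaces:
                raise KeyError(f"no such interface: {expr[1]}")
            for sym in self.interfaces[expr[1]]:
                # A later interface overrides an earlier mapping.
                self._table()[sym.split("::", 1)[1]] = sym
            return True
        if head == "import":
            qualified, short = expr[1], expr[2]
            if "::" not in qualified or "::" in short:
                raise ValueError("import needs a packaged then an unpackaged symbol")
            self._table()[short] = qualified
            return True
        return self._qualify(expr)

    def _qualify(self, expr):
        if isinstance(expr, list):
            return [self._qualify(e) for e in expr]
        if "::" in expr:
            return expr
        return self._table().setdefault(expr, f"{self.current}::{expr}")


cxt = Context({"A::v1": ["A::foo", "A::bar"], "B::v1": ["B::bar", "B::quux"]})
cxt(["using", "A::v1"])
cxt(["using", "B::v1"])
cxt(["import", "A::bar", "A-bar"])
print(cxt(["foo", "bar", "A-bar", "quux", "other"]))
# ['A::foo', 'B::bar', 'A::bar', 'B::quux', 'User::other']
```

Because all of the state lives in the `Context` instance, two threads can each hold their own context and the reader itself never needs to know the current package.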

package objects
===============

A package object is callable and has the following form:

  (ob expression)
expression must evaluate to a symbol, and if a non-symbol is applied to a package object, the implementation is free to throw an error. The application otherwise evaluates to either:

1. The same symbol, if the given symbol is a packaged symbol; this does not change the state of the package

2. A packaged symbol, if the given symbol is an unpackaged symbol. If the package does not contain a mapping for the unpackaged symbol, the state of the package is changed so that a mapping for the unpackaged symbol to a packaged symbol exists.

The package object also supports an 'sref operation:

  (sref ob v k)
k is an unpackaged symbol while v is a packaged symbol; the implementation is free to throw an error if this invariant is violated.
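
A package object with these two operations is small enough to sketch directly. The Python model below assumes string symbols and uses a lock to stand in for the "synchronized" requirement; the `PackageObject` class and its `sref` method name are hypothetical stand-ins for the callable object and the 'sref operation.

```python
import threading


class PackageObject:
    """Toy package object: maps unpackaged names to packaged symbols."""

    def __init__(self, name):
        self.name = name
        self._map = {}
        self._lock = threading.Lock()     # keeps each basic operation atomic

    def __call__(self, symbol):
        if not isinstance(symbol, str):
            raise TypeError("a package object can only be applied to a symbol")
        if "::" in symbol:
            return symbol                 # already packaged: returned as-is
        with self._lock:
            # Create the mapping on first use, then reuse it.
            return self._map.setdefault(symbol, f"{self.name}::{symbol}")

    def sref(self, value, key):
        """Mirror of (sref ob v k): k is unpackaged, v is packaged."""
        if "::" in key or "::" not in value:
            raise ValueError("expected an unpackaged key and a packaged value")
        with self._lock:
            self._map[key] = value


user = PackageObject("User")
user.sref("A::bar", "A-bar")
print(user("A-bar"), user("baz"), user("B::quux"))   # A::bar User::baz B::quux
```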

Packages are handled by interface.

Further, we also predefine two packages, axiom and arc.

The axiom package contains the following symbols:

  axiom::t
  axiom::nil
  axiom::fn
  axiom::if
  axiom::quote
  axiom::quasiquote
  axiom::unquote
  axiom::unquote-splicing
  axiom::set
  axiom::call*
The axiom package is implicitly imported into all packages. It presents no interface.

The arc package contains all "standard" Arc functions and macros. The arc package is not implicitly imported into all packages.

The arc package contains the interface arc::v3. This interface is the set of symbols currently defined on Anarki. Future extensions to the arc standard library must be first placed in the interface arc::v3-exp until they are placed into a future arc::v4 interface, and so on.

load
====

The load implementation is thus:

  (def load (file (o hook))
    " Reads the expressions in `file' and evaluates them.  Read expressions
      may be preprocessed by `hook'.
      See also [[require]]. "
    (push current-load-file* load-file-stack*)
    (= current-load-file* file)
    (or= hook idfn)
    (after
      (w/infile f file
        (let evaller (eval-cxt)
          (evaller '(in-package User))
          (whilet e (read f)
            (evaller (hook e)))))
      (do (= current-load-file* (pop load-file-stack*)) nil)))

-----

1 point by cchooper 5935 days ago | link

As for CL packages, I've decided I don't really like the way they work. If a file is compiled, then (in-package foo) is only guaranteed to work if it appears at the top level. So...

  (if (eq x 10) (in-package foo) (in-package bar))
works in an interpreted file, but the behaviour is undefined if the file is compiled. CLISP handles both cases fine.

Also in CL, the value of *package* doesn't always correspond to the actual current package. For example

  (setf x "I'm in the default package!")
  (setf foo::x "I'm in the FOO package!")
  (setf *package* (find-package "FOO"))
  (print *package*)
  (print x)
does this when interpreted

  #<PACKAGE FOO>
  "I'm in the FOO package!"
but this when compiled

  #<PACKAGE FOO>
  "I'm in the default package!"
Either the package should be determined at eval-time (as was your suggestion) or the user should be forced to use read macros like #: and #.(in-package ...) to switch packages at read time. The CL solution is an ad-hoc compromise between the two.
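
The difference comes down to whether a form is read before or after the package assignment has actually run. The Python simulation below is a deliberately simplified model of that ordering, not of any real CL implementation: the form names, the `read` and `run` helpers, and the "read everything first" treatment of compilation are all assumptions made for illustration.

```python
SOURCE = [("set-package", "FOO"), ("print", "x")]
VALUES = {
    "USER::x": "I'm in the default package!",
    "FOO::x": "I'm in the FOO package!",
}


def read(form, package):
    """Read-time step: qualify the symbol of a print form with the current package."""
    op, arg = form
    return (op, f"{package}::{arg}") if op == "print" else form


def run(source, read_all_first):
    state = {"package": "USER"}
    output = []

    def evaluate(form):
        op, arg = form
        if op == "set-package":
            state["package"] = arg
        else:
            output.append(VALUES[arg])

    if read_all_first:
        # Compile-like: every form is read before any form runs.
        forms = [read(f, state["package"]) for f in source]
        for f in forms:
            evaluate(f)
    else:
        # Interpreter-like: read one form, run it, then read the next.
        for f in source:
            evaluate(read(f, state["package"]))
    return output


print(run(SOURCE, read_all_first=False))   # ["I'm in the FOO package!"]
print(run(SOURCE, read_all_first=True))    # ["I'm in the default package!"]
```

Resolving symbols at eval time, or forcing the switch to happen at read time, would make both orderings agree.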

-----

1 point by almkglor 5935 days ago | link

Forcing the user to keep using read macros doesn't feel quite right. Personally I'm more for using 'eval-cxt objects, which would do the assignment from plain symbols to qualified symbols, and keep track of the current package.

Of course, using 'eval-cxt raises questions about static whole-program compilation, I think. Hmm. I haven't thought deeply about that yet.

-----

2 points by rntz 5933 days ago | link

While I like the idea of simplifying packages, I'd like to point out that packages are no universal panacea. Consider the following:

    ;; In package 'a
    (mac afn (parms . body)
      `(rfn self ,parms ,@body))

    ;; In package 'b
    (import 'a::afn 'afn)

    ;; A stupid example function
    (afn () self)

    ;; The translation process
    (b::afn () b::self)
    
    (rfn a::self () b::self)
    
    (let a::self nil
      (set a::self
           (fn () b::self)))
'afn is broken if used with packages. This can be fixed by importing 'self from the 'a package, but it's an ugly fix.
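
The same translation can be traced with a toy qualifier. This Python sketch assumes string symbols and an explicit import table per package; `qualify` and `expand_afn` are hypothetical helpers, and the model resolves the imported 'afn straight to a::afn rather than showing the intermediate b::afn step.

```python
def qualify(expr, package, imports):
    """Qualify bare symbols; `imports` maps short names to packaged symbols."""
    if isinstance(expr, list):
        return [qualify(e, package, imports) for e in expr]
    if "::" in expr:
        return expr
    return imports.get(expr, f"{package}::{expr}")


def expand_afn(parms, body):
    """The macro body was read in package a, so its `self` is a::self."""
    return ["a::rfn", "a::self", parms] + body


# Package b imports only 'afn, so the user's `self` becomes b::self.
call = qualify(["afn", [], "self"], "b", {"afn": "a::afn"})
expansion = expand_afn(call[1], call[2:])
print(expansion)   # ['a::rfn', 'a::self', [], 'b::self']

# Importing 'self as well makes the two names line up again.
fixed = qualify(["afn", [], "self"], "b", {"afn": "a::afn", "self": "a::self"})
print(fixed)       # ['a::afn', [], 'a::self']
```

The mismatch between a::self and b::self in the first expansion is exactly the broken binding; the second call shows the "ugly fix" of importing 'self too.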

-----

2 points by almkglor 5933 days ago | link

Then 'self will have to be part of the arc::v3 interface.

There's a reason why there's an interface abstraction in my proposal.

-----

2 points by tung 5931 days ago | link

I don't think there's anybody here that doesn't think that Arc should have packages.

I personally don't mind how packages are done so long as they're simple and transparent.

-----

3 points by drcode 5931 days ago | link

I don't think arc should have packages.

My preferred solution, FYI, would be some kind of crazy thread-specific namespacing. Then, by having large numbers of light-weight threads, name collisions wouldn't be an issue anymore. (Yes, you could say this is similar to a module system, if you want to be pedantic :-)

-----

2 points by almkglor 5930 days ago | link

> My preferred solution, FYI, would be some kind of crazy thread-specific namespacing.

The devil is in the details.

-----

1 point by cchooper 5935 days ago | link

BTW: "packages also become symbols" should read "packages also become simple".

-----