I regret to inform you that your submission 

108: Resource Limits for Haskell

was not selected for inclusion in the programme of this year's 
ICFP.

We received 133 submissions, which made the competition very 
tough this year. Selection of the best papers was difficult.
Many papers with high marks were discussed extensively,
but had to be rejected eventually.

We hope that you will find the reviewers' remarks useful and 
can benefit from them to produce a better paper.

We also hope to see you in Boston at ICFP 2013 and perhaps 
you can contribute to a workshop of the conference.

Kind regards,

Tarmo Uustalu
ICFP 2013 PC chairman


----------------------- REVIEW 1 ---------------------
PAPER: 108
TITLE: Resource Limits for Haskell
AUTHORS: Edward Yang and David Mazieres

OVERALL EVALUATION: 0 (borderline paper)
REVIEWER'S CONFIDENCE: 3 (medium)

----------- REVIEW -----------
Technical Summary
~~~~~~~~~~~~~~~~~

The paper presents a way to extend the cost center mechanism of GHC,
used for profiling, with dynamic cost centers that the user
gives. Cost centers can then be used in a "profiled" run to establish
resource limits for pure and impure computations. A challenging
problem that this work addresses is how to make sure that all memory
that has been allocated by a (pure or impure) computation can indeed
be freed after e.g. the resource consumption for this computation
tracked in a relevant cost center has exceeded some bounds. To this
end the authors propose a sandboxing approach based on copying and
truncating for pure computations and an information-flow approach for
impure computations that tracks cost centers that flow to references
(and threads with cost centers they intend to access, as in LIO). To
track memory consumption the authors have implemented incremental heap
census which allows them to get accurate resident memory information
without traversing the whole heap. The paper presents experimental
evidence that shows resource limits working, but the incremental
census required for memory tracking does not appear to make things
much worse than the profiled version of the program.


Opinion:
~~~~~~~~

This is an extremely interesting topic, very relevant for high
assurance software.  The main ideas and primitives look reasonable,
although some people might complain that they are more of design
guidelines and do not really enforce safety (though most real-world
software is like this, so I do accept the pragmatic
considerations). On the negative side the paper presentation is dense
and very GHC specific. I am not sure that a person not familiar with
cost centers would be able to fully follow. Moreover there are very
few examples (none I believe, actually!) of using the design
primitives to implement or demonstrate the usage -- especially for the
impure computations. Finally I know and understand that a
formalization of how exactly the incremental census works or how
exactly labels are tracked in the modified LIO interface is probably a
heavy task but I would appreciate a more precise description of what
is going on. This basically feels like half paper to this reviewer
(actually like 2 half papers, one about the library and the modified
LIO, and the other about the census algorithm), so I would go with 
borderline.


Comments:
~~~~~~~~

+ Abstract: "forcible resource reclamation". Terminology vague at this point

+ Intro, pg1, col2, para2: "are your cost semantics" <- is your cost
  semantics

+ Intro, contribution list: maybe the bullet order should be reversed
  to follow the exposition in the paper?

+ In general the bullet points in the intro assume already famliarity
  with GHCs cost centers, so I am wondering if the first paragraph in
  2.1 could be moved to the introduction, to give the readers a little
  bit of context.

+ Moreover, the scc discussion and dynamic scoping / static scoping in
  2.1 -- is it really relevant? Why don't you cut to the chase and
  say: "Here is what static cost centers are, here is why they are not
  good for our purposes (with an example please! there is no example
  currently), and here is our proposal"). The motivation for dynamic
  cost centers comes a bit too late (3.1) in a parenthesis ("in the
  case of loading an untrusted module") and no example!  Also at that
  point I believe that you should go in quite a bit of depth to
  explain cost centers and their semantics in detail (it is ok if it
  is previous work, no problem!), and not defer to the Appendix. There
  is plenty of space in the paper and there are many subsequent points
  where the user is somewhat lost without referring to the semantics.

+ The Goal 2.3 is a bit dense too. Also, at some point in the paper
  you should go back to the Goal and say: And now we achieve this. And
  explain how.

+ End of 3.1: "A keen reader ... symmetric with ...". The word
  "symmetric" is funny. In fact I do not understand this sentence.

+ Section 3.3:
  I am a bit puzzled about the use of "evaluate" in
  sandboxThunk. Evaluate evaluates to WHNF or deeply? If it evaluates
  to WHNF then aren't we going to be left with something like a
  constructor
  
        K (copy (k ..))
  
  at the top-level? And then we will still have pointers leaking to
  things that should be deallocatable?
  
  Can you please give an example of copy? It might also be helpful to
  elaborate a bit on why and how exactly copying "moves" data from one
  cost center to another.

  Can you please give an example of truncation? Is the programmer
  supposed to truncate up to WHNF? I am quite puzzled.

+ Pg 4, last line of first para: "The fold can be composed ... the
  sandbox" <- vague I do not understand this

+ Pg 4, first column, first bullet point: I do not get the discussion
  about why sandboxThunk does not take a CC whereas sandboxFun does
  take a cc. I mean, I could very well have a version of sandboxFun
  without a CC argument, and do the withCC *outside* couldn't I? Or is
  it because you are interested in passing it around, whereas you do
  not intend to pass around a sandboxed thunk with its own CC?

+ Pg 4, 2nd column, LIO discussion point (4): What happens with the
  thread label? Nothing?  Examples are desperately missing in this
  section ...

+ Semantics 4: the last sentence in the first paragraph is a bit
  disappointing -- cannot you isolate some non implementation-specific
  description that *other* PL implementors/researchers can actually
  use if they want to extend your work? And if the *modified* LIO
  semantics are "out of the scope of the paper" then is it really the
  right thing to put the LIO part in the paper and the list of
  contributions? Is the paper supposed to be the documentation of the
  library? 

+ Page 5, 2nd column, first code snippet: replace (f n) with fn.  Also
  when explaining the inlining problems you should give to the reader
  more context. First paragraph of 2nd col of Pg4: "To make matters
  worse ... into a built-in". Please say that: "If it were a built-in
  then we could tell GHC to not float over its definition, however
  even in that case, if we create another function which is not
  built-in then we lose this property". I think this is your point.

  Moreover, the text that explains withCC1 is not very clear,(for
  instance "with the usual extensions for arbitrary arity" migh seem
  strange to non GHC-users) Perhaps give withCC2 as well and give the
  example from the main function above :
                ... 
                withCC2 cc f n (\f n -> f n)

+ It is sad that withCC is not robust to program transformations, but which of 
  those do indeed preserve cost semantics? It might be good to have a collection
  of GHCs optimizations and characterize their safety or have "weaker versions" of
  them that do preserve cost semantics. 

+ The Appendix is not self-contained either, what is the union
  operator on \theta, what does SUB("SUB",cc) mean?  However, as I
  mentioned before, it should be streamlined in the rest of the paper
  and explained in more detail.


----------------------- REVIEW 2 ---------------------
PAPER: 108
TITLE: Resource Limits for Haskell
AUTHORS: Edward Yang and David Mazieres

OVERALL EVALUATION: 0 (borderline paper)
REVIEWER'S CONFIDENCE: 3 (medium)

----------- REVIEW -----------
This paper proposes an API which allows Haskell programmers
to run code with limited resources inside their programs.
For this purpose, the authors partially reuse the profiling
support in GHC driven by the idea that profiling is a basis
to reflect the resources used by some part of the program.
Thus, they allow the programmer to define cost centers
in a dynamic manner and define operations to react on
situations when resource bounds are reached.
They demonstrate by a number of benchmarks that their approach
can be implemented with a reasonable overhead.

Although this paper is an interesting contribution
to support more reliable Haskell programs,
it seems to be in a preliminary stage since some
aspects are worked out in detail whereas others
are incompletely described. For instance,
the definition and application of cost centers
is precisely described. However, this is only done
for a tiny core language which is not related
to a cost semantics. An important aspect of a resource
limit system is the ability to react on situations
when the limits are reached. Although the authors
proposes primitives to handle such situations,
they don't describe them precisely since "its semantics
are heavily implementation dependent". Thus, if this
is really the case, how should be a programmer use these
features in a reliable manner? I would expect a formal
connection between the cost semantics, the introduction
of dynamic cost centers, and the listenCC operations
in order to understand their meaning. In a similar way,
the reclamation of resources is presented via an API
but its semantics is considered as "out of scope for the paper".
Thus, one is left with the feeling that some important
aspects are not covered by this approach.

Minor comments:

P. 1: that is desirable --> that it is desirable

P. 1, bottom: You use the abbreviation "IFC" without explaining it
(in contrast to "GHC" which is explained although well known in this
community).

P. 2: ...[17] Previously,...  -->  ...[17]. Previously,...

P. 3, Sect. 3.1: "IO monad to enforce ordering":
This is a bit unclear since there is only one monadic argument.

P. 4: Copyable is type-class  -->  Copyable is a type class

P. 4: We use feature --> We use a feature

P. 4: "we require the programmer to specify the references...":
How is this done?

P. 4: we present modified cost semantics -> we present a modified cost semantics

P. 4: its semantics are heavily --> its semantics is heavily

P. 5: Fig. 1: The language seems to simplified since it contains
only cost centers, abstractions, but no application.
So, how is it connected to real programs which you want to profile?
In this case, you need also typing rules to distinguish cost centers
and other objects.

P. 5, Fig. 2: How is this connected to the cost semantics in Fig. 10?
You only talk about the "new cost semantics rules" but didn't
relate them to the "old" rules. Describe the informal meaning
of derivable formulas.

P. 6: The trickies part -> The trickiest part

P. 6, Sect. 6: Explain "Valgrind Massif".

P. 7: "compiled Haskell code continues to outperform interpreted languages":
If this is a serious argument, the GHC people wouldn't put so much
effort in their optimizers.


----------------------- REVIEW 3 ---------------------
PAPER: 108
TITLE: Resource Limits for Haskell
AUTHORS: Edward Yang and David Mazieres

OVERALL EVALUATION: 0 (borderline paper)
REVIEWER'S CONFIDENCE: 4 (high)

----------- REVIEW -----------
This is a fantastic topic.  Safe Haskell is one of the most promising
possibilities for full-performance execution of untrusted code, and
resource limits have been the major missing piece.

The connection between profiling and resource limits had not
previously occured to this reader and is in fact quite elegant.

The evaluation of web servers in Figures 7 & 8 brings to mind one
unfortunate aspect of the current implementation.  The server that
hosts untrusted code (say by compiling and loading dynamic shared
objects / plugins) must itself be compiled in profiling mode.  This
seems rather unfortunate, but as long as the server can pass pointers
to the untrusted code, I suppose it is unavoidable.  (Though one
wonders about a kind of "foreign function interface" for calling
between unprofiled and profiled code...)

Also, Figure 6 should report the time with resource limits turned on
as well as profiling.  As it stands the first sentence of the
"Performance" subsection makes a comment about costs being "dominated"
by profiling without giving any quantification.  If the entire nofib
benchmark suite runs with resource limit, then those numbers should be
reported, at least in aggregate.  Further, it would be good to see
some microbenchmarks of allocation and GC to get a better sense for
the overheads of profiling.

Otherwise, the evaluation is adequate.

The basic result is that while Safe Haskell has no runtime overhead,
resource limits require a ~2X runtime overhead.  The competiting
approach to the one proposed in this paper would be to have a server
process launch all untrusted Safe Haskell code in its own process, and
use OS-level facilities to enforce resource bounds, killing the
process if it violates them.  
  One nice thing to see in future work would be a comparison of the overheads of
process creation vs. the ~2X runtime overhead of profiling, for realistic server
applications that run untrusted code.


Details:
--------

The code examples in section 2.1 BOTH use the names "f" and "g".  I
know these are simply arbitrary placeholder names (e.g. foo, bar,
baz), but since they are unrelated examples it would be good if they
used different arbitrary placeholder names.

I wasn't previously familiar with the usage of the word "finger" in
the heap management context.  (I had to look it up.)  It's *mostly*
clear from context, but a definition or change of terminology would be
an improvement.

The "Deallocate" bullet, pg 4 col 2: I was confused about the exact
structure of the Map that is maintained.  Bullet (1.) describes it as
mapping labels to deallocation actions.  But if it is to be
maintained, such that it deallocates all references that in turn can
access a given label, then the Map entry must be updated at a write
barrier, correct?  But this is not mentioned in the write bullet (4.).

Or is the implementation instead actually *searching* through the Map to find
everyone who "points to" the reference being deallocated?  Is it an O(1)
operation or O(N) in the size of the Map to do a deallocation?

Section 6 first para:
  The comment about the margin of error being "even smaller" when
there are other threads was confusing.  Other threads have their own
nursery areas, correct?  I assume the meaning was that on average, the
resource limited thread in question won't be the only thread tripping
garbage collection? 

Section 6 very curtly points out the benchmarks in the Figure but
doesn't say anything about what they do or where they came from!

To this reader it still wasn't clear at the end of section 5.2 why
heap census does not happen during garbage collection.  It seems like
it could be turned on or off even if t was part of GC.  Further I
wasn't clear when it *does* happen.  Immediately after GC?

When the cost semantics in Figure 2 are first referenced in the prose the basic
notation should be defined.  I wasn't familiar with the 1995 cost semantics
paper, so I had to search around and eventually find the information back in the
appendix of this submission.


----------------------- REVIEW 4 ---------------------
PAPER: 108
TITLE: Resource Limits for Haskell
AUTHORS: Edward Yang and David Mazieres

OVERALL EVALUATION: -2 (reject)
REVIEWER'S CONFIDENCE: 5 (expert)

----------- REVIEW -----------
Owning up since it will be obvious: I am Simon Marlow.

High level points:

  - It's an interesting idea, and the basic premise is sound: use a
    cost semantics to implement resource limits.  However, I think the
    execution needs work.

  - The paper is confused about semantics.  Unfortunately GHC doesn't
    use evaluation scoping as the paper claims (2.1), we did experiment
    with evaluation scoping for a while but changed tack when some
    problems were uncovered in practice.

    What's strange is that the authors seem to have discovered this; in
    section 5.1 they contradict the earlier claims, saying that GHC
    implements a complex combining function on cost-centre stacks.
    They say that they changed this in their implementation, but for
    performance reasons rather than semantic ones (which is a
    surprise).

    The big problem is that they didn't update the simplifier to
    respect the change in semantics.  Not doing this would lead to GHC
    performing transformations that violate the different semantics.
    For example, GHC's optimiser translates (scc cc (\x . e)) into
    (\x. scc cc e), which is not valid under evaluation scoping.  I
    wonder if this leads to the confusion in section 4 about let
    floating (see below).


Detailed comments

para 2: what's the point you're making here?  I thought it was that the
functional abstraction is leaky, but the paragraph went on to say that
we spend a lot of time optimising functional code to counter the
criticism that functional code is slow.  These two points seem unrelated
to me.

"On the other hand" - what was the first hand?

para 3: conventional wisdom has a point, because profiling can impose
non-trivial overheads on execution time and space usage, and reduce
parallelism.

para 4: you haven't motivated the use of resource limits yet.

Your section references e.g. "(Section 3)" should be before the full
stops (several instances).

"The choice GHC has currently adopted for scc is dynamic scoping."
The Sansom semantics had flat cost centres only, but GHC has since been
extended to model stacks.  So I don't think you can say that GHC's
semantics is equivalent to either Sansom's lexical or dynamic scoping,
although it is closer to lexical scoping because (\x . scc "f" E) is
equivalent to (scc "f" \x. E).  For a while we thought that evaluation
scoping was the right solution, but I since discovered that it has some
bad properties, and the current situation is the scheme that you see
implemented in enterFunCCS() in rts/Profiling.c.  I've given a couple of
talks about this (most recently in the Haskell Implementors Workshop
2012), but there's nothing published.  Some slides explaining the
problems are here:

http://community.haskell.org/~simonmar/Stack-traces.pdf


3.2: If a resource limit is exceeded and the listener is triggered, is
the original thread suspended?  Might that cause a deadlock?  What if it
gets suspended while holding a lock, or making a foreign call?

3.3: I'm somewhat surprised that you can have copy :: a -> a.  I'd
expect compiler optimisations to eliminate the copying.  How do you
avoid that?

"can enforces" -> "can enforce"

"call centres" Do you mean "cost centres"?

The call to withCC in sandboxFun doesn't type-check.  I'm a bit confused here.

3.3: I think this section is missing motivation.  It would really help
the reader's intuition to have a concrete example to put these ideas
into context.  I'm not sure how to evaluate whether what you have
proposed here is a good idea, whether there are other ways to do it, or
whether you need this level of complexity at all.  Under what
circumstances might you need multiple threads with different cost
centres to share mutable references?

Figure 2: As mentioned earlier, you're basing this unfortunately on a
semantics that doesn't correspond to what GHC implements.  I'm slightly
surprised that you didn't notice this when experimenting with your
system.  In particular, you can't ignore cost centre stacks - adding
stacks is a core part of the semantics, not something you can treat
orthogonally. Unfortunately I don't know what the right cost semantics
for stacks is.  Perhaps what you should do is modify GHC to implement
your semantics - bear in mind that there are two places where this
matters: in the RTS and in the simplifier when manipulating expressions
with sccs.

Your let-floating example has a mistake: the last line should be "withCC cc fn".

I don't understand why you think there's a problem with let-floating.
(f n) is a function, so it can be floated anywhere: the cost centre used
when it is applied will be the one from the call site, which is inside
withCC. (this is the case both for evaluation scoping and for GHC's more
complex semantics, which is designed such that both let-floating and
inlining are semantics-preserving).

If your withcc expression form has the following type:

   withcc :: CostCentre -> (a -> b) -> (a -> b)

can't you just implement it like this?

   withcc cc f = \x -> scc cc (f x)

then you don't need to give a semantics to withcc at all.

This section on let-floating belongs in your implementation section.

Since you have a cost semantics, can you say anything formal about the
behaviour of resource limits?

5.1.  So I see that you implemented evaluation scoping.  However, you
probably didn't update the transformations in the simplifier to respect
this change in the semantics, right?

Furthermore, wouldn't you have to change scc so that it doesn't push on
the stack, but just sets the current cost centre?

"tracing and eliminate overhead"

Post author response update: External reviewer writes:

The transformation

   scc f (\x . e) => \x . scc f e

is *not* valid under evaluation scoping, and GHC will perform this
transformation (see CoreUtils.mkTick for details).  As far as I know
they didn't disable this transformation.

They might have set things up so that it just about works, by using
withCC1 (with NOLINLINE?), and not having any other sccs in the program.
But it's a fragile setup, and given that their semantics includes scc
I'd expect their implementation to work too.