Remove the `Either Int` from value2term #87

anka-213 · 2020-11-29T13:53:07Z

This prevents a HUGE space leak and makes compiling a PGF a LOT faster.

For example, an application grammar moved from taking over 50GB
of RAM and taking 5 minutes (most of which was spent on garbage collection)
to taking 1.2 seconds and using 42MB of memory.

The price we pay is that the "variable #n is out of scope" error is now
lazy and will happen when we try to evaluate the term instead of
happening when the function returns and allowing the caller to choose how
to handle the error.
I don't think this should matter in practice, since it's very rare;
at least Inari has never encountered it.
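
To make the "lazy error" trade-off concrete, here is a minimal sketch in Haskell. It does not use GF's real types: `Term`, `lookupVar`, `build`, `headOf` and `forceTerm` are hypothetical names invented for illustration. It shows that a term containing an out-of-scope variable can be built and partially inspected without any error, and that the error only surfaces once the offending part is forced.

```haskell
import Control.Exception (ErrorCall (..), evaluate, try)

-- Hypothetical toy term type; GF's real Term is far richer.
data Term = Var Int | App Term Term
  deriving Show

-- Toy lookup: an out-of-range index yields a lazy `error` thunk
-- instead of a `Left`, mirroring the behaviour described above.
lookupVar :: [Term] -> Int -> Term
lookupVar env i
  | i >= 0 && i < length env = env !! i
  | otherwise = error ("variable #" ++ show i ++ " is out of scope")

-- Building this term succeeds: the second argument is an unevaluated thunk.
build :: Term
build = App (Var 0) (lookupVar [] 3)

-- Inspecting only the function position never touches the bad variable.
headOf :: Term -> Term
headOf (App f _) = f
headOf t = t

-- Walk the whole term so that every hidden `error` thunk is forced.
deepForce :: Term -> ()
deepForce (Var n) = n `seq` ()
deepForce (App f x) = deepForce f `seq` deepForce x

-- A caller that still wants an Either can recover one in IO.
forceTerm :: Term -> IO (Either ErrorCall Term)
forceTerm t = try (evaluate (deepForce t `seq` t))
```

In this sketch, `headOf build` returns `Var 0` without raising anything, while `forceTerm build` returns a `Left` carrying the out-of-scope message. Whether GF's callers would ever need something like `forceTerm` is an open question in this thread.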

johnjcamilleri · 2020-11-29T16:52:06Z

This sounds very exciting!

I'm not too sure which error you are referring to, could you provide an example which triggers it? I'm curious to compare the behaviours, even if this case is rare or non-existent in the wild.

Does this have any effect on PGF size? I presume the resulting PGF produced with this modification is different, in the sense that it will have a different md5sum. Has any comparison of old/new PGFs been done with say gftest?

inariksit · 2020-11-29T17:52:04Z

@johnjcamilleri In the old version, where value2term returned an Either Int Term, the potential error ("variable #n is out of scope") was caught in 5 places with almost identical behaviour: 1, 2, 3, 4, 5. I've never seen that error in the wild, but maybe @aarneranta or @krangelov can explain when it's supposed to be triggered, so we can test.

It looks to me that the PGFs are identical. I ran gftest on an application grammar as a part of my normal development process (store old PGF compiled with old GF -> implement new feature -> compile with new GF -> gftest the two versions) and got the expected results, that is, only intended changes. I also linearised manually some ad hoc test sentences on the PGF compiled with the new GF, and was happy with them.
Here's a more rigorous test with FoodsEng:

➜ gf -make FoodsEng.gf
Writing Foods.pgf...
➜ mv Foods.pgf /tmp   

➜ gf-old-slow -make FoodsEng.gf
Writing Foods.pgf...

➜ gftest -g Foods.pgf -o /tmp/Foods.pgf 
<TL;DR: identical result>

➜ md5sum Foods.pgf
0e5e30e31e0976cae23884fad2c3b784  Foods.pgf
➜ md5sum /tmp/Foods.pgf
0e5e30e31e0976cae23884fad2c3b784  /tmp/Foods.pgf

johnjcamilleri · 2020-11-29T19:32:25Z

Ah, so this is a GF compilation error we're talking about. If the PGFs have the same md5 they are surely identical, no need to use gftest for that 🙃

anka-213 · 2020-12-02T19:05:15Z

This change should not change the behaviour in any way other than with respect to performance (and possibly error messages), since all I've done is remove the Either Int wrapping from the function value2term, replaced the Left case with a call to error and replaced all the monadic functions in it with their non-monadic counterparts.

The only way it could change anything is if some code relies on the error of the Left case being thrown swiftly instead of eventually (or possibly never, if the Term is not fully evaluated). My guess is that the Left case would correspond to a compiler bug and not a user error, but I may be mistaken. If it was user-facing, the UX of "variable #637 is out of scope" would be pretty bad.
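
The shape of the refactoring described above can be sketched on a toy example. This is not the actual GF code: `Value`, `Term`, `value2termOld` and `value2termNew` are simplified, hypothetical stand-ins. The old version threads `Either Int` through every recursive call; the new version is a plain function whose failure case calls `error`.

```haskell
-- Hypothetical miniature types; the real GF definitions differ.
data Value = VVar Int | VApp Value Value

data Term = Var String | App Term Term
  deriving (Eq, Show)

-- Before: the result is wrapped in Either, so each recursive call
-- is combined applicatively and a bad index becomes `Left i`.
value2termOld :: [String] -> Value -> Either Int Term
value2termOld xs (VVar i)
  | i >= 0 && i < length xs = Right (Var (xs !! i))
  | otherwise = Left i
value2termOld xs (VApp f x) =
  App <$> value2termOld xs f <*> value2termOld xs x

-- After: no wrapper. The Left case is replaced by a call to `error`,
-- and the applicative plumbing becomes ordinary function application.
value2termNew :: [String] -> Value -> Term
value2termNew xs (VVar i)
  | i >= 0 && i < length xs = Var (xs !! i)
  | otherwise = error ("variable #" ++ show i ++ " is out of scope")
value2termNew xs (VApp f x) =
  App (value2termNew xs f) (value2termNew xs x)
```

One general property of `Either` that may be relevant here: a caller cannot know whether the result is `Right` until the entire conversion has run, so the old shape rules out producing the term incrementally. Whether that is what caused the space leak in GF is an assumption, not something established in this thread.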

johnjcamilleri · 2020-12-02T20:44:10Z

Sounds reasonable, Andreas. I would make the same assumption about it indicating a compiler bug.

Since the performance gains are so huge, we should really push this through for inclusion in the next release. But before merging, it would be nice to get the blessing of someone who knows the internals well. Looking at the git history, seems it was @krangelov who introduced the Either Int to value2term here.

johnjcamilleri · 2020-12-10T10:06:25Z

I finally got around to testing this version of GF, but I'm disappointed to say that I haven't noticed any improvement at all 😞
I tried two things:

building the RGL to GFOs
compiling a large application grammar (to PGF) which normally takes 90s and 5.56 GB

In both cases I saw identical time and space usage with the old and new GF versions. I'm pretty sure I was not using the same GF executable for both, as I changed the version number in gf.cabal for this version and confirmed with gf --version before each test.

So, any idea why in certain cases this gives no improvement at all? Or maybe I'm doing something wrong?
Can someone else run a test and produce a noticeable improvement?

inariksit · 2020-12-10T10:40:27Z

@johnjcamilleri Building the RGL to GFOs uses --no-pmcfg flag, so that's already fast. The changes are in effect when compiling to PGF.

The only grammar I've seen with such a dramatic effect is the one I mentioned in the email to you and Aarne on 29th November at 16:23 (DG address). That particular application grammar only compiles to 300 or so concrete categories, so it's very mysterious how come it even took that long before the fix. I have just synced it to DG's repo, so you can easily find the grammar there.

As for other grammars, I mostly see differences in memory usage. For example, compiling the French RG into a PGF still takes over 2 minutes, but it never uses more than 700MB of memory on my computer. Contrast with the old version, where I start compiling and it's almost immediately at 3GB. I'll try some others to see if I find a clear test that is still shorter than 2 minutes. :-P

inariksit · 2020-12-10T11:14:52Z

Results for other grammars

Here's a result for the Somali resource grammar:

gf-old-slow -make -v=0 --force-recomp LangSom.gf  72.69s user 1.41s system 99% cpu 1:14.48 total (~1.3GB)
gf          -make -v=0 --force-recomp LangSom.gf  67.21s user 0.83s system 99% cpu 1:08.39 total (~400MB)

Not a huge difference in time, I agree. I had my activity monitor open, and gf-old-slow used 1.3GB memory, gf used around 400 MB.

For Basque RG, the numbers are practically the same, 1.41 GB old vs. 1.37 GB new (tested just once), and both around 24 seconds.

Zulu RG has another pattern, where both versions take around 2GB memory for the bulk of the compilation, and a peak at 6-7 GB, but the new version is quite significantly faster:

gf-old-slow -make -v=0 --force-recomp LangZul.gf  161.49s user 44.77s system 81% cpu 4:13.15 total
gf          -make -v=0 --force-recomp LangZul.gf  156.44s user 34.64s system 87% cpu 3:38.75 total

Wild speculation about the nature of the bug and fix

So if the grammar is actually big, the fix isn't doing miracles. (For example, I still don't manage to produce a PGF of Romanian.) But sometimes GF is unreasonably slow for smaller grammars, in a way that genuinely looks like a bug. Looking at graphs, what seems to have improved is the time spent on garbage collection. Here's the before, on that anomalous application grammar:

And here's after:

So this fix seems to work for an anomalous case. While not as dramatic as we hoped for, it's still an improvement---maybe more accurately described as a bugfix? It's funny that the first grammar on which we tried the fix produced a result of 50GB to 42MB. That certainly coloured our expectations :-P

I've been using this new GF for 11 days now, and so far haven't noticed that anything is going worse than it used to. Of course, it would be nice to have guarantees that it didn't cause any new anomalies, like some Foods grammar suddenly taking 10 minutes.

mengwong

case x of y -> return y
seems like a rather wordy way of saying return x
:)
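
The review remark can be checked with a small Haskell sketch (`wordy` and `concise` are hypothetical names, not functions from the GF source). A `case` with a single variable pattern only renames its scrutinee and does not force it, so the two definitions behave identically, laziness included.

```haskell
-- The pattern left behind after removing the Either: a case with one
-- variable alternative, which merely binds x to a new name.
wordy :: Monad m => a -> m a
wordy x = case x of
  y -> return y

-- The equivalent, shorter form suggested in the review.
concise :: Monad m => a -> m a
concise x = return x
```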

anka-213 force-pushed the make-it-fast branch 2 times, most recently from a17f107 to 2d580d2 Compare November 29, 2020 14:08

johnjcamilleri added a commit that referenced this pull request Dec 7, 2020

Add note to changelog that #87 is still pending

2b6b315

mengwong reviewed Dec 10, 2020

johnjcamilleri mentioned this pull request Jun 30, 2021

GF release 3.11 #85

Closed

anka-213 force-pushed the make-it-fast branch from a431c9b to fadec03 Compare July 12, 2021 07:47

anka-213 added 3 commits July 12, 2021 15:50

Remove last traces of the Either in value2term

b388157

Github actions: Fix build for stack

c2ffa67

anka-213 force-pushed the make-it-fast branch from fadec03 to c2ffa67 Compare July 12, 2021 07:54

Clean up redundant case expressions

7faf8c9

inariksit merged commit 667bfd3 into GrammaticalFramework:master Jul 20, 2021

anka-213 mentioned this pull request May 31, 2022

Improve error messages for str pattern matching #142

Draft
