Julia

REXX Primes

Just a quick sanity check on performance.

/** primes.rexx */
/* Created 2019-09-26 */
/* Copyright © 2019 by Mark Damon Hughes. All Rights Reserved. */

PARSE ARG primeCount
IF \ DATATYPE(primeCount, "W") THEN DO
    SAY "Usage: primes.rexx N"
    EXIT 1
END

CALL clearPrimes
CALL sievePrimes
CALL printPrimes
EXIT 0

clearPrimes: PROCEDURE EXPOSE primes. primeCount
    primes. = 1
    primes.0 = primeCount
    primes.1 = 0
RETURN

sievePrimes: PROCEDURE EXPOSE primes. primeCount
    DO i = 2 TO primeCount
        DO j = (i * i) TO primeCount BY i
            primes.j = 0
        END
    END
RETURN

printPrimes: PROCEDURE EXPOSE primes. primeCount
    DO i = 1 TO primeCount
        IF primes.i THEN CALL CHAROUT , i || " "
    END
RETURN

# REXX: 0.8% C
% time rexx primes.rexx 1000000 >~/tmp/primes-rexx.txt
rexx primes.rexx 1000000 > ~/tmp/primes-rexx.txt  6.43s user 0.36s system 99% cpu 6.831 total

# Regina: 1.53% C
% time regina primes.rexx 1000000 >~/tmp/primes-regina.txt
regina primes.rexx 1000000 > ~/tmp/primes-regina.txt  3.25s user 0.26s system 99% cpu 3.521 total

# Python: 4.8% C
% time ./primes.py 1000000 >~/tmp/primes-python.txt
./primes.py 1000000 > ~/tmp/primes-python.txt  0.75s user 0.02s system 68% cpu 1.123 total

# Julia: 1.4% C
% time ./primes.jl 1000000 >~/tmp/primes-julia.txt
./primes.jl 1000000 > ~/tmp/primes-julia.txt  0.45s user 0.35s system 21% cpu 3.797 total

Most of REXX's bad time can be attributed to using stem variables in a tight loop, effectively string-keyed hashtables, so I'm sure an ooRexx Array implementation would be significantly faster. But stems are what you'd use in "real code", so caveat coder. In a long real-world program I don't think it's as big an issue, but it's definitely not received the kind of optimization love that newer languages have. Aesthetically, the REXX source is a little wordier and more explicit about globals access, but not hard to write or read.

I went ahead and grabbed Regina, and it doubles the speed of ooRexx. I'm still wary of it, but that's a big win.

Python's not terrible at anything; it's within a stone's throw of a compiled language at this point. Competent mediocrity has really made Python the new dynamic Java of our time. But you still can't do multithreading in it.

Julia's slow because the startup time is just atrocious; if I timed it internally after startup it'd be as fast as a compiled native program, but as a scripting language Julia's bad news.

(I do have real work to do, but I'll keep playing with REXX more over the next few days)

Julia More Packaging & Code

I don't just drink coffee or booze, watch movies and Internet drama, and look cool. I code sometimes, too! Who knew?!

Carrying on with my experiment in Julia, packaging has another step needed to make references. For instance, Ansi uses Geometry, so:

Geometry/Project.toml:


authors = ["Mark Damon Hughes "]
name = "Geometry"
uuid = "e3172796-a620-11e8-2cbf-612649bb77f8"
version = "0.1.0"

[deps]

Ansi/Project.toml:


authors = ["Mark Damon Hughes "]
name = "Ansi"
uuid = "72992c94-a620-11e8-3d05-55611ea0dbd0"
version = "0.1.0"

[deps]

Geometry = "e3172796-a620-11e8-2cbf-612649bb77f8"

Ansi/Manifest.toml:


[[Geometry]]
repo-rev = "master"
repo-url = "/Users/mdh/Code/CodeJulia/Geometry"
uuid = "e3172796-a620-11e8-2cbf-612649bb77f8"
version = "0.1.0"

All the boldface code is what I wrote/copy-pasted, the rest is generated by juliaMakePackage.zsh. I may go ahead and make a tool to link projects, because it's so error-prone. In fact, I cheated, and made a single Manifest.toml which I copy to all projects so far, and can replace whenever something updates.

Anyway, this gets me to a nice state where I can write using Ansi in my project and it'll just find it. IIUC, if I move all the libraries to a public repository, I can just change the repo-url and the packages are downloaded into ~/.julia cache somewhere.

I still haven't followed up on making a binary application; the more I look into that, the jankier it seems, more like something to defer until there's an official solution. Putting a real UI on it is also something to work on, but that's much more doable.

Coding

I've written a lot more code, over 1000 LOC, not just screwing around with packages. Mostly this is enjoyable, it's a nice systems programming language. The ugly parts haven't yet driven me insane, they're just things to work around or ignore. Far less frustrating than almost any other new language; Rusty Nail In Your Head and Go Fuck Yourself Its Google aren't my favorites.

Strong typing really is a pain in the ass. Declare a variable or struct field foo, and it takes anything. Type it with foo::AbstractString, and you soon learn nothing is not a string; foo::Union{AbstractString,Nothing} is necessary to be nullable. Ick.

Enumerations

Enumerated types @enum are disappointing. They're a little smarter than C enums, but not as useful as Java enums. They just represent a value; but you have to cast them to Int every time you use them for their value, so too painful to use them as array indices. Or as characters, a thing I like a lot for debugging. And they're not easy to reflect on:

julia> @enum Terrains begin
               Ter_Floor = Int('.')
               Ter_Wall = Int('#')
       end
julia> Ter_Wall
Ter_Wall::Terrains = 35
julia> Int(Ter_Wall)
35
julia> Char(Int(Ter_Wall))
'#': ASCII/Unicode U+0023 (category Po: Punctuation, other)
julia> String(Char(Int(Ter_Wall)))
ERROR: MethodError: no method matching String(::Char)
julia> string(Char(Int(Ter_Wall)))
"#"
julia> # FFS
julia> string(Ter_Floor)
"Ter_Floor"
julia> # Surprisingly easy!
julia> instances(Terrains)
(Ter_Floor::Terrains = 46, Ter_Wall::Terrains = 35)
julia> # Shit, this is a named tuple, not a dictionary!
julia> useful_instances = Dict()
Dict{Any,Any} with 0 entries
julia> for v in values(instances(Terrains))
           useful_instances[ string(v) ] = v
           useful_instances[ string(Char(Int(v))) ] = v
       end
julia> useful_instances
Dict{Any,Any} with 4 entries:
  "Ter_Floor" => Ter_Floor
  "Ter_Wall"  => Ter_Wall
  "#"         => Ter_Wall
  "."         => Ter_Floor
julia> # JFHC

That was an annoying adventure to get a simple reverse lookup.

Julia Local Packaging

So I wanted to move all my common Julia code to a support dir. My filesystem has for 30+ years contained:

$HOME/
    Code/
        CodeC/
            foo/
                src/
                    bar.c
        CodeJava/
            foo/
                src/
                    com/
                        mdh/
                            bar/
                                Quux.java
        et fucking cetera

Build scripts for most languages expect something very like this, and it's easy to import one package's source into another, so I could put common code in a "Marklib" project, and get work done.

Making this happen in Julia was a lot more difficult. With a little help from Slack I made sense of the terrible package documentation for Julia and the incomprehensible errors, and wrote a script juliaMakePackage.zsh:

#!/bin/zsh
if [[ $# -ne 1 ]]; then
    echo "Usage: juliaMakePackage.zsh NAME"
    exit 1
fi
name=$1
devdir=$HOME/Code/CodeJulia
cd $devdir
julia -E "using Pkg; Pkg.activate(\".\"); Pkg.generate(\"${name}\")"
cd $name
git init
git add .
git commit -m "Initial commit"
cd ..
julia -E "using Pkg; Pkg.develop(PackageSpec(url=\"${devdir}/${name}\"))"

And then added the main dir and all packages I make to ~/.julia/config/startup.jl:

# startup.jl

push!(LOAD_PATH, pwd())
push!(LOAD_PATH, "$(homedir())/Code/CodeJulia")
push!(LOAD_PATH, "$(homedir())/Code/CodeJulia/Marklib")

println("READY $(pwd())")

Now finally I can:

% julia
READY /Users/mdh
julia> using Marklib
julia> Marklib.greet()
Hello World!
julia>

And from there start putting in my libs. Each one needs a package and a startup entry; I may have to automate that by walking my code dir. Waste of several hours figuring that out.

Julia String Concatenation

Last time, I was uncertain about string concatenation, so I did a test:

#!/usr/bin/env julia

const kTestText = "abcdefghijklmnopqrstuvwxyz0123456789\n"
const kLoops = 10000

function stringString()
    s = ""
    for i in 1:kLoops
        s = "$s$kTestText"
    end
    return s
end

function bufferString()
    sb = IOBuffer()
    for i in 1:kLoops
        print(sb, kTestText)
    end
    return String(take!(sb))
end

function vectorString()
    sb = Vector()
    for i in 1:kLoops
        push!(sb, kTestText)
    end
    return join(sb, "")
end

function typedVectorString()
    sb = Vector{AbstractString}()
    for i in 1:kLoops
        push!(sb, kTestText)
    end
    return join(sb, "")
end

println("*** stringString")
@timev stringString()
sleep(2)

println("\n*** bufferString")
@timev bufferString()
sleep(2)

println("\n*** vectorString")
@timev vectorString()
sleep(2)

println("\n*** typedVectorString")
@timev typedVectorString()

*** stringString
  1.100197 seconds (21.24 k allocations: 1.725 GiB, 15.94% gc time)
elapsed time (ns): 1100197167
gc time (ns):      175349222
bytes allocated:   1851904041
pool allocs:       11292
non-pool GC allocs:9950
GC pauses:         79

*** bufferString
  0.006864 seconds (11.80 k allocations: 1.134 MiB)
elapsed time (ns): 6864042
bytes allocated:   1189493
pool allocs:       11794
non-pool GC allocs:3
realloc() calls:   8

*** vectorString
  0.017380 seconds (26.68 k allocations: 2.191 MiB)
elapsed time (ns): 17380237
bytes allocated:   2297091
pool allocs:       26659
non-pool GC allocs:9
realloc() calls:   8

*** typedVectorString
  0.031384 seconds (44.75 k allocations: 2.999 MiB, 10.00% gc time)
elapsed time (ns): 31384383
gc time (ns):      3137654
bytes allocated:   3144221
pool allocs:       44730
non-pool GC allocs:8
realloc() calls:   8
GC pauses:         1

Well, there's me told off. I expected #1 typedVector, #2 vector, #3 buffer, then stringString way at the bottom. Instead the first 3 are reversed.

IOBuffer, as ugly as it is, is the clear winner. Vector did OK, but twice as much CPU & RAM loses. Amusing that typedVector is twice as slow and memory-heavy as the untyped (explained ). On larger loops, buffer gets slower, but vector remains a memory pig, and in GC that's unacceptable. Of course stringString is terrible, and it's almost exactly the same for string(s, kTestText).

Time to rewrite some text processing.

More Julia

Decided to take another day on Julia, write something more serious and see how that goes.

There's an uber-juno "IDE" plugin for Atom, which at least turns on syntax highlighting and puts an interactive console in the editor. Yay. It's not capable of linting yet, though it says it is.

So I'm rewriting a simple 1970s-style dungeon crawl game (Chorus:"As if you could make any other kind of game, Mark!" Mark:"I assure you I could, I just choose not to.") as a test of data structures and application programming in Julia. It's a little challenging, but not impossible.

Using Modules

The current directory is not included in the default LOAD_PATH, so you can't import local modules right off. One solution is to put it in your startup:

% mkdir -p ~/.julia/config
% echo '@everywhere push!(LOAD_PATH, pwd())' >>~/.julia/config/startup.jl

All names are in the same namespace. This means your module and a struct or method in it can't have the same names… My current solution has been to pluralize the module, so GridMaps contains a struct GridMap.

The export rules are a little annoying. If you use the @enum macro to make a ton of constants, they aren't exported when you export the enum type; you can either manually export each constant name, or just use ModuleName.ConstantName in other modules. Bleh.

Debugging

Bug #1: The terminator problem is pretty bad in Julia:

% cat Foo.jl
module Foo
for i=1:10
    println(i)
#missing end
end #module

% julia Foo.jl
ERROR: LoadError: syntax: incomplete: "module" at /Users/mdh/Code/CodeJulia/Foo.jl:1 requires end
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] include(::Module, ::String) at ./sysimg.jl:29
 [4] exec_options(::Base.JLOptions) at ./client.jl:229
 [5] _start() at ./client.jl:421
in expression starting at /Users/mdh/Code/CodeJulia/Foo.jl:1

Good luck finding that missing end if you have a 1000-line module. C used to be just as bad about semicolons and braces, but modern compilers are pretty good at guessing where you fucked up. Python's whitespace-as-control is brilliant, because you can't ever do that. A passable solution would be each control keyword having its own unique end keyword, but it's too late for that. In the actual bug, I had to comment out half the code, run the module, uncomment and comment the other half, repeat until I isolated it.

Bug #2: Type annotations need to be very generic or left off entirely. As code-as-documentation, I declare a function as parseLine(line::String), and it gives me:

MethodError(Main.Foo.parseLine, ("a",), 0x00000000000061bb)

Well, thanks. Turns out I need to use AbstractString, because a prior function returns an AbstractString and not String. Or I can just leave the typing off, as was my first instinct.

String Concatenation

This is super ugly. Currently I've fallen back on:

sb = Vector()
push!(sb, "part of ")
push!(sb, "a string")
return join(sb, "")

Strings aren't mutable, and there's no StringBuffer/NSMutableString equivalent. The other option is to use an IOBuffer. I haven't done timings yet to see which is faster/uses less memory, I just find pushing to a vector simpler.

Switch

There's no switch statement. I could put functions in a dictionary and dispatch on that, which is not ideal for looping over a bunch of simple values, and requires passing around control flags instead of a simple break or return. Caveman solution is a chain of if/elseif, but I don't like it. Possibly a macro could be written?

Binary

Turns out you can make binaries: Julia apps on the App Store: Building and distributing an application written in Julia

It's an ugly process, that Nathan's working around, but it's a start. This should 100% be in the core libraries, and should have a cross-compiler.

Getting this working is tomorrow's problem, I think.

Julia 1.0

Interesting language, originally a math/statistics package but now as general-purpose as any lang. More or less Pythonic, though it has some type-annotation stuff, and heavily-optimized Julia looks like a mess of annotations with your code buried somewhere inside.

The Mac version comes as a dmg with an app (which I'd prefer for easy install/uninstall), or brew (which I prefer not to use). The app just launches a single command in a new Terminal window; add that path/bin to the PATH in your .profile, e.g.:

export JULIA_HOME="$HOME/Applications/Julia-1.0.app/Contents/Resources/julia"
export MANPATH="$MANPATH:$JULIA_HOME/share/man"
export PATH="$PATH:$JULIA_HOME/bin"

And now in Terminal:

% echo 'println("Hello, world!")' >hello.jl
% julia hello.jl
Hello, world!

The only way I can see to make it compile to a binary is embedding, and I'm not clear on how you package that with a full Julia distribution yet. That's unfortunate. I like REPL workflows as much as anyone, but binaries are what "normal" people run.

Getting Started

<voice tone="excessively chipper"> Let's read the manual! </voice>

Syntax is nicer than usual: function/end, if/elseif/else/end, for/end, while/end, begin/end, let/end, which beats the hell out of Python's def, if/elif/else; to say nothing of abominations like Swift's "func". No do/while loop, which is annoying especially for file processing, but I suspect that can be fixed with macros.

There's a lot of ways to write functions, which is nice but allows some ugly choices. Anonymous functions are x->x^2 or function(x) x^2 end; named functions can just be assigned f(x)=x^2 or written in full:

function f(x)
    return x^2
end

Whitespace is not significant, and indentation is not enforced, which is a major bummer for style-enforcing-structure, but I'm sure sloppy jerks will love that.

You can use tuples for multiple returns, or as ad-hoc structures:

> point(x, y) = (x=x, y=y)
point (generic function with 1 method)
> p = point(13, 2)
(x = 13, y = 2)
> a, b = p
(x = 13, y = 2)
> a
13

It's pass-by-reference, not copying, so be careful with mutable data.

My only real kvetch so far is that arrays are 1-indexed and column-major, like FORTRAN, not 0-indexed and row-major, like C. For a numeric package, that makes sense, but for other programming tasks it's frustrating and error-prone, see EWD 831.

This is a functional language, and there are no classes/inheritance/methods, however "methods" are functions which are overloaded based on types, and can be used like class methods:

> quack(x::Int64) = "int $x"
> quack(x::Float64) = "float $x"
> quack(1)
"int 1"
> quack(6.66)
"float 6.66"

As well, you can use closures to make pseudo-classes, the same way you do in Scheme:

let state = 0
    global counter() = (state += 1)
    global counterReset() = (state = 0)
end

struct (immutable) and mutable struct make "Composite types", and can make objects the usual way:

struct Point
    x
    y
end
pointDist(p::Point) = sqrt(p.x^2 + p.y^2)
Base.show(io::IO, p::Point) = print(io, "{$(p.x),$(p.y)}")

> p = Point(6, 6)
{6,6}
> pointDist(p)
8.48528137423857

The default constructor can be overridden at the end of the field list, it's defined as Point(x,y) = new(x,y). The "toString" equivalent there is ugly as hell, but there's a ton of options for overloading it by type of output.

There's a lot of fucking around with generics and strong typing (for weak minds), but ignore all that crap.

Interfaces are a somewhat messy use of several methods to create pseudo-types; define the basic interface methods for your type, and most things calling those interface methods will work. So, a couple iter() functions and you have an iterable, and so on. This would work much better if Julia had an actual OOP class system and real interfaces, but Python half-asses interfaces the same way and aside from being 1000x slower than you'd like, it gets by.

Quickly skimming modules, seems pretty standard import mechanism, but I don't see any way to make something private. OK, I'm bored of reading docs. Let's do something. Something semi-practical here, my standard RPN calculator, one command per line.

Docs/libraries are kind of a mess, Vector is discussed in Base.Arrays, push!/pop! methods are discussed in Collections (an interface). parse is under Numbers, not Strings, as one might expect.

eof() does the extremely unfortunate thing of blocking for input, so it's utterly useless in a main interactive loop.

… About 30 minutes later, I have a working, final version. Well, that was pretty easy, and it's a clean implementation, other than the interactive loop.

Next time I open this, I'll put it in a module, and tokenize the line instead of requiring just one token per line, and have some command-line argument to suppress help and prompts.

I should also investigate IJulia which is a Jupyter notebook, which seems like the "expected" way to make it interactive and handle graphics or media.

RPNCalc.jl

#!/usr/bin/env julia
# RPNCalc.jl
# Copyright ©2018 by Mark Damon Hughes. All Rights Reserved.

stack = Vector()

function checkStack(n)
    if length(stack) < n
        error("Stack underflow: Needs $n values")
    end
end

function parseLine(s)
    s = strip(s)
    if s == "+"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a + b)
    elseif s == "-"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a - b)
    elseif s == "*"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a * b)
    elseif s == "/"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a / b)
    elseif s == "="
        checkStack(1)
        println(stack[end])
    else
        push!(stack, parse(Float64, s) )
    end
end

function main()
    println("RPN Calc: Type numbers or operators (+, -, *, /) one at a time, = to show top of the stack, ^D to end.")
    while true
        print("> "); flush(stdout)
        s = readline(stdin, keep=true)
        if s == ""
            println("Goodbye")
            break
        end
        try
            parseLine(s)
        catch e
            println(e.msg)
        end
    end
end

main()