Julia String Concatenation

Last time, I was uncertain about string concatenation, so I did a test:

#!/usr/bin/env julia

const kTestText = "abcdefghijklmnopqrstuvwxyz0123456789\n"
const kLoops = 10000

function stringString()
    s = ""
    for i in 1:kLoops
        s = "$s$kTestText"
    end
    return s
end

function bufferString()
    sb = IOBuffer()
    for i in 1:kLoops
        print(sb, kTestText)
    end
    return String(take!(sb))
end

function vectorString()
    sb = Vector()
    for i in 1:kLoops
        push!(sb, kTestText)
    end
    return join(sb, "")
end

function typedVectorString()
    sb = Vector{AbstractString}()
    for i in 1:kLoops
        push!(sb, kTestText)
    end
    return join(sb, "")
end

println("*** stringString")
@timev stringString()
sleep(2)

println("\n*** bufferString")
@timev bufferString()
sleep(2)

println("\n*** vectorString")
@timev vectorString()
sleep(2)

println("\n*** typedVectorString")
@timev typedVectorString()

*** stringString
  1.100197 seconds (21.24 k allocations: 1.725 GiB, 15.94% gc time)
elapsed time (ns): 1100197167
gc time (ns):      175349222
bytes allocated:   1851904041
pool allocs:       11292
non-pool GC allocs:9950
GC pauses:         79

*** bufferString
  0.006864 seconds (11.80 k allocations: 1.134 MiB)
elapsed time (ns): 6864042
bytes allocated:   1189493
pool allocs:       11794
non-pool GC allocs:3
realloc() calls:   8

*** vectorString
  0.017380 seconds (26.68 k allocations: 2.191 MiB)
elapsed time (ns): 17380237
bytes allocated:   2297091
pool allocs:       26659
non-pool GC allocs:9
realloc() calls:   8

*** typedVectorString
  0.031384 seconds (44.75 k allocations: 2.999 MiB, 10.00% gc time)
elapsed time (ns): 31384383
gc time (ns):      3137654
bytes allocated:   3144221
pool allocs:       44730
non-pool GC allocs:8
realloc() calls:   8
GC pauses:         1

Well, there's me told off. I expected #1 typedVector, #2 vector, #3 buffer, then stringString way at the bottom. Instead the first 3 are reversed.

IOBuffer, as ugly as it is, is the clear winner. Vector did OK, but twice as much CPU & RAM loses. Amusing that typedVector is twice as slow and memory-heavy as the untyped (explained ). On larger loops, buffer gets slower, but vector remains a memory pig, and in GC that's unacceptable. Of course stringString is terrible, and it's almost exactly the same for string(s, kTestText).

Time to rewrite some text processing.

More Julia

Decided to take another day on Julia, write something more serious and see how that goes.

There's an uber-juno "IDE" plugin for Atom, which at least turns on syntax highlighting and puts an interactive console in the editor. Yay. It's not capable of linting yet, though it says it is.

So I'm rewriting a simple 1970s-style dungeon crawl game (Chorus:"As if you could make any other kind of game, Mark!" Mark:"I assure you I could, I just choose not to.") as a test of data structures and application programming in Julia. It's a little challenging, but not impossible.

Using Modules

The current directory is not included in the default LOAD_PATH, so you can't import local modules right off. One solution is to put it in your startup:

% mkdir -p ~/.julia/config
% echo '@everywhere push!(LOAD_PATH, pwd())' >>~/.julia/config/startup.jl

All names are in the same namespace. This means your module and a struct or method in it can't have the same names… My current solution has been to pluralize the module, so GridMaps contains a struct GridMap.

The export rules are a little annoying. If you use the @enum macro to make a ton of constants, they aren't exported when you export the enum type; you can either manually export each constant name, or just use ModuleName.ConstantName in other modules. Bleh.

Debugging

Bug #1: The terminator problem is pretty bad in Julia:

% cat Foo.jl
module Foo
for i=1:10
    println(i)
#missing end
end #module

% julia Foo.jl
ERROR: LoadError: syntax: incomplete: "module" at /Users/mdh/Code/CodeJulia/Foo.jl:1 requires end
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1038
 [3] include(::Module, ::String) at ./sysimg.jl:29
 [4] exec_options(::Base.JLOptions) at ./client.jl:229
 [5] _start() at ./client.jl:421
in expression starting at /Users/mdh/Code/CodeJulia/Foo.jl:1

Good luck finding that missing end if you have a 1000-line module. C used to be just as bad about semicolons and braces, but modern compilers are pretty good at guessing where you fucked up. Python's whitespace-as-control is brilliant, because you can't ever do that. A passable solution would be each control keyword having its own unique end keyword, but it's too late for that. In the actual bug, I had to comment out half the code, run the module, uncomment and comment the other half, repeat until I isolated it.

Bug #2: Type annotations need to be very generic or left off entirely. As code-as-documentation, I declare a function as parseLine(line::String), and it gives me:

MethodError(Main.Foo.parseLine, ("a",), 0x00000000000061bb)

Well, thanks. Turns out I need to use AbstractString, because a prior function returns an AbstractString and not String. Or I can just leave the typing off, as was my first instinct.

String Concatenation

This is super ugly. Currently I've fallen back on:

sb = Vector()
push!(sb, "part of ")
push!(sb, "a string")
return join(sb, "")

Strings aren't mutable, and there's no StringBuffer/NSMutableString equivalent. The other option is to use an IOBuffer. I haven't done timings yet to see which is faster/uses less memory, I just find pushing to a vector simpler.

Switch

There's no switch statement. I could put functions in a dictionary and dispatch on that, which is not ideal for looping over a bunch of simple values, and requires passing around control flags instead of a simple break or return. Caveman solution is a chain of if/elseif, but I don't like it. Possibly a macro could be written?

Binary

Turns out you can make binaries: Julia apps on the App Store: Building and distributing an application written in Julia

It's an ugly process, that Nathan's working around, but it's a start. This should 100% be in the core libraries, and should have a cross-compiler.

Getting this working is tomorrow's problem, I think.

Julia

Interesting language, originally a math/statistics package but now as general-purpose as any lang. More or less Pythonic, though it has some type-annotation stuff, and heavily-optimized Julia looks like a mess of annotations with your code buried somewhere inside.

The Mac version comes as a dmg with an app (which I'd prefer for easy install/uninstall), or brew (which I prefer not to use). The app just launches a single command in a new Terminal window; add that path/bin to the PATH in your .profile, e.g.:

export JULIA_HOME="$HOME/Applications/Julia-1.0.app/Contents/Resources/julia"
export MANPATH="$MANPATH:$JULIA_HOME/share/man"
export PATH="$PATH:$JULIA_HOME/bin"

And now in Terminal:

% echo 'println("Hello, world!")' >hello.jl
% julia hello.jl
Hello, world!

The only way I can see to make it compile to a binary is embedding, and I'm not clear on how you package that with a full Julia distribution yet. That's unfortunate. I like REPL workflows as much as anyone, but binaries are what "normal" people run.

Getting Started

<voice tone="excessively chipper"> Let's read the manual! </voice>

Syntax is nicer than usual: function/end, if/elseif/else/end, for/end, while/end, begin/end, let/end, which beats the hell out of Python's def, if/elif/else; to say nothing of abominations like Swift's "func". No do/while loop, which is annoying especially for file processing, but I suspect that can be fixed with macros.

There's a lot of ways to write functions, which is nice but allows some ugly choices. Anonymous functions are x->x^2 or function(x) x^2 end; named functions can just be assigned f(x)=x^2 or written in full:

function f(x)
    return x^2
end

Whitespace is not significant, and indentation is not enforced, which is a major bummer for style-enforcing-structure, but I'm sure sloppy jerks will love that.

You can use tuples for multiple returns, or as ad-hoc structures:

> point(x, y) = (x=x, y=y)
point (generic function with 1 method)
> p = point(13, 2)
(x = 13, y = 2)
> a, b = p
(x = 13, y = 2)
> a
13

It's pass-by-reference, not copying, so be careful with mutable data.

My only real kvetch so far is that arrays are 1-indexed and column-major, like FORTRAN, not 0-indexed and row-major, like C. For a numeric package, that makes sense, but for other programming tasks it's frustrating and error-prone, see EWD 831.

This is a functional language, and there are no classes/inheritance/methods, however "methods" are functions which are overloaded based on types, and can be used like class methods:

> quack(x::Int64) = "int $x"
> quack(x::Float64) = "float $x"
> quack(1)
"int 1"
> quack(6.66)
"float 6.66"

As well, you can use closures to make pseudo-classes, the same way you do in Scheme:

let state = 0
    global counter() = (state += 1)
    global counterReset() = (state = 0)
end

struct (immutable) and mutable struct make "Composite types", and can make objects the usual way:

struct Point
    x
    y
end
pointDist(p::Point) = sqrt(p.x^2 + p.y^2)
Base.show(io::IO, p::Point) = print(io, "{$(p.x),$(p.y)}")

> p = Point(6, 6)
{6,6}
> pointDist(p)
8.48528137423857

The default constructor can be overridden at the end of the field list, it's defined as Point(x,y) = new(x,y). The "toString" equivalent there is ugly as hell, but there's a ton of options for overloading it by type of output.

There's a lot of fucking around with generics and strong typing (for weak minds), but ignore all that crap.

Interfaces are a somewhat messy use of several methods to create pseudo-types; define the basic interface methods for your type, and most things calling those interface methods will work. So, a couple iter() functions and you have an iterable, and so on. This would work much better if Julia had an actual OOP class system and real interfaces, but Python half-asses interfaces the same way and aside from being 1000x slower than you'd like, it gets by.

Quickly skimming modules, seems pretty standard import mechanism, but I don't see any way to make something private. OK, I'm bored of reading docs. Let's do something. Something semi-practical here, my standard RPN calculator, one command per line.

Docs/libraries are kind of a mess, Vector is discussed in Base.Arrays, push!/pop! methods are discussed in Collections (an interface). parse is under Numbers, not Strings, as one might expect.

eof() does the extremely unfortunate thing of blocking for input, so it's utterly useless in a main interactive loop.

… About 30 minutes later, I have a working, final version. Well, that was pretty easy, and it's a clean implementation, other than the interactive loop.

Next time I open this, I'll put it in a module, and tokenize the line instead of requiring just one token per line, and have some command-line argument to suppress help and prompts.

I should also investigate IJulia which is a Jupyter notebook, which seems like the "expected" way to make it interactive and handle graphics or media.

RPNCalc.jl

#!/usr/bin/env julia
# RPNCalc.jl
# Copyright ©2018 by Mark Damon Hughes. All Rights Reserved.

stack = Vector()

function checkStack(n)
    if length(stack) < n
        error("Stack underflow: Needs $n values")
    end
end

function parseLine(s)
    s = strip(s)
    if s == "+"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a + b)
    elseif s == "-"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a - b)
    elseif s == "*"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a * b)
    elseif s == "/"
        checkStack(2)
        b = pop!(stack); a = pop!(stack)
        push!(stack, a / b)
    elseif s == "="
        checkStack(1)
        println(stack[end])
    else
        push!(stack, parse(Float64, s) )
    end
end

function main()
    println("RPN Calc: Type numbers or operators (+, -, *, /) one at a time, = to show top of the stack, ^D to end.")
    while true
        print("> "); flush(stdout)
        s = readline(stdin, keep=true)
        if s == ""
            println("Goodbye")
            break
        end
        try
            parseLine(s)
        catch e
            println(e.msg)
        end
    end
end

main()