Writing a Personal Knowledge Management System in Haskell: Introduction

Haskell, the language of gods

Sometimes ideas kinda flutter in the surrounding space and enter your brain from several different angles. Of course, it is just the Baader–Meinhof phenomenon at work: the first time, you simply pay enough attention to have the idea planted in your mind; the next time you hear about it, the chances you’ll go look it up increase; and then personalised recommendations make certain the idea stays around until you fall for it.

So, i had this idea to look into Haskell. The language the Matrix is programmed in, you know. The programming language of Amber, all others being mere reflections of it in our imperfect worlds.

I remember reading a lot about it at some point in the past. I learned about pure functions and monads back then, but the syntax scared me, and i did not really progress much.

This time, however, it made much more sense. Even lambda calculus and category theory are relatively easy to understand without a maths background once they are distilled into a YouTube video.

It is a high-level language at its finest, making all the lower-level languages (that is, most other languages) pale in its presence. Even Rust feels like syntactic bloat when, instead of, say,

response = answer.and_then(|s| s.parse::<i32>().ok()).map(|n| if n == 42 { "welcome" } else { "go back" })

you can write

response = bool "go back" "welcome" . (== 42) <$> (answer >>= readMaybe)

This is exactly the kind of syntax that scared me off in the past, but after understanding monads it feels very natural, as if you are simply getting the result from a value that is processed as it travels down some kind of pipe from right to left. Also, fewer parentheses is always good. Reminds me of Ruby, my programming mother tongue.

And you don’t really need the mathematical understanding of monads (although after i learned what monads (and functors, and monoids) are in Haskell, i could more easily understand the corresponding mathematical concepts too). If you have written any Rust, you have already been working with monads for some time, because Rust’s Option and Result types are effectively monads.
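For anyone who wants to poke at the one-liner in GHCi, here is a minimal self-contained sketch. The names respond and respond' are made up for illustration, and the input is assumed to be a Maybe String (Nothing meaning there was no answer at all). The second version spells the same pipeline out in do-notation, which reads much like chaining and_then on a Rust Option.

```haskell
import Data.Bool (bool)
import Text.Read (readMaybe)

-- Hypothetical helper: the one-liner wrapped in a function.
-- The parentheses matter, because <$> binds tighter than >>=.
respond :: Maybe String -> Maybe String
respond answer =
  bool "go back" "welcome" . (== (42 :: Int)) <$> (answer >>= readMaybe)

-- The same pipeline, step by step, in do-notation.
respond' :: Maybe String -> Maybe String
respond' answer = do
  s <- answer                     -- stops here if there is no answer
  n <- readMaybe s :: Maybe Int   -- stops here if it is not a number
  pure (if n == 42 then "welcome" else "go back")
```

Both versions give Just "welcome" for Just "42", Just "go back" for any other number, and Nothing when the answer is missing or does not parse.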

A long dreamed of hobby project

Of course i needed a hobby project to work on so that i could have my Haskell learning journey.

And my thoughts quickly turned back to a certain idea that used to feel out of reach.

I have been using Logseq for quite some time as my personal knowledge database, especially when researching some vast topic that needs to be structured as you go along. For example, learning about Māori culture certainly requires making notes. And it has always saddened me that i have to open the app and use its clumsy editor. Editing the text files in vim instead was hampered by the horrible hyphen syntax Logseq uses to denote blocks and, of course, by the lack of reference tracking and all the other things that make PKM databases so powerful.

I could use Obsidian instead, yes, and if we peek into the future of this story, i actually implemented a system closer to Obsidian than to Logseq. But i always disdained it for not being open source, and having to use the app instead of vim is also bad.

Everyone would say ‘just use the vim mode, there are plugins for each app’ but i realised long ago that it’s not the keybindings that are most valuable for me in vim. It is the !! command. I don’t need any fancy functions in my editor as long as i can write a filter and use it on any text i edit. And of course it’s good to have access to your shortcuts and other things like digraphs, etc. that are never supported by any vim modes or keybindings in other apps.

There are of course existing vim plugins but they are either bloated or are written in a language i don’t quite like. For instance, how can i use something written in TypeScript outside of a browser, honestly? It’s an offence in the gods’ eyes.

So i wanted to have Logseq, but completely in vim, without the need to ever leave it. But i’ve always thought it to be something too complex for mere me, and never had the courage to even try. Tiene el miedo muchos ojos (fear has many eyes).

But now, when learning is further facilitated by a multitude of LLMs, i no longer feared it.

And it seemed like a perfect candidate for Haskell too! After all, Logseq is written in another functional programming language, Clojure.

The Prototype

However, designing the architecture and trying to implement it in a language i had only just started learning turned out to be too difficult still, so i began with a prototype.

The most natural choice of language would have been Ruby, but that would have been too easy, so i opted for shell script. Zsh to be exact, as i’m using it on my mac, and it makes a lot of things simpler than POSIX shell.

In hindsight, it was a great idea to go this way! I could not do anything complex in shell script: you just don’t want to spend too long drowning in that horrible syntax, let alone use it for any heavy computation. Thus, i had to simplify the design as much as i could. That’s how it came out more obsidianish than logsequish.

So i used frawk for metadata parsing and ripgrep for searching through files, while sqlite3 was of course the natural choice for managing the reference index that forms the core of the system.

The resulting prototype is fully usable, albeit basic, and ironically, i even have my doubts that the Haskell version will be faster.

I even briefly thought that maybe i could keep two code bases, with the shell one for those who want to try the system without having to install the Haskell compiler, but then the bugs started to become more and more insane, and the shell script approach reached its limits.

The Architecture

Once i had unwound the apparent complexity of a giant app like Logseq, i realised that its core is the backlinking mechanics.

I needed to parse each document’s metadata, specifically the title and any aliases it might have (i use those heavily in Logseq as they are indispensable when it comes to topics that have multiple language versions or can be written in multiple scripts) and create an index of those, then parse the contents of the documents and see whether they contain any mentions of those titles and aliases.

One decision that greatly simplified the architecture was to get rid of the distinction between linked and unlinked references. Logseq conceals unlinked references by default, requiring you to click a triangle to show them. That used to make me obsessed with prematurely linking anything i could think of, which in the end was only a distraction. What if instead i did not have to link anything at all (unless i clearly want to be able to jump from a certain place in one document to another)? Then i can have totally decluttered notes, and a search that no longer requires complex parsing! Any mention of a title creates a reference! Of course, it presents a danger of false linking if an alias is short enough to occur as a part of some irrelevant word, but i guessed that would not often be the case for me, and when it is, i can use tricks like aliases with spaces to minimise those false references.
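The idea can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions, not the code of the actual system: the Note record, its field names, buildIndex and references are all hypothetical, notes are assumed to be already parsed into memory, and matching is assumed to be a case-insensitive plain substring search.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf, nub)

-- Hypothetical in-memory note: a title, optional aliases, and a body.
data Note = Note
  { noteTitle   :: String
  , noteAliases :: [String]
  , noteBody    :: String
  }

-- Each entry pairs a lower-cased name (title or alias)
-- with the title of the note it resolves to.
type Index = [(String, String)]

-- Collect every title and alias of every note.
buildIndex :: [Note] -> Index
buildIndex notes =
  [ (map toLower name, noteTitle note)
  | note <- notes
  , name <- noteTitle note : noteAliases note
  ]

-- Titles referenced by a note: any substring match of an indexed
-- name in the body counts; mentions of the note itself are skipped.
references :: Index -> Note -> [String]
references index note =
  nub
    [ target
    | (name, target) <- index
    , target /= noteTitle note
    , name `isInfixOf` body
    ]
  where
    body = map toLower (noteBody note)
```

With a note titled "Maori culture" that has the alias "Te Ao Maori", a body containing "te ao Maori" yields a reference to "Maori culture" without any link syntax. The false linking danger shows up just as easily: a note titled "Art" gets referenced by any body that mentions a "party", which is where a longer alias or an alias with spaces helps.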

I also decided to use shell commands instead of running it as a server. Otherwise, LSP would be an obvious choice, and in fact, the easiest way to escape implementing your own system is to combine Obsidian with the Marksman Markdown language server in Neovim. However, when making notes you don’t really need any live features, such as immediately showing you something as you type. That would only create distraction. You only ever need your document references updated and reloaded every time you save the file, and the links resolved when you try to follow one; the only autocompletion you need is for when you make an actual link in your document.

The core and the most complex part is of course the update command that does the file parsing and creates or updates the index. All other commands are trivial, and at most they involve a single query to the database. For the prototype i separated them into individual shell scripts that are invoked by a common runner (in order to disambiguate, so that your find command does not clash with the system find command, and because i like simple command names). Of course, naming might seem unimportant because the commands are intended to be run by a plugin. Still, i want to leave an option to use the system even without an editor that supports plugins, like ed. With the caveat that you’ll have to invoke reindexing manually, of course. And a clean interface makes it more pleasant to write plugins for other editors too, if need be.
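The runner idea, short command names living behind a single entry point, can be sketched as a pure dispatch function. Everything here is a hypothetical illustration: only the update and find command names come from the text above, while the Command type, the argument that find takes, and the error messages are assumptions.

```haskell
-- Hypothetical set of commands behind the common runner.
data Command
  = Update        -- reparse the files and refresh the index
  | Find String   -- look a name up in the index (the argument is an assumption)
  deriving (Eq, Show)

-- Turn the runner's arguments into a command, or explain what went wrong.
parseCommand :: [String] -> Either String Command
parseCommand ["update"]     = Right Update
parseCommand ["find", name] = Right (Find name)
parseCommand []             = Left "usage: <runner> update | find NAME"
parseCommand (cmd : _)      = Left ("unknown or malformed command: " ++ cmd)
```

Because the short names only exist as arguments to the runner, a command called find can never shadow the system find.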

For the Haskell implementation i’d rather have a single binary that runs all commands. Thus, the first thing to implement is parsing the command line arguments. However, this introduction is already rather long, so i’d better leave it until the next article of the series.

Thanks for reading, and may all be auspicious!
