On finding relevant notes when you need them
Another use-case where LLMs are genuinely useful and not overhyped nonsense
After a while, note-taking systems tend to reach a point where you forget what you've got. Sometimes, though, we want that stuff back. There are a couple of ways to accomplish that, and LLMs are one of the newest.
But first, a reminder: forgetting is natural, normal, and healthy. There's a fancy term for it in child development: synaptic pruning¹. But before you can prune, there must be something there. Something that was useful at least once, generally. The question is… when it might finally be useful, will you be able to find it again?
Knowledge bases grow and then shrink
A newborn baby's brain grows a ton; early brain development involves explosive growth, similar to the burst of enthusiasm many folks felt at the beginning of the pandemic, when personal knowledge management as a movement really began to take off. The contents of my Obsidian vault grew exponentially as I filled it with everything I needed to offload from my blurry mom-brain so I could still do knowledge work: mostly wrestling with the nerdy academic research I was digging into because I wanted to create a plausible and interesting fantasy world for the book I was writing while my son took naps.
Then he got older, I went back to work, and I took fewer and fewer notes, but the collection was still useful. I'm extremely glad I curated its contents; I rely on the information there daily, even though I don't take as many notes as I used to by any stretch of the imagination. There are a handful of topics and files I refer back to almost constantly, and others I revisit only occasionally, like my potty training notes, reflections on previous Thanksgiving meals, or idealistic plans for what I'd like my retirement to look like. And there are, of course, notes I made once and never really touched again, because my life changed. The odds of me teaching US history in an environment where I need to worry about structuring a classroom lesson with careful differentiation according to student reading levels are exceptionally low, for example.
When a child hits 2 or 3, the number of synapses in their brain peaks, and the brain starts to remove synapses it doesn't need anymore. It "forgets" skills and knowledge that haven't been used enough to justify the cost of keeping them.
There are methods to combat forgetting, of course. Formal spaced repetition practices are the most popular, although I prefer to think of it as "structured serendipity" since I'm not usually trying to memorize. According to my Readwise database, my personal peak streak for flipping through flashcards of things I wanted to see again later is 121 days. I've been considering setting up a flashcard system for memorizing the Pokemon type chart, because my son is a lot more pleasant on hikes if there's the promise of spinning a Pokestop every half-mile or so… but he doesn't like to lose battles, and he's a little too young to grasp the complexities himself yet.
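If you squint, the "structured serendipity" version doesn't even need a dedicated app. Here's a minimal Python sketch that resurfaces a few stale notes a day; the vault path is a placeholder, and file modification time stands in for "last reviewed," so treat it as an illustration rather than a real spaced repetition scheduler.

```python
# Not real spaced repetition, just "structured serendipity": surface a few
# notes per day, favoring the ones that haven't been touched in the longest.
import random
import time
from pathlib import Path

notes = list(Path("~/notes").expanduser().rglob("*.md"))  # placeholder vault
staleness = [time.time() - p.stat().st_mtime for p in notes]

# Weighted sample: the staler the note, the likelier it is to surface today.
for pick in random.choices(notes, weights=staleness, k=3):
    print(pick.name)
```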
Like my son, my notes collection is about 4 years old at this point, and I can attest that I've done more "pruning" than adding in the last year or so. I moved the finance stuff into a shared drive with my husband, my work stuff now lives in a shared Notion database with my work colleagues, I've deleted a bunch of PDFs that are no longer relevant to my interests, and I've archived old books I no longer have an interest in finishing, because I no longer aspire to the rarefied heights of "traditionally published author" or even the grind of "successful self-published author."
Notes are valuable resources
But as with the Pokemon type chart, I created most of my notes for a reason. Pulling from my own notes means drawing on vetted sources instead of the wild world of the internet; I know I liked it enough to read it, if nothing else. But that's certainly not the only value: the latent connections between my thoughts and my notes are easier to find if I only need to refresh my memory, not re-learn the thing from different sources that might not align with the overgrown pathways in my brain.
A curated notes collection saves time, and it's efficient, particularly in an era where internet search is badly broken and LLMs, even when they aren't hallucinating like crazy, often lack context.
Taking notes is easy, especially if (as with my type chart) you don't bother to re-write everything into your own words. Using them isn't even that hard, really. Even if you only vaguely remember what you're looking for, there are a bunch of ways to find stuff again. Most tools have some kind of search function, whether it uses semantic search, boolean operators, database queries, regular expressions, or natural language search… or isn't called search at all but relies on up-front organization: naming conventions, backlinks, structured data classification systems, tagging schemas, folder systems… it's generally straightforward to lay hands on an old note. As far back as twenty years ago, browsers kept searchable histories and helpfully changed the color of links that had already been clicked. I know people who still rely almost entirely on their browser search history as a "notes database." If a method ain't broke, why fix it?
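(To show just how low the bar is, here's a minimal Python sketch of the old-fashioned kind: a case-insensitive regex search over a folder of Markdown notes. The vault path and the query term are placeholders for your own.)

```python
# Minimal full-text search over a folder of Markdown notes.
import re
from pathlib import Path

VAULT = Path("~/notes").expanduser()  # placeholder vault location

def grep_notes(pattern: str):
    """Yield (filename, line number, line) for every case-insensitive match."""
    regex = re.compile(pattern, re.IGNORECASE)
    for path in VAULT.rglob("*.md"):
        for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if regex.search(line):
                yield path.name, i, line.strip()

for name, lineno, line in grep_notes(r"\binfrastructure\b"):
    print(f"{name}:{lineno}: {line}")
```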
The hard part is finding useful things when you're not looking for them, and that's where the technology has really improved a lot in the last year or so. Little AIs that live in the corner of your screen, ready to helpfully pop up with a reminder that you already started writing an almost identical article two years ago. Services that get to know you and algorithmically serve you content they think you want, when you want it, memetically offering you coupons for diapers before you even know you're pregnant. Plugins² that index your notes, perform magic I barely understand, like vector embedding, and tell you what you've got that's similar to what you're currently looking at.
The frustrating part of this new technology is that it's not set up for the careful systems I created three years ago. Vector embeddings are made from the contents of your notes, so if you embedded content that is only rendered by a particular program (to avoid redundancy, perhaps), you're kind of³ out of luck. Worse is if you use consistent templates: if you're writing chapters of a novel, for example, vector embeddings will get you similarly structured chapters instead of notes relevant to the content of what you're writing, unless you're very careful about only indexing certain subsets of your notes. That's maybe not what you want if you'd like to see, say, other chapters touching on similar themes alongside nonfiction notes about that theme, without weighting every chapter that involves particular characters. Vector embeddings are helpful, but they aren't as "smart" as all that, alas.
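To make the mechanics concrete, here's a rough Python sketch of the embed-and-rank loop, assuming the sentence-transformers package. Stripping the YAML frontmatter before embedding is one small version of "only indexing certain subsets"; the model name and vault path are illustrative, not recommendations.

```python
# Sketch: embed note bodies (minus templated frontmatter) and rank them by
# cosine similarity to whatever you're currently writing.
import numpy as np
from pathlib import Path
from sentence_transformers import SentenceTransformer

def strip_frontmatter(text: str) -> str:
    """Drop a leading YAML block so shared templates don't dominate similarity."""
    if text.startswith("---"):
        _, _, rest = text.partition("\n---\n")
        return rest or text
    return text

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
paths = sorted(Path("~/notes").expanduser().rglob("*.md"))  # placeholder vault
bodies = [strip_frontmatter(p.read_text(encoding="utf-8")) for p in paths]
vectors = model.encode(bodies, normalize_embeddings=True)

query = model.encode(["chapter draft about roads, aqueducts, and trade routes"],
                     normalize_embeddings=True)
scores = vectors @ query[0]  # cosine similarity, since vectors are unit length
for idx in np.argsort(scores)[::-1][:5]:
    print(f"{scores[idx]:.2f}  {paths[idx].name}")
```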
A well-trained LLM levels up search
Historically I haven't used Notion for anything non-collaborative. It's slow to load. I don't really like databases, and I don't like the way it "pushes" me to add icons and banners to everything. Most importantly, I find the way its block-based structure interferes with a simple cmd+a "select all" immensely frustrating. While I'm venting: concatenating documents and tracking word counts are all but impossible unless I do it manually? Ugh. Plus, the keybinds suck.
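(For the record, the manual workaround is more tedious than hard. Here's a sketch using Notion's official API via the notion-client package to tally words across a page's blocks; the page ID and token are placeholders, and this is my reading of the API rather than anything Notion endorses.)

```python
# Tally words in a Notion page by walking its blocks through the official API.
import os
from notion_client import Client

notion = Client(auth=os.environ["NOTION_TOKEN"])  # integration token, placeholder

def count_words(block_id: str) -> int:
    """Sum the words in every rich_text span under this block, recursively."""
    total, cursor = 0, None
    while True:
        kwargs = {"start_cursor": cursor} if cursor else {}
        resp = notion.blocks.children.list(block_id=block_id, **kwargs)
        for block in resp["results"]:
            rich_text = block.get(block["type"], {}).get("rich_text", [])
            total += sum(len(rt["plain_text"].split()) for rt in rich_text)
            if block.get("has_children"):
                total += count_words(block["id"])  # recurse into toggles etc.
        if not resp["has_more"]:
            return total
        cursor = resp["next_cursor"]

print(count_words("PAGE_ID_GOES_HERE"))
```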
The AI, though, is incredible. Being able to feed my unrefined database of raw highlights and annotations to the workspace, do no setup beyond clicking a handful of consent buttons, then ask a question like "show me all of my notes about infrastructure that might pair well with this point I am making about how Fall of Angels is a fantasy novel that teaches a lot about infrastructure" (stay tuned for that forthcoming article sometime next spring 🤪) and get… useful results, neatly organized into a brief memo, with footnotes sourced to my own personal notes, complete with a carefully curated list of (in this case 14) additional related notes, is… incredible.
Seriously, I am sufficiently blown away that I'm going to just share a screenshot of what that slowly growing note looks like, even though it's messy and disjointed and raw and ugly and probably gives too much of the game away.
Notion AI is terrible at many things. If you try to use it to create a database property for automatically counting words in a document (which should be table stakes, imo), it hallucinates wildly and is wrong by orders of magnitude. If you ask it whether a particular chapter of a story has a coherent beginning, middle, and end, it cannot tell. It has most of the same painful conversational pitfalls as ChatGPT, which makes sense because, as far as I know, Notion's AI is essentially a wrapper around GPT-3.5.
But wow is it convenient, and whatever the Notion team has done to get it to answer questions like "find me notes about…" really, really works. It's always well-sourced, which is what matters… and it manages to find hidden gems I had completely forgotten about, although I absolutely cannot trust it to accurately report on the status of those things, and it definitely gets some nuance wrong. For instance, I asked for books, and of the three bullets only one is actually a book; the rest are articles. Plus, I already wrote that "Infrastructure in Ancient Civilizations" article it says I could develop (although I had completely forgotten about it). Compare the actual content of my note to Notion AI's report on it.
Now, sure, if I search the raw term "infrastructure" with old-fashioned search, I get twice as many results. But they aren't as useful as even a quasi-accurate memo. Compare:
Of my top results, only one is in any way useful to the project at hand (that Chinese transport network thing, which I had entirely forgotten about until now), and it showed up in the curated list of 14 the AI gave me. The snippet of text that's displayed isn't clearly connected to anything I'm thinking about, and it takes a lot more mental effort to sift through and remember why it might matter.
I ought to figure out how to do this in Obsidian too, but honestly… that is up there with training my own LLM in terms of "probably a good idea, don't really have time." I'm just glad that LLMs have leveled up search, because as Google's dominance in the last era of the internet demonstrates, search is key.
Midjourney makes beautiful images, and Elevenlabs seems to have nailed text-to-speech… but for me, search and summarization are the killer use-cases for AI, which is why Elicit (which automates time-consuming research tasks like summarizing papers, extracting data, and synthesizing findings) has historically been one of the apps I point to when people ask me what AI is even good for, anyway.
Yeah, there's lots of hype, but I truly believe that figuring out how to leverage large language models for search is as key to success in the current era as learning boolean operators was 20 years ago.
1. Here's the basic primer on synaptic pruning I used as a reference while writing this.
2. The one I use for Obsidian is called Smart Connections, but as nice and helpful as the developer is, sometimes I feel too stupid to get the most out of it.
3. Although Obsidian does have plugins like "easy bake", which will take embedded content and copy or move the text, and it's possible to write scripts that do the same thing in more custom ways. Even I've done it with JavaScript or Python and some kind help, and I'm no coder; tools like Copilot and ChatGPT's strawberry model make this even easier now. Thanks to ChatGPT, I managed to install Homebrew in the terminal without asking for help, after bouncing right off the documentation I found using Google…
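For illustration, here's a stripped-down Python take on that "bake" idea, inlining simple whole-note ![[embeds]]. The vault path is a placeholder, and heading or block embeds are ignored; the real plugin surely handles more edge cases.

```python
# Inline Obsidian-style ![[embeds]] so the baked file contains the full text.
import re
from pathlib import Path

VAULT = Path("~/notes").expanduser()  # placeholder vault location
EMBED = re.compile(r"!\[\[([^\]|#]+)\]\]")  # whole-note embeds only

def bake(path: Path, seen: frozenset = frozenset()) -> str:
    """Return the note's text with whole-note embeds recursively inlined."""
    text = path.read_text(encoding="utf-8")

    def inline(match: re.Match) -> str:
        name = match.group(1).strip()
        if name in seen:  # guard against circular embeds
            return match.group(0)
        target = next(VAULT.rglob(f"{name}.md"), None)
        return bake(target, seen | {name}) if target else match.group(0)

    return EMBED.sub(inline, text)

print(bake(VAULT / "Chapter 1.md"))
```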




Enjoyed and agreed with raves and rants. We're in that awkward space between new tech rollout and UIs that make the tech understandable, not to mention usable.
Every couple of months I install Smart Connections and end up cursing. I'm not stupid, we shouldn't be made to feel that way, and I refuse to use products with lofty pitches that do not deliver, like this one: "Spend less time linking, tagging and organizing because Smart Connections finds relevant notes so you don't have to!"
I am currently indexing all of my documents locally using GPT4All and Meta's model Llama 3 8B Instruct. If it works well, then I'll try out the Obsidian plugin Copilot for Obsidian, which can call the above model in GPT4All, keeping everything local. (Bad product name: it has nothing to do with Microsoft's Copilot.) Fingers crossed. If you're interested in some eye candy, here's the dev's latest video: https://youtu.be/1jSaGwuPiJs