Anki scripting crimes, part 4: Using media storage to implement offline support

Introduction

We have already learned a lot about scripting on and with Anki cards. This time we are going to learn how to load additional data from our scripts on demand, in a way that works offline on all Anki platforms, including AnkiDroid and AnkiMobile.

Of all the techniques you are going to learn in this series, the ones presented in this post are possibly the most hacky and deranged, but bear with me, it's all going to make sense in the end.

As with the other posts in this series, this one too is going to be a hands-on tutorial. You can continue with the code from the last post or from its solution. This post has a solution as well.

The problem with additional files

In the first post, we added animated Kanji by putting them in a well-defined place with an ID, then reading the Kanji in JavaScript, and using its text value to create an animated character with HanziWriter in the same div.

Under the hood, HanziWriter will load a JSON file from the internet that contains SVG data and additional information it requires to show the animation. This is not really a problem when you are practicing Kanji at home on your computer, but what if you are sitting on an airplane, or are on a slow or expensive internet connection? Then your animated Kanji will either take longer to load, or not work at all.

Similar problems arise not just with character SVGs, but with any data you need on your template that is not directly connected to a card. Say, your cards have a unicorn and a duck theme that you can switch on the card, each with their own background image. Where do you put unicorn.png and duck.png?

Asked more generally: where do you put additional files, if Anki only has a textbox for HTML and CSS, but not for files?

`hanzi-writer-data`

Lucky for us, the author of HanziWriter, chanind, anticipated the problem of an unreliable or missing internet connection when using HanziWriter in a mobile app. To get around this, they publish the data separately in the hanzi-writer-data module, so that we can package it with an app (or in our case, an Anki template) in a suitable way, and then use customization options in HanziWriter itself to load the data from another source.

Let's install the data locally by adding it to our package.json:

npm install --save-dev hanzi-writer-data

If you now check in node_modules/hanzi-writer-data, you'll see a big pile of JSON files like 一.json, 丁.json and so on. This is the data HanziWriter needs to render a particular character.

Let's use this JSON data as a case study for how we would embed data in general.

Embed everything

We actually already know of one solution to the problem. When we wanted to add JavaScript, we chose to rely on a tool called parcel to transform the script files we had into inline scripts, so that they became part of the HTML.

With parcel, you can import not just other scripts, but pretty much anything, including CSS or JSON files (like the ones used by hanzi-writer), and they will also just get embedded into the HTML. You can even do wildcard imports, and import all of the JSONs into your HTML in one line. Images and other binary data can also be inlined using data URLs in parcel.

While embedding everything like this is a perfectly fine solution for moderate amounts of data (say, <100KB), it becomes impractical and inefficient when you are dealing with a lot of data. All of this data is going to be parsed and needs to fit into memory, whether you need it or not. hanzi-writer-data for example is 47MB in size in total, but each vocabulary card individually contains only a small fraction of the characters that HanziWriter can render.

I tried it anyway for HanziWriter once, after a few other attempts to make it work with external files failed for me. Embedding worked surprisingly well on Desktop, but performance on AnkiDroid took a big hit, making it much more sluggish. Friends with older smartphones either got even worse UX than me, or the cards would just hang and not work at all. It was worse than no offline support at all. A good takeaway would be: don't embed excessive amounts of data directly in the template HTML.
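The size argument from this section can be checked with a few lines of Python. The following is only a rough sketch: data_size is a made-up helper, and the demo directory with its dummy contents stands in for node_modules/hanzi-writer-data, so the numbers are illustrative and not real measurements.

```python
import os
import tempfile


def data_size(directory, chars=None):
    """Sum the size in bytes of the `<char>.json` files in `directory`.

    With chars=None every JSON file is counted, otherwise only the files
    for the given characters (characters without a file are skipped).
    """
    if chars is None:
        names = [n for n in os.listdir(directory) if n.endswith(".json")]
    else:
        names = [f"{c}.json" for c in set(chars)]
    paths = [os.path.join(directory, n) for n in names]
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))


# tiny stand-in for node_modules/hanzi-writer-data (made-up contents)
demo_dir = tempfile.mkdtemp()
for char, payload in {"一": "x" * 10, "丁": "x" * 20, "七": "x" * 30}.items():
    with open(os.path.join(demo_dir, f"{char}.json"), "w", encoding="utf-8") as f:
        f.write(payload)

total = data_size(demo_dir)
needed = data_size(demo_dir, chars="一丁")
print(f"{needed} of {total} bytes needed")  # prints "30 of 60 bytes needed"
```

Pointing data_size at the real data directory and passing the characters of your own deck should give a feeling for how much of the embedded data a template would carry around for nothing.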

Adding data to Anki media storage

It's not that Anki doesn't support files, it supports them quite well actually, it's just that there is no dedicated feature to use files in templates.

Cards, or rather notes, are a different matter. When you edit them in Anki, you can add images and other media, and Anki will store them in media storage, and sync them between your devices, so that they always work offline. When you export an APKG within Anki, then the export will contain all of the media that the cards use, and when someone else imports it, it will work for them offline just as well as it does for you.

A funny quirk about Anki's media storage is that it allows you to store pretty much everything as media, as long as it's used on a card in a way that Anki recognizes as using the file. If you have <img src="personal_finances.xls"> in any field of a note (the field can be one that no template displays), and a file named personal_finances.xls actually exists in your media storage, then Anki will happily sync the Excel sheet with your personal finances between devices. Exporting an APKG with the note that contains the "image" will include the Excel sheet as part of the file.

Notably, adding "images" like this to your template without the detour through a note (putting <img src="personal_finances.xls"> in your HTML) will not be picked up by Anki. It may work if you manually put the file into storage, but it will not be included in any exports this way, which is a non-starter for our approach of generating APKGs from scripts.

In order to work with extra files on templates in an offline-compatible manner that works well with exporting APKGs, notes and templates have to work together. There must be a note using the files as an "image"; only then can the template reliably load the file.

Like most features of Anki, you are not limited to the UI when working with the media storage, but can do the same thing from Python, where an API exists to add files to it. The same limitation applies: exports only include media that is used in a note field. Consequently, everything you have added needs an accompanying "image" on a note in the export. Any note will work, and it's usually enough to add all of the files as images just to the first note and to omit them on all others.
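To make the "image" trick a bit more concrete, here is a minimal sketch of how the value for such a throwaway field could be built. fake_image_tags is a hypothetical helper name, and the sketch assumes file names without quotes or angle brackets. It deliberately does not touch a real collection, so it runs without Anki installed.

```python
import os


def fake_image_tags(paths):
    """Build the HTML for a throwaway note field that references each file
    as an "image", so that exports treat the files as used media."""
    # only the base name goes into src, the directory part is dropped
    names = [os.path.basename(p) for p in paths]
    return "".join(f'<img src="{n}">' for n in names)


field_value = fake_image_tags(["./media/一.js", "./media/丁.js"])
print(field_value)
# with a real collection, each path would additionally be handed to the
# media API (col.media.add_file, as used later in this post) and
# field_value would be stored in a field of one of the notes
```

Putting all tags into one field of a single note keeps the other notes clean, at the price of that one note carrying a rather long field.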

Making up images just for the sake of exporting a non-image is clearly a hack, but I couldn't find any other way to make it work.

Accessing data from Anki media storage in a script

Are we good to go then? Can we just add the JSON files to media storage, reference all of them as an "image" on a card, and then load them via fetch or XMLHttpRequest in JS?

Not quite yet. While fetch("一.json") does work on Anki Desktop and AnkiWeb, it won't on AnkiDroid. The problem is a bit weird. On AnkiDroid, files in the media storage are treated as having a different origin than the template HTML. The Android web view implements the same-origin policy accordingly, and will let us use the media storage for many things, like the src of an img, iframe or script tag, but no matter what we do, our script will not be allowed to read any data from the other origin. This makes sense from a security standpoint in other contexts, because with this policy a phishing site can't steal your data by sending a fetch request to your banking website and reading the response.

When scripting with AnkiDroid, though, it just stands in our way. Just adding the JSON would already get you quite far, you could even see the JSON in an iframe. You could even use the JSON as the src of a script and AnkiDroid will "execute" it, but JSON is not executable, right?

JSON-P: A cursed technique from the deep past comes to the rescue

No, but JSON-P is! Unless you have been in the web game for a long time, you might have never needed it, but now as a dedicated Anki hacker, you do. Prepare yourself for a blast from the past.

While largely obsolete in web development these days, where other techniques are available to selectively allow other origins, we can't use these newer and better techniques in AnkiDroid because we have no control over the headers and other settings used. So we turn to JSON-P. This technique cleverly exploits a few properties of JSON and of the same-origin policy.

JSON is not quite JavaScript, but the two are very closely related. The contents of most JSON files could appear anywhere in a JS file where a value is expected, such as the right-hand side of a variable assignment, or an argument that you pass to a function when calling it. The JSON-P approach to making the JSON file {"a":1} executable is to rewrite it as handleResult({"a":1}), that is, to write it as a function call. We get the parsing of the JSON for free, because in this context it is evaluated as JavaScript. JSON-P means JSON with padding, and the padding is the function call that we wrap around the data.

The same-origin policy allows us to load scripts from other origins, so if we load the JSON-P as a script, we still won't be able to read its source code, but we don't have to, because the callback function handleResult gets called with {"a":1}, which is the data we actually care about. There it is, a way to load JSON data from another origin, just by bending over backwards a bit and squinting very hard at specifications.

Preparing hanzi-writer-data for JSON-P

Most JSON-P APIs have a parameter that lets you choose the name of the callback function, which is very useful to keep responses apart. In our case, we don't really have a server, so to tell files apart we are going to pass the filename as an additional first parameter, before passing the actual data as the second parameter.

This would be easy to do from Python, but let's save some time by re-using something from npm that does this transformation for us:

npm install --save-dev to-static-jsonp
node_modules/.bin/to-static-jsonp --fn gotIt node_modules/hanzi-writer-data/*.json --outDir media

If you look in the new directory media, you'll see a bunch of JS files that look like this:

gotIt("一.js",{"strokes":["M 518 382 Q 572 385 623 389 Q 758 399 900 383 Q 928 379 935 390 Q 944 405 930 419 Q 896 452 845 475 Q 829 482 798 473 Q 723 460 480 434 Q 180 409 137 408 Q 130 408 124 408 Q 108 408 106 395 Q 105 380 127 363 Q 146 348 183 334 Q 195 330 216 338 Q 232 344 306 354 Q 400 373 518 382 Z"],"medians":[[[121,393],[193,372],[417,402],[827,434],[920,401]]]})

That's the SVG data that hanzi-writer needs, with a way for us to tell the files apart via the first parameter.
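For comparison, the same transformation could be sketched in Python roughly like this. This is not how to-static-jsonp is implemented; to_static_jsonp is a hypothetical helper that only mimics the output format shown above, and the stroke data in the example call is shortened dummy data.

```python
import json


def to_static_jsonp(filename, json_text, fn="gotIt"):
    """Wrap JSON text in a call like gotIt("<filename>",<data>).

    Round-tripping through json.loads/json.dumps rejects invalid input
    and produces compact output.
    """
    data = json.loads(json_text)
    # the filename is serialized as a JSON string, the data as compact JSON,
    # and both are placed as arguments of the padding function call
    name_literal = json.dumps(filename, ensure_ascii=False)
    data_literal = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    return f"{fn}({name_literal},{data_literal})"


print(to_static_jsonp("一.js", '{"strokes": ["M 518 382 Z"], "medians": [[[121, 393]]]}'))
# prints: gotIt("一.js",{"strokes":["M 518 382 Z"],"medians":[[[121,393]]]})
```

Writing one such string per character to a .js file would give the same kind of directory as the npm tool, as long as the function name matches the one the template defines.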

Changing our template to use the JSON-P files

Before we add the files to Anki media storage, let's first write the code that loads the JSON-P. We'll call it templates/kanji/jsonp.js and use this as a basic version without error handling:

// pending callbacks by filename, called when the matching JSON-P script runs
const callbacks = {}

// every JSON-P file calls gotIt(filename, data) as soon as it is executed
window.gotIt = (filename, data) => callbacks[filename](data)

export function loadStaticJsonP(filename) {
  return new Promise((resolve) => {
    const inFlight = callbacks[filename]
    if (inFlight === undefined) {
      // first request for this file: load it through a script tag
      const script = document.createElement("script")
      callbacks[filename] = data => {
        // clean up the script tag and the callback once the data arrived
        script.remove()
        delete callbacks[filename]
        resolve(data)
      }
      script.src = filename
      document.body.appendChild(script)
    } else {
      // already loading the same file, call the existing callback first
      callbacks[filename] = data => {
        inFlight(data)
        resolve(data)
      }
    }
  })
}

To conclude our changes, let's use our new module to make HanziWriter use our files instead of the internet. Let's modify animation.js to include a charDataLoader that uses our function:

import HanziWriter from "hanzi-writer"
import { loadStaticJsonP } from "./jsonp"

const kanji = document.getElementById("kanji").innerText

const hanziWriterContainer = document.getElementById("kanji-animation")

HanziWriter.create(hanziWriterContainer, kanji, {
  charDataLoader(char) { return loadStaticJsonP(`${char}.js`) }
}).animateCharacter()

That's it for the template, it now uses JSON-P to load data. Let's build it:

node_modules/.bin/parcel build templates/**/*.html

Aside: the new loading technique won't work in parcel's live preview mode as-is, but we'll see later that it will work in Anki. You can add parcel support yourself as an exercise. One option to make it work in parcel is to use a .proxyrc.js with serve-static. If you are interested, check out .proxyrc.js and data.ts in the Kartenaale templates. For this blog post, we'll just make do with testing directly in Anki later.

Adding the JSON-P data to the media storage in python code

Now we have the JSON-P data in a directory, we have the JS code to use it, all that's left now is to add it to Anki in a way that can be exported.

Let's modify create_from_scratch_with_csv.py to make it work.

First let's update the part at the start where we delete existing collections to also delete the accompanying media database:

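# delete the collection and its media database if they exist already,
# each in its own try so that a missing file does not skip the other one
for stale_file in ("temporary.anki2", "temporary.media.db2"):
    try:
        os.remove(stale_file)
    except OSError:
        pass
# then start fresh
col = Collection("temporary.anki2")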

Then, we need "images" in a field to include the JSON-P in our APKG export, so our first task in Python will be to add a new fourth field that we are just going to use for the image hack. Let's add this to our flds array to hold the new data:

{
    'name': "Fake images",
    'ord': 3,
    'sticky': False,
    'rtl': False,
    'font': 'Arial',
    'size': 20,
    'description': '',
    'plainText': False,
    'collapsed': False,
    'excludeFromSearch': False,
    'id': 3,
    'tag': None,
    'preventDeletion': False
}

Then, on each iteration of our CSV loop, we can add the media file to Anki and also reference it in the value for "Fake images". This way the APKG will only include the JSON-P files we actually need. Add to the loop:

# if HanziWriter supports the character, ask Anki to include the JSON-P file for this Kanji in the APKG export
hanzi_writer_data = f'./media/{line[0]}.js'
if os.path.isfile(hanzi_writer_data): # HanziWriter does not support all Kanji, so we check first if the file is there
    # reference the file as an "image" so that the export picks it up
    note["Fake images"] = f'<img src="{line[0]}.js">'
    # copy the file into the media storage of the collection
    col.media.add_file(hanzi_writer_data)
# save the note
col.add_note(note, deck_id)

And then, run in your shell:

pipenv install
pipenv run python create_from_scratch_with_csv.py

You'll notice that it takes longer to finish than last time because it has to do a lot more work. The APKG is now about 6MB in size to make room for all that Kanji data. Let's import the big APKG in Anki to check if it works. Hooray, it does, and you might also notice that it loads quicker:

HanziWriter backed by JSON-P with offline support

Not only is it a lot quicker, you can now take your cards into a tunnel or into space. It will also work on AnkiMobile, even in airplane mode:

Airplane mode is no longer a problem on mobile

As always, feel free to have a look at the solution on University of Vienna Gitlab.

Conclusion

If you have followed the series to this point, you now know of all the essential techniques used on Kartenaale cards and will be able to apply what you have learned to build all sorts of crazy stuff in Anki.

I hope you enjoyed the look behind the scenes in this series, and if you learn with Kartenaale cards, I hope your learning experience has been and will continue to be a pleasant one. If you just came here to learn about scripting in Anki, I hope reading this series put you in a position where you now feel comfortable building the thing you probably already had in mind when you started reading.

Although this post marks the end of this particular series, there would be lots more to tell you about scripting with Anki: how to prepare and clean data of various formats, how to implement text-to-speech in a way that works on all Anki platforms without plugins and with most software voices, how our continuous integration on GitLab works to deliver Anki packages, and much, much more. If something in that list interests you, or if you would like a post on building something specific for Anki, please do let me know! I love getting emails at philipp.pospischil@tapirbug.at and look forward to hearing from you.

tapirbug Blog