Natural language search for blog posts using TensorflowJS

I’ve been learning TensorflowJS and Machine Learning. As an experiment, I thought I would implement a search across my existing blog posts using sentence similarity on natural language, running in the browser.

George Griffiths
9 min read · Apr 22, 2021


In this post I’ll go into how you can get started using pre-trained Tensorflow models to do Machine learning in the browser, examine some of the potential gotchas, such as not blocking the main thread with custom logic, and consider the impact of model size on UX.

The demo that I developed as part of this article is a “search engine” using my blog posts as a data set, which I converted into an API. The idea being: can I find blog posts based on a user’s search query, by comparing the similarity of the query with a blog post’s title and description?

Search is a solved problem and there are better ways of achieving the same thing, but I created this to learn and to have a bit of fun!

If you want to check out a live demo for what I built in this post, I’ve hosted it on my website.

Screen capture of demo of Natural language post search

Sentence similarity with TensorflowJS

I’m going to explain how this all works with a smaller example rather than the full demo that I linked earlier, but the source code for the example is available on Github. It’s the same code, just with things like the UI simplified.

First up, let’s load in the libraries we are going to use. We’re just going to load them from a CDN; when you’re experimenting, you don’t want to be messing around with build processes.

Create an HTML file called index.html with the following content:
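The original listing isn’t reproduced here, but a minimal index.html, assuming the jsDelivr CDN builds of both libraries, might look like this:

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Natural language search</title>
    <!-- TensorflowJS core library -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
    <!-- Universal Sentence Encoder model, exposed as the global `use` -->
    <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder"></script>
  </head>
  <body>
    <script src="index.js"></script>
  </body>
</html>
```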

We’re loading in two libraries here: the first is TensorflowJS, and the second is the Universal Sentence Encoder model, which uses TensorflowJS and which you can read about over here.

If you want to code along, host your files on a local dev server. I personally recommend the Live Server VS Code extension.

Next, create index.js and add the following code:
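The original listing isn’t shown; a sketch of what that starter index.js could look like, with an assumed userQuery and some illustrative post titles, is:

```javascript
// Hedged sketch — the example post titles are my own invention;
// `userQuery`, `blogPostsTensor` and `userInputTensor` come from the text.
(async () => {
  const userQuery = "Sharing to social media";
  const blogPosts = [
    "Sharing to Twitter from my blog",
    "Getting started with Eleventy",
    "Running TensorflowJS in the browser",
  ];

  // Load the Universal Sentence Encoder model (the `use` global
  // comes from the CDN script tag in index.html).
  const model = await use.load();

  // Convert the sentences into 512-entry vectors (embeddings).
  const blogPostsTensor = await model.embed(blogPosts);
  const userInputTensor = await model.embed([userQuery]);

  console.log(blogPostsTensor.shape); // [3, 512]
  console.log(userInputTensor.shape); // [1, 512]
})();
```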

In Chrome, and soon other browsers, you won’t need to wrap the code in an IIFE, because you can use top-level await instead.

This code loads the model, then passes our userQuery of "Sharing to social media" and our array of blogPosts into it. Doing this converts the sentences into vectors (arrays) with 512 entries per sentence; this is how the model sees each sentence. The Universal Sentence Encoder has been trained on a large vocabulary, and encodes the provided data based on the data it saw during training.

To help make this a bit clearer, blogPostsTensor and userInputTensor will each be an instance of tensor2d. These are 2D arrays (on the GPU) with 512 entries in each row, where each row represents a provided phrase.

Next, in order to find potentially good results for our input sentence, we need to check how similar our input vector is to the vectors of the blog post titles. We can achieve this by calculating the cosine similarity between the vectors, which gives us a value between -1 and 1: 1 being most similar and -1 being not very similar at all.

I’m not going to explain the mathematics of cosine similarity, but I’ve provided an implementation of it. If you want to know how it works, there are lots of great explanations on YouTube, such as this one.

Define these at the top of your index.js file.
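The original listing isn’t shown; a plain-JavaScript implementation, with helper names of my own choosing, could look like:

```javascript
// Sum of pairwise products of two equal-length vectors.
function dotProduct(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Euclidean length of a vector.
function magnitude(a) {
  return Math.sqrt(dotProduct(a, a));
}

// Returns a value between -1 and 1: 1 means the vectors point the
// same way, 0 means unrelated, -1 means opposite directions.
function cosineSimilarity(a, b) {
  return dotProduct(a, b) / (magnitude(a) * magnitude(b));
}
```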

I tried to implement this maths purely in TensorflowJS, so that I could take advantage of the GPU, but after much trial and error I could not find a solution. If anyone knows how to do this, I’d love to hear about it. Doing this calculation myself means making a big tradeoff: the calculations happen on the main thread, which can cause bad UX. I’ll explain this in more detail towards the end of the post, including ways around it.

Now let’s use the functions in our code.
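The missing listing might, continuing from the earlier sketch and assuming the cosineSimilarity helper defined at the top of the file, look something like this:

```javascript
// Continuing inside the async IIFE: blogPostsTensor and userInputTensor
// are the embeddings created above.

// Pull the vectors back from the GPU into plain JS arrays.
const blogPostsVectors = await blogPostsTensor.array();
const [userInputVector] = await userInputTensor.array();

// Score each post title against the query and sort best-first.
const predictions = blogPosts
  .map((title, index) => ({
    title,
    similarity: cosineSimilarity(userInputVector, blogPostsVectors[index]),
  }))
  .sort((a, b) => b.similarity - a.similarity);

document.querySelector("#initial-example-results").innerText =
  JSON.stringify(predictions, null, 2);
```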

On the last line of the above example, we’re updating the text of an element with the id “initial-example-results”. To make this work, add the following to your HTML file, inside the <body> tag.
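The original markup isn’t shown; a minimal element, using a pre tag for readable JSON output, might be:

```html
<pre id="initial-example-results"></pre>
```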

Here’s a link to the code we’ve built so far:

Turning posts into an API

My blog is written using the static site generator tool Eleventy. If you haven’t heard of Eleventy and you’re into building fast websites, seriously check it out, it’s awesome. I’m not going to explain how Eleventy works, but I wrote a post about how I got started with Eleventy.

To create an API out of my blog posts I generate a JSON file in the form of a JSON Feed, which can be hosted on my server.

Here’s my template for my JSON feed; this template is based on the 11ty base blog. The templating syntax being used is Nunjucks, which is supported out of the box with Eleventy.
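The original template isn’t reproduced here; a hedged Nunjucks sketch of such a JSON Feed template (the metadata fields and the rssDate filter are assumptions about the site’s Eleventy config) might look like:

```
---
permalink: "/feed/feed.json"
---
{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "{{ metadata.title }}",
  "home_page_url": "{{ metadata.url }}",
  "feed_url": "{{ metadata.url }}feed/feed.json",
  "items": [
    {%- for post in collections.posts | reverse %}
    {
      "id": "{{ metadata.url }}{{ post.url }}",
      "url": "{{ metadata.url }}{{ post.url }}",
      "title": "{{ post.data.title }}",
      "summary": "{{ post.data.description }}",
      "date_published": "{{ post.date | rssDate }}"
    }{% if not loop.last %},{% endif %}
    {%- endfor %}
  ]
}
```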

If you are curious and want to check out the source code of my blog it’s over here on Github.

This template iterates through my blog posts and populates a JSON array with post data, as well as some other site metadata. Ultimately the result is a JSON file which I can request on my server:

Now I have an API which I can use in my search, success!

We can now update our code sample to pull data from this API instead of hard-coding it. Add this function to the top of “index.js”.
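The original function isn’t shown; a sketch, assuming the feed lives at /feed/feed.json and uses the JSON Feed field names from the template above, could be:

```javascript
// Hedged sketch — the URL and the searchText shape are assumptions.
const getBlogPosts = async () => {
  const response = await fetch("/feed/feed.json");
  const feed = await response.json();
  // Combine each post's title and summary into one sentence to embed.
  return feed.items.map((item) => ({
    ...item,
    searchText: `${item.title} ${item.summary}`,
  }));
};
```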

Replace the following code:
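The before/after snippet isn’t preserved here; the gist of the change, assuming the hard-coded array from the earlier sketch, is:

```javascript
// Before (hard-coded):
// const blogPosts = ["Sharing to Twitter from my blog", /* ... */];

// After (fetched from the API):
const blogPosts = await getBlogPosts();
```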


Also replace
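Again the snippet is missing; my best hedged guess is that the embed call changes to use the fetched posts’ combined text:

```javascript
// Before: embedding an array of plain title strings
// const blogPostsTensor = await model.embed(blogPosts);

// After: embedding the searchText of each fetched post
const blogPostsTensor = await model.embed(
  blogPosts.map((post) => post.searchText)
);
```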


Here’s a link to the code we’ve built so far:

ML in the browser, why?

Hopefully the examples so far have made sense. I thought I’d take a moment to talk about some of the benefits and tradeoffs of doing Machine learning in the browser with TensorflowJS.

One of the first things you might think when you hear “Machine learning in JavaScript” is that it’s slow. That’s where one of the great things about TensorflowJS comes in: it performs all of its expensive calculations on the GPU; under the hood it’s utilising WebGL shader programs to achieve this.

Running Machine learning in the browser opens up the possibilities of offering Machine learning in applications without needing to build complex server architectures, or learning another language. It also means that it’s possible to provide on-device Machine learning to users, without their data ever hitting a server.

One of the other great things about the JavaScript ecosystem is its ability to run not just in the browser, but on the server too, with NodeJS. TensorflowJS is also available in NodeJS, where it can be bound directly to the Tensorflow API, the same API that the Python implementations of the library consume. I’ve considered modifying my experiment in this blog post so that when I generate my static site at build time with Eleventy, I could run the model against my data and pre-generate the results for my blog posts; that might be cool.

The final great thing is that it is possible to convert/re-use models created by the other Tensorflow ecosystems (Python etc) so that they run in the browser.

Now for one of the big tradeoffs: Machine learning models can be large. There is a lot of work going into making these models smaller and smaller, but the model used in this demo, for example, is approximately 28 MB. To be fair, for a general-purpose natural language model, this is quite impressively small. Many of these models are split into chunks so that the model can be downloaded in parallel, which improves things a bit. This tradeoff might be acceptable if it unlocks the ability to provide a good enough UX without the need to hit a server, which, once the model is downloaded, can be lightning fast. However, the model can only be as fast as the end-user machine it’s running on, which, especially on mobile, can vary dramatically.

In applications you might be able to do some different things to make this tradeoff worth it, for example:

  • Enabling good caching headers
  • Using service workers to background fetch and cache the model, and enable the feature
  • Allowing users to opt in/out
  • Offer the feature as a progressive enhancement that enables once downloaded
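As a sketch of the service-worker idea, a cache-first fetch handler could keep the model chunks local after the first download (the URL check is an assumption about where the model files are served from):

```javascript
// sw.js — hedged sketch: cache model files so repeat visits skip the download.
const MODEL_CACHE = "model-cache-v1";

self.addEventListener("fetch", (event) => {
  // Only intercept requests for the model files.
  if (!event.request.url.includes("universal-sentence-encoder")) return;

  event.respondWith(
    caches.open(MODEL_CACHE).then(async (cache) => {
      const cached = await cache.match(event.request);
      if (cached) return cached;
      const response = await fetch(event.request);
      cache.put(event.request, response.clone());
      return response;
    })
  );
});
```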
Chrome network panel showing model chunks downloading.

With the above tradeoffs in mind, it might, or might not, make sense to do ML in the browser. If you need to run your models immediately as the site/app loads, or end-user device constraints are a problem, maybe server side is the better choice.

When using JavaScript, it’s always important not to block the main thread. I mentioned above that Tensorflow utilises the GPU for its calculations, but as soon as you stop using its API you’re back on the JS main thread, and if you perform expensive calculations there, you are at risk of providing a bad UX to your users.

The sample in this post is guilty of this when performing the cosineSimilarity calculations, so let’s fix it.

Unblocking the main thread

In the browser you can create additional threads called “Workers”. These are isolated threads that do not have access to any DOM APIs, or to variables in the main thread. The only way to communicate with the main thread is via postMessage, which can be cumbersome.
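To illustrate the raw plumbing (file names and message shapes are my own, purely for illustration):

```javascript
// main.js — every call becomes a message send plus a reply handler.
const worker = new Worker("worker.js");
worker.postMessage({ type: "search", query: "Sharing to social media" });
worker.onmessage = (event) => console.log(event.data.results);

// worker.js — the other side has to dispatch on message types by hand.
self.onmessage = (event) => {
  if (event.data.type === "search") {
    // ...do the expensive work here, off the main thread...
    self.postMessage({ results: [] });
  }
};
```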

There is an absolutely fantastic library called Comlink that makes working with Worker threads basically invisible. It allows you to work with functions as if they were on the main thread; I believe it achieves this using Proxy objects, hiding the need to work with postMessage directly 🎉.

Let’s convert our example to use Comlink and move our maths off the main thread.

We’re going to import the Tensorflow libraries in our worker instead, so your HTML should look like this.
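A sketch of the slimmed-down page (the module script type is an assumption, needed for the Comlink import later):

```html
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8" />
    <title>Natural language search</title>
    <!-- The Tensorflow libraries now load inside worker.js instead -->
  </head>
  <body>
    <script src="index.js" type="module"></script>
  </body>
</html>
```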

Let’s also add in some user input, to make the demo a bit more spicy.
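The original markup isn’t shown; a minimal input, with element ids of my own choosing, might be:

```html
<input id="search" type="text" placeholder="Search posts..." />
<button id="search-button">Search</button>
<pre id="results"></pre>
```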

Next up, delete all of the code in “index.js”. We’re going to create a new “worker.js” file, and then update “index.js” to work with it and update the UI.

In “worker.js” we’re going to add all of the same code, except this time we expose a function called “search” which returns our predictions. There are a few other changes too, such as using importScripts to import the libraries into the Worker.
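A sketch of that worker, assuming the cosineSimilarity and getBlogPosts functions from earlier are pasted in, and using the UMD builds of each library (the exact CDN URLs are assumptions):

```javascript
// worker.js — hedged sketch of the Comlink-exposed search service.
importScripts(
  "https://cdn.jsdelivr.net/npm/@tensorflow/tfjs",
  "https://cdn.jsdelivr.net/npm/@tensorflow-models/universal-sentence-encoder",
  "https://unpkg.com/comlink/dist/umd/comlink.js"
);

// ...cosineSimilarity and getBlogPosts definitions from earlier go here...

let model;
let blogPosts;

const search = async (userQuery) => {
  // Lazily load the model and posts on the first search.
  if (!model) {
    model = await use.load();
    blogPosts = await getBlogPosts();
  }

  const blogPostsTensor = await model.embed(
    blogPosts.map((post) => post.searchText)
  );
  const userInputTensor = await model.embed([userQuery]);

  const blogPostsVectors = await blogPostsTensor.array();
  const [userInputVector] = await userInputTensor.array();

  return blogPosts
    .map((post, index) => ({
      ...post,
      similarity: cosineSimilarity(userInputVector, blogPostsVectors[index]),
    }))
    .sort((a, b) => b.similarity - a.similarity);
};

Comlink.expose({ search });
```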

Now, let’s use our new SearchService in "index.js".
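A sketch of the main-thread side (element ids match the input markup assumed earlier; the ESM CDN URL for Comlink is an assumption):

```javascript
// index.js — the worker's exposed functions are callable like local ones.
import * as Comlink from "https://unpkg.com/comlink/dist/esm/comlink.mjs";

const SearchService = Comlink.wrap(new Worker("worker.js"));

const input = document.querySelector("#search");
const button = document.querySelector("#search-button");
const results = document.querySelector("#results");

button.addEventListener("click", async () => {
  results.innerText = "Searching...";
  // Runs in the worker; the main thread stays responsive.
  const predictions = await SearchService.search(input.value);
  results.innerText = JSON.stringify(predictions, null, 2);
});
```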

If you load this demo code up in the browser, you should get similar results to before, but with the heavy work offloaded to a Worker thread.

Here’s a live demo project for reference:

Hopefully you can see from the example how you can offload work into a Worker using Comlink. You can also build for production using popular tools such as Rollup, but I won’t cover that here.

One of the neat things about using Worker threads is that, because they don’t have access to the DOM, you are forced to separate your application logic from your UI, making your code more modular and reusable in the future.

Future thoughts

In case you missed the links earlier:

If I were to continue this idea, I’d probably explore some of the following:

  • Making the code more production ready using module imports and a build tool chain.
  • Investigate ways to use TensorflowJS at build time of my blog to pre-calculate embeddings for posts.
  • See if there are, in fact, ways to do cosine similarity directly in TensorflowJS; again, I’d love to know if anybody knows how!

I hope to continue my Machine learning journey; I have some other blog-related ideas that I might try to explore in the future:

  • Recommending similar blog posts
  • Text summary generation of blog posts.

I’m fairly early on in my AI learning journey, but one of the initial resources that helped me out and inspired me was watching Jason Lengstorf’s Learn with Jason series, which I highly recommend. One of the truly awesome things about this series is that closed captioning is provided, making the content more accessible to everybody 🎉.

At the time of writing there are 3 sessions relating to Machine Learning and TensorflowJS, here is one of them:

I hope this was a good read, if you feel like reading more of my work, please follow me on Twitter @griffadev, or get me a coffee if you feel like it ☕.