`hex2txt`

quickly convert hex.pm package docs to an /llms.txt file

…this helps AI coding assistants write better Elixir code by teaching them about the packages you're using

Usage Details

Latest Version

Generate an /llms.txt file for the most recent version of a package:


              https://hex2txt.fly.dev/<package>/llms.txt

Examples:

https://hex2txt.fly.dev/geo/llms.txt - Geo
https://hex2txt.fly.dev/flop_phoenix/llms.txt - Flop Phoenix
https://hex2txt.fly.dev/phoenix_live_view/llms.txt - LiveView

Specific Version

Generate an /llms.txt file for a specific version of a package:


              https://hex2txt.fly.dev/<package>/<version>/llms.txt

Examples:

https://hex2txt.fly.dev/elixir/1.18.0-rc.0/llms.txt - Elixir v1.18.0-rc.0
https://hex2txt.fly.dev/ecto_ulid_next/1.0.2/llms.txt - Ecto.ULID Next v1.0.2
https://hex2txt.fly.dev/phoenix/1.7.17/llms.txt - Phoenix v1.7.17

Replace <package> with the desired package name and <version> with the specific version number.
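
The two URL patterns above can be sketched as a small helper. This is only an illustration: the `Hex2txtUrl` module and its `llms_txt` functions are hypothetical names and are not part of hex2txt.

```elixir
defmodule Hex2txtUrl do
  # Hypothetical helper (not part of hex2txt) that builds URLs
  # following the two patterns described above.
  @base "https://hex2txt.fly.dev"

  # Latest version: /<package>/llms.txt
  def llms_txt(package) when is_binary(package) do
    "#{@base}/#{package}/llms.txt"
  end

  # Specific version: /<package>/<version>/llms.txt
  def llms_txt(package, version) when is_binary(package) and is_binary(version) do
    "#{@base}/#{package}/#{version}/llms.txt"
  end
end

IO.puts(Hex2txtUrl.llms_txt("geo"))
IO.puts(Hex2txtUrl.llms_txt("phoenix", "1.7.17"))
```

The resulting URL can then be passed to any tool that accepts a web page as context, such as Aider's /web command.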

Small context windows getting in the way?

Try hex2context, a self-contained Livebook notebook that uses Retrieval-Augmented Generation (RAG) to only include the most relevant snippets of documentation.

Excluding less-relevant documentation makes more efficient use of limited context window sizes (and reduces inference costs, too).
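
To make the retrieval idea concrete, here is a minimal sketch of the "keep only the most relevant snippets" step. It is not how hex2context is implemented: a real RAG pipeline would compare embedding vectors, whereas this toy version simply counts the words a snippet shares with the query. The `SnippetRanker` module and the sample snippets are made up for illustration.

```elixir
defmodule SnippetRanker do
  # Hypothetical module. Word overlap is a toy stand-in for
  # embedding similarity, used only to keep the sketch self-contained.

  # Returns the `k` snippets that share the most distinct words with `query`.
  def top_k(snippets, query, k) do
    query_words = words(query)

    snippets
    |> Enum.sort_by(fn snippet -> overlap(words(snippet), query_words) end, :desc)
    |> Enum.take(k)
  end

  # Lowercases the text and splits it into a set of alphanumeric words.
  defp words(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z0-9_]+/, trim: true)
    |> MapSet.new()
  end

  defp overlap(a, b), do: MapSet.size(MapSet.intersection(a, b))
end

# Illustrative snippets, not real documentation excerpts.
snippets = [
  "EasyHTML.parse!/1 parses an HTML document",
  "Access implementation: select nodes with a CSS selector",
  "Installation instructions for mix.exs"
]

IO.inspect(SnippetRanker.top_k(snippets, "parse an HTML document", 1))
```

Only the selected snippets would then be placed into the model's context, instead of the full documentation file.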

FAQ

Isn't this just generating a Markdown file?

Yes. Although the implementation is currently straightforward, the end goal is to produce files that are optimized specifically for inference-time consumption by an LLM (or LLM-adjacent tooling). The /llms.txt component is a signal that these files are intended for use by machines, not humans.

Why provide "inference-time" documentation?

Here's an example of two real sessions using Aider:

Session 1: Aider hallucinates and generates broken code (it is not familiar with the EasyHTML package)
Session 2: Aider generates perfect code, because the EasyHTML docs are loaded into context (via the /web https://hex2txt.fly.dev/easyhtml/llms.txt command)

Session 1 ⁉️

aider --sonnet

Main model: claude-3-5-sonnet-20241022 with diff edit format, infinite output
Weak model: claude-3-5-haiku-20241022

> /ask Using EasyHTML, how to find all OpenGraph tags in an HTML document? Use Elixir.

Using EasyHTML in Elixir, you can find all OpenGraph tags like this:

```
def find_og_tags(html) do
  {:ok, document} = EasyHTML.parse(html)

  document
  |> EasyHTML.find("meta[property^='og:']")
  |> Enum.map(fn tag ->
    %{
      property: EasyHTML.attr(tag, "property"),
      content: EasyHTML.attr(tag, "content")
    }
  end)
end
```

You can use it like this:

```
html = """
<html>
  <head>
    <meta property="og:title" content="My Page Title">
    <meta property="og:description" content="Page Description">
    <meta property="og:image" content="image.jpg">
  </head>
</html>
"""

og_tags = find_og_tags(html)
# Returns a list of maps with property and content:
# [
#   %{property: "og:title", content: "My Page Title"},
#   %{property: "og:description", content: "Page Description"},
#   %{property: "og:image", content: "image.jpg"}
# ]
```

The CSS selector meta[property^='og:'] will match all meta tags where the property attribute starts with "og:". The function returns a list of maps containing the property and content values for each OpenGraph tag found.

Tokens: 309 sent, 286 received. Cost: $0.0052 message, $0.0052 session.

Session 2 ✅

aider --sonnet

Main model: claude-3-5-sonnet-20241022 with diff edit format, infinite output
Weak model: claude-3-5-haiku-20241022

> /web https://hex2txt.fly.dev/easyhtml/llms.txt
Scraping https://hex2txt.fly.dev/easyhtml/llms.txt...
... added to chat.

> /ask Using EasyHTML, how to find all OpenGraph tags in an HTML document?

Based on the documentation shown, you can use EasyHTML's Access implementation (the [] operator) to find meta tags with OpenGraph properties. Here's how you would likely do it:

```
html = EasyHTML.parse!(document)
html["meta[property^='og:']"]
```

This would find all <meta> tags where the property attribute starts with "og:" which is the standard prefix for OpenGraph tags.
For example, if you had HTML like:

```
html = """
<html>
  <head>
    <meta property="og:title" content="My Page Title">
    <meta property="og:description" content="Page description">
  </head>
  <body>...</body>
</html>
"""
```

The above code would find those meta tags.

Note that EasyHTML is a wrapper around Floki and uses its CSS selector syntax, so this should work as expected. You can then enumerate over the results since EasyHTML implements Enumerable.

However, if you need to get the specific content of these tags, you might need to use additional Floki functions as this isn't explicitly shown in the documentation provided.

Tokens: 548 sent, 234 received. Cost: $0.0052 message, $0.0052 session.

Why doesn't this work for `<package-name>`?

This works by scraping output files produced by ExDoc, making numerous assumptions (for example, relying on ExDoc to generate JavaScript files with embedded JSON assigned to specific JS variable names). This is obviously fragile. For this reason, documentation published with older versions of ExDoc might not work.
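
As a rough illustration of why this is fragile, here is a minimal sketch of pulling embedded JSON out of a JavaScript assignment. It is not the actual hex2txt code: the `EmbeddedJson` module is hypothetical, and the variable name `sidebarNodes` and the sample payload are assumptions chosen for the example. The sketch only isolates the JSON text; decoding it would require a JSON library.

```elixir
defmodule EmbeddedJson do
  # Hypothetical helper, not taken from hex2txt.
  # Returns {:ok, json_text} when `js` has the form `<var_name> = <json>`
  # (optionally followed by a semicolon), and :error otherwise.
  def extract(js, var_name) do
    with [lhs, rhs] <- String.split(js, "=", parts: 2),
         true <- String.trim(lhs) == var_name do
      json = rhs |> String.trim() |> String.trim_trailing(";") |> String.trim()
      {:ok, json}
    else
      _ -> :error
    end
  end
end

# Assumed shape of a generated file; real ExDoc output may differ by version.
js = ~s(sidebarNodes={"modules":[{"id":"EasyHTML"}]};)

IO.inspect(EmbeddedJson.extract(js, "sidebarNodes"))
IO.inspect(EmbeddedJson.extract(js, "someOtherName"))
```

If the variable is renamed or the file layout changes between ExDoc versions, a lookup like this simply returns `:error`, which is the kind of failure described above.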

Shouldn't this be part of ExDoc?

Yes, I think so (with, perhaps, additional web-facing tooling facilitated by other Hex-related projects). Iterating on this prototype and collecting community feedback is the best way to determine how useful this feature is and to inform requirements.

What are the limitations?

This is currently prototype-quality code, without proper error handling (among other deficiencies).

But the biggest practical issue is the size of the generated documentation files, specifically for packages with a large API footprint (such as Phoenix, Ecto, Elixir, etc.). These docs can consume several hundred thousand tokens and easily exhaust all available LLM context space.

We need to find ways to reduce the file size (e.g. by only including information for a subset of modules, or by dropping examples). Additionally, there may be clever ways to use embeddings to dynamically include only the components of the documentation that are relevant to the task at hand (see, for example, the hex2context Livebook, which offers one potential solution to this problem).
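
One of the reductions mentioned above, dropping examples, could look roughly like the following sketch. It assumes the generated file is Markdown in which examples appear as fenced code blocks; the `DocTrimmer` module is a hypothetical name, and nothing like it currently exists in hex2txt.

```elixir
defmodule DocTrimmer do
  # Hypothetical helper. Assumes examples are fenced code blocks
  # delimited by lines that start with three backticks.
  @fence "```"

  # Removes every fenced block (including its fence lines) from `markdown`.
  def drop_fenced_examples(markdown) do
    {kept, _in_fence?} =
      markdown
      |> String.split("\n")
      |> Enum.reduce({[], false}, fn line, {acc, in_fence?} ->
        cond do
          # A fence line toggles the state and is itself dropped.
          String.starts_with?(String.trim_leading(line), @fence) -> {acc, not in_fence?}
          # Lines inside a fenced block are dropped.
          in_fence? -> {acc, true}
          # Everything else is kept.
          true -> {[line | acc], false}
        end
      end)

    kept |> Enum.reverse() |> Enum.join("\n")
  end
end

doc = "Intro\n```\nIO.puts(:example)\n```\nOutro"
trimmed = DocTrimmer.drop_fenced_examples(doc)

IO.puts(trimmed)
IO.puts("#{byte_size(doc)} bytes before, #{byte_size(trimmed)} bytes after")
```

Whether dropping examples actually helps is an open question, since examples may be exactly what the model needs to produce correct code. This is part of the experimentation that remains to be done.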

Please submit a PR on GitHub if you'd like to contribute. Some work (and lots of experimentation) will be required to discover how to most effectively assemble LLM-specific documentation.

Follow @mjrusso on Twitter

Source code on GitHub
