I released a book today called The National Labor Relations Book, which you can buy over at NLRBResearch.com for $10 (click here). The book provides an introduction to the NLRA and NLRB through summaries of the 100 most-cited cases in this area of law, with each summary derived from the 100 most recent cases citing that case. Do the math and that’s 10,000 cases in total being digested and turned into a book.
I have never seen someone write a reference book using this method and, if you were to try to do so manually, it would be a monumental effort requiring a massive amount of human labor. I did not do this labor, but instead mostly used Google Gemini and Claude Sonnet to generate the book.
I have been talking about producing this book for several months on X and a number of people have asked for more of the technical details of how I did it. That is what this post is about.
Constructing the Database
A couple of years ago I created the NLRB Law Database at NLRBResearch.com. This database tracks fourteen different types of documents that essentially define the practical meaning of the National Labor Relations Act. The database is updated once a day using bespoke web scrapers that I wrote in Python. At the time, I used ChatGPT to help me a little with the coding, but not that much, as this was in the early LLM days before the code-writing ability of LLMs got really good.
This database is the centerpiece of the book-generation process. It contains all NLRB decisions issued from 1935 to present and every Supreme Court case that has ever used the phrase “National Labor Relations” along with tens of thousands of other sorts of documents that cite to those decisions. All of this material is stored in plain text with metadata like the name, date, and citation for the case.
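For readers curious what those records might look like, here is a minimal sketch of how one decision could be stored. The field names and file layout are my own illustration, not the actual schema of the NLRB Law Database:

```python
from dataclasses import asdict, dataclass
import json

@dataclass
class Decision:
    """One document in the database (illustrative fields)."""
    name: str      # case name, e.g. "F.W. Woolworth Co."
    date: str      # decision date
    citation: str  # e.g. "90 NLRB 289"
    text: str      # full plain-text body of the decision

def save_decision(dec: Decision, path: str) -> None:
    # One JSON record per decision: metadata plus the plain text.
    with open(path, "w") as f:
        json.dump(asdict(dec), f)
```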
Finding the Most-Cited Cases
Once the database was constructed, it was fairly trivial to find the most-cited cases. Using code, I made a list of every NLRB and Supreme Court decision in the database along with each decision’s citation. NLRB cases have citations that look like “90 NLRB 289” and Supreme Court cases have citations that look like “395 U.S. 575.” I conducted a search in the database for each of the cases on the list, e.g., this search for “90 NLRB 289.” Every case that contains that text shows up in the search results as a match and so you can add up the number of matches to determine how many times it has been cited.
Once you do this for every single NLRB and Supreme Court decision, you can simply sort the cases from most citations to least to find the 100 most-cited cases. Then you can go back into the database, search for the citation of each of those 100 cases again, and pull down the 100 most recent cases that cite to each target case into a JSON file.
The result of this process is 100 JSON files, one for each of the 100 most-cited cases. Each JSON file contains the names, dates, citations, and full text of the 100 most recent cases citing to the target case.
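The counting-and-ranking step described above can be sketched in a few lines of Python. This is an illustration of the method, not the actual code; it does a naive linear scan per citation, whereas the real database presumably searches more efficiently:

```python
import json

def count_citations(decisions, citation):
    """How many other documents in the corpus contain the citation string."""
    return sum(
        1 for d in decisions
        if d["citation"] != citation and citation in d["text"]
    )

def most_cited(decisions, n=100):
    """The n most-cited citations, ranked from most to least citations."""
    ranked = sorted(
        (d["citation"] for d in decisions),
        key=lambda c: count_citations(decisions, c),
        reverse=True,
    )
    return ranked[:n]

def citing_cases(decisions, citation, n=100):
    """The n most recent documents that cite the target case."""
    hits = [
        d for d in decisions
        if d["citation"] != citation and citation in d["text"]
    ]
    hits.sort(key=lambda d: d["date"], reverse=True)
    return hits[:n]

def dump_target_file(decisions, citation):
    # One JSON file per target case, holding its most recent citers.
    with open(f"{citation}.json", "w") as f:
        json.dump(citing_cases(decisions, citation), f)
```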
Producing the Summaries
It would be nice if you could just upload each of these JSON files to an LLM and ask it to produce a summary. But you cannot do this because the files are too big. For instance, the JSON file containing the 100 most-recent cases citing to “90 NLRB 289” is over 3.3 million tokens in length. The most tokens you can give a high-quality frontier LLM at a time is 1 million.
So what I did instead was write a Python script that walks through a JSON file one case at a time and sends each case’s information to Google Gemini Flash while prompting it to provide a 100-word summary of that case that focuses on how that case applied the target case. The prompt also instructed Gemini to send its response back as JSON so that I could compile all of the summaries into a new JSON file for the next step of the process.
So, for example, the most-cited case in NLRB history is F.W. Woolworth Co., 90 NLRB 289 (1950), which is a case about how to calculate backpay remedies. I grabbed the JSON file containing the 100 most recent cases citing to F.W. Woolworth from my NLRB Law Database. Then I went through that file one case at a time and had Gemini provide a 100-word summary describing what legal rule F.W. Woolworth established and how it was applied in that case. Each 100-word summary was sent back in JSON format with other metadata, allowing me to construct a new JSON file containing 100-word summaries of each of the cases in the original file.
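That per-case loop might look something like this, assuming the google-generativeai Python client. The prompt wording, field names, and model name are my own illustration, not the actual ones used:

```python
import json

# The prompt asks for a ~100-word summary and a JSON reply (wording illustrative).
PROMPT = (
    "In about 100 words, summarize how the case below applied {target}. "
    "State the legal rule {target} established and how this case applied it. "
    'Respond only with JSON of the form {{"name": "...", "citation": "...", '
    '"summary": "..."}}.\n\n'
    "Case name: {name}\nCitation: {citation}\nFull text:\n{text}"
)

def build_prompt(target: str, case: dict) -> str:
    # case needs "name", "citation", and "text" keys; extra keys are ignored.
    return PROMPT.format(target=target, **case)

def summarize_case(model, target: str, case: dict) -> dict:
    """Send one case to Gemini Flash and parse the JSON it sends back.
    (Real replies may arrive wrapped in markdown code fences and need stripping.)"""
    resp = model.generate_content(build_prompt(target, case))
    return json.loads(resp.text)

# Usage (assumes the google-generativeai package and an API key):
#   import google.generativeai as genai
#   model = genai.GenerativeModel("gemini-1.5-flash")
#   summary = summarize_case(model, "F.W. Woolworth Co., 90 NLRB 289 (1950)", case)
```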
Once I had this new JSON file — the one with the 100-word summaries — I sent that file to Claude Sonnet and had it produce a summary of the target case using the 100 Gemini Flash summaries. This text is what made it into the book.
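The final summarization step can be sketched the same way, assuming the anthropic Python client. The prompt and model name here are illustrative:

```python
def build_target_prompt(target: str, summaries: list[dict]) -> str:
    # Stitch the per-case summaries into one prompt (field names illustrative).
    body = "\n\n".join(
        f"{s['name']}, {s['citation']}: {s['summary']}" for s in summaries
    )
    return (
        f"Below are 100-word summaries of recent cases applying {target}. "
        f"Using only these summaries, write a summary of {target}: the rule "
        f"it established and how it is applied today.\n\n{body}"
    )

def summarize_target(client, target: str, summaries: list[dict]) -> str:
    """Produce the book text for one target case via the Messages API."""
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative model name
        max_tokens=2048,
        messages=[{"role": "user", "content": build_target_prompt(target, summaries)}],
    )
    return msg.content[0].text

# Usage:
#   import anthropic
#   client = anthropic.Anthropic()  # needs ANTHROPIC_API_KEY set
#   text = summarize_target(client, "F.W. Woolworth Co., 90 NLRB 289 (1950)", summaries)
```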
To be clear, all of what I am describing above is being done with Python code that was written with Claude Code, mostly with the Claude Sonnet model though occasionally with the Claude Opus model. I didn’t do any of this manually.
Compiling the Book
Once you have all of these summaries, you still have to compile them into a book. People who buy The National Labor Relations Book get sent an email with a PDF, EPUB, and HTML version of the book. To produce those three things, I had to compile all of the summaries, with my edits and other contributions, into some kind of intermediary document that could then produce these files.
The way I did that was to have the final summaries from Claude Sonnet sent to me as markdown files. Markdown is a simple markup language that can create headings, subheadings, hyperlinked text, and so on. These markdown files were all placed in a directory with numerical prefixes to make sure they all lined up in the order desired for the book (so the first part of the book is 000-Introduction.md, the next part is 001-Method.md, and so on). Once the markdown files are all lined up in the directory like this, it is easy to move cases around in the book by changing their numerical prefixes and also easy to edit the files. Because markdown is just plain text, it was also possible to prompt Claude Code to move things around or to edit the text.
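The numeric-prefix trick means ordinary lexicographic sorting recovers the book order, which a few lines of Python can illustrate:

```python
from pathlib import Path

def ordered_chapters(book_dir: str) -> list[Path]:
    """All chapter files, sorted by their zero-padded numeric prefix
    (000-Introduction.md, 001-Method.md, and so on)."""
    return sorted(Path(book_dir).glob("*.md"), key=lambda p: p.name)
```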
To compile the markdown into the PDF, EPUB, and HTML, I used pandoc. I actually created a Makefile that contained the pandoc command for generating each of the three files. And so whenever I wanted to compile the book to see what it looked like, I could just type “make pdf” or “make epub” or “make html” or, to do all of them, “make all.”
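A Makefile along those lines might look like the following; the file names and pandoc flags here are assumptions, not the actual Makefile:

```makefile
# Illustrative Makefile; chapter paths and pandoc options are assumptions.
SRC := $(wildcard chapters/*.md)

all: pdf epub html

pdf: $(SRC)
	pandoc $(SRC) -o book.pdf --pdf-engine=xelatex

epub: $(SRC)
	pandoc $(SRC) -o book.epub --metadata title="The National Labor Relations Book"

html: $(SRC)
	pandoc $(SRC) -o book.html --standalone

.PHONY: all pdf epub html
```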
The book also contains a couple of graphical illustrations and a nicely designed table. These were written in LaTeX embedded directly in the markdown files.
As with the prior section, basically all of the coding aspects of what I am describing here were done by Claude Code using the Sonnet model. It wrote the Makefile. It wrote the LaTeX markup. It installed pandoc, LaTeX, and all the other packages needed to do all of this.
Selling the Book
When it came time to sell the book, I wanted to see if I could do so without using an e-commerce platform that takes a cut of your sales. There are a number of such platforms, including Amazon, for self-published books. But what’s the point of a coding agent if not to allow me to set all that up myself without having to pay a third party? I keep hearing that LLMs are going to kill SaaS. So let’s kill it.
I SSH’d into my NLRBResearch.com server, which also has Claude Code installed. I explained to Claude what I wanted to do — take card payments and automatically email the PDF, EPUB, and HTML to buyers — and asked it to come up with a plan for doing it in the simplest way possible.
Claude concluded that the best way to do it would be to set up a small Flask app with Stripe (payment processing) and Resend (emailing) integrations. I already have a Stripe account for NLRB Research, so Claude walked me through how to create a new product on Stripe and how to get the relevant Stripe secrets/keys. It also walked me through how to set up a Resend account and get a Resend API key. It instructed me not to share those keys with Claude, which was reassuring, and then proceeded to write the Flask app, edit my nginx (server) configuration, and create a systemd entry that launches everything. The systemd entry had three empty variables where I was told to paste my Resend/Stripe keys, which I did.
And it worked. I am selling my book without paying a platform fee.
Conclusion
As I noted in my prior post about AI, I find all of this stuff incredibly useful. In some of the more practically minded LLM discourse (not the philosophical discourse), I keep seeing this painful phrase “solopreneur” and I guess that is what I am and have been for the last 10 years or so, with all of my projects. I’ve never hired an employee and only occasionally hire outside contractors, like writers, editors, and designers for People’s Policy Project.
For someone in my situation, LLMs enable me (1) to do different stuff than I have ever done before because they have abilities I do not, (2) to do more stuff than I have previously been able to do because they work faster, and (3) to in-house things I would have previously had to outsource to a particular person (like an editor) or to a service (like an e-commerce platform).
But one thing I think we can see with this book is that the LLMs are not just labor-replacing and productivity-expanding, but can, in some circumstances at least, enable the production of totally new things. Obviously there are other labor law introductions and reference books. But this particular method of deriving the law from 10,000 carefully selected cases is not something that would have been feasible without LLMs. In the absence of LLMs, the only option for producing a book like this would have been to rely on expert judgment. That approach can also be useful (indeed I used some of it in writing this book), but it is not the same product.
I do think this book will be useful for certain kinds of people who I hope will buy it. But I think the process of making it was also very useful as far as skill-building goes. LLMs look poised to become a standard aspect of most white-collar professions in the future. So better to get ahead of such things than get left behind.