Building a Bookshelf with Goodreads Sync
Reading was one of my favorite pastimes growing up. In recent years I haven’t been reading as much as I would like, but in 2025 I decided to read every day before bed instead of scrolling social media, and I ended up finishing 16 books. Reading is a big part of who I am, and I wanted a dedicated space on this site to showcase what I’m currently reading.
The Goodreads API Problem
I assumed this would be a straightforward API integration. Goodreads has been around since 2007, and millions of people track their reading there. Surely there’s a well-documented API.
There isn’t. Goodreads deprecated their public API in December 2020. They stopped issuing new developer keys, let existing OAuth integrations break, and eventually took down the documentation entirely.
This leaves us with a few options:
- Web scraping: Fragile, breaks when markup changes, and violates terms of service
- Build a reading tracker: Full control, but requires a database and manual data entry
- RSS feeds: Still works, publicly accessible, no authentication required
RSS it is.
Thinking About the Problem
Let’s think about what data is needed and how often it changes.
For a bookshelf display, the requirements are minimal:
- Book title
- Author name
- Cover image
- Link back to Goodreads (for anyone who wants more details)
- Date added (for sorting)
The data structure looks something like:
```ts
type Book = {
  id: string;
  title: string;
  author: string;
  link: string;
  imageUrl?: string;
  dateAdded: Date;
};
```
Now, how often does this data change? On average, I finish about one book a month. Weekly syncs are more than enough.
This determines the architecture. If data changes infrequently and doesn’t need to be live, fetch it at build time rather than on every page load. No database, serverless functions, or API keys. Just fetch the data when deploying and bake it into static HTML.
The RSS Feed Approach
Goodreads exposes RSS feeds for public shelves. The URL pattern:
```
https://www.goodreads.com/review/list_rss/{USER_ID}?shelf=currently-reading
```
Replace {USER_ID} with your Goodreads user ID (visible in your profile URL), and set the shelf parameter to whichever shelf you want: currently-reading, read, to-read, etc.
The feed returns XML with entries like:
```xml
<item>
  <title>Project Hail Mary</title>
  <author_name>Andy Weir</author_name>
  <book_image_url>https://i.gr-assets.com/images/...</book_image_url>
  <link>https://www.goodreads.com/review/show/...</link>
  <user_date_added>Sat, 14 Dec 2024 10:30:00 -0800</user_date_added>
</item>
```
Parsing this requires an XML parser. In JavaScript, fast-xml-parser handles it well, and the transformation from XML to the Book type is a straightforward mapping.
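To make the mapping concrete, here’s a rough sketch of what fast-xml-parser produces; the fetch-and-parse shape here mirrors the loader in the next section:

```ts
import { XMLParser } from "fast-xml-parser";

// Fetch the feed (swap {USER_ID} for a real Goodreads user ID)
const response = await fetch(
  "https://www.goodreads.com/review/list_rss/{USER_ID}?shelf=currently-reading"
);
const xml = await response.text();

const feed = new XMLParser().parse(xml);

// Each <item> becomes a plain object keyed by tag name, e.g.
// { title: "Project Hail Mary", author_name: "Andy Weir", ... }
//
// Caveat: a shelf with exactly one book parses as a single object rather
// than an array, which is why the loader below normalizes with Array.isArray.
console.log(feed.rss.channel.item);
```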
The plan: fetch from the RSS feed at build time, validate the data with Zod, and trigger weekly rebuilds to keep it fresh.
Implementation with Astro Content Collections
Since this site is built with Astro, content collections are the way to go. They typically load from local Markdown or JSON files, but they also support async loaders: functions that fetch data from anywhere and return it in the expected format. (I think this is pretty cool.)
The books collection in src/content.config.ts:
```ts
import { defineCollection, z } from "astro:content";
import { XMLParser } from "fast-xml-parser";

const books = defineCollection({
  loader: async () => {
    const response = await fetch(
      "https://www.goodreads.com/review/list_rss/158209629?shelf=currently-reading"
    );
    const xml = await response.text();

    const parser = new XMLParser();
    const result = parser.parse(xml);

    const items = result.rss?.channel?.item || [];
    // A shelf with a single book parses as one object, not an array
    const itemsArray = Array.isArray(items) ? items : [items];

    return itemsArray.map((item) => ({
      id: item.book_id.toString(),
      title: item.title.replace(/<[^>]*>/g, ""), // strip any HTML tags
      author: item.author_name,
      link: item.link,
      imageUrl: item.book_image_url,
      dateAdded: new Date(item.user_date_added),
    }));
  },
  schema: z.object({
    title: z.string(),
    author: z.string(),
    link: z.string(),
    imageUrl: z.string().optional(),
    dateAdded: z.coerce.date(),
  }),
});

export const collections = { books };
```
The loader runs during build. Zod validates the shape of each book object. If Goodreads changes their feed format and breaks the expected structure, the build fails with a clear error rather than silently rendering broken data.
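The schema above is deliberately loose. If you want earlier failures on subtler breakage, it could be tightened; a sketch, not what the config above actually uses:

```ts
import { z } from "astro:content";

// Stricter variant: also rejects empty strings and malformed URLs
const bookSchema = z.object({
  title: z.string().min(1),
  author: z.string().min(1),
  link: z.string().url(),
  imageUrl: z.string().url().optional(),
  dateAdded: z.coerce.date(),
});
```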
Querying it in a page works like any other collection:
```astro
---
import { getCollection } from "astro:content";

const books = await getCollection("books");
const sortedBooks = books.sort(
  (a, b) => b.data.dateAdded.getTime() - a.data.dateAdded.getTime()
);
---
```
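From there, rendering is ordinary templating. A minimal sketch of the page body (the class name and layout are placeholders, not the real site’s markup):

```astro
<ul class="bookshelf">
  {sortedBooks.map((book) => (
    <li>
      <a href={book.data.link}>
        {book.data.imageUrl && (
          <img src={book.data.imageUrl} alt={`Cover of ${book.data.title}`} />
        )}
        {book.data.title} by {book.data.author}
      </a>
    </li>
  ))}
</ul>
```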
Keeping It Updated
Static sites don’t update themselves. The bookshelf reflects whatever was on Goodreads at build time. To keep it current, the site needs to rebuild periodically.
A GitHub Actions workflow handles this:
```yaml
name: Weekly Goodreads Sync

on:
  schedule:
    - cron: "0 3 * * 0" # Every Sunday at 3 AM UTC

jobs:
  trigger-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Vercel Deploy
        env:
          DEPLOY_HOOK: ${{ secrets.VERCEL_DEPLOY_HOOK }}
        run: curl -X POST "$DEPLOY_HOOK"
```
Vercel provides deploy hooks: URLs that trigger a rebuild when you POST to them. The hook URL is stored as a GitHub secret (VERCEL_DEPLOY_HOOK above), and the workflow calls it on schedule.
Trade-offs
This approach works well for my use case, but has limitations:
- No real-time updates: Changes appear after the next rebuild. Fine for books, less suitable for frequently changing data.
- Goodreads dependency: If they change or remove RSS feeds, the build breaks. No API means no stability guarantees.
- Public shelves only: RSS feeds require the shelf to be publicly visible.
- Limited data: RSS doesn’t include everything the API once offered. No reading progress, no private notes, limited metadata.
For a personal bookshelf that updates weekly, these trade-offs are acceptable. The alternative, maintaining a separate database or building a full reading tracker, adds complexity I don’t need.
Applying This Elsewhere
This works for more than just books. Any data that changes infrequently can be fetched at build time:
- Identify slow-moving data: Reading lists, playlists, pinned repos, recent activity
- Find a public feed: RSS, Atom, JSON endpoints
- Fetch at build time: Transform it into your preferred format
- Rebuild on a schedule: Cron job, GitHub Actions, whatever works
Any static site generator can do this. The implementation details change, but the idea is the same: treat external data as content and refresh it periodically.
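As one example of the same pattern pointed elsewhere, here’s roughly what a loader for recent GitHub repos could look like; a sketch, with the username and chosen fields as placeholder assumptions:

```ts
import { defineCollection, z } from "astro:content";

const repos = defineCollection({
  loader: async () => {
    // Public endpoint; no API key needed for public repos
    const response = await fetch(
      "https://api.github.com/users/YOUR_USERNAME/repos?sort=pushed&per_page=10"
    );
    const repos: any[] = await response.json();

    return repos.map((repo) => ({
      id: repo.full_name,
      name: repo.name,
      description: repo.description ?? "",
      url: repo.html_url,
      pushedAt: new Date(repo.pushed_at),
    }));
  },
  schema: z.object({
    name: z.string(),
    description: z.string(),
    url: z.string(),
    pushedAt: z.coerce.date(),
  }),
});

export const collections = { repos }; // or merge into your existing export
```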