OneNote Web Extractor: When synthetic clicks fail

I needed to extract content from old OneNote notebooks I had shared as view-only links. Playwright should have made this straightforward: navigate to the notebook, click sections, click pages, extract text. Except the clicks didn't work.

The sections highlighted when clicked. The URL changed. Network requests fired. But the content iframe stayed frozen on the section overview. No matter how long I waited, the page content never loaded.

The debugging process

I started with the obvious checks. Wait longer between actions. Try different selectors. Click parent elements instead of children. Take screenshots at each step to confirm the browser state matched expectations. Nothing changed the outcome.

Then I checked the DOM carefully. After clicking a page, the pageNode element had the selected state applied. The browser knew I clicked it. OneNote's JavaScript saw the click happen. It just didn't respond.

That narrowed the problem: OneNote was receiving the click event but choosing not to act on it.

The isTrusted check

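Web apps can tell synthetic events from real ones. When a script calls element.click(), the browser dispatches the event with event.isTrusted set to false. Events the browser generates from real user input have it set to true. The property is read-only, so page scripts can't forge it. That's the signal.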

I was using page.evaluate() to execute clicks inside the browser context:

// This generates isTrusted: false
await page.evaluate(() => {
  document.querySelector('.pageNode').click();
});

OneNote Web appears to check isTrusted and ignore synthetic clicks: the event fires and the selection state updates, but the content load never triggers.
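
To make the symptom concrete, here is a hypothetical handler that would behave the same way: it highlights on any click but only loads content for trusted ones. This is a sketch, not OneNote's actual code; createPageNode and the state fields are made-up names. It runs in plain Node, whose built-in EventTarget also marks script-created events as untrusted.

```javascript
// Hypothetical handler; NOT OneNote's actual code.
function createPageNode() {
  const pageNode = new EventTarget();
  const state = { selected: false, contentLoaded: false };

  function onClick(event) {
    state.selected = true;          // the highlight happens for every click
    if (!event.isTrusted) return;   // script-generated click: stop here
    state.contentLoaded = true;     // only real input triggers the content load
  }

  pageNode.addEventListener('click', onClick);
  return { pageNode, state, onClick };
}

const { pageNode, state, onClick } = createPageNode();

// An event created by script always has isTrusted === false.
pageNode.dispatchEvent(new Event('click'));
console.log(state); // { selected: true, contentLoaded: false }

// Scripts cannot create a trusted event, so a plain object stands in for one here.
onClick({ isTrusted: true });
console.log(state); // { selected: true, contentLoaded: true }
```

The first log matches what I saw in the browser: the node is selected, but nothing loads.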

The fix

Playwright has a different way to click: locator.click() goes through the browser's automation protocol, so the browser itself produces the mouse events, with real coordinates and isTrusted set to true.

// This generates isTrusted: true
await page.locator('.pageNode').click();

The difference:

page.evaluate() with element.click() runs JavaScript inside the page, so the click is a script-dispatched event
locator.click() asks the browser, through its automation protocol, to perform an actual mouse click on the element

The second approach generates events that the page sees as trusted, the same as real user input. OneNote Web responds to them.

Once I switched to native clicks, the content loaded immediately. Pages that had been stuck on the section overview now showed their actual text content.

What it does now

OneNote Web Extractor takes a JSON config listing sections and pages, navigates through the notebook using native Playwright clicks, waits for content to load in OneNote's iframe, and saves the text to markdown files.

Config format:

{
  "notebookUrl": "https://1drv.ms/o/c/your-notebook-id/...",
  "outputDir": "./output",
  "sections": [
    {
      "name": "My Section",
      "pages": ["Page 1", "Page 2"]
    }
  ]
}
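
A typo in the config only shows up after the browser has launched, so a quick check up front can save a run. The sketch below is an assumption about what is worth validating, not the checks extract.js actually performs; validateConfig is a hypothetical name, and only the field names come from the format above.

```javascript
// Hypothetical validator: field names match the config format above,
// but the rules are assumptions, not extract.js's real behavior.
function validateConfig(config) {
  if (config === null || typeof config !== 'object') {
    return ['config must be a JSON object'];
  }
  const errors = [];
  if (typeof config.notebookUrl !== 'string' || config.notebookUrl.length === 0) {
    errors.push('notebookUrl must be a non-empty string');
  }
  if (typeof config.outputDir !== 'string' || config.outputDir.length === 0) {
    errors.push('outputDir must be a non-empty string');
  }
  if (!Array.isArray(config.sections) || config.sections.length === 0) {
    errors.push('sections must be a non-empty array');
    return errors;
  }
  config.sections.forEach((section, i) => {
    if (!section || typeof section.name !== 'string' || section.name.length === 0) {
      errors.push(`sections[${i}].name must be a non-empty string`);
    }
    if (!section || !Array.isArray(section.pages) ||
        section.pages.some((p) => typeof p !== 'string')) {
      errors.push(`sections[${i}].pages must be an array of page titles`);
    }
  });
  return errors;
}

// An empty array means the config passed every check.
console.log(validateConfig({
  notebookUrl: 'https://1drv.ms/o/c/your-notebook-id/...',
  outputDir: './output',
  sections: [{ name: 'My Section', pages: ['Page 1', 'Page 2'] }],
})); // []
```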

Run:

node extract.js config.json

Output:

output/
└── my-section/
    ├── page-1.md
    └── page-2.md
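
The names in that tree suggest that section and page titles are lowercased and hyphenated. The sketch below shows one plausible mapping from titles to paths; slugify and outputPathFor are hypothetical helper names, and the exact rules the tool applies to punctuation may differ.

```javascript
const path = require('node:path');

// Hypothetical helper: lowercase the title and collapse every run of
// characters other than a-z and 0-9 into a single hyphen.
function slugify(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, ''); // drop leading and trailing hyphens
}

// Hypothetical helper: build <outputDir>/<section-slug>/<page-slug>.md
function outputPathFor(outputDir, sectionName, pageName) {
  return path.join(outputDir, slugify(sectionName), `${slugify(pageName)}.md`);
}

console.log(slugify('My Section')); // my-section
console.log(outputPathFor('./output', 'My Section', 'Page 1'));
```

One thing to watch with any scheme like this: two titles that differ only in punctuation or case map to the same file name.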

The extraction is slow, about 2-3 seconds per page, because OneNote's iframe loads asynchronously and I found no reliable signal that it has finished, so the tool simply waits. But it works consistently.
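
One possible alternative to a fixed wait, which the tool does not use, is to poll the extracted text until it differs from what was shown before the click and then stops changing. The sketch below is untested against OneNote; waitForStableText, readText, and the option names are hypothetical. In practice readText would wrap whatever call reads text from the content iframe.

```javascript
// Hypothetical polling helper; an alternative technique, not what the tool does.
// readText: async function returning the currently visible text.
// differentFrom: text shown before the click (e.g. the section overview).
async function waitForStableText(readText, options = {}) {
  const { differentFrom = '', intervalMs = 500, timeoutMs = 10000 } = options;
  const deadline = Date.now() + timeoutMs;
  let previous = await readText();

  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const current = await readText();
    // Done when the text is non-empty, new, and unchanged since the last poll.
    if (current.length > 0 && current !== differentFrom && current === previous) {
      return current;
    }
    previous = current;
  }
  throw new Error(`Text did not stabilize within ${timeoutMs} ms`);
}

// Usage with a fake reader that simulates content arriving in stages.
const snapshots = ['overview', 'overview', 'Page', 'Page body', 'Page body'];
let reads = 0;
const fakeRead = async () => snapshots[Math.min(reads++, snapshots.length - 1)];

waitForStableText(fakeRead, { differentFrom: 'overview', intervalMs: 10 })
  .then((text) => console.log(text)); // Page body
```

This trades the fixed delay for at least two polls per page, and it would return stale text if two consecutive pages happened to have identical content.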

What I learned

Check event.isTrusted when automation fails on modern web apps. If a click highlights an element but doesn't trigger the expected behavior, the app may be checking whether the event came from a real user.

Playwright's native interactions (locator.click(), locator.press(), locator.pressSequentially(), which replaces the deprecated locator.type(), etc.) generate trusted events. Events created from page JavaScript (element.click(), element.dispatchEvent()) do not.

When building automation:

Start with native Playwright methods
Only fall back to evaluate() if you need to run custom logic inside the page context
If clicks mysteriously stop working, suspect an isTrusted check

The debugging process took hours. The fix was a single line. That's how it goes sometimes.

Install

git clone https://github.com/daniel-butler/Onenote-web-extractor.git
cd Onenote-web-extractor
npm install
npx playwright install chromium

Source and docs: https://github.com/daniel-butler/Onenote-web-extractor

This was built to extract documentation from old work notebooks for a career timeline blog post. The tool ended up being more interesting than I expected—not because the extraction itself is novel, but because the debugging process revealed how modern web apps distinguish real users from automation.

When synthetic clicks fail, native clicks succeed. That's the whole trick.