Implement hybrid approach for AI news

- Update ainews script to detect OpenAI URLs and mark as NEEDS_WEB_FETCH
- Update TOOLS.md with content availability table and hybrid workflow
- Update all 4 AI news cron jobs (10:05, 14:05, 18:05, 22:05) with hybrid instructions
  - Simon/Raschka: use ainews articles (fivefilters works)
  - OpenAI: use web_fetch tool (JS-heavy site)
Agent 2026-02-03 22:28:31 +00:00
parent e6248879b3
commit c7e2d429c0
5 changed files with 228 additions and 23 deletions


@@ -103,7 +103,7 @@ curl -s -X REPORT -u "$NEXTCLOUD_USER:$NEXTCLOUD_PASS" \
"$NEXTCLOUD_URL/remote.php/dav/calendars/$NEXTCLOUD_USER/$CALDAV_CALENDAR/"
```
## AI News RSS
## AI News RSS (Hybrid Approach)
Helper script: `~/bin/ainews`
@@ -120,10 +120,18 @@ ainews reset # Clear seen history
- Auto-tracks seen articles in `memory/ainews-seen.txt`
- Auto-prunes to 200 entries
**Workflow for AI news briefing:**
1. `ainews items` → shows NEW articles, marks them as seen
2. Pick interesting ones, optionally fetch full content with `articles`
3. Next briefing: only shows articles published since last check
**Content availability by source:**
| Source | Full Content | Method |
|--------|-------------|--------|
| Simon Willison | ✅ In RSS/fivefilters | `ainews articles` |
| Sebastian Raschka | ✅ In RSS/fivefilters | `ainews articles` |
| OpenAI Blog | ❌ JS-rendered | Use `web_fetch` tool |
**Hybrid workflow for AI news briefing:**
1. `ainews items` → shows NEW articles from all sources
2. For Simon/Raschka: `ainews articles <urls>` to get full content
3. For OpenAI: Use `web_fetch` tool directly (fivefilters can't extract JS sites)
4. Write briefing with all content
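The routing decision behind steps 2 and 3 can be sketched as a standalone shell function. The URL pattern mirrors the script's check; the fivefilters branch is stubbed out here, since the real `fetch_single_article` lives inside `~/bin/ainews`:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the hybrid routing step, not the actual ainews internals.
route_article() {
  local url="$1"
  if [[ "$url" == *"openai.com"* ]]; then
    # JS-rendered site: fivefilters can't extract it, hand off to web_fetch
    echo "NEEDS_WEB_FETCH: $url"
  else
    # stand-in for fetch_single_article via the fivefilters proxy
    echo "FETCH_VIA_FIVEFILTERS: $url"
  fi
}

route_article "https://openai.com/index/codex-app"
route_article "https://simonwillison.net/2026/Jan/30/moltbook/"
```

The glob match keeps the check cheap; a stricter variant could anchor on the host part of the URL instead.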
---


@@ -20,8 +20,12 @@ Commands:
items --all [max] All items including seen
article <url> Fetch article content via fivefilters proxy
articles <url1,url2,...> Fetch multiple articles + mark as seen
(OpenAI URLs print NEEDS_WEB_FETCH - use web_fetch tool)
seen Show seen count and recent entries
reset Clear seen history
Note: OpenAI's site is JS-rendered; fivefilters can't extract it.
For OpenAI articles, use the web_fetch tool directly.
EOF
}
@@ -122,7 +126,13 @@ case "${1:-}" in
for url in "${URLS[@]}"; do
title=$(echo "$url" | grep -oP '[^/]+$' | sed 's/-/ /g; s/\..*//; s/.*/\u&/')
echo "=== ${title} ==="
fetch_single_article "$url"
# OpenAI is JS-heavy; fivefilters can't extract it, so hand off to web_fetch
if [[ "$url" == *"openai.com"* ]]; then
echo "NEEDS_WEB_FETCH: $url"
echo "(OpenAI's site is JS-rendered - use web_fetch tool instead)"
else
fetch_single_article "$url"
fi
echo ""
echo ""
mark_seen "$url"
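The title derivation one-liner in the loop above (URL slug to spaced, extension-stripped, capitalized text) can be exercised on its own; GNU `grep -P` and GNU `sed`'s `\u` case conversion are assumed:

```shell
#!/usr/bin/env bash
# Same pipeline as in the script: last path segment, hyphens to spaces,
# strip any file extension, uppercase the first character.
slug_to_title() {
  echo "$1" | grep -oP '[^/]+$' | sed 's/-/ /g; s/\..*//; s/.*/\u&/'
}

slug_to_title "https://openai.com/index/introducing-the-codex-app"
# → Introducing the codex app
```

Note it fails on URLs with a trailing slash (the `[^/]+$` match comes back empty), which the feed URLs here apparently avoid.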


@@ -1,21 +1,38 @@
# 2026-02-03 (Monday)
# 2026-02-03 — News Workflow Optimization
## Tasks Completed
- ✅ Paraclub 2FA extension deployed
- ✅ CI templates working for build + deployment
- ✅ CI templates integrated into GBV
## Der Standard RSS Optimization (Completed)
## In Progress
- CI templates: E2E tests
- CI templates: Build with frontend
- CI templates: Integrate into other TYPO3 sites
Built a new helper script `~/bin/derstandard` that:
- Uses fivefilters proxy to bypass web_fetch private IP restrictions
- Pre-processes RSS output for minimal token usage
- Tracks seen articles in `memory/derstandard-seen.txt` (auto-prunes to 200)
- Batch fetches multiple articles in one call (`derstandard articles url1,url2,...`)
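A minimal sketch of that seen-tracking with auto-prune, assuming a plain append-then-`tail` approach (the actual script's internals aren't shown in this diff); a temp file stands in for `memory/derstandard-seen.txt`:

```shell
#!/usr/bin/env bash
# Illustrative only: hypothetical mark_seen/is_seen helpers, not the real script.
SEEN_FILE=$(mktemp)

mark_seen() {
  echo "$1" >> "$SEEN_FILE"
  # auto-prune: keep only the newest 200 entries
  tail -n 200 "$SEEN_FILE" > "$SEEN_FILE.tmp" && mv "$SEEN_FILE.tmp" "$SEEN_FILE"
}

is_seen() {
  grep -qxF "$1" "$SEEN_FILE"   # -x: whole-line match, -F: literal URL
}

for i in $(seq 1 250); do mark_seen "https://example.com/article-$i"; done
echo "entries: $(( $(wc -l < "$SEEN_FILE") ))"   # → entries: 200
```

Pruning on every write keeps the file bounded without a separate cleanup pass; at 200 short lines the repeated `tail` is negligible.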
## Reminders Set
- Friday Feb 6: Buy Essiggurkerl for Super Bowl (dad brings Schinkenfleckerl)
Key commands:
- `items` — NEW articles only, marks all displayed as seen
- `articles` — fetch full content for multiple URLs
- `seen` / `reset` — manage seen state
## Notes
- Discussed Goose (Block's open-source Claude Code alternative) - has permission modes but not as polished
- Helped with Forgejo CI templates - reusable workflows don't show steps in UI (known limitation), composite actions work better
- WhatsApp connection stable after a few brief 499 disconnects (auto-recovered)
- Fixed Der Standard RSS cron jobs - added "Feed down" error reporting for fivefilters.cloonar.com
- User fixed: fivefilters.cloonar.com back online, git.cloonar.com DNS resolved
## AI News Feed Analysis
For the AI news cron job, analyzed which feeds have full content:
- **Simon Willison** (Atom): Full content in `<summary>` ✅ no fetch needed
- **Sebastian Raschka** (Substack): Full content ✅ no fetch needed
- **OpenAI Blog** (RSS): Only snippets ❌ requires article fetching
- **VentureBeat**: Redirect issues, needs investigation
Created `~/bin/ainews` helper script mirroring derstandard workflow.
## Cron Job Updates
Updated all 4 Der Standard cron jobs (10:00, 14:00, 18:00, 22:00 Vienna) to use:
1. `derstandard items` for new articles
2. Pick relevant ones (intl politics, tech, science, economics)
3. `derstandard articles` to fetch full content
4. Write German briefing (~2000-2500 words)
All jobs use Haiku 4.5 model in isolated sessions.
## Git Status
5 commits made to master (local only, no remote configured).


@@ -0,0 +1,104 @@
# AI News Test Dataset - HYBRID (full content for OpenAI)
Write a brief AI news summary covering these 5 articles. 2-3 sentences per topic. German, casual tone.
---
## 1. OpenAI: Introducing the Codex App
**Source:** openai.com | **Date:** Feb 2, 2026
Today, we're introducing the Codex app for macOS—a powerful new interface designed to effortlessly manage multiple agents at once, run work in parallel, and collaborate with agents over long-running tasks.
We're also excited to show more people what's now possible with Codex. For a limited time we're including Codex with ChatGPT Free and Go, and we're doubling the rate limits on Plus, Pro, Business, Enterprise, and Edu plans. Those higher limits apply everywhere you use Codex—in the app, from the CLI, in your IDE, and in the cloud.
The Codex app changes how software gets built and who can build it—from pairing with a single coding agent on targeted edits to supervising coordinated teams of agents across the full lifecycle of designing, building, shipping, and maintaining software.
**The Codex app: A command center for agents**
Since we launched Codex in April 2025, the way developers work with agents has fundamentally changed. Models are now capable of handling complex, long-running tasks end to end and developers are now orchestrating multiple agents across projects: delegating work, running tasks in parallel, and trusting agents to take on substantial projects that can span hours, days, or weeks. The core challenge has shifted from what agents can do to how people can direct, supervise, and collaborate with them at scale—existing IDEs and terminal-based tools are not built to support this way of working.
The Codex app provides a focused space for multi-tasking with agents. Agents run in separate threads organized by projects, so you can seamlessly switch between tasks without losing context. The app lets you review the agent's changes in the thread, comment on the diff, and even open it in your editor to make manual changes.
It also includes built-in support for worktrees, so multiple agents can work on the same repo without conflicts. Each agent works on an isolated copy of your code, allowing you to explore different paths without needing to track how they impact your codebase.
Codex is evolving from an agent that writes code into one that uses code to get work done on your computer. With skills, you can easily extend Codex beyond code generation to tasks that require gathering and synthesizing information, problem-solving, writing, and more.
Skills bundle instructions, resources, and scripts so Codex can reliably connect to tools, run workflows, and complete tasks according to your team's preferences. The Codex app includes a dedicated interface to create and manage skills.
We asked Codex to make a racing game, complete with different racers, eight maps, and even items players could use with the space bar. Using an image generation skill and a web game development skill, Codex built the game by working independently using more than 7 million tokens with just one initial user prompt. It took on the roles of designer, game developer, and QA tester to validate its work by actually playing the game.
---
## 2. OpenAI: Inside OpenAI's In-House Data Agent
**Source:** openai.com | **Date:** Jan 29, 2026
Data powers how systems learn, products evolve, and how companies make choices. But getting answers quickly, correctly, and with the right context is often harder than it should be. To make this easier as OpenAI scales, we built our own bespoke in-house AI data agent that explores and reasons over our own platform.
Our agent is a custom internal-only tool (not an external offering), built specifically around OpenAI's data, permissions, and workflows. The OpenAI tools we used to build it (Codex, GPT-5, the Evals API, and the Embeddings API) are the same tools we make available to developers everywhere.
Our data agent lets employees go from question to insight in minutes, not days. This lowers the bar to pulling data and nuanced analysis across all functions. Today, teams across Engineering, Data Science, Go-To-Market, Finance, and Research at OpenAI lean on the agent to answer high-impact data questions. It can help answer how to evaluate launches and understand business health, all through natural language.
**Why they needed a custom tool:**
OpenAI's data platform serves more than 3.5k internal users working across Engineering, Product, and Research, spanning over 600 petabytes of data across 70k datasets. At that size, simply finding the right table can be one of the most time-consuming parts of doing analysis.
As one internal user put it: "We have a lot of tables that are fairly similar, and I spend tons of time trying to figure out how they're different and which to use."
**How it works:**
The agent handles analysis end-to-end, from understanding the question to exploring the data, running queries, and synthesizing findings. Rather than following a fixed script, the agent evaluates its own progress. If an intermediate result looks wrong, the agent investigates what went wrong, adjusts its approach, and tries again. This closed-loop, self-learning process shifts iteration from the user into the agent itself.
The agent covers the full analytics workflow: discovering data, running SQL, and publishing notebooks and reports. It understands internal company knowledge, can web search for external information, and improves over time through learned usage and memory.
High-quality answers depend on rich, accurate context. The agent uses:
- Metadata grounding: schema metadata to inform SQL writing, table lineage for relationships
- Query inference: Ingesting historical queries helps understand how to write queries
---
## 3. Simon Willison: Moltbook is the most interesting place on the internet right now
**Source:** simonwillison.net | **Date:** Jan 30, 2026
Moltbook is Facebook for your Molt (one of the previous names for OpenClaw assistants). It's a social network where digital assistants can talk to each other.
The first neat thing about Moltbook is the way you install it: you show the skill to your agent by sending them a message with a link to https://www.moltbook.com/skill.md. Embedded in that Markdown file are installation instructions.
The hottest project in AI right now is OpenClaw. It's an open source implementation of the digital personal assistant pattern, built by Peter Steinberger. It's two months old, has over 114,000 stars on GitHub and is seeing incredible adoption.
OpenClaw is built around skills, and the community around it are sharing thousands of these on clawhub.ai. A skill is a zip file containing markdown instructions and optional extra scripts which means they act as a powerful plugin system.
Given the inherent risk of prompt injection against this class of software it's Simon's current pick for most likely to result in a Challenger disaster.
---
## 4. Simon Willison: The Five Levels - from Spicy Autocomplete to the Dark Factory
**Source:** simonwillison.net | **Date:** Jan 28, 2026
Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the levels of driving automation:
0. **Spicy autocomplete** - original GitHub Copilot or copying snippets from ChatGPT
1. **Coding intern** - writing unimportant snippets and boilerplate with full human review
2. **Junior developer** - pair programming with the model but still reviewing every line
3. **Developer** - Most code is generated by AI, you take on the role of full-time code reviewer
4. **Engineering team** - You're more of an engineering manager. You collaborate on specs and plans, the agents do the work
5. **Dark software factory** - like a factory run by robots where the lights are out because robots don't need to see
About level 5: "At level 5, it's not really a car any more. Your software process isn't really a software process any more. It's a black box that turns specs into software."
Simon talked to one team doing the "dark factory" pattern. Key characteristics:
- Nobody reviews AI-produced code, ever. They don't even look at it.
- The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling.
- The role of the humans is to design that system - to find new patterns that can help the agents work more effectively.
---
## 5. Sebastian Raschka: Categories of Inference-Time Scaling for Improved LLM Reasoning
**Source:** magazine.sebastianraschka.com | **Date:** Recent
Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea: if we are willing to spend a bit more compute at inference time, we can get the model to produce better answers.
Every major LLM provider relies on some flavor of inference-time scaling today. Back in March, Sebastian wrote an overview of inference scaling and summarized early techniques. This article groups different approaches into clearer categories and highlights the newest work.
As part of drafting a full book chapter on inference scaling for "Build a Reasoning Model (From Scratch)", Sebastian experimented with many of the fundamental flavors of these methods. With hyperparameter tuning, this quickly turned into thousands of runs. The chapter takes the base model from about 15 percent to around 52 percent accuracy.
Categories covered: Chain-of-Thought Prompting, Self-Consistency, Best-of-N, and more.


@@ -0,0 +1,66 @@
# AI News Test Dataset - SNIPPETS ONLY (for OpenAI)
Write a brief AI news summary covering these 5 articles. 2-3 sentences per topic. German, casual tone.
---
## 1. OpenAI: Introducing the Codex App
**Source:** openai.com | **Date:** Feb 2, 2026
> Introducing the Codex app for macOS—a command center for AI coding and software development with multiple agents, parallel workflows, and long-running tasks.
---
## 2. OpenAI: Inside OpenAI's In-House Data Agent
**Source:** openai.com | **Date:** Jan 29, 2026
> How OpenAI built an in-house AI data agent that uses GPT-5, Codex, and memory to reason over massive datasets and deliver reliable insights in minutes.
---
## 3. Simon Willison: Moltbook is the most interesting place on the internet right now
**Source:** simonwillison.net | **Date:** Jan 30, 2026
Moltbook is Facebook for your Molt (one of the previous names for OpenClaw assistants). It's a social network where digital assistants can talk to each other.
The first neat thing about Moltbook is the way you install it: you show the skill to your agent by sending them a message with a link to https://www.moltbook.com/skill.md. Embedded in that Markdown file are installation instructions.
The hottest project in AI right now is OpenClaw. It's an open source implementation of the digital personal assistant pattern, built by Peter Steinberger. It's two months old, has over 114,000 stars on GitHub and is seeing incredible adoption.
OpenClaw is built around skills, and the community around it are sharing thousands of these on clawhub.ai. A skill is a zip file containing markdown instructions and optional extra scripts which means they act as a powerful plugin system.
Given the inherent risk of prompt injection against this class of software it's Simon's current pick for most likely to result in a Challenger disaster.
---
## 4. Simon Willison: The Five Levels - from Spicy Autocomplete to the Dark Factory
**Source:** simonwillison.net | **Date:** Jan 28, 2026
Dan Shapiro proposes a five level model of AI-assisted programming, inspired by the levels of driving automation:
0. **Spicy autocomplete** - original GitHub Copilot or copying snippets from ChatGPT
1. **Coding intern** - writing unimportant snippets and boilerplate with full human review
2. **Junior developer** - pair programming with the model but still reviewing every line
3. **Developer** - Most code is generated by AI, you take on the role of full-time code reviewer
4. **Engineering team** - You're more of an engineering manager. You collaborate on specs and plans, the agents do the work
5. **Dark software factory** - like a factory run by robots where the lights are out because robots don't need to see
About level 5: "At level 5, it's not really a car any more. Your software process isn't really a software process any more. It's a black box that turns specs into software."
Simon talked to one team doing the "dark factory" pattern. Key characteristics:
- Nobody reviews AI-produced code, ever. They don't even look at it.
- The goal of the system is to prove that the system works. A huge amount of the coding agent work goes into testing and tooling.
- The role of the humans is to design that system - to find new patterns that can help the agents work more effectively.
---
## 5. Sebastian Raschka: Categories of Inference-Time Scaling for Improved LLM Reasoning
**Source:** magazine.sebastianraschka.com | **Date:** Recent
Inference scaling has become one of the most effective ways to improve answer quality and accuracy in deployed LLMs. The idea: if we are willing to spend a bit more compute at inference time, we can get the model to produce better answers.
Every major LLM provider relies on some flavor of inference-time scaling today. Back in March, Sebastian wrote an overview of inference scaling and summarized early techniques. This article groups different approaches into clearer categories and highlights the newest work.
As part of drafting a full book chapter on inference scaling for "Build a Reasoning Model (From Scratch)", Sebastian experimented with many of the fundamental flavors of these methods. With hyperparameter tuning, this quickly turned into thousands of runs. The chapter takes the base model from about 15 percent to around 52 percent accuracy.
Categories covered: Chain-of-Thought Prompting, Self-Consistency, Best-of-N, and more.