Content aggregator
Mod Studio becomes AI lab for PR group More
CMS company is first to opt out of American clouds: this is how the reasoning went
Modulai takes the real AI agency work, winning Novo Nordisk, Telge, Lindex and EQT
US Escalation in the Caribbean and Latin America – Live Updates
The post US Escalation in the Caribbean and Latin America – Live Updates appeared first on CEPR.
Father-of-eight killed in San Diego mosque shooting hailed as hero
The crucial split-second call that could deny Spurs safety
Supermarkets urged to limit food prices by government
Title despair & Guardiola going - 24 hours of pain for Man City
Twenty-two years in the making - how Arsenal celebrated title win
Southampton expelled from Championship play-offs for spying on Middlesbrough
Southampton expelled from play-offs for spying
Channel 4 contacted by police after Married at First Sight UK rape claims
Arsenal win Premier League as Man City held at Bournemouth
'Taiwan Travelogue' wins the 2026 International Booker Prize
The novel is the first work translated from Mandarin Chinese to win the award, which celebrates its 10th anniversary this year.
(Image credit: Adrian Dennis)
Pixelite: Bots, scrapers, and proxies: defending Drupal sites in an automated internet
Over half of all web traffic in 2024 was automated. That is the headline number from the Imperva 2025 Bad Bot Report, and it is the first time bots have outnumbered humans in more than a decade. Drupal sites sit squarely in that traffic mix, and the old defensive playbook — block an IP, ban a user agent, drop a robots.txt entry, lean on Fail2ban — does not hold up anymore.
This is the companion post to my DrupalSouth Wellington 2026 talk, Bots, scrapers, and proxies: defending Drupal sites in an automated internet. The talk walked through the defences I actually use at amazee.io and recommend on client sites. The post covers the same ground, with a bit more room to show config and link out to the projects.
What actually changed
The technical context underneath bot defence has shifted in three ways that matter:
- Residential proxy networks. Scrapers no longer come from a handful of cloud subnets you can block. They route through real consumer IP addresses, often unwittingly donated by free-VPN users or piggy-backed off shady SDKs in mobile apps.
- Headless browsers everywhere. Playwright and Puppeteer have made it trivial to render JavaScript-heavy pages at scale. A page that needed a real human five years ago can be scraped today by anyone with a laptop.
- AI-driven scraping. Volume is up sharply because every new LLM needs training data, and there is now a steady drip of new crawlers showing up. Meta's meta-externalagent crawler is one recent example. There will be more.
Mimicry is now the baseline, not the edge case. A modern scraper will rotate IPs, randomise user agents, replay realistic TLS fingerprints, and pace itself slowly enough to look like a real user. You cannot rely on signal that lives in one HTTP header.
The scale of it
If this still sounds like a niche problem, the numbers say otherwise.
- 51% - share of web traffic that was automated in 2024, per the Imperva 2025 Bad Bot Report.
- +96% - year-on-year growth in some popular bot services across Pantheon's hosting fleet in their July 2025 data.
- 1B+ - unique monthly visitors Pantheon sees across its platform, which is the size of dataset those numbers are coming from.
On the amazee.io platform globally, 13% of incoming requests can be flagged as non-human based on the user agent alone. That is the lazy bots. The actual share of automated traffic is higher once you account for the ones that try to blend in. In absolute terms it adds up to hundreds of millions of requests every month.
The goal is not to block all bots
Before going through the defences, one thing I am careful to say up front, both on stage and here: the goal is not to block all bots. That is unwinnable, and the closer you get to it the more real users you break.
Search crawlers, RSS readers, uptime monitors, link-preview generators in Slack and iMessage, accessibility tooling - all bots, all wanted. The goal is to reduce abuse where it hurts most, on the endpoints that cost you real money or real performance, while leaving everything else alone.
Drupal-native defences
The defences closest to your application are the smartest. They can see the path, the user, the form, the cache state. They are also the most expensive per blocked request, because every block at this layer has already cost you a full PHP bootstrap.
Perimeter
The Perimeter module drops requests matching known-bad patterns: /wp-admin, /.env, xmlrpc.php, all the WordPress scanner noise that hits every Drupal site daily. It is the cheapest win on the list. It will not stop a serious scraper, but it will keep your logs clean and your error rate honest.
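The general idea of dropping scanner noise by path can be sketched as a handful of regular expressions. The patterns are illustrative examples built from the paths named above, not the Perimeter module's actual list or code.

```python
import re

# Illustrative patterns for probes that never make sense on a Drupal site.
SCANNER_PATTERNS = [
    re.compile(r"^/wp-(admin|login\.php|content|includes)", re.IGNORECASE),
    re.compile(r"/xmlrpc\.php$", re.IGNORECASE),
    re.compile(r"/\.env($|\.)", re.IGNORECASE),   # /.env, /.env.backup, ...
    re.compile(r"/\.git/", re.IGNORECASE),
]


def is_scanner_probe(path: str) -> bool:
    """Return True if the request path matches a known scanner pattern."""
    return any(pattern.search(path) for pattern in SCANNER_PATTERNS)
```

The patterns are anchored tightly on purpose: a loose match such as a bare `env` would also catch legitimate Drupal paths like `/environment-policy`.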
CrowdSec and AbuseIPDB
CrowdSec is a local agent plus a community blocklist. Every site running CrowdSec contributes detected attacks back to a shared signal, and pulls down the latest list of bad actors. It is the closest thing the open-source world has to a distributed reputation system.
AbuseIPDB is a reputation lookup service. You query an IP, you get a confidence score. It is most useful on the forms and login flows where you can afford the latency of an external API call. Both are available as Drupal modules.
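Because the external call is the expensive part, a reputation check on a form is worth caching per IP. Below is a minimal sketch of the decision side, assuming the AbuseIPDB v2 `check` response shape (`data.abuseConfidenceScore`). The threshold of 75, the one-hour cache, and the `ReputationGate` class are hypothetical; the actual HTTP call is injected as `lookup` and not shown.

```python
import time
from typing import Callable

BLOCK_THRESHOLD = 75        # hypothetical: block at or above this confidence
CACHE_TTL_SECONDS = 3600    # hypothetical: reuse a score for an hour


def confidence_from_response(payload: dict) -> int:
    """Extract abuseConfidenceScore from a v2 /check JSON response."""
    return int(payload.get("data", {}).get("abuseConfidenceScore", 0))


class ReputationGate:
    """Caches scores so repeated form posts from one IP cost one lookup."""

    def __init__(self, lookup: Callable[[str], dict],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup   # performs the real API call (injected)
        self._clock = clock
        self._cache: dict[str, tuple[int, float]] = {}

    def should_block(self, ip: str) -> bool:
        now = self._clock()
        cached = self._cache.get(ip)
        if cached is None or now - cached[1] > CACHE_TTL_SECONDS:
            score = confidence_from_response(self._lookup(ip))
            self._cache[ip] = (score, now)
        else:
            score = cached[0]
        return score >= BLOCK_THRESHOLD
```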
Facet Bot Blocker
If you run Search API with facets, this is the single cheapest big win available to you. Faceted search URLs are catnip for scrapers: every combination of filters is a new URL, every URL is uncached, and every uncached request hits the database. A bot that crawls a faceted listing can take a site down without trying.
The Facet Bot Blocker module acts as a rate limit on requests that include at least one facet in the URL. Configure it to use Redis or memcache for the counter so you are not making the problem worse by hitting the database to record the request. On one of our hosting customers, this one module cut Search API load by more than half.
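A rate limit that only applies to requests carrying at least one facet can be sketched like this. It assumes the Facets module's default `f[0]=field:value` query-string format, and it uses an in-memory fixed-window counter standing in for the Redis or Memcache counter recommended above. The limits and the `FacetLimiter` class are hypothetical; this is not the Facet Bot Blocker module's code.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

LIMIT = 20            # hypothetical: faceted requests allowed per IP per window
WINDOW_SECONDS = 60   # hypothetical window length


def has_facet(url: str) -> bool:
    """True if the query string carries a facet parameter such as f[0]=..."""
    query = urlsplit(url).query
    return any(key == "f" or key.startswith("f[") for key, _ in parse_qsl(query))


class FacetLimiter:
    def __init__(self) -> None:
        # Stand-in for Redis/Memcache; a real store would expire old windows
        # with a TTL, whereas this dict simply grows.
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, url: str, now: float) -> bool:
        if not has_facet(url):
            return True  # unfaceted requests are left alone
        window = int(now // WINDOW_SECONDS)
        self._counts[(ip, window)] += 1
        return self._counts[(ip, window)] <= LIMIT
```

Keeping the counter out of the database matters more than the exact algorithm: a limiter that writes a row per request reproduces the load it is meant to prevent.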
Form-side defences
Logins, registrations, password resets and contact forms all need their own treatment, separate from page-level defence:
- Honeypot - invisible field plus a time-based check. Cheap, fast, surprisingly effective against the dumb half of form spam.
- Antibot - requires JavaScript to submit, blocks the bots that do not run JS.
- CAPTCHA, reCAPTCHA, or Cloudflare Turnstile - full challenge. Use the lightest option that works, and ideally only after Honeypot and Antibot have already rejected the easy cases.
- Hidden CAPTCHA - bridges the gap when you want a CAPTCHA-style check without the accessibility cost of a visible challenge.
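The Honeypot approach (an invisible field plus a time-based check) can be sketched as a validation function. The field name `url_hp`, the five-second minimum, and the secret are hypothetical, not the module's configuration. The render timestamp is signed with HMAC so a bot cannot simply rewrite it to look patient.

```python
import hashlib
import hmac

SECRET = b"change-me"        # hypothetical per-site secret
HONEYPOT_FIELD = "url_hp"    # hypothetical hidden field name
MIN_SECONDS = 5              # hypothetical minimum time to fill in the form


def issue_token(rendered_at: int) -> str:
    """Signed render timestamp, embedded in the form as a hidden value."""
    sig = hmac.new(SECRET, str(rendered_at).encode(), hashlib.sha256).hexdigest()
    return f"{rendered_at}.{sig}"


def is_spam(form: dict[str, str], token: str, submitted_at: int) -> bool:
    # 1. Humans never see the honeypot field, so any value means a bot filled it.
    if form.get(HONEYPOT_FIELD):
        return True
    # 2. Missing or tampered timestamps are rejected.
    try:
        ts_text, sig = token.split(".", 1)
        rendered_at = int(ts_text)
    except ValueError:
        return True
    expected = hmac.new(SECRET, ts_text.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return True
    # 3. Submissions faster than a human could plausibly type are rejected.
    return submitted_at - rendered_at < MIN_SECONDS
```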
Every block at the Drupal layer has already cost you a PHP bootstrap. That is fine when the absolute volume is small. It is not fine when you are eating hundreds of millions of bot requests and bootstrapping PHP for each one. This is why you cannot stop at the application layer.
Web server and infrastructure
One layer out, the web server can drop requests before PHP ever runs. The trade-off flips: you save the bootstrap cost, but you lose access to application context.
Rate limiting and geo blocking
nginx ships with limit_req_zone, and Apache has mod_ratelimit (note that mod_ratelimit throttles response bandwidth rather than counting requests). Both are blunt but effective on volume. A starting point for nginx looks roughly like this:
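    # http {} context: a 10 MB shared zone keyed on the client address, 10 requests/minute
    limit_req_zone $binary_remote_addr zone=search:10m rate=10r/m;

    # server {} context: apply the zone to /search only, allowing a burst of 5
    location /search {
        limit_req zone=search burst=5 nodelay;
        proxy_pass http://drupal;  # assumes an upstream named "drupal"
    }

Ten search requests per minute, per IP, with a burst of five. Tune to taste. The $binary_remote_addr key is cheap on memory; a 10MB zone holds around 160,000 IPs.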
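To see what `rate=10r/m burst=5 nodelay` means in practice, here is a simplified model of nginx's leaky-bucket accounting. It is an approximation for illustration (nginx's own integer arithmetic rounds slightly differently), and the `LimitReq` class is hypothetical. The last lines spell out the zone-size arithmetic, using the nginx documentation's figure of roughly 16,000 states per megabyte.

```python
class LimitReq:
    """Simplified model of nginx limit_req with burst=N and nodelay.

    Excess is tracked in milli-requests, as nginx does, so the
    arithmetic stays in integers.
    """

    def __init__(self, rate_per_minute: int, burst: int) -> None:
        self.rate_per_minute = rate_per_minute
        self.burst_milli = burst * 1000
        self._state: dict[str, tuple[int, int]] = {}  # key -> (excess, last_ms)

    def allow(self, key: str, now_ms: int) -> bool:
        previous = self._state.get(key)
        if previous is None:
            excess = 0  # first request from this key
        else:
            last_excess, last_ms = previous
            # Requests "drain" out of the bucket at the configured rate.
            leaked = self.rate_per_minute * 1000 * max(now_ms - last_ms, 0) // 60_000
            excess = max(last_excess - leaked + 1000, 0)
            if excess > self.burst_milli:
                return False  # rejected (503 by default); state is not updated
        self._state[key] = (excess, now_ms)
        return True  # nodelay: accepted requests are served immediately


# Zone sizing: about 16,000 states per megabyte.
ZONE_MB = 10
STATES_PER_MB = 16_000
print(ZONE_MB * STATES_PER_MB)  # 160000
```

Ten requests arriving at once let six through (the first plus a burst of five). Six seconds later, one slot has drained at 10r/m, so exactly one more request is accepted.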
Geo blocking is the other infrastructure-level lever. It is pragmatic and occasionally controversial. If your audience is the New Zealand public sector, blocking inbound from regions you do not serve is a defensible call. If your audience is global, it is not. Know your traffic before reaching for it.
ModSecurity and the OWASP CRS
ModSecurity with the OWASP Core Rule Set is a proper WAF you can self-host. Once tuned, it is real protection. The tuning is the catch: out of the box it will flag Drupal admin actions, file uploads, and anything that looks like SQL in a request body or a query string. Expect to spend real time pruning rules and adding exceptions for legitimate site behaviour before you stop generating false positives.
Cache discipline
A request that hits the cache costs you nothing. Whatever else you do, get your cache headers right. Vary on the bits that need to vary, cache aggressively on the bits that do not, and lean on the page cache or the reverse proxy in front of Drupal. The cheapest bot is the one that asks for a page you have already served.
(Shameless plug) You can also use my site, Caching Score, to review your current caching setup and see whether there is anything you could be doing better.
What this layer is bad at
Rate limits, ModSecurity rules and geo blocks are great at volume and bad at quality. They cannot tell a scraper trickling one request per minute apart from a real user. For that you need either the edge or the application.
Edge and paid bot management
The edge is where the big vendors live, and it is where you push the cheapest blocks. A scraper rejected by Cloudflare at the network edge never gets to your origin at all.
Cloudflare
The free tier already includes Bot Fight Mode, basic challenges, and Turnstile. For most small-to-medium Drupal sites, this is a good baseline at zero extra cost. The paid Bot Management product adds custom rule logic, JA3 and JA4 TLS fingerprinting, and machine-learning-based bot scoring you can wire into firewall rules. The jump from free to paid is significant in price, and so is the jump in capability.
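To make "JA3 TLS fingerprinting" concrete: JA3 joins five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, point formats) into comma-separated groups of dash-joined decimal values, skips GREASE values, and takes the MD5 of the result. The sketch assumes the ClientHello has already been parsed into integers; the sample values in the usage below are made up, not a real browser's fingerprint.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3_string(version: int, ciphers: list[int], extensions: list[int],
               curves: list[int], point_formats: list[int]) -> str:
    def join(values: list[int]) -> str:
        return "-".join(str(v) for v in values if not is_grease(v))

    return ",".join([str(version), join(ciphers), join(extensions),
                     join(curves), join(point_formats)])


def ja3_hash(version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode(), usedforsecurity=False).hexdigest()
```

The point for bot defence is that the fingerprint comes from the TLS library, not from headers, so a scraper that fakes a Chrome user agent while using a different TLS stack can still stand out.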
Fastly, Akamai, and the rest
Fastly offers the Next-Gen WAF (originally Signal Sciences) with a Bot Management add-on. Akamai sits at the enterprise tier with the most sophisticated fingerprinting available, and a price tag to match. Beyond those, there are AWS WAF with Bot Control, DataDome, HUMAN, and Imperva: all credible, all paid, all priced for sites where bot abuse is costing real money.
The trade-offs nobody puts on the sales deck
Bot Management at the edge solves real problems. It also comes with real costs that the vendor demos skip past:
- Cost. Bot Management is almost always an add-on to the core WAF subscription, and the pricing escalates fast with traffic.
- Vendor lock-in. Your rules, your dashboards, your observability all live in the vendor's UI. Migrating off is painful.
- Accessibility and SEO. Aggressive challenges break real users, and search bots that fail a challenge will hurt your rankings. Test both before turning anything up.
- Rules live outside your application codebase. They drift, they are not versioned alongside the code that depends on them, and a rule change can break a feature without any commit to point to.
- False positives are invisible to you. By default, blocked requests do not reach your logs. You will not know which real users were turned away unless you specifically ask for that signal.
Anubis is the newest piece in the picture, and the one that has me genuinely interested.
What Anubis is
Anubis is an open-source reverse proxy (MIT licensed) that sits in front of your site and issues a proof-of-work challenge to clients before letting them through. It was built specifically for the AI scraper era: for the case where the scraper is mimicking a real browser well enough that classifying it on signal alone has stopped working.
Why proof-of-work, not CAPTCHA
The interesting move with Anubis is who pays the cost. A real user pays a few hundred milliseconds of CPU once when they first arrive, and never sees it again for the lifetime of the cookie. A scraper hitting you a million times pays the cost a million times.
That asymmetry is the whole point. CAPTCHAs put the cost on humans (the people who lose patience trying to identify traffic lights). Anubis puts it on whoever is doing the hammering. That is closer to the right shape of the trade.
Where to put it
You do not want Anubis in front of your whole site. You want it in front of the endpoints that are expensive and uncacheable. From the talk, my shortlist:
- Search endpoints
- Facet and filter URLs
- Pagination tails - ?page=2348 is not a real user
- Login, register, password reset
- Spicy forms (contact, anything that triggers an email)
- Authenticated user flows
- Anything expensive and uncacheable
Static pages stay fast. The cache stays warm. The PoW cost only applies on the routes where it earns its keep.
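The shortlist can be expressed as a routing predicate that decides which requests go through the proof-of-work proxy. The account paths are Drupal core's defaults (`/user/login`, `/user/register`, `/user/password`); `/search`, `/contact`, the `f` facet parameter, and the page cut-off of 50 are hypothetical examples to adapt per site.

```python
from urllib.parse import parse_qs, urlsplit

# Drupal core account routes plus hypothetical site-specific paths.
PROTECTED_PATHS = ("/user/login", "/user/register", "/user/password",
                   "/search", "/contact")
MAX_PLAIN_PAGE = 50  # hypothetical: deeper pagination gets challenged


def _under(path: str, prefix: str) -> bool:
    return path == prefix or path.startswith(prefix + "/")


def needs_challenge(url: str, authenticated: bool = False) -> bool:
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if authenticated:
        return True  # authenticated flows bypass the page cache
    if any(_under(parts.path, p) for p in PROTECTED_PATHS):
        return True  # search, account and "spicy" forms
    if any(key == "f" or key.startswith("f[") for key in query):
        return True  # facet and filter URLs
    page = query.get("page", ["0"])[0]
    return page.isdigit() and int(page) > MAX_PLAIN_PAGE  # pagination tails
```

Everything that falls through, such as `/about` or the first few listing pages, goes straight to the cache without a challenge.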
But what about Googlebot?
This is the first question every site owner asks, and the answer is good. Anubis ships with allowlists for known good crawlers, matching IP ranges against the published lists from Google, Bing, and the rest. The allowlist is maintained upstream, which means you need to keep Anubis deployed on a reasonable cadence to pull in the latest changes. New legitimate crawlers do show up.
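Why the IP match matters: anyone can send a Googlebot user agent, so the claim only counts when the source address is in the published ranges. A minimal sketch with the standard `ipaddress` module follows. The JSON shape mirrors the format Google uses for its Googlebot range list (a `prefixes` array of `ipv4Prefix`/`ipv6Prefix` entries), but the prefixes here are documentation ranges standing in for the real list, which a deployment would fetch and refresh. This is not Anubis's code.

```python
import ipaddress
import json

# Stand-in for the published list; real prefixes must be fetched upstream.
SAMPLE_RANGES = json.loads(
    '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
)


def load_networks(doc: dict) -> list:
    """Turn the range document into ipaddress network objects."""
    nets = []
    for entry in doc.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            nets.append(ipaddress.ip_network(prefix))
    return nets


def is_verified_crawler(user_agent: str, ip: str, networks: list) -> bool:
    """Trust a 'Googlebot' User-Agent only when the source IP is listed."""
    if "googlebot" not in user_agent.lower():
        return False
    addr = ipaddress.ip_address(ip)
    # Mixed IPv4/IPv6 comparisons simply return False here.
    return any(addr in net for net in networks)
```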
Demo site
You can see Anubis in action on a demo Drupal 11 site I put together. The login form has Anubis in front of it; the homepage does not.
Putting the layers together
None of these defences is a silver bullet on its own. Each layer is cheap at one thing and bad at another, and the trick is matching the layer to the threat.
(Diagram: requests flow from clients through Edge/CDN, Anubis, web server, and Drupal, with cost-to-block increasing as you move closer to the application.)
Block the cheap traffic at the edge. Block the lazy bots with rate limits and ModSecurity at the web server. Put Anubis in front of the endpoints that are expensive and uncacheable. Let Drupal-native modules handle the application-aware decisions where you actually need to see the user, the form, or the facet state.
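The layering can be modelled as an ordered pipeline where the first layer to reject a request wins and every layer passed adds to the cost. The cost units, request fields, and predicates are all hypothetical, chosen only to show that a block near the edge is cheaper than the same block inside Drupal; they are not measurements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Layer:
    name: str
    relative_cost: int              # hypothetical cost units to evaluate this layer
    blocks: Callable[[dict], bool]  # returns True to reject the request


LAYERS = [
    Layer("edge", 1, lambda r: r.get("bot_score", 100) < 10),
    Layer("web server", 5, lambda r: r.get("req_per_min", 0) > 60),
    Layer("anubis", 10, lambda r: r["path"].startswith("/search")
          and not r.get("pow_cookie")),
    Layer("drupal", 100, lambda r: bool(r.get("honeypot"))),
]


def route(request: dict) -> tuple[str, int]:
    """Return (outcome, total cost spent) for one request."""
    spent = 0
    for layer in LAYERS:
        spent += layer.relative_cost
        if layer.blocks(request):
            return f"blocked at {layer.name}", spent
    return "served", spent
```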
Five things to take away
- No single layer is enough. Stack them. The edge handles raw volume, the web server handles patterns, and the application is the only thing that can see real user and form context.
- Match the protection to the threat. A login form needs different defence to a faceted search results page.
- Measure before you defend. Look at your actual traffic. Find your most-hit uncacheable endpoints. Defend those first.
- Watch accessibility and SEO. Every challenge you add is a tax on a real user or a real crawler. The cost of false positives is invisible unless you go looking.
- Plan for adversarial improvement. Whatever you deploy today, the scrapers get a turn next. Pick defences you can iterate on.
You do not need to win the bot war. You just need to make your site a worse target than the next one.
The slides from the talk are on the DrupalSouth schedule page. The recording will be posted here once the DrupalSouth team have edited and uploaded it, so check back in a few weeks.
MidCamp - Midwest Drupal Camp: MidCamp 2027 Dates Are Official: Save the Date for April 27-29, 2027
Mark your calendars. MidCamp is returning April 27-29, 2027!
We are excited to officially announce the dates for the next MidCamp, the Midwest's community-driven event for designers, developers, strategists, content creators, marketers, project managers, and open source enthusiasts.
After another incredible year of learning, collaboration, and community, we are already looking ahead to what comes next. And yes, as announced during closing remarks, MidCamp will be returning to DePaul next year just in time for Norah Schrum's birthday, which feels like the perfect excuse to gather this community again. MidCamp 2027 will once again bring together people from across Chicago, the Midwest, and beyond for several days of connection, practical learning, hallway conversations, contribution, and the kind of idea-sharing that keeps open source communities thriving.
Whether you are a longtime MidCamp regular or considering your first trip, MidCamp is built to be welcoming, approachable, and full of opportunities to learn from one another.
What to expect as planning gets underway:
- Engaging sessions from community speakers
- Hands-on training and learning opportunities
- Contribution and collaboration time
- Social events to reconnect with friends and meet new faces
- Community-focused experiences that reflect the spirit of MidCamp
Our organizing team is just getting started, and there will be many ways to get involved in the months ahead, from volunteering and sponsoring to submitting sessions and helping shape the event.
As was said during closing remarks: bringing value to others is the best gift, and this community proves that year after year.
For now, the most important thing to do is simple: save the date, bring your friends, and plan to be part of it.
Missing MidCamp already? You can relive this year's sessions by watching the recordings on our MidCamp 2026 YouTube playlist while we get planning underway for next year.
More details will be shared on the MidCamp 2027 event page as planning progresses.
We cannot wait to do it all again with this amazing community.
The Drop Times: Apex AI 2.0 Expands Drupal AI Integration With Multi-Provider Orchestration
Dries Buytaert: Acquia builds Drupal funding into its partner program
Today Acquia announced something I'm really proud of. We're calling it the Acquia Fair Trade Initiative.
When an Acquia partner closes a deal, 2% of that deal flows directly to the Drupal Association, credited in the partner's name, to fund Drupal's infrastructure and long-term growth.
Imagine an Acquia partner closes a $100,000 Drupal deal with Acquia. $2,000 goes to the Drupal Association, attributed to that partner. The 2% comes from Acquia, not from partner margins, so the partner keeps their full revenue and incentives.
The donation is publicly attributed in the Acquia Partner Portal and counts toward the partner's standing in the Drupal Association's Certified Partner Program. It is recognized as financial support for the Drupal Association, separate from non-financial contributions like code, case studies, or community participation.
Most of all, I like that this program is structural. It is not a one-time gift or sponsorship campaign. It is built into the economics of Acquia's partner program, so Drupal's funding grows automatically as Acquia and its partners grow.
Too often, funding for Open Source projects depends on periodic fundraising or individual goodwill. That can work, but it rarely scales in a predictable way.
Open Source sustainability works best when incentives align. With the Fair Trade Initiative, the Drupal Association receives more predictable funding, partners receive recognition through the Drupal Association's Certified Partner Program, and Acquia invests in the long-term health of the Drupal ecosystem its business depends on. And yes, this also creates more incentive for partners to work with Acquia on Drupal projects. Drupal wins, Acquia's partners win, and Acquia wins too. That is what incentive alignment looks like.
I set a reminder for myself to report back in a year, maybe sooner. I'm curious to see what this model can become.

