June 2, 2026 · 9 min read · Mathew Boswell

What an AI Pilot Actually Looks Like at a 20-Person Company

Forget the slide decks. Here's what a real 4-to-6-week AI pilot looks like week by week: who's in the room, what gets measured, and what you actually walk away with.

Most founders I talk to have the same question, and they usually ask it a little sheepishly, like they're worried it's a dumb thing to ask.

"Okay… so what does this actually look like? Like, day to day. What are we actually doing for six weeks?"

It's not a dumb question. It's the right question. And almost nobody answers it honestly, because vague is easier to sell than specific.

So let me just walk you through it. This is what a real AI pilot looks like at a small company — somewhere around 15 to 25 people, one or two locations, a founder who still knows everyone's name.

No fluff. No "digital transformation." Just what actually happens.

First, What a Pilot Is (and Isn't)

A pilot is not a rollout. It's not "we're putting AI into the whole company." It's the opposite of that.

A pilot is: pick one workflow, build something small and real, run it with one team for a few weeks, see if the numbers actually move. That's it.

If it works, you keep it and expand. If it doesn't, you kill it cheap and you've learned something. Either way, you're not stuck with a six-figure software contract and a Slack channel full of people who hate the new tool.

Good pilots have three things in common:

One specific workflow. Not "customer service." Something like "replying to shipping status questions in email."
One team that's actually going to use it. Usually 2 to 5 people.
A number you're trying to move. Hours saved per week, response time, error rate — something you can write on a sticky note.

If a vendor pitches you a "pilot" that doesn't have those three things, it's not a pilot. It's a sales process wearing a costume.

Week 0: Before the Clock Starts

Before week one, we have one conversation. Usually about an hour. Sometimes two.

In that conversation, we're answering boring but important questions:

Who's the point person on your side? (One human. Not a committee.)
Which team is doing the work today?
Where does the data live — Gmail, a CRM, a shared drive, someone's head?
What does "this worked" look like to you in plain English?

That last one matters more than people think. "It worked" has to mean something specific. "We cut response time on shipping questions from 6 hours to under 1 hour." "Our intake coordinator got back 5 hours a week." That kind of specific.

If we can't write that sentence together, we don't start. I've walked away from engagements at this stage. It's not personal — it's just that without a target, you can't tell if you hit it.
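If it helps to see the "write that sentence" test in concrete form, here's a rough sketch of a target written down as data. The class name and fields are my own invention for illustration, and the numbers are the shipping example from above. Nothing here is a required format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotTarget:
    """One number the pilot is trying to move (hypothetical structure)."""
    metric: str                    # plain-English name of the number
    unit: str                      # e.g. "hours"
    baseline: float                # where the number is today
    goal: float                    # where it needs to get to
    lower_is_better: bool = True   # response time: lower is better

    def is_hit(self, measured: float) -> bool:
        # "Under 1 hour" means strictly better than the goal.
        if self.lower_is_better:
            return measured < self.goal
        return measured > self.goal


# The shipping-questions example: from 6 hours to under 1 hour.
target = PilotTarget(
    metric="response time on shipping questions",
    unit="hours",
    baseline=6,
    goal=1,
)

print(target.is_hit(40 / 60))  # a 40-minute response time
print(target.is_hit(6))        # no change from the baseline
```

If you can't fill in all four fields without arguing, that's usually the sign the target isn't specific enough yet.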

Week 1: Sitting With the Work

Week one isn't building anything. It's watching.

I sit with the team that does the work. Sometimes literally — over Zoom, screen-share on, the coordinator walking me through their inbox. Sometimes in person if you're close enough.

I'm trying to figure out:

What does this task actually involve, step by step?
Where do they get stuck? What questions do they have to ask someone else?
What information do they need that's hard to find?
What parts feel mindless, and what parts require real judgment?

This is the part most consultants skip, and it's the part that decides whether the whole pilot works.

Because here's what always happens: the workflow on paper is not the workflow in real life. The SOP says one thing. The person actually doing the job has six little shortcuts and three unwritten rules and one weird exception they handle every Tuesday.

If you build for the paper version, your AI is going to be wrong in ways nobody can explain. If you build for the real version, it actually helps.

By the end of week one, I should be able to describe the team's job back to them and have them go "yeah, that's basically it." If I can't, I keep watching.

Week 2: Building the Smallest Possible Thing

Week two we build. But small. On purpose.

The goal isn't a polished product. The goal is something the team can poke at by the end of the week. A rough draft. A working sketch.

Usually that means:

Wiring up the AI to whatever knowledge it needs — your FAQ, your past tickets, your product docs, your pricing sheet
Giving it a clear job description (what it should do, what it should never do, when to hand off to a human)
Putting it somewhere the team can actually use it without a 40-page manual — usually inside a tool they already open every day
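To make the "job description" idea less abstract, here's a minimal sketch of one written down as plain data and turned into instructions. The specific rules are made up for a shipping-questions workflow, and the function name is hypothetical. Your real version comes out of what I saw in week one, and it isn't tied to any particular AI product.

```python
# A hypothetical job description for the assistant. Every rule below is an
# illustrative assumption, not a recommended policy.
JOB_DESCRIPTION = {
    "role": "Draft replies to shipping status questions in email",
    "do": [
        "Answer using only the order record and the shipping FAQ",
        "Match the tone of past replies from the team",
    ],
    "never": [
        "Promise a delivery date that isn't in the tracking data",
        "State business hours or holiday closures from memory",
    ],
    "hand_off_when": [
        "The customer mentions a refund, damage, or a lost package",
        "The order can't be found",
    ],
}


def render_instructions(job: dict) -> str:
    """Turn the job description into plain-text instructions."""
    sections = [
        ("Always:", job["do"]),
        ("Never:", job["never"]),
        ("Hand off to a human when:", job["hand_off_when"]),
    ]
    lines = [f"Your job: {job['role']}."]
    for heading, items in sections:
        lines.append("")        # blank line between sections
        lines.append(heading)
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


print(render_instructions(JOB_DESCRIPTION))
```

Keeping the rules in one short, readable list means that fixing a problem later is usually a one-line change the team can read and argue with.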

It will be ugly. The first version is always ugly. That's fine. Ugly and real beats pretty and theoretical every time.

By Friday of week two, somebody on the team should be able to push a button and get an answer out of it. Even if the answer is bad. We need something to react to.

Weeks 3 and 4: Actually Using It

This is the part that matters.

The team uses the tool on real work. Every day. Not in a demo. Not in a sandbox. Real shipping questions, real intake forms, real whatever-we-picked.

And we keep two things going at once:

The team flags what's wrong. Wrong tone, wrong info, missing context, weird hallucinations, the time it confidently told a customer we were open on Christmas.
We fix it. Tighten the instructions, add the missing document, change how it hands off to a human, fix the tone.

This loop runs every day, both weeks. Use it, break it, fix it, use it again.

Two things I tell the team going in:

Be brutal. If it sounds off, say so. We can't fix what nobody complains about.
You're not being graded. The tool is being graded. If it's bad at something, that's our problem, not yours.

Around the middle of week three, something usually clicks. The team stops treating it like a science experiment and starts treating it like a coworker who's still in training. That's the moment you know the pilot has a real shot.

Week 5: Measuring the Thing

Now we go back to that sticky-note number from week zero.

Did it move? By how much? In which direction?

Sometimes the answer is great. "Response time on shipping questions dropped from 6 hours to 40 minutes. The coordinator got back about 7 hours a week. Customers are happier — here are the three replies we got this month saying so."

Sometimes the answer is mixed. "It's saving us time on the easy questions, but the hard ones are taking longer because people don't trust the draft and rewrite the whole thing." That's still a useful answer — it tells you where to focus.

And sometimes the answer is "this didn't really work." That's rare if week one was done properly, but it happens. When it does, we say so out loud, write up what we learned, and you decide what's next.

The point is: there's a number, and we look at it together. No hand-waving.
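For a response-time number, "looking at it together" can be as simple as the sketch below. It assumes you can export when each email arrived and when it got a reply. The timestamps are invented sample data, and using the median rather than the average is my choice here, so one email that sat over a weekend doesn't skew the picture.

```python
from datetime import datetime
from statistics import median


def response_hours(pairs):
    """Hours between receiving each email and replying to it."""
    return [(replied - received).total_seconds() / 3600
            for received, replied in pairs]


def summarize(before, after):
    """Compare median response time before and during the pilot."""
    b = median(response_hours(before))
    a = median(response_hours(after))
    return {
        "before_hours": b,
        "after_hours": a,
        "change_pct": (a - b) / b * 100,  # negative means faster
    }


# Made-up sample data: (received, replied) for three emails in each period.
before = [
    (datetime(2026, 4, 6, 9, 0), datetime(2026, 4, 6, 14, 0)),   # 5 hours
    (datetime(2026, 4, 7, 9, 0), datetime(2026, 4, 7, 15, 0)),   # 6 hours
    (datetime(2026, 4, 8, 9, 0), datetime(2026, 4, 8, 16, 0)),   # 7 hours
]
after = [
    (datetime(2026, 5, 4, 9, 0), datetime(2026, 5, 4, 9, 30)),   # 30 minutes
    (datetime(2026, 5, 5, 9, 0), datetime(2026, 5, 5, 9, 40)),   # 40 minutes
    (datetime(2026, 5, 6, 9, 0), datetime(2026, 5, 6, 9, 50)),   # 50 minutes
]

result = summarize(before, after)
print(result)
```

With real data you'd want more than three emails per period, and you'd want the two periods to cover comparable weeks.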

Week 6: Decide What Happens Next

Last week is the decision week. You've got the data. You've got the team's honest opinion. Now you choose one of three things:

Keep it and expand. It works. Roll it out to more of the team, or apply the same approach to the next workflow on the list.
Keep it small. It works for this one team, but it's not worth scaling yet. Lock it in, let it run, revisit in six months.
Kill it. It didn't earn its keep. Shut it down. You're out a few weeks and the cost of the engagement — not a six-figure platform contract and a multi-year regret.

All three of those are real outcomes. I've recommended each of them, more than once.

You also walk away with something most pilots don't give you: a short, honest write-up. What we tried, what worked, what didn't, what it cost, what it saved, and what we'd do differently next time. Written like a human would write it, not like a McKinsey deck.

That document is worth keeping. Even if you kill the project, you now know more about your own operation than you did six weeks ago. That's not nothing.

Who's Actually in the Room

People always ask how much of their team's time this takes. Honest answer:

You (the founder or owner). Maybe 2 to 3 hours total across six weeks. Mostly week zero and week six. I'm not going to put you in every working session — that's a waste of your time and mine.
The point person. Usually an ops manager or team lead. 2 to 4 hours a week. They're my main collaborator.
The team using it. Their normal job, plus the week-one walkthrough and about 30 minutes a week giving feedback during weeks 3 and 4. That's it.
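If you want to total that up for your own company, the arithmetic looks roughly like this. It's a back-of-the-envelope sketch that assumes the point person's hours apply to all six weeks and counts only the feedback time for the team, not the week-one sit-in. The function name is made up.

```python
WEEKS = 6            # length of the pilot
FEEDBACK_WEEKS = 2   # weeks 3 and 4


def pilot_hours(founder_total, point_person_weekly, team_size,
                feedback_weekly=0.5):
    """Total hours of your people's time across the whole pilot."""
    point_person = point_person_weekly * WEEKS
    team = team_size * feedback_weekly * FEEDBACK_WEEKS
    return founder_total + point_person + team


# Low end: 2 founder hours, 2 hours a week for the point person, team of 2.
low = pilot_hours(founder_total=2, point_person_weekly=2, team_size=2)

# High end: 3 founder hours, 4 hours a week for the point person, team of 5.
high = pilot_hours(founder_total=3, point_person_weekly=4, team_size=5)

print(low, high)  # 16.0 32.0
```

So under those assumptions the whole pilot lands somewhere between 16 and 32 hours of your people's time, most of it the point person's.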

If somebody is pitching you a pilot where your team has to sit in workshops for half a day every week for two months, that's not a pilot. That's a project disguised as a pilot, and you're the one paying for it in lost hours.

What It Costs You (Beyond Money)

People focus on the dollars. Fair. But the real cost of a pilot is attention.

For six weeks, somebody on your team has to care about this. Not full-time. But they have to answer messages within a day, show up to a couple of working sessions, and be willing to say "this is wrong" when it's wrong.

If your team is already drowning, a pilot is going to feel like one more thing on the pile, and it will fail. Not because the tech doesn't work, but because nobody had the bandwidth to shape it.

This is something I'll ask you straight up before we start: "Is this actually a good six weeks for your team?" Sometimes the right answer is "not right now, ask me in two months." That's a totally legitimate answer.

Why Six Weeks, Specifically

Six weeks isn't magic. It's just long enough to do real work and short enough that nobody loses interest.

Shorter than that, you don't get past the honeymoon phase — every new tool feels exciting in week one. The interesting stuff happens in week three when the novelty's worn off and you find out whether anyone actually uses it.

Longer than that, and the pilot stops being a pilot. It turns into "the project," budgets get attached, people start defending it whether it's working or not, and you've lost the whole point of doing a pilot in the first place.

Six weeks keeps everyone honest.

The Bottom Line

A real AI pilot at a small company is not glamorous. It's a lot of watching people work, a lot of small adjustments, and one number you're trying to move.

It's also one of the cheapest ways to find out whether AI is actually useful in your business — not in some hypothetical industry-wide sense, but specifically for the way your company actually runs.

You don't need a strategy. You need a workflow, a team, and six weeks.

Key Takeaways:

A pilot is one workflow, one team, one number — not a company-wide rollout
Week one is watching how the work actually happens, not building anything
The first version should be ugly and usable by Friday of week two
Weeks 3 and 4 are use-it, break-it, fix-it on real work
Week 6 has three honest outcomes: expand, keep small, or kill it

Curious whether a pilot like this would actually work for your business?

Take the 2-minute AI Fit Assessment. It's free, it's honest, and it'll tell you whether your situation is a good fit for a pilot — or whether you'd be better off doing something else first.
