How to Choose an AI Consulting Firm: 12 Questions to Ask in 2026
The single most important filter when hiring an AI consulting firm is whether they can deploy inside your own infrastructure and prove where your data lives during inference. Here are the 12 questions to ask, grouped by what they reveal, with the answer a good firm gives, the red flag to watch for, and how we'd answer each one ourselves.
Choosing the firm is the highest-leverage decision you will make about AI, and most failures happen at this step rather than in the technology. A widely cited 2025 MIT study found that roughly 95 percent of enterprise generative-AI pilots delivered no measurable return, and Gartner has projected that at least 30 percent of generative-AI projects would be abandoned after the proof-of-concept stage. Very little of that comes down to the models falling short. It comes down to the wrong partner, the wrong scope, and data that was never handled seriously in the first place.
The most useful filter for avoiding that outcome is also the simplest: can a firm deploy inside infrastructure you control, and can they tell you, precisely, where your data lives while the system is running? A firm that answers that clearly tends to get everything else right too, because honest data handling is downstream of taking the work seriously. The twelve questions below are the ones we would ask if we were the buyer. We have grouped them by what they reveal and given you the answer a good firm gives and the red flag to watch for on each. Because we would rather be measured than ask you to take our word for it, we have also answered all twelve ourselves further down.
The one question behind all the others
Before the list, the filter worth stating on its own: if a firm can’t tell you where your data lives during inference, that’s your answer. Everything about compliance, security, and trust flows from data location. A firm that treats it as a footnote is telling you how much thought the rest of the engagement will get. Every question that follows is, in some sense, a way of asking this one again from a different angle.
The 12 questions
Data and security: the foundation
1. Where does our data live during inference? This is the question the other eleven are built on. Good answer: a specific environment you control, or a named service under a signed agreement. Red flag: hesitation, or “we’ll figure that out later.”
2. Will you deploy inside our infrastructure if we need you to? For regulated work, private deployment inside your own cloud tenant, VPC, or on-premise environment is the cleaner model, because your data never crosses your perimeter. Good answer: yes, with experience doing it in environments like yours. Red flag: only their platform, no other option.
3. Do you train on our data? Your proprietary information should never become training fuel for a model other customers use. Good answer: a flat no, in writing. Red flag: a qualified answer, or a pointer to a long terms-of-service document.
Compliance and auditability
4. Which regulations did you design this against? Compliance is a property of the build, not a checkbox added later. Good answer: the specific laws and frameworks that apply to your data, such as HIPAA, GLBA, SOC 2, or the NIST AI Risk Management Framework, named correctly. Red flag: generic reassurance with no specifics.
5. What gets logged, and can we audit it? When a regulator asks why the system did what it did, you need a record, not a recollection. Good answer: every AI-influenced decision records its inputs, output, sources, and model version. Red flag: no audit trail, or one promised for “a later phase.” (We wrote a fuller piece on what regulators actually ask for.)
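To make the good answer on question 5 concrete, here is a minimal sketch of one audit record per AI-influenced decision, assuming an append-only JSON Lines file. The `DecisionRecord` schema, the `record_decision` and `load_decisions` helpers, and the file format are illustrative assumptions, not a standard or any particular firm's implementation.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class DecisionRecord:
    """One audit entry per AI-influenced decision (hypothetical schema)."""
    decision_id: str
    inputs: dict         # what the system was given
    output: str          # what it produced
    sources: list        # documents or records it drew on
    model_version: str   # exact model identifier used for this decision
    # UTC timestamp captured when the record is created
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_decision(log_path: str, record: DecisionRecord) -> None:
    """Append the record as one JSON line; earlier entries are never rewritten."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def load_decisions(log_path: str) -> list:
    """Read every record back, e.g. to answer a regulator's question."""
    with open(log_path, encoding="utf-8") as log:
        return [json.loads(line) for line in log if line.strip()]
```

Writing one self-describing line per decision, with the model version pinned on each entry, means a later question about any single decision can be answered from the log alone. A production system would likely also need access controls and tamper-evident storage, which this sketch does not attempt.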
The team and the build
6. Who actually writes the code? Accountability and quality both live here. Good answer: the senior people you are talking to, or named engineers you will meet. Red flag: an unnamed team, or quiet offshoring after the sale.
7. Do we own what you build, and can we maintain it without you? A system you cannot run without the vendor is a subscription, not an asset. Good answer: yes, with documentation and an optional handoff. Red flag: lock-in by design.
8. Can you show us something comparable in production? A demo proves a firm can build for a stage; production proves they can build for reality. Good answer: a concrete example for an organization like yours. Red flag: demos only, or references that never made it past a pilot.
Money, speed, and risk
9. How do you price, and what is the total? A firm that can scope the work can price it. Good answer: a flat, fixed-scope number you can see before work starts. Red flag: hourly with no ceiling, or a total that depends on usage. (For the real drivers, see our guide to what custom AI costs in 2026.)
10. What will you measure success against? Without a baseline, “it’s working” is an opinion. Good answer: a real baseline taken from how the work is done today. Red flag: “you’ll know it when you see it.”
11. How long until we see something real? Time-to-value is one of the best early predictors of whether a project ships at all. Good answer: weeks to a working version, not quarters. Red flag: months before a first deliverable.
12. What happens if the project fails? How a firm answers this tells you how it thinks about your risk versus its own. Good answer: a clear account of scope, responsibility, and what you keep. Red flag: the question gets waved away.
How we’d answer our own twelve questions
It is easy to publish a checklist. It is harder to stand behind it. So here is how we answer the same twelve, plainly.
Your data lives where it already lives. We build private deployments inside your own cloud tenant, VPC, or on-premise environment, so protected information never leaves your perimeter and there is nothing to “figure out later.” We do not train on your data, and we will put that in writing. We design against the laws and frameworks that actually apply to you, such as HIPAA, GLBA, SOC 2, ISO 27001, and the NIST AI RMF, and we treat the audit trail as a first-class part of the system, so every AI-influenced decision records its inputs, output, sources, and model version from day one rather than as a later phase.
The people who scope the work are the people who write the code: a team of MIT engineers, with senior review before anything reaches production. You own what we build, with documentation and an optional handoff, because lock-in is not our business model. We can show you systems running in regulated environments, not just slides. We price flat and fixed-scope, so you see the total before work starts — and we explain the real cost drivers openly. We measure against a baseline taken from how the work is done today, we ship a working version in weeks rather than quarters, and if a tightly scoped first engagement does not earn its keep, you keep the readiness work and the roadmap regardless.
Ours is not the only honest answer to every question. There are workflows where an off-the-shelf tool like Copilot is genuinely enough, and we say so in our Copilot vs. custom AI comparison, just as we name where a Big-4 firm is the better call in Soren vs. traditional consulting. The point of the twelve questions is not to steer you to us. It is to make any firm, us included, show its work.
A scorecard you can copy
Run any firm through this and mark each row green or red. A column of greens is a partner. A scatter of reds is a warning, and a cheap one to act on now rather than after a failed build.
| # | Question | What a green looks like | What a red looks like |
|---|---|---|---|
| 1 | Where does our data live during inference? | A specific, controlled environment | Vague or deferred |
| 2 | Will you deploy in our infrastructure? | Yes, with prior experience | Their platform only |
| 3 | Do you train on our data? | A flat no, in writing | A qualified answer |
| 4 | Which regulations did you design against? | The frameworks that bind you, named | Generic reassurance |
| 5 | What is logged, and can we audit it? | Designed-in audit log | Bolted on “later” |
| 6 | Who writes the code? | Named senior engineers | An anonymous team |
| 7 | Do we own and maintain it? | Yes, with handoff | Lock-in by design |
| 8 | Comparable system in production? | A real, similar example | Demos only |
| 9 | How do you price, and what’s the total? | Flat, fixed scope, up front | Open-ended hourly |
| 10 | What is measured? | A baseline | A feeling |
| 11 | Time to something real? | Weeks | Quarters |
| 12 | What if it fails? | Clear scope and what you keep | Waved away |
Why these questions favor a certain kind of firm
These criteria are not neutral, and we will say so plainly. The questions above reward firms that deploy inside your control, write their own code, price transparently, and design for audit from day one. That describes how we work, but it is not why the list looks the way it does. It looks this way because it matches what the failure data points to. The MIT and Gartner findings above point to overlapping causes: unclear value, weak governance, data that was never production-ready, and pilots that no one was accountable for shipping. The twelve questions are aimed squarely at those reasons projects die. A firm that answers them well is, almost by definition, a firm whose projects are more likely to survive.
After you have the answers
Asking is half of it. The other half is starting small enough to verify the answers against reality. A tightly scoped first engagement, like an AI readiness assessment, lets you watch how a firm actually works before you commit to a larger build, and it surfaces the data and governance gaps that sink projects later. If you are still weighing the model itself (buy an off-the-shelf tool, build in-house, or hire a partner), our build vs. buy vs. Soren breakdown lays out when each is the right call, and our piece on what actually changes between an AI-native firm and a traditional consultancy goes deeper on the trade-off.
If you want to put these questions to us directly, book a demo. We would rather you ask all twelve than fewer.
Frequently asked questions
- What should I ask an AI consultant?
- Start with where your data lives during inference and whether they will deploy inside infrastructure you control. From there, ask who actually writes the code, how they price, what they will measure success against, what happens if the project fails, and whether they can show you a comparable system in production. Those answers separate a firm that ships from one that sells.
- How do I know if an AI firm is legit?
- Ask for a concrete example of something they have put into production for an organization like yours, ask who on their team will do the work, and ask exactly where your data goes during processing. A legitimate firm answers all three plainly and quickly. Vague answers to specific operational questions are the clearest signal to keep looking.
- What is the biggest red flag in an AI vendor?
- A firm that cannot tell you where your data lives during inference. If they hesitate, deflect, or describe it as a detail to sort out later, that hesitation is your answer. Data location is the foundation every compliance and security question is built on, and any serious firm knows the answer cold.
- Should an AI firm deploy in my own cloud?
- For regulated or sensitive work, yes: deploying inside your own cloud tenant, VPC, or on-premise environment is the cleaner model because your data never leaves your perimeter. For low-sensitivity workflows a managed service can be fine. The point is that a good firm can do either and will tell you honestly which your situation actually calls for.
- How much should an AI consulting project cost?
- A good firm gives you a flat, fixed-scope number before work starts, not an open-ended hourly rate that climbs with the project. The total depends on the workflow, but predictability is the tell: a firm that can scope the work can price it. We break the real drivers down in our guide to what custom AI costs in 2026.
Curious where AI can bring the most value to your team? Let’s talk.