If you work in investing—or really any information‑dense business—and you want to use AI, this is an exciting and scary time. A year ago the problem was “there just aren’t many off‑the‑shelf AI tools that work with our data.” Today the problem is the opposite: every week there’s another startup promising to index your deal room, draft memos, or ride along with your IC meetings. You want to get your hands on these tools to see if they can deliver, but before you can do that, you must endure the IT equivalent of a root canal: the infosec review.1
These reviews are necessary. Data needs protecting, and long timelines haven’t historically been a big problem when the systems under review took 6-18 months to implement anyway. Today, though, the pace of innovation has accelerated and a ton of new point solutions are emerging; these processes simply weren’t designed for the rapid testing this new era requires.
So here’s what happens: your team fills out a questionnaire → the vendor fills out a questionnaire → follow‑up calls → pen testing occurs2 → legal gets involved → more follow‑up calls. Three months later you get the green light, only to realize three other vendors have launched new features you want to test.
That lag hurts on both sides. Funds and corporations fall behind the frontier and grow hesitant to try new tools, while cash‑strapped startups find themselves in sales cycles so long that they struggle to survive. So, how do we keep our secrets safe while moving closer to the speed of innovation?
What Exactly Are We Testing?
The most common category of tool among funds right now is the knowledge management copilot. Examples include Sana Labs, Athena Intelligence, Rogo, BlueFlame, Hebbia, Glean, etc.—tools that let an LLM answer questions about your internal data alongside reliable, paid third‑party sources. Use cases include getting smart on a new asset quickly, answering questionnaires from LPs (DDQs), and drafting IC memos.
I’ve also tinkered with point‑solutions that build briefings about new assets, co‑pilots that sit in IC meetings, and systems that create first passes at market sizing models. (I’m not mentioning names until I’ve done enough testing to confirm these really work.)
With so many vendors in the mix, hands‑on testing is critical to make sure a tool you are interested in is actually the right fit for your needs.
Security people are correct—piping private data into someone else’s environment is risky. A bad leak could blow up a deal or trigger financial penalties. But that risk has to be balanced against the risk of moving too slowly.
Imagine Fund A spends four months on security reviews of a KM vendor and then another six months piloting. Fund B, meanwhile, starts piloting on day one and, three months later, has adopted and rolled out the tool. Importantly, Fund B also starts organizing its data right away, which is a key step.
Now Fund B can bid faster and more confidently in the first round by leveraging its prior deal data, and it beats out Fund A on an exciting asset.
The goal isn’t to be reckless but to figure out how you can be Fund B.
A Four‑Level Sandbox
To move faster, I recommend setting up a four-level sandbox that is fully segregated from the rest of your IT stack.
Level I – Public / One Company
A handful of 10‑Ks, press releases, analyst reports and earnings calls for a single public company.

Level II – Public / Many Companies
The same material for ~20 names across sectors.

Level III – Private / One Company
A lightly redacted diligence folder from a closed deal five or more years ago—now you are testing on real data, but it’s old enough to no longer be sensitive. Include the IC memo, deal model, key artifacts from the data room, consulting reports, etc.

Level IV – Private / Many Companies
Diligence folders for ~20 names across sectors.
A few guardrails:
Require baseline controls (SOC 2 Type II,3 penetration test, encryption at rest, SSO) before anything touches the sandbox. These are all pretty standard credentials, and if a vendor can’t offer them you’ll never get comfortable.
Security runs its full checklist in the background. When it finishes, the vendor graduates to production if it has also proven value at Level IV. Ideally, most of the checklist is complete before Level IV testing begins.
You advance through the levels only when the tool shows that it really works, so you are never exposing private data to a product that doesn’t deliver. (A rough sketch of how these gates could be encoded follows below.)
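To make the gating concrete, here is a minimal sketch of how the policy could be written down in code. It is purely illustrative: the level descriptions come from the list above, but the control names, data structures, and the next_step helper are my own assumptions, not anything a particular vendor or security team actually uses.

```python
from dataclasses import dataclass, field

# Baseline controls required before a vendor touches even Level I.
# (Illustrative names; substitute your own checklist items.)
BASELINE_CONTROLS = {"SOC 2 Type II", "penetration test", "encryption at rest", "SSO"}


@dataclass
class SandboxLevel:
    name: str
    data: str  # what this level exposes to the tool


# The four levels from the list above, in the order a vendor must clear them.
LEVELS = [
    SandboxLevel("Level I", "public filings, press releases, analyst reports, earnings calls for one company"),
    SandboxLevel("Level II", "the same public material for ~20 names across sectors"),
    SandboxLevel("Level III", "a lightly redacted diligence folder from one deal closed 5+ years ago"),
    SandboxLevel("Level IV", "diligence folders for ~20 old deals across sectors"),
]


@dataclass
class Vendor:
    name: str
    controls: set[str] = field(default_factory=set)        # attested security controls
    passed_levels: set[str] = field(default_factory=set)   # levels where value was proven
    checklist_done: bool = False                            # security's full review finished?


def next_step(vendor: Vendor) -> str:
    """Return the next gate for a vendor under the four-level sandbox policy."""
    missing = BASELINE_CONTROLS - vendor.controls
    if missing:
        return f"blocked: missing baseline controls {sorted(missing)}"
    for level in LEVELS:
        if level.name not in vendor.passed_levels:
            return f"evaluate in {level.name} ({level.data})"
    # All four levels proven; production still waits on the full checklist.
    return "promote to production" if vendor.checklist_done else "hold: finish full security checklist"


if __name__ == "__main__":
    v = Vendor("ExampleKMVendor",
               controls=set(BASELINE_CONTROLS),
               passed_levels={"Level I", "Level II"})
    print(next_step(v))  # -> evaluate in Level III (...)
```

The point of writing it down this explicitly, even just on paper, is that the gate a vendor is waiting on is never ambiguous: it is either a missing baseline control, an unproven level, or the background security review.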
The beauty of this approach is that real evaluation starts on week one, not month four.
Bottom line
When you propose this, someone will ask, “So any vendor with a SOC 2 can just walk in the front door?” Sort of—but only into the Level I foyer. The upside is speed: analysts can start testing while they’re still excited; IT and legal are comfortable with the risk level; and the vendor gets a sale before it runs out of money.
That speed matters because the frontier is expanding rapidly. OpenAI just announced the ability to connect to your own document repositories. Several KM players are offering agent‑based search as an alternative to RAG, promising much better results when searching across multiple deals.4 Waiting six months to even start piloting means you’ll always be evaluating last season’s tech.
And after all that, picking a winner is only phase one. Next you need to organize and index the data. Then, you need to get people to actually use the new tools, which is the toughest part! That’s going to have to be its own post for another time, though.
1. If you don’t know what I’m talking about, here’s a basic summary. Most businesses have a lot of confidential data. Before a third party can have access to that data, they need to be thoroughly vetted to ensure the data won’t leak out or get hacked. This is obviously very important, but the problem is that the process can often take 2-4 months to complete, which is an eternity in AI years.
2. Short for penetration testing, in which white hat hackers try to break into the system. It’s usually a requirement for infosec, but it’s expensive, so startups don’t like to do it.
3. This is a common credential that indicates that standard security best practices are in place.
4. Retrieval‑Augmented Generation grabs a static chunk of documents and stuffs them into the prompt. Agentic approaches orchestrate multiple steps—search, reasoning, follow‑up queries—until the agent “believes” it’s answered your question. In larger data sets that cover multiple deals, an agent can systematically search all of the deals, while RAG often pulls relevant docs from only a handful of them. (That’s partly why Levels II and IV include ~20 deals: that’s enough to push a RAG tool to its limit.)
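To make the contrast in footnote 4 concrete, here is a toy sketch of the two patterns. Everything in it is hypothetical: search_index and llm are stand-ins for whatever retrieval layer and model a real product uses, and the agent loop is deliberately simplified. The point is only that single-shot RAG sees one static batch of chunks, while an agentic loop can issue follow-up queries per deal before synthesizing.

```python
# Toy contrast between single-shot RAG and an agentic search loop.
# `search_index` and `llm` are hypothetical stand-ins, not a real product's API.

def search_index(query: str, top_k: int = 8) -> list[str]:
    """Pretend vector/keyword search over the document corpus."""
    return [f"chunk {i} matching '{query}'" for i in range(top_k)]


def llm(prompt: str) -> str:
    """Pretend model call."""
    return f"answer derived from: {prompt[:60]}..."


def rag_answer(question: str) -> str:
    # One retrieval, one prompt: the model only ever sees this static batch of
    # chunks, which across 20 deals often comes from just a handful of them.
    chunks = search_index(question)
    return llm(f"Context:\n{chunks}\n\nQuestion: {question}")


def agentic_answer(question: str, deals: list[str], max_steps: int = 40) -> str:
    # The agent plans follow-up queries (here, naively: one per deal) and keeps
    # searching until it has covered every deal or hits its step budget,
    # then synthesizes across everything it found.
    findings = []
    for step, deal in enumerate(deals):
        if step >= max_steps:
            break
        chunks = search_index(f"{question} (deal: {deal})", top_k=3)
        findings.append(llm(f"Notes on {deal}:\n{chunks}"))
    return llm(f"Synthesize across deals:\n{findings}\n\nQuestion: {question}")
```

A real agent decides its own follow-up queries rather than iterating over a fixed deal list, but even this crude version shows why ~20 deals is a useful stress test: the RAG path gets one shot at coverage, while the agentic path gets as many as it needs.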