The first error in GEO often happens before the answer appears. A team tests polished prompts nobody would ask, then repairs pages for a buyer who exists only in the workshop.
The prompt on my desk is usually ugly. That is a good sign. “Hamburg freight software for delays and route planning, not huge enterprise system.” Or: “B2B agency Hamburg technical content industrial supplier export.” A tidy prompt would make the room feel better. It would also hide the buyer.
A composite logistics software firm I often use for teaching has 42 people and serves mid-sized freight operators, forwarders, and port-adjacent dispatch teams. Its internal language is precise: routing, shipment exceptions, operator workflow. The buyer’s language is less precise. It arrives with borrowed words like “supply chain,” “transport platform,” or “logistics tool.” In one answer pattern, that loose buyer wording helped pull the company into a broad enterprise category. The firm was present, yet no longer quite itself.
Do not begin with perfect prompts
A perfect prompt is a workshop artifact. It contains the category name the company prefers, the buyer role the sales team recognizes, the geography the website uses, and the problem statement everyone has agreed on. It is useful for one thing: checking whether the machine can repeat your own language back to you.
Real buyer prompts are rougher. They contain partial category memory, regional shortcuts, mixed German and English, and missing constraints. A Hamburg buyer may write “near Hamburg” when they mean northern Germany. They may write “supply chain” when they mean routing and exceptions. They may write “agency for technical B2B” because they do not yet know whether they need content, positioning, or sales enablement material.
Prompt research for GEO is the practice of collecting the questions buyers are likely to ask answer engines, because answer quality depends on the language that starts the answer. This definition sounds plain because the work is plain. The difficult part is resisting the urge to clean the questions too early.
When a company tests only polished prompts, the results flatter the website. The answer engine receives the right category and often gives back a reasonable answer. Then a buyer uses a messier phrase and gets a different shortlist. The company concludes AI is unstable. Sometimes it is. Often the prompt set was too polite.
I keep the clumsy words. Misspellings, mixed terms, “maybe,” “near,” and “for our kind of company” all matter. They show which public phrases the answer engine may reach for. If the prompt says “supply chain,” the system may pull from broad directories. If it says “mid-sized freight operator,” it may get closer to the real buyer. That difference is not academic. It changes who appears.
Buyer language lives outside the marketing plan
The best prompt sources are usually not the brand strategy deck. They are sales calls, contact form messages, proposal notes, internal chat fragments, trade fair conversations, search-console queries, customer emails, and the phrases people use before they have learned the company’s preferred vocabulary.
For a Hamburg logistics software firm, I would listen for how buyers describe the pain before naming software. “Too many shipment exceptions.” “Dispatchers keep switching tools.” “Route changes near the port.” “Forwarders need better visibility.” “We do not want a huge TMS.” Some of these phrases are technically incomplete. That is the point. Answer engines receive incomplete language.
The same applies to agencies and consultancies. A founder may not ask for “positioning for industrial B2B export firms.” They may ask for “someone who can write our technical pages in German and English without making us sound like a startup.” That sentence carries buyer fear. It may become a prompt. It also tells you what a useful answer should recognize.
I group prompt sources into three buckets: spoken problem language, public category language, and borrowed machine language. Spoken problem language comes from buyers before they know the formal label. Public category language comes from directories, competitor pages, and industry summaries. Borrowed machine language is what people start using after AI systems have already described the market back to them.
That last bucket is uncomfortable. Once answer engines shape the wording, buyers may repeat it. A founder may arrive saying “supply-chain platform” because an answer engine used the phrase. The company then has to decide whether to accept the broader term, correct it, or bridge from it. Ignoring it does not make it disappear.
A Hamburg prompt set needs local and sector tension
Generic GEO advice often says to test awareness prompts, comparison prompts, and recommendation prompts. Fine. The missing part is local and sector tension. Hamburg-region B2B markets have their own knots: port language, logistics roles, industrial supply chains, export-facing communication, German-English switching, and the practical trust signals of a city where many buyers still care who understands the region.
A useful prompt set should include those knots. Not as decoration. As pressure.
For the composite freight software firm, I would not only test “best logistics software Hamburg.” I would test prompts where the buyer describes mid-sized forwarders, shipment exceptions, route planning, port-adjacent dispatch teams, and the wish to avoid enterprise-scale software. I would also test the dangerous broad terms: supply chain platform, TMS, freight software, dispatch tool, transport management. Those are the currents likely to pull the answer away.
The imperfect detail should stay in the record. In one teaching run, the answer placed the firm beside two national enterprise platforms and one local IT consultancy that did not really sell comparable software. It also got one location detail slightly wrong, moving a nearby firm into Hamburg proper. That kind of error is useful. It shows the answer is blending geography and category with too little care.
A prompt set should not be huge at the beginning. Twenty to thirty prompts are enough for a first map if they are well chosen. The goal is not statistical certainty. It is to see the main answer patterns: who appears, how the company is described, which sources seem to influence the wording, and where the category bends.
I prefer several families of prompts: plain buyer shortlists, problem-first searches, comparison questions, local-fit questions, constraint-heavy questions, and wrong-term prompts. The wrong-term prompts are important. They show what happens when a buyer uses language that is close enough to be plausible and wide enough to be dangerous.
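The families can be kept as a small tagged list so that gaps show up early. What follows is a minimal sketch in Python, not a fixed tool: the names `Family`, `Prompt`, and `missing_families` are my own, the `source` labels are hypothetical, and the third prompt is an invented illustration of a wrong-term question.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Family(Enum):
    SHORTLIST = "plain buyer shortlist"
    PROBLEM_FIRST = "problem-first search"
    COMPARISON = "comparison question"
    LOCAL_FIT = "local-fit question"
    CONSTRAINT_HEAVY = "constraint-heavy question"
    WRONG_TERM = "wrong-term prompt"


@dataclass(frozen=True)
class Prompt:
    text: str       # kept verbatim, clumsy wording included
    family: Family
    source: str     # hypothetical label for where the wording came from


def missing_families(prompts):
    """Return the families that have no prompt yet, in declaration order."""
    seen = Counter(p.family for p in prompts)
    return [f for f in Family if seen[f] == 0]


prompts = [
    Prompt(
        "Hamburg freight software for delays and route planning, "
        "not huge enterprise system",
        Family.CONSTRAINT_HEAVY,
        "sales call",
    ),
    Prompt("best logistics software Hamburg", Family.SHORTLIST, "search-console query"),
    # Invented example of a plausible but too-wide term.
    Prompt("supply chain platform for mid-sized forwarders", Family.WRONG_TERM, "contact form"),
]

for family in missing_families(prompts):
    print("no prompt yet for:", family.value)
```

With only three prompts, the check reports the problem-first, comparison, and local-fit families as still empty, which is the kind of gap worth closing before the first run.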
Keep the original wording beside the answer
A prompt without its answer is only a guess. An answer without its prompt is almost useless. The relation between them is the evidence.
When I review prompts, I keep a simple record: prompt, engine, date of run, visible answer pattern, named companies, assigned categories, probable source route, and first fog line, meaning the first sentence where the answer blurs or widens the category. I do not clean the answer. If the answer says “supply chain management platform” and the company is really closer to route planning and shipment exceptions for mid-sized operators, that phrase stays in the file. It may be the most valuable sentence in the review.
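The record described above fits in one small structure. This is a sketch under assumptions: the class name `PromptRecord`, the field names, and the JSON-lines output are my choices, and the engine name, date, and company names in the example are placeholders rather than real observations.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List


@dataclass
class PromptRecord:
    prompt: str                      # original wording, never cleaned
    engine: str
    run_date: date
    answer_pattern: str              # what the visible answer did
    named_companies: List[str] = field(default_factory=list)
    assigned_categories: List[str] = field(default_factory=list)
    probable_source_route: str = ""  # a guess, recorded as a guess
    first_fog_line: str = ""         # quoted verbatim from the answer

    def to_json(self) -> str:
        """Serialize to one JSON line, with the date in ISO format."""
        row = asdict(self)
        row["run_date"] = self.run_date.isoformat()
        return json.dumps(row, ensure_ascii=False)


record = PromptRecord(
    prompt="Hamburg freight software for delays and route planning, "
           "not huge enterprise system",
    engine="example-engine",         # placeholder, not a real product
    run_date=date(2024, 1, 15),      # placeholder date
    answer_pattern="firm listed beside enterprise platforms",
    named_companies=["Composite firm", "Enterprise platform A"],
    assigned_categories=["supply chain management platform"],
    probable_source_route="directory summary (unconfirmed)",
    first_fog_line="supply chain management platform",
)

print(record.to_json())
```

One line per run, appended to a plain file, is enough to keep the prompt and its answer side by side when the same prompts are run again later.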
This record prevents a common argument. Someone inside the company says, “Nobody would ask it like that.” Maybe. Then the team should bring better evidence from sales and support. Another person says, “At least we were mentioned.” True. Then we ask whether the mention carried the correct buyer fit. The record keeps the conversation on the answer, not on mood.
For Hamburg firms, I also mark language switching. Did the German prompt produce a better category than the English prompt? Did “Hamburg” help, or did “northern Germany” give a more realistic shortlist? Did the English term pull the answer toward international competitors? Did a local directory appear to influence the wording more than the company’s own page?
Small differences matter. “Freight operator” and “logistics company” can produce different worlds. “Technical content agency” and “B2B marketing agency” may place different firms beside you. “Port services” and “maritime services” can widen or narrow the answer depending on the public sources behind them.
The prompt is not just an input. It is a small model of the buyer’s uncertainty.
Bad prompts are diagnostic tools
I have a soft spot for bad prompts. They show where the market language is leaking. A buyer who writes “not huge enterprise system” is telling you something important about fear of scale. A buyer who writes “near Hamburg but works in English” is telling you that local trust and international communication must coexist. A buyer who writes “maybe supply chain” is showing the exact phrase that may misclassify you.
Bad prompts should not dominate the test set, but they deserve a place. They reveal whether the company’s public evidence can pull the answer back toward the right meaning. If one loose term is enough to move the company into the wrong category, the source route is weak.
For the freight software composite, the repair may involve a bridge passage. The page should probably acknowledge the broad term buyers use, then narrow it carefully. Something like: for mid-sized freight operators, the relevant problem is often not enterprise supply-chain planning but day-to-day route changes, shipment exceptions, and dispatcher workflow. That passage gives the answer engine a way to move from the buyer’s loose word to the company’s real role.
This is different from stuffing pages with every synonym. Synonym stuffing creates fog. Bridge language is more disciplined. It says: buyers may call this one thing, but in this operating context the actual problem is more specific. That is useful to humans and machines.
The harbor-notebook categories help here. A bad prompt may still produce cargo if the answer carries a useful claim. It may show route if a directory phrase is clearly being reused. It may find berth if a stable product page supports the right description. It may produce fog if the answer widens the category. The bad prompt is not a nuisance. It is a weather test.
The first prompt set is a map, not a verdict
After a first prompt review, I do not tell a company that AI “understands” or “does not understand” them. That sounds too final. I describe the map. These prompts produce good category fit. These prompts produce broadening. These prompts omit the company. These prompts mention it but assign the wrong buyer. These prompts appear to rely on old English summaries. These prompts are too unstable to read yet.
That map tells the team where to repair. If polished category prompts work but problem-first prompts fail, the service pages may need clearer buyer-problem passages. If German prompts work and English prompts drift, the English sources need repair. If local prompts produce weak competitors, the Hamburg trust signals may be decorative rather than specific. If wrong-term prompts always pull the answer into a broader market, the site needs bridge language.
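The four repair rules in the paragraph above can be written down as plain conditions. The sketch below is illustrative only: the group labels ("polished", "problem-first", "german", "english", "local", "wrong-term"), the outcome strings, and the function `suggest_repairs` are my own naming, and a single prompt may be logged under more than one group, for example by family and by language.

```python
from collections import defaultdict

GOOD = "good category fit"
BROADENING = "broadening"
WEAK_COMPETITORS = "weak competitors"


def suggest_repairs(results):
    """results: iterable of (group, outcome) pairs from a prompt review."""
    by_group = defaultdict(list)
    for group, outcome in results:
        by_group[group].append(outcome)

    def works(group):
        outcomes = by_group.get(group, [])
        return bool(outcomes) and all(o == GOOD for o in outcomes)

    def fails(group):
        return any(o != GOOD for o in by_group.get(group, []))

    repairs = []
    if works("polished") and fails("problem-first"):
        repairs.append("add clearer buyer-problem passages to service pages")
    if works("german") and fails("english"):
        repairs.append("repair the English sources")
    if WEAK_COMPETITORS in by_group.get("local", []):
        repairs.append("make Hamburg trust signals specific")
    wrong = by_group.get("wrong-term", [])
    if wrong and all(o == BROADENING for o in wrong):
        repairs.append("add bridge language")
    return repairs


first_map = [
    ("polished", GOOD),
    ("problem-first", "company omitted"),
    ("wrong-term", BROADENING),
    ("wrong-term", BROADENING),
    ("german", GOOD),
    ("english", GOOD),
]

for repair in suggest_repairs(first_map):
    print(repair)
```

The rules stay deliberately crude. They turn the map into a short repair list for discussion and do not produce a score.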
The composite logistics firm would likely start with a small repair set: a better category definition, a passage connecting “supply chain” language to the narrower freight-operator problem, updated directory summaries, and a product page block that names route planning, shipment exceptions, and dispatch workflow in one place. Then the same prompts should be run again. The wording may not change at once. Observation is part of the work.
The main danger is impatience. Teams want the prompt set to produce a clean score. GEO is messier. It begins as a record of how answer engines behave when buyers ask imperfect questions. That record is valuable because it shows where the company’s public meaning is sturdy and where it is still drifting.
Before repairing a page, find the buyer questions that can break it. The answer engine will.