an Unlikely Strategist

Category: Uncategorized

Thor’s doggie door
I include these little projects, because the problem solving involved (identifying a need, experimenting with a solution, working within constraints) keeps instincts sharp.

Imagine you have kids and a new puppy. Our kids walk to and from school. They open the front door to get in and out of their house on the way to and from school. We had to chase our puppy, Thor, down the street a handful of times when he sneaked out between the kids legs thru the open door.

Meet thor

Eyeing his escape.

So a doggie door on the front porch.

Constraints:
- No new power tools- just my jigsaw, drill, clamps.
- Less than a hundred bucks if possible for all materials and consumables
- Will survive outdoors in Canada for more than a season,
- Looks like it fits with the wooden railings (brand consistency?)
- Reasonably robust, but relatively lightweight.
- Can be made in a few rain-free April hours.
Design:

The splurge: Fancy Self-closing hinge.

Doggie door latch

Finished Gate, Unpainted (pressure treated wood needs time before painting)

Construction is kind of a basic ladder-like shape matching the existing railing in terms of a riser spacing for 5” center to center. The door is 24” high so as to minimize waste on a 2x2x8” plank, but still match the overall railing dimensions. The frame is made from 2×2 pressure treated lumber to keep the weight down ( the railing on our porch combines 2×4 and 2×2, but the door must span 5’ and only needs to resist a 20lb dog. Weather resistant spring-return hinges and a simple catch ensure the door closes and locks when the kids forget to. To keep framing square and spacing correct,a 3-d printed jig helped to space and square the wood for fastening.

Materials and cost
- ~40 CAD for 5 planks of 2x2x8 pressure treated lumber.
- ~60 CADfor a pair of weather resistant spring return gate hinges (rated for 66 lbs, so overkill but they look real pretty)
- ~4 CAD for a painted steel catch, the kind you see on dog doors in dog parks everywhere.
Jig – showing pilot hole/screw template and join

Jig – the long side positions the wood risers 5″ center to center.

The CAD model

Consumables:

~40 CAD for extra long (6”) jigsaw blades for cutting the lumber. I could have rented a chop saw for $40 but wanted to use what I had.

~3 CAD to 3d print a jig from PVA

With taxes this came to $160 -could probably have gotten it done for under 80 if I used $10 (30lb) painted steel spring hinges in place or the fancy ones

Now, is it puppy proof?

Good enough – First time he saw it, Thor tried to squirm thru only to get stuck between the spacers. Second time he didn’t even try, after the experience of us extracting him. It slowed him down enough so no street chasing required.
April 13, 2026
End of a golden era for enterprise design.

In working on archiver and in exploring the landscape changes of ‘agentic systems’ it feels like we are moving past a kind of golden age of design. It’s a shift that in some ways is the antithesis of the shifts brought by mobile and the ‘consumerization’ of enterprise in the 2010’s.

The golden Age?

Select startups I know. Starting around 2010, enterprise startup/scale ups went from spending 0.5% to spending ~5% of ARR on design. Spending 10x on design didn’t however drive 10x revenue. I expect design spend to drop closer to 1%- Design and research do make a difference.

2010ish to 2025 marked a golden age for visual design for screens. Starting around 2010, many businesses invested in far more thoughtful visual and interaction design. The standardization of high resolution screens, typography as a service, and new web standards allowed precise visual control. In 2006, one of my enterprise clients made 100mm ARR and used me as sole part-time designer for all products. By 2015, their revenue was flat but they felt the need for a 3 person design team to maintain a higher standard of polish. When moving out of consulting back to corporate in 2017, I worked in two companies who struggled to maintain 30mm ARR but staffed 10 person design groups.

For many of these companies, for a lot of reasons, it was important to visually style the controls, navigation etc in much the same way as automotive stylists do for interiors. That gave birth to larger departments and to design systems and process. If you are unfamiliar with design systems, imagine a fusion of style guides, template libraries, and branding systems intended to make sure every designer in these big departments created consistent output.

I observe companies invest in design when design drives sales (thru a perception of professionalism or social conformity) renewals (via ease of use), or referrals (via utility). For design to drive sales, some human needs to see it and feel it.

Enter the LLM \

Many fellow designers look at tools like vercel, figmas ai tooling, magic patterns. Honestly they feel threatened. That misses the real dilemma- in practice tools like magic patterns help give voice to ideas that PMs and Engineers have. They promote great discussion, and help develop concepts. A bigger issue comes from how large design ‘brigades’ and design systems de-emphasize development of taste or visual judgement for consistency of production. Essentially when you turn creativity into assembly you have reason to feel threatened.

More design automation, fewer screens, fewer designers.

Agentic systems have the potential to vastly reduce the overall number of human-computer interactions which require a screen or physical interface, and lessen the sales value and budget given to design of interfaces in many business-to-business companies. Building archiver gave me a front row seat to agentic software engineering, the most mature of these uses. My greatest frustrations came from where I had to jump to different SaaS user interfaces, just to pull data to feed to Claude to help diagnose issues. Wearing an engineering hat- it would be so much nicer if the coding agent simply interacted with these services (more mcps!) in the same headless and seamless way it can on my local dev box. OpenClaw is another example of where this is going: headless access to email, calendar, social media posting etc… When they work, tools like OpenClaw reduce the number screens vs. machine interfaces. The automation from these systems may reduce the need for human centered screens in enterprise by 10x. Fewer screens mean fewer designers, fewer design managers needed for each dollar of revenue. Discussing specific issues of trust with agentic systems is another post entirely.

We’ve seen this before

This disruption reminds me of the first big shifts I saw as a designer. From the mid 1990’s thru 2000 two similar shifts occurred: Mid-career designers either adapted using computers for print or saw their careers cut short. A second transform happened in the dot-com era when much of the bread-and-butter work for print design and advertising collapsed in favor of web presence.

One other danger which I already see is the temptation for designer to drive novelty at the expense of usability. This is all too common in automotive interior design, particularly infotainment. I see it creeping in to general screen design.

We are not there yet; there is still a mindset where each product manager(pm) pairs with a designer. But as a profession, designers would be remiss to see the winds change, just like they did 26 years ago.

March 26, 2026
Figure studies

Over the years I’ve been fascinated in exploring composition and figure in a wide space that tends to martinalize the figure itself. The convention of figure studies is all about a kind of central position where the gaze centers on the figure. I’m more interested in what the props say, and how they work formally.

November 10, 2025
Case Study: User Research Strategy for BenchSci
An overview of high level strategy, with concepts of operation for illustration.

First Principles: Why do we need user research?

Without constant buyer and user feedback how would companies dial in and course correct product?  Researchers help answer the endless stream of questions around what to make and how to build. Great researchers translate learning to words and ideas anyone in a company grasps. Great researchers answer questions with data when decisions are made and not 3 weeks later. Great researchers can anticipate helpful inputs and work independently to answer questions not yet asked. When teams don’t have answers or don’t seek them out, they fly blind.

Scope: where we play.

We answer questions involving pharma customer users and buyers on behalf of anyone on a product development team (CFT) or senior leadership, with primary accountability to product management. We deliver first insights to most questions within a day and build confidence over 1-2 weeks.

Okay, break down why that matters?

Speed is key: History tells us that many successful companies first succeed because they are able to experiment, build, and refine more efficiently (and test more ideas) than competitors. Companies rarely get the right answer the first time.

Keep customers and buyers close. Know the difference: Successful teams frequently and opportunistically reach out to customers and develop proxies (e.g. academic scientists) to speed up work. Anyone, Product Managers, Designers, Engineers, could participate in research: Attending calls, putting ideas to customers or proxies, independently learning. No priesthood sits between the team and customers acting as a sole source of contact and interpretation. We believe teams work best, align well, and stay focused when each member hears directly from customers.

Be cash aware, but not miserly: BenchSci is not yet cash flow positive, nor is it entrenched. We run on a time clock. We earn seconds on days where deals sign, funding arrives. Market challenges cost us seconds. Because this clock never stops, we always need to consider the opportunity cost of our operational decisions. Have all the people you need and no more. Find a good compromise between speed and precision. If in doubt, go cheap and fast.

So how do we enable agile-speed answers …with Pharma?

What does the customer landscape look like? Pharma scientists as a group are very distinct from consumer software targets like tax prep:
- Not easy to recruit thru un-moderated testing pools (we tried)
- Show data they know in prototypes. Scientists respond very differently to data they know or techniques they work with frequently.
- Expensive to recruit outside of customers. Engaging a research recruiting agency costs ~$1000 per hour, with a 2-4 week recruiting time, and limited ability to record
- Favors, no comp. Customer scientists cannot be paid in any way for feedback – they volunteer out of good will or self-interest
- Busy Scientists are busy, well paid, professionals: there are many calls on their time and many companies looking to sell or collect feedback. Setting up calls can take a week or two at best with scientists we have an established relationship with.
- Secrecy: Drugs take a long time to reach the market, with significant first to market advantage. Research takes a while but is easy to ‘steal.’ So Pharmas build walls of access and protect their data.
- Difficult access: Onsite visits, particularly to facilities with manufacturing and research often require pre clearance and elaborate preparation, with limited ability to explore. Labs are often off limits, both due to secrecy and the polarizing nature of experiments which may use live animals.
Other qualities of our offering affect research:
- Low organic usage per user (1-2 active monthly sessions for engaged users. A typical session may span many looker sessions)
- Low total numbers of very high value users- ‘lead scientists’ are equivalent to engineering or product directors in a company like ours. When we are successful we may have thousands of users: a company like Facebook has billions. That changes how we can learn.
- Flows are exploratory, not linear or narrative. Unlike shopping checkout, business forms, or other linear flows, scientists use us more as a non-linear workspace. This makes it harder to learn with high polish, mostly linear, limited interactivity, slide-like prototypes of the kind made by Figma.
How does the landscape shape our response? (our strategy for excelling in this environment)

Solving for: low numbers of high value, difficult to access, unpaid customer participants.
- Pre-recruited proxy users – fast access, allows us to answer select questions with good correlation to ideal participants. Informally we call our set of industry scientists and academics our “academic pool,” and we compensate so we get reliable participation.
- Make the most of time: focus on techniques which extract the richest data per session and make participants feel value so they come back(ex. fewer but high quality moderated sessions which make use of many sub techniques as opposed to many surveys with low response)
- No touch deep observation (aka FullStory, Looker).
- Rely on available artifacts. We might not go to a lab, but we can see what story the slides, figures, sheets, and other artifacts tell about a process. For example, governance reports shed great insight into how customers value and contextualize different scientific information.
Solving for: Infrequent usage
- Separate training from real sessions – cross software data correlation (SFDC training dates, users, looker data.)
- Extend use to academia to scale user numbers – i.e. we get more users we can learn from
- Automated question capture – we capture when someone puts a question to XXX, inputs a structured query to YYY or ZZZ, or any future input in a yet to be developed product. We then passively track their response to our output.
- See every session – ex. fullStory segmenting
Solving for: Scientists respond differently to data they know
- Ensure scientific accuracy of research materials (low fidelity, high fidelity, engineered prototypes, cards for sort etc…)
- Where possible, match the materials to the participants background – e.x. Show publication examples relevant to the scientist’s specialty
- In prototypes, favor data accuracy and interaction over visual fidelity. I.e. A spreadsheet or informal engineering prototype may be richer for us than a refined figma. Vibe coding tools such as vercel can help in specific cases.
Solving for: Keeping up with engineering speed (agile)
- Decentralized / High Autonomy — Dedicate a researcher to each team that matters. Give them full authority to do the work. Emphasize fast, informal, flexible working where insight for a question happens in a day not weeks. Emphasize direct PM/Engineering contact with users to pre-build empathy. These are hallmarks of a research operations concept called continuous discovery/validation.
- Keep cost per insight or session as low as possible- Lower cost means more sessions
- Focus team core on enabling, coaching, and unblocking. For example: securing budget and executive buy-in, coordination of initiatives with go-to-market(GTM), coaching and teaching.
- Reduce long lead/ tail items on research through automation or process reduction. E.x. recruiting and de-emphasized write-up.
November 4, 2025
AI for User Research @ BenchSci: Tactics that worked.
TLDR: For our use: maybe 5% more speed with acceptable quality. Biggest benefit: coding for spreadsheets / data analysis.

Context –

Until recently, I ran user research benchSci, and mentored designers and product managers. We put heavy emphasis on effective decision support for product strategy (building the right things) in addition to the usual kinds of design/PM support. We did not do market/communications research. I’ll try to limit this to user research and not general productivity. Doctrinally we followed the principles of continuous discovery / validation with a small user base.

Rules of the road for the Research team

AI helps if it accelerates some balance of speed or quality while respecting the rules.
- If you bring data to stakeholders too late, it’s not helpful.
- If the data you bring is misleading, its not helpful.
- numbers speak powerfully, but avoid misleading quantification.
- avoid jargon- always assume you speak to a design/product outsider with no interest in the finer points of CIs and statistical significance.
- stakeholders only tend to check work when the conclusions are inconvenient and trust it when they agree with it. Have your homework ready.
We Explored AI for
- rapid prototyping
- synthetic users
- data analysis
- automating review/extraction from recorded calls
- LLM as judge for design evaluation
- writing for UX
- general writing
- perf reviews
- making historical research accessible.
Not Explored

We did not explore uses where we already had reliable and effective automation or absolutely required human oversight (recruiting /compensation pipeline)

Where LLMs really helped:
1. Writing for UX. Designers are often tasked with key bits of text in the UI. As a group, designers do not write well. This was a big win for new products/features.
2. For us, 1 in 10 studies demanded low visual fidelity, highly interactive prototypes, with domain accurate content. Figma sucks at that, particularly tabular data and data visualizations. Vibe coding really helped that. This benefited new / early ideas more than shipped features in refinement/optimization. When it made the difference, it really made a difference.
3. Spreadsheet like a pro.
  There was always some project that required fusing different data, even for things like survey data fused to SFDC outputs. Advanced spreadsheet work is a kind of coding. The research team can do a little stats, we’re perceptive observers, and good interviewers. Not everyone on team can own spreadsheets like a financial analyst. This helped a lot with a small number of critical projects.
4. Tone on general comms-
  Used ‘LLM as judge’ on communications. For critical written company communications and things like performance reviews, LLMs definitely helped avoid gotchas— they didn’t improve speed or productivity but made sure poorly chosen words didn’t trigger morale or cause defensiveness.
5. Making historical research accessible
  We tried a few approaches to this, starting with adding research reports to a notebookLM. In the end we found more consistent results by translating research reports to markdown and bullet pointing them for LLM. Getting the organization to go and ask was an institutional barrier we worked to overcome: PMs still preferred to use the researcher on team as a human encyclopedia in meetings. Senior leaders, when they chose to ask approached me directly.
Shows some promise, but not there for us

While applying just a foundation model might not work… there is much to try with hybrid approaches.
1. Synthetic users for ‘attitudinal/behavioral’ questions
  (i.e. ‘what are your biggest challenges with ‘x’ )Again, this is a mixed bag depending on how much underlying high-quality data we had. It felt like we never got to critical mass – it works great when say a newly hired PM or designer asks an ‘old’ question where established data exists. We used a pseudo-RAG approach (.md files accessed via cursor prompt against different models) It wasn’t as good as our control (‘ask a researcher thru slack’) in terms of quality, but response time was better.
2. Getting insight from interview transcripts. Ideally automated insight.
  This is another one we really want to work, especially in an automated way. We had a lot of recorded conversations with customers well beyond research sessions. Both the services team and ours would love to get data from them without the time cost of manual review. So, we wrote some custom prompts as well as using dovetail’s evolving toolset. We saw improvement, but not enough maturity to save time: Commonly extracted data confused speakers (customer vs us), generalized away details, and depended on high quality transcripts. Bio research is a jargon heavy, with researchers from all over the world who have different command of English. It wouldn’t save us enough time while delivering accurate insight. This wasn’t strictly an LLM problem, it’s a data quality problem.
Unhelpful with state of art in 2025

We saw a good chance outputs may drive confidently bad decisions. Not confident that base foundational models will focus on these – it may require custom / external evaluators.
1. LLM as heuristic judge for basic usability issues.
  We really wanted this to work. This is a subcategory for synthetic users. In early designs you often find a lot of issues with ‘basics,’ well established heuristics. We tried a mix of custom prompts around this, submitting single screens. The hope was to increase speed for designs in a similar way to unmoderated testing but with somewhat better precision and a lot more efficiency. The LLM tended to miss on visual and behavioral, focusing on words in the screens or screen images.
2. General writing of reports for humans.
  Thru trial and error we found the most effective reports for our company tend to be highly visual with very few words (also memes), accompanied by a short slack summary. LLMs are great writers, but… summaries tended to miss the points we were trying to make. It’s very likely that newer tools for presentation creation will help here.
3. Study design -. In general, it didn’t save time or improve quality
4. Agentic Interface Design – Depends on the end goal: If you seek to save hours on Figma production it may be possible with future models that encode and ‘speak’ interaction, cognitive psychology, and visual concepts and not just memorized markup. Compared to 20 years ago a lot of design today is not that experimental. Tools like magic patterns do a pretty good job of pulling from what’s out there and offer a nice way to use nondeterminism to generate variations. That can save time and help communication between non-designers and their teams.
5. Generating personas or foundational research with just a foundation model search.
  When asking foundational models, we got surprisingly accurate reads on how preclinical pharmaceutical research works at a high level. When asking models to create personas, the form was amazing, but the data was only between 50% and 85% accurate when compared to our data. I would have big concerns that PMs and Designers using this data without secondary checks would make consistently bad decisions with confidence.
One final thought: Don’t let AI isolate you from customers.

Anytime you put a layer between PMs and users is a fail. Face to face interviews drive alignment far more than any research report. In my opinion listening to and observing real users and customers directly and frequently makes the biggest difference in product quality and fit. The more isolated PMs, execs, and designers are from customers, the worse their decisions become. Many AI tools for research position themselves as a timesaver in automating every thing from study design to final reports. They typically position to the haerried PM or the research manager.

As it stands execs don’t often interact with customers. Bad PMs certainly don’t.

Apologies for the long rant,
Jeremy
November 4, 2025