• Apologies for the delayed release on this one – I know you all were waiting with bated breath and I dramatically overplayed my hand last time when I promised this update “next week”.

    My rationale for why it was cool for me to keep you waiting? My daily AI drivers have changed multiple times over the last few weeks, so it made sense to try out the new toys. Also, I tore my ACL & had to have surgery, both kids have been sick, ***load of work, etc. You know what they say about excuses, right?

    Anyway, just like Marc Benioff, I’ve been using ChatGPT basically all day, every day for 3 years – and now I’ve mostly ditched GPT for Gemini & Claude. Luckily for OpenAI, I’ve kept my max-tier sub because I invested in creating a ton of Custom GPTs. And their Codex CLI product still feels like the most reliable coding agent, which I have pretty extensively customized for several of my current coding projects. But I’m not sure how long OpenAI can keep me with how fast the others are gaining ground & providing more bang for the buck.

    Everyone is talking about how Gemini has come out strong this AI season, and I can’t help but agree. They’re offering a really compelling AI product suite that is starting to integrate seamlessly with all the Google products people already use – and in ways that are actually useful. I’m still just scratching the surface myself, but Gemini has integrated with Google Photos, Google Earth, Google Maps, Google Home – all us lowly Android users get Gemini as our assistant…the list goes on. You don’t have to be an industry insider to see that Google’s position as an already-profitable company with many revenue channels seems far sturdier than OpenAI’s, which basically has the-currently-unprofitable-ChatGPT…and nothing else. The same could be said about Anthropic – and yet both of these companies seem to have no problem finding a seemingly infinite, voracious stream of capital to feed a beast driven more by FOMO than P/E.

    From OpenAI’s perspective, Anthropic’s Claude should be viewed as just as potent a competitor as Gemini. The recent Opus 4.5 update is a serious upgrade over previous models that puts it on par with Gemini for reasoning, and arguably ahead at certain tasks involving code or finance. Claude also gives me by far the best results for productivity tasks like editing documents, because its Claude Code DNA gives it a far greater degree of precision than the other major LLMs. Anthropic has been positioning itself as the ‘AI for business’, and you can really feel that when you’re using the product.

    So without further ado, here’s my personal ‘power rankings’ for my daily drivers:

    RossAI Power Rankings*

    General Use

    • Gemini 3.0 – now the best at most things. Best reasoning, very fast, gigantic context window. Lots of additional features – I use the Gemini Live voice-to-voice a lot.
    • Claude Opus 4.5 – the best at precision tasks, specifically business / finance or coding. Opus is actually pretty incredible; the only downside is you get less bang for your buck here vs the entire Gemini AI suite.
    • ChatGPT – now pretty much only if I’ve already built a Custom GPT or have a detailed folder for that task. Its 5.2 update only just came out, so we’ll see how it compares to the massive upgrades Google & Anthropic just put out.
    • Grok, Meta, DeepSeek, etc. – I honestly rarely even boot these other than out of curiosity to see how their updates are coming. Grok/Meta are fast if you want a large data table formatted, but other LLMs do pretty much everything better right now.

    Video

    • Google Veo 3 – best EZ-button video generator right now. Incredibly lifelike humans, but requires detailed prompting and patience.
    • Kling AI, Higgsfield & Weavy AI – when you need long-form AI video, Kling, Higgsfield and Weavy are the way. They require a little more knowledge but let you get incredibly specific results.
    • OpenAI Sora – 2nd best EZ-button video generator. Fun horserace between Sora and Veo, but right now I think Veo is slightly ahead. This seems to literally change monthly, so plan on following them both if you’re in this biz.
    • Stable Diffusion – what a lot of pros and internet memes are using to create crazy AI videos with real-world figures, violence, etc…because it has no content filters. Open source & runs on user machines.

    Photo

    • Gemini Nano Banana – delivers incredible photorealism, complex scenes, continuity. It’s crazy how far this has come.
    • Midjourney – when you need something more vivid / fantastic / creative, MJ is the way. Can take a little more tuning than Nano Banana and might take more tries to get something that looks ‘real’ if that’s what you’re going for.
    • Adobe Photoshop – if you want to edit existing photos using AI, Photoshop’s capabilities are now at the near-magical level. It is truly insane how easy it is to do things I used to describe to clients as impossibly expensive.
    • Stable Diffusion – comes in both image and video flavors. I don’t find myself needing to use it often, but if the aggressive content filters elsewhere are tripping you up on a project, this is where I’d go next.

    Design

    • Adobe Firefly / CS AI tools – bring generative AI into your existing power suite, which lets you use a high degree of precision with your generative work. Of all things, AI still kinda sucks at some types of design – looking at you, logos – so most designers will want to use it for some mechanical elements but not everything.
    • Canva – stepping up its ability to do layered generative work, similar in theory to Adobe, and it’s much easier to work with.
    • Figma – generative UI/UX features can feel like magic for web / app designers.
    • Claude – actually great at web design / UI prototypes.

    Coding

    • OpenAI Codex / Claude Code – huge battle between Claude Code and Codex; I’ve switched back & forth between them multiple times. The consensus seems to be – as of today – that Opus 4.5 is smarter / more elegant, but GPT-5.2 Codex might be more reliable at implementation.
    • Cursor, Windsurf, Lovable – make vibe coding a little more accessible with a nice web UI wrapper and let you easily swap between models. But generally speaking, you pay more per task to use them vs getting your coding agent straight from the source.

    Organization

    • Notion 👑 – I’ll write a whole article at some point about how much I love Notion. Notion is how I keep everything I do straight, and I also use its platform AI features constantly because it lets you edit datasets super easily.
    • NotebookLM – a Google product that’s well integrated with Gemini and might work better for some people who want a simpler system than Notion.

    *as of 12/19/25

    Honorable mention / tool that doesn’t quite fit a category: ElevenLabs. ElevenLabs does SOTA voice generation, which has been incredibly impactful in several of the businesses I’m working on. It lets you create extremely realistic voice clones that can read scripted text, and it also lets you very easily create voice agents that can access a custom knowledge base you provide and do some pretty serious heavy lifting in terms of conversational engagement with customers / suppliers / etc. I’ve used it quite a lot in several production projects and I think it’s one of the stronger AI platforms around.

    These are totally unofficial and will probably be completely outdated starting tomorrow. There are, however, a few places that actually stay very on top of this stuff that you can bookmark if you want to be really informed. It will make you super popular at all the parties, I promise – everyone loves talking about this almost as much as they love hearing about my fantasy football team.

    LMArena – Overview Leaderboard

    Artificial Analysis – https://artificialanalysis.ai/leaderboards/models

    Vellum – https://www.vellum.ai/llm-leaderboard

    LLM Stats – AI Leaderboards 2025 – Compare All AI Models

    I probably missed some good ones – let me know in the comments if there are some others I should have included.

    Final thoughts for this one – I think most of the world doesn’t realize how many LLMs there really are. While my list above is basically all ‘frontier’ models that are closed-source and ‘best of the best’, there are now thousands of open-source LLMs and SLMs (small language models), many of which are completely free if you have the hardware to run them.

    Check out https://huggingface.co/ for a gallery of some of the more popular ones.

    A lot of these open-source models can do basic processing work as well or better than frontier models, and can be run on a pretty standard desktop computer.

    The margin between these models and the frontier models broadly seems to be shrinking rather than growing. In many cases, people are starting to realize that highly specific, task-oriented SLMs are way more efficient for tasks like data processing and computation than LLMs. The analogy that stuck with me was “using an LLM for basic math is like asking an English PhD to do algebra.” They can probably do it, but you don’t need to know Shakespeare to populate a spreadsheet.

    For the most part, I still gravitate towards the big guys, but I have definitely been playing with Ollama and some math-specific models on some of the AI finance projects I’m working on.
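    If you want to poke at this yourself, here’s a minimal sketch of what calling a local model through Ollama looks like – assuming Ollama is installed and serving on its default port, with a model already pulled (the model name and prompt are just placeholders):

    import requests

    # Ollama serves a local HTTP API on port 11434 by default.
    # This asks a small local model to do a basic data-processing task.
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "llama3.2",  # placeholder - use whatever model you've pulled
            "prompt": "Sum this column of numbers: 12, 47, 9, 131",
            "stream": False,  # return one complete response instead of a token stream
        },
    )
    print(response.json()["response"])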

    Long story short, as we all know, it’s a fast-moving world to try to keep up with.

    What I’ve learned is basically just to budget time for consistently researching where the bleeding edge is, and to try to prepare for switching costs because I know I’m going to change tools constantly. Now, when I’m building Custom GPTs or Gems, I try to put all my source code / instructions into Notion first to keep everything platform-agnostic.

    Alright that’s what I’ve got for today. Let me know if there are AI tools you’re using that I missed, or if you think I’m completely wrong on my takes.

    To keep myself honest I’ll say there’s probably no way I’m putting up another article next week, but in the next few weeks I’ll organize and share a database with a bunch of the AI accounts I follow on various platforms to stay informed.

    Cheers and happy holidays!

  • Frameworks

    Here’s my personal framework ladder for applied AI, based on complexity:


    1 – Fundamentals – Prompting

    2 – Complex Applications – Chaining Prompts, Generative Apps

    3 – Agents / Automation – No-code / low-code

    4 – Advanced Automation – CLIs, Custom Automations & Agents (pro-code)


    You might not intend to build an agent or learn to code.

    But understanding how AI automations work opens up a whole new way to see problems.

    That perspective creates new possibilities.

    We’ll get to that soon. But the truth is, you can accomplish a hell of a lot by just getting good at the fundamentals. And you’ll never do the advanced stuff well without mastering the fundamentals.

    In boxing, it’s the jab. In LLMs, it’s prompt structure.

    So here are some of my thoughts on getting the most out of your prompts.


    LLM Structure

    Caveat here – I think ‘Prompt engineering’ is going to matter less as the tech gets better.

    Big AI is working tirelessly on making every new model better at reading our most garbage inputs.

    But there are some fundamental concepts that have staying power.

    Here’s my TLDR for LLMs in November 2025:

    • Role Prompting – almost always worth including; e.g. “You are a McKinsey consultant”.
    • Return Format – where you want to spend your energy for complex prompts.
    • Context Management – LLMs have memory limitations. If you pass the limit, they blow up.
    • Hallucination and BS – LLMs make stuff up. You have to anticipate and mitigate.

    Role Prompting

    Role Prompting is so effective because it gives the most useful context with the fewest words.

    “Role: creative director”.

    “Role: tax accountant”.

    Two very different roles. Two totally different perspectives.

    Different skills, different ways of speaking — and totally different knowledge bases for the LLM to access.

    Role Prompting is a shortcut for the LLM that immediately orients so much context for what it’s doing, how it should process, and what language it should use in its response.

    It’s the best bang for the buck.

    So it’s almost always worth assigning a role at the beginning of your prompt.

    When you tell the LLM “Role: Financial Analyst” in 3 words you’ve already conveyed a whole world of expertise it should invoke in whatever response follows.
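    For the API-curious, here’s what that looks like in code – a minimal sketch assuming the OpenAI Python SDK, with the model name and prompts purely illustrative:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative - swap in whatever model you use
        messages=[
            # The role goes in the system message: three words that orient
            # expertise, vocabulary, and tone for everything that follows.
            {"role": "system", "content": "Role: Financial Analyst"},
            {"role": "user", "content": "Walk me through the red flags in this balance sheet: ..."},
        ],
    )
    print(response.choices[0].message.content)

    In the chat window, the same trick is just leading your prompt with the role line.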

    I’m far from the only one to talk about this – the model makers all publish prompting guides straight from the horse’s mouth that are worth a look.


    Return Format

    LLMs are incredibly good at guessing. But they can’t read your mind.

    They work best when they understand exactly what output you’re trying to fit.

    The more you can detail the exact return format, the happier you are going to be with the results. Specificity of return format is like superfood to LLMs.

    You don’t need a fancy, elaborate prompt structure – you just need to be specific.

    Prompting “give me a 5 page report” is far superior to “give me a detailed report”.

    You’re helping cut out the guesswork, and it will focus more energy on getting the right content.

    Asking for results in a table format with defined fields is even better.

    Here are some examples.

    A)

    “Read the change history in this document and give me a 5 page report on what has happened over the last 7 days.”

    B)

    “Return the results in table format with the following setup:

    Property Name | Year 1 | Year 2 | Year 3 | Year 4 | Year 5
    Net Operating Income by year |
    Net Cash Flow by Year |”

    C)

    “Score the following responses based on the categories I have defined above & produce strict, valid JSON matching exactly the following structure and keys:

    {
      "question_content": {
        "q1_why_acquisitions": "",
        "q2_invest_100m": [],
        "q3_new_deal_criteria": [],
        "q4_strengths_weaknesses": []
      },
      "question_approach": {
        "q1_why_acquisitions": {
          "communication_clarity": "",
          "conciseness_efficiency": "",
          "specificity": ""
        }
      }
    }”
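    One more note on example C: when you ask for strict JSON, it’s worth validating the reply in code before you rely on it. A minimal sketch – raw_reply is a stand-in for whatever your LLM client actually returned:

    import json

    # Stand-in for the model's reply; plug in your client's output here.
    raw_reply = '{"question_content": {}, "question_approach": {}}'

    expected_keys = {"question_content", "question_approach"}

    try:
        data = json.loads(raw_reply)  # fails fast if the model returned invalid JSON
    except json.JSONDecodeError as err:
        raise SystemExit(f"Model did not return valid JSON: {err}")

    missing = expected_keys - data.keys()
    if missing:
        raise SystemExit(f"Valid JSON, but missing keys: {missing}")

    print("Reply matches the expected top-level structure.")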

    Context Window

    They say brevity is the soul of wit.

    And so it is with AI. Pruning makes better prompts. If you don’t need the word – cut it out.

    Just like any good copy editor would tell you, every word costs money.

    LLMs are still fundamentally limited by their “context window”. This is another way of saying there’s a limit to how much information they can store in their memory before they start to break down.

    Have you ever done a really long LLM chat where the responses started to get worse and worse? That’s because you’re pushing the context limit.

    The bigger your LLM conversations get, the more the LLM will hallucinate.

    Every word in a prompt or an artifact you’ve uploaded to an LLM takes up context.

    That means there is a sweet spot between giving too much information and not enough information.

    Good reference data is essential to get the answers you need, but if you provide too much data, you’ll overload the context window and you’ll start to get poorer results.

    So you want to keep LLM conversations lean.

    Don’t go off on a million tangents in one megathread. Start a new conversation for each new topic.

    Often I’ll run a two-step process: first, I ask an LLM to process a large data set and come up with a consolidated analysis / summary.

    Second, I take that summary to a fresh conversation, where the next LLM can leverage the data summary without having to burn through all its context on the raw data.

    This two-step approach can dramatically improve results if you’re dealing with big documents or large, spreadsheet-like datasets.
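    Here’s that two-step pattern as a minimal sketch, again assuming the OpenAI Python SDK (the model name, file name, and prompts are illustrative):

    from openai import OpenAI

    client = OpenAI()

    # Step 1: spend one conversation's context on the raw data
    # and get back a compact summary.
    raw_data = open("big_export.csv").read()  # hypothetical large dataset
    summary = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{
            "role": "user",
            "content": "Consolidate this dataset into a one-page analysis / summary:\n" + raw_data,
        }],
    ).choices[0].message.content

    # Step 2: a fresh conversation works from the lean summary instead of
    # the raw data, so its context window stays mostly free.
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Using this summary, flag the three biggest risks:\n" + summary,
        }],
    ).choices[0].message.content
    print(answer)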

    One of the easiest hacks to manage this is the “handoff prompt”.

    If your conversation is getting long-winded, here’s a prompt to transition to a fresh conversation:

    “prepare a detailed 1-3 page handoff prompt explaining what we’re doing and what we still have left to do, along with any important context so I can continue this conversation in another chat.”

    Hallucination & BS

    AI lies with high confidence and zero remorse. Tell it to cite its sources.

    Ask it to double-check every important data point against multiple sources.

    Ask it to provide a confidence rating on the accuracy of the information. This seems to trigger some statistical analysis neurons in the machine that help it call out its own BS before it goes further.

    Cross-check your work by putting content from one LLM into another.

    For example, take content from ChatGPT into Gemini.

    Tell Gemini “fact check this response from ChatGPT against live sources, verify all information via multiple sources, and cite references in your response to me”.

    It will go HAM looking for mistakes and they seem to love telling on each other.
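    Here’s that cross-check as a minimal sketch, assuming the OpenAI and Google Gen AI Python SDKs (model names are illustrative, and note the API won’t actually browse live sources unless you’ve enabled a search / grounding tool – treat this as the skeleton of the workflow):

    from openai import OpenAI
    from google import genai  # the google-genai package

    openai_client = OpenAI()  # assumes OPENAI_API_KEY is set
    gemini_client = genai.Client()  # assumes GEMINI_API_KEY is set

    # Step 1: get a draft answer from ChatGPT.
    draft = openai_client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": "Summarize 2024 US EV sales trends."}],
    ).choices[0].message.content

    # Step 2: hand the draft to Gemini and ask it to tear the answer apart.
    review = gemini_client.models.generate_content(
        model="gemini-2.0-flash",  # illustrative
        contents="Fact check this response from ChatGPT, verify all information "
                 "via multiple sources, and cite references in your response:\n\n" + draft,
    )
    print(review.text)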


    LLM Do’s & Don’ts

    Don’t: Assume the first response is the best response.
    Do: Expect to correct and prod the LLM a little bit.

    Don’t: Assume the LLM is 100% accurate.
    Do: Cross-examine. Ask the LLM to fact-check itself.

    Don’t: Assume the AI can read your mind.
    Do: Provide a detailed explanation of how you want the AI to answer your question (“Return Format”).

    Don’t: Try to ‘one-shot’ solutions to complex multi-step problems in a single prompt.
    Do: Break up complex tasks into multiple discrete prompts that build on each other & chain them together.

    Don’t: Continue iterating forever in an endless conversation that blows through your optimal context window.
    Do: Ask the LLM for a “handoff summary” or “transfer prompt” to start the conversation with a fresh context window.

    Don’t: Give the LLM too little context or too much context.
    Do: Find the Goldilocks window where you’re giving all the information it needs, with no added fluff or redundant data.

    Imperfect Genius

    One of the biggest factors in prompting success is mentality.

    Working with LLMs means working with an imperfect but incredibly powerful toolset that requires some persistence & effort to get that value back.

    AI can’t read your mind…yet. So until we all get neuro-linked, you have to articulate.

    LLMs are also lazy by design. They’re programmed to save energy and calibrated for low-effort inputs. Be prepared to say “that’s good, but keep going”.

    Think of an LLM like a PhD-level intern: incredibly bright, a fast learner, but sometimes lacking the common sense earned through experience.

    Working with a person like this, you would be aware of the need to provide a little more guidance, and that they might occasionally sound persuasive while being totally off-base.

    But given a few extra nudges, they can create some impressive results. Basically AI tends to be book smart, but not street smart. Proceed accordingly.

    So you should expect that the first pass will often come back incomplete, ill-conceived, or simply half-assed.

    This doesn’t mean the LLM is incapable – it means it wants you to confirm it’s on the right track before expending more energy. Which, coincidentally, is the same thing interns tend to do.


    Thanks for reading and hope you got something out of this!

    Next week, I’ll share some of the generative AI apps I follow and use every day.

    Have an AI application you want me to cover?

    Let me know in the comments or shoot me a DM.

