OpenAI's latest models: o3, o4-mini, and GPT-4.1
ChatGPT remembers your workflows—plus OpenAI's latest models, Google's AI advancements, robotics breakthroughs, and enterprise AI applications.

Proudly sponsored by ConstructAI, brought to you by Weston Analytics.
Morning, Project AI enthusiasts. Your stories for this week:
OpenAI's new models and tools (o3, o4-mini, GPT-4.1)
Gemini's image segmentation capabilities
Google's legal challenges
Enterprise AI adoption trends
Flux check-in
OpenAI's latest models: o3, o4-mini, and GPT-4.1
OpenAI has released a host of new models and tools, including GPT-4.1, new reasoning models o3 and o4-mini, and Codex CLI. These models bring significant advancements in reasoning, tool use, and performance across a wide range of tasks.
The details:
Model capabilities: For the first time, OpenAI's reasoning models can agentically use and combine every tool within ChatGPT—including searching the web, analysing uploaded files with Python, reasoning about visual inputs, and generating images.
Pricing details: o3 is priced at $10/million input tokens and $40/million for output tokens, with a 75% discount on cached input tokens. o4-mini is more affordable at $1.10/million for input and $4.40/million for output.
Performance metrics: The models feature a 200,000 token context window, 100,000 max output tokens, and a May 31st, 2024 training cut-off (same as the GPT-4.1 models).
New feature: The OpenAI API can now optionally return reasoning summary text, providing more transparency into the model's thinking process.
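For back-of-envelope budgeting, the per-token prices above reduce to simple arithmetic. Here is a minimal sketch in Python (prices are taken from the bullets above; the helper name and structure are our own, not an OpenAI API):

```python
# Rough cost estimator using the USD-per-million-token prices quoted above.
PRICES = {
    "o3": {"input": 10.00, "output": 40.00},
    "o4-mini": {"input": 1.10, "output": 4.40},
}
CACHED_INPUT_DISCOUNT = 0.75  # 75% off cached input tokens

def estimate_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Return the estimated request cost in USD."""
    p = PRICES[model]
    fresh_input = input_tokens - cached_input_tokens
    cost = (
        fresh_input * p["input"]
        + cached_input_tokens * p["input"] * (1 - CACHED_INPUT_DISCOUNT)
        + output_tokens * p["output"]
    ) / 1_000_000
    return round(cost, 6)
```

For example, a full 1M-token input on o3 with everything served from the cache would cost a quarter of the headline price.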
Why it matters for project delivery professionals
For Project Managers, these advancements represent a significant opportunity to enhance project delivery workflows. Project Managers can leverage these models to automate documentation, risk assessment, and status reporting.
Software Developers will benefit from improved code generation and debugging capabilities, potentially shortening development cycles. System Architects can integrate these models into existing workflows to create more intelligent systems that adapt to changing requirements with minimal human intervention. Teams should evaluate how these capabilities might reshape their technical roadmaps and skill development priorities.
Rabbit Hole:
Gemini's image segmentation breakthrough
Google has quietly added a powerful new feature to the Gemini 2.5 series: image segmentation capabilities. Beyond just generating 2D bounding boxes of relevant subjects, Gemini 2.5 models can now create detailed segmentation masks. This represents a significant advancement in computer vision capabilities at an impressively affordable price point.
The details:
Key feature: For image inputs, Gemini 2.5 models can now be instructed to generate not only 2D bounding boxes of relevant subjects but also create detailed segmentation masks.
Pricing details: Using Gemini 2.5 Flash in non-thinking mode, segmenting an image costs roughly 0.0119 cents (just over a hundredth of a cent), making this advanced capability extremely affordable.
Technical implementation: The models likely use a tokenizer with specialized vocabulary entries that represent coordinates in normalized image-space and codewords used by a lightweight referring-expression segmentation vector-quantized variational auto-encoder (VQ-VAE).
Developer tools: A browser-based JavaScript tool has been created that talks directly to the Gemini API, allowing developers to experiment with this new capability without complex setup.
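In Google's published examples, the model responds with a JSON list in which each entry carries a bounding box normalised to a 0-1000 grid, a text label, and a base64-encoded PNG mask. A minimal parsing sketch (field names follow those examples and may differ in your own responses; the function is our own illustration, not part of the Gemini SDK):

```python
import base64
import json

def parse_segmentation(response_text, width, height):
    """Parse a Gemini-style segmentation response into pixel-space results."""
    results = []
    for item in json.loads(response_text):
        # box_2d is [y0, x0, y1, x1], normalised to a 0-1000 grid
        y0, x0, y1, x1 = item["box_2d"]
        results.append({
            "label": item["label"],
            "box_px": (
                int(x0 / 1000 * width), int(y0 / 1000 * height),
                int(x1 / 1000 * width), int(y1 / 1000 * height),
            ),
            # strip any data-URI prefix, then decode the PNG mask bytes
            "mask_png": base64.b64decode(item["mask"].split("base64,")[-1]),
        })
    return results
```

From there, the decoded PNG can be handed to any imaging library to overlay the mask on the source image.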
Why it matters for project delivery professionals
For Engineers and Software Developers working on visual data projects, this breakthrough provides an affordable way to implement advanced computer vision capabilities with minimal overhead. Project Managers can now include sophisticated image analysis in project scopes without significant budget increases. Technical Architects can design systems that leverage these capabilities for quality control, visual inspection, and automated documentation of physical assets. Development teams should consider how these affordable segmentation capabilities could enhance existing projects or enable entirely new features that were previously cost-prohibitive.
Rabbit Hole:
Together with Cogram
Power your construction bids with AI

Cogram’s AI-assisted RFP bidding tool writes tailored RFP proposals in minutes instead of weeks.
Automatically extract key details from the RFP — including scope, submission requirements, deadlines, and evaluation criteria — to easily make a go/no-go decision.
Cogram’s AI will then reference your firm’s knowledge base and past proposals to draft tailored proposals within minutes.
Use AI-assisted editing tools to review, cross-check data, and make improvements remarkably fast.
IntuiCell’s “digital nervous system” – promise, provocation, or paradigm shift?
Here’s a breakthrough we all may have missed. On 19 March 2025, Lund‑born startup IntuiCell wheeled out Luna, a quadruped robot whose control stack is not a pre‑trained neural net like the frontier AI we preach about, but a “digital nervous system” that learns on the fly, in the real world, much like a puppy finding its feet.
The company’s pitch is audacious: throw away frozen models and brute‑force data‑centres; graft adaptive circuitry directly onto hardware so perception, decision‑making and motor control co‑evolve in real time. Early demos show Luna wobbling, falling, then self‑stabilising in minutes without simulation or scripted rewards.
If that scales, the implications are systemic. Robotics today is bottlenecked by brittleness: models break when the world drifts. A nervous‑system architecture could—in theory—yield machines that generalise across factories, mines, even planetary surfaces without retraining. In construction (where we still move bricks the way Vitruvius did) the prospect of site‑aware, self‑teaching robots is tantalising.
🍪 Food for thought
Energy economics – biological nervous systems run on watts; Luna currently gulps batteries. Can IntuiCell close the metabolic gap before the hype cycle cools?
Safety and auditability – emergent intelligence is exciting until it improvises into risk. How will regulators inspect a policy that never sits still?
Moats – the firm cites neuroscience patents, but the ingredients (spiking nets, neuromorphic chips, local learning rules) are not secret. Giants from Nvidia to Boston Dynamics could replicate the recipe overnight.
In other words, the tech is either a leap beyond the deep‑learning continuum or simply the next logical gradient step. Watch for two milestones this year: (a) a peer‑reviewed benchmark showing transfer across radically different tasks, and (b) a paid pilot in a messy industrial setting. Until then, IntuiCell’s digital nervous system remains a thrilling hypothesis—an electric whisper hinting that our machines might soon grow minds of their own.
Rabbit hole
Enterprise AI adoption accelerates
Enterprise AI adoption continues to accelerate as companies transform how they approach everything from engineering to design. LinkedIn's engineers are increasingly taking on design responsibilities, leveraging AI tools to bridge traditional role boundaries. This shift represents a broader trend of AI reshaping organisational structures and workflows across industries.
The details:
Organizational transformation: Traditional role boundaries are blurring as AI tools enable professionals to expand their capabilities beyond conventional job descriptions.
Investment trends: Major investment leaders from General Catalyst, KKR, Benchmark, and Fidelity are actively discussing the future of data centres, chips, and AI applications as enterprise adoption grows.
Implementation challenges: Companies are finding that generic, one-size-fits-all AI training approaches are ineffective, with successful organisations tailoring AI implementation to departmental needs and workflows.
ROI metrics: Organisations are increasingly measuring AI success not just by technical performance but by practical business outcomes and organizational readiness.
Why it matters for project delivery professionals
For Project Managers, this shift demands new approaches to team composition and skill development, as AI increasingly blurs traditional role boundaries. Software Developers and Engineers need to expand their capabilities beyond coding to include design thinking and user experience considerations. Technical Architects should focus on creating flexible systems that enable cross-functional collaboration rather than reinforcing siloed responsibilities. Project delivery teams that successfully integrate AI tools while addressing the organisational and human aspects of adoption will gain significant advantages in delivery speed, solution quality, and team satisfaction.
The Rabbit Hole
The pulse check
Tip of the week
When new AI models come out, they claim to be the smartest, state-of-the-art AI on the market. But how do they back up these claims? See our short guide to AI intelligence benchmarks below:
MMLU – Tests broad academic knowledge & reasoning (57 subjects).
MMMU – Assesses multimodal understanding with college-level images + text.
AIME – Measures deep mathematical problem-solving (high school Olympiad level).
GPQA – Evaluates true STEM reasoning; questions are designed to be un-Googleable.
MMLU-Pro – A tougher, less ambiguous version of MMLU.
Mobile-MMLU – Checks efficiency & accuracy in mobile-relevant tasks.
MMMU-Pro – Pushes AI with more complex vision-language challenges.
SWE-bench – Real GitHub issue fixing, tested with unit tests.
SWE-bench+ – Cleaned up SWE-bench for fairer model evals.
Multi-SWE-bench – Code repair across 7 programming languages.
SWE-Lancer – Simulates freelance coding gigs, mixing code + client understanding.
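Under the hood, most of these leaderboard numbers boil down to accuracy over a fixed question set. A toy scorer for MMLU-style multiple-choice items (our own illustrative helper, not any benchmark's official harness):

```python
# Toy benchmark scorer: accuracy is the fraction of questions where the
# model's letter choice matches the answer key.
def score(predictions, answer_key):
    if len(predictions) != len(answer_key):
        raise ValueError("prediction/answer length mismatch")
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)
```

Real harnesses add prompt templates, answer extraction, and statistical reporting on top, but the headline figure is this same ratio.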
Why it matters: These aren’t just tests—they’re windows into how close (or far) AI is from real-world competence.
Trending tools
Gemini API Image Mask Visualization: A browser-based tool for experimenting with Gemini’s segmentation mask capabilities.
LLM Pricing Calculator: Compare costs across different AI models to optimise your implementation budget.
Codex CLI: OpenAI’s new lightweight coding agent that runs in your terminal.
Gamma: Create customised training materials for different teams using AI.
Governance
In April 2025, the White House issued new directives to accelerate AI adoption across U.S. federal agencies, marking a shift towards more proactive integration of AI technologies in government operations. The Office of Management and Budget (OMB) released memoranda M-25-21 and M-25-22, which outline policies for AI use and procurement, respectively.
Key mandates include the appointment of Chief AI Officers (CAIOs) in each agency to spearhead AI initiatives and the development of agency-specific AI strategies within 180 days. These strategies aim to identify and remove barriers to AI adoption, promote innovation, and ensure responsible use. Agencies are also required to conduct AI impact assessments for high-impact systems, focusing on areas such as civil rights, privacy, and public safety. The guidance emphasises the importance of transparency, accountability, and public trust in the deployment of AI technologies.
Additionally, the policies encourage the use of American-made AI solutions and the sharing of AI resources and data among agencies to foster collaboration and efficiency. The establishment of Agency AI Governance Boards is mandated to oversee AI implementation and ensure compliance with the new directives. These boards will play a crucial role in coordinating AI efforts across the federal government and promoting best practices.
This initiative represents a significant step in the U.S. government’s approach to AI, aiming to balance rapid technological advancement with the need for ethical considerations and risk management. The full text of the memoranda can be accessed via the White House’s official website.
Other things we’re loving
OpenAI's secret social network: OpenAI's future may stretch beyond frontier models to a social network riding ChatGPT's wave of success, potentially providing valuable real-time data for model training. Read more
AI is a two-horse race: Stanford's AI Index Report shows the AI race is increasingly a two-way competition between US and China, with Europe and other countries falling behind. Read more
ChatGPT goes social: OpenAI is developing an X-like social network with an internal prototype that focuses on ChatGPT's image generation capabilities and includes a social feed. Read more
Applied AI transforms LinkedIn engineers: LinkedIn's engineers are increasingly taking on design responsibilities, leveraging AI tools to bridge traditional role boundaries in a significant shift for professional roles. Read more
Master ChatGPT Prompting: New course offers comprehensive training from basic prompts to expert-level strategies for maximising ChatGPT's capabilities across various use cases. Read more
Claude Now Less Assistant More Strategic Partner: Anthropic has repositioned Claude as a strategic partner rather than just an assistant, emphasising its ability to handle complex business problems. Read more
Maximising Your LLM ROI: Turing's white paper reveals how misaligned training, poor evaluation, and optimisation gaps can quietly sabotage ROI in large language model implementations. Read more
OpenAI's dev-focused GPT-4.1: OpenAI has released a developer-focused version of GPT-4.1 with enhanced coding capabilities and improved API integration options. Read more
From Teammate AIs to Creative Tools: Harvard Business School study shows individuals using GenAI matched the performance of traditional two-person teams with less time and more engagement. Read more
Kling AI drops new video and image models: Chinese AI startup Kling AI has released major upgrades to its creative suite with KLING 2.0 Master for video and KOLORS 2.0 for images. Read more
Build a personal data analyst with n8n automation: Learn to create n8n workflows that analyze your data from spreadsheets, databases, or other sources, and deliver insights directly to your inbox. Read more
AI models play detective in Ace Attorney: UC San Diego researchers tested leading AI models on their ability to play Phoenix Wright: Ace Attorney, with OpenAI's o1 and Gemini 2.5 Pro performing best. Read more
Electricity demand for data centres projected to increase: IEA forecasts a 137% increase in electricity consumption for data centers by 2030, largely driven by AI development in the US and China. Read more
Meta's exposure to the US-China trade war: Up to 10% of Meta's global revenue comes from Chinese online ad clients, making the company vulnerable to trade war impacts. Read more
Cohere releases Embed 4: Cohere's new multimodal embedding model features 128K context length, support for 100+ languages, and up to 83% savings on storage costs. Read more
Nvidia faces $5.5 billion hit from export restrictions: Nvidia disclosed in a filing that it expects to take a $5.5 billion hit from U.S. export license requirements for shipping its H20 AI chips to China. Read more
Community
The Spotlight Podcast

Why AI Models Are Choking — And How to Fix It
In this episode of the Project Flux podcast, we speak with JB Baker, the VP of Marketing and Product Management at ScaleFlux, about the critical role of storage in AI infrastructure. They discuss the importance of efficient data pipelines, the challenges posed by the memory wall, and the revolutionary concept of computational storage. JB emphasises the need for businesses to consider the full impact of AI deployment on their infrastructure, including storage and memory technologies. The conversation also touches on sustainability in AI, the future of quantum computing, and the potential of decentralised AI systems.
One more thing

That’s it for today!
Before you go, we’d love to know what you thought of today's newsletter to help us improve The Project Flux experience for you.
See you soon,
James, Yoshi and Aaron—Project Flux
