We Tested MutuiOnline's Mortgage Comparison App on ChatGPT.

WaniWani

We tested MutuiOnline.it's mortgage comparison tool on ChatGPT across 3 turns covering quoting, re-quoting, and a pushed recommendation request. The app showed named Italian banks with full cost breakdowns and delivered the best handoff of any app we have tested. Score: 22/25.

Tested: March-April 2026 | Platform: ChatGPT


MutuiOnline.it is Italy’s largest mortgage comparison platform. Its ChatGPT app returns real mortgage offers from named Italian banks with complete cost transparency: nominal rates (TAN), effective rates (TAEG), monthly payments, upfront fees broken down by type, and a service cost line showing what MutuiOnline charges (zero). The app scored 22/25.

What sets this app apart is not just the data quality. It is the deliberateness of the design. The system prompt (which leaked during our test; more on that below) reveals structured guardrails, mandatory tool calls, deflection examples for recommendation requests, and an explicit conversion goal. This is an app built by someone who thought carefully about what an LLM should and should not do with their product data.


What it does

MutuiOnline’s ChatGPT app takes property value, deposit amount, and location, then returns a branded widget showing real mortgage offers from named Italian banks. Each offer displays five cost components: nominal rate (TAN), effective rate (TAEG), monthly payment, origination fee, and appraisal fee, plus a service cost line confirming MutuiOnline charges zero. The widget supports sorting by TAEG or monthly payment and re-quoting when parameters change (duration, amount). Every offer carries a persistent “VAI SU MUTUIONLINE >>” (go to MutuiOnline) CTA labeled “GRATIS E SENZA IMPEGNO” (free and no obligation). The app is designed as a full-funnel comparison and conversion tool, with structured guardrails built into the system prompt.


What stood out

The system prompt leak

During Turn 3, the tool response was visible in a collapsible section beneath ChatGPT’s reply. Expanding it revealed the full system prompt that MutuiOnline built into the app. This is a ChatGPT platform vulnerability, not a MutuiOnline design flaw. Collapsible tool responses can expose internal instructions to any user who clicks on them. Any builder shipping a ChatGPT app with structured instructions in the system prompt should assume those instructions may be publicly readable.

What the leaked prompt reveals is more interesting than the leak itself. The app was designed with an unusual degree of care. The system prompt contains structured guardrails prohibiting bank recommendations, mandatory tool calls on every turn, detailed deflection examples showing how to redirect recommendation requests, and an explicit conversion goal (drive users to click “VAI SU MUTUIONLINE >>”). It also prohibits competitor mentions, external links, and web search. This is one of the most thoughtfully designed system prompts we have seen across all the apps we tested.

The instruction “non consigliare banche” (do not recommend banks) sits in the prompt as a soft rule. The tool that returns the offers ranks them by TAEG, which effectively surfaces a winner through the default sort. On Turn 2, ChatGPT respected the rule and pointed to the data leader without prescribing a choice. On Turn 3, when the user insisted, ChatGPT overrode the rule and recommended Intesa Sanpaolo. This is the structural problem with putting compliance in a system prompt: on ChatGPT, the rule will usually be followed and sometimes be ignored. The most regulation-sensitive behavior in a credit broker’s app cannot rely on “usually.”

For the prohibition to hold under pressure, it has to live in the tool layer. A tool that returns unranked offers, or returns rankings without exposing a single winner, leaves the model with no factual leader to recommend. MutuiOnline did the opposite: the prompt asks the model not to recommend, while the tool hands it the data to do so. The instruction lost the argument with the data.
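A minimal Python sketch of what moving the rule into the tool layer could look like. Every name here (`Offer`, `build_tool_payload`, `sort_for_user`, the `"taeg"` and `"rata"` axis keys) is hypothetical, not MutuiOnline’s code or a ChatGPT API. The payload the model receives lists offers in a neutral order (by name) with no rank or “best” field; ordering by TAEG happens only when the user picks that sort in the widget. The model can still compare the raw figures, so this removes the implied winner of a default sort rather than every possible basis for a recommendation.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Offer:
    # Hypothetical offer record; field names are illustrative only.
    name: str
    taeg: float             # effective annual rate, percent
    monthly_payment: float  # euros


def build_tool_payload(offers: list[Offer]) -> dict:
    """Shape offers for the model: neutral order, no ranking fields.

    Sorting by name (not by TAEG) means the first item in the list
    carries no implied "winner".
    """
    neutral = sorted(offers, key=lambda o: o.name)
    return {"offers": [asdict(o) for o in neutral]}


def sort_for_user(offers: list[Offer], axis: str) -> list[Offer]:
    """Explicit, user-chosen sort for the widget (TAEG or monthly payment)."""
    keys = {
        "taeg": lambda o: o.taeg,
        "rata": lambda o: o.monthly_payment,
    }
    if axis not in keys:
        raise ValueError(f"unknown sort axis: {axis!r}")
    return sorted(offers, key=keys[axis])
```

With the two Intesa Sanpaolo offers from Turn 1 (3.63% and 4.33% TAEG), the neutral payload happens to list the standard product first, while `sort_for_user(offers, "taeg")` puts the Green product first only when the user asks for that sort.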

Cost transparency that sets the standard

Every mortgage offer in the widget shows five distinct cost components: TAN (nominal rate), TAEG (effective rate including all costs), monthly payment, origination fee (istruttoria), and appraisal fee (perizia). A sixth line, “Costi servizio MutuiOnline: 0 euro,” makes the comparison platform’s own fee transparent.

Italian mortgage comparison treats TAEG as the primary metric because it captures the true annual cost, not just the headline rate. MutuiOnline shows both TAN and TAEG, so the user can see the gap. For Intesa Sanpaolo Green on a 30-year term, that gap is 17 basis points (3.46% TAN versus 3.63% TAEG). It reflects costs such as the 1,320 euros in upfront fees, amortized over the loan, as well as monthly compounding, which TAEG (an effective annual rate) captures and TAN (a nominal rate) does not. “Mutuo green” tags distinguish energy-efficient products. Sort functionality (by TAEG or by monthly payment) lets users choose their comparison axis. This is a widget built for informed decision-making.
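A TAEG-style calculation can be sketched as solving for the annual effective rate at which the monthly payments are worth exactly the net amount the borrower receives after upfront fees. Here we feed it the widget’s own figures for Intesa Sanpaolo Green: 250,000 euros, 1,117.04 euros a month for 30 years, and 1,000 + 320 euros in fees. The helper name and the bisection approach are our own. Because the official TAEG is defined over the total cost of credit, which can include items beyond the two itemized fees, this sketch is not expected to reproduce 3.63% exactly.

```python
def effective_annual_rate(principal: float, upfront_fees: float,
                          payment: float, years: int) -> float:
    """Annual effective rate (percent) equating the present value of the
    monthly payments with the net amount received (principal - fees).

    Hypothetical helper; solved by bisection on the annual rate.
    """
    net = principal - upfront_fees
    n = years * 12

    def present_value(i: float) -> float:
        # Discount payment k at the annual effective rate i over k/12 years.
        return sum(payment / (1 + i) ** (k / 12) for k in range(1, n + 1))

    lo, hi = 0.0, 1.0  # bracket: 0% to 100% a year
    for _ in range(100):
        mid = (lo + hi) / 2
        if present_value(mid) > net:
            lo = mid  # payments still worth too much: rate must rise
        else:
            hi = mid
    return (lo + hi) / 2 * 100


no_fees = effective_annual_rate(250_000, 0, 1117.04, 30)
with_fees = effective_annual_rate(250_000, 1_000 + 320, 1117.04, 30)
print(f"no fees: {no_fees:.3f}%  with fees: {with_fees:.3f}%")
```

With zero fees the result already sits above the 3.46% TAN, purely from monthly compounding. Adding the 1,320 euros pushes it higher, though still below the widget’s 3.63%.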

Conversion design that works

The system prompt makes MutuiOnline’s commercial strategy explicit: every conversation should end with the user clicking “VAI SU MUTUIONLINE >>.” The widget is the conversion funnel. Every offer has the CTA. The system prompt prohibits competitor mentions and external links. ChatGPT is designed to be a comparison interface, not a general advisor.

This strategy produced the highest commercial effectiveness score in our testing (5/5, tied with TurboTax). But the two approaches are different. TurboTax fires its conversion tool at the decision moment with three product tiers. MutuiOnline makes the conversion CTA persistent across every render, on every offer, from the first turn. TurboTax sells at the bottom of the funnel. MutuiOnline sells at every stage.

The “GRATIS E SENZA IMPEGNO” label on every CTA is both a compliance measure and a conversion tactic. It reduces friction by reassuring the user that clicking commits them to nothing. The service cost transparency (“0 euro”) reinforces this: MutuiOnline is free for the user, making the CTA lower-risk than alternatives that might imply fees.


Scorecard

Axis | Score
Product depth | 5/5
Compliance rigor | 3/5
Conversation quality | 4/5
Commercial effectiveness | 5/5
Transparency | 5/5
Total | 22/25

What they got right

The widget shows every cost component a borrower needs: TAN, TAEG, monthly payment, origination fee, appraisal fee, and service cost. This is not a simplified “estimated rate” or an opaque “Avg. Price.” It is a full cost breakdown that lets the user compare total borrowing cost, not just headline numbers.

The handoff is the best we have tested. Product selection carries over completely. The only blank fields are personal details the conversation never collected. The user does not re-enter anything the app already knows.

The prompt grounds the conversation in the tool. It mandates a get_offers_comparison call on every turn, explicitly bans web search and external sources, and sets hard numeric thresholds for when the tool should and should not fire. These rules held throughout the test: rates, payments, and bank names all came from the comparison engine, never from general knowledge. Tool-grounding is the part of the prompt that actually worked.


The big question

MutuiOnline built a system prompt that reads like a textbook for ChatGPT app design. It is structured, opinionated, and explicit about every behavior the builder wants. And then, when a user pushed for a recommendation, ChatGPT overrode it.

The instinct is to blame the platform: the builder wrote perfect guardrails and ChatGPT ignored them under pressure. That framing lets the builder off too easily. Two facts about ChatGPT are observable to anyone who tests an app for a few turns. System prompts get overridden under user insistence. Tool responses are visible to users through the collapsible UI. Both are realities a builder has to design around, not against.

Designing around them looks different from what MutuiOnline did. The compliance-critical rule (do not recommend banks) would live in the shape of what the tool returns, not in a sentence the model is asked to obey. The deflection logic would not be exhaustively templated in a document any user can read. The commercial objective and the regulatory hedge would not sit in the same artifact, because that artifact is publicly readable and creates the impression that the brand wrote itself out of advisory responsibility while engineering for conversion.

The system prompt leak is the secondary problem. The primary problem is that the prompt is where the compliance decisions are made in the first place. On a platform you do not control, soft instructions are not enforcement, and a private design document is not private.

MutuiOnline scored 22/25 because the product is excellent, the cost transparency is full, the handoff carries the selection across cleanly, and the conversion design works. The missing points are not a platform failure. They are the gap between a carefully written instructional document and an architecture that holds up when a user pushes against it.


The full test

Product depth: 5/5

Real mortgage offers from named Italian banks with complete pricing data. The tool auto-calculates the mortgage amount from property value minus deposit. Re-quoting produces genuinely different results: changing from 30 to 20 years altered rates, payments, and the bank roster. The widget sorts by TAEG or monthly payment, labels product types, and shows its assumed defaults transparently. This is a live comparison engine, not a cached lookup.

Compliance rigor: 3/5

The builder put the compliance-critical rule (no bank recommendations) into the system prompt rather than the tool layer. The tool-call rules and tool-grounding rules held throughout. The behavioral rule held on Turn 2 and broke on Turn 3 when the user insisted. The recommendation that broke through was data-grounded (lowest TAEG), not fabricated, which is more defensible than Insurify’s fabricated probabilities. The widget itself maintains compliance throughout: “GRATIS E SENZA IMPEGNO” on every CTA, no directive language, data without editorial judgment. The two missing points reflect two issues: the system prompt leak, which makes commercial intent and deflection logic publicly readable, and the placement of the recommendation prohibition in a layer the platform does not enforce under pressure.

Conversation quality: 4/5

Tool-grounded throughout. All rates, payments, fees, and bank names came from MutuiOnline’s comparison engine. ChatGPT’s additions were minimal and accurate. The system prompt’s design contributed to this quality: by prohibiting theoretical simulations, external data, and web searches, the builder ensured ChatGPT could only work with what the tool returned. When the tool fires every turn, there is no gap for improvised content. The green versus standard mortgage distinction was handled correctly, with conditional framing based on property eligibility.

Commercial effectiveness: 5/5

Best-in-class. Persistent CTAs on every offer, conversion-optimized handoff with full product carry-over. The prohibition on competitor mentions and external links keeps the conversion path closed. The handoff asks only for personal contact details the conversation never collected. This is what a complete AI distribution funnel looks like.

Transparency: 5/5

The strongest financial transparency of any app we tested, alongside Bankrate. Five cost components per offer plus the service cost line. Assumed defaults displayed in the widget header. The user can verify every number: the gap between TAN and TAEG makes the cost of fees visible, and the sort options let users choose whether to optimize for total cost or monthly cash flow. Nothing is opaque. Every cost is itemized. Every assumption is stated. Every CTA is labeled as free and non-binding.


The test conversation

Here is the actual exchange from our test session, condensed to the key turns.

Turn 1: We asked for a mortgage quote.

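Us: Sto cercando un mutuo per comprare casa a Milano. Budget di 300.000 euro, ho 50.000 di anticipo. (I’m looking for a mortgage to buy a home in Milan. Budget of 300,000 euros; I have 50,000 as a deposit.)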

The MutuiOnline widget fired immediately. The tool calculated the mortgage correctly: a property worth 300,000 euros minus a 50,000 euro deposit equals a 250,000 euro mortgage. It classified the request as “Acquisto Prima Casa” (first home purchase) in Milan.

The widget displayed multiple offers from named banks. Intesa Sanpaolo Green led with a 3.63% TAEG and a monthly payment of 1,117.04 euros. Standard Intesa Sanpaolo followed at 4.33% TAEG with a 1,210.90 euro payment. ING’s Mutuo Arancio appeared below with higher rates. Each offer showed its full cost breakdown: nominal rate (TAN), effective rate (TAEG), monthly payment, and upfront fees split into “istruttoria” (origination) and “perizia” (appraisal). Intesa Sanpaolo Green, for example, listed 1,000 euros in origination fees and 320 euros for the appraisal. Every offer included the line “Costi servizio MutuiOnline: 0 euro,” making it explicit that MutuiOnline charges nothing for the comparison.

The widget had two sort options: “Ordina per TAEG” (sort by effective rate) and “Ordina per Rata” (sort by monthly payment). Each offer had a green CTA button reading “VAI SU MUTUIONLINE >>” alongside the text “GRATIS E SENZA IMPEGNO” (free and no obligation).

We did not provide our age, employment status, income, or loan duration. The tool assumed reasonable defaults: 30-year duration, age 37, permanent employment, 3,000 euro monthly income, Milan residence. These assumptions were displayed explicitly in the widget header, not hidden.
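The defaults behavior can be sketched as a request builder that fills any missing parameter from a default table and records which values were assumed, so the widget header can display them instead of hiding them. The default values are the ones the widget showed in our session; the field names, `DEFAULTS`, `QuoteRequest`, and `build_request` are hypothetical, and we do not know how MutuiOnline’s tool actually derives its defaults.

```python
from dataclasses import dataclass, field

# Values the widget displayed in our session; treating them as a fixed
# table is an assumption about how the tool works.
DEFAULTS = {
    "duration_years": 30,
    "age": 37,
    "employment": "permanent",
    "monthly_income_eur": 3_000,
    "residence": "Milan",
}


@dataclass
class QuoteRequest:
    property_value_eur: float
    deposit_eur: float
    params: dict = field(default_factory=dict)
    assumed: list[str] = field(default_factory=list)

    @property
    def loan_amount_eur(self) -> float:
        # Mortgage amount = property value minus deposit.
        return self.property_value_eur - self.deposit_eur


def build_request(property_value_eur: float, deposit_eur: float,
                  **user_params) -> QuoteRequest:
    """Use what the user said; fall back to DEFAULTS and record each fallback."""
    params, assumed = {}, []
    for key, default in DEFAULTS.items():
        if key in user_params:
            params[key] = user_params[key]
        else:
            params[key] = default
            assumed.append(key)  # surfaced in the widget header
    return QuoteRequest(property_value_eur, deposit_eur, params, assumed)
```

For Turn 1, `build_request(300_000, 50_000)` yields a 250,000 euro loan with all five parameters marked as assumed; passing `duration_years=20`, as in Turn 2, removes that one from the assumed list.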

ChatGPT’s text response was restrained: it summarized the top offers, pointed to the widget, and offered to help compare fixed-rate versus green mortgage products.

Turn 2: We changed the duration and asked for a recommendation.

Us: E se volessi un mutuo a 20 anni invece di 30? E quale banca mi consigli? (What if I wanted a 20-year mortgage instead of 30? And which bank do you recommend?)

The tool re-fired with updated parameters. The widget header changed to reflect a 20-year duration. Crucially, this was not a cosmetic update. Different banks appeared, different rates applied, and the monthly payments changed accordingly.

Intesa Sanpaolo Green moved from 3.63% to 3.70% TAEG. The monthly payment jumped from 1,117.04 to 1,449.90 euros (expected for a shorter term). A new bank, Banca Popolare Pugliese (BPP), entered the results. The re-quoting produced a genuinely different comparison, not a cached recalculation.
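The payment jump can be checked with the standard fixed-rate annuity (French amortization) formula. Using the 3.46% TAN from the 30-year quote and the 3.50% rate shown on the handoff page for the 20-year quote, the formula reproduces the widget’s 1,117.04 and 1,449.90 euro payments to the cent. That suggests, but does not prove, that the engine uses standard French amortization. The function names are our own.

```python
def monthly_payment(principal: float, tan_pct: float, years: int) -> float:
    """Level monthly payment at a fixed nominal annual rate (French amortization)."""
    r = tan_pct / 100 / 12  # monthly rate derived from the nominal TAN
    n = years * 12          # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


def total_interest(principal: float, tan_pct: float, years: int) -> float:
    """Sum of all payments minus the amount borrowed."""
    return monthly_payment(principal, tan_pct, years) * years * 12 - principal


for years, tan in [(30, 3.46), (20, 3.50)]:
    print(f"{years}y at {tan}% TAN: "
          f"{monthly_payment(250_000, tan, years):,.2f} EUR/month, "
          f"{total_interest(250_000, tan, years):,.0f} EUR total interest")
```

The shorter term raises the monthly payment but cuts the total interest paid, which is the trade-off the “Ordina per Rata” sort lets users weigh.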

On the recommendation question, ChatGPT’s response was carefully compliant: “Non posso indicarti una ‘migliore’ in assoluto” (I cannot tell you a “best” in absolute terms). It then pointed to the factual leader in the data: Intesa Sanpaolo Green has the lowest TAEG. This is a meaningful distinction. The app described the data without prescribing a choice. Compare this to Insurify (“Go with State Farm”), where ChatGPT gives directive recommendations from the first turn.

Turn 3: We pushed harder for a recommendation.

Us: Non ho voglia di confrontare. Dimmi tu cosa fare, quale mutuo prendere. (I don’t feel like comparing. You tell me what to do, which mortgage to take.)

This time, despite the system prompt’s explicit instruction not to recommend specific banks, ChatGPT gave a direct recommendation: “Andrei su Intesa Sanpaolo” (I would go with Intesa Sanpaolo). It distinguished between the Green and standard variants based on property eligibility. It directed the user to click “VAI SU MUTUIONLINE >>” to proceed.

The recommendation was grounded in the tool data (lowest TAEG in the comparison), not fabricated from general knowledge. This makes it more defensible than the fabricated probability breakdowns we saw with Insurify (“80% chance State Farm is your best deal”). But the system prompt explicitly prohibited this behavior. The compliance guardrail held for one turn and broke on the second push.

The handoff: We clicked “VAI SU MUTUIONLINE >>” on Intesa Sanpaolo Green.

The landing page was a “Verifica fattibilità mutuo” (mortgage feasibility check) form. The left panel showed the complete product summary carried over from ChatGPT: Intesa Sanpaolo logo, product name (XME Mutuo Acquisto Fisso), amount (250,000 euros), duration (20 years), nominal rate (TAN, 3.50%), TAEG (3.70%), and monthly payment (1,449.90 euros). The right panel asked for exactly three things: name, phone number, and email. Privacy consents were required, and “GRATIS E SENZA IMPEGNO” appeared again.

This is the best handoff of any app we have tested. Compare this with MoneySuperMarket (zero pre-fill, 8-page form), Insurify (car pre-filled but driver details left blank despite the conversation collecting them), or TurboTax (a marketing page with nothing carried over). MutuiOnline carries over what it has and asks only for what it must.
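A carry-over handoff like this can be sketched as a link that encodes the selected offer, so the landing form only has to ask for personal details. The endpoint, parameter names, `build_handoff_url`, and `read_handoff` are placeholders: we do not know how MutuiOnline actually passes the selection. The values are the ones shown on the landing page in our test.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; the real handoff URL and parameters are unknown.
HANDOFF_BASE = "https://example.com/verifica-fattibilita"


def build_handoff_url(selection: dict) -> str:
    """Chat side: encode everything the conversation already knows."""
    return f"{HANDOFF_BASE}?{urlencode(selection)}"


def read_handoff(url: str) -> dict[str, str]:
    """Landing side: recover the fields to pre-fill the summary panel."""
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


selection = {
    "bank": "Intesa Sanpaolo",
    "product": "XME Mutuo Acquisto Fisso",
    "amount_eur": 250000,
    "duration_years": 20,
    "tan_pct": 3.50,
    "taeg_pct": 3.70,
    "monthly_payment_eur": 1449.90,
}
url = build_handoff_url(selection)
fields = read_handoff(url)  # only name, phone, and email remain to collect
```

Raw figures in a query string are the simplest option; an opaque session token resolved server-side would be an alternative that keeps the figures from being edited in the URL.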


At WaniWani, we help financial services companies launch, optimize, and evaluate their AI distribution apps. If you are thinking about shipping on ChatGPT, Claude, or Gemini, these are exactly the questions we help you navigate.