Sixsense.ai

AI-powered Defect Review in Advanced Packaging Inspection
By Prakriti Chaturvedi
Last updated: 15th May 2026

A leading OSAT running high-volume Bumping and RDL inspection was holding yield above 99.8% on the back of a strong operator team and tight AOI thresholds. It worked, but it took significant manual effort to sustain, and as volumes grew and the device mix expanded, the approach became harder to scale. End customers had also started asking for AI in the loop on critical defect decisions, which added a new dimension to the quality bar.
The team first explored AI-ADC bundled with their OEM inspection tools. After more than ten model iterations, the trade-offs were proving difficult to resolve — escape and overkill were hard to bring down together, each new device family needed its own model, and there was limited visibility into why the AI was making the decisions it made.
SixSense was brought in to deploy AI-ADC across the line. Today, 10+ production models classify millions of images a month across 100s of devices, with the customer's own defect engineers running the workflow end to end. Yield is at >99.8%, escapes have stayed at zero, and 90%+ of the review load is now handled by AI. The deployment has also since been reviewed by the customer's end customers as part of their incoming quality audits. Below is how this story unfolded.

Life with operators

Before AI, the line ran with operators.
Every wafer inspected by the AOI tool had many dies flagged with suspected defects. A team of operators reviewed the AOI rejects to recover false rejects while avoiding escapes. Their job was to look at each flagged image and decide: real defect, or false alarm? Reject the die, or recover it?
A few things made this harder than it sounds.
Many decisions were not a straightforward yes/no; they were contextual. A particle two microns from an RDL line might be safe; the same particle one micron closer might be a reliability risk. A bump slightly smaller than nominal is fine within tolerance; a bump a few percent smaller than that is a reject. The criteria changed by defect type, by inspection layer, by device, and by end customer. It was extremely hard for operators to hold thousands of these rules in their heads and apply them consistently at scale.

Comparison image showing semiconductor inspection decisions across different contexts. The left panel highlights bump inspection where one bump is accepted as correctly sized while another is rejected for being smaller. The right panel shows the same defect evaluated differently across layers, with one marked acceptable on a non critical layer and another rejected on a critical layer.

Figure 1: The Same Defect Can Be Accepted or Rejected Depending on Layer and Tolerance

End customers were pushing for AI in the loop. The customer's own customers had started asking for AI-based classification as part of their quality expectations. Manual classification, no matter how well-trained the team, comes with the risk of shift-to-shift variation and human error. End customers wanted a consistent, auditable decision on every die, especially on critical defects like die cracks and bump defects.
Hiring and training were a constant struggle. Skilled operators for advanced packaging inspection are not easy to find, and training someone to review borderline cases correctly across hundreds of devices takes months. As the line's capacity grew, the team couldn't grow fast enough to keep up.
Operator mistakes carried real cost. Even careful operators occasionally let a critical defect through, or rejected a die that was actually fine. Escapes meant customer complaints. Over-rejection meant yield loss. The team was working hard, and the line was still losing on both sides.
By the time the customer started looking for AI, it had become an operational necessity.

What went wrong with OEM ADC

The customer first tried the AI-ADC bundled with their OEM inspection tools. The team trained many models (more than 10 iterations), adding more data with each iteration in the hope of pushing performance into a usable range, but could not meet the target criteria. Some of the problems they encountered were the following:

The escape-vs-overkill trade-off

The hardest thing in defect classification is holding two numbers at once: zero escapes (nothing bad gets through) and low overkill (nothing good gets thrown away). Most systems trade one for the other.
OEM ADC kept getting stuck around 2–4% overkill with non-zero escapes. To push escapes down to zero, the team had to drop automation below 75% — which defeated the point of running AI in the first place.
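To make the two numbers concrete, here is a minimal Python sketch of how escape, overkill, and automation might be computed when low-confidence calls are sent to manual review. The `Call` record, the `line_metrics` helper, the confidence values, and the rate definitions (escapes over all true rejects, overkill over all true accepts) are illustrative assumptions, not the customer's or the OEM's actual definitions.

```python
from dataclasses import dataclass


@dataclass
class Call:
    truth: str         # audited ground truth: "accept" or "reject"
    predicted: str     # the model's call
    confidence: float  # model confidence in [0, 1]


def line_metrics(calls, min_confidence):
    """Compute automation, escape rate, and overkill rate.

    Calls below min_confidence are assumed to go to manual review,
    so they count against automation but not as escapes or overkill.
    """
    auto = [c for c in calls if c.confidence >= min_confidence]
    bad = sum(1 for c in calls if c.truth == "reject")
    good = sum(1 for c in calls if c.truth == "accept")
    escapes = sum(
        1 for c in auto if c.truth == "reject" and c.predicted == "accept"
    )
    overkill = sum(
        1 for c in auto if c.truth == "accept" and c.predicted == "reject"
    )
    return {
        "automation": len(auto) / len(calls),
        "escape_rate": escapes / bad if bad else 0.0,
        "overkill_rate": overkill / good if good else 0.0,
    }


# Toy data, invented for illustration only.
calls = [
    Call("accept", "accept", 0.99),
    Call("accept", "accept", 0.95),
    Call("accept", "reject", 0.70),  # overkill if auto-decided
    Call("reject", "reject", 0.98),
    Call("reject", "accept", 0.60),  # escape if auto-decided
    Call("reject", "reject", 0.90),
    Call("accept", "accept", 0.85),
    Call("accept", "accept", 0.97),
]

loose = line_metrics(calls, min_confidence=0.5)
strict = line_metrics(calls, min_confidence=0.8)
print(loose)
print(strict)
```

In this toy data, raising the threshold from 0.5 to 0.8 removes both the escape and the overkill, but automation falls from 100% to 75%. That is the shape of the trade-off described above.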

One model per device

The OEM models found it hard to generalise across the device variety in production. Each new device family meant training, validating, and maintaining a separate model. With hundreds of devices in flight, the maintenance load grew extremely fast and was becoming unmanageable for defect engineers.

Escapes on new defects

When a defect type the model hadn't seen during training showed up in production, the system tended to classify it into the closest existing class, often the accept class, without flagging it as new. The defect would pass, and by the time anyone noticed, the unit had shipped.
This happened most often on the long-tail defect classes. Bump surface damage variations, unusual foreign material shapes, new types of passivation issues — anything the training set didn't cover well became an escape risk.

Black-box decisions the team couldn't audit

When the model called a die a reject — or worse, an accept — the defect engineer had no way to see why. There was no view into what the model was looking at, no way to check if it had focused on the right region of the image, no way to explain the decision to an end customer asking about a borderline scenario. For a line with zero-escape commitments, this was a problem. The team couldn't trust a system they couldn't audit transparently.

Image preparation was a bottleneck before training even started

Even before the model came in, the data preparation workflow was tedious. Defect engineers were expected to:

Label thousands of images one at a time.
Stay consistent across 16–25 defect classes.
Cover hundreds of devices.
Catch their own labelling mistakes.

In practice, this isn't possible at scale. Operator labels on historical images were estimated at 60–70% accuracy under production pressure. Training models on those labels was infeasible, and reviewing such large volumes and relabelling them consistently demanded significant time and effort from engineers.

Frequent retraining, no end in sight

The teams often described being stuck in a loop: train a model, release it to production, hit an issue, find a new defect or label problem, and retrain. New devices and new defects made it worse.
By the time SixSense was brought in, the goal was to achieve escape and yield targets simultaneously, without trading operator hours for the engineering hours that go into model launch and maintenance.

What SixSense did

SixSense deployed AI-ADC through its classifAI platform across the Bumping/RDL inspection lines. The deployment was built around five key capabilities:

1. A pretrained foundation model — solved one-model-per-device

Every classifAI model starts from a foundation model pretrained on millions of semiconductor defect images, across processes and tools. Fine-tuning happens on the customer's data on top of that base model.
Because the foundation model has already learned what semiconductor defects look like in general — what bumps look like, what foreign material looks like, what passivation looks like — it doesn't need to relearn that from scratch for every new device. It focuses on defect characteristics, not backgrounds, which is why a single model can cover hundreds of devices.

2. AI-assisted unique image selection — solved escape risk on new defects

The Find Unique Images tool scans the customer's full image history and picks the smallest set of images that still covers every defect variety in the data, taking more than 15 factors into account.
Foreign material is the clearest example. It shows up as isolated particles, bump-attached material, bridging material, long strands, and large-area contamination. Each variety has its own accept/reject rule. If the training set misses any of these, the model becomes blind to that variety in production — and that's where escapes come from. Find Unique Images makes sure all of them are represented before training even starts.

Infographic illustrating a machine learning data curation workflow for semiconductor defect inspection. The diagram shows a "Before" state of approximately 25,000 images from a full image pool, displayed as dense color-coded clusters in a 2D embedding space, reduced through a "Unique Samples Selection" process to an "After" curated training set of approximately 2,500 images — just 10% of the original data — while maintaining 100% population coverage across all defect categories. Defect types shown include Foreign Material, Bump Surface Damage, RDL Foreign Material, Bump Shape Anomaly, Passivation Defect, Undersized Bump, and Missing Bump. The right panel highlights four varieties of Foreign Material defects captured in the curated set: FM bridging between bumps, Bump-attached FM, Large-area FM, and Long strand FM, each illustrated with microscopic chip imagery.

Figure 2: How Unique Sample Selection curates millions of defect images into a representative few

Example from the line: on one critical-layer model, around 3,000 images were selected from a dataset of tens of thousands covering 15+ classes. Every defect variety was covered. The model went live in fewer than 4 training iterations.
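The article does not disclose how Find Unique Images works or what its 15+ factors are. As a rough illustration of the general idea (pick a small subset that still covers the whole pool), the sketch below uses greedy farthest-point selection over image embeddings, a common coreset technique swapped in here for illustration. The `select_unique` function, the 2D embeddings, and the `coverage_radius` parameter are all hypothetical.

```python
import math


def select_unique(embeddings, coverage_radius):
    """Greedy farthest-point selection.

    Starts from the first image, then repeatedly adds the image that is
    farthest from everything selected so far, until every image in the
    pool lies within coverage_radius of at least one selected image.
    Returns the indices of the selected images.
    """
    selected = [0]
    # Distance from each image to its nearest selected image.
    nearest = [math.dist(e, embeddings[0]) for e in embeddings]
    while max(nearest) > coverage_radius:
        idx = max(range(len(embeddings)), key=nearest.__getitem__)
        selected.append(idx)
        for i, e in enumerate(embeddings):
            nearest[i] = min(nearest[i], math.dist(e, embeddings[idx]))
    return selected


# Toy pool: three tight groups standing in for three defect varieties.
pool = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),  # variety A
    (5.0, 5.0), (5.1, 5.0),              # variety B
    (9.0, 0.0), (9.0, 0.2),              # variety C
]

picked = select_unique(pool, coverage_radius=0.5)
print(sorted(picked))
```

On this toy pool the selection keeps one image per group, so a rare variety with only two images is represented just as surely as a common one.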

3. AI-assisted label cleanup through clustering — solved the labelling bottleneck

With historical labels around 60–70% accurate, the model couldn't learn cleanly from them. The clustering view in classifAI groups visually similar defects together, so inconsistent labels show up at a glance. Engineers do not need to review every image; they only need to review the images handpicked by the ‘Image selector tool’ and also flagged by the ‘Clustering’ tool. This cuts review from thousands of images per model to a handful.
Example from the line: in one early dataset, residue and foreign material near bumps were heavily mixed up — operators had been calling the same-looking defects by different names across shifts. The clustering view surfaced the overlap region in one screen. The engineer relabelled the entire group in one pass, not image by image. What would have been weeks of manual cleanup took half a day.
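One simple way to surface this kind of overlap, sketched here under assumptions, is to measure label purity per cluster and flag the clusters where no single label dominates. The cluster ids are assumed to come from any clustering of image embeddings. The `mixed_label_clusters` helper, the `min_purity` threshold, and the sample labels are hypothetical and are not classifAI's actual implementation.

```python
from collections import Counter, defaultdict


def mixed_label_clusters(cluster_ids, labels, min_purity=0.9):
    """Flag clusters whose most common label covers < min_purity of members.

    Returns {cluster_id: (majority_label, purity)} for flagged clusters.
    """
    by_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        by_cluster[cid].append(label)

    flagged = {}
    for cid, members in by_cluster.items():
        top_label, top_count = Counter(members).most_common(1)[0]
        purity = top_count / len(members)
        if purity < min_purity:
            flagged[cid] = (top_label, purity)
    return flagged


# Toy data: cluster 0 is consistently labelled, cluster 1 is mixed.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1, 1]
labels = [
    "missing_bump", "missing_bump", "missing_bump", "missing_bump",
    "residue", "foreign_material", "residue",
    "foreign_material", "foreign_material",
]

flagged = mixed_label_clusters(cluster_ids, labels)
print(flagged)
```

Only the mixed cluster is returned, so the engineer reviews one group of look-alike images and relabels it in a single pass, not image by image.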

4. Context-aware classification on fine accept/reject boundaries — solved the trade-off

The model architecture is built to learn thin boundaries — the cases where accept and reject look nearly identical but mean very different things. High-priority defects carry extra weight during training, so the model doesn't smooth over critical boundaries in favour of the majority class.

A few examples from this line:

Bump surface damage. A small damage mark within tolerance is accepted. A slightly larger mark, beyond threshold, is rejected. The two images look almost the same. The model learns the threshold from labelled examples on both sides of the boundary.
Die crack vs. foreign material. A faint, almost invisible die crack must be rejected. A larger, more visible piece of foreign material is often acceptable. Here size isn't the signal — severity and defect type are. The model classifies based on what the defect is, not how big it looks.
Undersized bump. A bump within tolerance is accepted; a bump just outside tolerance is rejected. The model is trained on enough borderline examples on both sides to land on the right call consistently.

This is what holds escape near zero and over-reject under 0.2% at the same time. It is the core of breaking the trade-off.
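The article does not state which loss function is used. One common way to give high-priority defects extra weight during training is a class-weighted cross-entropy, sketched below in plain Python. The `weighted_cross_entropy` helper, the class names, the probabilities, and the weight of 5 are illustrative assumptions.

```python
import math


def weighted_cross_entropy(probs, targets, class_weight):
    """Weighted mean of -log p(true class) over a batch.

    probs:        list of {class_name: predicted probability}
    targets:      list of true class names
    class_weight: {class_name: weight}; missing classes default to 1.0
    """
    total = 0.0
    weight_sum = 0.0
    for p, t in zip(probs, targets):
        w = class_weight.get(t, 1.0)
        total += -w * math.log(p[t])
        weight_sum += w
    return total / weight_sum


# Toy batch: the common class is predicted well, the critical one is not.
probs = [
    {"accept": 0.9, "die_crack": 0.1},
    {"accept": 0.8, "die_crack": 0.2},
]
targets = ["accept", "die_crack"]

unweighted = weighted_cross_entropy(probs, targets, {})
weighted = weighted_cross_entropy(probs, targets, {"die_crack": 5.0})
print(unweighted, weighted)
```

With the extra weight, the poorly predicted die crack dominates the loss, so training is pushed to fix that boundary first instead of settling for good average accuracy on the majority class.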

5. Explainability — solved the black-box problem

Every classifAI classification comes with a heatmap showing which regions of the image the model focused on, plus the most similar training images that informed the call. Engineers can verify the model's reasoning on every unit — especially the borderline ones.
This is what built trust on the floor. The defect engineers could see why a die was called an accept or reject. When an end customer asked about a specific unit during a quality review, the team could pull up the heatmap and the reference images and walk through the call. The system was audit-ready.

Six-panel grid comparing original semiconductor defect microscopy images (top row) with their corresponding AI-generated heatmap overlays (bottom row), highlighting the specific regions (such as foreign material strands, bump anomalies, and RDL defects) that the model focused on during defect classification.

Figure 3: AI-generated heatmaps pinpointing the exact defect regions driving model classification decisions
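The "most similar training images" part of an explanation can be pictured as a nearest-neighbour lookup in embedding space. The sketch below ranks training images by cosine similarity to the image being classified. It is an assumption about how such retrieval could work, not a description of classifAI internals, and the `most_similar` helper, image names, and vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def most_similar(query, training, k=3):
    """Return the names of the k training images closest to the query."""
    ranked = sorted(
        training.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy embeddings, invented for illustration.
training = {
    "fm_strand_014": (0.9, 0.1, 0.0),
    "bump_damage_203": (0.0, 1.0, 0.1),
    "fm_bridge_077": (0.8, 0.3, 0.1),
}
query = (1.0, 0.2, 0.0)

references = most_similar(query, training, k=2)
print(references)
```

Showing the engineer these reference images next to the heatmap gives two independent checks on a call: where the model looked, and which labelled examples it resembles.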

One more thing: defect engineers run the workflow themselves

All five capabilities live in a single interface that the customer's defect engineers operate directly. Data curation, model training, validation, deployment, monitoring, and reclassification of new defects — all in one place. No AI specialist in the loop for routine model work.
When a new defect type appears that the model hasn't seen, classifAI sends it to an unknown bin for manual review rather than forcing it into the closest existing class. This is one of the biggest factors in preventing silent escapes on new defects.
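The article does not say how classifAI decides that an image is unknown. The simplest version of the idea is a reject option on model confidence, sketched below. The `route` function, the 0.8 threshold, and the class names are hypothetical. Confidence alone is an imperfect novelty signal, so a production system may well combine it with other checks.

```python
def route(class_probs, min_confidence=0.8):
    """Return the predicted class, or "unknown" if the model is unsure.

    class_probs: {class_name: predicted probability}
    """
    label, confidence = max(class_probs.items(), key=lambda item: item[1])
    if confidence < min_confidence:
        return "unknown"  # send to manual review instead of forcing a class
    return label


clear_case = route({"accept": 0.97, "foreign_material": 0.02, "die_crack": 0.01})
novel_case = route({"accept": 0.45, "foreign_material": 0.40, "die_crack": 0.15})
print(clear_case, novel_case)
```

The second image would have been an accept under a forced closest-class rule. With the reject option it lands in the unknown bin, where an engineer can label it and add it to the training set.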

Launch and where things stand today

How a model goes live

The deployment cycle on classifAI is short. From the moment a defect engineer starts on a new model to the point where it's running in production: five to six days on average. Data selection, label cleanup, training, validation, and deployment all happen on the same platform, run by the customer's own team.

Validation and customer audit

Each model is validated in live production before it takes over as the primary classification layer. It is audited on 250+ wafers collected over one week, with a full audit by the customer's defect and quality teams.
In certain cases, the system has also been audited by the customer's own end customers as part of their incoming quality reviews. The system has held up under that scrutiny, which is the bar that matters most on a line shipping into critical applications.

Deployment at a Glance

AI semiconductor inspection deployment metrics infographic table on a soft blue-green gradient background, showing production scale, customer device coverage, classification volume, AOI tools, defect classes, training data size, deployment timeline, and validation scale with semiconductor-themed icons.

Table 1: AI Inspection Deployment Overview

Results

The system was validated in live production across more than 250 wafers over one week and has been running in production for a few months now, with performance as follows:

Semiconductor AI automation performance metrics infographic with a centered table on a light blue-green gradient background, highlighting automation rate, wafer yield, under-rejection rate, die-level over-reject rate, and per-model device coverage using modern industrial icons.

Table 2: Production Performance Results with AI Classification System

What changed with AI-ADC in Operations

The shift is visible day to day. Five things have changed on the line since classifAI went live.
Quality. ~0% escape. No end-customer complaints in the last 3 years on AI-classified lots — the metric that matters most on a line shipping into critical applications.
Yield targets. Average yield >99.8%. Good dies falsely rejected by the AOI are recovered, without compromising on escapes.
Automation. >90% of operator load is now classified by AI. The volume that used to flow through manual review now runs autonomously. Human review is reserved for exceptions, model audits, and new defect types.
Cycle time. Lot hold time for manual review is down by ~55%. On fully automated inspection layers, the operator is no longer in the review loop at all.
Consistency and Predictability. 99.99% consistent classification across defects. For the engineering team, the classification data is now clean and consistent enough that yield engineers can spot defect trends in minutes instead of days. Root cause analysis is faster because the signal underneath it is no longer noisy from shift-to-shift labelling drift.
That's what production-grade AI-ADC looks like in advanced packaging: built for the messy reality of the line, owned by the engineers who run it, and proven against the toughest audits the industry has.