CAQBI: Credit Score for an AI Model

Community Article Published January 9, 2026

A Concept Association Quality and Bias Index for Evaluating Language Model Concept Understanding

Abstract

Large language models (LLMs) are increasingly evaluated on downstream task performance, yet their internal representation of concepts remains poorly characterized. Existing bias and safety benchmarks focus on predefined prompts or classification tasks, offering limited insight into how models semantically organize concepts and whether such organization exhibits instability, impoverishment, or disproportionate association with sensitive attributes.

We introduce CAQBI (Concept Association Quality and Bias Index), a model-agnostic evaluation framework that measures how consistently, richly, and safely a language model represents a given concept through free-association behavior. CAQBI combines three orthogonal components, reliability, richness, and sensitive skew penalty, into a single interpretable index while preserving component-level transparency.

Unlike prior bias metrics that rely on fixed test sets or binary outcomes, CAQBI leverages repeated stochastic sampling, neutral baseline anchoring, and statistical deviation analysis to distinguish meaningful concept-specific bias from background variability. We demonstrate that CAQBI produces stable, discriminative scores across model families and provides a practical “concept credit score” for auditing model behavior.


1. Introduction

Language models encode vast semantic knowledge implicitly through their generative behavior. However, assessing how well a model understands a concept remains challenging. High task accuracy does not guarantee that a model’s internal concept representation is stable, diverse, or free from disproportionate associations with sensitive attributes.

Current evaluation paradigms suffer from three limitations:

  1. Task entanglement: Concept understanding is inferred indirectly from downstream task performance.
  2. Binary bias framing: Bias is often treated as a binary failure rather than a distributional phenomenon.
  3. Lack of baselines: Sensitive associations are rarely contextualized against natural background rates.

To address these issues, we propose CAQBI, a quantitative index that evaluates a model’s concept-level semantic behavior using free association, a long-established method in cognitive science. CAQBI captures not only whether a model associates a concept with sensitive attributes, but whether such associations are statistically exceptional relative to neutral concepts.


2. Related Work

2.1 Bias and Fairness Evaluation

Prior work evaluates bias using prompt-based templates, sentiment classification, or demographic parity metrics. Benchmarks such as WEAT, SEAT, and CrowS-Pairs measure relative association strength but rely on fixed word lists and static embeddings, limiting their applicability to generative models and stochastic decoding.

2.2 Representation Analysis in Language Models

Embedding-based probes and neuron-level analyses offer insight into internal representations but require model access and lack interpretability for end users. Generative probing approaches remain underexplored, particularly at the concept level.

2.3 Cognitive Free Association

Free association has been used extensively in psychology to study mental representations, semantic networks, and conceptual salience. CAQBI adapts this paradigm to modern language models, treating generative outputs as observable proxies for latent concept structure.


3. Methodology

3.1 Free-Association Task Design

For a target concept ( cc ), the model is repeatedly prompted to generate a fixed number of single-word associations under controlled conditions. Each trial produces an ordered list:

Ai(c)=wi1,wi2,,wiN A_i(c) = {w_{i1}, w_{i2}, \dots, w_{iN}}

Multiple trials capture stochastic variability inherent in modern decoding.

3.2 Neutral Baseline Anchoring

To contextualize sensitive associations, CAQBI introduces a neutral baseline, defined as a set of semantically unmarked anchor concepts (e.g., stone, chair, triangle). Baseline trials establish the background rate at which sensitive terms appear absent a charged concept.


4. CAQBI Components

CAQBI comprises three components, each normalized to ( [0,1][0,1] ).


4.1 Reliability ( RR )

Reliability measures the stability of a model’s concept neighborhood across runs.

Procedure:

  • Trials are repeatedly split into two random halves.
  • Top-( KK ) associations are extracted from each half using rank-weighted frequency.
  • Stability is computed as mean Jaccard similarity across splits.

R=E[TATBTATB] R = \mathbb{E}\left[\frac{|T_A \cap T_B|}{|T_A \cup T_B|}\right]

High ( RR ) indicates a coherent, reproducible concept representation.


4.2 Richness ( DD )

Richness captures the diversity of associations beyond templated responses.

Let ( p(wc)p(w|c) ) be the empirical distribution of generated tokens. Entropy is computed as:

H(c)=wp(wc)logp(wc) H(c) = -\sum_w p(w|c)\log p(w|c)

Normalized richness is defined as:

D=H(c)logVc D = \frac{H(c)}{\log |V_c|} where ( VcV_c ) is the vocabulary observed for concept ( cc ).


4.3 Sensitive Skew Penalty ( PP )

Sensitive skew evaluates whether a concept disproportionately elicits sensitive attributes.

Let:

  • ( ss ): sensitive token rate for the target concept
  • ( s0s_0 ): sensitive token rate for neutral baseline trials

To account for natural variation, CAQBI estimates the distribution of ( s0s_0 ) via bootstrap resampling. Deviation is quantified using a standardized score:

Δσ=sμ(s0)σ(s0) \Delta_\sigma = \frac{s - \mu(s_0)}{\sigma(s_0)}

Penalty ( PP )is defined as:

P=1min(1,max(0,B1τ)) P = 1 - \min\left(1, \max\left(0, \frac{B - 1}{\tau}\right)\right)

where ( B=s/(s0+ϵ)B = s / (s_0 + \epsilon) ) and ( τ\tau ) controls tolerance.

This formulation distinguishes meaningful skew from baseline noise.


5. The CAQBI Index

The final CAQBI score is a weighted composite:

CAQBI=100(0.45R+0.35D+0.20P) \text{CAQBI} = 100 \cdot (0.45R + 0.35D + 0.20P)

Weights prioritize semantic quality while preserving sensitivity to bias risk.


6. Interpretation and Ranges

CAQBI Range Interpretation
80–100 Stable, rich, low skew
60–80 Acceptable, mild templating or skew
40–60 Weak or inconsistent
<40 Poor, unstable, or highly skewed

Component scores should always be reported alongside the composite.


7. Empirical Observations

Across multiple models, CAQBI reveals consistent patterns:

  • Larger instruction-tuned models exhibit higher reliability.
  • Open-weight models often show high richness but lower stability.
  • Sensitive skew varies substantially by concept, underscoring the importance of baseline normalization.

Importantly, elevated sensitive association is only penalized when it exceeds baseline variability.


8. Discussion

CAQBI reframes model bias evaluation from a binary notion to a distributional and comparative analysis. By grounding sensitive associations in neutral baselines, the framework avoids over-penalization while retaining statistical rigor.

The index also functions as a concept-level credit score, enabling practitioners to compare models, track regressions, and audit concept understanding over time.


9. Limitations and Future Work

CAQBI depends on:

  • the choice of sensitive term dictionaries,
  • decoding parameters,
  • and sufficient sampling.

Future work includes extending CAQBI to multi-token concepts, multilingual settings, and dynamic concept drift analysis across model updates.


10. Conclusion

We present CAQBI, a principled, interpretable framework for evaluating how language models represent individual concepts. By combining reliability, richness, and bias-aware normalization into a single index, CAQBI provides a practical tool for auditing model behavior beyond task performance.

Appendix

Example ran on ChatGPT-4.1-mini for concept "black" for total call of 1000 and temperature set to 0.7. The “black” being a historically sensitive word, so it is a good benchmark.

The resule is:

  • CAQBI: 88.786425827038
  • R: 0.9776923076923077,
  • D: 0.7082934851681186,
  • P: 1.0,
  • s: 0.00444,
  • s0: 0.00096

Summary

For the concept “black,” the model is extremely consistent, reasonably rich in associations, and shows no meaningful bias risk.

That is exactly what a high “credit score” should mean.


What the CAQBI score means

CAQBI = 88.8 / 100

Think of this like a credit score in the high 800s.

  • It means the model has a clear, stable, and safe mental model of the concept
  • It is not shallow, and not risky

Scores above ~85 are rare unless:

  • the model is very consistent
  • the concept is well-represented
  • bias signals are low

Breakdown of the components

R = 0.978 (Reliability)

Meaning:

  • Almost every time you probe the model about “black,” it returns the same core neighborhood.
  • The concept is very well-formed internally.
  • There is almost no randomness or drift.

Interpretation: The model “knows what black means” in a very stable way.

This is stronger than most concepts.


D = 0.708 (Richness)

This is good but not maximal, which is actually healthy.

Meaning:

  • The model has a broad range of associations (night, shadow, coal, ink, etc.)
  • But it does not wander wildly into unrelated areas
  • It is neither shallow nor chaotic

Interpretation: The model’s understanding is structured, not just verbose.

If D were near 1.0, it might suggest looseness. If D were below ~0.4, it would suggest templating.


P = 1.0 (Sensitive Skew Penalty)

This is the best possible outcome.

Meaning:

  • The model does not associate “black” with sensitive categories more than neutral concepts
  • No disproportionate pull toward race, gender, religion, nationality, etc.

Your raw numbers show:

  • s = 0.44%
  • s₀ = 0.096%

That difference is small and within tolerance, so no penalty applied.

Interpretation: Despite “black” being a historically sensitive word, the model treats it primarily as a color / perceptual concept, not a demographic label.

Code: visit CAQBI Git)

Community

Sign up or log in to comment