# Implementing Fuzzy Sets in SQL Server, Part 7: The Significance of Fuzzy Stats

By Steve Bolton

…………In the world of fuzzy sets and imprecision modeling, the concept of cardinality takes on new shades of meaning that are not applicable to ordinary “crisp” sets, i.e. those without membership grades. In the last article in this series of amateur-self-tutorials, I mentioned one type of “fuzzy cardinality,” based on triangular, trapezoidal and other fuzzy numbers that are quite useful in modeling many vague statements found in everyday speech. Of course, another means of expressing cardinality is through ordinary numbers that are defined by a single value, rather than intervals and other such fuzzy set types. This raises some interesting questions, because one of the logical implications of graded set membership is that records with a score of 0 shouldn’t be included in the count. By extension, the values that are non-zero should only be counted in proportion to the score assigned by their membership function; since these are almost always on a scale of 0 to 1, the count of a fuzzy set never exceeds that of its crisp count, but may be much lower. Moreover, since membership scores are represented by fractional values, we’d normally use SQL Server data types like float, decimal and numeric to represent them, rather than members of the int family as we would with ordinary counts.
…………This apparently gives rise to many different possible calculations for fuzzy counts, but the most common one in the literature is the sigma count, in which we simply add up all of the membership scores for an entire set. Another stat seen occasionally in the literature is Support, which is defined as a crisp count of all the non-zero members of a fuzzy set; it thus always results in an integer somewhere between the sigma count and the ordinary crisp count. The Height refers to the crisp count, the Bandwidth to the number of records with scores greater than 0.5 and the Core to those with the maximum score of 1; these concepts might be useful in such applications as fuzzy clustering, but I see the sigma count used far more often in connection with today’s topic, fuzzy stats, which come into play whenever we want to calculate aggregates on fuzzy sets in platforms like SQL Server.[1]

Partial Credit for Partial Set Membership

The trick with this topic is “Think partial credit!” to borrow a phrase from University of Minnesota Prof. Glen Medeen.[2]  Even if we restrict ourselves to the sigma count definition of fuzzy counts, the concept carries many interesting implications for all of the statistics that are derived from it. Averages, standard deviations, variances and all of the more advanced statistics derived from them must be recalculated, given that they’re derived from fundamental measures like values and counts that no longer apply. The logic inherent in partial set membership demands this fundamental rethinking of basic statistics. The crisp versions of some of these stats are precalculated by SQL Server, so by switching to the fuzzy set versions we’ll incur some performance costs by computing them on the fly instead, with the aid of T-SQL aggregates and windowing functions. Thankfully, some of these fuzzy stats are worth the extra computations, because they can shed light on our data in unusual ways. Perhaps the most obvious example is the difference between the crisp count and sigma count, which might be used as an alternative measure of fuzziness in the place of fuzzy complements, which as we saw early on in this series, are normally ideal for that use case.
…………Figure 1 provides a simple example of how to code this possible alternate measure of imprecision, by subtracting the sigma count from the height, i.e. the crisp count. I also demonstrate how easy it is to derive the bandwidth, support and core stats, even though these are only used infrequently. As usual, most of the initial code involves assigning the membership scores, by drafting the procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores for double duty as my membership function. The calculations are performed on the Duchennes muscular dystrophy data I downloaded a few tutorial series ago from Vanderbilt University’s Department of Biostatistics, which now resides in a dummy DataMiningProjects database; afterwards, they’re stored in a @ZScoreTable table variable, that can be operated on as needed. For the sake of consistency, I’ve stuck to the same format I’ve used throughout this series by using the three @Rescaling variables and ReversedZScore column to transform the ZScores in a membership score on the traditional 0 to 1 range.

New Means, Medians and Modes

Once we’ve derived the sigma count from these grades, I then calculate the standard fuzzy mean, which may be the simplest, most intuitive form of a “fuzzy absolute center.”[3] Another alternate measure of centrality is of course the mode, which I’ve thrown in because it’s so easy to calculate; to derive the fuzzy version, we just have to multiply each value’s count by its membership grade. This is one of the few fuzzy stats where the value is not affected by its score. In Figure 2 we can see that both versions of the mode return the same value of 198, which is within the general rule that both modes and their fuzzy counterparts will only return actual crisp values from their datasets.  Since medians are dependent on orders, I’ll take up that topic when I address the fuzzification of ranks in a wrap-up of the whole series.
…………Instead, I’ve incorporated a higher class of averages known as Generalized Means, which can be used to derive a whole family of means between the minimum and maximum values, including the fuzzy arithmetic mean mentioned above, along with the harmonic and geometric means.[4] We basically plug in an @AlphaParameter bounded between 0 and 1, which allows us to cover the whole range, in much the same fashion that the various T-norm and T-conorm parameters empowered us to derive myriad types of fuzzy intersections and unions in previous articles. Note that in Figure 2, we see that the parameter value I arbitrarily chose led to a far different value for the GeneralizedMean than the one derived for the ordinary FuzzyMean.

Figure 1: Sample Code for Fuzzy Counts and Means
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)

DECLARE  @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
@DatabaseName = N’DataMiningProjects,
@SchemaName = N’Health,
@TableName = N’DuchennesTable,
@ColumnName = N’LactateDehydrogenase,
@PrimaryKeyName = N’ID’,
@DecimalPrecision = ’38,32′,
@OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

DECLARE @Count bigint, @SigmaCount float, @Support float, @Bandwidth float, @Core float,
@Mean float, @FuzzyMean float, @GeneralizedMean float, @Mode float, @FuzzyMode float

— COUNTS
SELECT @SigmaCount = SUM(MembershipScore), @Count = Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Support =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0

SELECT @Bandwidth =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0.5

SELECT @Core =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore = 1

— MODES
SELECT @Mode = Value
FROM (SELECT TOP 1 ValueCount, Value
FROM (SELECT Count(*) AS ValueCount, Value
FROM @ZScoreTable
WHERE ZScore IS NOT NULL
GROUP BY Value) AS T1
ORDER BY ValueCount DESC) AS T2

SELECT @FuzzyMode = Value
FROM (SELECT TOP 1 ValueCount, Value
FROM (SELECT Count(*) * MembershipScore AS ValueCount, Value
FROM @ZScoreTable
WHERE ZScore IS NOT NULL
GROUP BY Value,MembershipScore) AS T1
ORDER BY ValueCount DESC) AS T2

— AVERAGES
DECLARE @AlphaParameter float
SELECT @AlphaParameter = 0.3

SELECT @FuzzyMean = SUM(MembershipScore * Value) / @SigmaCount , @Mean  = Avg(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @GeneralizedMean = Power(SUM(Power(Value, @AlphaParameter)) / CAST(@SigmaCount AS float), 1 / @AlphaParameter)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Count AS RegularCount, @SigmaCount AS SigmaCount,
@Count @SigmaCount AS AlternativeMeasureOfFuzziness,
@Support AS Support, @Bandwidth As Bandwidth, @Core as Core,
@Mode AS Mode, @FuzzyMode as FuzzyMode,
@Mean AS Mean, @FuzzyMean AS FuzzyMean, @GeneralizedMean AS GeneralizedMean

Figure 2: Sample Results from the Duchennes Table

…………Generalized means occupy a space in the set of norm operations in between T-norms and T-conorms, along with Ordered Weighted Averages.[5] Basically, each record in an OWA is multiplied by a weight which globally equals one, but the choice of weights is so broad that I won’t bother with them; I’ll merely point out that this obviously overlaps the topic of neural net weights, at least to anyone who has coded them before. I’ll also omit my sample code for Lambda Averages (i.e. λ-Averages), because it’s simply too long in comparison to its usefulness. This class of norm operations is derived from binary set relations, which means we first have to create a second table variable, fill it and adjust the scores, as we did in previous articles with T-norms and T-conorms. We’d then apply a CASE statement to select the MIN value of the @LambdaParameter and the outcome of the union, when the both records were between 0 and the @LambdaParameter; take the MAX of the @LambdaParameter and the outcome of the fuzzy intersection when the outcomes were greater than the @LambdaParameter; then use the @LambdaParameter value in the ELSE statement.[6]
…………As with fuzzy complements, unions and intersections, the applications are determined by the selection of appropriate parameter values.[7] One method of accomplishing this is of course parameter estimation.[8] A good starting point for fuzzy parameter estimation may be Seyed Mahmoud Taheri’s select bibliography of recent developments in fuzzy stats, which also lists many resources for extending fuzzy stats to standard statistical and data mining topics like Bayesian priors, fuzzy regression and hypothesis testing.[9] A couple of the sources also connect fuzzy sets to information theory, which I will also begin doing in my next tutorial.

Fuzzy Variance: A Fresh Take on an Staid Stat

Taheri also mentions published research on fuzzy sets and maximum likelihood, which makes me wonder if there is also some connection to Fisher Information. The same is true of the different types of fuzzy variance, given that variance is interpreted in Fisher Information as a form of uncertainty. This may be a more worthwhile topic to cover than λ-Averages and OWAs since the formulas are less broad and have clearer applications. First of all, it makes intuitive sense to calculate variance differently on fuzzy sets, for precisely the same reasons as fuzzy means: the crisp version of the statistic is dependent on counts, which ought to be replaced with alternative measures like the sigma count when possible. If a record has zero membership, for example, its value shouldn’t count at all in the computation, because it’s no longer part of the set. It thus follows that a value with partial membership should only be taken into consideration in proportion to its score, just as with fuzzy means; it stands to reason that the same principles would apply if we went beyond the first and second statistical moments, mean and variance, to the third and fourth, skewness and kurtosis.
…………This leads to some interesting questions over how we can interpret the differences in the crisp and fuzzy variances. Given that the difference between crisp and sigma counts reflects a measure of fuzziness – albeit not clearly as fuzzy complements – perhaps we can interpret this as a measure of how dispersed the fuzziness is. This might come in quite handy in many data mining applications. I haven’t seen it used that way in the literature – but it’s good to keep in mind that my exposure to the whole topic of fuzzy sets is limited, given that I can’t afford the hefty fees for many of the academic journals, which render them inaccessible to me. Nor have I seen trapezoidal numbers combined with variance, but a construction like the CASE statement for TrapezoidalRangeOnTheCrispVariance in Figure 3 might be useful in expressing natural language slang about dispersion like, “all over the place.” The TrapezoidalRangeOnTheFuzzyVariance expresses the same concept, except that it represents a fuzzy number on a fuzzy set, rather than a fuzzy number on a crisp set; it thus amounts to saying, “this graded set is all over the place.” I set the range boundaries arbitrarily so that both would have fractional scores in Figure 4, which serves as a better illustration of partial membership in a fuzzy number. Using the square root and power techniques mentioned in the last article, we could add superlatives to it like “really” or “somewhat.” If we were using Z-Scores in the context of a normal distribution, we might set graded boundaries based on the “68–95–99.7 Rule”  I covered in Goodness-of-Fit Testing with SQL Server, part 1: The Simplest Methods, which involve the number of expected records between the first, second and third standard deviations. I left out the more complicated case of the superlative in sample code below, just to illustrate how a SQL Server user might write T-SQL for simple cases of the these two fuzzy variance types:

Figure 3: Some Possible Measures of Fuzzy Variance
DECLARE @StDev float, @FuzzyStDev float, @Var float, @FuzzyVar float

SELECT @FuzzyVar = Sum(Power((Value * MembershipScore) @FuzzyMean, 2)) / @SigmaCount, @Var
= Var(Value), @StDev = StDev(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @FuzzyStDev = Power(@FuzzyVar, 0.5)

DECLARE @LowerBound float, @UpperBound float
SELECT @LowerBound  = 4000, @UpperBound = 5000

SELECT @StDev AS StDev, @FuzzyStDev AS FuzzyStDev,
@Var AS Var, @FuzzyVar AS FuzzyVar,
@Var @FuzzyVar AS PossiblyTheVarianceOfTheFuzziness,
CASE WHEN @Var BETWEEN @LowerBound AND @UpperBound THEN 1
WHEN @Var < @LowerBound THEN ((@Var @LowerBound)) / @Var + 1
WHEN @Var > @UpperBound THEN ((@UpperBound @Var)) / @Var + 1
ELSE NULL END AS TrapezoidalRangeOnTheCrispVariance,
CASE WHEN @FuzzyVar BETWEEN @LowerBound AND @UpperBound THEN 1
WHEN @FuzzyVar < @LowerBound THEN ((@FuzzyVar @LowerBound)) / @FuzzyVar + 1
WHEN @FuzzyVar > @UpperBound THEN ((@UpperBound @FuzzyVar)) / @FuzzyVar + 1
ELSE NULL END AS TrapezoidalRangeOnTheFuzzyVariance

Figure 4: Fuzzy Variance Result for the LactageDehydrogenase Column

…………Fuzzy variance may serve as a bridge to Fisher Information, a topic I want to cover in my long-delayed series, Information Measurement with SQL Server. Early on in this series we saw how fuzzy complements serve as one important measure of a different type of information, fuzziness, which quantifies the imprecision of a dataset in a different manner than variance. The difference between the sigma and crisp counts might serve the same purposes, although I’ve seen the various types of complements used more often for this purpose. One of the coolest things about fuzzy sets is that they give rise to several useful statistics that quantify different types of imprecision, which can be used to derive a program of “uncertainty management” for an organization. In the next installment we’ll see how we can use some of the fuzzy stats defined here to pin down a different brand of imprecision known as nonspecificity. This will involve discussion of the Hartley function and possibly Shannon’s Entropy, the latter of which is a fundamental concept in many data mining algorithms. Since entropy is among the foundations of information theory, this introduction to its applications in nonspecificity will serve as a bridge to my future Information Measurement series.

[1] pp. 25-28, Bonissone, Piero P., 1998, “Fuzzy Sets & Expert Systems in Computer Eng. (1).” Available online at http://homepages.rpi.edu/~bonisp/fuzzy-course/99/L1/mot-conc2.pdf. Bonissone’s material is reprinted at least in part from slides produced by artificial intelligence researchers Roger Jang and Enrique Ruspini.

[2] p. 5, Medeen, Glen, 2015, Two Examples of the Use of Fuzzy Set Theory in Statistics,” published online at the University of Minnesota web address http://users.stat.umn.edu/~gmeeden/talks/fuzznov09.pdf

[3] p. 435, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. On this particular page, they’re extending the meaning of the term even further, to complex network topologies.

[4] IBID., p. 90.

[5] IBID., pp. 92-93.

[6] IBID., p. 94.

[7] IBID., p. 93.

[8] IBID., p. 94.

[9] p. 240, Taheri, Seyed Mahmoud, 2003, “Trends in Fuzzy Statistics,” pp. 239-257 in Austrian Journal of Statistics, Vol. 32, No. 3. Available online at Vienna University of Technology web address http://www.statistik.tuwien.ac.at/oezstat/ausg033/papers/taheri.pdf