Blog Archives

Implementing Fuzzy Sets in SQL Server, Part 11: Fuzzy Addenda

By Steve Bolton

…………One of the key reasons I looked into the topic of fuzzy sets in the first place was my suspicion that T-SQL, as a set-based language, would be ideal for modeling them. That turned out to be an understatement of sorts: I definitely was not prepared to discover just how useful they can be for translating imprecise linguistic modifiers in Behavior-Driven Development (BDD) environments and user stories, nor did I realize how little information has percolated down from the mammoth amount of theoretical research done on fuzzy topics over the last 40 years. Hopefully this series of amateur mistutorials helped rectify that gap by giving fuzzy sets some badly needed free press, of the kind I tried to bring SSDM in my older A Rickety Stairway to SQL Server Data Mining series awhile back. I originally set aside this final article as a kitchen drawer of sorts, to dispense with some postscripts that would’ve interfered with the flow of the rest of the series, in which one concept was used as a building block onto the next. One leftover concept I thought might be worthy of significant attention was fuzzy orders, which sounds as if it would be right up SQL Server’s alley. After all, DBAs use the ORDER BY statement every day. The problem is that it turns out T-SQL, like most other set-based languages, is not ideal for modeling this kind of fuzzy object.

Fuzzy Orders and the Limitations of Hierarchies in SQL

                In the literature, fuzzy set orders are created by applying continuous membership grades to a record’s position in a particular fuzzy set. Devices like Hasse diagrams and properties like “dominated” and “undominated” are useful in implementing them[1], but I won’t bother, for the simple reason that SQL Server lacks robust graph database capabilities. Modeling relationships of this kind is still notoriously difficult in the relational realm, even though relational databases have been augmented by such useful tools as the hierarchyid data type in recent years. I am rather fond of hierarchyid, but it is unable to model multiparent trees in an efficient way, let alone multidimensional directed graphs. Just try modeling a simple genealogical tree with it. Trees are instances of what are known in mathematical parlance as partial orders; when you really stop and think about it, they represent a form of order, except in more than one dimension, such as “my grandparents and I have a descendant-ancestor relationship, but not my cousins and I.”[2] As far as I can tell, directed graphs open up more possibilities by relaxing the rules of composition, in the same way that Riemannian manifolds give us access to curved hyperspace. I for one would cast my vote for adding graph database capabilities similar to those found in Neo4j[3] to SQL Server, which would add a whole new dimension to the product in the same way that Analysis Services and Reporting Services do, without being a separate service.
…………Alas, until such capabilities are added to SQL Server, it wouldn’t be useful to model most forms of fuzzy orders in T-SQL, let alone Multidimensional Expressions (MDX) in SQL Server Analysis Services (SSAS) cubes, because they immediately require the flexibility of multiparent trees and directed graphs. These tasks could be accomplished in SQL Server 2014 as it stands, but in contrast to the other fuzzy objects I’ve introduced throughout this series, I doubt it can be done in an efficient way. It also doesn’t help matters at all that the Windows Presentation Foundation (WPF) tree control is a walking disaster – for years now, its shortcomings have been a thorn in the side of .NET developers of all skill levels. Microsoft simply didn’t build in such basic functionality as searching for specific members in a collapsed tree, and in fact made it virtually impossible for third-party developers to do it themselves. Needless to say, neither the WPF TreeView nor hierarchyid is well-suited to modeling directed graphs, which are simply a more flexible generalization of trees. The kissing cousins of fuzzy orders, like fuzzy rankings[4] and fuzzy morphisms[5], aren’t really feasible either. George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications, my favorite go-to resource for fuzzy math formulas, provides a decent starting point for all three[6], but from my limited experience, I wouldn’t even try to implement them unless I had access to a good third-party product like GoXAM’s directed graph control (which may be expensive, but would probably recoup its costs by saving weeks of wasted labor on the unworkable WPF TreeView). If it one day does become worthwhile to model fuzzy orders and ranks in some future edition of SQL Server (or I turn out to be wrong), they’ll probably require the use of a lot of CASE statements in ORDER BY clauses and windowing functions, respectively.
Given that there’s a mountain of currently unsolved problems out there that other aspects of fuzzy sets could tackle right away, we’ll save this topic for a later date. It’ll be a long time before all the low-hanging fruit is used up and we’re to the point where struggling to model them will become worthwhile.
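To make the multiparent problem described above a little more concrete, here is a minimal sketch of the usual relational workaround: an edge table in which a child can have several parent rows, walked with a recursive common table expression. The table variable, column names and family members are all hypothetical and are not part of the sample database used in this series. It is only meant to illustrate the shape of a partial order, not to suggest this approach scales to real graph workloads.

```sql
-- Hypothetical family data: each child can appear with more than one parent,
-- which a single hierarchyid path per node cannot express directly.
DECLARE @ParentChild table
(ParentName nvarchar(50) NOT NULL,
ChildName nvarchar(50) NOT NULL,
PRIMARY KEY (ParentName, ChildName)
);

INSERT INTO @ParentChild (ParentName, ChildName)
VALUES (N'Grandmother', N'Mother'),
       (N'Grandfather', N'Mother'),
       (N'Grandmother', N'Uncle'),
       (N'Grandfather', N'Uncle'),
       (N'Mother', N'Me'),
       (N'Father', N'Me'),
       (N'Uncle', N'Cousin');

-- Walk upward from one person, collecting every ancestor and its distance
WITH AncestorCTE (AncestorName, Generations) AS
(
    SELECT ParentName, 1
    FROM @ParentChild
    WHERE ChildName = N'Me'
    UNION ALL
    SELECT PC.ParentName, A.Generations + 1
    FROM AncestorCTE AS A
        INNER JOIN @ParentChild AS PC
        ON PC.ChildName = A.AncestorName
)
SELECT AncestorName, Generations
FROM AncestorCTE
ORDER BY Generations, AncestorName;
```

The query returns the two parents and two grandparents of “Me” but never “Cousin” or “Uncle,” since neither is an ancestor; that incomparability is what makes this a partial order rather than the total order that ORDER BY produces.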

Some Simple T-SQL for Fuzzy Medians

                Because I realized early on that fuzzy orders were an afterthought – at least by the present capabilities of SQL Server and other relational databases – I left the subject of fuzzy medians for this junk drawer of an article. After all, medians are inherently dependent on the order of data, given that they pick the one or two values that occur precisely in the middle of a set. Furthermore, I noticed that the formulas involved calculations on two sets rather than one, which would have cluttered Implementing Fuzzy Sets in SQL Server, Part 7: The Significance of Fuzzy Stats, where the sample code was all done on a single table. That should have been a clue, however, that the fuzzy medians in the literature are a separate subject, not just a fuzzified version of ordinary medians. A fuzzified ordinary median would be easy enough to implement, given the principles of fuzzy sets introduced throughout this series; for example, instead of selecting the one or two records at the dead center of the dataset, we could select a fuzzy range. The trapezoidal numbers discussed in Implementing Fuzzy Sets in SQL Server, Part 6: Fuzzy Numbers and Linguistic Modifiers might be ideal for this purpose. The type of fuzzy median under discussion here instead belongs in the taxonomic hierarchy of fuzzy objects I mentioned in the fuzzy stats article, like Ordered Weighted Averages (OWAs), Lambda Averages (λ-Averages), T-norms, T-conorms and the like. Compared to some of those operations, the logic of fuzzy medians is fairly simple: we take the maximum of the values of two sets at each corresponding row when both membership scores are between 0 and the @LambdaParameter, the minimum values when both are between the @LambdaParameter and 1 and just the @LambdaParameter (which must be set between 0 and 1) in all other cases.[7] Assuming I read the formulas correctly – which is not a given, since I’m a novice at this – then this should all be implemented in Figure 1.
As usual, it looks a lot longer than it really is; everything through the second UPDATE statement is just the same sample code I’ve used this series to populate the membership functions for binary set relations. Keep in mind that we don’t need to use Z-Scores to assign membership values here; I’m just using them to illustrate how to assign memberships in a fuzzy set, using familiar code from older tutorials. The sky’s the limit as far as the number of functions you can use to assign such values; the key thing is to find the right match to the problem you’re trying to solve. This would be a good match if we were trying to rate outliers by two different forms of Z-Scores, for example. The only novel part is the last SELECT, which isn’t difficult at all. As always, the results in Figure 2 are derived from the Duchennes muscular dystrophy dataset I downloaded a few tutorial series ago from Vanderbilt University’s Department of Biostatistics and have been using for practice data ever since.

Figure 1: Sample Code for a Simple Fuzzy Median
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) - ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

DECLARE @ModifiedZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) - ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint,
OutlierCandidate bit
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N'DataMiningProjects',
              @SchemaName = N'Health',
              @TableName = N'DuchennesTable',
              @ColumnName = N'LactateDehydrogenase',
              @PrimaryKeyName = N'ID',
              @DecimalPrecision = '38,32',
              @OrderByCode = 8

-- RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin = Min(ReversedZScore)
FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax - @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore - @RescalingMin) / @RescalingRange

INSERT INTO @ModifiedZScoreTable
(PrimaryKey, Value, ZScore, GroupRank, OutlierCandidate)
EXEC   Calculations.ModifiedZScoreSP
              @DatabaseName = N'DataMiningProjects',
              @SchemaName = N'Health',
              @TableName = N'DuchennesTable',
              @ColumnName = N'LactateDehydrogenase',
              @PrimaryKeyName = N'ID',
              @OrderByCode = 8,
              @DecimalPrecision = '38,32'

-- RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin = Min(ReversedZScore)
FROM @ModifiedZScoreTable
SELECT @RescalingRange = @RescalingMax - @RescalingMin

UPDATE @ModifiedZScoreTable
SET MembershipScore = (ReversedZScore - @RescalingMin) / @RescalingRange

DECLARE @LambdaParameter float = 0.43

-- FUZZY MEDIAN: maximum when both scores fall at or below lambda, minimum when both fall at or above it, lambda otherwise
SELECT  T1.PrimaryKey, T1.Value, T1.MembershipScore, T2.MembershipScore,
CASE WHEN (T1.MembershipScore BETWEEN 0 AND @LambdaParameter) AND (T2.MembershipScore BETWEEN 0 AND @LambdaParameter) THEN (SELECT MAX(Score) FROM (VALUES (T1.MembershipScore), (T2.MembershipScore)) AS Scores(Score))
WHEN (T1.MembershipScore BETWEEN @LambdaParameter AND 1) AND (T2.MembershipScore BETWEEN @LambdaParameter AND 1) THEN (SELECT MIN(Score) FROM (VALUES (T1.MembershipScore), (T2.MembershipScore)) AS Scores(Score))
ELSE @LambdaParameter END AS FuzzyMedian
FROM @ZScoreTable AS T1
       INNER JOIN @ModifiedZScoreTable AS T2
       ON T1.PrimaryKey = T2.PrimaryKey AND T1.Value IS NOT NULL AND T2.Value IS NOT NULL

Figure 2: Results from the Duchennes Dataset
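For readers who don’t have the Z-Score stored procedures from the earlier tutorials installed, the same three-way rule can be tried on a handful of made-up membership scores. The sketch below is only an illustration under that assumption: the SampleScores rows, the ScoreA and ScoreB column names and the ID values are all hypothetical stand-ins for the two rescaled sets in Figure 1.

```sql
DECLARE @LambdaParameter decimal(38,6) = 0.43;

-- Hypothetical membership scores standing in for the two rescaled Z-Score sets
SELECT ID, ScoreA, ScoreB,
    CASE WHEN ScoreA <= @LambdaParameter AND ScoreB <= @LambdaParameter
            THEN CASE WHEN ScoreA >= ScoreB THEN ScoreA ELSE ScoreB END -- maximum of the pair
        WHEN ScoreA >= @LambdaParameter AND ScoreB >= @LambdaParameter
            THEN CASE WHEN ScoreA <= ScoreB THEN ScoreA ELSE ScoreB END -- minimum of the pair
        ELSE @LambdaParameter END AS FuzzyMedian
FROM (VALUES (1, 0.10, 0.30),
             (2, 0.60, 0.90),
             (3, 0.20, 0.75),
             (4, 0.43, 0.43)) AS SampleScores (ID, ScoreA, ScoreB);
```

Row 1 falls entirely below the parameter and returns the larger score of 0.30, row 2 falls entirely above it and returns the smaller score of 0.60, and rows 3 and 4 return the parameter itself. Another way of reading the rule is that the result is always the middle value of the three numbers ScoreA, ScoreB and @LambdaParameter, which is presumably where the “median” in the name comes from.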

…………I barely began to scratch the surface of fuzzy objects like fuzzy medians, λ-Averages, T-norms, T-conorms and OWAs in this series. In fact, there’s an entire sea of ripe research out there on all topics fuzzy that could be quite useful to relational DBAs and decision support specialists, but which has gone unpicked. There are many different directions this topic can be taken in, so I may revisit this series and tack some additional articles onto it in the future. I didn’t get a chance to mention the extension principle[8] at all and glossed over important applications of fuzzy techniques in Decision Theory, way back in Implementing Fuzzy Sets in SQL Server, Part 4: From Fuzzy Unions to Fuzzy Logic. I might provide more detail on the use cases for particular T-norms and T-conorms (if I can ever get my hands on the relevant academic journal articles, which are expensive), model more linguistic states and get into indexing considerations, other brands of fuzzy aggregates and other types of fuzzy partitions besides alpha cuts (α-cuts), among other things. Yet I’d rather branch off into “soft computing,” which is a grab-bag and hodge-podge of cutting edge fields that are quite hard, which makes its name something of an oxymoron. Fuzzy logic is merely one of the buzzwords associated with it, like chaos theory, neural nets, support vector machines (SVMs) and genetic algorithms. What they all have in common is that they’re useful in situations where inexact solutions are acceptable, including NP-Complete problems.[9] The same hype and intellectual intoxication I spoke of in Implementing Fuzzy Sets in SQL Server, Part 1: Membership Functions and the Fuzzy Taxonomy also surrounds certain aspects of soft computing, which seems to make some theoreticians go soft in the head; I guarantee there will still be useful innovations occurring in these fields a century from now, assuming the human race lasts that long, but these incredible tools aren’t cure-alls.
There are some things they just can’t do and I’d wager that certain brands of artificial intelligence and machine learning are among them; I love science fiction but it’s not wise to confuse it with cold, hard reality.
…………That’s a discussion I’ll take up by dribs and drabs in my next, long-delayed mistutorial series, Information Measurement with SQL Server, which may serve as a stepping stone to my favorite topic, neural nets. Both topics dovetail nicely with fuzzy sets and many of the tangential topics we’ve covered in this series, like Shannon’s Entropy and the Hartley function. These are among dozens of metrics which can be coded in T-SQL and Multidimensional Expressions (MDX) and put to good use for data mining purposes, as I will demonstrate over the course of this long and possibly nomadic series. I aim to familiarize myself with semantic information, measures of order, measures of sensitivity to initial conditions (like the Lyapunov Exponent used in chaos theory), various means of quantifying algorithmic complexity – anything that will reduce uncertainty and glean whatever unused information is left in our datasets, by quantifying it in some way. Some of these metrics can be plugged into the formulas I introduced in this series for measuring fuzziness in terms of set complements, such as the Kullback-Leibler Divergence and Bhattacharyya Distance. We’ve already gotten our toes wet by introducing fuzzy stats and metrics for quantifying nonspecificity and fuzziness; now it’s time to jump in. Some of the topics will be quite shallow and easy to follow, while others may be incredibly deep. It’s largely unexplored territory for me as well, so I may have to skip around from topic to topic in an unsystematic way, instead of deliberately building up to more complex concepts as I did towards Dempster-Shafer Evidence Theory in this series. At a minimum, readers should at least benefit from learning from my mistakes, which don’t require a fancy fuzzy expert system to tell us that they’re inevitable; like death and taxes, they’re one of the few pieces of information that come with any certainty in predictive analytics and data mining.

[1] pp. 137-141, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. On these particular pages, they’re extending the meaning of the term even further, to complex network topologies.

[2] For more information, see the article “Partially Ordered Set” at the Wikipedia web address http://en.wikipedia.org/wiki/Partially_ordered_set

[3] Which I have yet to try; I’m only speaking here of what I’ve read about Neo4j casually.

[4] pp. 405-408, Klir and Yuan.

[5] IBID., pp. 141-144.

[6] IBID., pp. 137-144.

[7] IBID., p. 94.

[8] IBID., pp. 44-45.

[9] See the Wikipedia article “Soft Computing” at http://en.wikipedia.org/wiki/Soft_computing.

Implementing Fuzzy Sets in SQL Server, Part 10.2: Measuring Uncertainty in Evidence Theory

By Steve Bolton                                                                                                                      

…………To avoid overloading readers with too many concepts at once, I split my discussion of Dempster-Shafer Evidence Theory into two parts, with the bulk of the data modeling aspects and theory occurring in the last article. This time around, I’ll cover how fuzzy measures can be applied to it to quantify such forms of uncertainty as nonspecificity and imprecision (i.e., “fuzziness”) that were introduced in prior articles. Since the Plausibility, Belief and probability mass assignment figures work together to assign degrees of truth, they also introduce the potential for contradictory evidence, which leads to a few other measures of uncertainty: Strife, Discord and Conflict, which aren’t as relevant to possibility distributions and ordinary fuzzy sets. In addition, the probability mass for a universal hypothesis can be interpreted as a form of uncertainty left over after all of the probabilities for the subsets have been partitioned out. For example, in Figure 1, this crude type of uncertainty would be associated with the 0.0334928229665072 value for row 6. For the sake of brevity, I won’t rehash how I derived the ordinal LactateDehydrogenaseState category and the first three fuzzy measures associated with it, since the numbers are identical to those in the last tutorial. For the sake of convenience I added three columns with nearly identical names and calculated some sham data for them (based on the frequencies of some CreatineKinase data in the original table) so that we have some Conflicting data to work with. Ordinarily, such comparisons would be made using joins against an external view or table with its own separate ProbabilityMassAssignment, BeliefScore and PlausibilityScore columns, or a query that calculated them on the fly.

Figure 1: Some Sample Evidence Theory Data from the Last Tutorial

…………In Figure 2, I translated some of the most popular formulas for evidence theory measures into T-SQL, such as Strife, Discord and Conflict.[1] For these, I used a simpler version of the equations that performed calculations on differences in set values rather than fuzzy intersections and unions.[2] Despite the fact that the two measures only differ by the divisor and order of the difference operation, Discord is apparently not used as often as Strife on the grounds that it does not capture as much information. These subtle differences occur only in the alternate measures of Conflict they’re based on; since the one related to Strife is more important, I only included that one in Figure 3, where it’s represented by a score of 0.286225667126791. Versions of Strife and Discord are available for possibility distributions, but I omitted these because the fact that possibility theory is “almost conflict-free” signifies that they’re of “negligible” benefit.[3] I also coded the evidence theory version of nonspecificity and essentially rehashed the crude fuzziness measure I used in Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements, except with the YagerComplement parameter arbitrarily set to 0.55 and the probability mass used in place of the membership function results. Both of these are unary fuzzy measures that apply only to the set defined by the first three float columns, whereas Strife, Discord and Conflict are binary measures that are calculated on the differences between the two sets encoded in the Health.DuchennesEvidenceTheoryTable. We can also add the Strife and nonspecificity figures together to derive a measure of total uncertainty, plus interpret the height of a fuzzy set – i.e., the count of records with the maximum MembershipScore of 1 – as a sort of credibility measure.
Keep in mind that I’m not only a novice at this, but am consulting mathematical resources that generally don’t have the kind of step-by-step examples with sample data used in the literature on statistics. This means I wasn’t able to validate my implementation of these formulas well at all, so it would be wise to recheck them before putting them to use in a production environment where accuracy is an issue. I’m most concerned by the possibility that I may be incorrectly aggregating the individual focal elements for evidentiary fuzziness and nonspecificity, each of which should be weighted by the corresponding probability mass.

Figure 2: Several Evidence Theory Measures Implemented in T-SQL
DECLARE @Conflict float, @ConflictForDiscord float

-- CONFLICT: the two versions differ only in the order of the difference and the divisor
-- note that the first guard tests BeliefScore2 although the divisor is BeliefScore, so confirm no BeliefScore is 0
SELECT @Conflict = SUM(CASE WHEN BeliefScore2 = 0 THEN ProbabilityMassAssignment2 * ABS(BeliefScore - BeliefScore2)
       ELSE ProbabilityMassAssignment2 * ABS(BeliefScore - BeliefScore2) / ABS(CAST(BeliefScore AS float))
       END),
       @ConflictForDiscord = SUM(CASE WHEN BeliefScore2 = 0 THEN ProbabilityMassAssignment2 * ABS(BeliefScore2 - BeliefScore)
       ELSE ProbabilityMassAssignment2 * ABS(BeliefScore2 - BeliefScore) / ABS(CAST(BeliefScore2 AS float))
       END)
FROM Health.DuchennesEvidenceTheoryTable

-- FUZZINESS
DECLARE @Count bigint, @SimpleMeasureOfFuzziness float
DECLARE @OmegaParameter float = 0.55 -- ω

SELECT @Count = Count(*)
FROM Health.DuchennesEvidenceTheoryTable

SELECT @SimpleMeasureOfFuzziness = SUM(ABS(ProbabilityMassAssignment - YagerComplement)) / @Count
FROM (SELECT ProbabilityMassAssignment, Power(1 - Power(ProbabilityMassAssignment, @OmegaParameter), 1 / CAST(@OmegaParameter AS float)) AS YagerComplement
       FROM Health.DuchennesEvidenceTheoryTable) AS T1

-- NONSPECIFICITY
DECLARE @EvidenceTheoryNonspecificityInBits float

SELECT @EvidenceTheoryNonspecificityInBits = SUM(ProbabilityMassAssignment * Log(@Count, 2))
FROM Health.DuchennesEvidenceTheoryTable

-- STRIFE AND DISCORD, plus the total and leftover uncertainty figures
SELECT Strife, Discord, Conflict, EvidenceTheoryNonspecificityInBits, SimpleMeasureOfFuzziness,
Strife + EvidenceTheoryNonspecificityInBits AS TotalUncertainty,
(SELECT ProbabilityMassAssignment
       FROM Health.DuchennesEvidenceTheoryTable
       WHERE LactateDehydrogenaseState = 'Any') AS ProbabilityMassRemainderUncertainty
FROM (SELECT -1 * SUM(ProbabilityMassAssignment * Log((1 - @Conflict), 2)) AS Strife,
       -1 * SUM(ProbabilityMassAssignment * Log((1 - @ConflictForDiscord), 2)) AS Discord, @Conflict AS Conflict, @EvidenceTheoryNonspecificityInBits AS EvidenceTheoryNonspecificityInBits, @SimpleMeasureOfFuzziness AS SimpleMeasureOfFuzziness
       FROM Health.DuchennesEvidenceTheoryTable) AS T1

 

Figure 3: Sample Results from the Duchennes Evidence Theory Table
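For reference, the measures coded in Figure 2 are usually written along the following lines. This is my own transcription of the standard forms as I understand them, not a quotation, so it should be checked against the pages of Klir and Yuan cited in the footnotes before anyone relies on it. The symbols are assumptions of notation only: m is the probability mass assignment, 𝓕 is the set of focal elements, |A| is the number of members in focal element A and ω is the Yager parameter.

```latex
% Yager complement with parameter \omega (the @OmegaParameter in Figure 2)
c_{\omega}(a) = \left(1 - a^{\omega}\right)^{1/\omega}

% Nonspecificity: the Hartley function of each focal element, weighted by its mass
N(m) = \sum_{A \in \mathcal{F}} m(A) \log_2 |A|

% Strife and Discord in the set-difference form; they differ only in the
% order of the difference and in the divisor
S(m) = -\sum_{A \in \mathcal{F}} m(A) \log_2 \left[ 1 - \sum_{B \in \mathcal{F}} m(B) \frac{|A - B|}{|A|} \right]

D(m) = -\sum_{A \in \mathcal{F}} m(A) \log_2 \left[ 1 - \sum_{B \in \mathcal{F}} m(B) \frac{|B - A|}{|B|} \right]
```

Written this way, nonspecificity takes the logarithm of each focal element’s own cardinality, and the bracketed conflict term is evaluated separately for every focal element A; both points may be useful checks when validating the aggregation in Figure 2.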

…………The nonspecificity measure in evidence theory is merely the Hartley function weighted by the probability mass assignments. On paper, the equation for Strife ought to appear awfully familiar to data miners who have worked with Shannon’s Entropy before. The evidence theory version incorporates some additional terms so that a comparison can be performed over two sets, but the negative summation operator and logarithm operation are immediately reminiscent of its more famous forerunner, which measures probabilistic uncertainty due to a lack of stochastic information.  Evidentiary nonspecificity trumps entropy in many situations because it is measured linearly, therefore avoiding computationally difficult nonlinear math (my paraphrase), but sometimes doesn’t produce unique solutions, in which case Klir and Yuan recommend using measures of Strife to quantify uncertainty.[4] Nevertheless, when interpreted correctly and used judiciously, they can be used in conjunction with axioms like the principles of minimum uncertainty, maximum uncertainty[5] and uncertainty invariance[6] to perform ampliative reasoning[7] and draw useful inferences about datasets:

                “Once uncertainty (and information) measures become well justified, they can very effectively be utilized for managing uncertainty and the associated information. For example, they can be utilized for extrapolating evidence, assessing the strength of relationship between given groups of variables, assessing the influence of given input variables on given output variables, measuring the loss of information when a system is simplified, and the like. In many problem situations, the relevant measures of uncertainty are applicable only in their conditional or relative terms.”[8]

…………That often requires some really deep thinking in order to avoid various pitfalls in analysis; in essence, they all involve honing the use of pure reason, which I now see the benefits of, but could definitely use a lot more practice in. For example, Dempster-Shafer Theory has well-known issues with counter-intuitive results at the highest and lowest Conflict values, which may require mental discipline to ferret out; perhaps high values of Strife can act as a safeguard against this, by alerting analysts that inspection for these logical conundrums is warranted.[9] Critics like Judea Pearl have apparently elaborated at length on various other fallacies that can arise from “confusing probabilities of truth with probabilities of provability,” all of which need to be taken into account when modeling evidentiary uncertainty.[10] Keep in mind as well that Belief or Plausibility scores of 1 do not necessarily signify total certainty; as we saw a few articles ago, Possibility values of 1 only signify a state of complete surprise when an event does not occur rather than assurance that it will happen.
…………The issue with evidence theory is even deeper in a certain sense, especially if those figures are derived from subjective ratings. Nevertheless, even perfectly objective and accurate observations can be quibbled with, for reasons that basically boil down to Bill W.’s adage “Denial ain’t just a river in Egypt.” One of the banes of the human condition is our propensity to squeeze our eyes shut to evidence we don’t like, which can only be overcome by honesty, not education; more schooling may even make things worse, by enabling people to lie to themselves with bigger words than before. In that case, they may end up getting tenure for developing entirely preposterous philosophies, like solipsism, or doubting their own existence. As G.K. Chesterton warned more than a century ago, nothing can stop a man from piling doubt on top of doubt, perhaps by reaching for such desperate excuses as “perhaps all we know is just a dream.” He provided a litmus test for recognizing bad chains of logic, which can indeed go on forever, but can be judged on whether or not they tend to drive men into lunatic asylums. Cutting edge topics like fuzzy sets, chaos theory and information theory inevitably give birth to extravagant half-baked philosophies, born of precisely the kind of obsession and intellectual intoxication that Chesterton speaks of in his chapter on The Suicide of Thought[11] and his colleague Arnold Lunn addresses in The Flight from Reason.[12] These are powerful techniques, but only when kept within the proper bounds; problems like “definition drift” and subtle, unwitting changes in the meanings assigned to fuzzy measures can easily lead to unwarranted, fallacious or even self-deceptive conclusions. As we shall see in the next series, information theory overlays some of its own interpretability issues on top of this, which means we must tread even more carefully when integrating it with evidence theory.
…………Fuzzy measures and information theory mesh so well together that George J. Klir and Bo Yuan included an entire chapter on the topic of “Uncertainty-Based Information” in my favorite resource for fuzzy formulas, Fuzzy Sets and Fuzzy Logic: Theory and Applications.[13] The field of uncertainty management is still in its infancy, but scholars now recognize that uncertainty is often “the result of some information deficiency. Information…may be incomplete, imprecise, fragmentary, not fully reliable, vague, contradictory, or deficient in some other way. In general, these various information deficiencies may result in different types of uncertainty.”[14] Information in this context is interpreted as uncertainty reduction[15]; the more information we have, the more certain we become. Methods to ascertain how the reduction of fuzziness (i.e., how imprecise the boundaries of fuzzy sets are) contributes to information gain were not fully worked out two decades ago when most of the literature I consulted for this series was written, and I have the impression that still holds today. When we adapt the Hartley function to measure the nonspecificity of evidence, possibility distributions and fuzzy sets, all we’re doing is taking a count of how many states a dataset might take on. With Shannon’s Entropy, we’re performing a related calculation that incorporates the probabilities associated with those states. Given their status as the foundations of information theory, I’ll kick off my long-delayed tutorial series Information Measurement with SQL Server by discussing both from different vantage points. I hope to tackle a whole smorgasbord of various ways in which the amount of information associated with a dataset can be quantified, thereby helping to cut down further on uncertainty.
Algorithmic complexity, the Lyapunov exponent, various measures of order and semantic information metrics can all be used to partition uncertainty and preserve the information content of our data, so that organizations can make more accurate decisions in the tangible world of the here and now.
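As a small preview of the contrast between the Hartley function and Shannon’s Entropy mentioned above, the sketch below computes both over four made-up states. The SampleStates rows, state names and probabilities are hypothetical and chosen only so the arithmetic is easy to follow; the probabilities are assumed to sum to 1.

```sql
-- Hypothetical probabilities for four states; they are assumed to sum to 1
SELECT LOG(COUNT(*), 2) AS HartleyFunctionInBits, -- depends only on how many states there are
    -1 * SUM(CASE WHEN Probability > 0 THEN Probability * LOG(Probability, 2)
        ELSE 0 END) AS ShannonEntropyInBits -- also weighs each state by its probability
FROM (VALUES (N'Low', CAST(0.5 AS float)),
             (N'Medium', 0.25),
             (N'High', 0.125),
             (N'VeryHigh', 0.125)) AS SampleStates (StateName, Probability);
```

With four states the Hartley function comes to 2 bits regardless of how the probabilities are spread, while the entropy works out to 1.75 bits for this particular distribution (allowing for floating-point rounding); it would only reach the full 2 bits if all four states were equally likely.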

[1] pp. 259, 262-263, 267, 269, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. The formulas are widely available, but I adopted this as my go-to resource whenever the math got thick.

[2] IBID., p. 263.

[3] IBID., pp. 262-265.

[4] IBID., p. 274.

[5] IBID., pp. 271-272. Klir and Yuan’s explanation of how to use maximum uncertainty for ampliative reasoning almost sounds like a sort of reverse parsimony: “use all information available, but make sure that no additional information is unwittingly added…the principle requires that conclusions resulting from any ampliative inference maximize the relevant uncertainty within the constraints representing the premises. The principle guarantees that our ignorance be fully recognized when we try to enlarge our claims beyond the given premises and, at the same time, that all information contained in the premises be fully utilized. In other words, the principle guarantees that our conclusions are maximally noncommittal with regard to information not contained in the premises.”

[6] IBID., p. 275.

[7] IBID., p. 271.

[8] IBID., p. 269.

[9] See the Wikipedia webpage “Dempster Shafer Theory” at http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory

[10] IBID.

[11] See Chesterton, G.K., 2001, Orthodoxy. Image Books: London. Available online at the G. K. Chesterton’s Works on the Web address http://www.cse.dmu.ac.uk/~mward/gkc/books/

[12] Lunn, Arnold, 1931, The Flight from Reason. Longmans, Green and Co.: New York.

[13]  pp. 245-276, Klir and Yuan.

[14] IBID.

[15] IBID., p. 245.

 

Implementing Fuzzy Sets in SQL Server, Part 10.1: A Crude Introduction to Dempster-Shafer Evidence Theory

By Steve Bolton

…………Early on in this series, we learned how the imprecision in natural language statements like “the weather is hot” can be modeled using fuzzy sets. Ordinarily, the membership grades assigned to fuzzy sets are not to be interpreted as probabilities, even though they’re both implemented on continuous scales between 0 and 1; the exception to this rule is when a probabilistic meaning is consciously assigned to the type of fuzziness. A couple of articles ago we saw how membership scores can be interpreted as assessing the logical possibility of the associated statements; the possibility distributions this nuance gives rise to quantify whether or not an event can occur, whereas a probability distribution assesses whether it will actually occur. The two scales are independent except at the maximum and minimum values, where possibility values act as caps on probabilities, since an event must be possible if it is to have a non-zero probability. The possibility and necessity measures that factor into possibility distributions are actually special cases of the plausibility and belief measures used in Dempster-Shafer Evidence Theory, which has a related shade of meaning: instead of gauging whether or not an event can or will happen, plausibility and belief work together to grade the credibility of the associated evidence. If we were sifting through user stories in a Behavior-Driven Development (BDD) process, we wouldn’t use evidence theory for fuzzy terms like “the weather is hot,” or questions like “the weather could be cold” or “the weather is probably mild,”[1] which might be candidates for possibilistic or stochastic modeling. “As far as I can tell, the weather will be hot,” might be fair game, since the subject of the sentence is the trustworthiness of the associated statement. The clearest example I’ve yet run across in the literature occurs in George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications, which I’ve used as my go-to resource throughout this series for the heavy math formulas:

                “Consider, however, the jury members for a criminal trial who are uncertain about the guilt or innocence of the defendant. The uncertainty in this situation seems to be of a different type; the set of people who are guilty of the crime and the set of innocent people are assumed to have very distinct boundaries. The concern, therefore, is not with the degree to which the defendant is guilty, but with the degree to which the evidence proves his membership in either the crisp set of guilty people or the crisp set of innocent people.”[2]

In the last article, I gave a monologue on how organizations can benefit from uncertainty management programs, which begin with partitioning uncertainty into various types, like probabilities, nonspecificity, fuzziness and conflicting information; these in turn stretch across five mathematical subtopics: information theory, stochastics, possibility theory, fuzzy sets and evidence theory. The last of these has its own corresponding formulas for measures like nonspecificity, but is particularly useful for quantifying the degree of conflict between pieces of information. For this reason, it is widely used to aggregate disparate sources of information, which in turn integrates seamlessly with Decision Theory; for example, one of its most common implementations is sensor fusion.[3] Klir and Yuan also provide a concise list of possible use cases in various fields:

                “For instance, suppose we are trying to diagnose an ill patient. In simplified terms, we may be trying to determine whether the patient belongs to the set of people with, say, pneumonia, bronchitis, emphysema, or a common cold. A physical examination may provide us with helpful yet inconclusive evidence. For example, we might assign a high value, say 0.75, to our best guess, bronchitis, and a lower value to the other possibilities, such as 0.45 for the set consisting of pneumonia and emphysema and 0 for a common cold. These values reflect the degree to which the patient’s symptoms provide evidence for the individual diseases or sets of diseases; their collection constitutes a fuzzy measure representing the uncertainty associated with several well-defined alternatives; It is important to realize that this type of uncertainty, which results from information deficiency, is fundamentally different from fuzziness, which arises from the lack of sharp boundaries.”[4]

…………Thankfully, a sturdy mathematical scaffolding to model these types of evidence-based uncertainty already exists, although it isn’t being tested much these days in the relational database, data warehousing and data mining fields. The modeling process is akin to the one I introduced a few weeks ago for possibility distributions, but a tad more complicated. A continuous data type like float, numeric or decimal is required for probability values, but possibility theory also calls for the addition of a bit column, which is often assigned to the Necessity measure. In the theory developed independently by statisticians Glenn Shafer and Arthur Dempster, we need three measures: a Probability Mass Assignment (often denoted by a lower case m) that tells us the strength of the evidence that a record belongs just to one set; a Belief measure that measures the same, plus the evidence for belonging to its subsets; and a Plausibility measure, which covers both of those, as well as “the additional evidence or belief associated with sets that overlap with A.”[5] The easy part is that all three are measured on a scale of 0 to 1, the same as fuzzy sets, probabilities, possibilities and the like; the complexity arises from the fact that they measure evidence at different levels. This leads to nested bodies of evidence, which alpha cuts (α-cuts) are ideal for modeling, as explained a couple of articles ago; I saved this topic for the next-to-last article precisely because it unites many of the concepts introduced throughout the series, like α-cuts, fuzzy unions, intersections and complements.
…………These relationships also give rise to various mathematical properties, some of which are similar to those used in possibility distributions. For example, just as the Necessity of a set is equal to 1 minus the Possibility of its complement, so too is the Plausibility of a set equal to 1 minus the Belief in its complement. Plausibility must be greater than or equal to the Belief, since it models evidence at a higher scope. These “fuzzy measures” have weaker forms of properties like monotonicity, continuity and additivity than probabilities do.[6] Belief measures are superadditive, which means that if you sum them together across disjoint subsets, the result must be less than or equal to the Belief function for the union of those subsets. For example, two disjoint subsets can be assigned degrees of belief like 0.5 and 0.3, which together sum to 0.8, while the Belief function for their union can be a larger figure like 0.97; the difference is the evidence that supports the union without favoring either part. In contrast, probabilities must always sum to 1 across a dataset, including the probability mass assignments used in evidence theory. Plausibility is subadditive, which signifies the opposite relationship, so that the measures taken across the subsets must sum to at least the Plausibility for the whole set. In short, the Belief and Plausibility of the whole act as bounds on those sums rather than as exact totals. This all sounds weird, but it’s a necessary logical consequence of the nesting of evidence. As explained in the discussion on α-cuts a couple of articles ago, this signifies that records can belong to multiple hierarchical partitions of a set, which is an unfamiliar situation in the relational world (despite the fact that it is easily modeled using set-theoretic relational technology). The good news is that this web of interrelationships makes the three evidence theory measures reconstructible from each other; this makes it possible to validate the values using queries like the samples in Figure 2.

Two Common Illustrations of Dempster-Shafer Evidence Theory in Action

                The Wikipedia article on Dempster-Shafer Theory has comprehensible examples of how these three measures work together, beginning with a sensor that detects whether a cat concealed in a box is in a Dead or Alive state. The value for Either obviously reaches the maximum value of 1 for both Belief and Plausibility, since it must be one of the two by logical necessity (that is, unless our cat happens to belong to Erwin Schrödinger or was buried in Pet Sematary). It is thus an instance of a “universal hypothesis,” which encompasses the whole dataset. Yet the probability mass assignment for the Either state is only 0.3, which signifies the fact that we don’t have solid information on its status; the probability figure for the whole dataset still sums to 1 though, once the stats for Alive and Dead are factored in. The probability value for the universal hypothesis thus constitutes a measure of the uncertainty remaining in the data, once the probability, Belief and Plausibility measures have partitioned it off. Since Dead and Alive are discrete states without fuzzy intervals, the Wikipedia example assigns them Belief figures equal to their probability masses; when these are added to the value of 1 for the Either state, the total Belief for the whole dataset is greater than 1, unlike the probability mass. The Plausibility of each state can then be reconstructed by subtracting the Belief of its complement from 1.
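For readers who want to check the arithmetic, the following is a minimal T-SQL sketch of the cat example, not a definitive implementation. The table variable, its column names and the bitmask encoding of the hypotheses (1 for Alive, 2 for Dead, 3 for Either) are assumptions made purely for illustration; the masses of 0.2 for Alive and 0.5 for Dead are the ones used alongside the 0.3 for Either in the Wikipedia example.

```sql
-- Hypothetical table variable; SubsetMask encodes each hypothesis as a bitmask
-- (bit 1 = Alive, bit 2 = Dead), so Either = 1 | 2 = 3.
DECLARE @CatEvidence table
(Hypothesis varchar(10) NOT NULL,
SubsetMask tinyint NOT NULL PRIMARY KEY,
ProbabilityMass decimal(5,4) NOT NULL);

INSERT INTO @CatEvidence (Hypothesis, SubsetMask, ProbabilityMass)
VALUES ('Alive', 1, 0.2), ('Dead', 2, 0.5), ('Either', 3, 0.3);

SELECT A.Hypothesis, A.ProbabilityMass,
-- Belief: sum the masses of every hypothesis B that is wholly contained in A
SUM(CASE WHEN (B.SubsetMask & A.SubsetMask) = B.SubsetMask THEN B.ProbabilityMass ELSE 0 END) AS BeliefScore,
-- Plausibility: sum the masses of every hypothesis B that overlaps A at all
SUM(CASE WHEN (B.SubsetMask & A.SubsetMask) > 0 THEN B.ProbabilityMass ELSE 0 END) AS PlausibilityScore
FROM @CatEvidence AS A
       CROSS JOIN @CatEvidence AS B
GROUP BY A.Hypothesis, A.SubsetMask, A.ProbabilityMass
ORDER BY A.SubsetMask;
```

Under these assumptions the query should return Belief and Plausibility values of 0.2 and 0.5 for Alive, 0.5 and 0.8 for Dead, and 1 and 1 for Either. The Plausibility of Alive (0.5) equals 1 minus the Belief of Dead (0.5), which is the complement relationship at work.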
…………The tricky part is that the Belief for each subset is the sum of the probability masses of everything contained in it, which calls for looking at our data in an unfamiliar way. I initially thought that the existence of these subsets meant that we could simply model this by applying the appropriate normal form, but that’s not the case. The second example in the Wikipedia article has states like Red, Yellow and Green which are mutually exclusive, as well as some that carry a bit of measurement uncertainty, like “Red or Yellow” and “Red or Green.” In this situation, the Belief figure for “Red or Yellow” is the sum of the probability masses for Red, Yellow and “Red or Yellow,” while the Belief figure for “Red or Green” is the sum of the masses for Red, Green and “Red or Green,” since there are two overlapping subsets. Red, Yellow and Green are all members of more than one subset, but not the same ones. This leads to an odd predicament where each state is discrete and thus difficult to denormalize, yet the associated column still represents subsets; this is one situation where the presence of logical OR statements is not a hint that the design requires normalization. Since we can’t be certain how many other state descriptions a child could be related to, a single self-referencing ParentID column won’t do the job either. The next best thing is an interleaved solution, in which a separate table with two foreign keys pointing to the primary key of the table holding the Belief measures keeps track of which subsets each record belongs to. To aggregate the Belief figures for each subset in the parent table, we just inspect the interleaved table for all of the categories a record can belong to.
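One way to picture the interleaved table is the rough sketch below, which uses table variables so that it can be run anywhere. All of the table names, column names and probability masses are hypothetical, chosen only so that the masses sum to 1; in a permanent schema both columns of the second table would be declared as foreign keys to the first, which table variables do not allow.

```sql
-- Hypothetical tables for illustration; these names do not come from the figures in this post
DECLARE @LightState table
(ID int NOT NULL PRIMARY KEY,
StateDescription varchar(20) NOT NULL,
ProbabilityMass decimal(5,4) NOT NULL);

-- One row per (parent, proper subset) pair
DECLARE @LightStateSubset table
(ParentID int NOT NULL,
SubsetID int NOT NULL,
PRIMARY KEY (ParentID, SubsetID));

INSERT INTO @LightState (ID, StateDescription, ProbabilityMass)
VALUES (1, 'Red', 0.35), (2, 'Yellow', 0.25), (3, 'Green', 0.15),
(4, 'Red or Yellow', 0.06), (5, 'Red or Green', 0.05), (6, 'Yellow or Green', 0.04), (7, 'Any', 0.10);

INSERT INTO @LightStateSubset (ParentID, SubsetID)
VALUES (4, 1), (4, 2), (5, 1), (5, 3), (6, 2), (6, 3),
(7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6);

-- Belief = the state's own mass plus the masses of all of its proper subsets
SELECT P.ID, P.StateDescription, P.ProbabilityMass,
P.ProbabilityMass + COALESCE(SUM(C.ProbabilityMass), 0) AS BeliefScore
FROM @LightState AS P
       LEFT JOIN @LightStateSubset AS S
       ON S.ParentID = P.ID
       LEFT JOIN @LightState AS C
       ON C.ID = S.SubsetID
GROUP BY P.ID, P.StateDescription, P.ProbabilityMass
ORDER BY P.ID;
```

With these illustrative masses, the Belief for Red, Yellow and Green equals their own mass, “Red or Yellow” comes to 0.66 (0.06 + 0.35 + 0.25) and “Any” comes to 1. Storing only proper subsets and adding the parent’s own mass in the query is a design choice of this sketch; recording each composite state as a subset of itself would work just as well.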

Server States: A SQL Server-Specific Example

                Let me give an example that might be more intuitive and relevant to SQL Server users: the state_desc column of sys.databases will assign one of seven mutually exclusive states to each database: Online, Offline, Restoring, Recovering, Recovery Pending, Suspect and Emergency. As far as I know, these states do not rule out which user modes a database can be in, which range from SINGLE_USER to RESTRICTED_USER to MULTI_USER. Nevertheless, many combinations would be improbable, so each unique pair of descriptions requires a probability assignment that will probably differ from other pairs of state_desc and user mode values. Now let’s pretend we have a sensor that guesses which pair of states a server is in at any given moment, perhaps based on I/O data or network bandwidth usage. If it can tell us the user mode plus whether we’re in one of the three recovery states, but can’t differentiate between them accurately, then we’re dealing with a fuzzy interval-valued set. From the point of view of the sensor, “Restoring | Recovering | Recovery Pending” is a discrete state and ought to be recorded as such in the database table. Nevertheless, to derive the Belief we must sum together all of the probabilities for the subsets it gives rise to, while the Plausibility equals one minus the sum of the probability assignments in the subsets it does not participate in. We could create a separate category like “Unknown” for situations where the sensor went offline or was otherwise unable to return accurate data, or better yet, establish a universal hypothesis like “Any State” with the Belief and Plausibility both set to 1 and add all of its possible subsets. Subtracting the sum of the probabilities of all known states from that of the universal hypothesis would allow us to measure one type of uncertainty associated with the table.
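As a possible starting point for such an exercise, the pairs of state and user mode that actually occur on an instance can be listed from sys.databases, where the user mode is exposed in the user_access_desc column. This sketch only counts databases per pair; how those counts would be turned into probability assignments is left open here.

```sql
-- Count how many databases currently sit in each (state, user mode) pair
SELECT state_desc, user_access_desc, COUNT(*) AS DatabaseCount
FROM sys.databases
GROUP BY state_desc, user_access_desc
ORDER BY state_desc, user_access_desc;
```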
In order to measure the uncertainty inherent in the interval-valued fuzzy subsets that the Belief and Plausibility measures are attached to, we’d have to use a measure of fuzziness tailored to evidence theory. In the same vein, the count of possible state descriptions could be used to derive a measure of nonspecificity, albeit through a different formula than the ones introduced in the last article. In addition, we can define measures of uncertainty based on how much
…………It is easier to illustrate all of this with T-SQL code samples, beginning with the easiest part, a simple snapshot of a table with probability mass, Belief and Plausibility measures defined on it. Degrees of Belief are usually derived from some kind of input method, akin to fuzzy set membership functions, except that subjective ratings tend to be more common in evidence theory. It is no surprise that Bayesian methods are often applied in deriving Belief functions, given that they actually represent a more specific subset of evidence theory measures. Instead of complicating the topic any further, I’ve derived the values in Figure 1 by creating an artificial category in the Duchenne muscular dystrophy data I’ve been using for practice data for the last few tutorial series[7], then simply made probability mass assignments based on the frequency of the values for the LactateDehydrogenase column. From there, I derived the Belief measures, then constructed the Plausibility measures from those. I used the float data type for all three of the columns that associate measures with the LactateDehydrogenaseState column, an ordinal category; this represents yet another use of fuzzy sets to model ordinals on continuous scales, except at a more advanced level where three columns are required.

Figure 1: Simple Evidence Theory Measures Defined on the LactateDehydrogenase Column

Figure 2: Sample Validation Code for the Relationships Between the Three Evidence Theory Measures
-- verifying the Belief via the ProbabilityMassAssignment
SELECT ID, LactateDehydrogenaseState, ProbabilityMassAssignment, BeliefScore, PlausibilityScore,
CASE WHEN IntervalProbabilityMassAssignmentSum IS NOT NULL THEN IntervalProbabilityMassAssignmentSum ELSE ProbabilityMassAssignment END
AS BeliefReconstructedFromProbabilityMass
FROM Health.DuchennesEvidenceTheoryTable AS T3
        LEFT JOIN (SELECT ParentID, SUM(ProbabilityMassAssignment) AS IntervalProbabilityMassAssignmentSum
              FROM Health.DuchennesEvidenceTheoryTable AS T1
               INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
               ON T1.ID = T2.BeliefSubsetID
              GROUP BY ParentID) AS T4
       ON T3.ID = T4.ParentID 

-- verifying the Plausibility via the ProbabilityMassAssignment
SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment, ProbabilityMassAssignmentBySum,
CASE WHEN ProbabilityMassAssignmentBySum IS NULL THEN 1 ELSE ABS(1 - (ProbabilityMassAssignment + ProbabilityMassAssignmentBySum)) END AS PlausibilityScoreReconstructedFromProbability
FROM (SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment
FROM Health.DuchennesEvidenceTheoryTable) AS T5
       LEFT JOIN (SELECT BeliefSubsetID, SUM(ProbabilityMassAssignment) AS ProbabilityMassAssignmentBySum
       FROM (SELECT DISTINCT T1.BeliefSubsetID, T2.ParentID
              FROM Health.DuchennesEvidenceTheoryIntervalTable AS T1
                     INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
                     ON T1.ParentID = T2.BeliefSubsetID AND T1.BeliefSubsetID != T2.BeliefSubsetID) AS T4
                           INNER JOIN Health.DuchennesEvidenceTheoryTable AS T3
                           ON T4.ParentID = T3.ID
       GROUP BY BeliefSubsetID) AS T6
       ON T5.ID = T6.BeliefSubsetID

…………Note how the Belief is equal to the ProbabilityMassAssignment for Low, Medium and High, which is reflective of the fact that they have no substates; Medium or Low and High or Medium have BeliefScore values higher than their masses, precisely because we have to tack the values for Low, Medium and High onto them. The PlausibilityScore is in each case determined by adding together all of the ProbabilityMassAssignment values for the rows that aren’t among a record’s subsets, then subtracting that sum from 1, which is equivalent to subtracting the BeliefScore of the complement from 1. The second image depicts the Health.DuchennesEvidenceTheoryIntervalTable, in which the ParentID and BeliefSubsetID determine the linkages between subsets. For example, the records with ParentIDs of 4 tie together the Medium | Low, Medium and High | Medium values, so that we can aggregate the ProbabilityMassAssignment values to derive the BeliefScore. The PlausibilityScore can be determined using the same table. Code similar to what I provided in Figure 2 can be used to validate the relationships between these fuzzy measures, with your own particular column and table names plugged in of course. The IS NULL condition is due to a bizarre problem in which setting the first condition in the CASE to BeliefScore = 1 THEN 1, or using NullIf, both led to NULL values. It is also possible to derive the ProbabilityMassAssignment values in reverse, but I’ll omit validation code for that scenario in the interest of brevity. To avoid pummeling readers with too much information all at once, I’ll also put off discussion of how to derive uncertainty measures like Strife and Discord from this crude example. In the next article, I’ll also mention some principles for interpreting the results that can in turn provide an important bridge to Information Theory.
Among other things, the first table tells us that, “the belief that the Lactate Dehydrogenase values are Medium or Low is higher than that for Low alone, by a margin of 0.679425837320574 to 0.349282296650718. It is more plausible that the value is High than Low, by a margin of 0.822966507177033.” Once we define measures of fuzziness, nonspecificity and the like on top of them and apply some principles of inference drawn from Information Theory, we can partition the uncertainty further in order to glean additional valuable insights.
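Two further sanity checks follow directly from the properties discussed earlier, namely that Plausibility can never be smaller than Belief and that the probability masses must sum to 1. The sketch below reuses the table and column names from Figure 2; the tolerance of 0.000000001 is an arbitrary assumption to allow for rounding in the float columns.

```sql
-- Any rows returned here violate the rule that Plausibility must be at least as large as Belief
SELECT ID, LactateDehydrogenaseState, BeliefScore, PlausibilityScore
FROM Health.DuchennesEvidenceTheoryTable
WHERE BeliefScore > PlausibilityScore;

-- The probability masses should sum to 1, give or take float rounding
SELECT SUM(ProbabilityMassAssignment) AS TotalProbabilityMass,
CASE WHEN ABS(SUM(ProbabilityMassAssignment) - 1) < 0.000000001 THEN 'OK' ELSE 'Check' END AS MassCheck
FROM Health.DuchennesEvidenceTheoryTable;
```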

 

[1] Here in Western New York the natural language term “mild” has interesting shades of meaning (at least among local weathermen) which would be a challenge to model in terms of a fuzzy set. As winter approaches, “mild” means warmer than normal, but as the peak of summer comes, it means cooler than expected, so the meaning is inverted depending on the season. If we were to use an interval-valued set, we’d need a range of values somewhere between 30 and 70 degrees, which is so imprecise that it borders on meaningless.

[2] p. 177, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] See the Wikipedia article “Dempster Shafer Theory” at http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory

[4]  p. 179, Klir and Yuan.

[5] IBID., pp. 181-182.

[6] IBID., pp. 179-181.

[7] Which I downloaded from the Vanderbilt University’s Department of Biostatistics and converted into a SQL Server table in my sham DataMiningProjects database.

 

Implementing Fuzzy Sets in SQL Server, Part 9: Measuring Nonspecificity with the Hartley Function

By Steve Bolton

…………Imagine how empowering it would be to quantify what you don’t know. Even an inaccurate measure might be helpful in making better decisions in any area of life, but particularly in the business world, where change is the only certainty. This is where a program of “uncertainty management” can come in handy and fuzzy set techniques find one of their most useful applications. Fuzzy sets don’t introduce new information, but they do conserve and put to good use some information left over after ordinary “crisp” sets are defined – particularly when it would be helpful to model ordinal categories on continuous number scales. As I pointed out at the beginning of this series, uncertainty reduction is akin to Stephen King’s adage that monsters are less fearsome once some scale of measurement can be applied to them; knowing that a bug is 10 feet tall is at least reassuring, in the sense that we now know that it is not 100 or 1,000 feet tall.[1] Uncertainty reduction can also be put to obvious uses in data mining activities like prediction and clustering. Another potential use is in simplification of data, so that information loss is minimized.[2] In today’s article I’ll shine a little light on the Hartley function, a tried and true method of quantifying one particular category of uncertainty that has been used since 1928 to simplify and demystify datasets of all kinds and could easily be extended to SQL Server data.
George J. Klir and Bo Yuan, the authors of my favorite resource for fuzzy set equations, note that data models must take uncertainty into account, along with complexity and credibility. Later in the book, they go on to subdivide uncertainty into three types that sprawl across possibility theory, stochastics, information theory, fuzzy sets and Dempster-Shafer Evidence Theory:

                “The relationship is not as yet fully understood…Although usually (but not always) undesirable when considered alone, uncertainty becomes very valuable when considered in connection to the other characteristics of systems models; in general, allowing more uncertainty tends to reduce complexity and increase credibility of the resulting model. Our challenge in systems modelling is to develop methods by which an optimal level of allowable uncertainty can be estimated for each modelling problem…”[3]

“…Three types of uncertainty are now recognized in the five theories, in which measurement of uncertainty is currently well established. These three uncertainty types are: nonspecificity (or imprecision), which is connected with sizes (cardinalities) of relevant sets of alternatives; fuzziness (or vagueness), which results from imprecise boundaries of fuzzy sets; and strife (or discord), which expresses conflicts among the various sets of alternatives.

“It is conceivable that other types of uncertainty will be discovered when the investigation of uncertainty extends to additional theories of uncertainty.”[4]

…………Some authors also include “ambiguity (lack of information),”[5] which Klir and Yuan define as a parent class of both discord and nonspecificity in an excellent diagram I wish I could reprint.[6] Probabilities probably ought to be included as well.[7] As soon as I was introduced to the concept of uncertainty partitioning, I was intrigued by the possibility of defining human free will as an alternative form of uncertainty, but that raises many thorny philosophical questions. Among them is the contention that it doesn’t even exist, which is a disturbing tenet of many popular philosophies, like materialistic determinism and certain forms of theological predestination. I’d dispute that with evidence that would be hard to debunk and raise the possibility that it may not be possible to quantify it at all, by definition; the ability to assign values to it would certainly be helpful in academic fields like economics and psychology, where human behavior is the crux of the matter. This topic integrates quite nicely with the contention of authors like Lotfi A. Zadeh, the father of fuzzy set theory, that it might be helpful to apply fuzzy techniques in these fields to model “humanistic systems.”[8] Other controversial candidates for new categories of uncertainty include the notion that reality is somewhat subjective (which I would argue is fraught with risk, since it is a key component of many forms of madness) and the contention that some events (particularly at the quantum level) can be truly random, in the sense of being indeterminate or “uncaused.” Albert Einstein drove home the point that uncertainty is deeply rooted in all we see in his famous quote from a lecture at the Prussian Academy of Sciences in 1921, in which he seemed to extend it right into the heart of mathematics itself: “…as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”[9]

Partitioning Uncertainty

                The first step is to develop a habit of explicitly recognizing which type of uncertainty is under discussion, then partitioning it off using the appropriate type of fuzzy set. For example, whenever we need to cram continuous scales into finite data types like float, decimal and numeric, we end up creating measurement uncertainty about whatever values come after the precision we’ve chosen.[10] Like other types of measurement uncertainty, this is best addressed by fuzzy sets without any special probabilistic, possibilistic or evidence theory connotations attached to them. Incidentally, some theoreticians say that if we’re trying to quantify the uncertainty of a measurement, membership functions based on the normal distribution (i.e. the bell curve) are usually the best choice (based on empirical evidence from the aerospace industry).[11] If we were uncertain about the likelihood of an event occurring, we’d assign a probability value instead; if we were unsure of the logical necessity of an event, we’d use a possibility distribution, as explained in the last installment of this series. In the next installment, I’ll explain how Dempster-Shafer Theory can be used to judge the certainty and credibility of evidence, by assigning grades of membership in the set of true statements.
…………Once the appropriate method of uncertainty modeling has been selected, we can then apply its associated formulas to compute figures for nonspecificity, imprecision, discord and the like. The good news is that we already dispensed with the main means of computing fuzziness, back in Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements. In the remainder of this article, I’ll provide sample T-SQL for implementing two of the three main methods for calculating the “U-Uncertainty,” a.k.a. the nonspecificity. Like many other authors I consulted for this series, Klir and Yuan stress that nonspecificity and fuzziness are completely independent stats, since they measure two distinct and unrelated types of uncertainty.[12] The former is dictated by the number of possible distinct states that a set can take on, whereas the latter quantifies imprecision in class boundaries.[13] A set can have many possible arrangements, yet still be entirely crisp; there’s no mistaking what a Lego or Lincoln Log is, but there’s apparently no end to the crazy things that can be built with either one. Sets with few arrangements but really fuzzy boundaries are also possible. That is why fuzzy sets sans any additional meaning like probability, possibility and credibility scores have both fuzziness and nonspecificity measures attached to them.
…………Possibility theory, the topic of the last blog post in this amateur series of self-tutorials, has a form of nonspecificity that is easier to specify (pun intended) than the ordinary fuzzy set version, so I’ll introduce that first. The SELECT in Figure 1 is performed on a column of muscular dystrophy data I downloaded from the Vanderbilt University’s Department of Biostatistics and added to a sham DataMiningProjects database a few tutorial series ago. The PossibilityScore was assigned by a random number generator in the last article and tacked onto the table definition, for the sake of convenience. It’s time for my usual disclaimer: I’m writing this in order to learn this topic, not because I know it well, so it is a good idea to check over my T-SQL samples before putting them to serious use. This is especially true of this SELECT, where I may be applying a Lead where there should be a Lag; in contrast to the topics I posted on in previous series, examples with sample data are few and far between in the fuzzy set literature, which makes validation difficult. Furthermore, there is apparently a more compact version available for specific situations, but I’ll omit it for now because I’m still unclear on what mathematical prerequisites are needed.[14]

Figure 1: Possibilistic Nonspecificity for the LactateDehydrogenase Column
SELECT SUM(PossiblityDifference * Log(RN, 2)) AS PossibilisticUUncertainty
FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, PossibilityScore - Lead(PossibilityScore, 1, 0) OVER (ORDER BY ID) AS PossiblityDifference
       FROM Health.DuchennesTable) AS T1 
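Because worked examples are scarce, it may help to run the same calculation on a distribution small enough to verify by hand. The sketch below uses a hypothetical three-row table variable and rests on one assumption worth flagging: as the formula is usually stated, the possibility values are ranked in descending order before each is differenced against the next, so the sketch orders by PossibilityScore DESC. Figure 1 orders by ID instead, which gives the same answer only if the scores happen to decrease as the ID increases.

```sql
-- Hypothetical three-row possibility distribution, for illustration only
DECLARE @Possibility table
(ID int NOT NULL PRIMARY KEY,
PossibilityScore float NOT NULL);

INSERT INTO @Possibility (ID, PossibilityScore)
VALUES (1, 0.3), (2, 1.0), (3, 0.7);

-- Rank the scores from highest to lowest, difference each against the next one down
-- (0 after the last), weight by the base-2 log of the rank and sum the results
SELECT SUM(PossibilityDifference * LOG(RN, 2)) AS PossibilisticUUncertainty
FROM (SELECT ROW_NUMBER() OVER (ORDER BY PossibilityScore DESC, ID) AS RN,
       PossibilityScore - LEAD(PossibilityScore, 1, 0) OVER (ORDER BY PossibilityScore DESC, ID) AS PossibilityDifference
       FROM @Possibility) AS T1;
```

Worked by hand, the three terms are (1.0 - 0.7) * log2(1) = 0, (0.7 - 0.3) * log2(2) = 0.4 and (0.3 - 0) * log2(3), or about 0.475, so the query should return roughly 0.875 bits.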

…………The SELECT returns a single value of 4.28638426128113, which measures the amount of uncertainty in bits; the greater the number of possible state descriptions, the higher the U-Uncertainty will be. The same relationship applies to the procedure below, which returns a value of 7.30278910848746 bits; the difference is that one measures uncertainty about the number of possible values the LactateDehydrogenase column can have, while the other measures lack of certainty about the number of membership function scores a row can be assigned. Figure 2 is practically identical to the sample code I’ve posted throughout this series, at least as far as the UPDATE; all I’m doing is running the stored procedure from Outlier Detection with SQL Server, part 2.1: Z-Scores on the DuchennesTable and storing the results in a table variable, then transforming them to a scale of 0 to 1 using the @Rescaling variables and ReversedZScores column. The GroupRank column can be safely ignored, as usual. The first SELECT with the AlphaCutLeftBound and AlphaCutRightBound columns is only provided to illustrate how the nonspecificity figure is arrived at in the last SELECT. What we’re basically doing is partitioning the dataset into nested levels, using the alpha cut (α-cut) technique I introduced in the last article, then applying a Base-2 LOG and summing the results across the hierarchy.[15] The tricky part is that with α-cuts, records can belong to more than one subset, as I pontificated on in my last post; the levels are widest at the bottom of the dataset, but narrowest at the top, where the MembershipScore values approach the maximum of 1. This calls for thinking about the data in an odd way, given that in most relational operations records are assigned to only a single subset.

Figure 2: Code for Hartley Nonspecificity
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) - ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N'DataMiningProjects',
              @SchemaName = N'Health',
              @TableName = N'DuchennesTable',
              @ColumnName = N'LactateDehydrogenase',
              @PrimaryKeyName = N'ID',
              @DecimalPrecision = '38,32',
              @OrderByCode = 8

-- RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin = Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax - @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore - @RescalingMin) / @RescalingRange

 

SELECT AlphaCutBound AS AlphaCutLeftBound, Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutRightBound,
AlphaCutBound - Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange, Log(AlphaCutCount, 2) AS IndividualLogValue
FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
       FROM @ZScoreTable AS T1
       INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
 FROM @ZScoreTable) AS T2
       ON MembershipScore >= AlphaCutBound
       GROUP BY AlphaCutBound) AS T3  

SELECT SUM(AlphaCutBoundaryChange * Log(AlphaCutCount, 2)) AS FuzzySetNonspecificityInBits
FROM (SELECT AlphaCutCount, AlphaCutBound - Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange
       FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
              FROM @ZScoreTable AS T1
              INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
                     FROM @ZScoreTable) AS T2
             ON MembershipScore >= AlphaCutBound
              GROUP BY AlphaCutBound) AS T3) AS T4

 

Figure 3: Results for the Hartley Nonspecificity Example

…………The point of using the α-cuts is to chop the dataset up into combinations of possible state descriptions, which is problematic with fuzzy sets because the boundaries between states are less clear. The interpretation depends entirely on the meaning of the fuzzy attribute; as Klir and Yuan note, it can reflect “an unsettled historical question” in the case of retrodiction, possible future states in the case of prediction, prescriptive uncertainty in the case of policies, diagnostic uncertainty in the case of medical information and so forth.[16] In the same vein, we can interpret my sample above as measuring 7.30278910848746 bits of uncertainty about a record’s place within the range of Z-Scores, which can in turn be used as a form of outlier detection. The smaller the range of possible values, the smaller the number of possible state descriptions becomes, which means that the cardinality of the α-cuts and the value of the final statistic decline as well.
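To make the α-cut arithmetic easier to follow by hand, the sketch below applies the same aggregation as the final SELECT in Figure 2 to four hypothetical membership scores. The table variable and its values are assumptions for illustration only; the query structure mirrors Figure 2.

```sql
-- Four hypothetical membership scores, for illustration only
DECLARE @Scores table
(ID int NOT NULL PRIMARY KEY,
MembershipScore decimal(38,6) NOT NULL);

INSERT INTO @Scores (ID, MembershipScore)
VALUES (1, 1.0), (2, 0.5), (3, 0.5), (4, 0.25);

-- Each distinct score is an alpha-cut level; AlphaCutCount is the number of rows at or above it.
-- Weight the base-2 log of each count by the gap to the previous level, then sum.
SELECT SUM(AlphaCutBoundaryChange * LOG(AlphaCutCount, 2)) AS FuzzySetNonspecificityInBits
FROM (SELECT AlphaCutCount, AlphaCutBound - LAG(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange
       FROM (SELECT COUNT(*) AS AlphaCutCount, T2.AlphaCutBound
              FROM @Scores AS T1
                     INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
                            FROM @Scores) AS T2
                     ON T1.MembershipScore >= T2.AlphaCutBound
              GROUP BY T2.AlphaCutBound) AS T3) AS T4;
```

The three levels are 0.25 (four rows), 0.5 (three rows) and 1 (one row), so the terms are 0.25 * log2(4) = 0.5, 0.25 * log2(3), or about 0.396, and 0.5 * log2(1) = 0, for a total of roughly 0.896 bits. Note how the widest cut at the bottom contributes the most, while the top level, which contains a single record, contributes nothing.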
…………This is an adaptation of a function developed way back in 1928 by electronics pioneer Ralph Hartley[17]; since it serves as one of the foundations of information theory I’ll put off discussion of the crisp version until my long-delayed monster of a series, Information Measurement with SQL Server. We’ve got at least two more articles in the fuzzy set series to dispense with first, including an examination of Dempster-Shafer Theory in the next installment. Evidence theory also has its own brand of nonspecificity measure, also based on the Hartley function.[18] Measures like strife and discord are more relevant to that topic, since they deal with conflicts in evidence. Possibility theory has counterparts for both, but I’ll leave them out, given that Klir and Yuan counsel that “We may say that possibility theory is almost conflict-free. For large bodies of evidence, at least, these measures can be considered negligible when compared with the other type of uncertainty, nonspecificity. Neglecting strife (or discord), when justifiable, may substantially reduce computation complexity in dealing with large possibilistic bodies of evidence.”[19] Possibility theory is a useful springboard into the topic though, given that Belief and Plausibility measures are modeled in much the same way. In fact, Possibility and Necessity measures are just special cases of Belief and Plausibility, which should serve to decomplicate my introduction to Dempster-Shafer Theory a little.

 

[1] p. 114, King, Stephen, 1981, Stephen King’s Danse Macabre. Everest House: New York. I’m paraphrasing King, who in turn paraphrased an idea expressed to him by author William F. Nolan at the 1979 World Fantasy Convention.

[2] p. 269, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] IBID., p. 3.

[4] IBID., p. 246.

[5] p. 2, Hinde, Chris J. and Yang, Yingjie, 2009, “A New Extension of Fuzzy Sets Using Rough Sets: R-Fuzzy Sets,” pp. 354-365 in Information Sciences, Vol. 180, No. 3. Available online at the Loughborough University Institutional Repository web address https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/13244/3/rough_m13.pdf

[6] p. 268, Klir and Yuan.

[7] IBID., p. 3.

[8] IBID., p. 451.

[9] Cited from the Common Mistakes in Using Statistics web address https://www.ma.utexas.edu/users/mks/statmistakes/uncertaintyquotes.html

[10] IBID., pp. 327-328.

[11] Kreinovich, Vladik; Quintana, Chris and Reznik, L., 1992, “Gaussian Membership Functions are Most Adequate in Representing Uncertainty in Measurements,” pp. 618-624 in Proceedings of the North American Fuzzy Information Processing Society Conference, Vol. 2. NASA Johnson Space Center: Houston. Available online at the University of Texas at El Paso web address www.cs.utep.edu/vladik/2014/tr14-30.pdf

[12] p. 258, Klir and Yuan.

[13] p. 2, Hinde and Yang.

[14] pp. 253, 269, Klir and Yuan.

[15] IBID., pp. 248-251.

[16] IBID., p. 247.

[17] See the Wikipedia articles “Hartley Function” and “Ralph Hartley” at http://en.wikipedia.org/wiki/Hartley_function and http://en.wikipedia.org/wiki/Ralph_Hartley respectively.

[18] p. 259, Klir and Yuan.

[19] IBID., p. 264.