Blog Archives

Implementing Fuzzy Sets in SQL Server, Part 10.1: A Crude Introduction to Dempster-Shafer Evidence Theory

By Steve Bolton

…………Early on in this series, we learned how the imprecision in natural language statements like “the weather is hot” can be modeled using fuzzy sets. Ordinarily, the membership grades assigned to fuzzy sets are not to be interpreted as probabilities, even though they’re both implemented on continuous scales between 0 and 1; the exception to this rule is when a probabilistic meaning is consciously assigned to the type of fuzziness. A couple of articles ago we saw how membership scores can be interpreted as assessing the logical possibility of the associated statements; the possibility distributions this nuance gives rise to quantify whether or not an event can occur, whereas a probability distribution assesses whether it will actually occur. The two scales are independent except at the maximum and minimum values, where possibility values act as caps on probabilities, since an event must be possible if it is to have a non-zero probability. The possibility and necessity measures that factor into possibility distributions are actually special cases of the plausibility and belief measures used in Dempster-Shafer Evidence Theory, which has a related shade of meaning: instead of gauging whether or not an event can or will happen, plausibility and belief work together to grade the credibility of the associated evidence. If we were sifting through user stories in a Behavior-Driven Development (BDD) process, we wouldn’t use evidence theory for fuzzy terms like “the weather is hot,” or questions like “the weather could be cold” or “the weather is probably mild,”[1] which might be candidates for possibilistic or stochastic modeling. “As far as I can tell, the weather will be hot,” might be fair game, since the subject of the sentence is the trustworthiness of the associated statement. The clearest example I’ve yet run across in the literature occurs in George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications, which I’ve used as my go-to resource throughout this series for the heavy math formulas:

                “Consider, however, the jury members for a criminal trial who are uncertain about the guilt or innocence of the defendant. The uncertainty in this situation seems to be of a different type; the set of people who are guilty of the crime and the set of innocent people are assumed to have very distinct boundaries. The concern, therefore, is not with the degree to which the defendant is guilty, but with the degree to which the evidence proves his membership in either the crisp set of guilty people or the crisp set of innocent people.”[2]

In the last article, I gave a monologue on how organizations can benefit from uncertainty management programs, which begins with partitioning uncertainty into various types, like probabilities, nonspecificity, fuzziness and conflicting information; these in turn stretch across five mathematical subtopics: information theory, stochastics, possibility theory, fuzzy sets and evidence theory. The last of these has its own corresponding formulas for measures like nonspecificity, but is particularly useful for quantifying the degree of conflict between pieces of information. For this reason, it is widely used to aggregate disparate sources of information, which in turn integrates seamlessly with Decision Theory; for example, one of its most common implementations is sensor fusion.[3] Klir and Yuan also provide a concise list of possible use cases in various fields:

                “For instance, suppose we are trying to diagnose an ill patient. In simplified terms, we may be trying to determine whether the patient belongs to the set of people with, say, pneumonia, bronchitis, emphysema, or a common cold. A physical examination may provide us with helpful yet inconclusive evidence. For example, we might assign a high value, say 0.75, to our best guess, bronchitis, and a lower value to the other possibilities, such as 0.45 for the set consisting of pneumonia and emphysema and 0 for a common cold. These values reflect the degree to which the patient’s symptoms provide evidence for the individual diseases or sets of diseases; their collection constitutes a fuzzy measure representing the uncertainty associated with several well-defined alternatives; It is important to realize that this type of uncertainty, which results from information deficiency, is fundamentally different from fuzziness, which arises from the lack of sharp boundaries.”[4]

…………Thankfully, a sturdy mathematical scaffolding to model these types of evidence-based uncertainty already exists, although it isn’t being tested much these days in the relational database, data warehousing and data mining fields. The modeling process is akin to the one I introduced a few weeks ago for possibility distributions, but a tad more complicated. A continuous data type like float, numeric or decimal is required for probability values, but possibility theory also calls for the addition of a bit column, which is often assigned to the Necessity measure. In the theory developed independently by statisticians Glenn Shafer and Arthur Dempster, we need three measures: a Probability Mass Assignment (often denoted by a lower case m) that tells us the strength of the evidence that a record belongs just to one set; a Belief measure that measures the same, plus the evidence for belonging to its subsets; and a Plausibility measure, which covers both of those, as well as “the additional evidence or belief associated with sets that overlap with A.”[5] The easy part is that all three are measured on a scale of 0 to 1, the same as fuzzy sets, probabilities, possibilities and the like; the complexity arises from the fact that they measure evidence at different levels. This leads to nested bodies of evidence, which alpha cuts (α-cuts) are ideal for modeling, as explained a couple of articles ago; I saved this topic for the next-to-last article precisely because it unites many of the concepts introduced throughout the series, like α-cuts, fuzzy unions, intersections and complements.
…………These relationships also give rise to various mathematical properties, some of which are similar to those used in possibility distributions. For example, just as Necessity is equal to 1 minus the Possibility of the complement, so too is Plausibility equal to 1 minus the Belief of the complement. Plausibility must be greater than or equal to the Belief, since it models evidence at a higher scope. These “fuzzy measures” have weaker forms of properties like monotonicity, continuity and additivity than probabilities do.[6] Belief measures are superadditive, which means that if you sum them together across the subsets, the result must be greater than or equal to the Belief function for the whole set. For example, the Belief function for the whole set can be a figure less than 1, say 0.97, but the individual measures of each subset can be assigned degrees of belief like 0.5, 0.87, 0.3, etc. which together sum to 1.67, which is valid because it’s greater than 0.97. In contrast, probabilities must always sum to 1 across a dataset, including the probability mass assignments used in evidence theory. Plausibility is subadditive, which signifies the opposite relationship, so that the measures taken across the subsets must sum to at least the Plausibility for the whole set. In short, they act as maximums rather than sums. This all sounds weird, but it’s a necessary logical consequence of the nesting of evidence. As explained in the discussion on α-cuts a couple of articles ago, this signifies that records can belong to multiple hierarchical partitions of a set, which is an unfamiliar situation in the relational world (despite the fact that it is easily modeled using set-theoretic relational technology). The good news is that this web of interrelationships makes the three evidence theory measures reconstructible from each other; this makes it possible to validate the values using queries like the samples in Figure 2.
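For reference, here is my shorthand rendering of the standard definitions that tie all three measures back to the probability mass assignment m, following the conventional presentation in sources like Klir and Yuan rather than anything SQL-specific; for any subset A of the universal set, with the masses across all subsets summing to 1:

Bel(A) = \sum_{B \subseteq A} m(B)
Pl(A) = \sum_{B \cap A \neq \emptyset} m(B)
Pl(A) = 1 - Bel(\bar{A}), \quad Bel(A) \leq Pl(A)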

Two Common Illustrations of Dempster-Shafer Evidence Theory in Action

                The Wikipedia article on Dempster-Shafer Theory has comprehensible examples of how these three measures work together, beginning with a sensor that detects whether a cat concealed in a box is in a Dead or Alive state. The value for Either obviously reaches the maximum value of 1 for both Belief and Plausibility, since it must be one of the two by logical necessity (that is, unless our cat happens to belong to Erwin Schrödinger or was buried in Pet Sematary). It is thus an instance of a “universal hypothesis,” which encompasses the whole dataset. Yet the probability mass assignment for the Either state is only 0.3, which signifies the fact that we don’t have solid information on its status; the probability figure for the whole dataset still sums to 1 though, once the stats for Alive and Dead are factored in. The probability value for the universal hypothesis thus constitutes a measure of the uncertainty remaining in the data, once the probability, Belief and Plausibility measures have partitioned it off. Since Dead and Alive are discrete states without fuzzy intervals, the Wikipedia example assigns them Belief figures equal to their probability masses – which when added to the value of 1 for the Either state, means that the total Belief for the whole dataset is greater than 1, unlike the probability mass. The Plausibility can then be reconstructed by taking 1 minus the Belief of the complement.
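To make the arithmetic concrete, here is a worked version with mass assignments I’ve made up for illustration (they need not match the exact figures in the Wikipedia article):

m(Alive) = 0.2, \quad m(Dead) = 0.5, \quad m(Either) = 0.3
Bel(Alive) = m(Alive) = 0.2, \quad Bel(Dead) = m(Dead) = 0.5, \quad Bel(Either) = 0.2 + 0.5 + 0.3 = 1
Pl(Alive) = m(Alive) + m(Either) = 0.5 = 1 - Bel(Dead), \quad Pl(Dead) = 0.8 = 1 - Bel(Alive), \quad Pl(Either) = 1

Note that the masses sum to 1, but the Belief figures sum to 1.7, which is perfectly legal for the reasons discussed above.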
…………The tricky part is that the Belief measures must sum to 1 for each subset, which calls for looking at our data in an unfamiliar way. I initially thought that the existence of these subsets meant that we could simply model this by applying the appropriate normal form, but that’s not the case. The second example in the Wikipedia article has states like Red, Yellow and Green, which are mutually exclusive, as well as some that carry a bit of measurement uncertainty, like “Red or Yellow” and “Red or Green.” In this situation, the Belief figures for Red, Yellow and “Red or Yellow” must sum to 1, as must the Belief figures for Red, Green and “Red or Green,” since there are two overlapping subsets. Red, Yellow and Green are all members of more than one subset, but not the same ones. This leads to an odd predicament where each state is discrete and thus difficult to denormalize, yet the associated column still represents subsets; this is one situation where the presence of logical OR statements is not a hint that the design requires normalization. Since we can’t be certain how many other state descriptions a child could be related to, a single self-referencing ParentID column won’t do the job either. The next best thing is an interleaved solution, in which a separate table with two foreign keys pointing to the primary key of the table holding the Belief measures keeps track of which subsets each record belongs to. To aggregate the Belief figures for each subset in the parent table, we just inspect the interleaved table for all of the categories a record can belong to.
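A minimal DDL sketch of that interleaved design might look something like the following; the table and column names mirror the ones that appear in Figure 2 below, but the data types and constraints are my own guesses rather than the actual schema:

CREATE TABLE Health.DuchennesEvidenceTheoryTable (
ID bigint IDENTITY(1,1) PRIMARY KEY,
LactateDehydrogenaseState nvarchar(50) NOT NULL, — e.g. ‘Low’, ‘Medium’ or ‘High or Medium’
ProbabilityMassAssignment float NOT NULL, — the masses must sum to 1 across the whole table
BeliefScore float NOT NULL, — the record’s own mass plus the masses of all of its subsets
PlausibilityScore float NOT NULL — 1 minus the Belief of the complement
)

CREATE TABLE Health.DuchennesEvidenceTheoryIntervalTable (
ParentID bigint NOT NULL REFERENCES Health.DuchennesEvidenceTheoryTable (ID),
BeliefSubsetID bigint NOT NULL REFERENCES Health.DuchennesEvidenceTheoryTable (ID)
— one row per superset-subset pair, so a single record can participate in many nested subsets
)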

Server States: A SQL Server-Specific Example

                Let me give an example that might be more intuitive and relevant to SQL Server users: the state_desc column of sys.databases will assign one of seven mutually exclusive states to each database: Online, Offline, Restoring, Recovering, Recovery Pending, Suspect and Emergency. As far as I know, these states do not rule out which user modes a database can be in, which range from SINGLE_USER to RESTRICTED_USER to MULTI_USER. Nevertheless, many combinations would be improbable, so each unique pair of descriptions requires a probability assignment that will probably differ from other pairs of state_desc and user mode values. Now let’s pretend we have a sensor that guesses which pair of states a server is in at any given moment, perhaps based on I/O data or network bandwidth usage. If it can tell us the user mode plus whether we’re in one of the three recovery states, but can’t differentiate between them accurately, then we’re dealing with a fuzzy interval-valued set. From the point of view of the sensor, “Restoring | Recovering | Recovery Pending” is a discrete state and ought to be recorded as such in the database table. Nevertheless, to derive the Belief we must sum together all of the probabilities for the subsets it gives rise to, while the Plausibility equals one minus the sum of the probability assignments in the subsets it does not participate in. We could create a separate category like “Unknown” for situations where the sensor went offline or was otherwise unable to return accurate data – or better yet, establish a universal hypothesis like “Any State” with the Belief and Plausibility both set to 1, to which all of the possible subsets are attached. Subtracting the sum of the probabilities of all known states from that of the universal hypothesis would allow us to measure one type of uncertainty associated with the table. In order to measure the uncertainty inherent in the interval-valued fuzzy subsets that the Belief and Plausibility measures are attached to, we’d have to use a measure of fuzziness tailored to evidence theory. In the same vein, the count of possible state descriptions could be used to derive a measure of nonspecificity, albeit through a different formula than the ones introduced in the last article. In addition, we can define measures of uncertainty based on how much the evidence conflicts with itself, a topic I’ll defer to the later discussion of Strife and Discord.
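As a rough sketch of that first uncertainty calculation, a query along these lines would return the probability mass left over for a hypothetical “Any State” universal hypothesis; the Monitoring.ServerStateEvidenceTable and its columns are inventions of mine for illustration, not objects that actually exist anywhere:

— the probability mass left unassigned to any specific state, i.e. attached to the universal hypothesis
SELECT 1 – SUM(ProbabilityMassAssignment) AS MassForUniversalHypothesis
FROM Monitoring.ServerStateEvidenceTable
WHERE StateDescription <> ‘Any State’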
…………It is easier to illustrate all of this with T-SQL code samples, beginning with the easiest part, a simple snapshot of a table with probability mass, Belief and Plausibility measures defined on it. Degrees of Belief are usually derived from some kind of input method, akin to fuzzy set membership functions – except that subjective ratings tend to be more common in evidence theory. It is no surprise that Bayesian methods are often applied in deriving Belief functions, given that they actually represent a more specific subset of evidence theory measures. Instead of complicating the topic any further, I’ve derived the values in Figure 1 by creating an artificial category in the Duchennes muscular dystrophy data I’ve been using as practice data for the last few tutorial series[7], then simply assigning probability masses based on the frequency of the values for the LactateDehydrogenase column. From there, I derived the Belief measures, then constructed the Plausibility measures from those. I used the float data type for all three of the columns that associate these measures with the LactateDehydrogenaseState column, an ordinal category; this represents yet another use of fuzzy sets to model ordinals on continuous scales, except at a more advanced level where three columns are required.
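For what it’s worth, the derivation was nothing fancy; a query along these lines would produce frequency-based mass assignments that sum to 1, although the CASE boundaries below are arbitrary stand-ins rather than the bands I actually used for Figure 1:

— assigning probability masses from the relative frequencies of banded LactateDehydrogenase values
SELECT LactateDehydrogenaseState, Count(*) / CAST(SUM(Count(*)) OVER () AS float) AS ProbabilityMassAssignment
FROM (SELECT CASE WHEN LactateDehydrogenase < 150 THEN ‘Low’
       WHEN LactateDehydrogenase < 250 THEN ‘Medium’
       ELSE ‘High’ END AS LactateDehydrogenaseState
       FROM Health.DuchennesTable
       WHERE LactateDehydrogenase IS NOT NULL) AS T1
GROUP BY LactateDehydrogenaseState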

Figure 1: Simple Evidence Theory Measures Defined on the LactateDehydrogenase Column

Figure 2: Sample Validation Code for the Relationships Between the Three Evidence Theory Measures
— verifying the Belief via the ProbabilityMassAssignment column
SELECT ID, LactateDehydrogenaseState, ProbabilityMassAssignment, BeliefScore, PlausibilityScore,
CASE WHEN IntervalProbabilityMassAssignmentSum IS NOT NULL THEN IntervalProbabilityMassAssignmentSum ELSE ProbabilityMassAssignment END
AS BeliefReconstructedFromProbabilityMass
FROM Health.DuchennesEvidenceTheoryTable AS T3
        LEFT JOIN (SELECT ParentID, SUM(ProbabilityMassAssignment) AS IntervalProbabilityMassAssignmentSum
              FROM Health.DuchennesEvidenceTheoryTable AS T1
               INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
               ON T1.ID = T2.BeliefSubsetID
              GROUP BY ParentID) AS T4
       ON T3.ID = T4.ParentID 

SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment, ProbabilityMassAssignmentBySum,
CASE WHEN ProbabilityMassAssignmentBySum IS NULL THEN 1 ELSE ABS(1 – (ProbabilityMassAssignment + ProbabilityMassAssignmentBySum)) END AS PlausibilityScoreReconstructedFromProbability
FROM (SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment
FROM Health.DuchennesEvidenceTheoryTable) AS T5
       LEFT JOIN (SELECT BeliefSubsetID, SUM(ProbabilityMassAssignment) AS ProbabilityMassAssignmentBySum
       FROM (SELECT DISTINCT T1.BeliefSubsetID, T2.ParentID
              FROM Health.DuchennesEvidenceTheoryIntervalTable AS T1
                     INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
                     ON T1.ParentID = T2.BeliefSubsetID AND T1.BeliefSubsetID != T2.BeliefSubsetID) AS T4
                           INNER JOIN Health.DuchennesEvidenceTheoryTable AS T3
                           ON T4.ParentID = T3.ID
       GROUP BY BeliefSubsetID) AS T6
       ON T5.ID = T6.BeliefSubsetID

…………Note how the Belief is equal to the ProbabilityMassAssignment for Low, Medium and High, which is reflective of the fact that they have no substates; “Medium or Low” and “High or Medium” have BeliefScore values higher than their masses, precisely because we have to tack the values for Low, Medium and High onto them. The PlausibilityScore is in each case determined by adding together all of the ProbabilityMassAssignment values for the states that aren’t among a record’s subsets, then taking an inverse, which is equivalent to subtracting the BeliefScore of the complement from 1. The second image depicts the Health.DuchennesEvidenceTheoryIntervalTable, in which the ParentID and BeliefSubsetID determine the linkages between subsets. For example, the records with ParentIDs of 4 tie together the Medium | Low, Medium and High | Medium values, so that we can aggregate the ProbabilityAssignments to derive the BeliefScore. The PlausibilityScore can be determined using the same table. Code similar to what I provided in Figure 2 can be used to validate the relationships between these fuzzy measures, with your own particular column and table names plugged in, of course. The IS NULL condition is due to a bizarre problem in which setting the first condition in the CASE to BeliefScore = 1 THEN 1, or using NullIf, both led to NULL values. It is also possible to derive the ProbabilityMassAssignment values in reverse, but I’ll omit validation code for that scenario in the interest of brevity. To avoid pummeling readers with too much information all at once, I’ll also put off discussion of how to derive uncertainty measures like Strife and Discord from this crude example. In the next article, I’ll also mention some principles for interpreting the results that can in turn provide an important bridge to Information Theory. Among other things, the first table tells us that “the belief that the Lactate Dehydrogenase values are Medium or Low is higher than that for Low alone, by a margin of 0.679425837320574 to 0.349282296650718. It is more plausible that the value is High than Low, by a margin of 0.822966507177033.” Once we define measures of fuzziness, nonspecificity and the like on top of them and apply some principles of inference drawn from Information Theory, we can partition the uncertainty further in order to glean additional valuable insights.

 

[1] Here in Western New York the natural language term “mild” has interesting shades of meaning (at least among local weathermen) which would be a challenge to model in terms of a fuzzy set. As winter approaches, “mild” means warmer than normal, but as the peak of summer comes, it means cooler than expected, so the meaning is inverted depending on the season. If we were to use an interval-valued set, we’d need a range of values somewhere between 30 and 70 degrees – which is so imprecise that it borders on meaningless.

[2] p. 177, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] See the Wikipedia article “Dempster Shafer Theory” at http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory

[4]  p. 179, Klir and Yuan.

[5] IBID., p. 181-182.

[6] IBID., p. 179-181.

[7] Which I downloaded from the Vanderbilt University’s Department of Biostatistics and converted into a SQL Server table in my sham DataMiningProjects database.

 

Implementing Fuzzy Sets in SQL Server, Part 9: Measuring Nonspecificity with the Hartley Function

By Steve Bolton

…………Imagine how empowering it would be to quantify what you don’t know. Even an inaccurate measure might be helpful in making better decisions in any area of life, but particularly in the business world, where change is the only certainty. This is where a program of “uncertainty management” can come in handy and fuzzy set techniques find one of their most useful applications. Fuzzy sets don’t introduce new information, but they do conserve and put to good use some information left over after ordinary “crisp” sets are defined – particularly when it would be helpful to model ordinal categories on continuous number scales. As I pointed out at the beginning of this series, uncertainty reduction is akin to Stephen King’s adage that monsters are less fearsome once some scale of measurement can be applied to them; knowing that a bug is 10 feet tall is at least reassuring, in the sense that we now know that it is not 100 or 1,000 feet tall.[1] Uncertainty reduction can also be put to obvious uses in data mining activities like prediction and clustering. Another potential use is in simplification of data, so that information loss is minimized.[2] In today’s article I’ll shine a little light on the Hartley function, a tried and true method of quantifying one particular category of uncertainty that has been used since 1928 to simplify and demystify datasets of all kinds and could easily be extended to SQL Server data.
George J. Klir and Bo Yuan, the authors of my favorite resource for fuzzy set equations, note that data models must take uncertainty into account, along with complexity and credibility. Later in the book, they go on to subdivide uncertainty into three types that sprawl across possibility theory, stochastics, information theory, fuzzy sets and Dempster-Shafer Evidence Theory:

                “The relationship is not as yet fully understood…Although usually (but not always) undesirable when considered alone, uncertainty becomes very valuable when considered in connection to the other characteristics of systems models; in general, allowing more uncertainty tends to reduce complexity and increase credibility of the resulting model. Our challenge in systems modelling is to develop methods by which an optimal level of allowable uncertainty can be estimated for each modelling problem…”[3]

“…Three types of uncertainty are now recognized in the five theories, in which measurement of uncertainty is currently well established. These three uncertainty types are: nonspecificity (or imprecision), which is connected with sizes (cardinalities) of relevant sets of alternatives; fuzziness (or vagueness), which results from imprecise boundaries of fuzzy sets; and strife (or discord), which expresses conflicts among the various sets of alternatives.

“It is conceivable that other types of uncertainty will be discovered when the investigation of uncertainty extends to additional theories of uncertainty.”[4]

…………Some authors also include “ambiguity (lack of information),”[5] which Klir and Yuan define as a parent class of both discord and nonspecificity in an excellent diagram I wish I could reprint.[6] Probabilities probably ought to be included as well.[7] As soon as I was introduced to the concept of uncertainty partitioning, I was intrigued by the possibility of defining human free will as an alternative form of uncertainty, but that raises many thorny philosophical questions. Among them is the contention that it doesn’t even exist, which is a disturbing tenet of many popular philosophies, like materialistic determinism and certain forms of theological predestination. I’d dispute that with evidence that would be hard to debunk and raise the possibility that it may not be possible to quantify it at all, by definition; the ability to assign values to it would certainly be helpful in academic fields like economics and psychology, where human behavior is the crux of the matter. This topic integrates quite nicely with the contention of authors like Lotfi A. Zadeh, the father of fuzzy set theory, that it might be helpful to apply fuzzy techniques in these fields to model “humanistic systems.”[8] Other controversial candidates for new categories of uncertainty include the notion that reality is somewhat subjective (which I would argue is fraught with risk, since it is a key component of many forms of madness) and the contention that some events (particularly at the quantum level) can be truly random, in the sense of being indeterminate or “uncaused.” Albert Einstein drove home the point that uncertainty is deeply rooted in all we see in his famous quote from a lecture at the Prussian Academy of Sciences in 1921, in which he seemed to extend it right into the heart of mathematics itself: “…as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”[9]

Partitioning Uncertainty

                The first step is to develop a habit of explicitly recognizing which type of uncertainty is under discussion, then partitioning it off using the appropriate type of fuzzy set. For example, whenever we need to cram continuous scales into finite data types like float, decimal and numeric, we end up creating measurement uncertainty about whatever values come after the precision we’ve chosen.[10] Like other types of measurement uncertainty, this is best addressed by fuzzy sets without any special probabilistic, possibilistic or evidence theory connotations attached to them. Incidentally, some theoreticians say that if we’re trying to quantify the uncertainty of a measurement, membership functions based on the normal distribution (i.e. the bell curve) are usually the best choice (based on empirical evidence from the aerospace industry).[11] If we were uncertain about the likelihood of an event occurring, we’d assign a probability value instead; if we were unsure of the logical necessity of an event, we’d use a possibility distribution, as explained in the last installment of this series. In the next installment, I’ll explain how Dempster-Shafer Theory can be used to judge the certainty and credibility of evidence, by assigning grades of membership in the set of true statements.
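As an aside, a bell-curve membership function of the kind those authors recommend is easy enough to sketch in T-SQL; the @Center and @Width values below are arbitrary knobs a modeler would have to tune, not figures taken from the literature:

— a Gaussian membership function: the grade peaks at 1 when the value equals @Center and decays toward 0
DECLARE @Center float = 200, @Width float = 50
SELECT ID, LactateDehydrogenase,
EXP(-1 * POWER((LactateDehydrogenase – @Center) / @Width, 2)) AS GaussianMembershipScore
FROM Health.DuchennesTable
WHERE LactateDehydrogenase IS NOT NULL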
…………Once the appropriate method of uncertainty modeling has been selected, we can then apply its associated formulas to compute figures for nonspecificity, imprecision, discord and the like. The good news is that we already dispensed with the main means of computing fuzziness, back in Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements. In the remainder of this article, I’ll provide sample T-SQL for implementing two of the three main methods for calculating the “U-Uncertainty,” a.k.a. the nonspecificity. Like many other authors I consulted for this series, Klir and Yuan stress that nonspecificity and fuzziness are completely independent stats, since they measure two distinct and unrelated types of uncertainty.[12] The former is dictated by the number of possible distinct states that a set can take on, whereas the latter quantifies imprecision in class boundaries.[13] A set can have many possible arrangements, yet still be entirely crisp; there’s no mistaking what a Lego or Lincoln Log is, but there’s apparently no end to the crazy things that can be built with either one. Sets with few arrangements but really fuzzy boundaries are also possible. That is why fuzzy sets sans any additional meaning like probability, possibility and credibility scores have both fuzziness and nonspecificity measures attached to them.
…………Possibility theory, the topic of the last blog post in this amateur series of self-tutorials, has a form of nonspecificity that is easier to specify (pun intended) than the ordinary fuzzy set version, so I’ll introduce that first. The SELECT in Figure 1 is performed on a column of muscular dystrophy data I downloaded from Vanderbilt University’s Department of Biostatistics and added to a sham DataMiningProjects database a few tutorial series ago. The PossibilityScore was assigned by a random number generator in the last article and tacked onto the table definition, for the sake of convenience. It’s time for my usual disclaimer: I’m writing this in order to learn this topic, not because I know it well, so it is a good idea to check over my T-SQL samples before putting them to serious use. This is especially true of this SELECT, where I may be applying a Lead where there should be a Lag; in contrast to the topics I posted on in previous series, examples with sample data are few and far between in the fuzzy set literature, which makes validation difficult. Furthermore, there is apparently a more compact version available for specific situations, but I’ll omit it for now because I’m still unclear on what mathematical prerequisites are needed.[14]

Figure 1: Possibilistic Nonspecificity for the LactateDehydrogenase Column
SELECT SUM(PossibilityDifference * Log(RN, 2)) AS PossibilisticUUncertainty
FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, PossibilityScore – Lead(PossibilityScore, 1, 0) OVER (ORDER BY ID) AS PossibilityDifference
       FROM Health.DuchennesTable) AS T1 

…………The SELECT returns a single value of 4.28638426128113, which measures the amount of uncertainty in bits; the greater the number of possible state descriptions, the higher the U-Uncertainty will be. The same relationship applies to the procedure below, which returns a value of 7.30278910848746 bits; the difference is that one measures uncertainty about the number of possible values the LactateDehydrogenase column can have, while the other measures lack of certainty about the number of membership function scores a row can be assigned. Figure 2 is practically identical to the sample code I’ve posted throughout this series, at least as far as the UPDATE; all I’m doing is running the stored procedure from Outlier Detection with SQL Server, part 2.1: Z-Scores on the DuchennesTable and storing the results in a table variable, then transforming them to a scale of 0 to 1 using the @Rescaling variables and ReversedZScores column. The GroupRank column can be safely ignored, as usual. The first SELECT with the AlphaCutLeftBound and AlphaCutRightBound columns is only provided to illustrate how the nonspecificity figure is arrived at in the last SELECT. What we’re basically doing is partitioning the dataset into nested levels, using the alpha cut (α-cut) technique I introduced in the last article, then applying a Base-2 LOG and summing the results across the hierarchy.[15] The tricky part is that with α-cuts, records can belong to more than one subset, as I pontificated on in my last post; the levels are widest at the bottom of the dataset, but narrowest at the top, where the MembershipScore values approach the maximum of 1. This calls for thinking about the data in an odd way, given that in most relational operations records are assigned to only a single subset.
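For readers who prefer formulas, my reading of Klir and Yuan is that the two queries approximate sums of the following general form, where the r values are the possibility scores sorted in descending order (with r_{n+1} = 0) and the α values are the distinct membership grades used as cut points; take this as my paraphrase rather than a verbatim transcription of their equations:

U(r) = \sum_{i} (r_i - r_{i+1}) \log_2 i  (the possibilistic U-Uncertainty)
U(A) = \sum_{j} (\alpha_j - \alpha_{j-1}) \log_2 \lvert A_{\alpha_j} \rvert  (the nonspecificity of a fuzzy set, summed over its α-cuts)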

Figure 2: Code for Hartley Nonspecificity
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) – ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects’,
              @SchemaName = N’Health’,
              @TableName = N’DuchennesTable’,
              @ColumnName = N’LactateDehydrogenase’,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin = Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax – @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore – @RescalingMin) / @RescalingRange

 

SELECT AlphaCutBound AS AlphaCutLeftBound, Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutRightBound,
AlphaCutBound – Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange, Log(AlphaCutCount, 2) AS IndividualLogValue
FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
       FROM @ZScoreTable AS T1
       INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
 FROM @ZScoreTable) AS T2
       ON MembershipScore >= AlphaCutBound
       GROUP BY AlphaCutBound) AS T3  

SELECT SUM(AlphaCutBoundaryChange * Log(AlphaCutCount, 2)) AS FuzzySetNonspecificityInBits
FROM (SELECT AlphaCutCount, AlphaCutBound – Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange
       FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
              FROM @ZScoreTable AS T1
              INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
                     FROM @ZScoreTable) AS T2
             ON MembershipScore >= AlphaCutBound
              GROUP BY AlphaCutBound) AS T3) AS T4

 

Figure 3: Results for the Hartley Nonspecificity Example

…………The point of using the α-cuts is to chop the dataset up into combinations of possible state descriptions, which is problematic with fuzzy sets because the boundaries between states are less clear. The interpretation depends entirely on the meaning of the fuzzy attribute; as Klir and Yuan note, it can reflect an “unsettled historical question” in the case of retrodiction, possible future states in the case of prediction, prescriptive uncertainty in the case of policies, diagnostic uncertainty in the case of medical information and so forth.[16] In the same vein, we can interpret my sample above as measuring 7.30278910848746 bits of uncertainty about a record’s place within the range of Z-Scores, which can in turn be used as a form of outlier detection. The smaller the range of possible values, the smaller the number of possible state descriptions becomes, which means that the cardinality of the α-cuts and the value of the final statistic decline as well.
…………This is an adaptation of a function developed way back in 1928 by electronic pioneer Ralph Hartley[17]; since it serves as one of the foundations of information theory I’ll put off discussion of the crisp version until my long-delayed monster of a series, Information Measurement with SQL Server. We’ve got at least two more articles in the fuzzy set series to dispense with first, including an examination of Dempster-Shafer Theory in the next installment. Evidence theory also has its own brand of nonspecificity measure, also based on the Hartley function.[18] Measures like strife and discord are more relevant to that topic, since they deal with conflicts in evidence. Possibility theory has counterparts for both, but I’ll leave them out, given that Klir and Yuan counsel that “We may say that possibility theory is almost conflict-free. For large bodies of evidence, at least, these measures can be considered negligible when compared with the other type of uncertainty, nonspecificity. Neglecting strife (or discord), when justifiable, may substantially reduce computation complexity in dealing with large possibilistic bodies of evidence.”[19] Possibility theory is a useful springboard into the topic though, given that Belief and Plausibility measures are modeled in much the same way. In fact, Possibility and Necessity measures are just special cases of Belief and Plausibility, which should serve to decomplicate my introduction to Dempster-Shafer Theory a little.

 

[1] p. 114, King, Stephen, 1981, Stephen King’s Danse Macabre. Everest House: New York. I’m paraphrasing King, who in turn paraphrased an idea expressed to him by author William F. Nolan at the 1979 World Fantasy Convention.

[2] p. 269, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] IBID., p. 3.

[4] IBID., p. 246.

[5] p. 2, Hinde, Chris .J. and Yang, Yingjie., 2009, “A New Extension of Fuzzy Sets Using Rough Sets: R-Fuzzy Sets,” pp. 354-365 in Information Sciences, Vol. 180, No. 3. Available online at the Loughborough University Institutional Repository web address https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/13244/3/rough_m13.pdf

[6]  p. 268, Klir and Yuan.

[7] IBID., p. 3.

[8] IBID., p. 451.

[9] Cited from the Common Mistakes in Using Statistics web address https://www.ma.utexas.edu/users/mks/statmistakes/uncertaintyquotes.html

[10] IBID., pp. 327-328.

[11] Kreinovich, Vladik; Quintana, Chris and Reznik, L.,1992, “Gaussian Membership Functions are Most Adequate in Representing Uncertainty in Measurements,” pp. 618-624 in Proceedings of the North American Fuzzy Information Processing Society Conference, Vol. 2. NASA Johnson Space Center: Houston. Available online at the University of Texas at El Paso web address www.cs.utep.edu/vladik/2014/tr14-30.pdf

[12] p. 258, Klir and Yuan.

[13] p. 2, Hinde and Yang.

[14] pp. 253, 269, Klir and Yuan.

[15] IBID., pp. 248-251.

[16] IBID., p. 247.

[17] See the Wikipedia articles “Hartley Function” and “Ralph Hartley” at http://en.wikipedia.org/wiki/Hartley_function and http://en.wikipedia.org/wiki/Ralph_Hartley respectively.

[18] pp. 259, Klir and Yuan.

[19] IBID., p. 264.

 

Implementing Fuzzy Sets in SQL Server, Part 8: Possibility Theory and Alpha Cuts

By Steve Bolton

…………To get the point across that fuzzy sets require membership grades of some sort, throughout this series I’ve borrowed the stored procedure I coded for Outlier Detection with SQL Server, part 2.1: Z-Scores and rescaled the results on the customary range of 0 to 1. The literature on fuzzy sets contains frequent warnings against automatically interpreting membership scores as probabilities, but I deliberately introduced a tie-in to stochastics by using Z-Scores, which are inherently probabilistic. Other shades of meaning may be assigned which are unfamiliar to modelers of ordinary “crisp” sets, which is why I pointed out early on in this series of amateur self-tutorials that interpretability is a more prominent issue with fuzzy sets. For example, membership functions can be viewed as assigning scores to the accuracy of the associated values, which is similar to the way in which we used fuzzy numbers two articles ago to code such linguistic concepts as “about” and “near.” If we add the subtle distinction that the membership scores may mean “cannot be near” or “can be around” a certain value, we’re stepping into the realm of Possibility Theory, which has important uses in fuzzy logic.[i]
…………Approximate reasoning and related concepts are more relevant to topics like expert systems that are beyond the purview of this series, but Possibility Theory can serve as a useful springboard into Evidence Theory, which is useful in developing programs of uncertainty management. Possibility distributions are in one sense a more restricted brand of probability distributions, while also acting as more restrictive versions of Evidence Theory measures; it may therefore be easier to use them as a bridge from one relatively familiar topic to a lesser-known one. I originally thought the topic would be quite difficult to grasp, but it’s actually a good deal easier than stochastics. Perhaps the most difficult aspect is that possibility distributions can be modeled using alpha cuts (α-cuts), a method of partitioning fuzzy sets that will prove useful in the next two articles.

From ‘Can’ and ‘Must’ to Surprise

                In fact, I’ll lighten the load further by dispensing with many of the details of Possibility Theory, since its simplicity can quickly give way to complexity, same as with any other fuzzy set topic. For example, stochastic concepts like conditional and marginal probabilities have their counterparts in Possibility Theory, all of which is too far afield for our purposes. For those who have a need for the corresponding formulas and don’t mind wading through the thick math, I recommend consulting the seventh chapter of my favorite resource, George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications. I’m not even going to get into a discussion of how possibility scores are assigned; for the sake of argument, let’s assume any figures used in my examples are derived from subjective ratings by end users. The important thing to keep in mind is that we need two numbers to specify a possibility distribution, not just the single probability figure used in stochastics. One of these is known as the Possibility measure and the other as a measure of Necessity, which equals 1 minus the Possibility of the complement.
…………The two measures can be combined by adding them together and subtracting one, but the fact that this results in a non-standard range of -1 to 1 limits its usefulness.[ii] The simplest way to model this relationship is to use a bit column, in conjunction with the float, numeric or decimal columns normally used to represent fuzzy sets on a continuous scale between 0 and 1.[iii] The tricky thing is that an event must occur when Necessity equals 1, whereas a Possibility score of 0 means that it cannot; on the other hand, a Possibility score of 1 does not imply certainty, only an absence of surprise if the event did occur; apparently this is analogous to a measure of “surprise” developed in the mid-20th Century by economist G. L. S. Shackle,[iv] which has since been further developed by such household names in the fuzzy set field as Henri Prade and Ronald R. Yager.[v] As Lotfi A. Zadeh, the father of fuzzy set theory, explains it:

                “Consider a numerical age, say u = 28, whose grade of membership in the fuzzy set ‘young’ is approximately 0.7. First we interpret 0.7 as the degree of compatibility of 28 with the concept labelled young. Then we postulate that the proposition ‘Peter is young’ converts the meaning of 0.7 from the degree of compatibility of 28 with young to the degree of possibility that Peter is 28 given the proposition ‘Peter is young.’ In short, the compatibility of a value of u given ‘Peter is young.’”[vi]

…………This lack of symmetry is comparable to the way possibilities and probabilities differ. A Necessity measure of 1 leads inevitably to a probability score of 1, since what must happen is entirely probable; conversely, a Possibility measure of 0 leads to a probability score of 0, since what cannot happen is entirely improbable. Apart from these extremes, however, the two theories diverge. A Necessity or Possibility score of 0.5 has no effect on the probability, since whether or not a thing is logically conceivable is not equivalent to whether it is likely to happen; it is entirely possible that we may win the lottery tomorrow, but I wouldn’t bet on it. This is the core difference between the two theories: one expresses confidence in our information about whether a thing can happen, while the other reflects confidence in information about whether it will.
…………Because of this relationship, a possibility distribution acts as a cap on the associated probability distribution; this has many mathematical consequences[vii], the most important of which is that the two distribution types intersect at their minimum and maximum values. This in turn leads to the interesting property that possibility scores do not have to sum to 1 across a set of records, unlike probabilities; the only restriction is that the maximum value per record is 1.[viii] This in turn means that to assess whether or not we’ve reached a certain threshold of possibility values, all of the records with scores greater than the threshold must be taken into account. In other words, if we want to know if an event has a possibility of 0.3, we must examine all of the records with scores higher than that to come to a verdict. Every record in a set will qualify for the lowest partition, where a possibility score of 0 is all it takes to qualify, but the number of records continually shrinks as we move up the dataset towards the perfect score of 1.

Nested Sets and α-cuts

                This creates a nested set of evidence in which records can belong to multiple partitions, which can be easily implemented in T-SQL despite the fact that it calls for thinking about sets in unusual ways. We’re doing something uncommon here by cutting a set up hierarchically, so that a row belongs to more and more sets as we approach the maximum value of the membership function, rather than a single subset as we see in most relational joins. Klir and Yuan include a couple of handy illustrations which could get across the meaning of nested sets of evidence in a heartbeat, but I haven’t had a chance to seek permission to reprint them and don’t have the ability to draw my own.[ix] It turns out that the fuzzy set partitioning method known as α-cuts are an ideal tool for implementing these relationships[x] (not to mention many others that are beyond the scope of this series, like fuzzy equivalence relations[xi]). In plain English, this means that we have to use >= comparison operators to chop up a dataset into nested subsets, or > operators in the case of strong α-cuts.
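Formally, the α-cut and strong α-cut of a fuzzy set A are just the crisp sets picked out by those two comparison operators:

{}^{\alpha}A = \{ x : \mu_A(x) \geq \alpha \}  (the α-cut)
{}^{\alpha+}A = \{ x : \mu_A(x) > \alpha \}  (the strong α-cut)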
…………I’m trying to keep the jargon to a minimum, but since the terms “cutworthy” and “strong cutworthy” occur frequently in the literature, it may be helpful to know that they refer to mathematical properties of fuzzy sets which are preserved in their α-cuts. [xii] Another important property is reconstructibility, which means that a fuzzy set can be rebuilt from its partitions. The manner in which possibility distributions establish maximum values for their associated probability distributions is essentially one and the same as the min/max types of unions and intersections we dealt with in previous articles, while the possibilities themselves are defined by their α-cuts.[xiii]
…………The first SELECT statement in Figure 1 illustrates how a simple GROUP BY and SUM with a ROWS UNBOUNDED PRECEDING clause can be used to partition a SQL Server table in this unconventional manner. I also have an alternate version of these SELECTs in which partitioning is done by deciles (or any other arbitrary percentile value) rather than DISTINCT MembershipScores, which I omitted to keep things simple; if anyone needs it though, I’d be happy to post it. As usual, the sample data comes from a dataset on the Duchennes form of muscular dystrophy I downloaded from Vanderbilt University’s Department of Biostatistics a few tutorial series ago, which now resides in a sham DataMiningProjects database. The code from the beginning to the UPDATE statement is basically identical to the T-SQL samples I’ve posted throughout this series, which always begins with plugging the results of the aforementioned Z-Score procedure into a table variable. The GroupRank column is only included because it was part of the original procedure and can’t be omitted from the INSERT EXEC, but it can be safely ignored. The @Rescaling variables and the ReversedZScore column are then used to adjust the Z-Scores to the 0 to 1 range used in almost all fuzzy sets. There are only 202 records in the DuchennesTable where LactateDehydrogenase is NOT NULL, which is exactly equal to the count of values in Figure 2 where the MembershipScore is zero. The counts for each α-cut continually decline after that, till they reach the perfect score of 1, which is equivalent to the Height measure mentioned in last week’s article on fuzzy stats. I left out the middle values for the sake of brevity.

Figure 1: An Example of α-Cut Partitioning

DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE  @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) – ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
) 

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects’,
              @SchemaName = N’Health’,
              @TableName = N’DuchennesTable’,
              @ColumnName = N’LactateDehydrogenase’,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin = Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax – @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore – @RescalingMin) / @RescalingRange

— ALPHA CUTS BY DISTINCT VALUES
— =======================================

SELECT MembershipScore, SUM(DistinctCount) OVER (ORDER BY MembershipScore
DESC ROWS UNBOUNDED PRECEDING) AS AlphaCutCount
FROM (SELECT MembershipScore, Count(*) AS DistinctCount
       FROM @ZScoreTable
       WHERE MembershipScore IS NOT NULL
       GROUP BY MembershipScore) AS T1 

— MEASURE OF SURPRISE
— =======================================
SELECT ID, LactateDehydrogenase, NecessityMeasure, PossibilityScore, 1 – PossibilityScore AS SimpleMeasureOfSurprise
FROM Health.DuchennesTable
WHERE LactateDehydrogenase IS NOT NULL

 

Figure 2: Sample α-Cut Values from the Beginning and End of the Duchennes Dataset

Figure 3: Possibility Scores and the Measurement of Surprise

…………The second SELECT merely returns some fake PossibilityScore values I randomly generated and tacked onto the DuchennesTable, with a simple inverse calculation to illustrate the most basic measure of Surprise.[xiv] Authors like Prade and Yager have extended the measure to address more sophisticated use cases, but Figure 3 is sufficient to get the point across for our purposes. The interpretation of any Surprise measure is straightforward: the higher the value, the greater our bewilderment will be if the associated event occurs. In this context, the Surprise would be attached to the possibility of observing the corresponding LactateDehydrogenase value; of course, these are actual values taken from a muscular dystrophy study in the 1980s, so if we weren’t using this for practice purposes we’d have to assign Necessity values of 1. These measurements of qualities like Surprise are of course not perfect, but they do allow us to attach some sort of ballpark figure to our expectations. As we shall see in the next two articles, one of the primary uses of fuzzy sets is to measure uncertainty, which can be valuable even when those measures are themselves uncertain. Two articles from now we’ll see how possibility theory is useful not merely in measuring surprise or in deriving interval-valued probabilities[xv], but also as a bridge to Dempster-Shafer Evidence Theory, which is useful in reckoning subtypes of uncertainty like Strife, Discord and Conflict. In the next installment, I’ll explain how both possibility distributions and α-cuts can measure nonspecificity, which is one of several types of uncertainty we can quantify with the aid of fuzzy sets.

[i] p. 200, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[ii] IBID., p. 198.

[iii] I got this idea from the Wikipedia article “Possibility Theory” at http://en.wikipedia.org/wiki/Possibility_theory.

[iv] IBID.

[v] Prade, Henri and Yager, Ronald R., 1994, “Estimations of Expectedness and Potential Surprise in Possibility Theory,” pp. 417-428 in International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, December 1994. Vol. 2, No. 4. Available online at the National Aeronautics and Space Administration (NASA) web address http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19930020329.pdf

[vi] Posted by Kornelia Brutoczki July 4, 2001 at  the Fuzzy Logic Home Page address http://mazsola.iit.uni-miskolc.hu/DATA/diploma/brutoczki_kornelia/fu_gz_02.html. The original source is not given.

[vii] pp. 206-207, Klir and Yuan.

[viii] IBID., p. 204.

[ix] IBID., pp. 24, 195.

[x] IBID., pp. 19-21, 35.

[xi] IBID., p. 133.

[xii] IBID., p. 23, 25, 36.

[xiii] IBID., pp. 187-188, 198.

[xiv] See Prade and Yager, 1994.

[xv] p. 205, Klir and Yuan.

 

Implementing Fuzzy Sets in SQL Server, Part 7: The Significance of Fuzzy Stats

By Steve Bolton

…………In the world of fuzzy sets and imprecision modeling, the concept of cardinality takes on new shades of meaning that are not applicable to ordinary “crisp” sets, i.e. those without membership grades. In the last article in this series of amateur self-tutorials, I mentioned one type of “fuzzy cardinality,” based on triangular, trapezoidal and other fuzzy numbers that are quite useful in modeling many vague statements found in everyday speech. Of course, another means of expressing cardinality is through ordinary numbers that are defined by a single value, rather than intervals and other such fuzzy set types. This raises some interesting questions, because one of the logical implications of graded set membership is that records with a score of 0 shouldn’t be included in the count. By extension, the values that are non-zero should only be counted in proportion to the score assigned by their membership function; since these are almost always on a scale of 0 to 1, the count of a fuzzy set never exceeds its crisp count, but may be much lower. Moreover, since membership scores are represented by fractional values, we’d normally use SQL Server data types like float, decimal and numeric to represent them, rather than members of the int family as we would with ordinary counts.
…………This apparently gives rise to many different possible calculations for fuzzy counts, but the most common one in the literature is the sigma count, in which we simply add up all of the membership scores for an entire set. Another stat seen occasionally in the literature is Support, which is defined as a crisp count of all the non-zero members of a fuzzy set; it thus always results in an integer somewhere between the sigma count and the ordinary crisp count. The Height refers to the crisp count, the Bandwidth to the number of records with scores greater than 0.5 and the Core to those with the maximum score of 1; these concepts might be useful in such applications as fuzzy clustering, but I see the sigma count used far more often in connection with today’s topic, fuzzy stats, which come into play whenever we want to calculate aggregates on fuzzy sets in platforms like SQL Server.[1]
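In symbols, the sigma count and support work out to something like the following, where μ_A(x) is the membership grade of record x in fuzzy set A (my shorthand, not a quotation from the literature):

SigmaCount(A) = \sum_{x} \mu_A(x)
Support(A) = \lvert \{ x : \mu_A(x) > 0 \} \rvert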

Partial Credit for Partial Set Membership

                The trick with this topic is “Think partial credit!” to borrow a phrase from University of Minnesota Prof. Glen Medeen.[2]  Even if we restrict ourselves to the sigma count definition of fuzzy counts, the concept carries many interesting implications for all of the statistics that are derived from it. Averages, standard deviations, variances and all of the more advanced statistics derived from them must be recalculated, given that they’re derived from fundamental measures like values and counts that no longer apply. The logic inherent in partial set membership demands this fundamental rethinking of basic statistics. The crisp versions of some of these stats are precalculated by SQL Server, so by switching to the fuzzy set versions we’ll incur some performance costs by computing them on the fly instead, with the aid of T-SQL aggregates and windowing functions. Thankfully, some of these fuzzy stats are worth the extra computations, because they can shed light on our data in unusual ways. Perhaps the most obvious example is the difference between the crisp count and sigma count, which might be used as an alternative measure of fuzziness in the place of fuzzy complements, which as we saw early on in this series, are normally ideal for that use case.
…………Figure 1 provides a simple example of how to code this possible alternate measure of imprecision, by subtracting the sigma count from the height, i.e. the crisp count. I also demonstrate how easy it is to derive the bandwidth, support and core stats, even though these are only used infrequently. As usual, most of the initial code involves assigning the membership scores, by drafting the procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores for double duty as my membership function. The calculations are performed on the Duchennes muscular dystrophy data I downloaded a few tutorial series ago from Vanderbilt University’s Department of Biostatistics, which now resides in a dummy DataMiningProjects database; afterwards, they’re stored in a @ZScoreTable table variable that can be operated on as needed. For the sake of consistency, I’ve stuck to the same format I’ve used throughout this series by using the three @Rescaling variables and ReversedZScore column to transform the Z-Scores into membership scores on the traditional 0 to 1 range.

New Means, Medians and Modes

                Once we’ve derived the sigma count from these grades, I then calculate the standard fuzzy mean, which may be the simplest, most intuitive form of a “fuzzy absolute center.”[3] Another alternate measure of centrality is of course the mode, which I’ve thrown in because it’s so easy to calculate; to derive the fuzzy version, we just have to multiply each value’s count by its membership grade. This is one of the few fuzzy stats where the returned value itself is not rescaled by the membership grades, which merely reweight the counts. In Figure 2 we can see that both versions of the mode return the same value of 198, which is in keeping with the general rule that both modes and their fuzzy counterparts will only return actual crisp values from their datasets. Since medians are dependent on orders, I’ll take up that topic when I address the fuzzification of ranks in a wrap-up of the whole series.
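…………To make the arithmetic concrete with purely hypothetical numbers: if a value of 198 occurred five times with a membership grade of 0.3 while 220 occurred three times with a grade of 0.8, the crisp mode would be 198 on the strength of its raw count of 5, but the fuzzy mode would be 220, since 3 * 0.8 = 2.4 outweighs 5 * 0.3 = 1.5. In the Duchennes data the two versions happen to agree.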
…………In place of fuzzy medians, I’ve incorporated a higher class of averages known as Generalized Means, which can be used to derive a whole family of means between the minimum and maximum values, including the fuzzy arithmetic mean mentioned above, along with the harmonic and geometric means.[4] We basically plug in an @AlphaParameter that can range across the real number line (the harmonic mean corresponds to a value of -1, the geometric mean to the limit as it approaches 0 and the arithmetic mean to 1), which allows us to cover the whole family, in much the same fashion that the various T-norm and T-conorm parameters empowered us to derive myriad types of fuzzy intersections and unions in previous articles. Note that in Figure 2, we see that the parameter value I arbitrarily chose led to a far different value for the GeneralizedMean than the one derived for the ordinary FuzzyMean.
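…………For reference, the textbook form of the generalized mean is the formula below; in the sample code of Figure 1 I substitute the sigma count for the crisp count n, which is one reasonable way of fuzzifying it, though not the only one:

h_\alpha(x_1, \dots, x_n) = \left( \frac{1}{n} \sum_{i=1}^{n} x_i^{\alpha} \right)^{1/\alpha}

Setting \alpha = 1 yields the arithmetic mean, \alpha = -1 the harmonic mean and the limit as \alpha approaches 0 the geometric mean, while the extremes of the real line recover the minimum and maximum.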

Figure 1: Sample Code for Fuzzy Counts and Means
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)

DECLARE  @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) - ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N'DataMiningProjects',
              @SchemaName = N'Health',
              @TableName = N'DuchennesTable',
              @ColumnName = N'LactateDehydrogenase',
              @PrimaryKeyName = N'ID',
              @DecimalPrecision = '38,32',
              @OrderByCode = 8

 

-- RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax - @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore - @RescalingMin) / @RescalingRange

DECLARE @Count bigint, @SigmaCount float, @Support float, @Bandwidth float, @Core float,
@Mean float, @FuzzyMean float, @GeneralizedMean float, @Mode float, @FuzzyMode float

-- COUNTS
SELECT @SigmaCount = SUM(MembershipScore), @Count = Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Support =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0 

SELECT @Bandwidth =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0.5

SELECT @Core =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore = 1

-- MODES
SELECT @Mode = Value
FROM (SELECT TOP 1 ValueCount, Value
       FROM (SELECT Count(*) AS ValueCount, Value
              FROM @ZScoreTable
              WHERE ZScore IS NOT NULL
              GROUP BY Value) AS T1
       ORDER BY ValueCount DESC) AS T2

 

SELECT @FuzzyMode = Value
FROM (SELECT TOP 1 ValueCount, Value
       FROM (SELECT Count(*) * MembershipScore AS ValueCount, Value
              FROM @ZScoreTable
              WHERE ZScore IS NOT NULL
              GROUP BY Value,MembershipScore) AS T1
       ORDER BY ValueCount DESC) AS T2 

-- AVERAGES
DECLARE @AlphaParameter float
SELECT @AlphaParameter = 0.3

SELECT @FuzzyMean = SUM(MembershipScore * Value) / @SigmaCount , @Mean  = Avg(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @GeneralizedMean = Power(SUM(Power(Value, @AlphaParameter)) / CAST(@SigmaCount AS float), 1 / @AlphaParameter)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Count AS RegularCount, @SigmaCount AS SigmaCount,
@Count - @SigmaCount AS AlternativeMeasureOfFuzziness,
@Support AS Support, @Bandwidth As Bandwidth, @Core as Core,
@Mode AS Mode, @FuzzyMode as FuzzyMode,
@Mean AS Mean, @FuzzyMean AS FuzzyMean, @GeneralizedMean AS GeneralizedMean

 

Figure 2: Sample Results from the Duchennes Table
[image: fuzzy-counts-and-averages]

…………Generalized means occupy a space in the set of norm operations in between T-norms and T-conorms, along with Ordered Weighted Averages.[5] Basically, each record in an OWA is multiplied by a weight, with the weights collectively summing to one, but the choice of weights is so broad that I won’t bother with them beyond the minimal sketch below; I’ll merely point out that this obviously overlaps the topic of neural net weights, at least to anyone who has coded them before. I’ll also omit my sample code for Lambda Averages (i.e. λ-Averages), because it’s simply too long in comparison to its usefulness. This class of norm operations is derived from binary set relations, which means we first have to create a second table variable, fill it and adjust the scores, as we did in previous articles with T-norms and T-conorms. We’d then apply a CASE statement to select the MIN value of the @LambdaParameter and the outcome of the fuzzy union when both records were between 0 and the @LambdaParameter; take the MAX of the @LambdaParameter and the outcome of the fuzzy intersection when the outcomes were greater than the @LambdaParameter; then use the @LambdaParameter value in the ELSE clause.[6]
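…………Since I’m skipping full sample code for these operators, the snippet below is only a minimal sketch of the OWA idea on hypothetical, hard-coded values; the scores are ranked in descending order and each rank is matched to a weight, with the three weights deliberately summing to one:

DECLARE @OWAScores table (Score float)
INSERT INTO @OWAScores (Score)
VALUES (0.9), (0.4), (0.7)

DECLARE @OWAWeights table (WeightRank bigint, Weight float)
INSERT INTO @OWAWeights (WeightRank, Weight)
VALUES (1, 0.5), (2, 0.3), (3, 0.2)

-- returns 0.9 * 0.5 + 0.7 * 0.3 + 0.4 * 0.2 = 0.74
SELECT SUM(T1.Score * T2.Weight) AS OrderedWeightedAverage
FROM (SELECT Score, ROW_NUMBER() OVER (ORDER BY Score DESC) AS ScoreRank
       FROM @OWAScores) AS T1
       INNER JOIN @OWAWeights AS T2
       ON T1.ScoreRank = T2.WeightRank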
…………As with fuzzy complements, unions and intersections, the applications are determined by the selection of appropriate parameter values.[7] One method of accomplishing this is of course parameter estimation.[8] A good starting point for fuzzy parameter estimation may be Seyed Mahmoud Taheri’s select bibliography of recent developments in fuzzy stats, which also lists many resources for extending fuzzy stats to standard statistical and data mining topics like Bayesian priors, fuzzy regression and hypothesis testing.[9] A couple of the sources also connect fuzzy sets to information theory, which I will also begin doing in my next tutorial.

Fuzzy Variance: A Fresh Take on a Staid Stat

                Taheri also mentions published research on fuzzy sets and maximum likelihood, which makes me wonder if there is also some connection to Fisher Information. The same is true of the different types of fuzzy variance, given that variance is interpreted in Fisher Information as a form of uncertainty. This may be a more worthwhile topic to cover than λ-Averages and OWAs since the formulas are less broad and have clearer applications. First of all, it makes intuitive sense to calculate variance differently on fuzzy sets, for precisely the same reasons as fuzzy means: the crisp version of the statistic is dependent on counts, which ought to be replaced with alternative measures like the sigma count when possible. If a record has zero membership, for example, its value shouldn’t count at all in the computation, because it’s no longer part of the set. It thus follows that a value with partial membership should only be taken into consideration in proportion to its score, just as with fuzzy means; it stands to reason that the same principles would apply if we went beyond the first and second statistical moments, mean and variance, to the third and fourth, skewness and kurtosis.
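…………For readers who prefer to see the arithmetic spelled out, the versions implemented in the sample code of Figures 1 and 3 amount to the formulas below, where x_i is the value of the i-th record and μ_i its membership grade; other definitions of fuzzy variance circulate in the literature, so treat these as the particular formulas coded here rather than the canonical ones:

\bar{x}_{fuzzy} = \frac{\sum_i \mu_i x_i}{\sum_i \mu_i} \qquad\qquad \sigma^2_{fuzzy} = \frac{\sum_i \left( \mu_i x_i - \bar{x}_{fuzzy} \right)^2}{\sum_i \mu_i}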
…………This leads to some interesting questions over how we can interpret the differences in the crisp and fuzzy variances. Given that the difference between crisp and sigma counts reflects a measure of fuzziness (albeit not as clearly as fuzzy complements do), perhaps we can interpret this as a measure of how dispersed the fuzziness is. This might come in quite handy in many data mining applications. I haven’t seen it used that way in the literature, but it’s good to keep in mind that my exposure to the whole topic of fuzzy sets is limited, given that I can’t afford the hefty fees for many of the academic journals, which render them inaccessible to me. Nor have I seen trapezoidal numbers combined with variance, but a construction like the CASE statement for TrapezoidalRangeOnTheCrispVariance in Figure 3 might be useful in expressing natural language slang about dispersion like “all over the place.” The TrapezoidalRangeOnTheFuzzyVariance expresses the same concept, except that it represents a fuzzy number on a fuzzy set, rather than a fuzzy number on a crisp set; it thus amounts to saying, “this graded set is all over the place.” I set the range boundaries arbitrarily so that both would have fractional scores in Figure 4, which serves as a better illustration of partial membership in a fuzzy number. Using the square root and power techniques mentioned in the last article, we could add superlatives to it like “really” or “somewhat.” If we were using Z-Scores in the context of a normal distribution, we might set graded boundaries based on the “68–95–99.7 Rule” I covered in Goodness-of-Fit Testing with SQL Server, part 1: The Simplest Methods, which involves the number of expected records between the first, second and third standard deviations. I left out the more complicated case of the superlative in the sample code below, just to illustrate how a SQL Server user might write T-SQL for simple cases of these two fuzzy variance types:

Figure 3: Some Possible Measures of Fuzzy Variance
DECLARE @StDev float, @FuzzyStDev float, @Var float, @FuzzyVar float

SELECT @FuzzyVar = Sum(Power((Value * MembershipScore) - @FuzzyMean, 2)) / @SigmaCount,
@Var = Var(Value), @StDev = StDev(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @FuzzyStDev = Power(@FuzzyVar, 0.5)

DECLARE @LowerBound float, @UpperBound float
SELECT @LowerBound  = 4000, @UpperBound = 5000

SELECT @StDev AS StDev, @FuzzyStDev AS FuzzyStDev,
@Var AS Var, @FuzzyVar AS FuzzyVar,
@Var - @FuzzyVar AS PossiblyTheVarianceOfTheFuzziness,
CASE WHEN @Var BETWEEN @LowerBound AND @UpperBound THEN 1
       WHEN @Var < @LowerBound THEN ((@Var - @LowerBound)) / @Var + 1
       WHEN @Var > @UpperBound THEN ((@UpperBound - @Var)) / @Var + 1
       ELSE NULL END AS TrapezoidalRangeOnTheCrispVariance,
CASE WHEN @FuzzyVar BETWEEN @LowerBound AND @UpperBound THEN 1
       WHEN @FuzzyVar < @LowerBound THEN ((@FuzzyVar - @LowerBound)) / @FuzzyVar + 1
       WHEN @FuzzyVar > @UpperBound THEN ((@UpperBound - @FuzzyVar)) / @FuzzyVar + 1
       ELSE NULL END AS TrapezoidalRangeOnTheFuzzyVariance

Figure 4: Fuzzy Variance Result for the LactateDehydrogenase Column
[image: fuzzy-variance]

…………Fuzzy variance may serve as a bridge to Fisher Information, a topic I want to cover in my long-delayed series, Information Measurement with SQL Server. Early on in this series we saw how fuzzy complements serve as one important measure of a different type of information, fuzziness, which quantifies the imprecision of a dataset in a different manner than variance. The difference between the sigma and crisp counts might serve the same purposes, although I’ve seen the various types of complements used more often for this purpose. One of the coolest things about fuzzy sets is that they give rise to several useful statistics that quantify different types of imprecision, which can be used to derive a program of “uncertainty management” for an organization. In the next installment we’ll see how we can use some of the fuzzy stats defined here to pin down a different brand of imprecision known as nonspecificity. This will involve discussion of the Hartley function and possibly Shannon’s Entropy, the latter of which is a fundamental concept in many data mining algorithms. Since entropy is among the foundations of information theory, this introduction to its applications in nonspecificity will serve as a bridge to my future Information Measurement series.

 

[1] pp. 25-28, Bonissone, Piero P., 1998, “Fuzzy Sets & Expert Systems in Computer Eng. (1).” Available online at http://homepages.rpi.edu/~bonisp/fuzzy-course/99/L1/mot-conc2.pdf. Bonissone’s material is reprinted at least in part from slides produced by artificial intelligence researchers Roger Jang and Enrique Ruspini.

[2] p. 5, Meeden, Glen, 2015, “Two Examples of the Use of Fuzzy Set Theory in Statistics,” published online at the University of Minnesota web address http://users.stat.umn.edu/~gmeeden/talks/fuzznov09.pdf

[3] p. 435, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. On this particular page, they’re extending the meaning of the term even further, to complex network topologies.

[4] IBID., p. 90.

[5] IBID., pp. 92-93.

[6] IBID., p. 94.

[7] IBID., p. 93.

[8] IBID., p. 94.

[9] p. 240, Taheri, Seyed Mahmoud, 2003, “Trends in Fuzzy Statistics,” pp. 239-257 in Austrian Journal of Statistics, Vol. 32, No. 3. Available online at Vienna University of Technology web address http://www.statistik.tuwien.ac.at/oezstat/ausg033/papers/taheri.pdf