Author Archives: Stevan Bolton

Implementing Fuzzy Sets in SQL Server, Part 10.1: A Crude Introduction to Dempster-Shafer Evidence Theory

By Steve Bolton

…………Early on in this series, we learned how the imprecision in natural language statements like “the weather is hot” can be modeled using fuzzy sets. Ordinarily, the membership grades assigned to fuzzy sets are not to be interpreted as probabilities, even though they’re both implemented on continuous scales between 0 and 1; the exception to this rule is when a probabilistic meaning is consciously assigned to the type of fuzziness. A couple of articles ago we saw how membership scores can be interpreted as assessing the logical possibility of the associated statements; the possibility distributions this nuance gives rise to quantifies whether or not an event can occur, whereas a probability distribution assesses whether it will actually occur. The two scales are independent except at the maximum and minimum values, where possibility values acts as caps on probabilities, since an event must be possible if it is to have a non-zero probability. The possibility and necessity measures that factor into possibility distributions are actually special cases of the plausibility and belief measures used in Dempster-Shafer Evidence Theory, which has a related shade of meaning: instead of gauging whether or not an event can or will happen, plausibility and belief work together to grade the credibility of the associated evidence. If we were sifting through user stories in a Behavior-Driven Development (BDD) process, we wouldn’t use evidence theory for fuzzy terms like “the weather is hot,” or questions like “the weather could be cold” or “the weather is probably mild,”[1] which might be candidates for possibilistic or stochastic modeling. “As far as I can tell, the weather will be hot,” might be fair game, since the subject of the sentence is the trustworthiness of the associated statement. The clearest example I’ve yet run across in the literature occurs in George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications, which I’ve used as my go-to resource throughout this series for the heavy math formulas:

                “Consider, however, the jury members for a criminal trial who are uncertain about the guilt or innocence of the defendant. The uncertainty in this situation seems to be of a different type; the set of people who are guilty of the crime and the set of innocent people are assumed to have very distinct boundaries. The concern, therefore, is not with the degree to which the defendant is guilty, but with the degree to which the evidence proves his membership in either the crisp set of guilty people or the crisp set of innocent people.”[2]

In the last article, I gave a monologue on how organizations can benefit from uncertainty management programs, which begins with partitioning uncertainty into various types, like probabilities, nonspecificity, fuzziness and conflicting information; these in turn stretch across five mathematical subtopics, information theory, stochastics, possibility theory, fuzzy sets and evidence theory. The last of these has its own corresponding formulas for measures like nonspecificity, but is particularly useful for quantifying the degree of conflict between pieces of information. For this reason, it is widely used to aggregate disparate sources of information, which in turn integrates seamlessly with Decision Theory; for example, one of its most common implementations is sensor fusion.[3] Klir and Yuan also provide a concise list of possible use cases in various fields:

                “For instance, suppose we are trying to diagnose an ill patient. In simplified terms, we may be trying to determine whether the patient belongs to the set of people with, say, pneumonia, bronchitis, emphysema, or a common cold. A physical examination may provide us with helpful yet inconclusive evidence. For example, we might assign a high value, say 0.75, to our best guess, bronchitis, and a lower value to the other possibilities, such as 0.45 for the set consisting of pneumonia and emphysema and 0 for a common cold. These values reflect the degree to which the patient’s symptoms provide evidence for the individual diseases or sets of diseases; their collection constitutes a fuzzy measure representing the uncertainty associated with several well-defined alternatives; It is important to realize that this type of uncertainty, which results from information deficiency, is fundamentally different from fuzziness, which arises from the lack of sharp boundaries.”[4]

…………Thankfully, a sturdy mathematical scaffolding to model these types of evidence-based uncertainty already exists, although it isn’t being tested much these days in the relational database, data warehousing and data mining fields. The modeling process is akin to the one I introduced a few weeks ago for possibility distributions, but a tad more complicated. A continuous data type like float, numeric or decimal is required for probability values, but possibility theory also calls for the addition of a bit column, which is often assigned to the Necessity measure. In the theory developed independently by statisticians Glenn Shafer and Arthur Dempster, we need three measures: a Probability Mass Assignment (often denoted by a lower case m) that tells us the strength of the evidence that a record belongs just to one set; a Belief measure that measures the same, plus the evidence for belonging to its subsets; and a Plausibility measure, which covers both of those, as well as “the additional evidence or belief associated with sets that overlap with A.”[5] The easy part is that all three are measured on scale of 0 to 1, the same as fuzzy sets, probabilities, possibilities and the like; the complexity arises from the fact that they measure evidence at different levels. This leads to nested bodies of evidence, which alpha cuts (α-cuts) are ideal for modeling, as explained a couple of articles ago; I saved this topic for the next-to-last article precisely because it unites many of the concepts introduced throughout the series, like α-cuts, fuzzy unions, intersections and complements.
…………These relationships also give rise to various mathematical properties, some of which are similar to those used in possibility distributions. For example, just as Necessity is equal to 1 minus the complement of Possibility, so too is Plausibility equal to 1 minus the complement of the Belief measure. Plausibility must be greater than or equal to the Belief, since it models evidence at a higher scope. These “fuzzy measures” have weakened forms of properties like monotonicity, continuity and additivity than probabilities do.[6] Belief measures are superadditive, which means that if you sum them together across the subsets, the result must be greater than or equal to the Belief function for the whole set. For example, the Belief function for the whole set can be a figure less than 1, say 0.97, but the individual measures of each subset can be assigned degrees of belief like 0.5, 0.87, 0.3, etc. which together sum to 1.67, which is valid because it’s greater than 0.97. In contrast, probabilities must always sum to 1 across a dataset, including the probability mass assignments used in evidence theory. Plausibility is subadditive, which signifies the opposite relationship, so that the measures taken across the subsets must sum to the at least the Plausibility for the whole set. In short, they act as maximums rather than sums.  This all sounds weird, but it’s a necessary logical consequence of the nesting of evidence. As explained in the discussion on α-cuts a couple of articles ago, this signifies that records can belong to multiple hierarchical partitions of a set, which is an unfamiliar situation in the relational world (despite the fact that it is easily modeled using set-theoretic relational technology). The good news is that this web of interrelationships makes the three evidence theory measures reconstructible from each other; this makes it possible to validate the values using queries like the samples in Figure 2.

Two Common Illustrations of Dempster-Shafer Evidence Theory in Action

                The Wikipedia article on Dempster-Shafer Theory  has comprehensible examples of how these three measures work together, beginning with a sensor that detects whether a cat concealed in a box is in a Dead or Alive state. The value for Either obviously reaches the maximum value of 1 for both Belief and Plausibility, since it must be one of the two by logical necessity (that is, unless our cat happens to belong to Erwin Schröedinger or was buried in Pet Sematary).  It is thus an instance of a “universal hypothesis,” which encompasses the whole dataset. Yet the probability mass assignment for the Either state is only 0.3, which signifies the fact that we don’t have solid information on its status; the probability figure for the whole dataset still sums to 1 though, once the stats for Alive and Dead are factored in. The probability value for the universal hypothesis thus constitutes a measure of the uncertainty remaining in the data, once the probability, Belief and Plausibility measures have partitioned it off. Since Dead and Alive are discrete states without fuzzy intervals, the Wikipedia example assigns them Belief  figures equal to their probability masses – which when added to the value of 1 for the Either state, means that the total Belief for the whole dataset is greater than 1, unlike the probability mass. The Plausibility can then be reconstructed using the inverse of the complement of the Belief.
…………The tricky part is that the Belief measures must sum to 1 for each subset, which calls for looking at our data in an unfamiliar way. I initially thought that the existence of these subsets meant that we could simply model this by applying the appropriate normal form, but that’s not the case. The second example in the Wikipedia article has examples of states like Red, Yellow, Green which are mutually exclusive, as well as some that carry a bit of measurement uncertainty, like “Red or Yellow” and “Red or Green.” In this situation, the Belief figures for Red, Yellow and “Red or Yellow” must sum to 1, as must the Belief figures for Red, Green and “Red or Green,” since there are two overlapping subsets. Red, Yellow and Green are all members of more than one subset, but not the same ones. This leads to an odd predicament where each state is discrete and thus difficult to denormalize, yet the associated column still represents subsets; this is one situation where the presence of logical OR statements is not a hint that the design requires normalization. Since we can’t be certain how many other state descriptions a child could be related to, a single self-referencing ParentID column won’t do the job either. The next best thing is an interleaved solution, in which a separate table with two foreign keys pointing to the primary key of the table holding the Belief measures to keep track of which subsets each record belongs to. To aggregate the Belief figures for each subset in the parent table, we just inspect the interleaved table for all of the categories a record can belong to.

Server States: A SQL Server-Specific Example

                Let me give an example that might be more intuitive and relevant to SQL Server users: the state_desc column of sys.databases will assign one of seven mutually exclusive states to each database: Online, Offline, Restoring, Recovering; Recovery Pending, Suspect and Emergency. As far as I know, these states do not rule out which user modes a database can be, which range from SINGLE_USER to RESTRICTED_USER to MULTI_USER. Nevertheless, many combinations would be improbable, so each unique pair of descriptions requires a probability assignment that will probably differ from other pairs of state_desc and user mode values. Now let’s pretend we have a sensor that guesses which of pair of states a server is in at any given moment, perhaps based on I/O data or network bandwidth usage. If it can tell us the user mode plus whether we’re in one of the three recovery states, but can’t differentiate between them accurately, then we’re dealing with a fuzzy interval-valued set. From the point of view of the sensor, “Restoring | Recovering | Recovery Pending” is a discrete state and ought to be recorded as such in the database table. Nevertheless, to derive the Belief we must sum together all of the probabilities for the subsets it gives rise to, while the Plausibility equals one minus the sum of the probability assignments in the subsets it does not participate in. We could create a separate category like “Unknown” for situations where the sensor went offline or was otherwise unable to return accurate data – or better yet, establish a universal hypothesis like “Any State” with the Belief and Plausibility both set to 1 and we add all of its possible subsets. Subtracting the sum of the probabilities of all known states from that of the universal hypothesis would allow us to measure one type of uncertainty associated with the table. In order to measure the uncertainty inherent in the interval-valued fuzzy subsets that the Belief and Plausibility measures are attached to, we’d have to use a measure of fuzziness tailored to evidence theory. In the same vein, the count of possible state descriptions could be used to derive a measure of nonspecificity, albeit through a different formula than the ones introduced in the last article. In addition, we can define measures of uncertainty based on how much
…………It is easier to illustrate all of this with T-SQL code samples, beginning with the easiest part, a simple snapshot of a table with probability mass, Belief and Plausibility measures defined on it. Degrees of Belief are usually derived from some kind of input method, akin to fuzzy set membership functions – except that subjective ratings tend to be more common in evidence theory. It is no surprise that Bayesian methods are often applied in deriving Belief functions, given that they actually represent a more specific subset of evidence theory measures. Instead of complicating the topic any further, I’ve derived the values in Figure 1 by creating an artificial category in the Duchennes muscular dystrophy data I’ve been using for practice data for the last few tutorial series[7], then simply assigned probability mass assignments based on the frequency of the values for the LactateDehydrogenase column. From there, I derived the Belief measures, then constructed the Plausibility measures from those. I used the float data type for all three of the columns that associate measured with the LactateDehydrogenaseState column, an ordinal category; this represents yet another use of fuzzy sets to model ordinals on continuous scales, except at a more advanced level where three columns are required.

Figure 1: Simple Evidence Theory Measures Defined on the LactateDehydrogenase Column

Figure 2: Sample Validation Code for the Relationships Between the Three Evidence Theory Measures
— verifying the Belief via the ProbabilityMassAssignment mass assignment
SELECT ID, LactateDehydrogenaseState, ProbabilityMassAssignment, BeliefScore, PlausibilityScore,
CASE WHEN IntervalProbabilityMassAssignmentSum IS NOT NULL THEN IntervalProbabilityMassAssignmentSum ELSE ProbabilityMassAssignment END
AS BeliefReconstructedFromProbabilityMass
FROM Health.DuchennesEvidenceTheoryTable AS T3
        LEFT JOIN (SELECT ParentID, SUM(ProbabilityMassAssignment) AS IntervalProbabilityMassAssignmentSum
              FROM Health.DuchennesEvidenceTheoryTable AS T1
               INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
               ON T1.ID = T2.BeliefSubsetID
              GROUP BY ParentID) AS T4
       ON T3.ID = T4.ParentID 

SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment, ProbabilityMassAssignmentBySum,
CASE WHEN ProbabilityMassAssignmentBySum IS NULL THEN 1 ELSE ABS(1 (ProbabilityMassAssignment+ ProbabilityMassAssignmentBySum)) END AS PlausibilityScoreReconstructedFromProbability
FROM (SELECT ID, LactateDehydrogenaseState, BeliefScore, ProbabilityMassAssignment
FROM Health.DuchennesEvidenceTheoryTable) AS T5
       LEFT JOIN (SELECT BeliefSubsetID, SUM(ProbabilityMassAssignment) AS ProbabilityMassAssignmentBySum
       FROM (SELECT DISTINCT T1.BeliefSubsetID, T2.ParentID
              FROM Health.DuchennesEvidenceTheoryIntervalTable AS T1
                     INNER JOIN Health.DuchennesEvidenceTheoryIntervalTable AS T2
                     ON T1.ParentID = T2.BeliefSubsetID AND T1.BeliefSubsetID != T2.BeliefSubsetID) AS T4
                           INNER JOIN Health.DuchennesEvidenceTheoryTable AS T3
                           ON T4.ParentID = T3.ID
       GROUP BY BeliefSubsetID) AS T6
       ON T5.ID = T6.BeliefSubsetID

…………Note how the Belief is equal to the ProbabilityMassAssignment for Low, Medium and High, which is reflective of the fact that they have no substates; Medium or Low and High or Medium have BeliefScore values higher than their masses, precisely because we have to tack the values for Low, Medium and High onto them. The PlausibilityScore is in each case determined by adding together all of the ProbabilityMassAssignment values for the columns that aren’t among a record’s subsets, then taking an inverse, which is equivalent to subtracting the complement of the BeliefScore from 1. The second image depicts the Health.DuchennesEvidenceTheoryIntervalTable, in which the ParentID and BeliefSubsetID determine the linkages between subsets. For example, the records with ParentIDs of 4 tie together the Medium | Low, Medium and High | Medium values, so that we can aggregate the ProbabilityAssignments to derive the BeliefScore. The PlausibilityScore can be determined using the same table. Code similar to what I provided in Figure 2 can be used to validate the relationships between these fuzzy measures, with your own particular column and table names plugged in of course. The IS NULL condition is due to a bizarre problem in which setting the first condition in the CASE to BeliefScore = 1 THEN 1, or using NullIf, both led to NULL values. It is also possible to derive the ProbabilityMassAssignment values in reverse, but I’ll omit validation code for that scenario in the interest of brevity. To avoid pummeling readers with too much information all at once, I’ll also put off discussion of how to derive uncertainty measures like Strife and Discord from this crude example. In the next article, I’ll also mention some principles for interpreting the results that can in turn provide an important bridge to Information Theory. Among other things, the first table tells us that, “the belief that the Lactate Dehydrogenase values are Medium or Low is higher than that for Low alone, by a margin of 0.679425837320574 to 0.349282296650718. It is more plausible that the value is High than Low, by a margin of 0.822966507177033.” Once we define measures of fuzziness, nonspecificity and the like on top of them and apply some principles of inference drawn from Information Theory, we can partition the uncertainty further in order to glean additional valuable insights.

 

[1] Here in Western New York the natural language term “mild” has interesting shades of meaning (at least among local weathermen) which would be a challenge to model in terms of a fuzzy set. As winter approaches, “mild” means warmer than normal, but as the peak of summer comes, it means cooler than expected, so the meaning is inverted depending on the season. If we were to use an interval-valued set, we’d need a range ofvalues somewhere between 30 and 70 degrees – which is so imprecise that it borders on meaningless.

[2] p. 177, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] See the Wikipedia article “Dempster Shafer Theory” at http://en.wikipedia.org/wiki/Dempster%E2%80%93Shafer_theory

[4]  p. 179, Klir and Yuan.

[5] IBID., p. 181-182.

[6] IBID., p. 179-181.

[7] Which I downloaded from the Vanderbilt University’s Department of Biostatistics and converted into a SQL Server table in my sham DataMiningProjects database.

 

Implementing Fuzzy Sets in SQL Server, Part 9: Measuring Nonspecificity with the Hartley Function

By Steve Bolton

…………Imagine how empowering it would be to quantify what you don’t know. Even an inaccurate measure might be helpful in making better decisions in any area of life, but particularly in the business world, where change is the only certainty. This is where a program of “uncertainty management” can come in handy and fuzzy set techniques find one of their most useful applications. Fuzzy sets don’t introduce new information, but they do conserve and put to good use some information left over after ordinary “crisp” sets are defined – particularly when it would be helpful to model ordinal categories on continuous number scales. As I pointed out at the beginning of this series, uncertainty reduction is akin to Stephen King’s adage that monsters are less fearsome once some scale of measurement can be applied to them; knowing that a bug is 10 feet tall is at least reassuring, in the sense that we now know that it is not 100 or 1,000 feet tall.[1] Uncertainty reduction can also be put to obvious uses in data mining activities like prediction and clustering. Another potential use is in simplification of data, so that information loss is minimized.[2] In today’s article I’ll shine a little light on the Hartley function, a tried and true method of quantifying one particular category of uncertainty that has been used since 1928 to simplify and demystify datasets of all kinds and could easily be extended to SQL Server data.
George J. Klir and Bo Yuan, the authors of my favorite resource for fuzzy set equations, note that data models must take uncertainty into account, along with complexity and credibility. Later in the book, they go onto subdivide uncertainty into three types that sprawl across possibility theory, stochastics, information theory, fuzzy sets and Dempster-Shafer Evidence Theory:

                “The relationship is not as yet fully understood…Although usually (but not always) undesirable when considered alone, uncertainty becomes very valuable when considered in connection to the other characteristics of systems models; in general, allowing more uncertainty tends to reduce complexity and increase credibility of the resulting model. Our challenge in systems modelling is to develop methods by which an optimal level of allowable uncertainty can be estimated for each modelling problem…”[3]

“…Three types of uncertainty are now recognized in the five theories, in which measurement of uncertainty is currently well established. These three uncertainty types are: nonspecificity (or imprecision), which is connected with sizes (cardinalities) of relevant sets of alternatives; fuzziness (or vagueness), which results from imprecise boundaries of fuzzy sets; and strife (or discord), which expresses conflicts among the various sets of alternatives.

“It is conceivable that other types of uncertainty will be discovered when the investigation of uncertainty extends to additional theories of uncertainty.”[4]

…………Some authors also include “ambiguity (lack of information),”[5] which Klir and Yuan define as a parent class of both discord and nonspecificity in an excellent diagram I wish I could reprint.[6] Probabilities probably also ought to be included as well.[7]As soon as I introduced to the concept of uncertainty partitioning, I was intrigued by the possibility of defining human free will as an alternative form of uncertainty, but that raises many thorny philosophical questions. Among them is the contention that it doesn’t even exist, which is a disturbing tenet of many popular philosophies, like materialistic determinism and certain forms of theological predestination. I’d dispute that with evidence that would be hard to debunk and raise the possibility that it may not be possible to quantify it at all, by definition; the ability to assign values to it would certainly be helpful in academic fields like economics and psychology, where human behavior is the crux of the matter. This topic integrates quite nicely with the contention of authors like Lofti A. Zadeh, the father of fuzzy set theory, that it might be helpful to apply fuzzy techniques in these fields to model “humanistic systems.”[8] Other controversial candidates for new categories of uncertainty include the notion that reality is somewhat subjective (which I would argue is fraught with risk, since it is a key component of many forms of madness) and the contention that some events (particularly at the quantum level) can be truly random, in the sense of being indeterminate or “uncaused.” Albert Einstein drove home the point that uncertainty is deeply rooted in all we see in his famous quote from a lecture at the Prussian Academy of Sciences in 1921, in which he seemed to extend it right into the heart of mathematics itself: “…as far as the propositions of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”[9]

Partitioning Uncertainty

                The first step is to develop a habit of explicitly recognizing which type of uncertainty is under discussion, then partitioning it off using the appropriate type of fuzzy set. For example, whenever we need to cram continuous scales into finite data types like float, decimal and numeric, we end up creating measurement uncertainty about whatever values come after the precision we’ve chosen.[10] Like other types of measurement uncertainty, this is best addressed by fuzzy sets without any special probabilistic, possibilistic or evidence theory connotations attached to them. Incidentally, some theoreticians say that if we’re trying to quantify the uncertainty of a measurement, membership functions based on the normal distribution (i.e. the bell curve) are usually the best choice (based on empirical evidence from the aerospace industry).[11] If we were uncertain about the likelihood of an event occurring, we’d assign a probability value instead; if we were unsure of the logical necessity of an event, we’d use a possibility distribution, as explained in the last installment of this series. In the next installment, I’ll explain how Dempster-Shafer Theory can be used to judge the certainty and credibility of evidence, by assigning grades of membership in the set of true statements.
…………Once the appropriate method of uncertainty modeling has been selected, we can then apply its associated formulas to compute figures for nonspecificity, imprecision, discord and the like. The good news is that we already dispensed with the main means of computing fuzziness, back in Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements. In the remainder of this article, I’ll provide sample T-SQL for implementing two of the three main methods for calculating the “U-Uncertainty,” a.k.a. the nonspecificity. Like many other authors I consulted for this series, Klir and Yuan stress that nonspecificity and fuzziness are completely independent stats, since they measure two distinct and unrelated types of uncertainty.[12] The former is dictated by the number of possible distinct states that a set can take on, whereas the latter quantifies imprecision in class boundaries.[13] A set can have many possible arrangements, yet still be entirely crisp; there’s no mistaking what a Lego or Lincoln Log is, but there’s apparently no end to the crazy things that can be built with either one. Sets with few arrangements but really fuzzy boundaries are also possible. That is why fuzzy sets sans any additional meaning like probability, possibility and credibility scores have both fuzziness and nonspecificity measures attached to them.
…………Possibility theory, the topic of the last blog post in this amateur series of self-tutorials, has a form of nonspecificity that is easier to specify (pun intended) than the ordinary fuzzy set version, so I’ll introduce that first. The SELECT in Figure 1 is performed on a column of muscular dystrophy data I downloaded from the Vanderbilt University’s Department of Biostatistics and added to a sham DataMiningProjects database a few tutorial series ago. The PossibilityScore was assigned by a random number generator in the last article and tacked onto the table definition, for the sake of convenience. It’s time for my usual disclaimer: I’m writing this in order to learn this topic, not because I know it well, so it is a good idea to check over my T-SQL samples before putting them to serious use. This is especially true of this SELECT, where I may be applying a Lead where there should be a Lag; in contrast to the topics I post on in previous series, examples with sample data are few and far between in the fuzzy set literature, which makes validation difficult. Furthermore, there is apparently a more compact version available for specific situations, but I’ll omit it for now because I’m still unclear on what mathematical prerequisites are needed.[14]

Figure 1: Possibilistic Nonspecificity for the LactateDehydrogenase Column
SELECT SUM(PossiblityDifference * Log(RN, 2)) AS PossibilisticUUncertainty
FROM (SELECT ROW_NUMBER() OVER (ORDER BY ID) AS RN, PossibilityScore Lead(PossibilityScore, 1, 0) OVER (ORDER BY ID) AS  PossiblityDifference
       FROM Health.DuchennesTable) AS T1 

…………The SELECT returns a single value of 4.28638426128113, which measures that amount of uncertainty in bits; the greater the number of possible state descriptions, the higher the U-Uncertainty will be. The same relationship applies to the procedure below, which returns a value of 7.30278910848746 bits; the difference is that one measures uncertainty about the number of possible values the LactateDehydrogenase column can have, while the other measures lack of certainty about the number of membership function scores a row can be assigned. Figure 2 is practically identical to the sample code I’ve posted throughout this series, at least as far as the UPDATE; all I’m doing is running the stored procedure from Outlier Detection with SQL Server, part 2.1: Z-Scores on the DuchennesTable and storing the results in a table variable, then transforming them to a scale of 0 to 1 using the @Rescaling variables and ReversedZScores column. The GroupRank column can be safely ignored, as usual. The first SELECT with the AlphaCutLeftBound and AlphaCutRightBound columns is only provided to illustrate the how the nonspecificity figure is arrived at in the last SELECT. What we’re basically doing is partitioning the dataset into nested levels, using the alpha cut (α-cut) technique I introduced in the last article, then applying a Base-2 LOG and summing the results across the hierarchy.[15] The tricky part is that with α-cuts, records can belong to more than one subset, as I pontificated on in my last post; the levels are widest at the bottom of the dataset, but narrowest at the top, where the MembershipScore values approach the maximum of 1.This calls for thinking about the data in an odd way, given that in most relational operations records are assigned to only a single subset.

Figure 2: Code for Hartley Nonspecificity
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin 

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

 

SELECT AlphaCutBound AS AlphaCutLeftBound, Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutRightBound,
AlphaCutBound Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange, Log(AlphaCutCount, 2) AS IndividualLogValue
FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
       FROM @ZScoreTable AS T1
       INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
 FROM @ZScoreTable) AS T2
       ON MembershipScore >= AlphaCutBound
       GROUP BY AlphaCutBound) AS T3  

SELECT SUM(AlphaCutBoundaryChange * Log(AlphaCutCount, 2)) AS FuzzySetNonspecificityInBits
FROM (SELECT AlphaCutCount, AlphaCutBound Lag(AlphaCutBound, 1, 0) OVER (ORDER BY AlphaCutBound) AS AlphaCutBoundaryChange
       FROM (SELECT Count(*) AS AlphaCutCount, AlphaCutBound
              FROM @ZScoreTable AS T1
              INNER JOIN (SELECT DISTINCT MembershipScore AS AlphaCutBound
                     FROM @ZScoreTable) AS T2
             ON MembershipScore >= AlphaCutBound
              GROUP BY AlphaCutBound) AS T3) AS T4

 

Figure 3: Results for the Hartley Nonspecificity Example

…………The point of using the α-cuts is to chop the dataset up into combinations of possible state descriptions, which is problematic with fuzzy sets because the boundaries between states are less clear. The interpretation depends entirely on the meaning of the fuzzy attribute; as Klir and Yuan note, it can reflect an “an unsettled historical question” in the case of retrodiction, possible future states in the case of prediction, prescriptive uncertainty in the case of policies, diagnostic uncertainty in the case of medical information and so forth.[16] In the same vein, we can interpret my sample above as measuring 7.30278910848746 bits of uncertainty about a record’s place within the range of Z-Scores, which can in turn be used as a form of outlier detection. The smaller the range of possible values, the smaller the number of possible state descriptions becomes, which means that the cardinality of the α-cuts and the value of the final statistic decline as well.
…………This is an adaptation of a function developed way back in 1928 by electronic pioneer Ralph Hartley[17]; since it serves as one of the foundations of information theory I’ll put off discussion of the crisp version until my long-delayed monster of a series, Information Measurement with SQL Server. We’ve got at least two more articles in the fuzzy set series to dispense with first, including an examination of Dempster-Shafer Theory in the next installment. Evidence theory also has its own brand of nonspecificity measure, also based on the Hartley function.[18] Measures like strife and discord are more relevant to that topic, since they deal with conflicts in evidence. Possibility theory has counterparts for both, but I’ll leave them out, given that Klir and Yuan counsel that “We may say that possibility theory is almost conflict-free. For large bodies of evidence, at least, these measures can be considered negligible when compared with the other type of uncertainty, nonspecificity. Neglecting strife (or discord), when justifiable, may substantially reduce computation complexity in dealing with large possibilistic bodies of evidence.”[19] Possibility theory is a useful springboard into the topic though, given that Belief and Plausibility measures are modeled in much the same way. In fact, Possibility and Necessity measures are just special cases of Belief and Plausibility, which should serve to decomplicate my introduction to Dempster-Shafer Theory a little.

 

[1] p. 114, King, Stephen, 1981, Stephen King’s Danse Macabre. Everest House: New York. I’m paraphrasing King, who in turn paraphrased an idea expressed to him by author William F. Nolan at the 1979 World Fantasy Convention.

[2] p. 269, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] IBID., p. 3.

[4] IBID., p. 246.

[5] p. 2, Hinde, Chris .J. and Yang, Yingjie., 2009, “A New Extension of Fuzzy Sets Using Rough Sets: R-Fuzzy Sets,” pp. 354-365 in Information Sciences, Vol. 180, No. 3. Available online at the Loughborough University Institutional Repository web address https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/13244/3/rough_m13.pdf

[6]  p. 268, Klir and Yuan.

[7] IBID., p. 3.

[8] IBID., p. 451.

[9] Cited from the Common Mistakes in Using Statistics web address https://www.ma.utexas.edu/users/mks/statmistakes/uncertaintyquotes.html

[10] IBID., pp. 327-328.

[11] Kreinovich, Vladik; Quintana, Chris and Reznik, L.,1992, “Gaussian Membership Functions are Most Adequate in Representing Uncertainty in Measurements,” pp. 618-624 in Proceedings of the North American Fuzzy Information Processing Society Conference, Vol. 2. NASA Johnson Space Center: Houston. Available online at the University of Texas at El Paso web address www.cs.utep.edu/vladik/2014/tr14-30.pdf

[12] p. 258, Klir and Yuan.

[13] p. 2, Hinde and Yang.

[14] pp. 253, 269, Klir and Yuan.

[15] IBID., pp. 248-251.

[16] IBID., p. 247.

[17] See the Wikipedia articles “Hartley Function” and “Ralph Hartley” at http://en.wikipedia.org/wiki/Hartley_function and http://en.wikipedia.org/wiki/Ralph_Hartley respectively.

[18] pp. 259, Klir and Yuan.

[19] IBID., p. 264.

 

Implementing Fuzzy Sets in SQL Server, Part 8: Possibility Theory and Alpha Cuts

By Steve Bolton

…………To get the point across that fuzzy sets require membership grades of some sort, throughout this series I’ve borrowed the stored procedure I coded for Outlier Detection with SQL Server, part 2.1: Z-Scores and rescaled the results on the customary range of 0 to 1. The literature on fuzzy sets contains frequent warnings against automatically interpreting membership scores as probabilities, but I deliberately introduced a tie-in to stochastics by using Z-Scores, which are inherently probabilistic. Other shades of meaning may be assigned which are unfamiliar to modelers of ordinary “crisp” sets, which is why I pointed out early on in this series of amateur self-tutorials that interpretability is a more prominent issue with fuzzy sets. For example, membership functions can be viewed as assigning scores to the accuracy of the associated values, which is similar to the way in which we used fuzzy numbers two articles ago to code such linguistic concepts like “about” and “near.” If we add the subtle distinction that the membership scores may mean “cannot be near” or “can be around” a certain value, we’re stepping into the realm of Possibility Theory, which has important uses in fuzzy logic.[i]
…………Approximate reasoning and related concepts are more relevant to topics like expert systems that are beyond the purview of this series, but Possibility Theory can serve as a useful springboard into Evidence Theory, which is useful in developing programs of uncertainty management. Possibility distributions are in one sense a more restricted brand of probability distributions, while also acting as more restrictive versions of Evidence Theory measures; it may therefore be easier to use them as bridge from one relatively familiar topic to a lesser-known one. I originally thought the topic would be quite difficult to grasp, but it’s actually a good deal easier that stochastics. Perhaps the most difficult aspect is that possibility distributions can be modeled using alpha cuts (α-cuts), a method of partitioning fuzzy sets that will prove useful in the next two articles to come.

From ‘Can’ and ‘Must’ to Surprise

                In fact, I’ll lighten the load further by dispensing with many of the details of Possibility Theory, since its simplicity can quickly give way to complexity, same as with any other fuzzy set topic. For example, stochastic concepts like conditional and marginal probabilities have their counterparts in Possibility Theory, all of which is too far afield for our purposes. For those have a need for the corresponding formulas and don’t mind wading through the thick math, I recommend consulting the seventh chapter of my favorite resource, George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications. I’m not even going to get into a discussion of how possibility scores are assigned; for the sake of argument, let’s assume any figures used in my examples are derived from subjective ratings by end users. The important thing to keep in mind is that we need two numbers to specify a possibility distribution, not just the single probability figure used in stochastics. One of these is known as the Possibility measure and the other as a measure of Necessity, which is the inverse of Possibility’s complement.
…………The two measures can be combined by adding them together and subtracting one, but the fact that this results in a non-standard range of -1 to 1 limits its usefulness.[ii] The simplest way to model this relationship is to use a bit column, in conjunction with the float, numeric or decimal columns normally used to represent fuzzy sets on a continuous scale between 0 and 1.[iii] The tricky thing is that an event must occur when Necessity equals 1, whereas a Possibility score of 0 means that it cannot; on the other hand, a Possibility score of 1 does not imply certainty, only a state of total surprise if it did; apparently this in analogous to a measure of “surprise” developed in the mid-20th Century by economist G. L. S. Shackle,[iv] which has since been further developed by such household names in the fuzzy set field as like Henri Prade and Ronald R. Yager.[v] As Lofti A. Zadeh, the father of fuzzy set theory, explains it:

                “Consider a numerical age, say u = 28, whose grade of membership in the fuzzy set ‘young’ is approximately 0.7. First we interpret 0.7 as the degree of compatibility of 28 with the concept labelled young. Then we postulate that the proposition ‘Peter is young’ converts the meaning of 0.7 from the degree of compatibility of 28 with young to the degree of possibility that Peter is 28 given the proposition ‘Peter is young.’ In short, the compatibility of a value of u given ‘Peter is young.’”[vi]

…………This lack of symmetry is comparable to the way possibilities and probabilities differ. A Necessity measure of 1 leads inevitably to a probability score of 1, since what must happen is entirely probable; conversely, a Possibility measure of 0 leads to a probability score of 0, since what cannot happen is entirely improbable. Apart from these extremes, however, the two theories diverge. A Necessity or Possibility score of 0.5 has no effect on the probability, since whether or not a thing is logically conceivable is not equivalent to whether it is likely to happen; it is entirely possible that we may win the lottery tomorrow, but I wouldn’t bet on it. This is the core difference between the two theories: one expresses confidence in our information about whether a thing can happen, while the other reflects confidence in information about whether it will.
…………Because of this relationship, a possibility distribution acts as a cap on the associated probability distribution; this has many mathematical consequences[vii], the most important of which is that the two distribution types intersect at their minimum and maximum values. This in turn leads to the interesting property that possibility scores do not have to sum to 1 across a set of records, unlike probabilities; the only restriction is that the maximum value per record is 1.[viii] This in turn means that to assess whether or not we’ve reached a certain threshold of possibility values, all of the records with scores greater than the threshold must be taken into account. In other words, if we want to know if an event has a possibility of 0.3, we must examine all of the records with scores higher than that to come to a verdict. Every record in a set will qualify for the lowest partition, where a possibility score of 0 is all it takes to qualify, but the number of records continually shrinks as we move up the dataset towards the perfect score of 1.

Nested Sets and α-cuts

                This creates a nested set of evidence in which records can belong to multiple partitions, which can be easily implemented in T-SQL despite the fact that it calls for thinking about sets in unusual ways. We’re doing something uncommon here by cutting a set up hierarchically, so that a row belongs to more and more sets as we approach the maximum value of the membership function, rather than a single subset as we see in most relational joins. Klir and Yuan include a couple of handy illustrations which could get across the meaning of nested sets of evidence in a heartbeat, but I haven’t had a chance to seek permission to reprint them and don’t have the ability to draw my own.[ix] In turns out that the fuzzy set partitioning method known as α-cuts are an ideal tool for implementing these relationships[x] (not to mention many others that are beyond the scope of this series, like fuzzy equivalence relations[xi]). In plain English, this means that we have to use >= comparison operators to chop up a dataset into nested subsets, or > operators in the case of strong α-cuts.
…………I’m trying to keep the jargon to a minimum, but since the terms “cutworthy” and “strong cutworthy” occur frequently in the literature, it may be helpful to know that they refer to mathematical properties of fuzzy sets which are preserved in their α-cuts. [xii] Another important property is reconstructibility, which means that a fuzzy set can be rebuilt from its partitions. The manner in which possibility distributions establish maximum values for their associated probability distributions is essentially one and the same as the min/max types of unions and intersections we dealt with in previous articles, while the possibilities themselves are defined by their α-cuts.[xiii]
…………The first SELECT statement in Figure 1 illustrates how a simple GROUP BY and SUM with a ROWS UNBOUNDED PRECEDING clause can be used to partition a SQL Server table in this unconventional manner. I also have an alternate version of these SELECTs in which partitioning is done by deciles (or any other arbitrary percentile value) rather than DISTINCT MembershipScores, which I omitted to keep things simple; if anyone needs it though, I’d be happy to post it. As usual, the sample data comes from a dataset on the Duchennes form of muscular dystrophy I downloaded from Vanderbilt University’s Department of Biostatistics a few tutorial series ago, which now resides in a sham DataMiningProjects database. The code from the beginning to the UPDATE statement is basically identical to the T-SQL samples I’ve posted throughout this series, which always begins with plugging the results of the aforementioned Z-Score procedure into a table variable. The GroupRank column is only included because it was part of the original procedure and can’t be omitted from the INSERT EXEC, but it can be safely ignored. The @Rescaling variables and the ReversedZScore column are then used to adjust the Z-Scores to the 0 to 1 range used in almost all fuzzy sets. There are only 202 records in the DuchennesTable where LactateDehydrogenase where is NOT NULL, which is exactly equal to the count of values in Figure 2 where the MembershipScore is zero. The counts for each α-cut continually decline after that, till they reach the perfect score of 1, which is equivalent to the Height measure mentioned in last week’s article on fuzzy stats. I left out the middle values for the sake of brevity.

Figure 1: An Example of α-Cut Partitioning

DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE  @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
) 

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

— ALPHA CUTS BY DISTINCT VALUES
— =======================================

SELECT MembershipScore, SUM(DistinctCount) OVER (ORDER BY MembershipScore
DESC ROWS UNBOUNDED PRECEDING) AS AlphaCutCount
FROM (SELECT MembershipScore, Count(*) AS DistinctCount
       FROM @ZScoreTable
       WHERE MembershipScore IS NOT NULL
       GROUP BY MembershipScore) AS T1 

— MEASURE OF SURPRISE
— =======================================
SELECT ID, LactateDehydrogenase, NecessityMeasure, PossibilityScore, 1 PossibilityScore AS SimpleMeasureOfSurprise
FROM Health.DuchennesTable
WHERE LactateDehydrogenase IS NOT NULL

 

Figure 2: Sample α-Cut Values from the Beginning and End of the Duchennes Dataset
alpha-cuts-result-1

alpha-cuts-result-2

Figure 3: Possibility Scores and the Measurement of Surprise
possibility-theory

…………The second SELECT merely returns some fake PossibilityScore values I randomly generated and tacked onto the DuchennesTable, with a simple inverse calculation to illustrate the most basic measure of Surprise.[xiv] Authors like Prade and Yager have extended the measure to address more sophisticated use cases, but Figure 2 is sufficient to get the point across for our purposes. The interpretation of any Surprise measure is straightforward: the higher the value, the greater our bewilderment will be if the associated event occurs. In this context, the Surprise would be attached to the possibility of observing the corresponding LactateDehyrogenase value; of course, these are actual values taken from a muscular dystrophy in the 1980s, so if we weren’t using this for practice purposes we’d have to assign Necessity values of 1. These measurements of qualities like Surprise are of course not perfect, but they do allow us to attach some sort of ballpark figure to our expectations. As we shall see in the next two articles, one of the primary uses of fuzzy sets is to measure uncertainty, which can be valuable even when those measures are themselves uncertain. Two articles from now we’ll see how possibility theory is useful not merely in measuring surprise or in deriving interval-valued probabilities[xv], but also as a bridge to Dempster-Shafer Evidence Theory, which is useful in reckoning subtypes of uncertainty like Strife, Discord and Conflict. In the next installment, I’ll explain how both possibility distributions and α-cuts can measure nonspecificity, which is one of several types of uncertainty we can quantify with the aid of fuzzy sets.

[i] p. 200, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[ii] IBID., p. 198.

[iii] I got this idea from the Wikipedia article “Possiblity Theory” at http://en.wikipedia.org/wiki/Possibility_theory.

[iv] IBID.

[v] Prade, Henri and Yager, Ronald R., 1994, “Estimations of Expectedness and Potential Surprise in Possibility Theory,” pp. 417-428 in International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, December 1994. Vol. 2, No. 4. Available online at the National Aeronautics and Space Administration (NASA) web address http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19930020329.pdf

[vi] Posted by Kornelia Brutoczki July 4, 2001 at  the Fuzzy Logic Home Page address http://mazsola.iit.uni-miskolc.hu/DATA/diploma/brutoczki_kornelia/fu_gz_02.html. The original source is not given.

[vii] pp. 206-207, Klir and Yuan.

[viii] IBID., p. 204.

[ix] IBID., pp. 24, 195.

[x] IBID., pp. 19-21, 35.

[xi] IBID., p. 133.

[xii] IBID., p. 23, 25, 36.

[xiii] IBID., pp. 187-188, 198.

[xiv] See Prade and Yager, 1994.

[xv] p. 205, Klir and Yuan.

 

Implementing Fuzzy Sets in SQL Server, Part 7: The Significance of Fuzzy Stats

By Steve Bolton

…………In the world of fuzzy sets and imprecision modeling, the concept of cardinality takes on new shades of meaning that are not applicable to ordinary “crisp” sets, i.e. those without membership grades. In the last article in this series of amateur-self-tutorials, I mentioned one type of “fuzzy cardinality,” based on triangular, trapezoidal and other fuzzy numbers that are quite useful in modeling many vague statements found in everyday speech. Of course, another means of expressing cardinality is through ordinary numbers that are defined by a single value, rather than intervals and other such fuzzy set types. This raises some interesting questions, because one of the logical implications of graded set membership is that records with a score of 0 shouldn’t be included in the count. By extension, the values that are non-zero should only be counted in proportion to the score assigned by their membership function; since these are almost always on a scale of 0 to 1, the count of a fuzzy set never exceeds that of its crisp count, but may be much lower. Moreover, since membership scores are represented by fractional values, we’d normally use SQL Server data types like float, decimal and numeric to represent them, rather than members of the int family as we would with ordinary counts.
…………This apparently gives rise to many different possible calculations for fuzzy counts, but the most common one in the literature is the sigma count, in which we simply add up all of the membership scores for an entire set. Another stat seen occasionally in the literature is Support, which is defined as a crisp count of all the non-zero members of a fuzzy set; it thus always results in an integer somewhere between the sigma count and the ordinary crisp count. The Height refers to the crisp count, the Bandwidth to the number of records with scores greater than 0.5 and the Core to those with the maximum score of 1; these concepts might be useful in such applications as fuzzy clustering, but I see the sigma count used far more often in connection with today’s topic, fuzzy stats, which come into play whenever we want to calculate aggregates on fuzzy sets in platforms like SQL Server.[1]

Partial Credit for Partial Set Membership

                The trick with this topic is “Think partial credit!” to borrow a phrase from University of Minnesota Prof. Glen Medeen.[2]  Even if we restrict ourselves to the sigma count definition of fuzzy counts, the concept carries many interesting implications for all of the statistics that are derived from it. Averages, standard deviations, variances and all of the more advanced statistics derived from them must be recalculated, given that they’re derived from fundamental measures like values and counts that no longer apply. The logic inherent in partial set membership demands this fundamental rethinking of basic statistics. The crisp versions of some of these stats are precalculated by SQL Server, so by switching to the fuzzy set versions we’ll incur some performance costs by computing them on the fly instead, with the aid of T-SQL aggregates and windowing functions. Thankfully, some of these fuzzy stats are worth the extra computations, because they can shed light on our data in unusual ways. Perhaps the most obvious example is the difference between the crisp count and sigma count, which might be used as an alternative measure of fuzziness in the place of fuzzy complements, which as we saw early on in this series, are normally ideal for that use case.
…………Figure 1 provides a simple example of how to code this possible alternate measure of imprecision, by subtracting the sigma count from the height, i.e. the crisp count. I also demonstrate how easy it is to derive the bandwidth, support and core stats, even though these are only used infrequently. As usual, most of the initial code involves assigning the membership scores, by drafting the procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores for double duty as my membership function. The calculations are performed on the Duchennes muscular dystrophy data I downloaded a few tutorial series ago from Vanderbilt University’s Department of Biostatistics, which now resides in a dummy DataMiningProjects database; afterwards, they’re stored in a @ZScoreTable table variable, that can be operated on as needed. For the sake of consistency, I’ve stuck to the same format I’ve used throughout this series by using the three @Rescaling variables and ReversedZScore column to transform the ZScores in a membership score on the traditional 0 to 1 range.

New Means, Medians and Modes

                Once we’ve derived the sigma count from these grades, I then calculate the standard fuzzy mean, which may be the simplest, most intuitive form of a “fuzzy absolute center.”[3] Another alternate measure of centrality is of course the mode, which I’ve thrown in because it’s so easy to calculate; to derive the fuzzy version, we just have to multiply each value’s count by its membership grade. This is one of the few fuzzy stats where the value is not affected by its score. In Figure 2 we can see that both versions of the mode return the same value of 198, which is within the general rule that both modes and their fuzzy counterparts will only return actual crisp values from their datasets.  Since medians are dependent on orders, I’ll take up that topic when I address the fuzzification of ranks in a wrap-up of the whole series.
…………Instead, I’ve incorporated a higher class of averages known as Generalized Means, which can be used to derive a whole family of means between the minimum and maximum values, including the fuzzy arithmetic mean mentioned above, along with the harmonic and geometric means.[4] We basically plug in an @AlphaParameter bounded between 0 and 1, which allows us to cover the whole range, in much the same fashion that the various T-norm and T-conorm parameters empowered us to derive myriad types of fuzzy intersections and unions in previous articles. Note that in Figure 2, we see that the parameter value I arbitrarily chose led to a far different value for the GeneralizedMean than the one derived for the ordinary FuzzyMean.

Figure 1: Sample Code for Fuzzy Counts and Means
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)

DECLARE  @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

 

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

DECLARE @Count bigint, @SigmaCount float, @Support float, @Bandwidth float, @Core float,
@Mean float, @FuzzyMean float, @GeneralizedMean float, @Mode float, @FuzzyMode float

— COUNTS
SELECT @SigmaCount = SUM(MembershipScore), @Count = Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Support =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0 

SELECT @Bandwidth =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore > 0.5

SELECT @Core =  Count(*)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL AND MembershipScore = 1

— MODES
SELECT @Mode = Value
FROM (SELECT TOP 1 ValueCount, Value
       FROM (SELECT Count(*) AS ValueCount, Value
              FROM @ZScoreTable
              WHERE ZScore IS NOT NULL
              GROUP BY Value) AS T1
       ORDER BY ValueCount DESC) AS T2

 

SELECT @FuzzyMode = Value
FROM (SELECT TOP 1 ValueCount, Value
       FROM (SELECT Count(*) * MembershipScore AS ValueCount, Value
              FROM @ZScoreTable
              WHERE ZScore IS NOT NULL
              GROUP BY Value,MembershipScore) AS T1
       ORDER BY ValueCount DESC) AS T2 

— AVERAGES
DECLARE @AlphaParameter float
SELECT @AlphaParameter = 0.3

SELECT @FuzzyMean = SUM(MembershipScore * Value) / @SigmaCount , @Mean  = Avg(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @GeneralizedMean = Power(SUM(Power(Value, @AlphaParameter)) / CAST(@SigmaCount AS float), 1 / @AlphaParameter)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @Count AS RegularCount, @SigmaCount AS SigmaCount,
@Count @SigmaCount AS AlternativeMeasureOfFuzziness,
@Support AS Support, @Bandwidth As Bandwidth, @Core as Core,
@Mode AS Mode, @FuzzyMode as FuzzyMode,
@Mean AS Mean, @FuzzyMean AS FuzzyMean, @GeneralizedMean AS GeneralizedMean

 

Figure 2: Sample Results from the Duchennes Table
fuzzy-counts-and-averages

…………Generalized means occupy a space in the set of norm operations in between T-norms and T-conorms, along with Ordered Weighted Averages.[5] Basically, each record in an OWA is multiplied by a weight which globally equals one, but the choice of weights is so broad that I won’t bother with them; I’ll merely point out that this obviously overlaps the topic of neural net weights, at least to anyone who has coded them before. I’ll also omit my sample code for Lambda Averages (i.e. λ-Averages), because it’s simply too long in comparison to its usefulness. This class of norm operations is derived from binary set relations, which means we first have to create a second table variable, fill it and adjust the scores, as we did in previous articles with T-norms and T-conorms. We’d then apply a CASE statement to select the MIN value of the @LambdaParameter and the outcome of the union, when the both records were between 0 and the @LambdaParameter; take the MAX of the @LambdaParameter and the outcome of the fuzzy intersection when the outcomes were greater than the @LambdaParameter; then use the @LambdaParameter value in the ELSE statement.[6]
…………As with fuzzy complements, unions and intersections, the applications are determined by the selection of appropriate parameter values.[7] One method of accomplishing this is of course parameter estimation.[8] A good starting point for fuzzy parameter estimation may be Seyed Mahmoud Taheri’s select bibliography of recent developments in fuzzy stats, which also lists many resources for extending fuzzy stats to standard statistical and data mining topics like Bayesian priors, fuzzy regression and hypothesis testing.[9] A couple of the sources also connect fuzzy sets to information theory, which I will also begin doing in my next tutorial.

Fuzzy Variance: A Fresh Take on an Staid Stat

                Taheri also mentions published research on fuzzy sets and maximum likelihood, which makes me wonder if there is also some connection to Fisher Information. The same is true of the different types of fuzzy variance, given that variance is interpreted in Fisher Information as a form of uncertainty. This may be a more worthwhile topic to cover than λ-Averages and OWAs since the formulas are less broad and have clearer applications. First of all, it makes intuitive sense to calculate variance differently on fuzzy sets, for precisely the same reasons as fuzzy means: the crisp version of the statistic is dependent on counts, which ought to be replaced with alternative measures like the sigma count when possible. If a record has zero membership, for example, its value shouldn’t count at all in the computation, because it’s no longer part of the set. It thus follows that a value with partial membership should only be taken into consideration in proportion to its score, just as with fuzzy means; it stands to reason that the same principles would apply if we went beyond the first and second statistical moments, mean and variance, to the third and fourth, skewness and kurtosis.
…………This leads to some interesting questions over how we can interpret the differences in the crisp and fuzzy variances. Given that the difference between crisp and sigma counts reflects a measure of fuzziness – albeit not clearly as fuzzy complements – perhaps we can interpret this as a measure of how dispersed the fuzziness is. This might come in quite handy in many data mining applications. I haven’t seen it used that way in the literature – but it’s good to keep in mind that my exposure to the whole topic of fuzzy sets is limited, given that I can’t afford the hefty fees for many of the academic journals, which render them inaccessible to me. Nor have I seen trapezoidal numbers combined with variance, but a construction like the CASE statement for TrapezoidalRangeOnTheCrispVariance in Figure 3 might be useful in expressing natural language slang about dispersion like, “all over the place.” The TrapezoidalRangeOnTheFuzzyVariance expresses the same concept, except that it represents a fuzzy number on a fuzzy set, rather than a fuzzy number on a crisp set; it thus amounts to saying, “this graded set is all over the place.” I set the range boundaries arbitrarily so that both would have fractional scores in Figure 4, which serves as a better illustration of partial membership in a fuzzy number. Using the square root and power techniques mentioned in the last article, we could add superlatives to it like “really” or “somewhat.” If we were using Z-Scores in the context of a normal distribution, we might set graded boundaries based on the “68–95–99.7 Rule”  I covered in Goodness-of-Fit Testing with SQL Server, part 1: The Simplest Methods, which involve the number of expected records between the first, second and third standard deviations. I left out the more complicated case of the superlative in sample code below, just to illustrate how a SQL Server user might write T-SQL for simple cases of the these two fuzzy variance types:

Figure 3: Some Possible Measures of Fuzzy Variance
DECLARE @StDev float, @FuzzyStDev float, @Var float, @FuzzyVar float

SELECT @FuzzyVar = Sum(Power((Value * MembershipScore) @FuzzyMean, 2)) / @SigmaCount, @Var
= Var(Value), @StDev = StDev(Value)
FROM @ZScoreTable
WHERE ZScore IS NOT NULL

SELECT @FuzzyStDev = Power(@FuzzyVar, 0.5)

DECLARE @LowerBound float, @UpperBound float
SELECT @LowerBound  = 4000, @UpperBound = 5000

 SELECT @StDev AS StDev, @FuzzyStDev AS FuzzyStDev,
@Var AS Var, @FuzzyVar AS FuzzyVar,
@Var @FuzzyVar AS PossiblyTheVarianceOfTheFuzziness,
CASE WHEN @Var BETWEEN @LowerBound AND @UpperBound THEN 1
       WHEN @Var < @LowerBound THEN ((@Var @LowerBound)) / @Var + 1
       WHEN @Var > @UpperBound THEN ((@UpperBound @Var)) / @Var + 1
       ELSE NULL END AS TrapezoidalRangeOnTheCrispVariance,
CASE WHEN @FuzzyVar BETWEEN @LowerBound AND @UpperBound THEN 1
       WHEN @FuzzyVar < @LowerBound THEN ((@FuzzyVar @LowerBound)) / @FuzzyVar + 1
       WHEN @FuzzyVar > @UpperBound THEN ((@UpperBound @FuzzyVar)) / @FuzzyVar + 1
       ELSE NULL END AS TrapezoidalRangeOnTheFuzzyVariance

Figure 4: Fuzzy Variance Result for the LactageDehydrogenase Column
fuzzy-variance

…………Fuzzy variance may serve as a bridge to Fisher Information, a topic I want to cover in my long-delayed series, Information Measurement with SQL Server. Early on in this series we saw how fuzzy complements serve as one important measure of a different type of information, fuzziness, which quantifies the imprecision of a dataset in a different manner than variance. The difference between the sigma and crisp counts might serve the same purposes, although I’ve seen the various types of complements used more often for this purpose. One of the coolest things about fuzzy sets is that they give rise to several useful statistics that quantify different types of imprecision, which can be used to derive a program of “uncertainty management” for an organization. In the next installment we’ll see how we can use some of the fuzzy stats defined here to pin down a different brand of imprecision known as nonspecificity. This will involve discussion of the Hartley function and possibly Shannon’s Entropy, the latter of which is a fundamental concept in many data mining algorithms. Since entropy is among the foundations of information theory, this introduction to its applications in nonspecificity will serve as a bridge to my future Information Measurement series.

 

[1] pp. 25-28, Bonissone, Piero P., 1998, “Fuzzy Sets & Expert Systems in Computer Eng. (1).” Available online at http://homepages.rpi.edu/~bonisp/fuzzy-course/99/L1/mot-conc2.pdf. Bonissone’s material is reprinted at least in part from slides produced by artificial intelligence researchers Roger Jang and Enrique Ruspini.

[2] p. 5, Medeen, Glen, 2015, Two Examples of the Use of Fuzzy Set Theory in Statistics,” published online at the University of Minnesota web address http://users.stat.umn.edu/~gmeeden/talks/fuzznov09.pdf

[3] p. 435, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. On this particular page, they’re extending the meaning of the term even further, to complex network topologies.

[4] IBID., p. 90.

[5] IBID., pp. 92-93.

[6] IBID., p. 94.

[7] IBID., p. 93.

[8] IBID., p. 94.

[9] p. 240, Taheri, Seyed Mahmoud, 2003, “Trends in Fuzzy Statistics,” pp. 239-257 in Austrian Journal of Statistics, Vol. 32, No. 3. Available online at Vienna University of Technology web address http://www.statistik.tuwien.ac.at/oezstat/ausg033/papers/taheri.pdf

Implementing Fuzzy Sets in SQL Server, Part 6: Fuzzy Numbers and Linguistic Modifiers

By Steve Bolton

…………I’ve written several amateur tutorial series on this blog in order to more quickly absorb difficult data mining, statistical and machine learning topics, while hopefully helping other SQL Server users avoid some of my inevitable mistakes. Since I don’t know what I’m talking about, I’m occasionally surprised at how useful some of the material turns out to be – particularly in the case of this week’s topic, fuzzy numbers, which I originally thought were a curiosity. In recent years I’ve made moderate progress in recapturing some of the advanced math skills I had as a kid, but some of the resources I’m consulting still wear me out with densely packed, arcane symbols; that was also true of George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications and several other sources I’ve used for this series on using T-SQL to implement fuzzy sets. Much of the literature on the subject is written by mathematicians who need thick equations and highly precise terminology to communicate with each other effectively, which can be taxing on non-specialists not accustomed to the jargon of the field.
…………That communication gap may be responsible for the enormous lag between the refinement of the theory and its adoption; this comes despite all of the empirical evidence that these data modeling techniques are insanely useful in solving certain classes of real-world problems, in a wide variety of industries. Adoption has also lagged in the relational database and data mining markets, in spite of the fact that set-based languages like T-SQL and Multidimensional Expressions (MDX) are ideal for implementing fuzzy sets. I’m trying to put my drop in the bucket to remedy that, but my lack of adequate equation translation skills almost caused me to skip over fuzzy numbers, which are actually one of the simplest and most useful aspects of fuzzy set theory.

Modeling Everyday Speech 

 …………One rule of thumb I’ve learned along the way is that whenever confusion arises over fuzzy sets, it is best to return to basics and state the problem in terms of natural language. After all, the main overarching use case for fuzzy set theory is to model imprecision that can be expressed linguistically, particularly when it would be useful to use continuous scales to ordinal categories. If we were using a Behavior-Driven Development (BDD) methodology, we might want to flag any qualifiers we encounter in user stories like “about,” “around,” “near,” “approximately,” “most,” “few” and the like as candidates for fuzzy numbers. These require several numbers and calculations of various intensity in order to assign a range of values to a single number – but since ordinary numbers require none of this extra computation, the obvious question is, why bother? It turns out that these particular instances of fuzzy sets are useful in modeling these types of natural language qualifiers, which express uncertainty about how close a value is to a definite target value.
…………“About half” is a clear and simple example from everyday speech. In fuzzy set parlance, this would be modeled as a “triangular number,” which is a fancy way of saying that we’d assign a perfect membership grade of 1 to values that were exactly equal to 0.5, but descending on either side of 0.5 in proportion to how far away the value was from that target. The term “triangular” is used because if we use a line chart to depict a membership function of this kind, it peaks at the target value and descends to 0 as the values decrease or increase away from it. Trapezoidal numbers are a mouthful, but really aren’t much more difficult; they have basically have the same shape as triangular numbers, except that the peak is flattened in order to express an interval of some kind.[1] University of Minnesota Prof. Glen Meeden’s natural language example of a trapezoidal number is the best I’ve yet run across in the literature: “What is the average yearly snowfall in the Twin Cities? You might answer somewhere between 20 and 50 inches.”[2]
…………Functions that only increase or decrease are useful in capturing related distinctions, like “a large number of” or “a small number of.”[3] Although it is wise to obey a few restrictions that lead to certain useful mathematical properties – particularly the industry-standard requirement of a boundary between 0 and 1 – it is not necessary for the membership functions to be symmetric. In fact, assigning a lopsided peak to a triangular number can be useful in modeling statements like “almost all,” which would be closer to the right edge of a line graph than a function that implements the term “most.”[4] They can even be bell-shaped.[5]
…………At a higher level of sophistication, fuzzy numbers can be used to model terms like “very,” which fall under the rubric of linguistic hedges and fuzzy modifiers – including statements like “very true,” which can be useful in fuzzy logic.[6] Klir and Yuan suggest applying powers and square roots to model such distinctions, since they don’t follow a linear scale. For example, they say that if we use a score of 0.8 to model the term “John is young,” squaring it would could model the phrase “very young” and using a square root could lead be used for “fairly young.” This is because the result of the first is 0.64, which strongly modifies the original term, while the second returns 0.89, which modifies it slightly.[7] Of course, the exact boundaries of all of these terms have to be set in light of domain knowledge of some kind. “Half” is a definite term but the modifier “about” can mean different things to different people, which may require aggregating the viewpoints of users in some way, perhaps using Decision Theory methods that integrate seamlessly with fuzzy sets. Neural nets likewise work well together with fuzzy sets because they are highly useful for encoding unknown functions, which means they can be put to use to derive such boundaries if greater precision is required. For many use cases, an informal guesstimate may suffice.

Translating Fuzzy Numbers into T-SQL

…………My sample code below implements four types of fuzzy numbers, using quite simple and arbitrary criteria that is merely designed to illustrate the concepts. For the sake of consistency, I’m once again using the procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores to my derive membership grades and storing the results in a table variable (which has an extraneous column named GroupRank that can be safely ignored). In this case we’re calculating Z-Scores on the LactateDehydrogenase column of the Duchennes muscular dystrophy dataset I’ve been using as practice data for the last several tutorial series, which I downloaded ages ago from Vanderbilt University’s Department of Biostatistics and converted to a SQL Server table in a dummy DataMiningProjects database. Some of the key differences from previous articles include the absence of the ReversedZScore column and associated @Rescaling variables. This is because we don’t need to perform rescaling of any kind, since we’re measuring nearness to a few Z-Score values in all four examples, not calculating a relative score on a range of 0 to 1. The MembershipScore column is missing for the same reason. In its place, we have four computed columns, two of which measure the closeness of each Z-Score in the dataset to a target value or range.
…………To put it simply, the triangular number assigns a grade to  the natural language statement, “around a Z-Score of 0.960526,” while the trapezoidal expresses the concept of “somewhere between 0.450526 and 1.360526.” The other two columns define increasing and decreasing functions that model how close the values are to either the top or bottom of the dataset, which can be interpreted as “few” or “most.” The hard-coded numbers in the UPDATE and the declarations above it are picked out of thin air merely to illustrate the point, not because they express any domain knowledge. There are probably more efficient ways of coding this, but the point is to get the concepts across.

Figure 1: Code for the Fuzzy Number Sample
DECLARE @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
TriangularNearnessScore decimal(38,6),
TrapezoidalNearnessScore decimal(38,6),
FewScore decimal(38,6),
MostScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

DECLARE  @ComparisonPoint float = 0.960526
DECLARE @LowerBound float = @ComparisonPoint 0.5, — we could of course make it lopsided if that would model our data better
@UpperBound float = @ComparisonPoint + 0.5

UPDATE @ZScoreTable
SET TriangularNearnessScore =     CASE WHEN ZScore NOT BETWEEN @LowerBound and @UpperBound THEN 0
       WHEN ZScore = @ComparisonPoint THEN 1
       WHEN ZScore < @ComparisonPoint THEN (ZScore @ComparisonPoint) + 1
       WHEN ZScore > @ComparisonPoint THEN (@ComparisonPoint ZScore) + 1
       ELSE NULL END,
       TrapezoidalNearnessScore = CASE WHEN ZScore NOT BETWEEN @LowerBound and @UpperBound THEN 0
       WHEN ZScore BETWEEN @LowerBound + 0.1 and @UpperBound 0.1 THEN 1
       WHEN ZScore < @LowerBound + 0.1 THEN (ZScore @ComparisonPoint) + 1
       WHEN ZScore >  @UpperBound 0.1 THEN (@ComparisonPoint ZScore) + 1  ELSE NULL END,
       FewScore = CASE WHEN ZScore BETWEEN 0.2 AND 0.7 THEN 1  ELSE 1 (ZScore * 0.3) END,
       MostScore = CASE WHEN ZScore BETWEEN 1 AND 2 THEN 1  ELSE ZScore / 1.2 END

SELECT PrimaryKey, ZScore, TriangularNearnessScore, TrapezoidalNearnessScore, FewScore, MostScore
FROM @ZScoreTable
WHERE ZScore BETWEEN 0.2 AND 2 AND TriangularNearnessScore != 0 AND TrapezoidalNearnessScore != 0
ORDER BY ZScore DESC

Figure 2: Sample Results from the Duchennes Practice Dataset
fuzzy-number-results

…………Note how the TriangularNearnessScore peaks at a single value, while the TrapezoidalNearnessScore is contained within a particular range. The FewScore and MostScore values peak at opposite ends of the dataset. Of course, a picture is worth a thousand words: if the values above aren’t clear, then the Reporting Services line graphs below ought to clear up any confusion. The TriangularNearnessScore doesn’t precisely follow a triangular shape, but it does come to single peak, which is good enough. The trapezoidal example reaches the same peak, but encompasses a range of values represented in the flat line. The FewScore and MostScore also have flat peaks, but these occur at the far edges of the membership grades. The shapes aren’t as neat as those in the literature, in part because I’m using real-world data from the Duchennes dataset together with some arbitrary range values, but I’m sure that readers will get the gist of it.

Figure 3: Reporting Services Line Graphs for 4 Fuzzy Number Samples
fuzzy-number-report
…………Asking questions like “what is about half of A plus almost all of B” is probably a rare use case, one that is more likely to come up in data mining than in relational situations. If the need for this kind of comparison does arise, be aware that at least some of the math has been worked out, so there’s no need to reinvent the entire wheel. I say “some” because, at least at the time Klir and Yuan wrote, mathematicians were still struggling with some of the strange properties and enigmatic logic associated with these kinds of comparisons.[8]  Just imagine several of the trapezoids and triangles above overlaid and you can see how quickly the topics of fuzzy arithmetic, fuzzy set relations and fuzzy matrix math can become.  Some basic procedures for solving them are available[9], as well as neural net techniques for solving fuzzy equations[10], but they lead to certain logical difficulties that I believe still aren’t fully understood, such as the fact that many of the approximate solutions may not be unique.[11] Thankfully, we don’t encounter statements like “between 3 and 5 of A minus a little of B” often in ordinary speech, so I imagine that SQL Server users are unlikely to encounter it. In contrast, qualifiers like “around” or “near” are so common that I guarantee these fuzzy numbers will prove valuable to many users in the long run.
…………Instead of taking fuzzy numbers in directions of this kind that lack real-world applications, I’ll instead use them as a stepping stone towards quantifying imprecision, which can be helpful in programs of uncertainty management. Many of the linguistic qualifiers mentioned here are actually instances of what is known as “fuzzy cardinality,” which enable modeling of phrases like “about a quarter” or “near.”[12] In the next article, I’ll delve into the realm of fuzzy statistics, where the implications of relative membership in a set leads automatically to a range of different types of cardinality, not just the single type of Count used in T-SQL. I have yet to see this done in the literature (I simply can’t afford access to most of the research published on certain advanced topics like fuzzy stats), but I’ll provide an example of how trapezoidal numbers might be implemented in order to create fuzzy analogues of standard deviation and variance. Given that the ordinary “crisp” versions of these aggregates are determined in part by counts, the fuzzification of counts plays into that as well. It may be useful to probe for connections to Fisher’s Information, a metric I hope to code for a long-delayed series titled Information Measurement with SQL Server, given that it apparently uses variance to model uncertainty. Using increasing and decreasing functions to model statements like “almost all” as we have done here can be seen as fuzzy instances of MIN and MAX aggregates, which essentially express the same sentiment as “near the bottom” or “near the top.” These can also be used to create new methods of fuzzy outlier detection, as I’ve essentially been doing throughout this series by using Z-Scores in my sample code; fuzzy set grades don’t have anything to do with stochastics or outlier detection unless such meanings are deliberately assigned.
…………In the next article I will probably continue to use Z-Scores in my sample code mainly for the sake of consistency and to reuse old, familiar concepts, but also to kill two birds with one stone and investigate possible uses in outlier detection. As we shall see, membership functions can also be interpreted in the light of Evidence Theory, in which each grade indicates the levels of credibility and truthfulness of a statement. In such cases, we’re speaking of fuzzy measures[13], which are also useful in partitioning and quantifying different types of uncertainty. As we already saw in Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements, fuzzy complements can be used to measure one type of uncertainty, in quantifying just how imprecise the boundaries of fuzzy sets are; in essence, they become measures of fuzziness. There are other types of uncertainty, however, which require entirely different modeling techniques, which is where fuzzy stats and measures come in handy.

 

 

[1] Most fuzzy set references discuss triangulars and trapezoidals. One source I found to be helpful was pp. 12-13, Alavala, Chennakesava R., 2008, Fuzzy Logic and Neural Networks: Basic Concepts and Applications. New Age International Pvt. Ltd.: New Delhi.

[2] p. 3, Medeen, Glen, 2015, Two Examples of the Use of Fuzzy Set Theory in Statistics,” published online at the University of Minnesota web address http://users.stat.umn.edu/~gmeeden/talks/fuzznov09.pdf

[3] pp. 96-98, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[4] IBID., p. 228.

[5] IBID., p. 99.

[6] IBID., p. 222.

[7] IBID., p. 230.

[8] IBID., p. 115.

[9] IBID., pp. 160-162.

[10] IBID., pp. 171-173.

[11] IBID., pp. 153-157, 166-167.

[12] IBID., p. 98.

[13] IBID., p. 177.

Implementing Fuzzy Sets in SQL Server, Part 5: The Mystery of the Missing Left Join

By Steve Bolton

…………Information on set operations like complements, intersections and unions is plentiful in the literature on fuzzy sets, which made the last three articles in this series of amateur self-tutorials easier to write in a certain sense. These topics are far more complex than with ordinary “crisp” sets because there are so many different methods for calculating membership grades for the resultsets, each of which can be applied to different use cases in order to model imprecision. This can be especially useful in Behavior-Driven Development (BDD) and user stories, as well as in uncertainty management programs of all kinds, including reducing the insecurity involved in software engineering and data modeling. There may be extra steps involved in calculating these fuzzy set relations, but the topic is at least well-studied and the sources of information for the formulas are readily available. Strangely, that is not true for some of the set operations that SQL Server users encounter more often, like LEFT JOINs, INNER JOINs, OUTER JOINs, RIGHT JOINs and Cartesian Products. There are only 41 hits on Google for the terms “left join” “relational” “fuzzy set” combined, one of which is an off-hand reference from my own blog. I haven’t see these topics addressed much in the literature, which is rich in information on complements, intersections and unions, but not these common relational joins. Like many other puzzling aspects of fuzzy set theory, the reason behind this is readily apparent only after we get back to basics and frame the questions we’re asking of the database in natural language terms.
…………This problem is easily reducible to a single enigma, which nevertheless requires a lengthy explanation. As we saw in the last two tutorials, INNER and OUTER JOIN statements are useful in deriving fuzzy intersections and unions respectively. We really can’t use the standard T-SQL INTERSECT and UNION operators for these purposes, since we need to retrieve membership function values from both sides in order to calculate membership in the resultset. For all intents and purposes, these are the principal use cases for fuzzy INNER and OUTER JOIN operations. Although I won’t rule out the possibly of some oddball set operation that requires the fuzzy versions of these statements, without qualifying as fuzzy intersections and unions, such use cases are probably pretty rare. CROSS JOIN operations are easier to explain once we realize that all fuzzy binary relations are performed on the complete crisp Cartesian Product to derive a fuzzy subset of some kind. To perform a fuzzy Cartesian Product, we would have to take the membership grades of both sets and implement a mathematical operation of some type, just like with fuzzy intersections and unions. The CROSS JOIN operator isn’t used as often in T-SQL as its kin, in large part because retrieving every possible combination of records from both sides can tax the server beyond belief, but cross products aren’t mentioned much in the fuzzy set literature for an additional reason: a wide range of fuzzy subsets of it also qualify as fuzzy intersections and unions, depending on whether they’re implemented with classes of functions known as T-norms and T-conorms. The domain of possible resultsets is no wider than the crisp CROSS JOIN, but fuzzy intersections and unions take up a far wider space within it than their crisp counterparts.
…………It is certainly possible to define fuzzy Cartesian Product operators that are neither T-norms nor T-conorms, but the use cases for doing so are neither clear nor very wide. In order to derive such a measure, we’d probably have to take the next step up and select from the class of “norm operations” that encompass all types of possible fuzzy aggregates, including T-norms and T-conorms as special cases.[i] Unfortunately, the concept is so general that it provides even less guidance for deriving new types of fuzzy Cartesian Products for particular use cases than we have for matching T-norms and T-conorms to the right problems. What the CROSS JOIN, INNER JOIN and OUTER JOIN all have in common, however, is that both participating sets are given equal weight in the determining the results, which is also true for their fuzzy counterparts. As I always caution, I’m not an expert in these matters and am only writing on the topic in order to introduce myself to it, while hopefully helping other SQL Server users to avoid my inevitable mistakes. At the risk of committing another one, I’ll take a stab at solving the Mystery of the Missing Join, which may boil down to the fact that the LEFT JOIN statement gives preferential treatment to one side.

Comparing Membership Functions in LEFT JOINs

                One of the most common scenarios for a LEFT JOIN in ordinary crisp sets is to join a dependent table to a parent via a foreign key. When the relationship works in reverse, we’re speaking of a RIGHT JOIN – which is just the converse of a LEFT JOIN, so I’ll dispense with any discussion of it. The join results typically have repeated values for the parent table, since they often match multiple rows in the dependent table. Whenever we work with fuzzy sets, spelling out what we’re looking for in explicit natural language terms often simplifies what may seem like really complex problems. In the case of crisp LEFT JOINs, we’re asking the database, “Give me all of the records in Set1, plus some additional information from Set2 that we sometimes do without.” Here’s the key problem we face: fuzzy set theory only matters in the context of a LEFT JOIN if those additional columns of information are included or excluded based on some membership function, which can be mathematically compared to the membership function of the parent table. In other words, the math involved in a fuzzy complement, union or intersection is needed to determine whether or not the results are included in the new set, but with a LEFT JOIN, oftentimes we’re just suppling more information in a Master-Detail situation – in which case, the single membership function of the parent ought to be sufficient to determine membership in the resultset. With intersections and unions, both sides of the binary relation are of equal importance, but in a LEFT JOIN, the left side takes precedence, with the right just being filler material that doesn’t affect the membership values of the parent. Perhaps the only time we might take a membership function on the dependent table into account is when it measures a type of uncertainty comparable to the parent, plus we specifically want to grade how relevant the additional detail is to the parent.
…………For example, let’s say we’re performing a fuzzy LEFT JOIN on a parent CustomerTable to a child AddressTable. If the first has a function that ranks how Tall a person is and the second carries a grade for membership in the set of Rural places, then we can retrieve the extra data in order to answer questions like, “Give me the set of Short people who live in Suburbs” without performing any math operations on either membership function. If we exclude the much more advanced topic of statistical independence, how Tall or Short a customer is has no bearing on how Rural or Urban their hometown is, so the two memberships don’t affect each other. We can just do a regular LEFT JOIN in such situations and return the two grades separately.
…………The question gets murkier when we consider dependent tables that measure the same quality as the parent, or nearly so. For example, let’s pretend we’re operating a database for a hardware store that has a CeilingHeight membership function defined on the AddressTable, also split into categories like Tall, Short, etc. (for the sake of argument, let’s say the customers volunteered this information about their homes in a survey or whatever). We can ask the database a simple question like, “Give me the set of Short people who live in homes with Tall ceilings” without doing any extra math, but there may be use cases where we can compare these two types of information to draw inferences. An example might be asking a question like, “Give me the set of people who belong to the category of Mismatched Customers,” which can in turn be defined as Short people with Tall ceilings and vice-versa. This could be calculated through LEFT JOINs with WHERE clauses on the Tall and Short values for the Customers, after which some computation can be applied to derive a new category from based on some comparison of that membership function to the CeilingHeight column of the child.

An Advanced Exception: LEFT JOINs Creating New Categories of Comparison

                Keep in mind that with fuzzy intersections and unions, we’re essentially adding a new layer of fuzzy membership grades to the operation that joins the two sets; if we extend the same principle to fuzzy LEFT JOINs, then we’re likewise creating a new type of fuzzy set by defining the join in an imprecise way. I’m still a novice at all of this, but would suggest that it might be possible to discern the difference between this type of fuzzy LEFT JOIN from one that doesn’t require any extra math by looking at the results: if they define a new ordinal category that is graded on a continuous scale, then we’ve created a new type of fuzzy set from the two membership functions. If no such category is being created, then we can probably just return the child’s membership functions values as additional details if it’s germane to our query, without taking any extra steps. The question may be complicated by the fact that both tables may have multiple membership functions defined on them, some of which define sets that may not be relevant at all to the query at hand, while others can have various shades of meaning that may overlap. For example, a ProductTable might have different membership functions for such disparate characteristics as Color, Width and Availability, while the dependent table may even have stochastic labels like “Gaussian” or “Gamma,” if we needed to measure membership of a column in some kind of probability distribution.
…………Either way, we have to retrieve all the values we would for a LEFT JOIN, but would only perform additional math in certain use cases that seem to be much narrower than those for fuzzy unions and intersections. In these instances, we’re essentially automatically extending it to a new level, just as the Sugeno and Yager Complements add another level of membership grade on top of the existing membership function values. For that reason, I strongly suspect that we would have to select a fuzzy join function that most closely matches the type of imprecision we’re trying to model. As we saw in the last couple of articles, T-norms and T-conorms are the ideal mathematical structures for expressing fuzzy intersections and unions, but selecting the ones with the right mix of mathematical properties and output histograms is often a tough call. Research has been ongoing in the field for decades to narrow down those use cases, but the paucity of information on LEFT JOINs leaves us with an even shakier starting point than that. This is the weakest article in this series, in the sense that I can’t even provide T-SQL sample code for a rare (or even perhaps non-existent) set of use cases, which also would require advanced research that the professionals apparently haven’t deemed worthy to investigate much; nevertheless, anyone coming from the realm of crisp SQL database servers is going to be immediately struck by the absence of LEFT JOINs and could use a tentative answer to this nagging question. Now that we’re over this hump, in the next installment I’ll explain how to use the most common fuzzy quantifiers and measures, like triangular and trapezoidal numbers. These can be incredibly useful in the right circumstances, including making it child’s play to model common linguistic phrases like “about half” or “most” on both crisp and fuzzy sets. It is here that many SQL Server users can probably find immediate applications for the material covered in this series, which the untapped potential of fuzzy set theory can deal with better than any other alternative.

[i] p. 93, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

Implementing Fuzzy Sets in SQL Server, Part 4: From Fuzzy Unions to Fuzzy Logic

By Steve Bolton

…………Fuzzy set relations carry an added layer of complexity not seen in ordinary “crisp” sets, due to the need to derive new grades for membership in the resultset from the scores in the original sets. As I explained two weeks ago in this series of amateur self-tutorials, binary set relations like fuzzy intersections are a bit more complicated than reflexive set relations like fuzzy complements, since we have to perform fuzzy aggregation operations on multiple sets. The good news is that we can reuse most of the concepts introduced in the last article for fuzzy unions, which are handled in an almost identical way. With intersections, we’re basically requesting “all of the records that are members of a both sets,” but with unions, we’re asking SQL Server, “Give me any records that are members of either set.” When we use the fuzzy versions of either, that amounts to retrieving maximum values in the case of intersections and minimums in the case of unions. With the crisp sets SQL Server users work with every day, set relations are straightforward because no calculations of membership values are involved, but with fuzzy intersections there are myriad ways of defining these grades; fortunately, the mathematical functions for fuzzy unions are quite similar to those introduced in the last article. Last time around I provided sample code for four basic types of fuzzy intersections known as the standard intersection, drastic intersection, algebraic product and bounded difference; this week, I’ll demonstrate how to code the corresponding inverses, which are known as the Standard Union, Drastic Union, Algebraic Sum and Bounded Sum. Just as the concept of fuzzy intersections are best exemplified by a class of mathematical functions known as T-norms, so too are fuzzy unions typified by another class known as T-conorms. As we shall see, the authors and parameters of the most popular T-conorms are almost identical to those I coded for the last installment, based on my favorite mathematical reference on the topic, George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications. At the tail end of the last article I offered suggestions for coping with the odd data modeling issues that crop up in cases when it is desirable to persist parameter values for such advanced functions; I won’t rehash that here because the solutions are exactly the same for fuzzy unions. I’ll also omit discussion of the theorems and formulas that justify these functions, since it’s not really necessary to overload end users with that kind of information, for the same reason that it’s not necessary to give a dissertation on automotive engineering to get a driver’s license. Suffice it to say that T-conorms share many of the same mathematical properties as T-norms, except that they’re characterized by superidempotency instead of subidempotency, which you don’t really need to know unless you’re devising your own fuzzy aggregates.[i] Instead, I’ll spend more time discussing the use cases for the different varieties of fuzzy set relations; suffice it to say for now that it boils down to looking for different shades of meaning in conjunctions like “or” and “and” in natural language, which often correspond to fuzzy unions and joins.
…………First, let me get the code for fuzzy unions out of the way. To readers of the last article, the structure of Figures 1 through 4 ought to look familiar. Once again, I’m using the output of two stored procedures I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores and Outlier Detection with SQL Server, part 2.2: Modified Z-Scores as my membership functions (you can of course test it on other membership functions that have nothing to do with Z-Scores), then storing the results in two table variables. As usual, the GroupRank and OutlierCandidate columns can be safely ignored (they’re only included in the INSERT EXEC because they were part of the original procedures), while the ReversedZScore column is used in conjunction with the @RescalingMax, @RescalingMin and @RescalingRange variables to normalize the results on the industry-standard fuzzy set range of 0 to 1. In fact, there aren’t any differences whatsoever in the code till we insert the results of the union in a third table variable, so that we can demonstrate several T-conorms with it later on; in practice, it may be possible to perform fuzzy unions without these table variables, depending on how the membership functions are being calculated. We don’t have to perform any mathematical operations on the records returned by ordinary crisp relations, but this is not true with their fuzzy counterparts. Using a regular INTERSECT operator in the sample code in the last tutorial would have been awkward to say the least, since it would have excluded unmatched values for the MembershipScores of both sets, which must be preserved to calculate the grade for membership in the fuzzy relation. The same problem arises when we try to implement a fuzzy union with the T-SQL UNION operator, so I had to resort to a workaround similar to the one used for fuzzy intersections, with a few cosmetic changes. I use a FULL JOIN here instead of an INNER JOIN and had to add IsNull statements and a CASE to substitute grades of 0 whenever the Value of one of the columns is NULL. The same two WHERE clauses I used after the join condition in the last tutorial are present here, but are buried in two subqueries; thanks to the null checks, they’re assigned membership grades of 0 on one of the two sets, whenever the WHERE clause removes them from one side but not the other.

Figure 1: Sample Code for a Fuzzy Union on Two Different Types of Z-Scores
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE   @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

DECLARE  @ModifiedZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint,
OutlierCandidate bit
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

 INSERT INTO @ModifiedZScoreTable
(PrimaryKey, Value, ZScore, GroupRank, OutlierCandidate)
EXEC   Calculations.ModifiedZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @OrderByCode = 8,
              @DecimalPrecision = ’38,32′

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ModifiedZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ModifiedZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

— DERIVING THE Union
DECLARE @UnionTable table
(PrimaryKey sql_variant, Value decimal(38,6), MembershipScoreForSet1 decimal(38,6), MembershipScoreForSet2 decimal(38,6),
StandardUnion AS (CASE WHEN MembershipScoreForSet1 >= MembershipScoreForSet2 THEN MembershipScoreForSet1 ELSE MembershipScoreForSet2 END),
DrasticUnion AS (CASE WHEN MembershipScoreForSet1 = 0 THEN MembershipScoreForSet2 WHEN MembershipScoreForSet2 = 0 THEN MembershipScoreForSet1 ELSE 1 END),
AlgebraicSum AS (MembershipScoreForSet1 + MembershipScoreForSet2) (MembershipScoreForSet1 * MembershipScoreForSet2),
BoundedSum AS (CASE WHEN MembershipScoreForSet1 * MembershipScoreForSet2 > 1 THEN 1 ELSE MembershipScoreForSet1 * MembershipScoreForSet2 END)
)

INSERT INTO @UnionTable
(PrimaryKey, Value, MembershipScoreForSet1, MembershipScoreForSet2)
SELECT CASE WHEN T1.PrimaryKey IS NULL THEN T2.PrimaryKey ELSE T1.PrimaryKey END,
CASE WHEN T1.Value IS NULL THEN T2.Value ELSE T1.Value END,
IsNull(T1.MembershipScore, 0), IsNull(T2.MembershipScore, 0)
FROM (SELECT PrimaryKey, Value, MembershipScore
       FROM @ZScoreTable WHERE Value != 142) AS T1
       FULL JOIN (SELECT PrimaryKey, Value, MembershipScore
              FROM @ModifiedZScoreTable WHERE Value != 147) AS T2
       ON T1.PrimaryKey = T2.PrimaryKey
       WHERE T1.Value IS NOT NULL OR T2.Value IS NOT NULL

SELECT *
FROM @UnionTable
ORDER BY DrasticUnion ASC

Figure 2: Sample Results from the Duchennes Table[ii]
fuzzy-union-basic-results

…………Basically, the four computed columns in Figure 1 implement the Standard Union, Drastic Union, Bounded Sum and Algebraic Sum by inverting the operations used for the corresponding fuzzy intersections.[iii] The only record which has a zero membership value for the first set also has a zero membership value for the second, hence it’s assigned a fuzzy union value of 0, so there a no Drastic Union values in between 0 and 1 in this case. A few weeks ago I had to order the fuzzy intersection results by the DrasticIntersection DESC in order to depict any difference in the return values of that more restrictive function, but this time I had to order by the DrasticUnion in the opposite direction to do the same. That is because the two operations are basically the inverse of each other, with the first being a maximum and the other a minimum.
…………The more complex T-conorms cited by Klir and Yuan and other fuzzy set experts also implement minimum bounds for membership in the resultset, just in a more specific manner than the four listed above.[iv] The code in Figure 3 is not much different from that of Figure 3 in the last article, except for the substitution of MIN operators for MAX, plugging in a reciprocal here and there and other such tweaks. Once again I used the VALUES trick posted by Jamie Thomson a few years back to perform row-by-row MAX and MIN operations, instead of using them to retrieve aggregates for the whole table.[v] The fuzzy unions are named after the same authors mentioned last week, with those by József Dombi, M.J. Frank, Horst Hamacher, Yandong Yu, Didier Dubois, Henry Prade. Ronald R.Yager, Michio Sugeno and Siegfried Weber segregated into the top query. For the sake of convenience and readability, I put the four T-conorms developed by Berthold Schweizer and Abe Sklar in the second query. A couple of differences include the @OneDividedByLambdaParameter and complicated CASE statement for the SchweizerAndSklar3 fuzzy union, which are designed to strain out some opportunities for Divide By Zero and Invalid Floating Point Operation errors. The WHERE clause in both SELECTs are designed to do the same through brute force, instead of cluttering the code, which is only for demonstration purposes. I suggest more strongly than ever that my calculations be rechecked if accuracy is important for your use cases, since there are few step-by-step examples with sample data in the literature for me to validate these on.

Figure 3: Code for Several Popular Types of T-Conorm Fuzzy Unions
SELECT PrimaryKey, Value, MembershipScoreForSet1, MembershipScoreForSet2,
1 ((1 MembershipScoreForSet1) * (1 MembershipScoreForSet2)) / (SELECT MAX(Value) FROM (VALUES (1 MembershipScoreForSet1),(1 MembershipScoreForSet2),(@AlphaParameter)) AS T1(Value)) AS DuboisPradeTConorm, — basically the same as the intersection except we add some 1 minuses — I adapted this trick from http://sqlblog.com/blogs/jamie_thomson/archive/2012/01/20/use-values-clause-to-get-the-maximum-value-from-some-columns-sql-server-t-sql.aspx
Power(1 + Power(Power(CAST(1 / MembershipScoreForSet1 AS float) 1, @LambdaParameter) + Power(CAST(1 / MembershipScoreForSet2 AS float) 1, @LambdaParameter), CAST(1 / @LambdaParameter AS float)), 1) AS DombiTConorm,
1 Log(1 + (((CAST(Power(@SParameter, 1 MembershipScoreForSet1) AS float) 1) * (Cast(Power(@SParameter, 1 MembershipScoreForSet2) AS float) 1)) / (@SParameter 1)), @SParameter) AS Frank1979TConorm, — basically the same as the intersection except we add some 1 minuses
MembershipScoreForSet1 + MembershipScoreForSet2 + (@RParameter 2) * (MembershipScoreForSet1 * MembershipScoreForSet2) / (@RParameter + (@RParameter 1) * (MembershipScoreForSet1 * MembershipScoreForSet2)) AS HamacherTConorm,
(SELECT MIN(Value) FROM (VALUES (1), (Power(Power(MembershipScoreForSet1, @OmegaParameter)  + Power(MembershipScoreForSet2, @OmegaParameter), 1 / CAST(@OmegaParameter AS float))))  AS T1(Value)) AS YagerTConorm,
(SELECT MIN(Value) FROM (VALUES (1), ((MembershipScoreForSet1 + MembershipScoreForSet2 + (@LambdaParameter * MembershipScoreForSet1 * MembershipScoreForSet2)))) AS T1(Value)) AS YuTConorm,
(SELECT MIN(Value) FROM (VALUES (1), ((MembershipScoreForSet1 + MembershipScoreForSet2 (@LambdaParameter / @OneDividedByLambdaParameter) * (MembershipScoreForSet1 * MembershipScoreForSet2)))) AS T1(Value)) AS WeberTNorm
FROM @UnionTable
WHERE MembershipScoreForSet1 != 0 AND MembershipScoreForSet2
!= 0

SELECT PrimaryKey, Value, MembershipScoreForSet1, MembershipScoreForSet2,
1 Power((SELECT MAX(Value) FROM (VALUES (0), (Power(MembershipScoreForSet1, @PParameter) + Power(MembershipScoreForSet2, @PParameter) 1)) AS T1(Value)), 1 / CAST(@PParameter AS float)) AS SchweizerSklarTConorm1, — just the inverse of the same intersection
Power(Power(MembershipScoreForSet1, @PParameter) + Power(MembershipScoreForSet2, @PParameter) (Power(MembershipScoreForSet1, @PParameter) * Power(MembershipScoreForSet2, @PParameter)), 1 / CAST(@PParameter AS float)) AS SchwiezerAndSklar2,
CASE WHEN 1 MembershipScoreForSet1 <= 0 AND 1 MembershipScoreForSet2 >= 0  THEN  1 EXP(-1 * Power(Power(1, @PParameter) + Power(Abs(Log(1 MembershipScoreForSet2)), @PParameter), 1 / CAST(@PParameter AS float)))
WHEN 1 MembershipScoreForSet1 >= 0 AND 1 MembershipScoreForSet2 <= 0  THEN  1 EXP(-1 * Power(Power(1, @PParameter) + Power(Abs(Log(1)), @PParameter), 1 / CAST(@PParameter AS float)))
ELSE 1 EXP(1 * Power(Power(Abs(Log(1 MembershipScoreForSet1)), @PParameter) + Power(Abs(Log(1 MembershipScoreForSet2)), @PParameter), 1 / CAST(@PParameter AS float))) END AS SchwiezerAndSklar3,
((1 MembershipScoreForSet1) * (1 MembershipScoreForSet2)) / Power(Power((1 MembershipScoreForSet1), @PParameter) + Power((1 MembershipScoreForSet2), @PParameter) (Power(1 MembershipScoreForSet1, @PParameter) * Power(1 MembershipScoreForSet2, @PParameter)), 1 / CAST(@PParameter AS float)) AS SchwiezerAndSklar4 –basically the same as the intersection except we add some 1 minuses,
FROM @UnionTable
WHERE MembershipScoreForSet1 != 0 AND MembershipScoreForSet2
!= 0

Figure 4: T-Conorm Results (click to enlarge)

t-conorm-results

…………So what’s the point of going to all of this trouble to devise fuzzy unions and intersections? In many use cases this level of computation wouldn’t be of much benefit, but for others we can leverage the different shapes of the function returns depicted in Figure 4 to model different types of uncertainty; in fact, we could combine them into new T-norms and T-conorms more suited to our purposes if necessary.[vi] Theoreticians prefer them not only because they mesh well with fuzzy intersections and unions well, but because they’re well-studied. Apparently, there are at least five main research papers[vii] that detail which use cases match the particular T-norms and T-conorms I’ve introduced in the last couple of articles, but unfortunately, I was unable to get my hands on any of them because of my limited budget.
…………I did manage to find some more recent papers on tangential matters like linguistic connectives, such as AND and OR, which were at least helpful in outlining the main principles for selecting the right ones.[viii] Such discussions usually lead in two directions, one of which is using AND/OR logical operations to aggregate opinions and factor in costs in Decision Theory. Klir and Yuan include a well-written section on how to perform such related tasks as weighting goals, actions and constraints; aggregating multiperson decisions and criteria over multiple stages; selecting fuzzy or crisp responses and using fuzzy input states; and implementing it all with state transition matrices, a topic I touched on briefly in A Rickety Stairway to SQL Server Data Mining, Algorithm 8: Sequence Clustering.[ix]
…………The other direction is towards constructing fuzzy logic arguments from the same building blocks, like NOT, AND and OR. Some surveys have found surprisingly subtle shades of meaning in ordinary linguistic terms like AND and OR, which factor into the selection of the right T-norms and T-conorms. Once I can get my hands on those sources I may be tack another article onto the back end of this series, but for now I will have to leave readers dangling. One other suggestion I might offer for matching use cases is to use the Empirical Distribution Functions (EDFs) I introduced in my tutorial series Goodness-of-Fit Testing with SQL Server, perhaps along with scoring systems like the Kolmogorov-Smirnov Test. In crude terms, these are akin to measuring how closely two histograms match each other, so plugging in the T-norms and T-conorms alongside an ideal curve ought to give us a good starting point. Linear regression is a related technique in the same vein. Some alternatives mentioned elsewhere by Klir and Yuan include using maximum likelihood estimation (MLE) methods, curve fitting and neural nets to derive appropriate parameter values for the T-norms and T-conforms.

The Bridge to Fuzzy Logic

                As is often the case in the fuzzy set field,  simplicity quickly gives way to complexity; just as there are many more ways of defining fuzzy complements, intersections and unions than their crisp counterparts, so too are there many types of fuzzy logic. I’ll omit any in-depth discussion of alternatives like Lukasiewicz logic, since they’re beyond the range of ordinary SQL Server use cases. I’ll limit my comments to pointing out that the translation of fuzzy complements, intersections and unions into fuzzy NOT, AND and OR statements is your ticket across that bridge, should you need to cross it. Also keep in mind that there may be trolls beneath that bridge. Some of the steps are well-traveled, like certain multi-valued logics, which are really a matter of common sense; even a child can grasp the concept of inserting answer like “maybe” between a Boolean range of simple “yes” and “no” answers. Other innovations in the field of fuzzy logic smell of intellectual shock value, which may help a scholar to publish rather than perish; watch out for some of the tell-tale signs, like the justification of contradiction as a form of knowledge or misuse of subjective evidence. In such cases, dreaming up a sly mathematical justification for madness may get a person tenure, but does nothing to advance the cause of human knowledge. A solid mathematical scaffolding is available for the less glitzy forms of fuzzy logic in case you need to climb it, but be aware that it requires some mental gymnastics to navigate.
…………At first glance, even the most reliable brands of fuzzy logic can lead to apparent violations of Aristotle’s three laws of logic, like the Law of Contradiction, the Law of the Excluded Middle and the Law of Identity.[x] How can a thing be yet not be at the same time, or exist in a ghost-like state between being and not-being, or be something other than what it is? To keep from having your mind blown, just remember from Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements how records have can have partial membership in both a fuzzy set and its complement. That’s an easily digestible example of how a thing “can be” and “not be” at the same time; the false discrepancy arises from applying ordinary crisp, Boolean-valued logic to a set that was defined on a fuzzy range of values. In the same way, many of the apparent violations of Aristotle’s principles[xi] only arise from similar misapplications of fuzzy to crisp sets and vice-versa; in fact, this is usually the hidden culprit when we encounter the kind of manic misuse of fuzziness I addressed in Implementing Fuzzy Sets in SQL Server, Part 1: Membership Functions and the Fuzzy Taxonomy. It would be a mistake to think that Aristotelian logic is incapable of incorporating fuzzy set theory – as many authors in the field seem to suggest – given that Aristotle himself mentioned in his Metaphysics that ambiguity of definitions only leads to the appearance of contradictions.[xii]

Putting Fuzzy Logic to Use

                Once we get over this bump in the road, it is possible to construct towers of logical propositions from fuzzy sets, using tried-and-true methods like generalized modus ponens, generalized modus tollens, generalized hypothetical syllogisms, the intersection/product syllogism, approximate reasoning, multi-valued and interval-valued reasoning, as well as making inferences from “quantified propositions.” [xiii] This includes the Kleene-Dienes, Reichenbach-Lukasiewicz and Goguen brands of fuzzy implications, which are a fuzzified version of regular syllogistic match that leads into the topic of fuzzy expert systems.[xiv] As Klir and Yuan point out, “To select an appropriate implication for approximate reasoning under each particular situation is a difficult problem. Although some theoretically supported guidelines are now available for some situations, we are still far from a general solution to this problem.”[xv] Fuzzy logic techniques can even enable us to make exotic apples-and-oranges comparisons of a kind rarely seen in ordinary speech, like Taner Bilgic and I.B. Turksen’s examples of “John is taller than he is clever,” “Inventory is higher than it is low,” “Coffee is at least as unhealthy as it is tasty,” and “Her last novel is more political than it is confessional.”[xvi] These are all examples of sentences with two disparate and imprecisely defined objects, each of which has membership functions measured on the same standard fuzzy scale of 0 to 1. The question becomes even more complicated when we use sentences with two objects and two subjects, like “I am taller then you are skinny,” which in fuzzy set terms means “I belong to the set of tall people more than you belong to the set of skinny people.”[xvii]
…………Fuzzy logic certainly has many legitimate uses, but I’ll omit any further discussion of the topic because of the obvious complexity involved. To that we can tack on the lack of direct application to use cases SQL Server users are liable to encounter anytime soon. Data mining and decision support are definitely within the realm of legitimate SQL Server use cases, but other types of “soft computing” like fuzzy expert systems represent advanced cases that are far beyond the scope of this series. For those with a need to know, Klir and Yuan include an entire chapter on how to assemble fuzzy expert systems out of individual components like knowledge bases, knowledge acquisition modules, explanatory interfaces, “blackboard interfaces” and inference engines, which are composed of chains of fuzzy IF-THEN clauses.[xviii] Of course, scores of books have been written on fuzzy expert systems since then, most of which seem to be just as heavy on arcane fuzzy math. Two articles from now, I’ll explain how a much simpler fuzzy math construct known as a fuzzy number can be easily implemented in SQL Server to address certain use cases that we encounter every day, particularly in modeling fuzzy linguistic statements like “about half” and “almost all” seen in Behavior-Driven Development (BDD) and user stories. First I’ll dispense with another topic that DBAs and data miners encounter on a daily basis, alongside set relations like unions, intersections: the all-pervasive LEFT JOIN operator. In the next installment, I’ll take a stab at explaining its striking absence in the fuzzy set li

[i] p. 77, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[ii] These figures were derived from the Duchennes muscular dystrophy dataset made publicly available by Vanderbilt University’s Department of Biostatistics, which I have been using for practice data for the last couple of tutorial series.

[iii] p. 78, Klir and Yuan.

[iv] IBID., p. 82.

[v] Thomson, Jamie, 2012, “Use VALUES Clause to Get the Maximum Value from Some Columns,” published Jan. 20, 2012 at the SQLBlog web address http://sqlblog.com/blogs/jamie_thomson/archive/2012/01/20/use-values-clause-to-get-the-maximum-value-from-some-columns-sql-server-t-sql.aspx

[vi] pp. 75, 83, Klir and Yuan.

[vii] IBID., p. 95. These are the five sources, for anyone interested in implementing these T-norms and T-conforms:

  • Thole, U.; Zimmerman, Hans J. and Zysno, P., 1979, “On the Suitability of Minimum and Product Operators for the Intersection of Fuzzy Sets,” pp.167-180 in Fuzzy Sets and Systems, April 1979. Vol. 2, No. 2.
  • Yager, Ronald R., 1979, “A Measurement-Informational Discussion of Fuzzy Union and Intersection,” pp. 189-200 in International Journal of Man-Machine Studies, March 1979. Vol. 11, No. 2.
  • Yager, Ronald R., 1982, “Some Procedures for Selecting Fuzzy Set-Theoretic Operations” pp. 235-242 in International Journal of General Systems, 1982. Vol. 8.
  • Zimmerman, Hans J. 1978, “Results of Empirical Studies in Fuzzy Set Theory,” pp. 303-312 in Applied General Systems Research, NATO Conference Series. Vol. 5.
  • Zimmerman, Hans J. and Zysno, P., 1980, “Latent Connectives in Human Decision Making, “pp. 37-51 in Fuzzy Sets and Systems, July 1980, Vol. 4, No. 1.

[viii] Among them were Alsina, C.; Trillas E. and Valverde, L. , 1983, “On Some Logical Connectives for Fuzzy Sets Theory,” pp. 15-26 in Journal of Mathematical Analysis and Applications. Vol. 93; Dubois, Didier and Prade, Henri, 1985, “A Review of Fuzzy Set Aggregation Connectives,” pp. 85-121 in Information Sciences, July-August, 1985. Vol. 36, Nos. 1-2.

[ix] pp. 391-405, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J. On this particular page, they’re extending the meaning of the term even further, to complex network topologies.

[x] Dawkins, Jamie, 2010, “Aristotle’s Three Laws of Thought,” published July 26, 2010 at The Year to the Present blog address https://jamescarterdawkins.wordpress.com/2010/07/26/aristotles-three-laws-of-thought/

[xi] For a handy table of these, see p. 8, Klir and Yuan.

[xii] I’ve read very little Aristotle first-hand, so I’ll have to rely on second-hand quotes from him, such as this one: “It is impossible, then, that ‘being a man’ should mean precisely ‘not being a man’, if ‘man’ not only signifies something about one subject but also has one significance. … And it will not be possible to be and not to be the same thing, except in virtue of an ambiguity, just as if one whom we call ‘man’, and others were to call ‘not-man’; but the point in question is not this, whether the same thing can at the same time be and not be a man in name, but whether it can be in fact. (Metaphysics4.4, W.D. Ross (trans.), GBWW 8, 525–526).” Dawkins, Jamie, 2010, “Aristotle’s Three Laws of Thought,” published July 26, 2010 at The Year to the Present blog address https://jamescarterdawkins.wordpress.com/2010/07/26/aristotles-three-laws-of-thought/

[xiii]  pp. 235, 239, Klir and Yuan.

[xiv] pp. 304, 306-307, Klir  and Yuan.

[xv] p. 312, Klir and Yuan.

[xvi] p. 16, Bilgic, Taner and Turksen, I.B. August 1994, “Measurement–Theoretic Justification of Connectives in Fuzzy Set Theory,” pp. 289–308 in Fuzzy Sets and Systems, January 1995. Vol. 76, No. 3.

[xvii] IBID., p. 17.

[xviii] pp. 302-324, Klir and Yuan.

Implementing Fuzzy Sets in SQL Server, Part 3: Using Fuzzy Intersections as AND Statements

By Steve Bolton

…………Whenever we assign set membership grades to records on a continuous scale, we open up a whole new world of possibilities for measuring uncertainty and modeling different types of imprecision. Two articles ago in this series of amateur self-tutorials, we saw how a whole taxonomy of fuzzy set types can be derived by applying different combinations of membership grades; these fuzzy set types allow us to model imprecise natural language terms that are more difficult to handle with traditional “crisp” sets. In the last installment, we saw how crisp sets have only one type of complement, but how the concept of graded membership naturally leads to a whole smorgasbord of fuzzy complements. These can be quite useful in developing measures of imprecision that are tailor-made for specific real-world problems. Binary relations between fuzzy sets can likewise be put to good use, especially in the development of other measures that are useful in uncertainty management programs, but reside at a higher level of complexity than complements since multiple sets are involved.
…………All of the set relations I’ll be introducing in the next three articles, like fuzzy unions and fuzzy joins, are defined on some subset of the crisp Cartesian product of the two sets. I’ve seen several discussions in literature of how to model fuzzy relations as cylindrical projections of the Cartesian product in multidimensional space, which has uses in maximizing nonspecificity[1] – a metric of uncertainty I’ll discuss later in this series – but for now I’ll keep the discussion as down-to-earth as I can. Fuzzy set relations are essentially equivalent to defining a fuzzy membership function on the set that results from an operation, which leads to myriad ways of assigning grades to the relation. [2] This in turn results in a dizzying array of choices that are not available with crisp relations, which empowers us to create new measures, as the risk of getting lost in the complexity of all the available choices. I can hardly pretend to be an authority on which choices are the best match for particular use cases, but I can at least introduce the topic in the hopes that I can learn as I go and help others to at least avoid my inevitable mistakes.

Four Types of Fuzzy Intersections

                In the case of intersections, we’re basically asking SQL Server, “Give me all of the records that are members of a both sets.” Partial membership introduces the need for some kind of math operation to assign a grade to the memberships of the resulting set, which essentially boils down to taking the minimum values of the membership grades of the two constituent sets. This is one of the unifying principles of fuzzy intersections, which differ only in the manner in which they select those minimums. The simplest ones are the Standard Intersection, which is merely minimum of either the first set or second set; the algebraic product, in which the membership grades of the corresponding rows are merely multiplied together; Bounded Difference, which is either 0 or the two set grades added together minus one, whichever is higher; and the Drastic Intersection, which is the membership value of the first set when the second equals one or vice-versa, but 0 otherwise.[3]
…………The code in Figure 1 implements all four of these, in a manner similar to the way I illustrated the use of membership functions and fuzzy complements in the last articles. The stored procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores is pulled in here for double-duty as the membership function of my first set, while the alternate version I used in Outlier Detection with SQL Server, part 2.2: Modified Z-Scores is used for the second. The GroupRank and OutlierCandidate columns were members of the original procedures and could not be left out of the INSERT EXEC statements, but can be safely ignored. The @RescalingMax, @RescalingMin and @RescalingRange values are used in conjunction with the ReversedZScore of the two table variables to normalize the stored procedure results within the industry-standard 0 to 1 range for fuzzy sets. We could of course define much simpler membership functions and do without the stored procedures, but I wanted to demonstrate the usefulness of fuzzy intersections in a more meaningful way. We could also do without these two table variables and the third which holds the intersection values, but it is more practical in this situation to calculate all of this once only, since we have to demonstrate many different types of intersections afterwards. Since the membership values of both sets often differ, using an INTERSECT operator might backfire by discarding those that don’t match. Hence, I’ve found it easier in cases like this to use an INNER JOIN. The WHERE clause that follows merely illustrates how we can create an intersection on sets that have different crisp members, regardless of their membership grades. The code is lengthy because of the need for the three table variables in this particular instance, but is not difficult at all, once you strip away all of this preparatory code and see that the meat and potatoes can be found in the @IntersectionTable’s four computed columns.

Figure 1: Sample Code for a Fuzzy Intersection on Two Different Types of Z-Scores
— CREATING THE FUZZY SET TEST DATA
DECLARE
@RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)

DECLARE   @ZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
) 

DECLARE   @ModifiedZScoreTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint,
OutlierCandidate bit
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

INSERT INTO @ModifiedZScoreTable
(PrimaryKey, Value, ZScore, GroupRank, OutlierCandidate)
EXEC   Calculations.ModifiedZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @OrderByCode = 8,
              @DecimalPrecision = ’38,32′

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ModifiedZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin 

UPDATE @ModifiedZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

— DERIVING THE INTERSECTION
DECLARE  @IntersectionTable table
(PrimaryKey sql_variant,
Value decimal(38,6),
MembershipScoreForSet1 decimal(38,6),
MembershipScoreForSet2 decimal(38,6),
StandardIntersection AS (CASE WHEN MembershipScoreForSet1 <= MembershipScoreForSet2 THEN MembershipScoreForSet1
ELSE MembershipScoreForSet2 END),
DrasticIntersection AS (CASE WHEN MembershipScoreForSet1 = 1 THEN MembershipScoreForSet2
WHEN MembershipScoreForSet2 = 1 THEN MembershipScoreForSet1
ELSE 0 END),
AlgebraicProduct AS MembershipScoreForSet1 * MembershipScoreForSet2,
BoundedDifference AS (CASE WHEN MembershipScoreForSet1 + MembershipScoreForSet2 1 > 0 THEN MembershipScoreForSet1 + MembershipScoreForSet2 1
ELSE
0 END)
) 

INSERT INTO @IntersectionTable
(PrimaryKey, Value, MembershipScoreForSet1,
MembershipScoreForSet2)
 

SELECT T1.PrimaryKey, T1.Value, T1.MembershipScore, T2.MembershipScore
FROM @ZScoreTable AS T1
       INNER JOIN @ModifiedZScoreTable AS T2
       ON T1.PrimaryKey = T2.PrimaryKey
       WHERE T1.Value != 142 AND T2.Value != 147

SELECT *
FROM @IntersectionTable
ORDER BY DrasticIntersection DESC

 Figure 2: Sample Results from the Duchennes Table
first-4-fuzzy-intersection-types

…………Note how quickly the Drastic Intersection drops off to 0, when we order by that particular column. The other three undulate in comparison, but do so at varying rates I don’t have screen space to fully illustrate here.  That is because there are many duplicates values for the Lactate Dehydrogenase column of the practice dataset I used for Figure 2, which is full of Duchennes muscular dystrophy data I downloaded from Vanderbilt University’s Department of Biostatistics a few tutorial series ago.
…………The different shapes of the function results can be leveraged to meet various use cases; for example, if the problem at hand calls for a rapid drop-off in intersection membership values, the Drastic Intersection might be ideal. As I’ve pointed out a few times in this series, fuzzy sets are much easier to understand once you recognize that they’re ideal for modeling imprecise natural language terms, including many found in everyday speech like “about” or “near,” not just broad high-tech concepts like “I/O bottleneck.” Membership grades should not be interpreted as probabilities unless they’re explicitly designed to measure uncertainty about stochastic measures. In this case, we’re deliberately assigning meanings to the two membership sets that are comparable as outlier detection methods, even though they measure aberrance in different ways; the purpose here is not to teach anything about Z-Scores, but to use a familiar concept to illustrate an unfamiliar one. The results here might be expressed in natural language as, “when both methods of Z-scores are taken into account, Lactate Dehydrogenase values of 198 are very close to the center point of the dataset. They have high membership value in the set of records that are NOT outliers.” As the values of each type of intersection move in the other direction, that becomes less true.

Upgrading to T-Norms

                These four intersection types are simple instances of a broader class of functions known as triangular norms (T-Norms) that have been shown to be equivalent to fuzzy intersections, according to various mathematical theorems. The Standard Intersection is basically identical to Gödel’s T-norm, while the algebraic product is equivalent to the Product T-norm and the Bounded Difference to the Lukasiewicz T-norm.[4] The functions in this family satisfy certain obligatory mathematical properties, like commutativity, associativity, monotonicity and sometimes also continuity (i.e. a continuous function).[5] An Archimedean T-norm has the additional property of subidempotency, while a strict Archimedean T-norm also has strict monotonicity.[6] I won’t bore you with the formal definitions of these things, let alone the mathematical proofs and equations, since it is essentially my function to explain this to readers who don’t have a mathematical background; there is a crying need in the data mining field for bridges between the underlying theories and the end users, in the same way that mechanics fill a gap between drivers and automotive engineers. I suggest that anyone who needs this extra detail consult my favorite reference, George J. Klir and Bo Yuan’s Fuzzy Sets and Fuzzy Logic: Theory and Applications, which I will be leaning on through much of this series. They also get into a discussion of concepts like decreasing and increasing generators and pseudo-inverses and how to use them in the construction of T-norms, but that is also beyond the bounds of this introduction.[7]
…………Just as we were able to pile different types of fuzziness on top of each other a few articles ago to create a whole fuzzy set taxonomy, so too can we combine T-norms to form an endless array of more complex T-norms.[8] I’ll leave to the imagination and needs of the reader and just give examples of how to code 11 of the more common T-norms. Figure 3 contains additional code that can be tacked on to Figure 1 to calculate four types of T-norms developed by Berthold Schweizer and Abe Sklar and others likewise developed in the ‘70s and ‘80s by József Dombi, M.J. Frank, Horst Hamacher, Yandong Yu, Didier Dubois and Henry Prade.[9] Ronald R. Yager, the author of the Yager Complement mentioned in the last article, also developed a corresponding fuzzy intersection that likewise takes a @LambdaParameter variable as an input, while Michio Sugeno, the creator of the Sugeno Complement, devise a T-norm that was independently rediscovered by Siegfried Weber. The declaration section at the top also includes parameters also required by some of the other authors. These parameters allow us to vary the shape of each T-norm to suit particular use cases, just as we did with the parameters of the Sugeno and Yager Complements in the last article.

Figure 3: Code for Several Popular Types of T-Norm Fuzzy Intersections
DECLARE @AlphaParameter float = 1, — 0 to 1
@PParameter float = 1, — <> 0 for SchweizerandSklar 1, > 0 otherwise
@RParameter float = 1,
@SParameter float = 0.7, — must be > 0 but not =1 for Frank1979
@LambdaParameter float = 1, — > -1
@OmegaParameter float = 1, — ω must be > 0
@Pi decimal(38,37) = 3.1415926535897932384626433832795028841 — from http://www.eveandersson.com/pi/digits/100

SELECT PrimaryKey, Value, MembershipScoreForSet1, MembershipScoreForSet2,
(MembershipScoreForSet1 * MembershipScoreForSet2) / (SELECT MAX(Value) FROM (VALUES (MembershipScoreForSet1),(MembershipScoreForSet2),(@AlphaParameter)) AS T1(Value)) AS DuboisPradeTNorm,
Power(1 + Power(Power(CAST(1 / MembershipScoreForSet1 AS float) 1, @LambdaParameter) + Power(CAST(1 / MembershipScoreForSet2 AS float) 1, @LambdaParameter), CAST(1 / @LambdaParameter AS float)), 1) AS DombiTNorm,
Log(1 + (((CAST(Power(@SParameter, MembershipScoreForSet1) AS float) 1) * (Cast(Power(@SParameter, MembershipScoreForSet2) AS float) 1)) / (@SParameter 1)), @SParameter) AS Frank1979TNorm,
(MembershipScoreForSet1 * MembershipScoreForSet2) / (@RParameter + (1 @RParameter) * (MembershipScoreForSet1 + MembershipScoreForSet2 (MembershipScoreForSet1 * MembershipScoreForSet2))) AS HamacherTNorm,
(SELECT MAX(Value) FROM (VALUES (0), ((MembershipScoreForSet1 + MembershipScoreForSet2 + (@LambdaParameter * MembershipScoreForSet1 * MembershipScoreForSet2) 1) / CAST(1 + @LambdaParameter AS float))) AS T1(Value)) AS WeberTNorm,
(SELECT MAX(Value) FROM (VALUES (0), ((1 + @LambdaParameter) * (MembershipScoreForSet1 + MembershipScoreForSet2 1) (@LambdaParameter * MembershipScoreForSet1 * MembershipScoreForSet2))) AS T1(Value)) AS YuNorm,
1 (SELECT MIN(Value) FROM (VALUES (1), (Power(Power(1 MembershipScoreForSet1, @OmegaParameter)  + Power(1 MembershipScoreForSet2, @OmegaParameter), 1 / CAST(@OmegaParameter AS float))))  AS T1(Value)) AS YagerTNorm
FROM @IntersectionTable
WHERE MembershipScoreForSet1 != 0 AND MembershipScoreForSet2 != 0 

SELECT PrimaryKey, Value, MembershipScoreForSet1, MembershipScoreForSet2,
Power((SELECT MAX(Value) FROM (VALUES (0), (Power(MembershipScoreForSet1, @PParameter) + Power(MembershipScoreForSet2, @PParameter) 1)) AS T1(Value)), 1 / CAST(@PParameter AS float)) AS SchweizerSklarTNorm1,
1 Power(Power(1 MembershipScoreForSet1, @PParameter) + Power(1 MembershipScoreForSet2, @PParameter) (Power(1 MembershipScoreForSet1, @PParameter) * Power(1 MembershipScoreForSet2, @PParameter)), 1 / CAST(@PParameter AS float)) AS SchwiezerAndSklar2,
EXP(1 * Power(Power(Abs(Log(MembershipScoreForSet1)), @PParameter) + Power(Abs(Log(MembershipScoreForSet2)), @PParameter), 1 / CAST(@PParameter AS float))) AS SchwiezerAndSklar3,
(MembershipScoreForSet1 * MembershipScoreForSet2) / Power(Power(MembershipScoreForSet1, @PParameter) + Power(MembershipScoreForSet2, @PParameter) (Power(MembershipScoreForSet1, @PParameter) * Power(MembershipScoreForSet2, @PParameter)), 1 / CAST(@PParameter AS float)) AS SchwiezerAndSklar4
FROM @IntersectionTable
WHERE MembershipScoreForSet1 != 0 AND MembershipScoreForSet2 != 0

Figure 4: T-Norm Results (click to enlarge)
t-norm-results

…………There’s an awful lot of math going on here but very few examples in the literature with sample data to verify the results on, so be on the lookout for undiscovered errors in this inexpert attempt to code these more advanced T-norms. To simplify things, I decided to add the brute force WHERE clause in order to prevent divide-by-zero errors, rather than including validation logic in each separate calculation. To make the code a little easier on the eyes, I split the Schweizer and Sklar T-norms into a separate query and put each on a line of its own, which probably won’t render correctly in WordPress. In most of them, I adopted a T-SQL trick posted by Jamie Thomson at SQLBlog.com which allows you to select the MAX or MIN for a set of values at each row, without having to do an aggregate operation on the whole table.[10]  The Yager T-Norm uses a MIN operation, but the outer subtraction turns it into a reciprocal so it’s actually a MAX in the end.
…………Note that once again, the each function follows a different pattern. Some of them overlap at times, like the Yager T-norm, which equals the Bounded Difference when the @OmegaParameter is set to 1; as it approaches infinity (or the best can approximate at the limits of SQL Server’s float, numeric and decimal data types) it joins up with the Standard Intersection.[11] Our goal is to find a T-norm that produces a pattern that closely matches the imprecision we’re trying to model. It’s a mouthful when put that way; suffice it to say that once again, it is easier to grasp all of this if we get away from computer code, heavy math and big words and look at fuzzy sets in terms of natural language. As Klir and Yuan put it, our selection of the right fuzzy set operation is dependent on the meaning of particular linguistic terms”

                “We have to determine, in each particular application, which of the available operations on fuzzy sets best represent the intended operations on the corresponding linguistic terms. It is now experimentally well established that the various connectives of linguistic terms, such as and, or, not, and if-then, have different meanings in different contexts, We have to determine which of the T-norms, T-conorms, complements or other operations on fuzzy sets best approximate the intended meanings of the connectives. The various linguistic hedges are context dependent as well.”[12]

…………In Behavior-Driven Development (BDD) and user stories, it might be natural to flag “and” statements as candidates for fuzzy intersections. Once we’ve identified a suitable T-norm, parameter estimation techniques can be used to take educated guesses on where to set the corresponding input variables (I suppose this would include maximum likelihood estimation, which I am currently stuck on).[13] I’ll elaborate on this selection process further in the next article, which is likely to be a good deal shorter because fuzzy unions are implemented in almost the same way.
…………The same data modeling principles apply to both fuzzy unions and fuzzy intersections, so I’ll dispense with the topic here. If we go to all the trouble of calculating these fuzzy intersection or union values, we might want to persist them instead of incurring their performance costs repeatedly by doing them on the fly. In the last article, I introduced the issue of storing the parameters for Sugeno and Yager Complements in such instances, which leads to some intriguing data modeling issues not often in seen in ordinary crisp sets, where there are no choices of functions to apply to set relations and therefore no parameters to store. I listed a couple of possible solutions, like defining indexed views with the parameter values baked into them or creating a ComplementTable to store the parameter values alongside their function type, which might span schemas or whole databases and include text keys linking them to the table and views the complements are applied to. We could go one step further and also store the complement values in one or more dependent tables linked to the ComplementTable by a foreign key, but if we want to enforce a FOREIGN KEY REFERENCES statement on the second key we’d need one such table for each fuzzy object we’re getting records from. The issue becomes more complicated with binary relations, since we’d need another text column to identify the second object in the IntersectionTable and a third key foreign key if we’re also storing the row-by-row values returned for each parameter setting. The latter would amount to a many-to-many join across two fuzzy sets that are also joined to the IntersectionTable holding the parameters and T-norm type. Thankfully, the modeling options don’t get any more complex than this with other binary relations, such as fuzzy unions, the topic of next week’s article. If you can wade through this tutorial, the next should be a breeze, considering that the counterparts of T-norms were developed by the same theorists and are implemented with similar formulas. The logic behind them isn’t much different either, except that we’re taking maximums rather than minimums at each row. This means we can reuse much of the sample code I’ve posted here and just swap out a few math operations, which is a piece of cake.

[1] IBID., p. 122.

[2] p. 120, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[3] IBID., p. 63.

[4] See the Scholarpedia.com webpage “Triangular Norms and Conorms” at thehttp://www.scholarpedia.org/article/Triangular_norms_and_conorms#Fuzzy_intersections_and_unions

[5] IBID., pp. 62-63.

[6] IBID., p. 63.

[7] IBID., p. 68.

[8] IBID., p. 70.

[9] I implemented the formulas I found in a table at p. 76, Klir andYuan, which is widely available elsewhere. I had to look up some of the first names at the Wikipedia webpage “Construction of T-Norms” at http://en.wikipedia.org/wiki/Construction_of_t-norms

[10] Thomson, Jamie, 2012, “Use VALUES Clause to Get the Maximum Value from Some Columns,” published Jan. 20, 2012 at the SQLBlog web address http://sqlblog.com/blogs/jamie_thomson/archive/2012/01/20/use-values-clause-to-get-the-maximum-value-from-some-columns-sql-server-t-sql.aspx

[11] pp. 70-71, Klir and Yuan.

[12] IBID., pp. 280-281.

[13] IBID., p. 94.

Implementing Fuzzy Sets in SQL Server, Part 2: Measuring Imprecision with Fuzzy Complements

By Steve Bolton

…………Taking the dive into fuzzy sets immediately elicits the obvious question: just how fuzzy is the data we’re operating on? As discussed in the first two installment of this amateur series of self-tutorials, the raison d’etre of fuzzy set theory is to model imprecise data that is typically expressible in nebulous terms in everyday speech; it addresses a class of data that is represented in ordinals (i.e. categories with orders) but for which it would be helpful to at least approximate with a continuous number scale. It essentially amounts to taking the information left over after we’ve defined ordinary (i.e. “crisp’) sets and putting it to at least some good use, just as recycled aluminum and cork shavings can be used in less lucrative products like paint and corkboard. Of course, once we’ve used some of the procedures outlined in the last two articles to define them using graded membership functions, it’s natural to try to quantify the imprecision we’re modeling in some way. It at least sets a sort of outer boundary to the uncertainty we’re modeling where none existed previously, even if that barrier is not completely accurate; it is the first baby step towards a program of Uncertainty Management. The ability to apply at least some yardstick to previously unquantifiable data is useful for the same reason that the unknown is invariably certain to scare readers of horror fiction. As Stephen King puts it so well,

                “You approach the door in the old, deserted house, and you hear something scratching at it. The audience holds its breath along with the protagonist as she/he (more often she) approaches that door. The protagonist throws it open, and there is a ten-foot-tall bug. The audience screams, but this particular scream has an oddly relieved sound to it. ‘A bug ten feet tall is pretty horrible,’ the audience thinks, ‘but I can deal with a ten-foot-tall bug. I was afraid it might be a hundred feet tall.’”[I]

…………A good horror writer will of course maximize the unknown in order to scare their customers, but the object of relational database management and data mining is to minimize it, to reassure end users. We know that bugs of an entirely different kind may arise whenever uncertainty is innately woven into the tapestry of the problems software engineers, DBAs and analysts seek to solve, but these underused precision modeling tools include a yardstick with which to measure our gremlins: the fuzzy complement, which allows us to assign a ballpark figure to the amount of fuzz involved. Complements are also one of the simplest types of set relations, which can serve as a bridge into discussion of more complicated topics like fuzzy intersections, unions and joins between two sets.
…………Another single-set operation that is useful in fuzzy set theory is a partitioning method known as the alpha cut, but I’ll delay discussion of them since they’re more applicable to higher-level topics like possibility theory. Fuzzy complements and α-cuts both lead to peculiar nested subsets, albeit in different ways. In the latter case, it is because they assign to records to multiple, progressively larger subsets within the original fuzzy set, in tandem with the decrease in membership grade; in the former, it is because the various types encompass ever-wider sets of records, depending on what fuzzy complement we choose to apply and how generalized its definition is. As discussed in the taxonomy of fuzzy set types in the last article, the simplicity of these techniques quickly give rise to complexity because there are so many ways of combining these concepts; just as the difficulty in using membership functions consists mainly in the fact that there are so many we can choose from, so too does the challenge with fuzzy complements consist mostly of choosing the right type. This complexity is not an issue in ordinary crisp sets, which can be viewed as highly specific cases of fuzzy set theory. There’s no need to specify their membership functions in code because they either do or do not belong, which amounts to Boolean membership grades of 0 and 1 if we put it in fuzzy set terms. Likewise, there is only one type of complement in ordinary crisp sets, i.e. the sets of all records that don’t belong to them.

Involutive and Non-Involutive Complements

…………The whole point of membership functions is to define “belonging” in terms of a grade on a continuous scale, so by logical necessity the concept of “not belonging” that defines complements also has to be a assigned a grade on a continuous scale. The most obvious way to do this is to simply subtract a membership score from 1, assuming that we’re defining our grades on a range of 0 to 1, as is almost always the case in fuzzy set theory. This is known as the standard complement, but we’re not limited to that single choice. The injection of membership grades into set theory raises the possibility of performing further operations to alter them when we perform the inversion operation, which opens up the door to a whole class of new complements that aren’t possible in crisp sets. In fact, we’re only limited by our imagination and the restrictions that the inversion functions must be strictly increasing or strictly decreasing, i.e. the values must move in only one direction, with no repeats. A violation of this principle would appear in a Reporting Services line chart as a flat segment.
…………It is possible to define a really broad set of “non-involutive” fuzzy complements if we don’t require that the inversion function return us to the original value when applied twice[ii], but I won’t expend much time on these “since involutive fuzzy complements play by far the most important role in practical applications.”[iii] Fuzzy Sets and Fuzzy Logic: Theory and Applications, the classic resource George J. Klir and Bo Yuan I’ll be depending upon for much of the math formulas in this series, contains an elegant illustration of how the standard fuzzy complement is contained within the set of involutive complements, which is in turn nested within the set of non-involutives, which are likewise a subset of all possible continuous complements and then to the set of all fuzzy complements, without any restrictions.[iv]

Coding the Sugeno and Yager Complements

                To demonstrate how these types of complements lead to different values, I calculated the complements in Figure 1 on a column in the Duchennes muscular dystrophy data I downloaded a few tutorial series ago from  Vanderbilt University’s Department of Biostatistics. As in my last article, I’ll recycle the stored procedure I wrote for Outlier Detection with SQL Server, part 2.1: Z-Scores for double-duty as my membership function, but keep in mind that fuzzy sets have nothing to do with Z-Scores or probability in general – unless you specifically add that meaning, as I have done here to essentially turn it into a fuzzy outlier detection method. To illustrate the possibility of combining membership functions in myriad ways, I retrieved Z-Scores for two columns and multiplied them together, but this sample code is much simpler since we’re only acting on one column. The results of the stored procedure are plugged into a table variable identical to the one we used last time around, except for the addition of four columns to hold the complements we’re experimenting with (the GroupRank is part of the original procedure and required for the INSERT EXEC operation, but can be safely ignored). The same @RescalingMax, @RescalingMin and @RescalingRange values are used in conjunction with the ReversedZScore column to normalize the MembershipScores within ranges of 0 to 1.
…………The key difference is the addition of the @Pi, @LambdaParameter and @OmegaParameter variables, which are needed in the second UPDATE to calculate the non-involutive example[v] and the Sugeno and Yager Complements, which are the two most commonly mentioned ones in the literature. These two are useful when the problem at hand calls for bowing the number line between 0 and 1 in various arcs. When the @LambdaParameter is set to 0, the Sugeno Complement ought to be equal to the standard complement, while the Yager Complement ought to equal the same when its @OmegaParameter is set to 1. As the @LambdaParameter surpasses 0 the line bows towards lower values of both the complement and the membership grade, but at negative values it bows towards higher values of both. The Yager Complement moves in the opposite direction as its parameter is changed, but in different proportions.[vi] Although I have yet to do so in practice, these properties can be leveraged to fit different use cases – perhaps in curve or distribution fitting, sensitivity adjustments or instances where an object carries less imprecision than its negation. In essence, the act of parameterization turns them both into whole families of functions that can be adapted as needed.

Figure 1: Four Examples of Fuzzy Complements
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE @LambdaParameter float = 1,
@OmegaParameter float = 1, — ω
@Pi decimal(38,37) = 3.1415926535897932384626433832795028841 — from http://www.eveandersson.com/pi/digits/100

DECLARE  @ZScoreTable table
(ID bigint IDENTITY (1,1),
PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
RegularComplement float,
NonInvolutiveComplementExample float,
SugenoComplement float,
YagerComplement float,
GroupRank bigint
)

INSERT INTO @ZScoreTable
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

UPDATE @ZScoreTable
SET RegularComplement = 1 MembershipScore,
NonInvolutiveComplementExample = 0.5 * (1 + Cos(@Pi * MembershipSCore)),
SugenoComplement = (1 MembershipScore) / CAST(1 + (@LambdaParameter * MembershipScore) AS float),
YagerComplement = Power(1 Power(MembershipScore, @OmegaParameter), 1 / CAST(@OmegaParameter AS float))
FROM @ZScoreTable 

— RESULTS WITH EQUILIBRIA
DECLARE @StandardEquilibrium AS float = 0.5
DECLARE @SugenoEquilibrium AS float = (Power(1 + @LambdaParameter, 0.5) 1) / @LambdaParameter 

SELECT MembershipScore, RegularComplement, NonInvolutiveComplementExample, SugenoComplement, YagerComplement,
CASE WHEN MembershipScore RegularComplement > @StandardEquilibrium THEN 1 ELSE 0 END AS IsGreaterThanStandardEquilibrium,
CASE WHEN MembershipScore RegularComplement > @SugenoEquilibrium THEN 1 ELSE 0 END AS IsGreaterThanSugenoEquilibrium
FROM (SELECT MembershipScore, RegularComplement, NonInvolutiveComplementExample, SugenoComplement, YagerComplement
       FROM @ZScoreTable
       WHERE MembershipScore IS NOT NULL) AS T1
ORDER BY MembershipScore ASC

SELECT @StandardEquilibrium AS StandardEquilibrium, @SugenoEquilibrium AS SugenoEquilibrium

— MEASURE OF FUZZINESS
DECLARE @Count  bigint
SELECT @Count=Count(*)
FROM @ZScoreTable

SELECT SUM(ABS(MembershipScore YagerComplement)) / @Count AS SimpleMeasureOfFuzziness
FROM @ZScoreTable
WHERE MembershipScore IS NOT NULL

Figure 2: Sample Results from the Duchennes Table (click to enlarge)
Fuzzy Complement Results

…………The concept of fuzzy complements also gives rise to two properties which are mentioned in Klir and Yuan, but for which I have yet to find a practical use: equilibria and dual points. An equilibrium represents the single value in any set where the subtraction of the complement equals zero, which is 0.5 in a standard complement but can vary from that in more advanced types of complements. The dual point is apparently an extension of equilibria to non-involutives, which are seldom used as mentioned above – so I’ll seldom mention them in this series.[vii] The sample data in Figure 1 did not contain any records where the complements values hit the equilibrium points, so I tried to illustrate the concept by declaring two variables to identify the standard and Sugeno equilibria (which requires plugging the @LambdaParameter into a simple math formula) and adding two CASE conditions to identify whether the corresponding complement was above or below its equilibrium point.
…………The end of the procedure contains an example of the most fundamental measure of fuzziness I could think of. As Klir and Yuan put it, “Another way of measuring fuzziness, which seems more practical as well as more general, is to view the fuzziness of a set in terms of the lack of distinction between the set and its complement. Indeed, it is precisely the lack of distinction between sets and their complements that distinguishes fuzzy sets from crisp sets. The less a set differs from its complement, the fuzzier it is.”[viii] In a later discussion on evidence theory, they give some clear examples of exactly what fuzziness measures:

                “Observing attributes such as ‘a type of cloud formation’ in meteorology, ‘a characteristic posture of an animal’ in ethnology, or ‘a degree of defect in a tree’ in forestry clearly involves situations in which it is not practical to draw sharp boundaries; such observations or measurements are inherently fuzzy and consequently, their connection with the concept of the fuzzy set is suggestive. In most measurement in physics, on the other hand, such as the measurement of length, weight, electric current, or light intensity, we define classes with sharp boundaries.”[ix]

…………They go on to provide an example which uses the Hamming Distance to measure the difference between each membership grade and its complement, but I chose to go the even simpler route of devising a mere ratio of the differences to the overall count. Later in the series I’ll explain how partial set membership introduces many alternative definitions of cardinality and other statistics, but to stay on point I used a regular “crisp” count here. Note how we’ve added yet another layer of complexity here: we’re no longer dependent just on varied definitions of complements, but also of the measurement we choose to use. This leads to greater potential of getting lost in an endless sea of choices, but it can also empower us by letting us choose a distance measure that is best suited to the kind of fuzziness we’re trying to measure. It might be useful in one instance to use a simple Euclidean Distance, in another we might want to work in a Küllback-Leibler Divergence and in another, it might pay to figure out what the heck an Isaura-Saito Distance is and code it. In Figure 2, we can see that the Yager Complement leads to a fairly high measure of imprecision in this case, at 0.736238435643564, which would change in tandem with our parameter choices. Perhaps we don’t know what the ideal parameter choices would be, but this at least gets us on the board with a measure of imprecision where we had none whatsoever before.
…………In the next couple of articles I’ll show how these concepts can be extended in equally useful ways to fuzzy intersections, unions and common join types. These topics are a bit more complicated, since we’re using binary relations rather than simple reflexive ones like complements. If we want to store the values of set operations of this kind, the options are easier to choose from in the case of complements: 1) using indexed views with the parameter values baked into them, or 2) creating special tables to hold the values of the Sugeno, Yager, etc. parameters if we need to reuse them in other SQL statements. In the second modeling option, it might pay to store all of the complement parameters in a single table, possibly for a whole schema or database, with some sort of text key identifying the table/view and column they’re associated with. If we wanted to also store the row values derived from the Sugeno and Yager functions for later retrieval, we might create a ComplementValueTable with one foreign key on our ComplementTable and another leading to the primary key of whatever table the row belongs to. We wouldn’t be able to leverage a FOREIGN KEY REFERENCES to enforce these relationships unless we used multiple ComplementValueTable, each associated with a different table with fuzzy data. The issue of fuzzy set relation storage takes us in strange data modeling directions because there are so many different options with unions, joins and the like that simply aren’t available to us with crisp sets. We can at least rule out many-to-many join on a ComplementTable of any kind, because complements are a reflexive relationship. That is no longer true when we’re speaking of intersections and other binary relations that add another layer of both complexity and problem-solving potential, as we’ll see in two weeks.

[i] p. 114, King, Stephen, 1981, Stephen King’s Danse Macabre. Everest House: New York. King said he was basically paraphrasing an idea expressed to him by author William F. Nolan at the 1979 World Fantasy Convention.

[ii] For the definition of the term, I consulted the Wikipedia page”Involution (Mathematics)” at  http://en.wikipedia.org/wiki/Involution_(mathematics).

[iii] p . 59, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[iv] IBID., p. 57.

[v] IBID. Adapted from a cosine-based formula on p. 54.

[vi] IBID., pp. 55-57.

[vii] IBID., pp. 57-59.

[viii] IBID., p. 255.

[ix] IBID., p. 179.

Implementing Fuzzy Sets in SQL Server, Part 1: Membership Functions and the Fuzzy Taxonomy

By Steve Bolton

…………In the first installment of this amateur self-tutorial series on applying fuzzy set theory to SQL Server databases, I discussed how neatly it dovetails with Behavior-Driven Development (BDD) principles and user stories. This is another compelling reason to take notice of fuzzy sets, beyond the advantages of using a set-based language like T-SQL to implement them, which will become obvious as this series progresses. There aren’t any taxing mental gymnastics involved in flagging imprecision in natural language statements like “hot,” “cloudy” or “wide,” which is strikingly similar to the way user stories are handled in BDD. What fuzzy sets bring to the table is the ability to handle imprecise data that resides in the no-man’s land between ordinal and continuous Content types. In addition to flagging imprecision in natural language and domain knowledge that is difficult to pin down, it may be helpful to look for attributes which represent categories that are ranked in some way (which sets them apart from nominal data, which is not ordered on any scale) but which it would be beneficial to express on a continuous numerical scale, even at the cost of inexactness. Thankfully, mathematicians have already hashed out a whole framework for modeling this notoriously tricky class of data, even though it is as underused as the SQL Server Data Mining (SSDM) components I tried to publicize in a previous mistutorial series. It is also fortunate that we already have an ideal tool to implement it with in T-SQL, which can already handle most of the mathematical formulas devised over the last few decades. As I’ll demonstrate in this article, it only takes a few minutes to implement simple membership functions that grade records based on how much they belong to a particular set. It is only when we begin combining different types of imprecision together and assigning more nuanced interpretations to the grading systems that complexity quickly arises and the math becomes challenging. Although I’m still learning the topic as I go – I find it is much easier to absorb this kind of material by writing about it – I hope to reduce the challenge involved by taking a stab at explaining it, which will at a minimum at least help readers avoid repeating my inevitable mistakes.
…………The first challenge to overcome is intimidation, because the underlying concepts don’t even require a college education to grasp; in fact, some DBAs have probably already worked with forerunners of fuzzy sets unwittingly, on occasions where they’ve added columns that rate a row’s inclusion in a particular set. It doesn’t take much mental juggling to start thinking explicitly about such attributes as measures of membership in a particular set. Perhaps the simplest forms of membership functions are single columns filled with data that has been assigned that kind of meaning, which can even be derived from such sources as subjective grades assigned by end users in exactly the same manner as movie or restaurant ratings. The data can even be permanently static. At the next level of complexity, we could of course store such data in the form of computed columns, regardless of whether it is read-only or not.
…………A couple of really simple restrictions are needed to bring this kind of data into line with fuzzy set theory though. First, since the whole object is to treat ordinal data as if it were continuous, we’d normally use T-SQL data types like float, numeric and decimal – which are the closest we can get, considering that our finite computers can’t truly handle infinitesimal scales. Furthermore, it is probably wise to stick with the convention of using a scale between 0 and 1, since this enables us to integrate it seamlessly with evidence theory, stochastics, decision theory, control theory and neural net weights, all of which are also typically bounded in the same range or quite similar ones; some of the theoretical resources I consulted mentioned in an offhand way that it is possible to use other scales, but I haven’t seen a single instance of it yet in the literature. Ordinal categories are often modeled in SQL Server in text data types like nvarchar, tinyint codes or some type of foreign key, which might have to be retained alongside the membership function storage column; in other instances, our membership function may be scoring on the basis of several attributes in a table or view, or perhaps all of them. Of course, in many use cases we won’t need to store the membership function value at all, so it will have to be calculated on the fly. If we’re simply storing a subjective rating or whatever, we might only need some sort of interface to allow end users to enter their own numbers (on a continuous scale of 0 to 1), in which case there is no need for a membership function per se. If a table or view participates in different types of fuzzy sets, it may be necessary to add more of these membership columns for each of them, unless you want to calculate the values as you go. Simply apply the usual rules of data modeling and principles of performance maximization to determine the strategies that fit your use cases best.

Selecting Membership Functions

                That is all kid stuff for most DBAs. The challenges begin when we try to identify which membership function would be ideal for the use cases at hand. Since the questions being asked of the data vary from one problem to the next, I cannot possibly answer that. I suppose that you could say the general rule of thumb with membership functions is that the sky’s the limit, as long as we stay at an altitude between 0 and 1. Later in this series I’ll demonstrate how to use particular classes of functions called T-norms and T-conorms, since various mathematical theorems demonstrate that they’re ideal for implementing unions and intersections, but even in these cases, there are so many available to us that the difficulty consists chiefly in selecting an appropriate match to the problem you’re trying to solve. There might be more detailed guidelines available from more recent sources, but my favorite reference for the math formulas, George J. Klir and Bo Yuan’s classic Fuzzy sets and Fuzzy Logic: Theory and Applications, provides some suggestions. For example, membership values can be derived from sample data through LaGrange interpolation and two methods I have used before, least-squares curve fitting and neural networks.[1] They also discuss how to aggregate the opinions of multiple experts using both direct and indirect methods of collection, in order to ascertain the meaning of fuzzy language terms. The specifics they provide get kind of involved, but it is once again not at all difficult to implement the premises in a basic way; a development team could, for example, reach a definition of the inherently fuzzy term “performance” by scoring their opinions, then weighting them by the authority of their positions.[2] The trick is to pick a mathematical operation that pools them altogether into a single value that stays on a scale of 0 to 1, while still capturing the meaning in a way that is relevant to the problem at hand.
…………Klir and Yuan refer to this as an application of the newborn field of “knowledge engineering,”[3] which has obvious connections to expert systems. Since fuzzy set theory is still a wide-open field there’s a lot of latitude for inventing your own functions; there might be an optimal function that matches the problem at hand, but no one may have discovered it yet. In situations like these, my first choice would be neural nets, since I saw spectacular evidence long ago of how they can be ideal for modeling unknown functions (which pretty much sparked my interest in data mining). Before trying one of these advanced approaches, however, it might be wise to think hard about what mathematical properties you require of your outputs and then consult a calculus book or other math reference to try to find a matching function. While trying to teach myself calculus all over again recently, I was reintroduced to the whole smorgasbord of properties that distinguish mathematical functions from each other, like differentiability, integrability, monotonicity, analycity, concavity, subadditity, superadditivity, discontinuity, splines, super- and subidempotence and the like. You’ll encounter these terms on every other page in fuzzy set math references, which can be differentiated (pun intended?) into broad categories like function magnitude, result, shape and mapping properties. One thing I can help with is to caution that it’s often difficult or even impossible to implement ones (like the popular gamma function) which require calculations of permutations or combinations. It doesn’t matter whether you’re talking about T-SQL, Visual Basic, C# or some computer language implemented outside of the Microsoft ecosystem: it only takes very small input values before you reach the boundaries of the highest data types. This renders certain otherwise useful data mining and statistical algorithms essentially useless in the era of Big Data. An exclamation point in a math formula ought to elicit a groan, because the highest value you might be able to plug into a factorial function in SQL Server is about 170.

A Trivial Example with Two Membership Functions

                I’ll provide an example here of moderate difficulty, in between the two extremes of advanced techniques like least squares (or God forbid, the gamma function and its relatives) on the one hand and cheesy screenshots of an ordinary table that just happens to have a float column scored between 0 and 1 on the other. As we’ll see in the next few tutorials on fuzzy complements, unions, intersections and the like, when calculating set memberships on the fly we usually end up using a lot of CASE, BETWEEN and MIN/MAX statements in T-SQL, but that won’t be the case in the example below because the values are derived from a stored procedure and stored in two temporary tables. To demonstrate how seamlessly fuzzy set techniques can be integrated with standard outlier detection techniques, I’ll recycle the code from my old tutorial Outlier Detection with SQL Server, part 2.1: Z-Scores and use it as my membership function.
…………There’s a lot of code in Figure 1, but it’s really easy to follow, since all we’re doing is running the Z-Scores procedure on a dataset on the Duchennes form of muscular dystrophy I downloaded from  Vanderbilt University’s Department of Biostatistics a couple of tutorial series ago, which now occupies about 9 kilobytes of space in a sham DataMiningProjects database. There’s probably a more efficient way of going about this, but the results are stored in a table variable and the @RescalingMax, @RescalingMin and @RescalingRange variables and the ReversedZScore column are then used to normalize the Z-Score on a range of 0 to 1 (the GroupRank column was needed for the stored procedure definition in the original Z-Scores tutorial, but can be ignored in this context). To illustrate how we can combine fuzzy set approaches together in myriad combinations, I added an identical table that holds Z-Scores for a second column from the same dataset, which is rescaled in exactly the same way. In the subquery SELECT I merely multiply the two membership values together to derive a CombinedMembershipScore. What this essentially does is give us a novel means of multidimensional outlier detection.

Figure 1: Using Z-Scores for Membership Functions
DECLARE @RescalingMax decimal(38,6), @RescalingMin decimal(38,6), @RescalingRange decimal(38,6)
DECLARE  @ZScoreTable1 table
(ID bigint IDENTITY (1,1),
PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

DECLARE  @ZScoreTable2 table
(ID bigint IDENTITY (1,1),
PrimaryKey sql_variant,
Value decimal(38,6),
ZScore decimal(38,6),
ReversedZScore as
CAST(1 as decimal(38,6)) ABS(ZScore),
MembershipScore decimal(38,6),
GroupRank bigint
)

INSERT INTO @ZScoreTable1
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’CreatineKinase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

INSERT INTO @ZScoreTable2
(PrimaryKey, Value, ZScore, GroupRank)
EXEC   Calculations.ZScoreSP
              @DatabaseName = N’DataMiningProjects,
              @SchemaName = N’Health,
              @TableName = N’DuchennesTable,
              @ColumnName = N’LactateDehydrogenase,
              @PrimaryKeyName = N’ID’,
              @DecimalPrecision = ’38,32′,
              @OrderByCode = 8

— RESCALING FOR COLUMN 1
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable1
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable1
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

 — RESCALING FOR COLUMN 2
SELECT @RescalingMax = Max(ReversedZScore), @RescalingMin= Min(ReversedZScore) FROM @ZScoreTable2
SELECT @RescalingRange = @RescalingMax @RescalingMin

UPDATE @ZScoreTable2
SET MembershipScore = (ReversedZScore @RescalingMin) / @RescalingRange

SELECT ID, PrimaryKey, Value, ZScore1, ZScore2, MembershipScore1, MembershipScore2, CombinedMembershipScore
FROM (SELECT T1.ID, T1.PrimaryKey, T1.Value, T1.ZScore AS ZScore1, T2.ZScore as ZScore2,
       T1.MembershipScore MembershipScore1, T2.MembershipScore AS MembershipScore2, T1.MembershipScore * T2.MembershipScore AS CombinedMembershipScore
       FROM @ZScoreTable1 AS T1
              INNER JOIN @ZScoreTable2 AS T2
              ON T1.ID = T2.ID) AS T3
WHERE CombinedMembershipScore IS NOT NULL
ORDER BY CombinedMembershipScore DESC

if we want to store the values in the original table, we can use code like this:
UPDATE T4
SET T4.MembershipScore1 = T3.MembershipScore1, T4.MembershipScore2 = T3.MembershipScore2, T4.CombinedMembershipScore =
T3.CombinedMembershipScore
FROM DataMiningProjects.Health.DuchennesTable AS T4
       INNER JOIN  (SELECT T1.PrimaryKey, T1.MembershipScore AS MembershipScore1, T2.MembershipScore AS MembershipScore2, T1.MembershipScore * T2.MembershipScore AS CombinedMembershipScore
       FROM @ZScoreTable1 AS T1
              INNER JOIN @ZScoreTable2 AS T2
              ON T1.ID = T2.ID) AS T3
       ON T4.ID = T3.PrimaryKey

 Figure 2: Sample Results from the Duchennes Practice Data
Combined Membership Function Example

…………Figure 2 gives a glimpse of what the original DuchennesTable might look like if we wanted to store these values rather than calculate them on the fly, which can be accomplished by adding the three float columns on the right to the table definition and executing the UPDATE code at the end of Figure 1. In natural language, we might say that “the first record is 0.941446th of a member in the set around the average Creatine Kinase value” but “the fifth record is only 0.764556th of a member of the set near the mean Lactate Dehydrogenase value.” We could even model deeper levels of imprecision by creating categories like “near” for the high membership values in each column and “outlier” for the lowest ones, then define their boundaries in terms of fuzzy sets. This might be an ideal use for triangular and trapezoidal numbers, which can be worth the expense in extra code, as I’ll explain a few articles from now. We’re also modeling a different type of imprecision in another sense, because we know instinctively that there ought to be some way of gauging whether or not a record’s an outlier when both columns are taken into account; perhaps nobody knows precisely what the rules for constructing such a metric might be, but the CombinedMembershipScore at least allows us to get on the board.
…………Please keep in mind that I’m only using Z-Scores here because it’s familiar to me and is ideal for illustrating how fuzzy sets can be easily adapted to one particular use case, outlier detection. If we needed to make inferences about how well the data fit a gamma or exponential distribution, we might as well have used the corresponding goodness-of-fit tests and applied some rescaling techniques to derive our membership values; if we needed to perform fuzzy clustering, we could have plugged in a Manhattan distance function or one of its relatives. Fuzzy set memberships are often completely unrelated to stochastics and should not be interpreted as probabilities unless you specific intend to model them. The usefulness of fuzzy sets is greatly augmented when we move beyond mere set membership by tweaking the meaning a little, so that they can be interpreted as degrees of evidence, reliability, risk, desirability, or the like, which allow us to plug into various other well-developed mathematical theories. All functions can be differentiated by their return types,  number of return and input values, allowable data types and ranges, mathematical properties and the like (not to mention performance costs), but in fuzzy set theory the issue of meaning has a somewhat more prominent role. In some cases, it may even be desirable to use multiple membership functions to determine membership in one fuzzy set, as in my crude example above. These myriad shades of meaning and potential for combinations of them lead to a whole new level of complexity, which may nonetheless be worthwhile to wade through for certain imprecision modeling problems.

A Taxonomy of Fuzzy Sets (that Doesn’t Tax the Brain)

                I originally figured that I’d have to organize this series according to a taxonomy of different types of fuzzy sets, but it’s actually fairly simply to sketch the outlines of that otherwise advanced topic. Instead of delving into all of the complex math, it’s a lot easier for a layman to dream up all of the combinations of all the places in a set they can apply fuzziness, different means of encoding it and so on. The important thing to keep in mind is that there’s probably a term out there for whatever combination you’re using and that somewhere along the line, mathematicians have probably already figured out most of the logical implications decades ago (thereby saving a lot of the grunt work and reinventing the wheel, assuming that you can interpret their writing and the really thick formulas that often accompany them). The easiest ones to explain are real-valued and interval sets, in which the membership functions are determined on the real number line (which is all we ever encounter in SQL Server) or by a range of values on it.[4] Type-2 Fuzzy Sets illustrate the concept of tacking on further fuzziness perfectly – all we do take an interval-valued set and then assign grades to its boundaries as well. Fuzzy set theorists Yingjie Yang and Chris Hinde state that “A type-2 fuzzy set describes its memberships using type-1 fuzzy sets, but it needs precise crisp values to describe its secondary memberships.”[5] As the levels and number of values needed to define these sets proliferates, the performance costs do as well, so one has to be sure in advance that the extra complexity is useful in modeling real-world data. As Klir and Yuan put it, “Fuzzy sets of type 2 possess a great expressive power, and, hence, are conceptually quite appealing. However, computational demands for dealing with them are even greater than those for dealing with interval-valued sets. This seems to be the primary reason why they have almost never been utilized in any applications.”[6] I’d wager that’s still true, given the fact that the applications of ordinary fuzzy sets to data mining, data warehousing and relational databases have barely been scratched since the mathematicians invented these things years ago.
…………Rough sets also involve fuzzy values on intervals in a sense, but they model approximate distinctions between objects. Say, for example, you classify all of the objects in a child’s bedroom and want to see which qualify as part of a set labeled Toys. A sports car might be considered an adult toy to a certain degree, depending on such factors as whether or not the owner uses it for purposes other than occasional joy rides. The plastic dinosaurs and megafauna in a Prehistoric Playset are certainly toys, as are Fisher Price’s wooden people (well, cheap plastic these days). Medicine definitely wouldn’t belong to the set (at least according to these singing pills). Would one of these classic glow-in-the-dark Godzilla models from the ‘70s qualify? Well, that’s not quite clear, since it’s an object only a child would really appreciate, but they’re unlikely to actually play with it as a toy very often, since it’s designed to stay on display. They could conceivably take them off the shelf and pit them against the Fisher Price people; in this instance, the set membership might be defined by criteria as fuzzy as the whims of a child’s imagination, but we have tools to model it, if a need should arise. The definition of the attribute is in question, not whether a particular row belongs to a set, which is the case with ordinary fuzzy membership functions.
…………In Soft Sets, the characteristics that define the set are themselves fuzzy. I haven’t attempted to model those yet, but I imagine it may require comparisons between tables and views and placing weights on how comparable their different columns are to each other, rather than the rows. Here’s a crude and possibly mistaken example I came up with off the top of my head: in Soft Sets you might have a table with columns for Height, Width and Age and another with columns for Height, Width and Time, in which the first two columns of each are completely related to each and therefore are assigned weights of one, whereas Age and Time are only tangentially related and therefore might be assigned a weight somewhere between 0 and 1. Near sets apparently address a problem tangential to rough and soft sets, by quantifying the quantity and quality of resemblances between objects that might belong to a fuzzy set. Once we’ve been introduced to these concepts, they can obviously be combined together into an endless array of variants, which go by such mouthfuls as “rough intuitionistic Level-2 fuzzy near sets.” Just keep in mind that it is more common to encounter such structures in the real world in everyday language than it is to know the labels and their mathematical properties. It is also easier than it sounds to implement them in practice, if we’re using set-based tools like T-SQL that are ideal for the job.
…………I probably won’t spend much time in this series on even more sophisticated variants that might nonetheless be useful in modeling particular problems. Shadowed sets used multidimensional projections to qualify the lack of knowledge of whether or not a data point belongs to a fuzzy set. Neural nets are a cutting-edge topic I hope to tackle on this blog in a distant future (my interest in data mining was piqued way back in the 1990s when I saw some I cooked up at home do remarkable things) but it is fairly easy to describe Neuro-Fuzzy Sets, in which we’re merely using neural nets to perform some of the functions related to fuzzy sets. The combinations that can be derived from are limited only by one’s imagination; there are already neural nets in use in industry today that use fuzzy functions for activation and fuzzy sets whose membership values are derived from neural nets, and so forth. Undetermined and Neutrosophic Logic are variants of fuzzy logic that can be applied to fuzzy sets if we need to model different types of indeterminacy, which is a topic I’ll take up in a future article on how fuzzy sets can be put to good use in uncertainty management.
…………Blurry sets are a recent innovation designed to incorporate the kind of combinations of fuzziness we’ve just mentioned, but without sacrificing the benefits of normal logic – which might be of great benefit in the long run, since the value of some of recently developed logical systems is at best unproven.[7] Some will probably be substantiated in the long run, but some seem to be motivated by the sort of attention-getting shock value that can make academicians famous overnight these days (some of them seem to be implementations and formal defenses of solipsism, i.e. one of the defining characteristics of schizophrenia). Q-Sets are apparently an even more advanced variants developed for use in the strange world of quantum physics; since making Schrödinger’s cat disappear isn’t among most SQL Server users’ daily duties, I’ll leave that one out for now. I’ll probably also steer away from discussing more advanced types of fuzzy sets that include multiple membership functions, which aren’t referenced often in the literature and apparently are implemented only in rare circumstances.. Intuitionistic Sets have two, one for membership or non-membership, while Vague Sets also use two, except in that case one assesses the truth of the evidence for a record’s membership and the other its falsehood; I presume truth tables and the like are then built from the two values. A novel twist on this theme is the use of multiple membership functions to model the fact that the programmer is uncertain of which membership functions to use in defining fuzzy sets.[8] Multisets are often lumped in with the topic of fuzzy sets, but since they’re just sets that allow duplicate values, I don’t see much benefit in discussing them here. Genuine sets take fuzziness to a new level in an entirely different way, by generalizing the concept of fuzzy set in the same manner that fuzzy sets generalize ordinary “crisp” sets, but I won’t tack on another layer of mathematical complexity at this point, not when the potential for using the established methods of generalization has barely been scratched.

False Mysticism and the Fuzzy Mystique

                This wide-open field is paradoxically young in terms of mathematical intellectual history, but overripe for implementation, given that many productive uses for it were derived decades ago but haven’t percolated down from academia yet. Taking a long view of the history of math, it seems that new waves of innovation involve the addition of new dimensions to existing objects. Leonhard Euler introduced complex numbers in the 18th Century, then theoreticians like Bernhard Riemann and Charles Hinton contributed the concepts of higher-dimensional space and its curvature in the 19th. Around the same time, Georg Cantor was working out set theory and such mind-blowing structures as high-cardinality infinities and transfinities. More recently, Benoit Mandelbrot elaborated the theory of fractional dimensions, which are now cornerstones in chaos theory and modern art, where they go by the better-known term of fractals. This unifying principle of mathematical innovation stretches back as far as ancient Greece, when concepts like infinity, continuous scales and the like were still controversial; in fact, the concept of zero did not reach the West until it was imbibed from Arab sources in the Middle Ages. Given that zero was accepted so late in history, it is thus not at all surprising that negative numbers were often derided by Western mathematicians as absurdities well into the 18th and 19th Centuries, many centuries after their discovery by Chinese and Indian counterparts.[9] A half-conscious prejudice against the infinite regress of non-repeating digits in pi and Euler’s number is embedded in the moniker they still go by today, “irrational numbers.” The same culprit is behind the term “imaginary number” as well. Each of these incredibly useful innovations was powered by the extension of scales into previously uncharted territory; each was also met by derision and resistance at first, as were fuzzy sets to a certain extent after their development by 20th Century theoreticians like Max Black and Lofti A. Zadeh.
…………Many of these leaps forward were also accompanied by hype and as sort of unbalanced intellectual intoxication, which is the main risk in using these techniques. Fuzzy sets are unique, however, in that some of the pioneers were conscious of the possibility of leveraging the term “fuzzy” for attention; Zadeh openly acknowledges that the term has its uses in terms of publicity power, although he did not originally invent the term for that purpose. The strategy has backfired to a certain extent, however, by drawing the wrong kind of attention. “Fuzzy” is a term that immediately conjures up many alternative images, many of which don’t seem conducive to a high-powered, mission-critical production environment – like teddy bears, static, 1970s cop shows and something out of the back of George Carlin’s fridge.
…………Many of the taxonomic terms listed above also carry a kind of shock value to them; in other branches of academia this usually signifies that the underlying theory is being overstated or is even the product of crackpots with tenure, but in this case there is substantial value once the advertising dross has been stripped away. In fact, I’d wager that if more neutral terms like “graded set” or “continuously-valued set” were used in place of “fuzzy,” these techniques would be commonplace today in a wide variety of industries, perhaps even database management; in this case, the hype has boomeranged by stunting the adoption of an otherwise indispensable set of tools. As McNeill points out, some of the researchers employed in implementing fuzzy sets in various industries (including the development of the space shuttle) back in the early ‘90s had to overcome significant institutional resistance from “higher-ups” who “fretted about image.”[10] They are right to fret within reason, because these tools can certainly be misapplied; in fact, I’ve seen brilliant theorists who grasp the math a lot better than I do abuse it in illogical ways (for the sake of being charitable, I don’t want to call them out by name). Some highly regarded intellectuals don’t recognize any boundaries to the theory, for all of reality is fuzzy in their eyes – which is the mark of fanaticism, and certain to stiffen any institutional resistance on the other side. Every mathematical innovation in history has not only been accompanied by knee-jerk opposition from Luddites on one side, but also unwarranted hype and irrational exuberance on the other; fuzzy sets are as susceptible to misuse by bad philosophers and fanatics as higher dimensions, chaos theory and information theory have been for decades, so it is not unwise to tread carefully and maintain intellectual sobriety when integrating fuzzy sets into any development process.
…………Perhaps the best way to overcome this kind of institutional resistance and receive backing for these techniques is to be up front and demonstrate that you recognize the hype factor, plus have clear litmus tests for discerning when and when not to apply fuzzy set theory. Two of these are the aforementioned criteria of searching for data that resides in between ordinal and continuous data in the hierarchy of Content types and sifting through natural language terms for imprecision modeling. It is also imperative to develop clear standards for differentiating between legitimate and illegitimate uses of fuzzy sets, to prevent the main risk: “fuzzifying” data that it is inherently crisp. It is indeed possible to add graded boundaries to any mathematical objects (some of which we’ll explore later in this series), but in many cases, there is no need to bother. Fuzzy logic in the wrong doses and situations can even lead to fallacious conclusions. In fact, applying fuzziness to inherently crisp objects and vice-versa is one of the fundamental strategies human beings have employed since time immemorial to deceive both themselves and others. Here’s a case in point we’ve all seen: you tell your son or daughter they can’t have a snack, but you catch them eating crackers; invariably, their excuse involves taking advantage of the broad interval inherent in the term “snack,” a set which normally, but not always, included crackers. Of course, when people grow up they sometimes only get more skilled at blurring lines through such clever speech (in which case they often rise high in politics, Corporate America and the legal profession). Here’s an important principle to keep in mind: whenever you see a lot of mental energy expended to tamper with the definitions of things, but find the dividing lines less clear afterwards, then it’s time to throw a red flag. The whole point of fuzzy sets is not to obscure clear things, but to clear up the parts that remain obscure. Fuzziness is in exactly the same boat as mysticism, which as G.K. Chesterton once said, is only useful when it explains mysteries:

                “A verbal accident has confused the mystical with the mysterious. Mysticism is generally felt vaguely to be itself vague—a thing of clouds and curtains, of darkness or concealing vapours, of bewildering conspiracies or impenetrable symbols. Some quacks have indeed dealt in such things: but no true mystic ever loved darkness rather than light. No pure mystic ever loved mere mystery. The mystic does not bring doubts or riddles: the doubts and riddles exist already…The mystic is not the man who makes mysteries but the man who destroys them. The mystic is one who offers an explanation which may be true or false, but which is always comprehensible—by which I mean, not that it is always comprehended, but that it always can be comprehended, because there is always something to comprehend.”[11]

…………Fuzzy sets are not meant to mystify; they’re not nebulous or airy, but designed to squeeze some clarity out of apparently nebulous or airy data and logic. They are akin to spraying Windex on a streaky windshield; if you instead find your vision blocked by streaks of motor oil, it’s time to ask who smeared it there and what their motive was. Fuzziness isn’t an ingredient you add to a numerical recipe to make it better; it’s a quality inherent in the data, which is made clearer by modeling the innate imprecision that results from incomplete measurement, conflicting evidence and many other types of uncertainty. The point is not to make black and white into grey, but to shine a light on it, so that we can distinguish the individual points of black and white that make up grey, which is just a composite of them. These techniques don’t conjure up information; they only ensure that what little information is left over after we’ve defined the obvious crisp sets doesn’t go to waste. Fuzziness can actually arise from a surfeit of detail or thought, rather than a deficit or either; the definition of an object may be incomplete because so many sense impressions, images, stray thoughts, academic theories and whatnot are attached to its meaning that we can neither include them all nor leave any out.
…………As we shall see in future articles on uncertainty management, the manner in which the meaning of set membership can be altered to incorporate evidence theory and the like is indeed empowering, but calls for a lot of mental rigor to resist unconscious drifts in definition. It’s an all-too human problem that can occur to anyone, particular when mind-blowing topics are under discussion; it’s even noticeable at times in the writings of brilliant quantum physicists, who sometimes unconsciously define their terms slightly differently at the beginning of a book than at the end, in ways that nonetheless make all the difference between Schrödinger’s Cat being alive or dead. “Definition drift” also seems to be a Big Problem in Big Analysis for the same reason. It likewise seems to occur in texts on fuzzy sets, where term “fuzz” is often accurately described on one page as a solution to innate imprecision, but on the next, is unconsciously treated as if it were a magic potion that ought to be poured on everything. Another pitfall is getting lost in all of the bewildering combinations of fuzziness I introduced briefly in the taxonomy, but the answer to that is probably to just think of them in terms of ordinary natural language and only use the academic names when sifting through the literature for appropriate membership functions and the like. Above all, avoiding modeling crisp sets that have inherently Boolean yes-or-no membership values as fuzzy sets, because as the saying goes, you can’t be “a little bit pregnant.” Continuous scales can certainly be added to any math object, but if the object being modeled is naturally precise, then it is at best a waste of resources that introduces the risk of fallacious reasoning and at worst, an opening for someone with an axe to grind to pretend a particular scale is much more imprecise than it really is. One dead giveaway is the use of short scales in comparison to the length of the original crisp version. For example, this is the culprit when quibbling erupts over such obviously crisp sets as “dead” and “alive,” on the weak grounds that brain death takes a finite amount of time, albeit just a fraction of a person’s lifespan. It might be possible to develop a Ridiculousness Score by comparing the difference in intervals between those few moments, which occur on an almost infinitesimal scale, against an “alive” state that can span 70-plus years in human beings or the “dead,” which is always infinite. I haven’t seen that done in the literature, but in two weeks, I’ll demonstrate how the complements of fuzzy sets can be used to quantify just how imprecise our fuzzy sets are.  The first two installments of this series were lengthy and heavy on text because we needed a solid grounding in the meaning of fuzzy sets before proceeding to lessons in T-SQL code, but the next few articles will be much shorter and immediately beneficial to anyone who wants to put it into action.

[1] pp. 290-293, Klir, George J. and Yuan, Bo, 1995, Fuzzy Sets and Fuzzy Logic: Theory and Applications. Prentice Hall: Upper Saddle River, N.J.

[2] IBID., pp. 287-288, 292-293.

[3] IBID., p. 281.

[4] For a quick introduction to the various fuzzy set types, see the Wikipedia article Fuzzy Sets at http://en.wikipedia.org/wiki/Fuzzy_set. I consulted it to make sure that I wasn’t leaving out some of the newer variants that came out since Klir and Yuan and some of the older fuzzy set literature I’ve read, much of which dates from the 1990s. I lost some of the citations to the notes I derived these three paragraphs from (so my apologies go out to anyone I might have inadvertently plagiarized) but nothing I said here can’t be looked up quickly on Wikipedia, Google or any recent reference on fuzzy sets.

[5] Hinde, Chris and Yang, Yingjie, 2000, A New Extension of Fuzzy Sets Using Rough Sets: R-Fuzzy Sets, pp. 354-365 in Information Sciences, Vol. 180, No. 3. Available online at the web address https://dspace.lboro.ac.uk/dspace-jspui/bitstream/2134/13244/3/rough_m13.pdf

[6] p. 17, Klir and Yuan.

[7] Smith, Nicholas J. J., 2004, Vagueness and Blurry Sets, pp 165-235 in Journal of Philosophical Logic, April 2004. Vol. 33, No. 2. Multiple online sources are available at http://philpapers.org/rec/SMIVAB

[8] See  Pagola, Miguel; Lopez-Molina, Carlos; Fernandez, Javier; Barrenechea, Edurne; Bustince, Humberto , 2013, “Interval Type-2 Fuzzy Sets Constructed From Several Membership Functions: Application to the Fuzzy Thresholding Algorithm,” pp. 230-244 in IEEE Transactions on Fuzzy Systems, April, 2013. Vol. 21, No. 2. I haven’t read the paper yet (I simply can’t afford access to many of these sources) but know of its existence.

[9] See Rogers, Leo, 2014, “The History of Negative Numbers,” published online at the NRICH.com web address http://nrich.maths.org/5961.

[10] pp. 261-262, McNeill.

[11] Chesterton, G.K., 1923, St Francis of Assisi. Published online at the Project Gutenberg web address http://gutenberg.net.au/ebooks09/0900611.txt