A Rickety Stairway to SQL Server Data Mining, Part 14.3: Debugging and Deployment
By Steve Bolton
…………Throughout this series of amateur self-tutorials in SQL Server Data Mining (SSDM), I’ve often said that working with Analysis Services is a bit like blasting off into space with the Starship Enterprise, because you may be boldly going where no man has gone before. My An Informal Compendium of SSAS Errors series remains one of the few centralized sources of information on certain arcane SQL Server Analysis Services (SSAS) errors, some of which turn up no hits on Google and appear to have no extant documentation anywhere on the planet. SSDM is one of the most powerful yet under-utilized components of Analysis Services, which represents Microsoft’s contribution to the cutting edge field of data mining, so we’re venturing even further into uncharted territory in this series. Within SSDM, the most powerful but obscure feature is the extensibility mechanism for writing custom algorithms, so that takes us even further out on a limb. Debugging is likewise one of the most trying tasks in the computing field, since it often involves delving into the unknown to find the causes of errors that may be completely unique to a particular user, system or piece of software. It is thus not surprising that debugging SSDM plug-in algorithms is perhaps the most difficult step to climb in this Rickety Stairway.
…………To complicate matters even further, deployment bugs are often the most difficult to ferret out in any type of software, due to the endless variety of system configurations it may be installed on, not to mention the fact that it is difficult to debug a program that isn’t even running yet. This issue is more prominent in the deployment of SSDM algorithms, for the simple reason that Visual Studio debugger will only be triggered on breakpoints in your custom code or the DMPluginWrapper.dll it references, not in the SSAS process (msmdsrv.exe) that calls it. In the last two tutorials I gave examples of how to write a custom algorithm in Visual Basic.Net classes and compile the DMPluginWrapper file, in which you can set breakpoints at any line of code you prefer. After that it is a matter of starting Analysis Services, then using the Attach to Process command in Visual Studio to link the debugger to the msmdsrv process and instruct it to break on both native and .Net code (in the case of my project, the .Net version was 4.0). Whenever breakpoints in the methods of the three main plug-in classes that we discussed in last week’s tutorial are hit, or those methods call breakpoints in the original DMPluginWrapper class they’re derived from, Visual Studio should break. One of the leading challenges of debugging these projects is that sometimes they don’t break when you’d expect them to, primarily because msmdsrv calls these methods in an order that is not readily apparent to programmers. For example, in an ordinary desktop .Net program, the call stack would begin at the breakpoint, then move to whatever routine called the routine the breakpoint, and so on until you eventually hit a Sub Main or similar entry point at the beginning of the application. With SSDM plug-ins, the call stack often starts with a single routine in your project, which leads back to a routine in the DMPluginWrapper file, which in turn is called by internal msmdsrv.exe routines that you can’t debug or control the execution of. As a result, tracking down the causes of errors is much trickier than in ordinary applications, because you can’t always tell for certain which line of code caused a particular error, or even determine why a routine in your code was called rather than another. The call stack is essentially decoupled, which makes it appear at first as if some errors are being generated from thin air out of left field. In these cases the errors usually turn out to be in your plug-in code, but are only detected in the space in the call stack taken up by internal msmdsrv methods. As mentioned last week, my aim for now is merely to provide the leanest possible version of a VB plug-in, merely to illustrate how SSDM can be directed to return any results at all. Even a bare bones algorithm missing prediction, navigation, feature selection, parameters and other functionality is not feasible without first surmounting the challenge of deploying anything at all. The daunting dilemmas of debugging are readily apparent from the beginning of the deployment phase, which makes it necessary to discuss the two topics together, before fleshing out more advanced features of plug-ins.
…………The process of deployment consists primarily of eight steps, including 1) generating .snk Strong Name Key files for your DMPluginWrapper and plug-in class in order to debug them, which only needs to be done once before adding them to the Global Assembly Cache (GAC); 2) adding the DMPluginWrapper to the GAC, as well as removing it first if it already exists; 3) compiling the latest version of the project; 4) registering your plug-in class as a COM class and if necessary, unregistering the old version first; 5) adding the plug-in class to the GAC, as well as removing any old versions first; 6) informing SSAS of the existence of the new algorithm; 7) restarting the SSAS service; then 8) using Attach to Process as outlined above to begin debugging. after that, you merely need to trigger the appropriate routines in your code, such as processing a mining model to trigger the InsertCases and ProcessCase routines in AlgorithmBase or refreshing the GUI view of mining model results in SQL Server Management Studio (SSMS) or Visual Studio to trigger various routines in AlgorithmNavigationBase, all of which we discussed in depth in last week’s article. All but two of these steps must be performed repeatedly during plug-in development, so I highly recommend placing most of the command line prompts I’ll mention hereafter in a .bat file that can perform them all with a single keystroke. One of those exceptions is Step #6, which only has to be performed once for any plug-in you write. The easy way is to run the XML for Analysis (XMLA) script below, substituting the GUID of your algorithm in the CLSID value and the name of your algorithm for mine, “Internal_Name_For_My_Algorithm.” If the name does not match the one specified in the AlgorithmMetadataBase.GetServiceName routine we discussed last week, you’ll see an error like this in your msmdsrv.log and Application event log: “The ‘ (the name listed in the GetServiceName routine will appear here) ‘ service which was returned by the ” data mining algorithm, does not match its ‘(your project name will appear here)’ algorithm in the server configuration file.”
Figure 1: Sample XMLA Code to Register a Plug-In
Figure 2: Newly Deployed Plug-In Depicted in the SSAS .ini File
…………If the XMLA script runs successfully, then after restarting the service you’ll see the change reflected in the msmdsrv.ini file. Under the ConfigurationSettings\Data Mining\Algorithms node, you’ll find the nine out-of-the-box algorithms listed, all prepended with the name “Microsoft” and sans any GUID tag. For each custom plug-in, you’ll see a listing like the one in Figure 2, in which the opening and closing tags are equivalent to the service name of the algorithm I developed in last week’s tutorial. Note that some versions of the XMLA script listed in other tutorials will set the Enabled value to True rather than 1, which doesn’t seem to cause any unwanted side effects I’m yet aware of, but I changed from the Boolean to the integer value to be consistent with the format used in the .ini file for the out-of-the-box algorithms. There really aren’t many ways to mess up the XMLA script, which only has to be run correctly one time for each plug-in. While troubleshooting my deployment problems I tried substituting incorrect names and GUID values that did not yet exist in the registry, which led to errors like these in the Application log and msmdsrv file: “The data mining algorithm provider (ProgID: 06da68d6-4ef0-4cea-b4dd-1a7c62801ed2) for the Internal_Name_For_My_Algorithm algorithm cannot be loaded. The following system error occurred: Class not registered” and “The data mining algorithm provider (ProgID: (a GUID other than the 06da68d6-4ef0-4cea-b4dd-1a7c62801ed2 sample value we’re using here) for the MyAlgorithm algorithm cannot be loaded. The following system error occurred: No such interface supported.” When editing the .ini manually instead of using the XMLA script, the range of errors is only limited by one’s imagination; it’s a bit like the old proverb, “God put obvious limits on our intelligence, but none whatsoever on our stupidity.” Nonetheless, I like to at least manually inspect the .ini after running an XMLA script like this or receiving other deployment errors, just to make sure; that is how I caught the subtle difference between the data types for the Enabled tag, for example. When manually editing the .ini file, don’t edit the <Services> tag under <DataMining> to add your plug-in class, because it’s strictly for built-in algorithms.[i] I recommend keeping shortcuts to the .ini and SSAS log files in a handy place, because you’re going to need to check them repeatedly when testing plug-in algorithms.
…………The script only needs to be run one time for each plug-in, but you obviously can’t run it successfully before compiling and registering your class at least once prior to this. The first step in the process also only needs to be performed once, although there is more room for error in compiling your projects with Strong Name Key files. The other six steps in the deployment process will be repeated ad infinitum, ad nauseum while you debug your code, so we’ll discuss them separately, even though there is some overlap between these topics. The fun begins with generation of the CLSID you see in the XMLA script above, which uniquely identifies your algorithm in the registry and can be generated through a wide variety of tools familiar to Visual Studio and SQL Server programmers, like guidgen.exe, the Create GUID menu function and the NewID() function. The best option when working with VB projects, however, is to check the Sign the assembly checkbox under the Signing tab of Project Properties, which will generate a GUID for you; copy it from the Assembly Information dialog under Project Properties and put it in the GUID tag that adorns your AlgorithmMetadataBase class declaration before compiling the project. This should be identical to the CLSID in the XMLA script. In the Assembly Information box, also select the Make assembly COM Visible checkbox. Do not, however, select the Register for COM Interop checkbox on the Compile tab, otherwise you may receive the following error during compilation: “(Name and Path of your Project).dll” is not a valid assembly.” Back on the Signing tab, you must also add a Strong Name Key file generated by using the Visual Studio sn.exe tool. The DMPluginWrapper.dll file you referenced must also be signed with an .snk and recompiled first, otherwise Visual Studio won’t break on exceptions or hit your breakpoints during debugging, nor will it be installed correctly in the GAC. At one point I received the error, “Unable to emit assembly: Referenced assembly ‘DMPluginWrapper’ does not have a strong name” because I signed my plug-in class but not the DMPluginWrapper.dll it referenced. So I took the easy way out and simply removed signing from my plug-in, which won’t work because you’ll encounter the following error when trying to add DMPluginWrapper to the GAC: “Failure adding assembly to the cache: Attempt to install an assembly without a strong name” I adapted the examples provided by Inaki Ayucar at the CodeProject.com webpage How to Sign C++/CLI Assemblies with a Strong Name and Prasanjit Mandal at the MSDN thread Failure Adding Assembly to the Cache: Attempt to Install an Assembly without a Strong Name and ended up with command line text like this: sn.exe -k DMPluginWrapper.snk. I didn’t need to specify the full path to the file, but as the saying goes, Your Mileage May Vary (YMMV). It is also a good idea to verify the signing by using the –v switch on sn.exe. It is critical not to forget to add the following tag to your DMPluginWrapper’s AssemblyInfo.cpp file, substituting the name of the .snk you just generated: [assembly:AssemblyKeyFile("DMPluginWrapper.snk")]. Then rebuild the DMPluginWrapper project and reset the reference to the .dll in your plug-in class project. Add an .snk for that project as well on the Signing tab, then rebuild it as well. Compilation should occur with errors at this point and adding both classes to the GAC should now be feasible – provided you do not stumble over one of the following hurdles when adding the assemblies.
…………Experienced .Net programmers know that mismatches between Framework versions or 32-bit and 64-bit platforms can get ugly, which is a problem that is likely to occur while adding the assemblies to the GAC or the registry. The DMPluginWrapper.dll file does not have to be registered and only needs to be added to the GAC one time, but your plug-in must be unregistered, registered again, then removed and added back to the GAC every time you change its code. The catch is to use the right versions of the Regasm.exe registry tool and gacutil.exe. It is not a simple matter of opening an administrative command prompt, navigating to the Debug or Release folders, then running gacutil.exe and Regasm.exe with the appropriate switches, because there is probably more than one version of gacutil and Regasm on your system. Since I compiled my plug-in as a 64-bit class on a 64-bit machine using version 4.0 of the .Net framework, I found the right version of gacutil.exe under the x64 subfolder of the NETFX 4.0 Tools folder, as listed in the sample paths in Figure 3. I also had to use the version of Regasm included under the Microsoft.NET\Framework64\v4.0.30319 folder for the same reason. There aren’t many sources of information on the Internet for debugging SSDM plug-in deployment, but in at least two of them, Bogdan Crivat (one of the developers of the plug-in architecture) helped users who apparently had problems with matching 32-bit versions of Regasm with 64-bit versions of DMPluginWrapper.dll or vice-versa.[ii] He recommends checking the 64-bit registry to make sure the plug-in class was registered in the right hive, but determining which version of the registry you’re actually viewing in Windows 7 is not always straightforward. The correct versions of Regasm and gacutil may of course be located in different folder on your machines, but the principle of matching them with the correct .Net version and platform remains the same. If you receive an error using the commands in Figure 3 with the text “Failure adding assembly to the cache: The filename, directory name, or volume label syntax is incorrect, try adding the file extension to the filename, which in this case would be “.dll.” Note in Figure 3 that you can leave off the extension when removing an assembly from the cache with the –u switch, but must include it when adding an assembly, which is typically done with the –if switch.
Of course, none of that will work if your platform or .Net Framework versions aren’t identical in the DMPluginWrapper and your plug-in class. The good news is that I have had no problems running plug-ins with .Net 4.0, even though the architecture was originally developed on .Net 2.0. If the DMPluginWrapper has a different .Net version, however, you may receive the following error when using gacutil: “Failure adding the assembly to the cache: This assembly is built by a runtime newer than the currently loaded runtime and cannot be loaded.” In this case, you’re better off setting the .Net version in DMPluginWrapper to the one in the plug-in class; for whatever reason, when I tried to reset it in the other direction, I received an error that the DMPluginWrapper has an incompatible runtime and the references are no longer valid. To make them match, I had to edit the .vcxproj file for the C++ DMPluginWrapper project by hand and change the set ToolsVersion=”4.0″ under the Project tag near the top. Strangely, I was able to leave the <TargetFrameworkVersion> tag set to v2.0 without receiving mismatch errors with Framework version of 4.0, but I’ll leave an explanation of that mystery to C++ and deployment gurus; all I know is that it worked. Making the platform versions match was much thornier problem. Simply setting the Build/Configuration Manager setting to x64 in the DMPluginWrapper.dll project was not sufficient to get the job done. Editing the AssemblyReference tag in DMPluginWrapper.vcproj by hand to set processorArchitecture=x64 was necessary, but not sufficient. The MSDN thread How to: Configure Visual C++ Projects to Target 64-Bit Platform was critical to fixing the problem, especially the hint about changing the Preprocessor Definitions entry on the Preprocessor tab of the Project Properties. It should be set to something like “WIN64;_DEBUG;%(PreprocessorDefinitions),” yet even that does not represent a complete solution. The Target Machine on DMPluginWrapper also has to be set to MachineX64 (/MACHINE:X64); while you’re at it, you might as well verify that your plug-in class also has a Target CPU of x64, just to make sure. Then recompile the DMPluginWrapper and add it to the GAC again, then reset the reference to it in the plug-in project to the new version.
…………It is helpful to keep the Application event log and msmdsrv.log handy, because they will indicate whether or not deployment failed using roughly similar error messages. I keep the event logs open in a Microsoft Management Concole (MMC) window at all times anyways, but this is more useful than ever when debugging plug-ins. After successful deployment, they should report messages similar to this every time the SSAS service starts: “The data mining algorithm provider (ProgID: 06da68d6-4ef0-4cea-b4dd-1a7c62801ed2) for the Internal_Name_For_My_Algorithm algorithm was loaded.” The most difficult dilemmas in plug-in development arise when you see anything other than this text, because you can’t use Visual Studio to debug errors in plug-in classes if they can’t even be loaded. The server may continue to run fine, but your plug-in code won’t be called if you run into this category of errors. Some of them are easy to pin down, like an incorrect service name, GUID or class name, which will produce the aforementioned “no such interface supported,” “does not match its…algorithm in the server configuration file” and “class not registered” messages. A “cannot be loaded. The following system error occurred” message followed by blank text is sometimes evidence of incorrect Namespace declarations in your AlgorithmMetadataBase, AlgorithmBase and AlgorithmNavigationBase files. The most difficult problems I’ve yet encountered have been “class not registered” errors reported in the logs, even though the classes are registered correctly. Your plug-in class may appear in the registry with the correct framework version, GUID and all that, but loading may still fail due to errors in AlgorithmMetadataBase that Visual Studio can’t catch. As Crivat says in the SQL Server Data Mining Managed Plug-In Algorithms Tutorial file included with the plug-in SDK,
“Certain metadata errors do not result in exceptions. An exception thrown when the metadata is initialized would prevent the managed plug-in from loading and there is no way for the developer to detect the cause of that error. For this reason, certain errors in the metadata can be detected by inspecting the schema rowsets exposed by the Analysis Services server. Such errors include mining parameter errors and custom functions errors. If your algorithm uses mining parameters or defines custom functions, make sure to inspect the schema rowsets for algorithm parameters and mining functions, respectively, and check the parameter/function description. If an error is detected in the definition of these objects, the error will be displayed in the respective schema rowset.”
…………Custom functions and parameters are a topic that we will put off for the time being, but last week’s tutorial features a screenshots in which the SUPPORTED_MODELING_FLAGS column of DMSCHEMA_MINING_SERVICES schema rowset held a value of “Error.” This means that my plug-in loaded successfully, but that the GetSupModelingFlags function in AlgorithmMetadataBase returned an invalid value of some kind. Since AlgorithmMetadataBase returns all of the basic properties and functionality of an algorithm, SSAS will call its routines when the service starts, whereas the routines in AlgorithmBase or AlgorithmNavigationBase are not called until model processing or navigation through the results takes place. Once you’ve progressed from errors in AlgorithmMetadataBase that prevent a plug-in from loading to errors like the one in GetSupModelingFlags that do not, then you’ve crossed the Rubicon of SSDM plug-in debugging. When the service starts, SQL Server seems to call functions in AlgorithmMetadataBase that we discussed last week, in a particular order: GetServiceName; GetServiceType; GetServiceName again; GetServiceName a third time; GetServiceDescription; GetViewerType; GetDisplayName; GetPredictionLimit; GetTrainingComplexity; GetPredictionComplexity; GetExpectedQuality; GetScaling; GetSupportsDMDimensions; and GetDrillThroughMustIncludeChildren, sometimes followed by certain methods in the AlgorithmNavigationBase class. If you can get far enough down this chain, then Visual Studio will hit your breakpoints and the most difficult part of debugging will be over. You can set breakpoints in most of these, but not in the class declaration. The worst error I received turned out to be a missing tag in the AlgorithmMetadataBase class declaration: <MiningAlgorithmClass(GetType(MyAlgorithm))>, in which MyAlgorithm was the name assigned to an instance of my AlgorithmBase class. The Application and SSAS logs reported that the algorithm could not be loaded because the class was not registered, even though it was, just not correctly. To catch these types of uncatchable errors, I recommend keeping a bare bones side project based on Crivat’s original tutorial code, either in the original C# or converted to a .Net language of your choice, like VB. I finally trapped this error by simply removing my own lines of code from AlgorithmMetadataBase with his code, which I knew worked, until I finally found the missing class declaration that reproduced the error. If you debug in this way, make sure both projects reference the same version of DMPluginWrapper.dll, just to make sure that there are no discrepancies that might be overlooked. The AlgorithmMetadataBase class may also be the source of a minor error that does not show up until model deployment, at which point you may receive an error stating that you mining model does not have a key – which it won’t, unless the array returned by the GetSupInputContentTypes method includes a value of MiningColumnContent.Key. Most of the other errors associated with this class, however, are often showstoppers that are difficult to track down.
…………Once you’re past this no man’s land, AlgorithmMetadataBase is easier to debug because you can hit its breakpoints in the debugger. Once an algorithm is loaded, the locus shifts to AlgorithmBase, which is not trivial to debug in comparison to classes in other .Net project types, but is still several orders of magnitude easier than AlgorithmMetadataBase. This is the point at which keeping a Profiler trace makes sense, because it may trap processing errors that may not appear in the Application log, and will never be seen in msmdsrv.log. Some of these are easy to debug, while others are not. As mentioned last week, I prefer iterating over explicit arrays to the While and For/Next loops often seen in other tutorials, so I expected to receive more errors about indexes being out of bounds and the like. It comes with the territory. What I did not expect was that I’d end up in uncharted territory, in which Profiler would report an error like “The job completed with failure” shortly after calculating the marginal stats and reporting that it was “Training Mining Model.” This suggested that there was a bug in one of my processing routines in AlgorithmBase, but I only had a couple of them, like InsertCases, ProcessCase, LoadContent and SaveContent, plus some simple algorithm-specific calculations in the SkewnessKurtosisClass objects I mentioned in the previous tutorial. Sometimes Visual Studio would hit my breakpoints in these routines as expected but not report any errors until they had exited, without moving to any other method. Setting breakpoints in the DMPluginWrapper code they referenced didn’t help either. Evidently, the internal routines within SSAS that call the plug-in code reported the error, but there was no way to determine what those routines consisted of, or why they reported errors. As expected, in many cases I missed some array bounds, but I couldn’t view the offending values in a Watch window, or break on the exceptions, because the exceptions were not thrown until after the routines exited with apparent success. On one occasion I was able to catch one of these errors by examining the extra information included in a Profiler trace, in which I found the message “Index was outside the bounds of the array” embedded deep in the verbose output. Object instantiation errors may likewise be accompanied by the message “COM error: COM error: PluginProject; Object reference not set to an instance of an object,” but determining which object in which routine remains an open question. This is not to be confused with a similar message that often accompanies failure to load algorithm, which is usually caused by a bug in your AlgorithmMetadataBase code. The most difficult instance was when I missed some array bounds in ProcessCase, which didn’t report an error until it exited with apparent success, returned to InsertCases, then moved to SaveContent and again exited that method with apparent success. InsertsCases doesn’t call SaveContent directly; Analysis Services does that for you, in the internal msmdsrv.exe code, which you can’t read or set breakpoints in. Worse still, in this particular case, msmdsrv called a couple of other methods after the erroneous code, which may set programmers looking in the wrong direction for their bugs. Another common error seems to be “Internal error: An unexpected error occurred (file ”, line , function ”),” which occurred most often in the custom algorithm logic embedded in my SkewnessKurtosisClass methods.
…………The difficulty of debugging plug-ins decreases by another order of magnitude once you reach the point of retrieving processed results. The first method called during retrieval is AlgorithmBase.LoadContent, which merely loads the records written to disk the last time SaveContent was called during successful processing. If you process a model, shut down the machine and return a week later to view the results in the GUI, Analysis Services will call LoadContent. The good news is that failure of this routine due to errors like missed array bounds and uninstantiated objects will often be reported in dialogs in the GUI instead of Profiler or the logs. “The MyMiningModel mining model cannot be loaded” is usually a clue that the problem is somewhere in LoadContent. On the other hand, “Execution of the managed stored procedure GetAttributeValues failed with the following error: Exception has been thrown by the target of an invocation” could be the result of errors in practically any of the methods of AlgorithmNavigationBase. Such errors are a lot easier to debug than those in AlgorithmBase, because they don’t require reprocessing and seem to follow a more easily recognizable call stack. Debugging of these methods is not a breeze, but it is much easier than the critical troubleshooting that must be done in AlgorithmMetadataBase and AlgorithmBase just to deploy a plug-in and process models with it. Since it’s a less weighty topic, we can put it off a few weeks until we can get into an in-depth discussion of how to navigate more complex trees structures with this class. Before that, however, we need to construct those trees, which means adding nodes in the Predict method of AlgorithmBase. The next step up this Rickety Stairway will be a discussion of how to use that routine to produce the predictions returned in Data Mining Extensions (DMX) queries and the GUI tools.
[i] For more information, see Bogdan Crivat’s post in the MSDN thread Msmdsrv.ini (Split post): Error registering Plugin.