
SharePoint Custom Solution Crashes IIS Worker Process (w3wp.exe) – Part 2

In Part 1 of this series, I discussed an issue we were having in one of our SharePoint 2013 farms and how I determined that it was caused by a set of event receivers acting on the library.  In this post, I will discuss the code being used and what the cause turned out to be.  Stick around, it’s not what you think.

To be terribly honest, nothing jumped out at me while looking over the code.  The initial review indicated the issue could be where the event receiver was trying to determine whether the user adding the file was a member of the site owner group.  The original code was:

This seems pretty straightforward, but when “CheckIfUserInSPGroup” is called things aren’t quite as kosher.

Again, normally not a huge issue, except that best practices state that you shouldn’t instantiate SPSite, SPWeb or SPList objects within an event receiver.  The reason is that it causes extra database calls (more information here: https://msdn.microsoft.com/en-us/library/office/ee724407(v=office.14).aspx).  I thought this could be the culprit, but wasn’t convinced.  If this was the issue, why did it work fine for years and then suddenly stop working?  The code instantiates the SPSite and SPWeb objects because the method is used elsewhere in the solution and could be called by users who do not have the required access.  The same goes for the event receiver.  If I do not have access to control security groups in the site, I get an UnauthorizedAccessException.

So I thought, why not just use that?  We can safely assume that if an UnauthorizedAccessException is thrown, the user is not in the Owners group.  So I updated the code with a try/catch (why one wasn’t already being used I don’t know) and added some logic to the catch.  It is not generally the best method, but I believe it is acceptable when used for targeted exceptions.

I also created a new method for the event receiver to call.  I couldn’t modify the existing one, as it contained valid logic to handle users without access and was being used elsewhere.

So I moved the code into Pre-Prod and tried it out.  No change.  Still hanging, throwing errors and crashing the app pool.

Next step was to install Visual Studio into Pre-Prod and attach to the IIS Worker process.  I followed the code until it got into the newly created CheckIfUserInSPGroupEvntRcvr method.  There it stayed.  It kept looping through the AD users and groups within the SharePoint group.  As it was looping I watched the worker process memory usage grow and grow until it finally crashed again.  This didn’t make any sense as there are NOT that many users in these groups.

The Cause of it All

I took a look at the ownership group for the site I was testing with.  Like most (not all) of our project sites, it contained an AD group that contains our project team.  Let’s call that group All-Projects.  All-Projects had about a dozen users in it; however, there was an anomaly.  It also contained the Owners group from another project site.  This was an oddity.  I took a look at that Owners group and it, in turn, contained the same All-Projects group.  There was the culprit.

As you can see in the code above, it is designed for nested groups, so if the code hits a group it digs down to see if the nested group contains the user.  Because this Owners group had been added to the All-Projects group (in error, I found out while trying to figure out why it was there), the code would dig into All-Projects, then into the Owners group, from there back into All-Projects and then back into the Owners group… see where I am going with this?  By adding that single group to the All-Projects group in error, an infinite recursion loop was created in the code.

The Final Fix

So the final fix was not an environmental change or a code modification.  It was simply to remove the Owners group from the All-Projects group.  Once that was done, the original code functioned as designed.  If this becomes a regular occurrence I will have to update the code to handle such an event, but in this case, I didn’t.  The farm is in containment (no further development short of break/fix) and the issue had not occurred in the two years before this.  I hope the steps I documented in this blog series help others out.
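
If the code ever does need to handle this, the usual guard is to remember which groups have already been expanded and never expand one twice.  Here is a rough sketch of that idea, written in Python over a hard-coded dictionary rather than in the .NET code of the actual solution.  The GROUPS data, the user names and the is_user_in_group function are all invented for illustration; none of it is the code running in the farm.

```python
from typing import Dict, List, Optional, Set

# Hypothetical in-memory directory: group name -> members (users or groups).
# In the real solution this data would come from SharePoint and Active Directory.
GROUPS: Dict[str, List[str]] = {
    "Owners": ["alice", "All-Projects"],
    "All-Projects": ["bob", "carol", "Owners"],  # the group added in error
}


def is_user_in_group(user: str, group: str, seen: Optional[Set[str]] = None) -> bool:
    """Return True if `user` is a direct or nested member of `group`."""
    if seen is None:
        seen = set()
    if group in seen:
        # This group was already expanded on this walk, so we have hit a cycle.
        # Stop here instead of recursing forever.
        return False
    seen.add(group)

    for member in GROUPS.get(group, []):
        if member == user:
            return True
        # A member that is itself a known group gets expanded recursively.
        if member in GROUPS and is_user_in_group(user, member, seen):
            return True
    return False


print(is_user_in_group("bob", "Owners"))   # nested member, found via All-Projects
print(is_user_in_group("dave", "Owners"))  # not a member; the walk still terminates
```

With the circular membership in place, the lookup for a user who is in neither group returns False instead of bouncing between the two groups until the worker process runs out of memory.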

 

Thanks for reading!

SharePoint Custom Solution Crashes IIS Worker Process (w3wp.exe) – Part 1

Had a doozy of an issue the other day.  A SharePoint farm that had been chugging along with no changes suddenly started having some weird issues.  Users could open, view and edit documents, but as soon as they attempted a save or an upload of a new document things started to go bad.  If they were using Windows Explorer they received the error: “The specified network name is no longer available”.

[Screenshot: Windows Explorer error]

If they were using the GUI, the upload form hung for a while and eventually reverted to “The Page Cannot be Displayed”.

At the same time, we were getting reports of users in other areas of the farm getting a very slow response within SharePoint.  What was really confusing about this was that the issue was happening to just a single site collection in the farm.

Errors Received

Windows Event Log

We were receiving a number of errors besides those at the end user level.  The server event log indicated our app pool was crashing.  The entry was actually logged as a warning (to me, if an app pool is crashing, it should be an error) with the message:

A process serving application pool ‘SharePoint Web Apps’ suffered a fatal communication error with the Windows Process Activation Service. The process id was ‘6292’. The data field contains the error number.

[Screenshot: event log error]

In the multiple WFE environment it was happening back and forth between the two servers, indicating the load balancer was doing its job.  It also explained why people were seeing slow responses.  Each time the app pool failed, it had to restart and then reload the SharePoint environment (like you see after an IIS Reset).

ULS Logs

The ULS logs were something else.  In this particular environment our logs usually range from 5 MB to 40 MB in size for a 30 minute period.  When I ran a one minute log export using “Merge-SPLogFile” the exported file was 1.3 GB.  Nothing screamed error at me; however, a couple of things stood out.

06/21/2017 10:30:38.00        w3wp.exe (0x112C)        0x1E58        SharePoint Foundation        Performance        naqx        Monitorable        Potentially excessive number of SPRequest objects (16) currently unreleased on thread 46.  Ensure that this object or its parent (such as an SPWeb or SPSite) is being properly disposed. This object is holding on to a separate native heap. Allocation Id for this object: {C3DC973B-90B4-4974-A33D-A5A05A722DF7} Stack trace of current allocation:    at Microsoft.SharePoint.SPGlobal.CreateSPRequestAndSetIdentity(SPSite site, String name, Boolean bNotGlobalAdminCode, String strUrl, Boolean bNotAddToContext, Byte[] UserToken, SPAppPrincipalToken appPrincipalToken, String userName, Boolean bIgnoreTokenTimeout, Boolean bAsAnonymous)     at Microsoft.SharePoint.SPWeb.InitializeSPRequest()     at Microsoft.SharePoint.SPWeb.EnsureSPRequest()     at Microsoft.SharePoint.SPSite.OpenWeb(String strUrl, Int32 mondoHint)     at Microsoft.SharePoint.SPSite.OpenWeb(Guid gWebId, Int32 mondoHint)     at Microsoft.SharePoint.SPSite.OpenWeb(Guid gWebId)….

06/21/2017 10:31:21.09        w3wp.exe (0x112C)        0x1E58        SharePoint Foundation        General        8m90        Medium        1045 heaps created, above warning threshold of 128. Check for excessive SPWeb or SPSite usage.        a8dafd9d-9faa-70d5-b0e7-8c1711386713
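
With an export that large, reading it top to bottom is not realistic.  One way to surface entries like the two above is to count how often each event ID appears and look at the most frequent ones.  A minimal Python sketch follows, assuming the export is tab-delimited with the columns in the order seen in the excerpts (timestamp, process, thread, area, category, event ID, level, message).  The tally_events function and the sample lines are illustrative and not part of any SharePoint tooling.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Column positions, assuming the order seen in the excerpts:
# timestamp, process, thread, area, category, event ID, level, message, ...
CATEGORY_COL = 4
EVENT_ID_COL = 5
LEVEL_COL = 6

EventKey = Tuple[str, str, str]


def tally_events(lines: Iterable[str], top: int = 10) -> List[Tuple[EventKey, int]]:
    """Count entries by (category, event ID, level), most frequent first."""
    counts: Counter = Counter()
    for line in lines:
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) <= LEVEL_COL:
            continue  # blank line or continuation of a multi-line message
        counts[(fields[CATEGORY_COL], fields[EVENT_ID_COL], fields[LEVEL_COL])] += 1
    return counts.most_common(top)


# Made-up sample lines modelled on the excerpts, with shortened messages.
SAMPLE_LINES = [
    "06/21/2017 10:30:38.00\tw3wp.exe (0x112C)\t0x1E58\tSharePoint Foundation\t"
    "Performance\tnaqx\tMonitorable\tPotentially excessive number of SPRequest objects",
    "06/21/2017 10:31:21.09\tw3wp.exe (0x112C)\t0x1E58\tSharePoint Foundation\t"
    "General\t8m90\tMedium\t1045 heaps created",
    "06/21/2017 10:31:22.10\tw3wp.exe (0x112C)\t0x1E58\tSharePoint Foundation\t"
    "General\t8m90\tMedium\t1046 heaps created",
]

for (category, event_id, level), count in tally_events(SAMPLE_LINES):
    print(f"{count:>6}  {event_id}  {level:<12} {category}")
```

Passing an open file handle instead of the list lets the function stream the export line by line, so the whole file never has to sit in memory.  A header row, if present, simply shows up once in the tally.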

So this screamed of some custom code (which we do have) running that is not disposing of the SPSite or SPWeb objects properly.  Why it suddenly became a problem I didn’t know.  We did have security patches applied on the server over the weekend.  I didn’t think it likely to be the cause, as the environment had been used for a day and a half with no issues.  We backed out the patches anyway, but that had no effect on the issue.  What was also confusing was that the issue was occurring in our Pre-Prod environment too.  The silver lining was that I could now really do some troubleshooting without affecting sites that were functioning or production data.

I finally tracked down the issue to an event receiver we have running in our environment.  The project sites are all of the same structure and it was decided that code would be used to enforce this structure.  To that end, event receivers were built to ensure that folders at certain levels (library root, root +1 level and root +2 levels) were not deleted and that no files or folders were added at those levels.  I took a guess that these event receivers were causing the issues.  Using PowerShell I removed the event receivers from a library being affected.  In case you need this for something else, the code to remove a list event receiver is:

In the above code (which removes the event receivers from ALL specified libraries in ALL subsites) I used the event receiver class to find the items I wanted to remove.  You can also use .Name and .Assembly if you wish.  I used Class simply because when the sites were created and the receivers attached, no names were given.  With the event receivers removed, users were now able to upload and save documents.  So I had indeed found the culprit.  Now to determine why.

I’ll cover the review of the code and the final determination of the cause of the issue in Part 2 of this series.

 

Thanks for reading!

Field or Property ‘TimeZoneId’ does not exist when using SharePoint Search

Our test environment was giving me issues today.  It’s an error that was reported and supposedly fixed nearly 18 months ago.  When typing a query in the text box, the initial search works, but when using a refiner or changing the query the result would be the error “Field or Property ‘TimeZoneId’ does not exist”.  This was first reported after the July 2015 CU as a regression and supposedly fixed in the August 2015 CU.  The problem was in the Microsoft.Office.Server.Search.ServerProxy.dll.  Now I admit our servers are out of date when it comes to CUs (something I have been trying to get time from my client to do) but we are well beyond the August 2015 CU.

Our environment is set up with Search being provided by our App servers in a Shared Services configuration.  When checking the WFE I found the dll version to be 15.0.4815.1000.  We have two app servers that are running the search services.  App#1 had the same version of the dll, but App#2 was actually sitting at an older version: 15.0.4420.1017.  I believed this to be the culprit.
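
Comparing build numbers by eye is easy to get wrong, so here is a small Python sketch of the check: parse each version into a tuple of integers and flag any server below the highest build found.  The server names and the outdated_servers helper are hypothetical, and the version strings are example values; in practice they would be read from the file properties of the dll on each server.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '15.0.4815.1000' into (15, 0, 4815, 1000) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def outdated_servers(versions: Dict[str, str]) -> List[str]:
    """Return the servers whose version is lower than the highest one seen."""
    newest = max(parse_version(v) for v in versions.values())
    return sorted(name for name, v in versions.items() if parse_version(v) < newest)


# Hypothetical server names with example version strings.
dll_versions = {
    "WFE1": "15.0.4815.1000",
    "APP1": "15.0.4815.1000",
    "APP2": "15.0.4420.1017",
}

print(outdated_servers(dll_versions))  # prints ['APP2']
```

Comparing the parts as integers matters because a plain string comparison would rank 15.0.10000.1 below 15.0.4815.1000.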

Resolving: “Field or Property ‘TimeZoneId’ does not exist”

  1. Shut down IIS on the server you need to update.
  2. From a server with the correct version copy Microsoft.Office.Server.Search.ServerProxy.dll to the _app_bin folder of the SharePoint virtual directory (example: C:\inetpub\wwwroot\wss\VirtualDirectories\AppServer\_app_bin)
    • If you don’t have a “good” location, the correct file should also be located in “C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\CONFIG\BIN”
  3. Restart IIS and test.  Should be fine now.

The cause is likely this single file not being copied over properly when patched.  Hope this helps someone still having this issue.

 

Thanks for reading!

Unable to Add Existing Site Columns to Content Types in SharePoint

While deploying a new solution to our test environment for my client the other day I found that I was unable to add existing site columns to content types within the SharePoint site I was working in.  No real reason given.  As you can see from the screenshot below the GUI wasn’t much help:

[Screenshot: GUI error when adding existing site columns to a content type]

However, as always SharePoint provided the handy-dandy correlation id.  Using that I was able to get a great deal more information on the problem.

Specifically, the problem was: “No two choices should have the same ID”.  Looking at the stack trace you can see that SharePoint is looking at the fields that exist in the site.  This makes sense because it has to build that list for you to choose from, doesn’t it?  I scanned through my list of site columns and noticed that for some reason I had two columns called “Hashtags”.  I am not entirely sure what this field is for, but I believe it is added with the Newsfeed.

[Screenshot: multiple Hashtags site columns]

So the next step?  Well, let’s delete the extra column.  Attempting to delete via the GUI just resulted in a window that never provided a response and never deleted the offending field.  Next, on to PowerShell.  While getting the field object I found that the two fields actually existed with the exact same GUID.  So that’s where the error message above came from.  Deleting via PowerShell got me a bit further, and I received the message: “Site columns which are included in content types cannot be deleted”.  I didn’t have a content type sitting there with Hashtags within it, so I ran a script to go through each content type and look for the field.
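
Both checks in the paragraph above are simple once the data is in hand: look for a field ID that occurs more than once, then list the content types that reference it.  The sketch below shows the logic in Python over plain lists and dictionaries.  It is not the PowerShell that was run against the farm, and the Field type, the helper names, the GUIDs and the Document content type are placeholders invented for the example.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    id: str     # field GUID as a string
    title: str


def duplicate_field_ids(fields: List[Field]) -> List[str]:
    """Return every field ID (lower-cased) that appears more than once."""
    counts = Counter(field.id.lower() for field in fields)
    return [field_id for field_id, n in counts.items() if n > 1]


def content_types_using(field_id: str, content_types: Dict[str, List[str]]) -> List[str]:
    """Return the names of content types whose field list includes `field_id`."""
    wanted = field_id.lower()
    return sorted(
        name
        for name, ids in content_types.items()
        if any(i.lower() == wanted for i in ids)
    )


# Placeholder GUIDs, not the real field IDs.
HASHTAGS_ID = "aaaaaaaa-0000-0000-0000-000000000001"
TITLE_ID = "aaaaaaaa-0000-0000-0000-000000000002"

site_fields = [
    Field(HASHTAGS_ID, "Hashtags"),
    Field(HASHTAGS_ID.upper(), "Hashtags"),  # same GUID, different casing
    Field(TITLE_ID, "Title"),
]

content_types = {
    "Project Policy": [HASHTAGS_ID],
    "System Media Collection": [HASHTAGS_ID, TITLE_ID],
    "Document": [TITLE_ID],
}

for field_id in duplicate_field_ids(site_fields):
    print(field_id, "is used by", content_types_using(field_id, content_types))
```

GUIDs are compared case-insensitively here because the same ID can be written in either case depending on where it was read from.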

There were two hidden system content types, Project Policy and System Media Collection, that contained the site column.  I did a few things to try and remove the site column from the content types, but I am not going to outline them here because I don’t want to give any of you ideas (they aren’t something you want to do in a production environment and they didn’t work anyway).  Suffice it to say, I did everything I could think of but could not remove the extra field from the sites.

I placed a call with Microsoft to see if they had any suggestions.  Apparently, according to the support engineer I worked with, Microsoft has seen this before.  Nowhere in the vast interwebs did I find this information.  One reason for this post is for posterity ;-).  The fix is actually really simple.

Removing the Extra Field

Microsoft provided me with the feature definition of the Hashtags field.  The solution was to install the feature into the farm, activate the feature on the sub sites that contained the Hashtags field (whether they had the double instance or not) and then deactivate the feature within each site.  Finally, we removed the feature from the farm as well.  I will provide that feature definition in this post, but want to make something very clear.  I did not create this, nor do I provide any warranty or take any responsibility if applying the solution causes instability in your farm.

You can download the feature here.

To apply the feature to correct your farm, follow these steps.

  1. Extract the zip file and place the MMSField folder in the directory: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\TEMPLATE\FEATURES.
  2. Open a SharePoint PowerShell window and run the following command:

  3. After the feature is added to the farm, run the following script to check each sub site within your site collection to add and remove the feature (this step will force the system to remove the field and re-add it properly).

  4. Remove the feature from the farm by running the following command:

Once this is complete you will have removed the extra instances of the Hashtags field and can now add site columns to your content types without issues.

 

Thanks for reading!

Accessing SharePoint Site: “The context has expired and can no longer be used”

Just a quick little note on a quick solution I came up with for a weird problem that occurred on my dev server.  I was preparing for a presentation, and when accessing the site that contained all the settings, data and code for the presentation I immediately received the error message: “The context has expired and can no longer be used”.

[Screenshot: “The context has expired and can no longer be used” error message]
