
SharePoint Custom Solution Crashes IIS Worker Process (w3wp.exe) – Part 1

Had a doozy of an issue the other day.  All of a sudden, a SharePoint farm that had been chugging along with no changes started having some weird issues.  Users could open, view, and edit documents, but as soon as they attempted to save or upload a new document, things started to go bad.  If they were using Windows Explorer, they received the error: “The specified network name is no longer available”

[Screenshot: Windows Explorer error]

If they were using the GUI, the upload form hung for a while and eventually reverted to “The Page Cannot be Displayed”.

At the same time, we were getting reports from users in other areas of the farm of very slow response times within SharePoint.  What was really confusing was that the save and upload failures were limited to a single site collection in the farm.

Errors Received

Windows Event Log

We were receiving a number of errors besides those at the end-user level.  The server event log indicated our app pool was crashing.  The entry was actually logged as a warning (to me, if an app pool is crashing, it should be an error) with the message:

A process serving application pool ‘SharePoint Web Apps’ suffered a fatal communication error with the Windows Process Activation Service. The process id was ‘6292’. The data field contains the error number.

[Screenshot: event log error]

In the multiple-WFE environment, the crashes bounced back and forth between the two servers, indicating the load balancer was doing its job.  It also explained why people were seeing slow responses.  Each time the app pool failed, it had to restart and then reload the SharePoint environment (like you see after an IIS Reset).
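If you want to see the crash pattern across servers yourself, something like the following sketch can pull the warnings from each WFE.  It assumes the warnings are written to the System log by the Microsoft-Windows-WAS provider, which is where I would expect them; the function name, the 24-hour window and the server names in the usage line are placeholders of mine, not anything from the environment above.

```powershell
# Sketch: list recent WAS warnings about a worker process failing.
# Assumptions: System log, 'Microsoft-Windows-WAS' provider, arbitrary 24-hour window.
function Get-AppPoolCrashEvent
{
    param(
        [string]$ComputerName = $env:COMPUTERNAME,
        [int]$Hours = 24
    )

    $filter = @{
        LogName      = 'System'
        ProviderName = 'Microsoft-Windows-WAS'
        Level        = 3                              # 3 = Warning
        StartTime    = (Get-Date).AddHours(-$Hours)
    }

    # Match on the message text quoted above rather than on a specific event ID
    Get-WinEvent -ComputerName $ComputerName -FilterHashtable $filter -ErrorAction SilentlyContinue |
        Where-Object { $_.Message -like '*fatal communication error*' } |
        Select-Object TimeCreated, MachineName, Id, Message
}

# Usage (server names are placeholders):
# 'WFE01', 'WFE02' | ForEach-Object { Get-AppPoolCrashEvent -ComputerName $_ } | Sort-Object TimeCreated
```

Sorting the combined output by time makes the back-and-forth between servers easy to spot.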

ULS Logs

The ULS logs were something else.  In this particular environment our logs usually range from 5 MB to 40 MB in size for a 30-minute period.  When I ran a one-minute log export using “Merge-SPLogFile”, the exported file was 1.3 GB.  Nothing screamed error at me; however, a couple of things stood out.
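For reference, a one-minute export can be sketched roughly like this.  It assumes you are in the SharePoint Management Shell (so Merge-SPLogFile is available); the wrapper function name, the start time and the output path are placeholders I made up for illustration.

```powershell
# Sketch: export a short ULS window from the whole farm and report the file size.
# Assumes the SharePoint Management Shell; the function name and paths are hypothetical.
function Export-UlsWindow
{
    param(
        [Parameter(Mandatory = $true)][datetime]$Start,
        [int]$Minutes = 1,
        [Parameter(Mandatory = $true)][string]$Path
    )

    # Merge-SPLogFile collects matching ULS entries from every server in the farm
    Merge-SPLogFile -Path $Path -StartTime $Start -EndTime $Start.AddMinutes($Minutes) -Overwrite

    # Report the size so an unusually large export stands out straight away
    $sizeMB = [math]::Round((Get-Item $Path).Length / 1MB, 1)
    Write-Host ("{0}: {1} MB" -f $Path, $sizeMB)
}

# Usage (time and path are placeholders):
# Export-UlsWindow -Start '2017-06-21 10:30' -Minutes 1 -Path 'D:\Logs\uls-1min.log'
```

Keeping the window short matters here; at the rate described above, even a few minutes would produce a file too large to open comfortably.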

06/21/2017 10:30:38.00        w3wp.exe (0x112C)        0x1E58        SharePoint Foundation        Performance        naqx        Monitorable        Potentially excessive number of SPRequest objects (16) currently unreleased on thread 46.  Ensure that this object or its parent (such as an SPWeb or SPSite) is being properly disposed. This object is holding on to a separate native heap. Allocation Id for this object: {C3DC973B-90B4-4974-A33D-A5A05A722DF7} Stack trace of current allocation:    at Microsoft.SharePoint.SPGlobal.CreateSPRequestAndSetIdentity(SPSite site, String name, Boolean bNotGlobalAdminCode, String strUrl, Boolean bNotAddToContext, Byte[] UserToken, SPAppPrincipalToken appPrincipalToken, String userName, Boolean bIgnoreTokenTimeout, Boolean bAsAnonymous)     at Microsoft.SharePoint.SPWeb.InitializeSPRequest()     at Microsoft.SharePoint.SPWeb.EnsureSPRequest()     at Microsoft.SharePoint.SPSite.OpenWeb(String strUrl, Int32 mondoHint)     at Microsoft.SharePoint.SPSite.OpenWeb(Guid gWebId, Int32 mondoHint)     at Microsoft.SharePoint.SPSite.OpenWeb(Guid gWebId)….

06/21/2017 10:31:21.09        w3wp.exe (0x112C)        0x1E58        SharePoint Foundation        General        8m90        Medium        1045 heaps created, above warning threshold of 128. Check for excessive SPWeb or SPSite usage.        a8dafd9d-9faa-70d5-b0e7-8c1711386713

So this screamed of some custom code (which we do have) running that was not disposing of its SPSite or SPWeb objects properly.  Why it suddenly became a problem, I don’t know.  We did have security patches applied to the server over the weekend.  I didn’t think they were likely to be the cause, as the environment had been used for a day and a half with no issues.  We backed the patches out anyway, but that didn’t make any difference.  What was also confusing was that the issue was occurring in our Pre-Prod environment as well.  The silver lining was that I could now really do some troubleshooting without affecting sites that were functioning or production data.
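With an export that large, it helps to filter down to just the two kinds of entries quoted above.  Here is a rough sketch; it assumes the merged log is tab separated with the usual column order (timestamp, process, thread, area, category, event ID, level, message, correlation), and the function name and file path are mine.

```powershell
# Sketch: keep only the "unreleased SPRequest" and "heaps created" entries from a ULS export.
# Assumption: tab-separated columns in the usual ULS order; the function name is hypothetical.
function Select-UlsLeakWarning
{
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string]$Line
    )
    process
    {
        # The two message fragments quoted from the log above
        if ($Line -notmatch 'SPRequest objects \(\d+\) currently unreleased|heaps created, above warning threshold') { return }

        # Split on tabs and trim the padding around each field
        $f = @($Line -split "`t" | ForEach-Object { $_.Trim() })
        if ($f.Count -lt 8) { return }

        [pscustomobject]@{
            Timestamp = $f[0]
            Process   = $f[1]
            Area      = $f[3]
            Category  = $f[4]
            EventId   = $f[5]
            Level     = $f[6]
            Message   = $f[7]
        }
    }
}

# Usage (the path is a placeholder):
# Get-Content 'D:\Logs\uls-1min.log' | Select-UlsLeakWarning | Group-Object EventId | Select-Object Count, Name
```

Grouping by event ID gives a quick count of how often each warning fires in the window.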

I finally tracked down the issue to an event receiver we have running in our environment.  The project sites all share the same structure, and it was decided that code would be used to enforce this structure.  To that end, event receivers were built to ensure that folders at certain levels (library root, root +1 level and root +2 levels) were not deleted and that no files or folders were added at those levels.  I took a guess that these event receivers were causing the issues.  Using PowerShell, I removed the event receivers from an affected library.  In case you need this for something else, the code to remove a list event receiver is:

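# Run from the SharePoint Management Shell
$siteColURL = "<Site URL>";

$spSite = Get-SPSite $siteColURL

foreach($spWeb in $spSite.AllWebs)
{
    try 
    {
        # TryGetList returns $null (instead of throwing) when a subsite has no such library
        $spList = $spWeb.Lists.TryGetList("<LIBRARY NAME>");
        if($null -eq $spList) { continue }

        # Walk the collection backwards so a delete does not shift the remaining indexes
        for($i = $spList.EventReceivers.Count - 1; $i -ge 0; $i--)
        {
            $eventReceiver = $spList.EventReceivers[$i];
            if($eventReceiver.Class -eq "<EVENT RECEIVER CLASS>")
            {
                $eventReceiver.Delete();
            }
        }

        $spList.Update();
        
        Write-Host("Updated library for site: {0}" -f $spWeb.URL);
    }
    catch 
    {
        $ErrorMessage = $_.Exception.Message;
        Write-Host ("{0}: An error occurred deleting the event receiver for site: {1}.  Error received: {2}" -f `
                        (get-date).ToString("yyyy-MM-dd HH:mm:ss"), $spWeb.URL, $ErrorMessage) -ForegroundColor Red
    }
    finally
    {
        # Each SPWeb handed out by AllWebs has to be disposed explicitly
        $spWeb.Dispose();
    }
}

$spSite.Dispose();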

In the above code (which removes the event receivers from the specified library in ALL subsites) I used the event receiver class to find the items I wanted to remove.  You can also use .Name and .Assembly if you wish.  I used Class simply because when the sites were created and the receivers attached, no names were given.  With the event receivers removed, users were now able to upload and save documents.  So I had indeed found the culprit.  Now to determine why.
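If you are not sure which value to match on, it can help to list what is attached to a library before deleting anything.  A minimal sketch, assuming the SharePoint Management Shell; the function name and the URL and library title in the usage line are placeholders.

```powershell
# Sketch: show the event receivers attached to one library so you can pick a value to match on.
# Assumes the SharePoint Management Shell; the function name is hypothetical.
function Get-ListEventReceiver
{
    param(
        [Parameter(Mandatory = $true)][string]$WebUrl,
        [Parameter(Mandatory = $true)][string]$ListTitle
    )

    $spWeb = Get-SPWeb $WebUrl
    try
    {
        $spList = $spWeb.Lists.TryGetList($ListTitle)
        if ($null -eq $spList)
        {
            Write-Warning ("No list named '{0}' in {1}" -f $ListTitle, $WebUrl)
            return
        }

        # Name is often empty when receivers were attached by code, so Class and Assembly are shown too
        $spList.EventReceivers | Select-Object Id, Name, Class, Assembly, Type
    }
    finally
    {
        # Release the SPWeb when done
        $spWeb.Dispose()
    }
}

# Usage (URL and title are placeholders):
# Get-ListEventReceiver -WebUrl 'http://sharepoint/sites/project1' -ListTitle 'Documents' | Format-Table -AutoSize
```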

I’ll cover the review of the code and the final determination of the cause of the issue in Part 2 of this series.

 

Thanks for reading!

No results when performing search. Error in logs: A database error occurred. Source: Microsoft OLE DB Provider for SQL Server Code: 14 occurred 17 time(s) Description: [DBNETLIB][ConnectionOpen (Invalid Instance()).]Invalid connection.

Came across this error while trying to set up a new search.  The search did not return any results, and the event log on the WFE had the following entry (note this was running SP Foundation, but the error could easily occur in any version).

Log Name:      Application
Source:        Microsoft-SharePoint Products-SharePoint Foundation Search
Date:          8/20/2012 10:47:20 AM
Event ID:      57
Task Category: Search service
Level:         Warning
Keywords:
User:          XXXXXXX\Sharepoint.srch.tst
Computer:      XXXXXXXXXX
Description:
A database error occurred. Source: Microsoft OLE DB Provider for SQL Server Code: 14 occurred 17 time(s) Description: [DBNETLIB][ConnectionOpen (Invalid Instance()).]Invalid connection.

This one was a bit tricky to track down.  The solution was actually really simple, but because I didn’t have full knowledge of the environment, I was unable to find the problem right away.
It turns out the database server I was trying to connect to did not use the default instance.  Instead, it had a custom (named) instance set up.  I didn’t notice this at first because it was the only instance on the SQL Server, and SQL Management Studio logged me into it automatically even though I never specified an instance name.
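
A quick way to check which instance you actually land on is to ask SQL Server itself.  The following is only a sketch: it assumes Windows PowerShell with integrated authentication and rights to connect to master, and the function name and server names are placeholders.

```powershell
# Sketch: connect to a server (or server\instance) and report which instance answered.
# Assumes Windows PowerShell and integrated authentication; the function name is hypothetical.
function Test-SqlInstance
{
    param([Parameter(Mandatory = $true)][string]$ServerInstance)

    $conn = New-Object System.Data.SqlClient.SqlConnection
    $conn.ConnectionString = "Server=$ServerInstance;Database=master;Integrated Security=True;Connect Timeout=5"
    try
    {
        $conn.Open()
        $cmd = $conn.CreateCommand()
        $cmd.CommandText = "SELECT @@SERVERNAME AS ServerName, SERVERPROPERTY('InstanceName') AS InstanceName"
        $reader = $cmd.ExecuteReader()
        if ($reader.Read())
        {
            [pscustomobject]@{
                Target       = $ServerInstance
                ServerName   = $reader['ServerName']
                InstanceName = $reader['InstanceName']   # NULL (DBNull) means the default instance
            }
        }
        $reader.Close()
    }
    catch
    {
        Write-Warning ("Could not connect to {0}: {1}" -f $ServerInstance, $_.Exception.Message)
    }
    finally
    {
        $conn.Dispose()
    }
}

# Usage (names are placeholders):
# Test-SqlInstance -ServerInstance 'SQLSERVER01'
# Test-SqlInstance -ServerInstance 'SQLSERVER01\CUSTOMINSTANCE'
```

If InstanceName comes back with a value, that is the name the search and crawl settings need to include.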

So the solution was simply to ensure the search and crawl were pointed at the custom instance (they had been pointed at the default).  Once this was done, the jobs ran and the search returned data.  Just goes to show you how important knowing your environment is.