Recently at my client site (I have a lot of posts that start this way) we have been getting more and more requests from groups that want to bring larger amounts of data into SharePoint. These requests are really pushing the limits of SharePoint storage thresholds, so I started looking into ways we could get around them. Our thought was that since Microsoft recently announced support for 25TB of data in SharePoint Online site collections, we should be able to easily handle the 4TB ceiling in our on-prem environment.
SharePoint Database Size Limits
The limitations of SharePoint’s content databases are pretty well documented here: https://technet.microsoft.com/en-CA/library/cc262787.aspx#ContentDB. But in a nutshell you want to keep your content databases below 200GB. The same document actually suggests splitting out your site collections once the content DB reaches more than 100GB, to allow for growth within the sites.
But what if it’s a single site collection within that database? This now means you should consider branching off the site collection into multiple site collections. For example, create an archive site collection to house data that is no longer actively updated or used. Likely this will cut down on your data usage a great deal. You will have to migrate the data in order to do it, but it is a necessary evil to save on space.
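To make the two size thresholds above concrete, here is a minimal sketch of how you might flag content databases against them. The function name, the status labels and the database names are all made up for illustration; only the 100GB and 200GB figures come from the guidance quoted above.

```python
# Thresholds from the guidance quoted above. Everything else here
# (function name, labels, database names) is hypothetical.
SPLIT_SUGGESTED_GB = 100   # above this, consider splitting site collections
RECOMMENDED_MAX_GB = 200   # content DBs should stay below this


def classify_content_db(size_gb: float) -> str:
    """Return a rough status label for a content database of the given size."""
    if size_gb >= RECOMMENDED_MAX_GB:
        return "at or over recommended limit"
    if size_gb > SPLIT_SUGGESTED_GB:
        return "consider splitting site collections"
    return "ok"


if __name__ == "__main__":
    # Hypothetical database names and sizes, for illustration only.
    sample = [("Content_A", 80), ("Content_B", 150), ("Content_C", 750)]
    for name, size_gb in sample:
        print(f"{name}: {size_gb} GB -> {classify_content_db(size_gb)}")
```

In practice you would feed this with real sizes pulled from your farm rather than hard-coded numbers.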
Why the Database Limits?
For the most part the limits are based around the internal tools of SharePoint. Microsoft states that some site collection operations, such as moving a site collection or backing up and restoring one, risk full database locks on a large database, which would affect the other site collections in that content DB. The operation could even fail outright.
Along the same lines, the time it takes to patch your environment can increase drastically with huge DBs, especially if the patch requires a DB modification. This doesn’t happen often, but has been known to occur. So if you are applying a CU to your environment, you may need to be prepared for very long processing times. And remember: the longer something takes to run, the greater the chance of failure, as other processes within your environment are still running and could affect the work you are doing.
What about Remote Blob Storage?
So while I was looking into pushing the limits of SharePoint storage I did a lot of investigation into remote blob storage (RBS) and what it does for us. While researching RBS solutions I found that the terminology and information from some of the providers suggested you could exceed the 200GB limitation using their tools. They didn’t actually come out and state their tool could do it, but the way it was worded gave my colleagues and me that impression. I can’t stress this enough: RBS solutions do NOT allow you to break the recommended storage ceiling.
What do they do? Well, I am glad you asked. Most often, because of the use and type of data, DB drives are put on faster, more expensive drive hardware. What RBS allows you to do is take the bulk of that data and move it to storage that is fast enough, but not as expensive as the DB drives. It doesn’t break the ceiling; it just provides another, cheaper method of storing the same data (but then again, the cost may even out, as you are now paying for the software and the support hours instead of just the hardware). However, when backing up, restoring or patching the environment, the same issues are going to occur. The backup process is still going to pull the data from RBS into your backup location, and patching now has the added complexity of middleware being in place.
RBS solutions tend to really complicate things too, because the data is no longer all in the same location. This means performing SQL backups is not going to cut it. SQL backups will get you the data still in the content DB and the metadata for the RBS locations, but they will not save the actual blob data stored outside of the content DB. This means you need to use the vendor’s software to perform your backups.
Pushing the Limits of SharePoint Storage
So what if you can’t split out your site collections, or the data in that single site collection can’t be archived or split out somewhere else? This is the question I was tasked with in our environment. The first thing you need to do is speak with your middle-ware team or whatever your data storage team is called. Explain the need for a disk system that delivers at the very least 0.25 IOPS per GB, and ideally the recommended 2 IOPS per GB. Determine whether the disk you are on, or the disk you could be on, supports that level of throughput. In our case, our data center was believed to not only meet but exceed the recommendations, and this was with the cheap disk. They didn’t have the 10 TB I asked for to test with, but they were able to give me 5 TB on the faster, fibre channel disk.
My initial test was uploading enough files into a single SharePoint library to push the content DB to about 750 GB. At this point the interactions with the site were still normal and you couldn’t tell (other than the list threshold) that there was that much data in the site collection. That was the case until I started a site collection backup via PowerShell. The backup took 40 hours to complete, and the restore took just about as long. This is because a backup through SharePoint is designed to pull the data out of the content DB, not to back up the DB itself, which adds a ton of overhead. This illustrates Microsoft’s concern around backups of this size: it takes a long time, and a lot of things can go wrong in that time frame. This was further supported by the fact that the backup failed three times before I figured out that our VM backups “stunned” the SharePoint VM very, very briefly when completing the image backup. That was enough to break the network connection and fail the process. A perfect example of what Microsoft was warning about with large environments. Once the VM was removed from the VM backup process, the backup and restore worked just fine (if you consider 40 hours for each to be fine).
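When you have that conversation with the storage team it helps to bring actual numbers. Here is a small sketch that turns the 0.25 and 2 IOPS per GB figures above into a target range for a given content DB size. The function name is my own, and I am assuming 1 TB = 1024 GB for the example.

```python
from typing import Tuple

# Figures from the paragraph above; the helper itself is illustrative.
MIN_IOPS_PER_GB = 0.25
RECOMMENDED_IOPS_PER_GB = 2.0


def required_iops(db_size_gb: float) -> Tuple[float, float]:
    """Return (minimum, recommended) IOPS for a content DB of db_size_gb."""
    if db_size_gb < 0:
        raise ValueError("database size cannot be negative")
    return (db_size_gb * MIN_IOPS_PER_GB, db_size_gb * RECOMMENDED_IOPS_PER_GB)


if __name__ == "__main__":
    # Assumption: 1 TB = 1024 GB, so the 4 TB ceiling is 4096 GB.
    minimum, recommended = required_iops(4 * 1024)
    print(f"4 TB content DB: {minimum:.0f} to {recommended:.0f} IOPS")
```

For a content DB at the 4 TB ceiling this works out to a range of 1024 to 8192 IOPS, which is the number to ask your storage team about.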
My recommendation is to completely avoid site collection backups, site exports and list exports when dealing with large data sets (at least in your production environment). Getting mine to complete meant pulling the VM out of its backup rotation, and pulling a production system out of a backup rotation is a bad idea. If you are looking at 40+ hours for a backup at 750 GB, think what a backup at 2+ TB will be like. That’s a long time for your production environment to not have a backup run against it.
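To put a rough number on that, here is a back-of-the-envelope sketch that scales my one observed data point (750 GB in 40 hours) up to a larger size. It assumes backup time grows linearly with size, which I have not tested, and it treats 1 TB as 1000 GB to keep the arithmetic simple. The function name is made up for this post.

```python
def estimate_backup_hours(target_gb: float,
                          observed_gb: float = 750.0,
                          observed_hours: float = 40.0) -> float:
    """Linearly extrapolate backup time from a single observed run.

    Assumption: duration scales in direct proportion to data size.
    Real backups may scale worse (or better) than this.
    """
    rate_gb_per_hour = observed_gb / observed_hours
    return target_gb / rate_gb_per_hour


if __name__ == "__main__":
    # Assumption: 2 TB treated as 2000 GB.
    hours = estimate_backup_hours(2000)
    print(f"Estimated SharePoint backup time at 2 TB: {hours:.1f} hours "
          f"({hours / 24:.1f} days)")
```

Even under that optimistic assumption, a 2 TB site collection backup lands at well over four days.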
Instead, do your backups via SQL Server DB backups. Here’s why:
After testing the 750 GB DB via SharePoint backups my intention was to attempt a SQL Server DB backup and restore. However, the day I was going to start I found out the space granted for testing was being taken back within the next few days. So over the weekend I dumped a great deal more data into SharePoint and pushed the content DB to 2.3 TB. Again, the environment appeared to be responding fine. This time the SQL backup took only 6.5 hours, oodles faster than a SharePoint backup. That was also backing up to a network location, not to a local drive. Unfortunately, I lost the drive space before I could fully test the restore process.
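For a sense of scale, here is a quick sketch comparing the effective throughput of the two runs described in this post. Keep in mind these were single runs at different data sizes, so this is a rough comparison and not a controlled benchmark. I am treating 2.3 TB as 2300 GB, and the function name is my own.

```python
def throughput_gb_per_hour(size_gb: float, hours: float) -> float:
    """Effective backup throughput in GB per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return size_gb / hours


if __name__ == "__main__":
    # Numbers from my tests: 750 GB in 40 hours via SharePoint,
    # 2.3 TB (assumed to be 2300 GB) in 6.5 hours via SQL Server.
    sharepoint_rate = throughput_gb_per_hour(750, 40)
    sql_rate = throughput_gb_per_hour(2300, 6.5)
    print(f"SharePoint backup: {sharepoint_rate:.1f} GB/hour")
    print(f"SQL Server backup: {sql_rate:.1f} GB/hour")
    print(f"Speedup: roughly {sql_rate / sharepoint_rate:.0f}x")
```

The SQL Server backup moved data at somewhere close to nineteen times the rate of the SharePoint backup in my environment. Your numbers will vary with disk, network and backup target.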
Further Testing Required
I had a number of tests remaining. Perhaps someone else who has the space would like to take up where I left off:
- Restore environment at 2+ TB (backup was successful)
- Move DB usage to 4 TB and perform backup and restore testing
- When at the max ceiling perform the following tests:
  - Create a new sub site. This appeared to take a long time at 750 GB. I think it had to do with moving data around within the DB.
  - Create new lists and libraries and put some data in them.
  - Use stress testing software to pound the heck out of the system (impersonating multiple users) performing the following:
    - Add files
    - Edit files
    - Delete files
    - Open files
    - Update metadata
    - Run workflows
Conclusions I Have Reached So Far
It is obvious that you can reach extreme levels of content within your SharePoint environment, but there have to be certain controls and processes put in place. This is what Microsoft is talking about in the document I linked to at the beginning of this post. You have to plan for long backups and restores. You have to ensure you don’t have processes running that will kill these backups or restores mid-run. You have to have plans in place for how you are going to handle the data once it gets so large. What about disasters? Do you have offsite storage for these backups? While my tests show that it is possible to reach the extremes of data storage, you really have to make sure that you have ALL your T’s crossed and I’s dotted. Because if something happens and you can’t recover the data, you just lost an unfathomable amount of information.
If I ever get the drive space back to perform more testing (I am certainly trying to), I will post a follow-up with my results and make more concrete recommendations on moving forward.
Thanks for reading!