Backup & DR Storage Articles - Altaro DOJO | Backup & DR
https://www.altaro.com/backup-dr
Backup and disaster recovery guides, how-tos, tips, and expert advice for system admins and IT professionals

Backup Network Prioritization Best Practices

An examination of backup network prioritization, starting with the basics and covering how to leverage its advanced controls for an optimal configuration.

In the depths of Windows Server Failover Clustering (WSFC), where the graphical interface cannot reach, network traffic shaping tools await the brave administrator. Most clusters work perfectly well without tuning network parameters, but not all. If you have low-speed networking hardware or extremely tight bandwidth requirements, then prioritization might help. This article shows how to leverage these advanced controls.

The Basics of Network Prioritization

If you have spent any time researching cluster network prioritization, then you likely noticed that most material dates back about a decade. This topic concerned us when networking was more primitive and unteamed gigabit connections pervaded our datacenters. Several superior alternatives have arisen in the meantime that require less overall effort. You may not gain anything meaningful from network prioritization, and you might set traps for yourself or others in the future.

Network Speed Enhancements in Windows

Windows Server versions beyond 2008 R2 provide native network adapter teaming. Additionally, 2012 added SMB multichannel, which automatically works for all inter-node communications. These tools alone, with no special configuration, cover the bulk of network balancing problems that you can address at the host.
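
Before you spend time on prioritization, it may be worth confirming that these features are actually active. A minimal read-only check, assuming Windows Server 2012 or later with LBFO teaming (Switch Embedded Teaming has its own cmdlets):

    # List NIC teams, confirm SMB multichannel is enabled, and show its active connections
    Get-NetLbfoTeam | Format-Table Name, TeamingMode, LoadBalancingAlgorithm, Status
    Get-SmbServerConfiguration | Select-Object EnableMultiChannel
    Get-SmbMultichannelConnection | Format-Table -AutoSize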

Speed Enhancements in Networking Hardware

For demanding loads, you have hardware-based solutions. Higher-end network adapters have significant speed-enhancing features, particularly RDMA (remote direct memory access), which comes in InfiniBand, RoCE, and iWARP forms. If that’s not enough, you can buy much faster hardware than 10 gigabit. These solutions solve QoS problems by providing so much bandwidth and so little latency that contention effectively does not occur.

Software QoS Solutions

You can also configure software QoS for Windows Server and for Hyper-V virtual machines. This has similar challenges to cluster network prioritization, which we’ll discuss in the next section. Sticking with the QoS topic, networking hardware offers its own solutions. Unlike the other techniques in this article, QoS within the network extends beyond the hosts and shapes traffic as it moves between systems. Furthermore, Windows Server can directly interact with the 802.1p QoS standard that your hardware uses.
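
As a rough illustration of the software side, the sketch below shows both flavors. The 802.1p value and the VM name are placeholders: the priority value must match what your switches expect, and the Hyper-V example only applies to a virtual switch created with weight-based minimum bandwidth mode.

    # Tag outbound SMB traffic with an 802.1p priority value (example value only)
    New-NetQosPolicy -Name "SMB Priority" -SMB -PriorityValue8021Action 3

    # Hyper-V software QoS: give a hypothetical VM a relative share of switch bandwidth
    # (requires a virtual switch created with -MinimumBandwidthMode Weight)
    Set-VMNetworkAdapter -VMName "VM01" -MinimumBandwidthWeight 20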

Drawbacks of WSFC Network Prioritization

Research the above options before you start down the path of cluster network shaping. This solution has a few problems that you need to know about:

    • The use of cluster network shaping is non-obvious. It appears nowhere in any GUI or standard report. You must clearly document your configuration and ensure that anyone troubleshooting or reconfiguring knows about it.
    • WSFC network prioritization has no effect outside the cluster, which can make it even more limited than software-only QoS solutions.
    • WSFC network prioritization knows nothing about true QoS solutions and vice versa. Combining this technology with others leads to unknown and potentially unpredictable behavior.
    • The most likely answer that you will receive if you ask anyone for help and support will be: “Revert network prioritization to automatic and try again.” I do not know of any problems other than the obvious (poor tuning that inappropriately restricts traffic), but I have not seen everything.

Essentially, if you still have all-gigabit hardware and QoS solutions that do not shape traffic the way you want, then WSFC network prioritization might serve as a solution. Otherwise, it will probably only provide value in edge cases that I haven’t thought of yet (let me know in the comments if you know of one).

Cluster Networking Characteristics

While many of the technologies mentioned in the previous section have reduced the importance of distinct cluster networks, you still need to configure and use them properly. This section outlines what you need to configure and use for a healthy cluster.

Cluster Network Redundancy

At its core, cluster networking depends on redundancy to reduce single-point-of-failure risks. What you see here shows the legacy holdover of unteamed and non-multichannel technologies. However, even these advanced solutions cannot fully ensure the redundancy that clustering desires, nor will everyone have sufficient hardware to use them.

WSFC networks operate solely at layer 3, meaning that a cluster defines networks by IP addresses and subnet masks. It knows nothing about layer 2 or layer 1, so it cannot understand teaming or physical network connections. In classical builds, one network card has one IP address and belongs to one Ethernet network, which might give the impression that cluster networking knows more than it actually does.

When the cluster service on a host starts up or detects a network configuration change, it looks at all IP addresses and their subnet masks. It segregates distinct subnets into “cluster networks”. If one host contains multiple IP addresses in the same cluster network, then WSFC chooses one and ignores the rest. It then compares its list of cluster networks and IP addresses against the other nodes. This discovery has four possible outcomes per discovered network:

    • The cluster finds at least one IP address on every node that belongs to the discovered network and all are reachable in a mesh. The cluster creates a cluster network to match if a known network does not already exist. It marks this network as “Up”. If a node has multiple addresses in the same network, the cluster chooses one and ignores the rest.
    • The cluster finds an IP address on at least one node, but not all nodes, that belongs to the discovered network, and all discovered addresses are reachable in a mesh. The cluster will treat this network just as it would in the first case. This allows you to design clusters with complex networking that contain disparate networks without getting an error. It also means that you can forget to assign one or more IP addresses without getting an error, so check network membership carefully.
    • The cluster finds an IP address on one or more nodes, but the mesh connections between them do not fully work. If the mesh pattern fails between detected addresses, the cluster marks the network as “partitioned”. This only means that the layer 3 communications failed. You usually cannot tell from this tool alone where the problem lies.
    • The cluster fails to detect any members of a previously discovered network. Removing all the members of a network will cause WSFC to remove it from the configuration.

You can view your networks and their status in Failover Cluster Manager on the Networks tab:

[Image: Failover Cluster Manager Networks view]

In the lower part of the screen, switch to the Network Connections tab, where you can see the IP addresses chosen on each node and their status within the cluster. In a two-node cluster like this one, any unreachable network member means that the cluster marks all members of that network as unreachable. In a cluster of three or more nodes, it might be able to identify the individual node(s) having trouble.

[Image: Cluster network connections]
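
If you prefer PowerShell to the GUI, the FailoverClusters module exposes the same information; this sketch only reads state:

    # Show each node's address in every cluster network and its current state
    Get-ClusterNetworkInterface | Format-Table Node, Network, Name, Address, State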

This article will not go into troubleshooting these problems. Understand two things:

    • Cluster networking understands only layer 3 (TCP/IP).
    • Cluster networking understands only cluster traffic. It works for internode communications and clustered roles with IP addresses known by the clustering service.

If you do not fully understand either of these points, stop here and perform the necessary background research. For the first, we have some introductory networking articles. I will call out some of the specific implications in the next section. For the second point, I mostly want you to know that nothing that we do here will directly impact Hyper-V virtual machine traffic. In a cluster that runs only Hyper-V virtual machines, networks marked as “Cluster Only” and “Cluster and Client” have no functional difference.

Cluster Network Roles and Uses

In pre-2012 clusters, we recommended four network roles. Depending on your hardware and configuration, you should employ at least two for complete redundancy. This section covers the different roles and concepts that you can use for optimal configuration.

Management Cluster Network

If you create only one cluster network, this will be it. It holds the endpoints of each node’s so-called “management” traffic. A great deal of misunderstanding surrounds this network. Even the typical “management” name fits only by usage convention. Traditionally, the IP endpoint that holds a node’s DNS name also marks its membership in this network. As a result, inbound traffic meant for the node itself, not for a cluster role, goes to this address.

The cluster will also use the management network for its own internode purposes, although, by default, it will use all other networks marked for cluster traffic first.

Absolutely nothing except convention prevents you from creating a network, excluding cluster traffic, and using that for management. I would not consider this an efficient use of resources in most cases, but I could envision some use cases.

Cluster Communications Network

You can specify one or more networks specifically to carry cluster traffic. While some documentation suggests, or outright states, otherwise, this encompasses all types of internode traffic. Three general functions fall into this category:

    • Node heartbeat
    • Cluster configuration synchronization
    • Cluster Shared Volume traffic

The most common error made with this network comes from the widespread belief that CSV traffic has some distinction from other cluster traffic that allows you to separate it onto a network away from the other cluster communication functions. It does not.
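
If your cluster does use Cluster Shared Volumes and you want to know whether CSV traffic is currently traversing the cluster networks at all, check for redirected access. A read-only sketch:

    # Direct mode means CSV I/O goes straight to storage; a redirected state means
    # it is traversing a cluster network instead
    Get-ClusterSharedVolumeState | Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason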

Cluster Application Networks

The relationship between clustered objects and cluster networks leads to a great deal of confusion, exacerbated by unclear documentation and third-party articles based on misunderstandings. To help clear it up, understand that while the cluster understands networking for some applications, it does not understand all. Applications within a cluster have three different tiers:

    • Per-role cluster IP address. You will see this for roles that fully integrate with clustering, such as SQL Server. Check the properties of the role within Failover Cluster Manager. If you see an IP address, the cluster knows about it.
    • Client network binding. When you mark a network with the “Cluster and Client” role, the cluster can utilize it for hosting simple roles, such as scripts.
    • No cluster awareness. A cluster can host roles for which it does not control or comprehend the network configuration. Chief among these, we find virtual machines. The cluster knows nothing of virtual machine networking.

We will revisit that final point further on along with network prioritization.

Live Migration Network

The Live Migration cluster network represents something of an anomaly. It does not belong to a role and you can exclude it from carrying cluster traffic, but you control it from the cluster, and it only “works” between cluster nodes.

You configure the networks that will carry internode Live Migration traffic from the Network tree item in Failover Cluster Manager:

[Image: Live Migration network settings]

As with any other IP endpoint, nodes can use their addresses in Live Migration networks for any non-cluster purpose.
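
For reference, the selections that Failover Cluster Manager writes for Live Migration are stored as parameters on the "Virtual Machine" cluster resource type. This read-only sketch filters for the migration-related parameters; the exact parameter names can vary between Windows Server versions:

    # List the Live Migration network ordering/exclusion parameters, if present
    Get-ClusterResourceType -Name "Virtual Machine" | Get-ClusterParameter |
        Where-Object Name -like "Migration*"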

Non-Cluster Communications

Everything not covered above falls outside the control of the cluster service. Individual nodes can operate their own services and functions separate from the cluster. Due to common confusion, I want to call out three well-known items that fall into this category:

    • Virtual machine traffic
    • Storage traffic (host-to-storage connections, not internode CSV traffic)
    • Host-level backup traffic

The cluster knows nothing of the Hyper-V virtual switch. Furthermore, the virtual switch behaves as a layer 2 device and WSFC networking only operates at layer 3.

In fair weather, each node controls its own I/O. If a node has a problem connecting to storage and that storage is configured in one or more CSVs, then the cluster can redirect CSV traffic across the network via a node that can reach the CSV. However, the cluster classifies that traffic under the general “cluster” type.

I do not know of any backup tool that utilizes the cluster service to perform its duty. Therefore, each node handles its own backup traffic.

Once you understand what traffic the cluster cannot control, you next must understand that cluster network prioritization only impacts it indirectly and partially. The reasons will become more obvious as we investigate the implementation.

How to Discover and Interpret Cluster Network Prioritization

Before configuring anything, look at the decisions that the cluster made. Open a PowerShell prompt, either in a remote session to a node or directly on a node’s console, and run:

    Get-ClusterNetwork | ft Name,AutoMetric,Metric,Role

This will output something like the following:

[Image: Cluster network prioritization output]

The “Name” and “Role” columns mean the same thing as you see in Failover Cluster Manager. “AutoMetric” means that the cluster has decided how to prioritize the network’s traffic. “Metric” means the currently assigned metric, whether automatic or not. Lower numbered networks receive higher priority.

To reiterate, these priorities only apply to cluster traffic. In other words, when the cluster wants to send data to another node, it starts at the lowest numbered network and works its way upward until it finds a suitable path.

Consider the real-world implications of the configuration in the screenshot above. The cluster has marked the “Management” network with the highest priority that can carry cluster traffic. The “Cluster” network has the lowest priority. The displayed cluster runs only Hyper-V virtual machines and stores them on an SMB target. It has no CSVs. Therefore, cluster traffic will consist only of heartbeat and configuration traffic. I have used Failover Cluster Manager as shown in a preceding section to prioritize Live Migration to the “Live Migration” network and set the Cluster and Management networks to allow Live Migration as second and third priorities, respectively. Therefore:

    • Internode traffic besides Live Migration will use the Cluster network if it is available, then the Live Migration, and as a last resort, the Management network.
    • Internode Live Migrations will prefer the Live Migration network, then the Cluster network, then the Management network.
    • Because cluster and Live Migration traffic use the Management network as a last resort, they should leave it wide open for my backup and other traffic. Due to isolation, non-cluster traffic does not have any access to the Cluster or Live Migration networks.
    • None of the preceding traffic types can operate on either Storage network.

The cluster automatically made the same choices that I would have, so I do not see any need to change any metrics. However, it does not make these decisions randomly. “Cluster only” networks receive the highest priority, then “Cluster and client” networks. Networks marked “None” appear in the list because they must, but the cluster will not use them. As for the ordering of networks with the same classification, I have not gathered sufficient data to make an authoritative statement. However, I always give my “Cluster” networks a lower IP range than my “Live Migration” networks (ex: 192.168.150.0/24 for the “Cluster” network and 192.168.160.0/24 for the “Live Migration” network), and my clusters always sort them in that order. But “L” comes after “C” alphabetically, so maybe that’s why. Or perhaps I’m just really lucky.

I want to summarize the information before I show how to make changes.

Key Points of Cluster Network Prioritization

We covered a lot of information to get to this point, and some of it might conflict with material or understanding that you picked up elsewhere. Let’s compress it to a few succinct points:

    • Cluster network prioritization is not a quality-of-service function. When the cluster service wants to send data to another node, it uses this hierarchy to decide how to do that. That’s it. That’s the whole feature.
    • The cluster service uses SMB to perform its functions, meaning that SMB multichannel can make this prioritization irrelevant.
    • In the absence of redirected CSV traffic, a cluster moves so little data that this prioritization does not accomplish much.
    • Cluster network prioritization does not “see” network adapters, virtual switches, or non-cluster traffic. It only recognizes IP addresses and only cares about the ones that carry cluster traffic.
    • You can only use cluster network prioritization to shape non-cluster traffic via the process of elimination as discussed in the real-world example. Due to the low traffic needs of typical cluster traffic, you may never see a benefit.

How to Set Cluster Network Prioritization

Hopefully, you read everything above and realized that you probably don’t need to know how to do this. That said, a promise is a promise, and I will deliver.

The supported way to manually configure cluster network priority is through PowerShell. You can also use the registry, although I won’t directly tell you how because you can wreck the configuration that way and I’m not sure that I could help you put it back. I think you can also use the legacy cluster.exe CLI, but I never learned it myself, and it is deprecated anyway.

Unfortunately, even though PowerShell is the only supported way, the PowerShell module for Failover Clustering remains surprisingly primitive. Much like PowerShell 2.0-era snap-ins, it usually requires you to acquire an object and manipulate it. It implements very few verbs beyond “Get-“, and the cmdlets that it has do not expose much functionality. Furthermore, the module implements the extremely rare “packet privacy” setting, which means that you must carry out most of its functions directly on a node’s console or in a first-hop remote session. I believe that the underlying CIM API exposed by the cluster service imposes the packet privacy restriction, not PowerShell. I do not know what problem packet privacy solves that makes it worth the potential administrative frustration. Just know that it exists and how to satisfy it.
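
In practice, satisfying the packet privacy requirement usually just means keeping your commands to a single hop. A minimal sketch with a hypothetical node name:

    # Runs the cluster cmdlet directly on the node, which keeps it to a first-hop session
    Invoke-Command -ComputerName Node1 -ScriptBlock {
        Get-ClusterNetwork | Format-Table Name, AutoMetric, Metric, Role
    }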

So, with all of the caveats out of the way, let’s change something. Again, I will work with the network list as displayed earlier:

[Image: Cluster network list]

Let’s imagine that my manager does not trust the auto-metric to keep the “Cluster” network at first priority and wants me to force it. To do that, I must manually give the “Cluster” network a metric lower than anything that the cluster might pick for itself. As you can see, the cluster uses very high numbers, so I can hit that target easily.

First, acquire an object that represents the “Cluster” network:

    $ClusterNetwork = Get-ClusterNetwork -Name 'Cluster'

Second, modify the acquired object’s “Metric” value:

    $ClusterNetwork.Metric = 42

Third, verify:

    Get-ClusterNetwork | ft Name, AutoMetric, Metric

You should see something like the following:

[Image: Cluster auto metric output]

I performed the object acquisition and parameter setting in two discrete steps for clarity. If you only want to modify the metric property, then you do not need to keep the object and can perform it all on one line. I will demonstrate this by reverting to the auto-metric setting:

    (Get-ClusterNetwork -Name 'Cluster').AutoMetric = $true

By using Get-ClusterNetwork to place the object into the $ClusterNetwork variable in the first demonstration, I could continue on to make other changes without reacquiring the object. In the second demonstration, I lose the object immediately after changing its setting and would need to acquire it again to make further changes. Also, I find the second form harder to read and understand. It might perform marginally faster, but it would cost more time to prove it than it could ever be worth.

Changes to the cluster network priority take effect immediately.

Going Forward with Cluster Network Prioritization

Ordinarily, I would wrap up the article with some real-world ideas to get you going. Really, I don’t have any for this one except don’t. You probably have a better way to solve whatever problem you face than this. Hopefully, this article mainly serves as an explanation of how newer features have made this one obsolete and that much of the existing material gets a lot of points wrong. If you have clusters configured under the patterns from 2008 R2 and earlier, review them to see if you can successfully employ the auto-metric setting. Happy balancing!

Backup Storage Options and Size Requirements for Business

Two of the primary considerations that IT professionals need to take into account when planning for backups are the type of backup target that they will use and the backup capacity that is required. Backup targets can be grouped at a high level into two main categories: on-premises and offsite.


To architect a backup solution that comprehensively covers all the organization’s needs, an IT professional must satisfy three criteria: location, type, and capacity. In other words, where will we put the data, what will we put it on, and how much space do we need? This article assumes that you already know your RPOs, RTOs, retention policies, and what you need to back up. We treat that knowledge as a storage “problem” for which we will architect a solution. 

Location, Location, Location (Where to Back Up Data)

A proper backup solution does not answer the location question with “on-premises” or “offsite”. Treat it as an “and”, not an “or”. Location governs the types of available storage and the transfer speed of data. You will need to divide and direct your backup operations in a way that lets you realistically achieve your RTOs, RPOs, and retention policies in any probable event. Because you need both onsite and offsite backup, you will need to consider multiple storage types. For the rest of this article, we will categorize media types according to their location options and talk about capacity at each point.

Public Cloud Backup Storage

If you choose to use a cloud provider to host backup data, then they will make most of the media type decisions for you. Instead, you will choose between speed and protection options. 

Fully Managed or Self-Managed Cloud Backup 

Many vendors offer fully managed cloud backup systems. They involve some sort of purpose-built agent for on-site and cloud-based operating systems. Some will also back up data directly from applications. The company that operates the cloud will provide options, as will third parties. 

Alternatively, or additionally, you can use a cloud provider’s general purpose storage offerings as targets for your backup. You will determine what data to send, how to transmit it, and how to manage its lifecycle. Many on-premises backup applications can target this type of storage. 

Speed, Capacity, and Price of Cloud Backup Storage 

No matter the method, you will pay the provider for the storage that you use. As with most cloud services, vendors offer multiple storage tiers. Typically, price binds most closely to speed first, then capacity. If you can tolerate hours of waiting for data retrieval, then you can often get substantial amounts of cloud storage at extremely low cost. If you want your data to travel more quickly, then you’ll pay more for the same amount of storage. 

In most cases, you will not concern yourself with the IOPS capability of storage that you use for backup. IOPS matters most for random-access compute workloads. Moving backup data across the Internet, even at the highest available public speeds, will not stress the IOPS limits of standard cloud storage. Purchasing higher tiers only to hold backup might possibly speed up backup operations for resources in the same cloud datacenter, but not likely enough to justify the cost.

Data Integrity of Cloud Backup Storage

As a benefit, the cloud provider takes on some responsibility for the integrity of data on their systems. That does not make it completely foolproof. Read your agreements carefully. The provider likely has multiple terms that limit their liability. The provider may offer monetary restitution in case of data failure and may rely on that in place of complex data protection systems, especially on their less expensive tiers. Take time to understand what your subscription buys and plan accordingly.

Cloud backup storage offerings usually add geographic redundancy and other replication options for an additional cost. While not a perfect protection against corruption, replication does guard against localized events. Understand that a cloud replica, just like a private replica, can receive corrupted data without generating an error. 

Private Remote Backup Storage

You have options besides public cloud for remote storage. You can choose a solution hosted by a third party (also called co-location) or host one yourself. Hosted data differs from cloud data in that you purchase control over a specific set or subset of hardware in one or more specific locations. As technology progresses, some of the distinction between “cloud” and “hosted” will fade away, but the general difference relates to your ability to know the precise location of your data.

Third-Party Hosted Storage 

Third-party hosted storage usually comes at a lower price than self-hosted storage. You will only pay for the hardware and the portion of the facility that you use, not the entire site. If you run a small business with a small budget but want to maintain more control over your data than a cloud provider allows, third-party hosting presents an attractive option. The primary drawback is that you do not have the same options for storage technology as you do for cloud or self-hosted storage. In trade, someone else deals with the physical component. You must configure the backup and restore software and transmission process, however. 

Self-Hosted Remote Storage 

If you already own a facility with significant geographical distance from your primary site, you might have an opportunity to host your own offsite backup. Even if you don’t have such a site, you might find that you can rent small office or even warehouse space at an affordable rate and use that as a disaster recovery site. You bear the facility costs, but you gain complete control over your backup and disaster recovery configuration. 

Choosing and Using Remote Storage 

Whatever remote option you choose, you take on the bulk of the responsibility for getting the data there. Whereas cloud providers go to great lengths to provide secure options for onsite-to-cloud data connectivity, hosting providers usually do not. They may offer secure FTP or something similar and nothing else. If you own both (or multiple) endpoints in a backup chain, then you control the security of data transmission. Because this article is not about security, I’ll leave a greater investigation of this subject for other articles. Take care to properly secure your inter-site data transmissions and at-rest data in remote sites. However, if you employ backup software in this task, such as Altaro’s Offsite Server, then it may include encryption capabilities to reduce your effort. 

As for storage media, your choices with a third-party host are limited to whatever they offer. It may just be a big pool of data disks and nothing else. However, they might offer a second-tier backup option, such as copying your replicated data to tape or other locations. Consult with your provider to discover their options. 

When you own the remote site, you have the same choices as you do for on-premises storage, which we cover in the next section. You must figure out how to adequately use it as protection for the primary site, but otherwise, it presents no special hurdles. 

On-Premises Backup Storage

With on-premises storage, you take on the responsibility of deciding on media and target types. The balance of price, speed, and capacity works differently than cloud options, but still follows the same general pattern: more money for higher speed and capacity. You have plenty of options, which we’ll explore in general order of popularity. 

Disk-Based Backup Storage 

In the not-so-distant past, few could afford to use disk as their backup medium. The price per gigabyte continually falls, though. Even better, new market entrants have forced down the costs of chassis-based high-capacity storage. Even small budgets can now afford devices that will safely hold multiple backup copies. 

This category includes four major sub-categories: 

  • SAN (storage area network): high-capacity, high-speed, high-cost block storage 
  • NAS (network-attached storage): high-capacity, moderate speed, moderate cost file storage 
  • Storage-heavy commodity servers: standard servers with hardware that focuses on storage capacity over other components 
  • External hard disks: this category includes temporarily attached disks, usually USB 

SAN devices carry a high cost (due to low competition and premium features), but also give the greatest performance and scalability. In a completely greenfield project scoped specifically to backup, you probably would not choose SAN for backup. You can almost always scale out NAS or commodity servers to a space and network capacity that satisfies your backup needs at a lower cost. However, if you already have SAN capacity and do not anticipate needing it for production purposes for a long time, or if you want to have do-it-all storage, a SAN can meet the goal. Just take care that you do not use a single device for everything; backup must exist independently of its source data. 

The major difference between NAS devices and commodity storage servers is that a NAS is purpose-built by the manufacturer to serve solely as a storage device. A commodity storage server could run typical compute or general-purpose workloads, although its hardware is usually configured to maximize disk space at the expense of compute capacity and it likely has some software enhancements tuned for storage. Traditionally, NAS devices are “file”, not “block” storage devices. A block storage device presents its storage to consumers like a locally-attached unformatted hard drive. A file storage device must be formatted before use and can only present its space as a network share (e.g., SMB or NFS). Modern NAS devices often present block space as well, so the block vs file distinction usually does not apply anymore. Currently, SAN and NAS devices differ mostly by capability, especially scalability. Most SAN devices allow you to distribute the same logical storage location across multiple devices; NAS devices do not. Commodity storage server software has begun to evolve beyond NAS capability toward SAN capability, but still matches more closely with NAS.

External hard disks have almost none of the capabilities of NAS or SAN devices, but you can transport them easily and keep them in a disconnected state between backups. You trade speed and convenience for safety.  
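
As an illustration of how little tooling that approach needs, here is a hedged sketch of a copy job to a temporarily attached external disk; the paths are hypothetical, and you should verify the log before detaching and transporting the disk.

    # Mirror a staging folder to the external disk and keep a log of the run
    robocopy "D:\BackupStaging" "E:\WeeklyCopy" /MIR /R:2 /W:5 /LOG:"E:\WeeklyCopy-log.txt"

Note that /MIR removes files from the destination that no longer exist at the source, so use separate folders or separate disks for copies you intend to keep long-term.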

Drawbacks of Disk-Based Storage 

The speed of NAS and SAN also presents a weakness: these devices remain physically and logically connected to the rest of your network at all times. Therefore, any bad thing that occurs anywhere will find its way to the backup system sooner rather than later. Ransomware instantly comes to mind, but you also need to worry about other malware, data corruption, malicious activity, and physical catastrophe. While you can defend against such problems, you can never consider online storage as truly safe.

External disks protect against most of the problems in the previous paragraph. They will faithfully copy any ransomware or other malware that reaches them, unfortunately, but if you know that a network has an infection then you just need to leave your disks disconnected. Also, you can quickly and conveniently transport external disks offsite. However, if you require several external disks per day to contain all your backup data, switching them can become a tedious chore. 

Magnetic disks always have some risk of data corruption due to magnetic fluctuation (bit rot). When powered, disk controllers can mitigate some of that, especially in arrays. When unpowered, you don’t have as much local magnetic activity to worry about, but background radiation and the lack of any active control mechanism present their own problems. Following backup best practices will give you the greatest protection (always maintain multiple distinct copies and test periodically).

SSD for Backup Storage 

I guess that you hit this sub-section because you have one particular question: Can I use SSD for backup data? Honestly, we don’t know. We know that when heated, SSDs lose data quickly. When I say heated, I don’t mean oven temperatures; I mean, like, backseat of your car in June (northern hemisphere) temperatures. How quickly? Well… that varies. At room temperature, you can probably trust an unplugged SSD for a year. If you chill it, maybe it will stretch out to ten years. 

How much does this matter? Remember your best practices: keep multiple distinct copies, test regularly. If you keep them in a reasonably cool location and periodically energize them to verify their contents, then they will probably last just fine. Neglect them, and their cells will discharge. But, you probably won’t find any SSD manufacturer that will guarantee anything. 

Tape Backup Storage

We have used tape for so long that most people still think of it first when anyone mentions backup. It wins over disk when transportability and long-term storage matter most. However, it has fallen significantly behind disk in speed metrics. It also lags in dollar-per-gigabyte and capacity-per-unit scores. Furthermore, while you can still find functional PATA and wide SCSI controllers for disks that have probably long since demagnetized, you might struggle a bit more to find a working tape drive to read that ten-year-old tape that has probably lost almost no consistency. You can plan for drive obsolescence by storing a working unit with tapes when you switch technologies, but the organizations that don’t do that greatly outnumber the ones that do; they often switch technologies because the previous drive failed, and they wanted a new one anyway. 

To overcome speed and capacity limits, you can employ tape libraries. These further increase the advantages of disk over tape in capacity-per-dollar, but they allow you to maintain the portability and durability advantages of tape. To balance that out, you can craft disk-to-disk-to-tape strategies that use disk for short-term rotation and tape for less frequent long-term backups. 

As time goes on, disk will almost certainly replace tape. As solid-state costs fall and reliability increases, we will eventually use it for everything. 

Optical Media Backup Storage 

I mostly mention optical media for completeness. Once upon a time, we all believed that the high capacity and near indestructibility of optical disk would solve all our backup problems. Then optical hit a per-unit capacity ceiling somewhere around 100GB while spinning disk capacities continued to double. While tape couldn’t keep up with other magnetic storage, it outpaced optical in every metric. Also, it turned out that even though optical media lasts effectively forever, the typical “burned” variety will lose data after a few years. Pressed discs have indefinite age, but exorbitant cost. 

To set expectations, let’s say that you have 4TB of data to store. You select 100GB M-Discs. You’ll need 40 discs which will need 40 hours to burn at a cost of $1,600. All that gets you exactly one copy – assuming no errors occurred while burning any disc. However, the M-Disc apparently has a substantially longer lifespan than other optical media, so that’s a benefit. 

Using Multiple Locations for Backup 

To fit the minimal definition of backup, you need one complete copy of your data onsite (production data) and one complete copy offsite (cold data). You can build a robust solution on top of that foundation. This will require knowledge of RTOs, RPOs, data size, and budget. The following items show a few examples to help you architect. 

  • Disk-to-disk-to-disk: You can back up your production data to a live, always-on local storage location, then duplicate that on external disks. 
  • Disk-to-disk-to-offsite: Instead of external disks in the previous example, you can make the second hop transmit to a remote facility, either cloud or hosted. However, you must employ some precautions that prevent every backup from being accessible online. Without cold data, you expose yourself to malware and other malicious activity. 
  • Disk-to-disk-to-tape: This replaces the external disk in the disk-to-disk-to-disk scheme with a tape drive in the last step. Tape allows for very long-term archival storage, so you can use it less frequently to capture complete backups. 
  • Disk-to-offsite-to-offsite: You can transmit data offsite to multiple locations in parallel or in a chain or in any other configuration that your software and processes allow. You could configure a mesh pattern that maintains many copies in many disparate locations. Remember the importance of cold data and testing, though.  

Backup Capacity, Revisited 

We talked about capacity in the individual sections, but it might help to cover it more generically. Properly layered, all media types and locations provide effectively infinite capacity. Your barriers come from time and money. In speed order: 

  • SAN 
  • NAS/commodity storage server 
  • External disk 
  • Tape 
  • Optical 

Cloud and hosted speed depends on your connectivity and the technology in place at the receiver, but typically ranks between external disk and tape. If you have very high speed Internet or you have the budget for fiber connections between your sites, then it can approach SAN speeds. 

Expense ordering depends on how you prioritize metrics. If you value longevity, then tape provides the least expense per year. If you value capacity per dollar, then on-premises NAS rules all. Also consider transport convenience and whether each copy needs to stay hot (online) or can go cold (offline). 

The final piece to consider is your need for rotation and retention. To start, think of the venerable grandfather-father-son (GFS) rotation scheme. Each day’s backup overwrites the previous day’s (son) until a specified day of the week. On that day, you perform a complete backup, which you overwrite on the same day the following week (father). Once each month, you remove that weekly backup copy from the rotation, store it (grandfather), and introduce a new “father” tape for the following month. Depending on retention needs, some organizations overwrite the grandfather each month and retain only an end-of-year backup. GFS has other variants; some organizations only write differentials on the “son” tape due to time constraints, while others have tapes for each day that they use to hold incrementals. 

GFS was designed for tape systems, but the same concepts apply to modern disk-based systems. For example, think of Altaro’s deduplication technology. Every backup that invokes a deduplication pass works like a “son” in GFS. The latest full backup was the “father”. To recover from a complete loss, you must have at least one “father”. 

With that understanding, you can now properly estimate your space requirements. Every “father”, regardless of location or media, requires approximately 100% of the size of the production data. While many technologies offer compression, do not overestimate its savings. The size of each “son” depends on daily churn rates. The impact of that churn will vary greatly between backup technologies: a full backup requires as much space as a “father”, incremental backups require the least, and differentials fall somewhere in between. Compression may play a factor, but again, beware optimistic expectations. Advanced technologies like Altaro’s deduplication can have a substantial shrinking effect on “son” sizes. In all cases, it will require an understanding of your organization’s typical daily activities to predict the size of non-full backup runs. You will almost certainly need some experience with your chosen backup tool(s) for useful predictions, but most organizations churn a very low percentage of their total data on a daily basis. 

From here, you have simple addition to perform. If it helps, map it out. Example: 

[Image: Full backup map]

[Image: Backup additions example]
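
As a minimal sketch of that addition, with hypothetical numbers (not the figures from the original map): roughly 800 GB of production data, weekly full backups retained on one monthly disk, and about 2% daily churn for the incremental days.

    # Hypothetical numbers only: space needed on one monthly rotating disk
    $productionTB = 0.8      # size of one full backup ("father")
    $dailyChurn   = 0.02     # fraction of data changed per day ("son" size factor)
    $weeklyFulls  = 4        # full backups stored on the disk each month
    $sonDays      = 26       # daily backups between fulls over the month
    $diskTB = ($weeklyFulls * $productionTB) + ($sonDays * $productionTB * $dailyChurn)
    "{0:N2} TB needed on the monthly disk" -f $diskTB    # about 3.62 TB - fits a 4 TB drive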

With a quick sketch, I see that I can buy one 4 TB disk to handle the backup workload and budget for a new disk each month. I will rotate out the June and December disks, so each year I will need to replace two of the monthly disks. If I can get an external 4 TB drive for $100, then my total spend for the first year comes to $1,300. Replacing the June and December disks will cost $200 for the year. Budget a few hundred dollars for the occasional drive failure, add in software expenses, and you’ve effectively predicted your backup budget for the next 3 years or more. At 5% annual growth, you will exceed the capacity of a 4TB drive in a few years, but falling disk prices will likely handle that without a budget adjustment. 

Keep It Going 

In my last example, I used only external disks. If you keep the daily disk onsite and keep the weekly and monthly disks offsite far enough away from the primary site to survive any probable disaster, then you have a perfectly viable backup solution. Is that enough? You decide. Add on and expand to other options as your budget allows. The only line for “too much backup” is “way over budget”. Keep copying, and remember to test that backup! 

System Image vs Backup at File Level: Which is the Best and Why?


Although there are countless backup applications on the market, nearly all of them use one of several high-level methods for creating backups. Some backup applications create image backups, while others concentrate on file-level backups. There are also a few solutions that give the backup operator a choice of which backup method they want to use. This article explores the difference between image-based backups and file-based backups and compares the advantages and disadvantages of each approach. 

Note: Block-level backups will be covered in a separate article. 

What is system image backup?

An image backup is exactly what it sounds like. It is a full copy of a computer’s contents. In other words, an image backup is a mirror image of the computer’s hard drive. 
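
As a concrete reference point only, the built-in Windows Server Backup command line can produce this kind of backup in one command; third-party products each have their own mechanisms, and the target drive letter here is hypothetical:

    # Capture the system volume plus every volume required to restore the OS, to drive E:
    wbadmin start backup -backupTarget:E: -include:C: -allCritical -quiet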

What is the operational impact of a system image backup?

There are a few different things to consider with regard to the operational impact associated with using image backups. First and foremost, depending on the backup software being used, an image backup may limit your options for restoring data. Some backup vendors will allow you to perform granular recovery operations from an image-based backup (such as restoring an individual file). Others, however, will only allow you to perform a full restoration and do not allow for the restoration of individual files or folders. 

It is also worth considering that a PC’s hardware may limit its ability to restore an image backup. Suppose for a moment that a PC is equipped with a 2 TB hard drive but only contains half a terabyte of data. Now imagine that the hard disk fails and that the only spare hard disk immediately available is a 1 TB disk. Using the smaller disk shouldn’t be a problem because the PC only contains half a terabyte of data. Even so, restoring an image backup to the smaller disk may be impossible because the image was based on a much larger disk. 

Another thing to consider with regard to image backups is that because images are essentially full copies of a computer’s hard disk, image backups tend to be quite large in size. While it is true that most image backup solutions create images that are smaller than the hard disks that they are backing up, images do tend to be large because they do not use technologies such as deduplication to reduce the image size, and because they may potentially include temporary files or a copy of the Windows pagefile. Again, however, each vendor has its own way of doing things, so some image backup solutions will inevitably create smaller images than others. 

Image size is an important consideration because it has a direct impact on backup storage cost. Similarly, an organization that wants to use a cloud-based backup target may find it impractical to do so if its backup software produces excessively large images. 

What is a file-based backup?

Whereas an image-based backup attempts to create a full copy of an entire hard disk, a file-based backup focuses on backing up individual files and folders. 

What is the operational impact of a file-based backup?

Early on, file-based backups performed direct copies of the files residing on a protected system. The problem with this approach, however, is that most of the backup applications of the time were unable to back up open files. This meant that the operating system and the applications could not be protected, nor could the backup application protect any documents or data files that a user was actively working on. 

Modern file-based backup solutions tend to use changed block tracking as an alternative to direct file copies. The idea behind changed block tracking is that initially, the backup software makes a backup copy of every storage block on the protected system. Deduplication is often used to ensure that duplicate blocks are not backed up, thereby reducing the amount of time required to create the initial backup, and shrinking the backup footprint. Subsequent backups are run every few minutes, as opposed to the nightly backups that were once common practice and protect any newly created or modified storage blocks. 

The main advantage of this type of backup is that it allows for the frequent creation of recovery points. Additionally, because each backup cycle is only protecting the storage blocks that have been created or modified since the previous backup, backups tend to be very small in size. 
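
To make the mechanism concrete, here is a deliberately simplified, hypothetical sketch. Real changed block tracking is implemented in the storage stack and does not rescan files like this; the point is only that blocks whose hashes match the previous run do not need to be backed up again. The file path is a placeholder.

    # Toy changed-block detection: hash fixed-size blocks and compare with the last run
    $blockSize = 4MB
    $previous  = @{}       # block index -> hash recorded by the previous backup run
    $changed   = @()
    $stream = [System.IO.File]::OpenRead('D:\Data\sample.vhdx')
    $sha    = [System.Security.Cryptography.SHA256]::Create()
    $buffer = New-Object byte[] $blockSize
    $index  = 0
    while (($read = $stream.Read($buffer, 0, $buffer.Length)) -gt 0) {
        $hash = [System.BitConverter]::ToString($sha.ComputeHash($buffer, 0, $read))
        if ($previous[$index] -ne $hash) { $changed += $index }   # new or modified block
        $previous[$index] = $hash
        $index++
    }
    $stream.Close()
    "{0} of {1} blocks would go into this backup cycle" -f $changed.Count, $index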

It is worth noting, however, that some (but not all) file-based backup solutions are incapable of performing a full restoration of an entire physical or virtual machine. Such solutions protect data but may be incapable of restoring the operating system or applications. 

Image-Based Backup vs. File-Level Backup

When properly implemented, both image-based backups and file-based backups are viable solutions for protecting a computer’s contents. Even though some file-based backup applications are incapable of performing full system restorations, there are file-based backup solutions that are able to protect the operating system, applications, and everything else on a computer. Similarly, some image-based backup solutions do not allow for granular restoration of files, folders, and other objects, but there are those that do. As such, the ability to perform both bare metal and granular restorations needs to be considered when selecting a backup solution, but this consideration does not necessarily disqualify the use of image or file-based backup technology. 

The main things that need to be considered when choosing between the two technologies are the frequency with which recovery points can be created, and the size of the backups. Although there are some image-based backup products that are able to create differential images, such products tend to be extremely limited in the number of recovery points that they are able to create over the course of the day. Conversely, file-based backup solutions that are based on changed block tracking are generally able to create recovery points every few minutes. Likewise, image-based backup solutions tend to create much larger backups than those produced by file-level backup solutions. 

How does Altaro tackle backups? 

Altaro takes a block-based approach to backups (which is the best of both worlds) but does so in a way that allows entire physical or virtual machines to be backed up and restored. In fact, Altaro fully supports Windows VSS and is application-aware as well. 

[Figure 1]

Because Altaro does use a block-based approach to backups, it supports continuous data protection, with backups being created as frequently as every five minutes. Additionally, technologies such as augmented inline deduplication help to increase the speed of the backup process, while also significantly reducing backup storage requirements. This can be extremely beneficial to organizations who wish to back up or restore their data to the cloud. 

Continuous data protection solutions, such as the one used by Altaro are based on changed block tracking. This means that the backup application monitors the protected system’s storage to keep track of how storage blocks are being used. If the operating system writes data to a storage block, that data is backed up. 

One of the problems that has long been associated with this approach is that because each scheduled backup only backs up storage blocks that have been created or modified, restorations almost always require both new and previously existing storage blocks to be recovered. This isn’t a problem if the backup is healthy, but if any corruption exists within the backup, then that corruption may inhibit an organization’s ability to recover data from even the most recent recovery point. 

Altaro avoids this problem with its Backup Health Monitor. The Backup Health Monitor regularly checks the backup repository for the existence of missing or corrupt storage blocks. If any problems are detected, then the affected blocks are automatically repaired (re-backed up) within the next backup cycle. This type of self-healing is one of the things that really sets Altaro apart from competing backup solutions. 

[Figure 2]

What is Backup and Recovery for Enterprise Computer Data

This article provides an overview of computer data backup and recovery and describes some of the key features that organizations should look for when selecting a backup software vendor.


Some argue that data has become the world’s most valuable commodity, worth even more to businesses than gold.  Unfortunately, it is also much easier to lose all of your data than a stockpile of precious metals.  As the importance of data increases, so does the need to protect it using backup software and recover it if a disaster strikes.

Conceptually, a backup is fairly straightforward – it involves making a copy of a file, database, or computer and saving it as a backup file. Usually, the initial backup will take longer since all the data must be captured, but subsequent backups can be much quicker because only the changes since the last backup are usually saved, as with an incremental or differential backup. Recovery allows you to restore that backup file to the original or a new computer with the same information or state as when it was backed up.
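
As a toy illustration with made-up numbers, assume 10 GB of data changes each day between weekly full backups. An incremental saves only what changed since the previous backup of any kind, while a differential saves everything changed since the last full backup, so it grows through the week:

    $dailyChangeGB    = 10
    $incrementalWeek  = (1..6 | ForEach-Object { $dailyChangeGB } | Measure-Object -Sum).Sum       # 60 GB
    $differentialWeek = (1..6 | ForEach-Object { $_ * $dailyChangeGB } | Measure-Object -Sum).Sum  # 210 GB
    "Incrementals for the week: $incrementalWeek GB; differentials: $differentialWeek GB"

The trade-off appears at restore time: recovering from incrementals requires the full backup plus every incremental since, while recovering from a differential requires only the full backup plus the most recent differential.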

When planning your backup strategy, there may be dozens or even hundreds of variables that you will want to consider. Some organizations plan their backups based on the location of their datacenter(s), others will make the decision based on their existing storage providers, some look at the feature set or cost of the backup software, while others decide based on how quickly they can recover after a disaster to minimize their downtime. There is no single right strategy, so ultimately, it comes down to the organization’s priorities and budget.

This article introduces computer data backup for enterprises by explaining how it works and reviewing different planning considerations.

Components which can be Backed Up in the Enterprise

Here are the most common components that enterprises should consider protecting, starting from smallest to largest.

  • Files – Files are generally the smallest unit of backup, although individual blocks of a file can also be protected. Operating systems like Windows or Windows Server will usually give you the option to automatically back up all of your files to a local disk or to a cloud service like Microsoft OneDrive. Sometimes you can even restore files to different points in time by picking which backup you want to recover from. You may want to create custom backup policies for specific files if they are important; otherwise, they will usually get backed up when the entire disk or computer is backed up. In order to save space, it is usually possible to back up and recover specific items within an application, such as a single mailbox or message from an Exchange Server.
  • Databases – Databases usually have customizable backup policies since they can often contain business-critical data. Most databases will have built-in backup tools, and third-party providers will usually support the most common databases like SQL Server.  A computer’s registry is also an internal database, but this will usually get protected when the operating system is backed up.
  • Disks / Drives – Most enterprises will back up their applications at the disk or drive level as their smallest unit to simplify management. Virtually all backup ISVs will support disk-level backup and recovery while supporting common storage optimizations like RAID, mirroring, and deduplication.
  • Applications – Most applications will allow you to separately back up the application’s settings and its data. The former may include items like the user’s preferences or the application’s IP address. Generally, the application’s data is the most important, and this may have its own native backup process or leverage an industry-wide solution that allows third-party products to perform a backup, like Microsoft Volume Shadow Copy Service (VSS); a quick way to check VSS writer health is sketched after this list. Check out this blog for more information about how application-consistent backups work using Microsoft VSS.
  • Operating Systems – It is possible to back up an operating system. During this process, the configuration data and registry settings are protected, and all of the users’ files are backed up independently. Since these usually result in separate backup files, they can be individually restored. By restoring settings, you can effectively copy a user’s preferences to a different device.
  • Virtual Machines – A Virtual Machine (VM) usually consists of a virtual hard disk (VHD) hosting the VM's OS, often a second VHD hosting the data for the application running inside the VM, and a configuration file which defines the settings for that VM. Each of these file types can be backed up and restored independently.  Hyper-V, VMware, and Linux virtualization all operate slightly differently, so whichever one you choose, make sure the chosen backup solution supports your hypervisor.  You should also check that the various guest operating systems running within the environment can be protected.
  • Mobile Devices – Backing up settings or data from a mobile device is usually handled by the telecom carrier or the device manufacturer. If the device is managed by the enterprise, then the security settings and user data can usually be backed up by IT.  Some backup ISVs support popular mobile operating systems, but many do not.  If this is a requirement, check that provider's support matrix to ensure that your organization's devices and their specific operating systems can be protected.
  • PCs – Most standard computers will use the built-in backup solution provided by their OEM, such as Windows Backup. Similar to other systems, PCs protect the computer's configuration data (including the registry) separately from its users' files.  At the enterprise level, organizations can centrally control policies to automatically protect all the PCs in their environment and centrally store their data.  Enterprise backup vendors may not support client operating systems, so always verify with your preferred ISV if this needs to be part of your protection plan.
  • Servers – A server is protected much like a PC, with its configuration data and files protected independently. Servers are usually centrally managed by the enterprise, giving IT control over critical policies to ensure security and compliance.  Most backup ISVs support the common server platforms, like Windows Server and VMware ESXi, and may have varying support for different Linux distributions.
  • Clusters – A cluster is a collection of servers that are often running VMs. While the servers and VMs will be protected by the backup provider, the cluster configuration information is the additional component that needs to be protected at the cluster level.  Deploying and optimizing clusters can be complicated, so having a backup solution that can restore these settings is often desired by enterprises.  Always make sure that your backup ISV is cluster-aware.  For more information, check out Altaro's blog on protecting Windows Server Failover Clusters.
  • Network Devices – Configuring physical and virtual network devices is complicated, and once they are operational, it is important to retain their settings, security policies, and routing tables. Generally, networking devices can export their configuration data to a standard file format (such as XML), which can then be protected by standard backup solutions; if recovery is needed, the device can import that same file.  Make sure that you include any critical network devices in your planning (a minimal sketch of protecting an exported configuration follows this list).
  • Datacenters – In the event that you lose an entire datacenter to a disaster, you will want to be able to recover and restart your critical services in a second site or in a public cloud. This is generally referred to as disaster recovery.  When protecting an entire datacenter, you are essentially protecting all of its components as a single logical group, including the disks, VMs, servers, and network devices.  If you are seeking a datacenter-wide protection solution, make sure that every component is protected by your backup ISV and that you have a disaster recovery plan which will let you restart your critical services in an alternative site.
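To illustrate the network-device point above: once a device has exported its configuration, protecting it can be as simple as copying that export into a versioned backup location. The Python sketch below shows the idea only; the paths and file names are hypothetical, and a real environment would add error handling and an offsite copy.

```python
from datetime import datetime
from pathlib import Path
import shutil

def archive_device_config(exported_config: Path, backup_root: Path) -> Path:
    """Copy an exported device configuration into a timestamped backup folder.

    The device is assumed to have already written its configuration to
    `exported_config` (for example, an XML dump); this only handles safekeeping.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target_dir = backup_root / stamp
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / exported_config.name
    shutil.copy2(exported_config, target)   # copy2 preserves file timestamps
    return target

# Example call (hypothetical paths):
# archive_device_config(Path("exports/core-switch-01.xml"), Path("/backups/network"))
```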

As you can see, it is possible to protect almost every datacenter component.  Next, you will want to consider where you are storing those backup files.

Storage Options for Enterprise Computer Backup Data

Storage will be a key factor in defining your backup strategy, and it may also impact which backup ISV you select.  When you plan your backup strategy, you want to also think about the reasons that may lead to you needing to recover data.  Is it because you are working with unsophisticated users, are you a target for hackers, or could a natural disaster destroy your datacenter?  For this reason, you want to consider local, shared, and offsite storage options.  The amount of data you are protecting (capacity) may also influence your decision as there is usually a tradeoff between ease-of-use and price.  Finally, you will want to consider how quickly you need to recover from a disaster, and the storage speed can impact this.

  • Local Storage – When a backup is created, the resulting backup file is usually about as large as the data set it is protecting. With this in mind, if you are running backups within an OS at the file level, the backup will generally be smaller than the full size of the OS because you are just copying the settings and user files, not the operating system itself.   However, if you are protecting a whole drive, the backup will generally be the size of all the files since it needs to take full copies of the data.  There are techniques to shrink this backup file, such as compression and deduplication, covered later in this article.  If you want to back up a component to the PC or server it is running on, be aware of the file size to ensure that you have enough capacity for the initial backup while giving it room to grow (a rough capacity estimator follows this list).  While local backups usually have a very fast backup and recovery time, if you lose your laptop, then you may also lose your backup.
  • Shared Storage – Most enterprises will back up all of their components onto centralized storage. This helps because if any PC, mobile phone, or server crashes, gets stolen, or is destroyed, its data can be easily recovered to a replacement device. Centralized management usually provides operational efficiencies through shared policies and standards, and the storage may be cheaper than hosting the backup files on local devices.  The main disadvantage is that all of the computers must regularly connect to the backup system to offload their backup files.  This can create significant network traffic, so much so that some organizations create dedicated networks to handle traffic for tasks like backup, deployment, and patching.  The other challenge is that this centralized storage could be a single point of failure if a fire burns down the datacenter, so most enterprises will also look for an offsite solution.
  • Offsite Storage – Having site-wide resiliency is an important part of business continuity planning. You should be able to copy your backup files from local storage or shared storage directly to an offsite location.  Most organizations will use the shared storage as an intermediary device so that backup and recovery are quicker.  When sending backups offsite, the data files must be encrypted as they travel across longer networks or the public Internet.  Offsite storage could be a secondary datacenter managed by the enterprise, a partner or service provider's cohosted datacenter, or even a public cloud service.  Microsoft Azure is a popular backup target for many enterprises as it supports a broad set of data, applications, and virtual machines and offers backup as a service (BaaS).  Many backup providers now offer backup to the cloud, such as Altaro's VM Backup.  Whenever you are using an offsite backup solution, keep in mind that you will still need to pay for the cloud storage capacity you are using and possibly the network bandwidth that the backup traffic is consuming.
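To make the capacity point concrete, here is a minimal back-of-the-envelope estimator. All of the figures (change rate, compression ratio, schedule, retained cycles) are assumptions for illustration only; substitute measurements from your own environment.

```python
def estimate_backup_capacity_gb(data_set_gb: float,
                                daily_change_rate: float = 0.05,
                                incrementals_per_full: int = 6,
                                compression_ratio: float = 0.6,
                                retained_cycles: int = 4) -> float:
    """Rough storage estimate for a full-plus-incrementals retention scheme.

    Assumed figures: a 5% daily change rate, six incrementals between fulls,
    backups shrinking to 0.6x their size after compression, and four retained
    weekly cycles.
    """
    full = data_set_gb
    incrementals = incrementals_per_full * data_set_gb * daily_change_rate
    one_cycle = (full + incrementals) * compression_ratio
    return one_cycle * retained_cycles

# A 2 TB data set under these assumptions:
print(f"{estimate_backup_capacity_gb(2048):.0f} GB needed")  # prints "6390 GB needed"
```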

Once you have defined your site requirements, then you can consider additional backup features.

Storage Media for Enterprise Computer Backup Data

This section will review some of the storage media options that enterprises should consider with their backup strategy.  These will be more applicable to shared storage, which is managed by the company, as local storage and cloud storage may have limited options.  Centralized storage used to be more expensive when servers had to be connected using a storage area network (SAN) with proprietary storage connections called host bus adapters (HBAs).  Now there are numerous storage protocols that use ethernet connections and NICs, simplifying management and reducing costs.  The best solution for an enterprise will depend on its budget and recovery speed needs.

  • Hard Disk – The most common type of storage media is traditional mechanical hard disks. Over the past decade, the commoditization of storage has significantly lowered the price of disks, making them affordable for backup.  Access to specific files is quick as the location of the files is indexed, which means the recovery software can "jump" to the correct place on the disk. Hard disks are also supported by virtually every backup provider, and they can be easily replaced if they fail.  Since they are mechanical, failures do happen, so they are usually deployed in a redundant configuration using disk management solutions like RAID and mirroring.  Recovery speed is average and limited by the disk speed and the network bandwidth available to transmit and recover the backup files.
  • Magnetic Tape Drive – Tape drives are the second most common type of backup storage media after hard disks. Tapes are fairly cheap and can have massive capacity, so they are great for backup files that do not need to be regularly accessed, such as archival content.  Tapes must be read sequentially, which means that finding files can take a long time, but once the data is located, it can be read more quickly than from hard disks.  The major criticism is that tapes can wear out if used frequently, and the drives themselves can fail.  Once a tape is full, an admin may even need to physically replace it with a new tape, which removes automation and significantly slows down recovery time.  Check out this Altaro blog for more information about using backups with a tape drive.
  • Solid State Drive (SSD) – SSDs have become more popular over the past few years as prices have declined; however, they can still be significantly more expensive than hard disks and magnetic tape drives. They are, however, much faster and more reliable as they do not have any moving parts. If you have the budget to use SSDs for backup, it is the best option and will give you the fastest recovery.
  • Optical Storage – Optical storage includes DVDs and CDs, which use lasers to read and write the backup files. These disks are rarely used in the enterprise because of their limited capacity, slower speeds, and the frequency of needing manual intervention to change the disks.  This is only a practical storage option for PCs and individual backup needs.
  • Cloud Storage – Recently cloud storage has emerged as a popular option for storing backup files. While it is managed by third parties, which limits the amount of control that the enterprise has to manage this storage, it comes with numerous advantages.  Usually, cloud backup will offer the option to use hard disks or SSDs, or both.  Cloud storage generally comes with many resiliency features, and it functions as a disaster recovery solution.  Cloud storage is usually fairly cheap since you are only paying for the capacity that you are using, rather than purchasing physical storage devices for future needs.  The downside is that the data must be encrypted, and it is usually sent over the public Internet.  This means that backup and recovery is usually much slower as the data must be copied over a longer distance.

Many organizations use a combination of storage media, which often dedicates the faster (and expensive) hardware to critical workloads and the slower (and cheaper) devices to less frequently accessed backup files.  This is known as storage tiering, and more information can be found in this blog.
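As a simple illustration of the tiering idea, the sketch below assigns a backup to a tier based on its age and importance. The tier names and thresholds are invented for the example; real tiering policies are usually enforced by the storage or backup software itself.

```python
def pick_storage_tier(age_days: int, business_critical: bool) -> str:
    """Assign a backup to a storage tier.

    Hypothetical policy: critical, recent backups go to fast SSD storage,
    anything under a month old to standard disk, and older backups to
    archive media such as tape or cool cloud storage.
    """
    if business_critical and age_days <= 7:
        return "tier-1-ssd"
    if age_days <= 30:
        return "tier-2-disk"
    return "tier-3-archive"

for age in (1, 14, 120):
    print(age, pick_storage_tier(age, business_critical=True))
```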

Backup Methods for Enterprise Computer Data

This section describes the most common methods used to create backup files.  There is a tradeoff between storage capacity and recovery time, which organizations need to consider when selecting the best option for their business needs; a short sketch contrasting the main methods follows the list below.

  • Full Backup – The most common type of backup is a full backup where all of the data is captured. This is always required even if you are then using other backup methods, such as an incremental or differential backup.  This process may look a little different based on the component you are protecting, whether it is a file or a virtual machine. The downside of full backups is that they consume a lot of storage space as a full copy of the protected component’s data is captured each time. A full backup performs the following steps:
    • The backup is requested by a user or automatically on a schedule.
    • The backup software will identify the type of component which needs to be protected.
    • The backup software will wait for the component to be in a healthy state so that a complete backup can be created. This may involve pausing the component, flushing any transactions which are in memory ("quiescing"), or closing a file.  It is important that the file is in a "consistent" state so that it is healthy when it is restored.  Additionally, any other data which the file needs will be captured, which could include metadata, system settings, boot settings, and the disk layout.
    • The backup files are stored.
    • The backup software is notified that the backup completed successfully.
  • Incremental Backup – An incremental backup will back up the data which has changed since the last backup (whether full or incremental) was taken. This method still requires an initial full backup, but afterward it is much more efficient because each subsequent backup is faster and uses less storage space than the initial full backup.  However, restores from incremental backups are slower because each backup file must be merged in sequence, as they form a "chain" of backup files.  If one of the backup files in the chain is missing or corrupt, the backup cannot be recovered from that point onwards. An incremental backup performs the following steps:
    • The backup is requested.
    • If a full backup has not been taken, then a full backup is taken and saved using the steps above. If a full backup has been taken, then only the changes since the most recent backup (full or incremental) are saved in an incremental backup file.
    • The full backup (if needed) and the incremental backup file are stored.
    • The backup software is notified that the backup completed successfully.
  • Differential Backup – A differential backup is similar to an incremental backup since it requires taking a full backup and then taking smaller subsequent backups. The main difference is that each of these secondary backups tracks all of the changes since the full backup, whereas an incremental backup tracks the changes since the last incremental backup.  The advantage of differential backups is that a restore only requires two files to be merged: the full backup and the most recent differential backup, so the process is faster than restoring incremental backups.  Also, if one backup file is deleted, it doesn't impact any of the other backups unless it is the full backup file. The downside is that differential backups take up more storage capacity than incremental backups since each backup file contains all the changes since the full backup.  A differential backup performs the following steps:
    • The backup is requested.
    • If a full backup has not been taken, then take a full backup and save the backup file using the steps above. If a full backup has been taken, then all the changes since that last full backup will be saved in the differential backup file.
    • The full backup (if needed) and the differential backup file are stored.
    • The backup software is notified that the backup completed successfully.
  • Continuous Data Protection – If you require data protection with almost no data loss, then some providers will offer a continuous data protection (CDP) option. This works by saving a copy of the file each time that a change has been made.  Behind the scenes, this could use incremental backups, mirroring, or some other type of replication technology.  The benefit is that in a disaster, there is very little data loss. Still, the tradeoff is that this type of backup requires a lot of processing overhead and storage capacity for all of the backups, so it is usually expensive.
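The practical difference between these methods comes down to which baseline each backup is compared against. The sketch below is an illustration of that logic only, not any vendor's implementation: it selects files by modification time, comparing against either the most recent backup of any kind (incremental) or the most recent full backup (differential).

```python
from pathlib import Path
from typing import List

def files_to_copy(source: Path, baseline_time: float) -> List[Path]:
    """Return files modified after `baseline_time` (a Unix timestamp)."""
    return [p for p in source.rglob("*")
            if p.is_file() and p.stat().st_mtime > baseline_time]

def plan_backup(source: Path, method: str,
                last_full_time: float, last_backup_time: float) -> List[Path]:
    """Pick the file set for a full, incremental, or differential backup."""
    if method == "full" or last_full_time == 0:
        # No baseline yet, or a full was requested: copy everything.
        return [p for p in source.rglob("*") if p.is_file()]
    if method == "incremental":
        # Changes since the most recent backup of any kind.
        return files_to_copy(source, last_backup_time)
    if method == "differential":
        # Changes since the most recent *full* backup.
        return files_to_copy(source, last_full_time)
    raise ValueError(f"unknown method: {method}")
```

A real product would also track deletions, capture metadata, and quiesce applications, as described in the full-backup steps above.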

It is possible to use multiple backup methods for different types of workloads based on their priority or other business needs.  There are even more advanced backup types, which include synthetic full, reverse incremental, and incremental forever, although not all ISVs offer them.

Backup Features for Enterprise Computer Data

This section reviews some of the popular backup features which different ISVs offer to enhance the speed or maximize the capacity of the backup and recovery process.  It is still important to verify with your ISV that the features you need are supported on your specific operating system, hypervisor, and storage media.  The following list shows the most commonly requested features.

  • Compression – A majority of backup providers will include built-in compression tools so that the backup files take up less space on the disk. This slows down the backup process, and decompression will slow down recovery.
  • Encryption – Since backup files will often contain sensitive information, many organizations will want to protect them with encryption automatically. This is particularly important if those backup files are stored in a remote location or transmitted across public networks. This slows down the backup process, and decryption will slow down recovery.
  • Duplication – Some organizations wish to have multiple copies of each backup file, sometimes distributing them across different locations so that the backup can be recovered from different sites.
  • Deduplication – This feature detects identical files or blocks of data and retains only one copy, removing the duplicates. This can save a significant amount of storage space since many backup files contain identical and redundant content (a minimal sketch of this idea follows the list).
  • Item-Level Backup and Recovery – Instead of protecting and recovering an entire database, some ISVs will provide more granular options for specific workloads. For example, with Exchange Server, it is a common practice to offer the ability to recover a specific mailbox for a single user rather than to restore the database with all mailboxes for all users.
  • Data Grooming – One common challenge that enterprises often face is deleting old or unneeded backup files. Some regulated industries even require old backups to be deleted after a certain period to maintain compliance.  Data grooming allows you to set policies to automatically retain backups for a set time and then automatically delete them when they are no longer needed.
  • Multiplexing – Larger organizations will likely have more backup sources and storage locations. Multiplexing allows for multiple backup writers to access the same disk in a coordinated fashion so that all the backup data can be written simultaneously to specific storage media.
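To show what block-level deduplication boils down to, here is a minimal sketch, not any vendor's algorithm: it hashes fixed-size chunks and stores each unique chunk only once. Production implementations add variable-size chunking and persistent, indexed chunk stores on top of the same idea.

```python
import hashlib

def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns a chunk store keyed by SHA-256 digest plus the ordered list of
    digests needed to reassemble the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # duplicate chunks are stored only once
        recipe.append(digest)
    return store, recipe

def reassemble(store, recipe) -> bytes:
    return b"".join(store[digest] for digest in recipe)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # contains repeated content
store, recipe = deduplicate(data)
assert reassemble(store, recipe) == data
print(f"{len(recipe)} chunks referenced, {len(store)} actually stored")  # 4 referenced, 2 stored
```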

Recovery Options for Enterprise Computer Data

Being able to efficiently and completely recover your backup is a critical part of the process.  Recovery should be tested regularly as part of the standard operating procedure to familiarize the staff with the process and to minimize the chance of data loss.  When planning for recovery, the two common goals that organizations should consider include:

  • Recovery Time Objective (RTO) – This defines the goal for how long it should take to recover a backup. This includes the amount of time to detect that there is data loss, identify the appropriate backup to use for recovery, copy the backup file, restore it to the running system, then reconnect any dependent services or users.  Organizations want to minimize the RTO so that they can bring their systems online as fast as possible after a disaster.
  • Recovery Point Objective (RPO) – This defines the goal for the amount of data (measured in time) that can be lost during a disaster. This is determined largely by the frequency and completeness of each backup.  For example, if a backup is taken every hour, then up to an hour of data could be lost.  As with the RTO, the organization should try to reduce its RPO (a short worked example follows this list).
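As a worked example of these two objectives, the snippet below uses illustrative numbers only: the worst-case data loss is roughly the interval since the last backup, and the achieved recovery time is the sum of every stage, not just the restore itself.

```python
def meets_rpo(backup_interval_minutes: int, rpo_minutes: int) -> bool:
    """Worst-case data loss roughly equals the time since the last backup."""
    return backup_interval_minutes <= rpo_minutes

def recovery_time_minutes(detect: int, locate: int, copy: int,
                          restore: int, reconnect: int) -> int:
    """Total recovery time is the sum of every stage in the process."""
    return detect + locate + copy + restore + reconnect

# Hourly backups against a 4-hour RPO, and an end-to-end recovery estimate:
print(meets_rpo(backup_interval_minutes=60, rpo_minutes=240))   # True
print(recovery_time_minutes(15, 10, 45, 60, 20), "minutes")     # 150 minutes
```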

Check out Altaro’s blog for more information about RTO and RPO.  The final recovery consideration is whether the recovery can be automated or whether it requires manual intervention to retrieve the backup file from the storage media.

  • Online Recovery – If the recovery server is connected to the storage media which hosts the backup file, recovery can happen fairly quickly. The downside is that if malware, particularly ransomware, infects the primary server, the threat could spread to the storage and damage the backup files, making them unrecoverable.
  • Offline Recovery – With this option, the storage media holding the backup files is not directly connected to the recovery system. This means that when recovery is needed, the staff may need to manually locate the disk or tape containing the backup before the recovery process can begin.  Organizations may use this approach to reduce costs, but another advantage is that offline media is hard to infect with a datacenter-wide virus or ransomware attack.

This article was designed to provide you with an overview of computer data backup and recovery.  It described some of the key features which organizations should look for when selecting their backup software vendor.  There are many tradeoffs to be considered when evaluating the frequency of backups, the cost to store those backup files, and other desirable features.  Altaro is one of the industry’s leading backup providers and supports a majority of the scenarios described in this article, so be sure to consider them as you plan your organization’s backup strategy.

Tape Storage vs. Disk Storage: Which is best for backup?


Disk storage! That is an easy question because using a disk for backup is clearly the best medium, especially in the modern software-defined datacenter, as it provides technical, reliability, security, ease-of-use, and ecosystem advantages.  However, if you have archival storage needs with very irregular access to the data, then tape storage has some benefits.  Perhaps a good comparison is that disk backups give you the flexibility of digital music on your computer, while tape storage is similar to playing music on cassette tapes. This blog will explore the tradeoffs between these hardware solutions for backing up enterprise data.

Benefits of Tape Storage for Archiving

Magnetic tapes were the first mainstream storage device offered to the IT industry and are still available today, with the most recent standard, LTO-8, published in 2017.  Tape storage records data on a thin magnetic film and needs a mechanical drive to spool the tape and a drive head to read the data.  It is very similar to storing and playing music from a cassette tape, although it offers much greater performance.

The hardware cost of tape storage is lower than that of disks, particularly SSDs, making it a good choice for organizations that retain a significant amount of data. Businesses with compliance requirements to archive data for a long period of time, such as pharmaceutical companies, hospitals, or universities, may opt to use tape drives for the cost savings.

Read and write speeds on tape can be very fast, but only when the data is optimized and there are no bottlenecks that limit throughput.  Modern tape storage solutions support reasonable compression and encryption technologies; however, the data may need to be decompressed or decrypted by the same device that wrote it, which is good for security but bad for compatibility.

Once tape storage has been written to and is full, it must be physically moved to a separate archiving location, and it is considered offline and inaccessible until it is remounted on the tape drive.  Previously this was a manual operation performed by the IT staff, but with modern technologies, this can be automated.  The significant advantage of having genuine offline storage is that it is resilient to ransomware attacks since the data is physically isolated from the rest of the IT infrastructure.  These benefits generally only make sense for organizations looking to archive data that they do not expect to access regularly (or ever) again.

Technical Benefits of Disk Storage

Disk storage provides IT departments with agility, particularly in a modern and highly-virtualized datacenter.  Using software-defined storage, disks can be pooled so that writing backups to the disks is simplified since data can be split or mirrored across multiple disks for flexibility and resiliency.  Reading backups from the disks is also easier compared to tapes that need to be physically mounted and switched when they are full.  This usually limits the frequency with which backups can be taken using tape, which increases the Recovery Point Objective (RPO), and increases the time it takes to recover a backup after a failure, making the Recovery Time Objective (RTO) unpredictable.  Tape recovery can be especially challenging since it relies on the IT staff to physically find and mount the tape and then locate the data, and there is unlikely to be any type of remote access.  This also makes it difficult for an organization to configure different RTOs and RPOs for separate services, as tape backup forces a significant amount of policy standardization. Check out this Altaro blog for Defining the Recovery Time (RTO / RTA) and Recovery Point (RPO / RPA) for your Business.

While disks can be read from any point, tapes provide linear storage, so they need to be wound to the correct point.  This makes it much harder to find specific data, especially if you are only trying to restore a specific database or item, making recovery even slower.  When there is a mismatch between the expected and actual tape speed, "shoe-shining" occurs, where the tape has to be repeatedly rewound and fast-forwarded, which decreases the tape's transfer speed and durability.

Since tape storage is offline, it also lacks the ability to easily integrate with automated disaster recovery solutions.  If you are failing your services over to a second site that uses tape, the tape still needs to be mounted before the data is recovered and the services are brought online.  Tape storage is also fairly incompatible when backing up to or recovering from a public cloud, so cloud backup should be considered a preferable backup alternative in any highly virtualized datacenter.

Reliability Benefits of Disk Storage

Since most tape systems lack modern storage management technologies, they may not be able to take advantage of deduplication, replication, data grooming, defragmentation, or other optimization and resiliency features.  Even when a backup is taken, there may be a lack of proof that it was complete or that it was a data-consistent copy.  Small data losses or corruption can impact an entire tape drive, which may not be able to recover from an error.  Tape storage generally has more resource contention and connectivity issues, making recovery time for backups even longer.

Tape storage not only takes up a lot of space, but it is also fragile: both the tape's film and the drives used to read and write the data have many weaknesses.  The film cannot be exposed to magnetic fields, UV, sunlight, or any type of radiation or it becomes corrupted.  Tape must be handled in a clean environment, as dust, heat, moisture, or creases can corrupt the media, making fieldwork with tape almost impossible.  Trying to read a flawed film can further damage both the tape and the drive.  There are many moving parts, the drive head has to be regularly cleaned, and all of the components wear out within a few years of use.  The film must be stored vertically and handled gently to avoid damage, which can lead to corruption.  Since tapes are decoupled from the physical storage system, they are also easier to steal, and it is harder for the organization to detect that a theft has even occurred. For an IT department to scale, staff should be focused on automating repetitive tasks rather than "babysitting" sensitive equipment and regularly changing tapes to ensure that there is sufficient capacity for the next backup to succeed.

Ecosystem Benefits of Disk Storage

Since tape storage has become less popular, there has been less innovation in this space.  Disk storage vendors have been collaborating on industry standards to support software-defined storage, whereas tape standards, while they still exist, offer limited compatibility between tape storage suppliers.  Even when using the same vendor, some tape storage admins report that head alignment differences can create problems when reading film that was written by another tape drive or between higher- and lower-capacity drives.  Because this legacy technology is harder to use, it is also more expensive to support, so tape repair specialists are costly, and recovery is harder compared to restoring a corrupted disk.

As one example, disk storage backup vendor Altaro offers the following features which would not be available with tape backup:

  • Filesystem-based backup storage (disk storage) on top of today’s modern file server infrastructure with all the associated tools like deduplication and encryption
  • Varying types of disk storage such as spinning disk, standard SSD, NVMe, etc.
  • OS-level permissions and access control provided by the underlying storage system
  • Access to cloud-based storage
  • Easily managed using today’s technology management stack.

As you can see, disk storage provides numerous benefits, with the primary advantage being the ability to use all of today's software-defined capabilities. For example, inline deduplication is a highly effective cost-savings tool, and while some tape systems may offer this technology, they generally do not perform deduplication as effectively, depending on your software vendor.

Another item worth noting specifically is the fact that you can use high-performance flash storage for backups if needed. For example, if you’re in a highly-regulated environment with a high rate of data change and strict RPO requirements, you may need flash storage to be able to accommodate your data protection requirements. You won’t get that level of performance with tape storage.

Finally, the management tools of today's IT department are all fully capable of managing general-use disk storage. As mentioned previously, tape storage is considered legacy, and as such many software vendors today do not provide the needed tools or support to manage tape storage en masse.

A Note on Air-Gapped Backups

Air-Gapping is a term that you’ll often see used with disk-based backup systems. Air-Gapping refers to the practice of having one copy of disk-based backups stored in a location with heavily controlled access or completely inaccessible to all accounts outside of the service account used to run the backup application.

The primary reason behind this is to address the problem of ransomware. As you’ve likely heard, ransomware will often seek out backups and encrypt them. This prevents them from being used to recover from a ransomware attack. The aim of air-gapping is to prevent uncontrolled internal access to at least one set of backups. This allows them to be used to recover in the event of a ransomware attack.

Common methods for Air-Gapping include:

  • File-Share with the backup service account being the only account with access.
  • File Storage on a completely different LAN segment
  • Cloud-Based Storage Accounts
  • File Shares with automated timed connection just prior to the backup window
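As a rough illustration of the timed-connection method in the list above, the Python sketch below exposes the backup target only for the duration of the copy and re-isolates it afterwards, even if the copy fails. The attach/detach functions are placeholders; in practice they would map a share, enable an interface, or call your storage system's own tooling.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path

def attach_backup_target() -> Path:
    """Site-specific placeholder: map the protected share, bring up the
    dedicated interface, etc., and return the path where it is reachable."""
    return Path("/mnt/airgapped-backups")   # hypothetical mount point

def detach_backup_target() -> None:
    """Site-specific placeholder: unmap the share / disable the interface."""
    pass

@contextmanager
def airgapped_window():
    """Expose the backup target only for the duration of the copy."""
    target = attach_backup_target()
    try:
        yield target
    finally:
        detach_backup_target()      # always re-isolate, even on failure

def copy_backup(backup_file: Path) -> None:
    with airgapped_window() as target:
        shutil.copy2(backup_file, target / backup_file.name)

# copy_backup(Path("/backups/nightly/vm-backup.vma"))  # hypothetical backup file
```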

[Figure: an example air-gapped deployment]

Summarizing the Tradeoffs Between Tape and Disk Backup Storage

It should be apparent that disk storage is a better solution for backup, and tape storage should only be considered for archiving data.  While there may be some initial cost savings from using tape, these are artificial when considering the overall human effort involved in managing the tape system, the risk of losing the data, and the longer recovery time if a failure happens.

Below is a chart showcasing some of the comparisons between tape and disk storage.

If you are deploying a backup solution in a modern datacenter, use current technology with disk storage.  While using retro cassette tapes may still look cool, I bet you are using a *digital* music player at work! It’s suggested you do the same with your backups.

Managing Mailbox Retention and Archiving Policies in Microsoft 365


Microsoft 365 (formerly Office 365) provides a wide set of options for managing data classification, retention of different types of data, and archiving data. This article will show the options a Microsoft 365 administrator has when setting up retention policies for Exchange, SharePoint, and other Microsoft 365 workloads and how those policies affect users in Outlook. It’ll also cover the option of an Online Archive Mailbox and how to set one up.

There’s also an accompanying video to this article which shows you how to configure a retention policy and retention labels, enable archive mailboxes, and create a move-to-archive retention tag.

How To Manage Retention Policies in Microsoft 365

There are many reasons to consider labeling data and using retention policies, but before we discuss these, let’s look at how Office 365 manages your data in the default state. For Exchange Online (where mailboxes and Public Folders are stored, if you use them), each database has at least four copies, spread across two datacenters. One of these copies is a lagged copy, which means replication to it is delayed, to provide the option to recover from a data corruption issue. In short, a disk, server, rack, or even datacenter failure isn’t going to mean that you lose your mailbox data.

Further, the default policy (for a few years now) is that deleted items in Outlook stay in the Deleted Items folder “forever”, until you empty it or they are moved to an archive mailbox. If an end-user deletes items out of their Deleted Items folder, they’re kept for another 30 days (as long as the mailbox was created in 2017 or later), meaning the user can recover them by opening the Deleted Items folder and clicking the link.

Where to find recoverable items in Outlook

This opens the dialogue box where a user can recover one or more items.

Recovering deleted items in Exchange Online

If an administrator deletes an entire mailbox it’s kept in Exchange Online for 30 days and you can recover it by restoring the associated user account.

It’s also important to realize that Microsoft does not back up your data in Microsoft 365. Through native data protection in Exchange and SharePoint Online, they make sure that they’ll never lose your current data, but if you have deleted an item, document, or mailbox for good, it’s gone. There’s no secret place where Microsoft’s support can get it back from (although it doesn’t hurt to try), hence the popularity of third-party backup solutions such as Altaro Office 365 Backup.

Litigation Hold – the “not so secret” secret

One option that I have seen some administrators employ is to use litigation or in-place hold (the latter feature is being retired in the second half of 2020), which keeps all deleted items in a hidden subfolder of the Recoverable Items folder until the hold lapses (which could be never if you make it permanent). Note that you need at least an E3 or Exchange Online Plan 2 license for this feature to be available. This feature is designed to be used when a user is under some form of investigation and ensures that no evidence can be purged by that user; it’s not designed as a “make sure nothing is ever deleted” policy. However, I totally understand the job security it can bring when the CEO is going ballistic because something super important is “gone”.

Litigation hold settings for a mailbox

Retention Policies

If the default settings and options described above don’t satisfy the needs of your business or any regulatory requirements you may have, the next step is to consider retention policies. A few years ago, there were different policy frameworks for the different workloads in Office 365, showing the on-premises heritage of Exchange and SharePoint. Thankfully, we now have a unified service that spans most Office 365 workloads. Retention in this context refers to ensuring that the data can’t be deleted until the retention period expires.

There are two flavors here. The first is label policies, which publish labels to your user base, letting users pick a retention policy by assigning individual emails or documents a label (only one label per piece of content). Note that labels can do two things that retention policies can’t: firstly, they can apply from the date the content was labeled, and secondly, you can trigger a disposition / manual review of the SharePoint or OneDrive for Business document when the retention expires.

Labels only apply to objects that you label; they don’t retroactively scan through email or documents at rest. While labels can be part of a bigger data classification story, my recommendation is that anything that relies on users remembering to do something extra to manage data will only work with extensive training and for a small subset of very important data. You can (if you have E5 licensing for the users in question) use label policies to automatically apply labels to sensitive content, based on a search query you build (particular email subject lines or recipients or SharePoint document types in particular sites, for instance) or on a set of trainable classifiers for offensive language, resumes, source code, harassment, profanity, and threats. You can also apply a retention label to a SharePoint library, folder, or document set.

As an aside, Exchange Online also has personal labels that are similar to retention labels but created by users themselves instead of being created and published by administrators.

A more holistic flavor, in my opinion, is retention policies. These apply to all items stored in the various repositories and can apply across several different workloads. Retention policies can also both ensure that data is retained for a set period of time AND disposed of after the expiry of the data, which is often a regulatory requirement. A quick note here if you’re going to play around with policies is that they’re not instantaneously applied – it can take up to 24 hours or even 7 days, depending on the workload and type of policy – so prepare to be patient.

These policies can apply across Exchange, SharePoint (which means files stored in Microsoft 365 Groups, Teams, and Yammer), OneDrive for business, and IM conversations in Skype for Business Online / Teams and Groups. Policies can be broad and apply across several workloads, or narrow and only apply to a specific workload or location in that workload. An organization-wide policy can apply to the workloads above (except Teams, you need a separate policy for its content) and you can have up to 10 of these in a tenant. Non-org wide policies can be applied to specific mailboxes, sites, or groups or you can use a search query to narrow down the content that the policy applies to. The limits are 10,000 policies in a tenant, each of which can apply to up to 1000 mailboxes or 100 sites.

Especially with org-wide policies, be aware that they apply to ALL selected content, so if you set a policy to retain everything for four years and then delete it, data is going to automatically start disappearing after four years. Note that you can set the “timer” to start when the content is created or when it was last modified; the latter is probably more in line with what people would expect. Otherwise, you could have a list that someone updates weekly suddenly disappear because it was created several years ago.

To create a retention policy, log in to the Microsoft 365 admin center, expand Admin centers, and click on Compliance. In this portal, click on Policies and then Retention under Data.

Retention policies link in the Compliance portal

Select the Retention tab and click New retention policy.

Retention policies and creating a new one

Give your policy a name and a description, select which data stores it’s going to apply to and whether the policy is going to retain and then delete data or just delete it after the specified time.

Retention settings in a policy

Outside the scope of this article but related are sensitivity labels: instead of classifying data based on how long it should be kept, these classify data based on the security needs of the content. You can then apply policies to control the flow of emails containing this content, or automatically encrypt documents in SharePoint, for instance. You can also combine sensitivity and retention labels in policies.

Conflicts

Since there can be multiple policies applied to the same piece of data, and perhaps even retention labels in play, there could be a situation where conflicting settings apply. Here’s how these conflicts are resolved.

Retention wins over deletion, making sure that nothing is deleted that you expected to be retained, and the longest retention period wins: if one policy says two years and another says five years, the content is kept for five. The third rule is that explicit wins over implicit, so if a policy has been applied to a specific area such as a SharePoint library, it takes precedence over an organization-wide general policy. Finally, the shortest deletion policy wins, so if an administrator has chosen to delete content after a set period of time, it will be deleted then, even if another applicable policy requires deletion after a longer period. Here’s a graphic that shows the four rules and their interaction:

Policy conflict resolution rules (courtesy of Microsoft)
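As a loose, simplified reading of those four rules, the sketch below shows how the precedence logic might combine overlapping policies. It is only an illustration of the rules described above, not code that talks to Microsoft 365, and the real evaluation engine is more nuanced than this.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Policy:
    retain_years: Optional[int]        # None if the policy does not retain
    delete_after_years: Optional[int]  # None if the policy does not delete
    explicit: bool                     # applied to a specific location vs org-wide

def effective_outcome(policies: List[Policy]):
    """Apply a simplified version of the four precedence rules."""
    # Rules 1 & 2: retention wins over deletion, and the longest retention wins.
    retentions = [p.retain_years for p in policies if p.retain_years]
    retain = max(retentions) if retentions else None

    # Rule 3: explicit (location-specific) policies take precedence over org-wide ones.
    candidates = [p for p in policies if p.explicit] or policies

    # Rule 4: among the applicable policies, the shortest deletion period wins.
    deletions = [p.delete_after_years for p in candidates if p.delete_after_years]
    delete_after = min(deletions) if deletions else None

    # Deletion can never happen before the retention period has expired.
    if retain and delete_after:
        delete_after = max(delete_after, retain)
    return retain, delete_after

print(effective_outcome([Policy(2, 2, False), Policy(5, None, False), Policy(None, 3, True)]))
# -> (5, 5): kept for five years, then deleted
```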

As you can see, building a set of retention policies that really work for your business and don’t unintentionally cause problems is a project for the whole business, working out exactly what’s needed across different workloads, rather than the job of a “click-happy” IT administrator.

Archive Mailbox

It all started with trying to rid the world of PST-stored emails. Back in the day, when hard drive and SAN storage only provided small amounts of capacity, many people learnt to “expand” their small mailbox quota with local PST files. The problem is that these local files aren’t backed up and aren’t included in regulatory or eDiscovery searches. Office 365 largely solved this problem by providing generous quotas: the Business plans provide 50 GB per mailbox, whereas the Enterprise plans have 100 GB limits.

If you need more mailbox storage, one option is to enable online archiving, which provides another 50 GB mailbox for the Business plans and an effectively unlimited (see below) mailbox for the Enterprise plans. There are some limitations on this “extra” mailbox: it can only be accessed online, and it’s never synchronized to your offline (OST) file in Outlook. When you search for content, you must select “all mailboxes” to see matches in your archive mailbox. ActiveSync and the Outlook client on Android and iOS can’t see the archive mailbox, and users may need to manually decide what to store in which location (unless you’ve set up your policies correctly).

For these reasons many businesses avoid archive mailboxes altogether, just making sure that all mailbox data is stored in the primary mailbox (after all, 100 GB is quite a lot of emails). Other businesses, particularly those with a lot of legacy PST storage find these mailboxes fantastic and use either manual upload or even drive shipping to Microsoft 365 to convert all those PSTs to online archives where the content isn’t going to disappear because of a failed hard drive and where eDiscovery can find it.

For those that really need it and are on E3 or E5 licensing you can also enable auto-expanding archives which will ensure that as you use up space in an online archive mailbox, additional mailboxes will be created behind the scenes to provide effectively unlimited archival storage.

To enable archive mailboxes, go to Security & Compliance Center, click on Information governance, and the Archive tab.

The Archive tab

Click on a user’s name to enable the archive mailbox for that user.

Archive mailbox settings

Once you have enabled archive mailboxes, you’ll need a policy to make sure that items are moved into them at the cadence you need. Go to the Exchange admin center and click on Compliance management – Retention tags.

Exchange Admin Center – Retention tags

Here you’ll find the Default 2 year move to archive tag, or you can create a new one by clicking on the + sign.

Exchange Retention tags default policies

Pick Move to Archive as the action, give the policy a name, and select the number of days that have to pass before the move happens.

Creating a custom Move to archive policy

Note that online archive mailboxes have NOTHING to do with the Archive folder that you see in the folder tree in Outlook; that is just an ordinary folder that you can move items into from your inbox for later processing. This Archive folder is available on mobile clients and when you’re offline, and you can swipe in Outlook mobile to automatically store emails in it.

Conclusion

Now you know how and when to apply retention policies and retention tags in Microsoft 365, as well as when online archive mailboxes are appropriate and how to enable them and configure policies to archive items.

Evaluating Hyper-V Backup Storage Solutions


It’s not difficult to find recommendations about what storage to use with your backup solution — from the people that make backup storage solutions. I certainly can’t begrudge a company trying to turn a coin by promoting their products, but it’s also nice to get some neutral assistance. What I won’t do is throw a list of manufacturers at you and send you on your way; that doesn’t help anyone except the manufacturers on the list. What I am going to do is give you guidance on how to analyze your situation to determine what solutions are most applicable to you which gives you the ability to select the manufacturer(s) on your own terms. I’m also going to show you some possible backup designs that might inspire you in your own endeavors.

Needs Assessment

The very first thing to do is determine what your backup storage needs are. Most people are not going to be able to work from a simple formula such as: we have 1 TB of data so we need 1 TB of backup storage. Figure out the following two items first:

  • How long does any given bit of data need to be stored?
  • Do we only need the most recent copy of that data or do we need a historical record of the original and changed versions? For instance, if your CRM application is tracking all customer interactions and you do not purge data from it, how many backups of that data are necessary to meet your data retention goals? (A short counting sketch follows below.)

As you are considering these, be mindful of any applicable legal regulations. This is especially true in finance and related fields, such as insurance. Do not try to get everything absolutely perfect in this first wave. This is the part where you prioritize your data and determine what its lifespan should be. You’ll need to have a decent understanding of the concepts in the next section before you can begin architecting your solution.
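One way to turn those two questions into numbers is to count how many restore points a given schedule produces over the retention window. The sketch below uses an assumed weekly-full, daily-incremental schedule purely for illustration; swap in your own schedule and retention period.

```python
def restore_points(retention_days: int, fulls_per_week: int = 1,
                   incrementals_per_week: int = 6) -> int:
    """Count restore points kept over the retention window.

    Assumes a weekly-full, daily-incremental schedule for illustration only.
    """
    weeks = retention_days / 7
    return round(weeks * (fulls_per_week + incrementals_per_week))

# 90-day retention with one full and six incrementals per week:
print(restore_points(90), "restore points to store and manage")   # 90
```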

If you need to brush up on any of the basics to help you complete your needs assessment, we have an article that covers them.

Backup Storage Options

Alongside a needs assessment, you need to know what storage options are available to you. This will guide you to your final design. At a high level, the options are:

  • Non-hard disk media, such as tape and optical
  • Portable disk drives
  • Solid state media
  • Permanently-placed disk drives
  • Over-the-network third-party provider storage

Non-Hard Disk Media

Disk drives have precision internal mechanical components and electronics that fail. The conventional decades-old wisdom has been to copy data to some other type of media in order to protect it from these shortcomings. The two primary media types that fall into this category are tapes and optical systems.

Pros of Non-Hard Disk Media

  • Tried-and-true
  • Vendors have specialized to the particular needs of backup and restoration
  • Portable
  • Durable long-term storage (tape)
  • Relatively inexpensive long-term storage

Cons of Non-Hard Disk Media

  • Expense (tape)
  • Special drives and software are needed, which may fail and/or become obsolete while the media is still viable
  • Easily damaged
  • Very slow recovery process

Tape is the traditional king of backup media and is still going strong today. It’s not very fast, but it’s highly portable, well-understood, and usually provides a solid ratio of expense, risk, and protection. It can be very expensive, however, although it’s typically the tape drive that drives the cost up the most. Media costs vary; smaller is cheaper, obviously, but there is also a difference in formats. DAT drives are cheaper than LTO drives, but even the highest capacity DAT tapes are nowhere near as large as most same-generation LTO tapes.

Tape must be cared for properly — it absolutely must be kept away from electromagnetic fields and heat. Tapes should be stored upright, preferably in a shielded container designed specifically for holding backup tapes. If these precautions are followed, tapes can easily last a decade. That said, the drives that can read a particular tape have a much shorter lifespan and you might have trouble finding a working drive that can read those old tapes. I’ve also run into issues where I had a good tape and a tape drive that was probably good enough to read it, but we couldn’t locate the software that recorded it. If you’re looking to hold onto backups for a very long time, tape has the highest shelf lifespan-to-cost ratio of all backup media.

Optical backup media popped up as an inexpensive alternative to tape. Optical drives are much cheaper than tape drives and optical media provides the same capacity at a fraction of the price of tape. However, optical media’s star never burned very brightly and dimmed very quickly. Optical media backups are very slow, the capacity-per-unit is not ideal, and durability is questionable. Optical media does have the ability to survive in electromagnetic conditions that would render tape useless, but is otherwise inferior. Unless you only have an extremely small amount of data to protect and your retention needs are no more than a few years, I would recommend skipping optical media.

A very large problem with tape and other non-disk media is that restoring data is a time-consuming process. If you want to restore just a few items, it will almost undoubtedly take far longer to locate that data on media than it will to restore it.

Portable Disk Drives

In my mind, portable disk drives are a relative newcomer in the backup market, although it has occurred to me that many of you have probably used them your entire career.

Pros of Portable Disk Drives

  • Inexpensive
  • Reasonably durable
  • Portable
  • Common interfaces that are likely to still be usable in the years to come
  • Relatively quick recovery process

Cons of Portable Drives

  • Mechanical and electronic failure can render data inaccessible except by specialized, expensive processes
  • Long-term offline storage capability is not well-known
  • Drive manufacturers do not tailor their products to the backup market (although some enclosure manufacturers do)
  • Fairly expensive long-term storage

The expense and physical size of portable drives have shrunk while their bandwidth and storage capacity have grown substantially, making them a strong contender against tape. Their great weakness is a reliance on internal mechanisms that are more delicate and complicated than tape, not to mention their electronic circuitry. Most should be well-shielded enough that minor electromagnetic and static electricity fields should not be of major concern.

What you must consider is that tapes have been designed around the notion of holding their magnetic state for extended periods of time; if kept upright in a container with even modest shielding, they can easily last a decade. Hard drives are not designed or built to such standards. You’ll hear many stories of really old disks pulled out of storage and working perfectly with no data loss — I have several myself. The issue is that those old platters did not have anything resembling the ultra-high bit densities that we enjoy today. What that means is that the magnetic state for any given bit might have degraded a small amount without affecting the ability of the read/write head to properly ascertain its original magnetic state. The effects of magnetic field degradation will be more pronounced on higher-density platters. I do not have access to any statistics, primarily because these ultra-high capacity platters haven’t been in existence long enough to gather such information, but at this time, I personally would not bank on a very large stored hard drive keeping a perfect record for more than a few years.

Hard disks that are rotated often will suffer mechanical or electronic failure long before magnetic state becomes a concern. A viable option is to simply swap new physical drives in periodically. If you want to use hard drives for very long-term offline storage, add it to your maintenance cycle to spin up old drives and transfer their contents to new drives that replace them.

Solid State Media

The latest entry in the backup market is solid state media. The full impact of solid state on the backup market has not yet been felt. I expect that it will cause great changes in the market as costs decline.

Pros of Solid State Media

  • Extremely durable
  • Fast (newer types)
  • Very portable

Cons of Solid State Media

  • High cost-to-capacity ratio

Its high cost-to-capacity ratio is the primary reason that it has not overtaken all other media types. It is far more durable and some types are faster than both disk and tape. If you can justify the expense, solid state is the preferred option.

Permanently-Placed Disk Drives

Another option that has only become viable within the last few years is storage that never physically moves, such as NAS devices.

Pros of Permanently-Placed Drives

  • Very high reliability and redundancy — dependent upon manufacturer/model
  • High performance
  • Can be physically secured and monitored

Cons of Permanently-Placed Drives

  • High equipment expense
  • Best used with multi-site facilities
  • Dependent upon speed and reliability of inter-site connections

Loss of the primary business location and theft of backup media are ever-present concerns; the traditional solution has been to transport backup tapes offsite to a secured location, such as a bank safety deposit box (or somebody’s foyer credenza, that sort of thing happens a lot more often than many will admit). With the cost of Internet bandwidth declining, we now have the capability to transmit backup data over the wire to remote locations in a timely and secured fashion.

While I do not recommend it, it would theoretically be acceptable to use on-premises permanent disk drives for very short-term backup storage. This would allow for a very short RTO to address minor accidents. As long as it is made abundantly clear to all interested parties that such storage is equally vulnerable to anything that threatens the safety and security of the site, it has viable applications.

Over-the-Network Third Party Provider Storage

The primary distinguishing factor between this category and the prior entry is ownership. You can pay someone else to maintain the facility and the equipment that houses your offsite copies.

Pros of Third-Party Offsite Providers

  • In theory, it is a predictable recurring expense
  • Potential for additional included services at a lower expense than a do-it-yourself solution
  • Full-time subject-matter experts maintain your data for you

Cons of Third-Party Offsite Providers

  • In theory, providers could make dramatic changes in pricing and service offerings, effectively holding your data and the reliability of its storage hostage
  • Trust and integrity concerns
  • You may not be able to control the software and some other components of your backup strategy

There are several enticing reasons to work with offsite backup providers. Many offer additional services, such as hosting your data in a Remote Desktop Services environment as a contingency plan. Truthfully, I believe that the primary barrier in the cloud-based storage market is trust. Several of the organizations offering these services are "fly-by-night" operations trying to turn a fast dollar by banking on the fact that almost none of their customers will ever need to rely on their restoration or hosting services. I also don't think the world will soon forget how Microsoft did everything short of making it a requirement to sync our Windows profiles into OneDrive and then radically increased the cost of using the service. Large service providers can do that sort of thing to their customers and survive the fallout.

You can approach third-party offsite storage in two ways:

  • A full-service provider that supplies you with software that operates on your equipment and transmits to theirs
  • A no-/low-frills provider that supplies you with a big, empty storage space for you to use as you see fit

What you receive will likely have great correlation with what you spend.

Designing a Backup Storage Solution

At this point, you know what you need and you know what’s available. All that’s left is to put your knowledge to work by designing a solution that suits your situation.

Backup Strategies

Let’s start with a few overall strategies that you can take.

Disk-to-Tape (or other Non-Disk Media)

This is the oldest and most common backup method. You have a central system that dumps all of your data on tape (or some other media, such as optical) using a rotation method that you choose.

Disk-to-Disk

A more recent option is disk-to-disk. Your backup application transfers data to portable disks which are then transferred offsite or to a permanent installation, hopefully in another physical location.

Disk-to-Disk-to-Tape

A somewhat less common method is to first place regular backups on disk. At a later time, these backups, or a subset of them, can be transferred to tape. This gives you the benefit of rapidly recovering recent items while keeping fairly inexpensive long-term copies. You won't need to rotate as many tapes through, and because the disks are constantly rewritten, they are never expected to retain their data for long.

Disk-to-Local-to-Offsite

Another recent option that can serve as a viable alternative to tape is to back up data locally first, then transfer it to offsite long-term storage, whether it's a site that you control or one owned by a third-party provider. This type of solution eliminates the need to entrust someone to physically carry media offsite. For it to be viable, you must have enough outbound bandwidth to finish each backup within its window; the sketch below shows a quick way to check.
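
As a quick sanity check on that last point, you can estimate the nightly transfer time from the size of the changed data and your usable uplink. The figures below are placeholder assumptions, and the math ignores protocol overhead and competing traffic, so treat the result as an optimistic floor rather than a guarantee.

```python
# Rough estimate of how long a nightly offsite transfer will take.
# All numbers are example assumptions; substitute your own measurements.

nightly_delta_gb = 20        # changed data per night, after compression/dedup
uplink_mbps = 100            # usable outbound bandwidth, megabits per second
backup_window_hours = 8      # how long the transfer is allowed to run

transfer_seconds = (nightly_delta_gb * 8 * 1000) / uplink_mbps  # GB -> megabits -> seconds
transfer_hours = transfer_seconds / 3600

print(f"Estimated transfer time: {transfer_hours:.1f} hours")
print("Fits the backup window" if transfer_hours <= backup_window_hours else "Does NOT fit the backup window")
```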

Disk-to-Offsite

You could also opt to transfer your data directly offsite without keeping a local copy. This approach is essentially the same as the previous, but there’s nothing left at the primary location.

Backup Storage Examples

Let’s consider a few real-world-style examples.

Scenario 1

  • 4 virtual machines
    • 1 domain controller
    • 1 file/print VM
    • 1 application VM
    • 1 SQL VM
  • 300 GB total data
  • Cloud or ISP-based e-mail provider
  • No particular retention requirements
  • Uses line-of-business software with a database

This example is a fair match for a large number of small businesses. Some might have combined their roles into fewer VMs, and most will have somewhat different total backup data requirements, but this scenario should apply broadly.

I would recommend using a set of portable hard drives in a rotation. I'd want a solid monthly full backup and a weekly full, with at least two drives rotated daily. If you use a delta-style application like Altaro VM Backup, the daily deltas are going to be very small, so you won't need large drives. Keeping historical data is probably not going to be helpful as long as at least one known good copy of the database exists. A simple schedule sketch appears at the end of this scenario.

If budget allows, I would strongly encourage using an offsite or third party storage-only provider to hold the monthly backups.

Probably the biggest thing to note here is that retention isn’t really an issue.
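
To make the rotation concrete, here is one way the scenario 1 plan could map onto a calendar. The specific choices (fulls on Sundays, the first Sunday promoted to the monthly full, two drives alternating by day) are my own assumptions; adjust them to whatever your backup application's scheduler supports.

```python
from datetime import date, timedelta

# One possible reading of the scenario 1 plan: monthly full, weekly full,
# daily deltas, with two portable drives alternating day by day.

def backup_for(day):
    """Return (backup type, drive label) for a given calendar day."""
    drive = "DRIVE-A" if day.toordinal() % 2 == 0 else "DRIVE-B"   # alternate drives daily
    if day.weekday() == 6:                   # Sunday
        if day.day <= 7:                     # first Sunday of the month
            return "monthly full", drive
        return "weekly full", drive
    return "daily delta", drive

start = date(2024, 6, 1)
for offset in range(14):                     # print two weeks of the schedule
    day = start + timedelta(days=offset)
    kind, drive = backup_for(day)
    print(f"{day} ({day.strftime('%a')}): {kind:12s} -> {drive}")
```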

Scenario 2

  • 4 virtual machines
    • 1 domain controller
    • 1 file/print VM
    • 1 application VM
    • 1 SQL VM
  • 300 GB total data
  • Cloud or ISP-based e-mail provider
  • No particular retention requirements
  • Uses line-of-business software with a database

The layout here is the same as Scenario 1. As small as this is, it would be a good candidate for direct offsite transmission. Most backup applications that support this also allow a "seed" backup: you copy everything to a portable disk, have the disk physically transported to the destination backup site, then place that backup onto permanently-placed storage. From then on, nightly backups are just deltas from that original. Small businesses typically do not have a great deal of daily data churn, so this is a viable solution; the sketch below illustrates why seeding matters.
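
The value of the seed becomes obvious when you compare pushing the initial copy over the wire against shipping it on a drive. These numbers are illustrative assumptions only; plug in your own uplink speed and courier times.

```python
# Compare transmitting the initial "seed" backup over the WAN with shipping it
# on a portable drive. Figures are example assumptions for a 300 GB data set.

seed_size_gb = 300      # initial full backup
uplink_mbps = 10        # modest small-business outbound bandwidth
courier_days = 2        # time to ship a drive to the backup site

upload_hours = (seed_size_gb * 8 * 1000) / uplink_mbps / 3600
upload_days = upload_hours / 24

print(f"Uploading the seed: ~{upload_days:.1f} days of saturated uplink")
print(f"Shipping the seed:  ~{courier_days} days, with the uplink left free for business traffic")
```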

Scenario 3

  • 4 virtual machines
    • 1 domain controller
    • 1 file/print VM
    • 1 application VM
    • 1 SQL VM
  • 300 GB total data
  • Cloud or ISP-based e-mail provider
  • 5-year retention requirements for financial data
  • Uses line-of-business software with a database

This is the same as the first scenario, only now we have a retention requirement. To figure out how to deal with that, we need to know what qualifies as “financial data”. If your accountant keeps track of everything in Excel, then those Excel files probably qualify. If it’s all in the line-of-business app and it holds financial records in the database for at least five years before purging, then you probably don’t need to worry about retention in backup.

I want to take a moment here to talk about retention, because I've had trouble getting customers to understand it in the past. If you have a 5-year retention requirement, that typically means you must be able to produce any data generated within the last five years. It does not necessarily mean that you need to keep every backup ever taken for the last five years. If I created a file in December of 2012 and that file is still sitting on my file server, then it was included in the full backup that I took on July 4th, 2016, and I don't need to produce an old backup to recover it. Retention mainly applies to deleted and changed data. So, in more practical terms, if all of the data in scope for your retention plan is handled by your line-of-business application and the application tracks changes in its database at least as far back as the retention policy requires, then the only thing you need old backups for is the suspicion that people are purging data before it reaches its five-year lifespan. That's a valid reason and I won't discount it, but I also think it's important for customers to understand how retention works.

Let's say that the data applicable to the long-term retention plan is file-based and is not protected in the database. In that case, I would recommend investigating options for capturing annual backups: retain twelve rolling monthly backups, promote one backup each year to an annual, and discard annuals after five years (a small pruning sketch follows the list below). My preference for storage of annual backups:

  1. Third-party offsite storage provider
  2. Self-hosted offsite permanent disk storage
  3. Portable hard disk
  4. Tape

Remember that we’re talking about up to 5 TB of long-term storage, although I wouldn’t recommend trying to keep 100% of the 300 GB in each annual backup. 5 TB of offline storage is not expensive (unless you’re buying a tape drive just for that purpose), so this should be a relatively easily attainable goal.
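
If you script your cleanup, the rule above reduces to two simple date tests: keep the twelve most recent monthlies, keep annuals younger than five years, discard the rest. The sketch below is generic and not tied to any particular backup product; the backup dates are invented for illustration.

```python
from datetime import date

# Assumed policy from the text: keep the twelve most recent monthly backups and
# keep one annual backup per year for five years; everything older can go.

def keep_monthly(backup_date, today):
    months_old = (today.year - backup_date.year) * 12 + (today.month - backup_date.month)
    return months_old < 12

def keep_annual(backup_date, today):
    return (today.year - backup_date.year) < 5

today = date(2016, 7, 4)
monthlies = [date(y, m, 1) for y in (2015, 2016) for m in range(1, 13) if date(y, m, 1) <= today]
annuals = [date(y, 1, 1) for y in range(2010, 2017)]

print("Monthlies to keep:", [d.isoformat() for d in monthlies if keep_monthly(d, today)])
print("Annuals to keep:  ", [d.isoformat() for d in annuals if keep_annual(d, today)])
```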

Scenario 4

  • 7 virtual machines
    • 2 domain controllers
    • 1 file/print VM
    • 2 application VMs
    • 1 SQL VM
    • 1 Exchange VM
  • 1.2 TB total data
  • 5-year retention requirements for financial data

This is a larger company than the preceding ones, and it has some different requirements. The first thing to sort out is what the 5-year retention requirement applies to and whether it can be met just by ensuring that a solid copy of the database exists. Read the expanded notes for scenario 3, as they apply here as well.

Truthfully, I would follow generally the same plan as in scenario 3. The drives would need to be larger, of course, but 1.2 TB in a single backup is very small these days. With applications such as Altaro VM Backup able to target multiple storage drives simultaneously, this system could grow substantially before portable disks become too much of a burden for a nightly rotation. This is in contrast to my attitude from only a few years ago, when I would almost certainly have installed a tape drive and instituted something akin to a GFS rotation.
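
For anyone who hasn't run one, a GFS (Grandfather-Father-Son) rotation simply sorts each backup into a retention tier: daily "sons" reused within the week, weekly "fathers" kept for roughly a month, monthly "grandfathers" kept the longest. The promotion rules in this sketch (month-end becomes the grandfather, Friday the father) are one common convention I've assumed for illustration, not a fixed standard.

```python
from datetime import date, timedelta

# Classify each backup day into a Grandfather-Father-Son tier.

def gfs_tier(day):
    # Last day of the current month: jump safely into next month, then step back one day.
    last_day_of_month = (day.replace(day=28) + timedelta(days=4)).replace(day=1) - timedelta(days=1)
    if day == last_day_of_month:
        return "grandfather (monthly, longest retention)"
    if day.weekday() == 4:          # Friday
        return "father (weekly, kept about a month)"
    return "son (daily, media reused within the week)"

start = date(2024, 5, 27)
for offset in range(10):
    day = start + timedelta(days=offset)
    print(f"{day} ({day.strftime('%a')}): {gfs_tier(day)}")
```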

Scenario 5

Let’s look at a larger institution.

  • 25 virtual machines
    • Multiple domain controllers
    • Large central file server
    • Multiple application servers
    • Highly available Exchange
    • Highly available SQL
  • 10 TB total data
  • 5-year retention plan; financial only by law, but CTO has expanded the scope to all data

Honestly, even though it seems like there is a lot going on here, 10 TB is much more than most installations that fit this description will realistically be using. But, I wanted to aim large. This scenario is probably not going to be well-handled by portable drives unless you have someone on staff that enjoys carting them around and plugging them in. Even tape is going to struggle with this unless you’ve got the money for a multi-drive library.

Here’s what I would recommend:

  • A data diet. 10 TB? Really?
  • A reassessment of the universal 5-year retention goal
  • 2 inexpensive 8-bay NAS devices, filled with 3 TB SATA disks in RAID-6, with a plan in the budget to bring in a third and fourth NAS

Part of this exercise is to encourage you to really work on assessing your environment, not just nodding and smiling and playing the ball as it lies. Ask the questions, do the investigations, find out what is really necessary. The last thing that you want to do is back up someone’s pirated Blu-Ray collection and then store it somewhere that you’re responsible for. “Employment gap to fulfill a prison sentence due to activities at a previous employer” is an unimpressive entry on a resume. Also, be prepared to gently challenge retention expectations. Blanket statements are often issued in very small and very large institutions because it sometimes costs them more to carefully architect around an issue than it does to just go all-in. Organizations in the middle can often benefit from mixed retention policies. So, before you just start drawing up plans to back up that 10 TB and keep it for 5 years, find out if that’s truly necessary.

My third bullet point assumes that you discover that you have 10 TB of data that needs to be kept for 5 years. That does happen. I’m also working from the assumption that any organization that needs to hold on to 10 TB of data has the finances to make that happen. I would configure the first NAS as a target for a solid rotation scheme similar to GFS with annuals. Use software deltas and compression to keep from overrunning your storage right away. All data should be replicated to the second NAS which should live in some far away location connected by a strong Internet connection. As space is depleted on the devices, bring in the second pair of NAS devices — by that time, 4 TB drives (or larger) might be a more economical choice.
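
As a quick check on whether those NAS units actually cover the requirement: RAID-6 gives two disks' worth of capacity to parity, so an 8-bay box of 3 TB disks nets roughly (8 - 2) x 3 = 18 TB before filesystem overhead. The 10% overhead allowance in the sketch below is an assumed round figure, not a measured one.

```python
# Rough usable capacity of a RAID-6 NAS. Drive sizes are decimal terabytes.

def raid6_usable_tb(bays, drive_tb, overhead=0.10):
    # RAID-6 loses two drives to parity; shave an assumed 10% for filesystem/metadata.
    return (bays - 2) * drive_tb * (1 - overhead)

for drive_tb in (3, 4):
    usable = raid6_usable_tb(bays=8, drive_tb=drive_tb)
    print(f"8 bays x {drive_tb} TB in RAID-6: ~{usable:.1f} TB usable")
```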

I would also recommend bringing in a second tier of backup for long-term storage. That might take the form of an offsite provider or tape.

Hopefully, though, you discover that you really don't need to back up 10 TB of storage and can just follow a plan similar to scenario 3.
