Non-disruptive and minimally disruptive data migration in active-active clusters

US 9,460,028 B1


Data migration is performed in a cluster of host computers each using a mechanism associating data with a source LUN. During a synchronization operation the contents of the source LUN are copied to the target LUN while ongoing normal source LUN writes are cloned to the target LUN. A datapath component of an agent coordinates the writes at the target LUN to maintain data consistency. Upon completion of synchronization, each host stops the write cloning and disables access to the source LUN, in conjunction with a modification of the mechanism to newly associate the data with the target LUN. Depending on the type of mechanism and system, the modification may be done either disruptively or non-disruptively, i.e., with or without stopping normal operation of software of the host computers.

PatentSwarm provides a collaborative workspace to search, highlight, annotate, and monitor patent data.


Claims

1. A method of migrating data from a source logical unit (LUN) to a target LUN in a data processing system having a cluster of multiple host computers and a storage subsystem containing the source LUN and the target LUN, the data being accessed concurrently by software of the host computers using a mechanism in each of the host computers initially associating the data with the source LUN as the location of the data in the system, comprising:
performing a migration operation to migrate the data from the source LUN to the target LUN by
i) commanding a kernel-level migration component of each host computer to disable access to the target LUN and to begin cloning source LUN writes to the target LUN, each write to be cloned by writing a duplicate of each source LUN write to the target LUN, wherein the source LUN writes are application writes of new data,
ii) subsequently initiating a LUN copying operation and target LUN write coordination by an agent in the system, the LUN copying operation reading data from the source LUN in order of increasing block addresses and writing the data to the target LUN in multi-block chunks, the target LUN write coordination coordinating target LUN writes from the LUN copying operation with target LUN writes from the cloning of source LUN writes to ensure consistency between the source LUN and target LUN, wherein the target LUN write coordination includes maintaining pointers that divide the target LUN into a) a synchronized region containing chunks that have previously been copied from the source LUN to the target LUN, b) an in-progress region containing a chunk that is currently being copied from the source LUN to the target LUN, and c) a to-be-synchronized region having contents not yet copied from the source LUN to the target LUN, and wherein maintaining the pointers includes updating the pointers responsive to each chunk copied from the source LUN to the target LUN being completely stored on the target LUN, wherein the agent includes a control component and a datapath component, the datapath component residing in a physical storage array device with the target LUN and operative to perform the target LUN write coordination, wherein the datapath component residing on the physical storage array device with the target LUN coordinates target LUN writes from the LUN copying operation with target LUN writes from the cloning of source LUN writes to ensure consistency between the source LUN and target LUN at least in part by discarding those target LUN writes that result from the cloning of source LUN writes and are directed to the to-be-synchronized region of the target LUN, and returning an indication of success, with regard to the discarded target LUN writes, to hosts that initiated the discarded writes, and
iii) upon completion of the LUN copying operation, commanding the kernel-level component of each host computer to stop the cloning of source LUN writes and to disable access to the source LUN by the host computer software in conjunction with a modification of the mechanism to newly associate the data with the target LUN as the location of the data in the system; and
wherein each host computer clones each source LUN write to the target LUN at least in part by i) writing to the source LUN, ii) determining whether the write to the source LUN completed, and iii) in response to determining that the write to the source LUN completed, duplicating the source LUN write to the target LUN.


17. A system comprising:
at least one processor;
at least one memory for storing program code for, when executed on the processor, migrating data from a source logical unit (LUN) to a target LUN in a data processing system having a cluster of multiple host computers and a storage subsystem containing the source LUN and the target LUN, the data being accessed concurrently by software of the host computers using a mechanism in each of the host computers initially associating the data with the source LUN as the location of the data in the system, the program code comprising:
program code for performing a migration operation to migrate the data from the source LUN to the target LUN including
i) program code for commanding a kernel-level migration component of each host computer to disable access to the target LUN and to begin cloning source LUN writes to the target LUN, each write to be cloned by writing a duplicate of each source LUN write to the target LUN, wherein the source LUN writes are application writes of new data;
ii) program code for subsequently initiating a LUN copying operation and target LUN write coordination by an agent in the system, the LUN copying operation reading data from the source LUN in order of increasing block addresses and writing the data to the target LUN in multi-block chunks, the target LUN write coordination coordinating target LUN writes from the LUN copying operation with target LUN writes from the cloning of source LUN writes to ensure consistency between the source LUN and target LUN, wherein the target LUN write coordination includes maintaining pointers that divide the target LUN into a) a synchronized region containing chunks that have previously been copied from the source LUN to the target LUN, b) an in-progress region containing a chunk that is currently being copied from the source LUN to the target LUN, and c) a to-be-synchronized region having contents not yet copied from the source LUN to the target LUN, and wherein maintaining the pointers includes updating the pointers responsive to each chunk copied from the source LUN to the target LUN being completely stored on the target LUN, wherein the agent includes a control component and a datapath component, the datapath component residing in a physical storage array device with the target LUN and operative to perform the target LUN write coordination, wherein the datapath component residing on the physical storage array device with the target LUN coordinates target LUN writes from the LUN copying operation with target LUN writes from the cloning of source LUN writes to ensure consistency between the source LUN and target LUN at least in part by discarding those target LUN writes that result from the cloning of source LUN writes and are directed to the to-be-synchronized region of the target LUN, and returning an indication of success, with regard to the discarded target LUN writes, to hosts that initiated the discarded writes, and
iii) program code for, upon completion of the LUN copying operation, commanding the kernel-level component of each host computer to stop the cloning of source LUN writes and to disable access to the source LUN by the host computer software in conjunction with a modification of the mechanism to newly associate the data with the target LUN as the location of the data in the system; and
wherein each kernel-level migration component of each host computer clones each source LUN write to the target LUN at least in part by i) writing to the source LUN, ii) determining whether the write to the source LUN completed, and iii) in response to determining that the write to the source LUN completed, duplicating the source LUN write to the target LUN.


Description

BACKGROUND

The present invention relates to migration of data from a source data storage device to a target data storage device in a data processing system.

Data migration techniques are used to move or migrate data from one storage device (or logical unit) to another for any of a variety of purposes, such as upgrading storage hardware or information lifecycle management. Generally, migration involves synchronizing the target device to the source device, i.e., achieving an operating state in which the target device stores the same data as the source device, and then switching operation so that subsequent accesses of the data are directed to the target device instead of the source device. Once the switching is successfully accomplished, the source device can be taken out of service or put to some other use.

Non-disruptive migration is performed while there is ongoing application-level access to the data stored on the source storage device. In non-disruptive migration, there are two parts to achieving synchronization: existing data on the source device is copied to the target device, and ongoing application writes of new data are cloned, i.e., sent to both the source and target devices. Non-disruptive migration also requires a non-disruptive mechanism for switching operation to the target device. Example descriptions of non-disruptive migration can be found in the following US patents, whose entire contents are incorporated by reference herein:

    1. U.S. Pat. No. 7,904,681, "Methods and systems for migrating data with minimal disruption"
    2. U.S. Pat. No. 7,809,912, "Methods and systems for managing I/O requests to minimize disruption required for data migration"
    3. U.S. Pat. No. 7,770,053, "Systems and methods for maintaining data integrity during a migration"
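
As a minimal illustration of the cloning half of synchronization, the sketch below writes to the source device first and duplicates the write to the target only after the source write has completed, which is the ordering recited in claim 1. The `Lun` class, its `fail_writes` flag, and the `cloned_write` function are hypothetical names invented for this example, and the handling of a failed target write is deliberately left out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    """Toy in-memory block device; a hypothetical stand-in for a real LUN."""
    blocks: dict = field(default_factory=dict)
    fail_writes: bool = False  # hook to simulate a device write failure

    def write(self, addr: int, data: bytes) -> bool:
        """Store one block and report whether the write completed."""
        if self.fail_writes:
            return False
        self.blocks[addr] = data
        return True


def cloned_write(source: Lun, target: Lun, addr: int, data: bytes) -> bool:
    """Write to the source, then duplicate to the target once the source write completed."""
    if not source.write(addr, data):
        # The source write did not complete, so nothing is cloned.
        return False
    # Assumption: the result of the target write is simply passed back to the caller.
    return target.write(addr, data)
```

Issuing the clone only after the source write completes means the target never receives data that the source does not already hold.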

Clustering is a technique used in computer systems to provide certain desirable functionality and characteristics from the perspective of external users. Advantages include increased performance and availability over non-clustered systems. Two general types of clusters are failover and parallel or active-active clusters. In a failover cluster, all cluster nodes may be aware of a given storage device accessible in the cluster, but in general a given storage device is accessed by only one node during operation. In the event of node failure, a failover mechanism causes ownership of the storage device to be transferred to a new node that has assumed responsibility for the workload of the failed node. Due to the single-node access, there is no need for synchronizing accesses among the hosts. In active-active clusters, storage devices may be actively accessed from all nodes in the cluster, and the operating software (e.g., application software) of the nodes is responsible for synchronizing access to shared storage resources.

SUMMARY

It is desirable to support data migration in a cluster environment, but providing such support can present certain challenges. Non-disruptive migration involves several sensitive operations where input/output (I/O) is temporarily suspended and from which it is necessary to recover in a non-obtrusive manner. The fine control over I/O and the possibility of aborting and restarting a migration at different steps of the process would require significant communication and coordination among the nodes of the cluster, most of it needed only for the unlikely event of a failure and therefore constituting inefficient use of system resources.

Methods and apparatus are disclosed for migrating data from a source LUN to a target LUN in a data processing system having a cluster of multiple host computers, where the data is accessed by software of the host computers using a mechanism initially associating the data with the source LUN as the location of the data in the system. An example of such a mechanism is a pseudoname as described in the above-referenced U.S. Pat. No. 7,904,681, and other examples are described herein.

A method includes commanding a kernel-level migration component of each host computer to begin cloning source LUN writes to the target LUN, each write to be cloned by writing a duplicate of each source LUN write to the target LUN. Subsequently, a LUN copying operation and target LUN write coordination by an agent in the system are initiated. The LUN copying operation reads data from the source LUN and writes the data to the target LUN so as to transfer all existing data. The target LUN write coordination coordinates target LUN writes from the LUN copying operation with target LUN writes from the cloning of source LUN writes, so as to ensure consistency between the source LUN and target LUN.
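
The target LUN write coordination summarized above, combined with the three-region pointer scheme recited in claim 1, might be sketched as follows. This is an illustrative single-threaded model, not the patented implementation: the class name `TargetWriteCoordinator`, its methods, and the use of Python lists as LUN contents are all assumptions. In particular, the claims shown here state only that cloned writes to the to-be-synchronized region are discarded with a success indication; deferring cloned writes that fall in the in-progress region until the chunk is stored is an assumption made for this sketch.

```python
class TargetWriteCoordinator:
    """Hypothetical model of the datapath component's write coordination."""

    def __init__(self, source, chunk_blocks):
        self.source = source                  # source LUN contents, one entry per block
        self.target = [None] * len(source)    # target LUN contents
        self.chunk = chunk_blocks             # blocks copied per chunk
        self.sync_end = 0                     # [0, sync_end) is synchronized
        self.in_progress_end = 0              # [sync_end, in_progress_end) is in progress
        self._staged = []                     # chunk data read from the source
        self._deferred = []                   # cloned writes held during a chunk copy

    def region(self, addr):
        """Classify a block address using the two pointers."""
        if addr < self.sync_end:
            return "synchronized"
        if addr < self.in_progress_end:
            return "in-progress"
        return "to-be-synchronized"

    def begin_chunk(self):
        """Read the next chunk from the source in increasing address order."""
        if self.sync_end >= len(self.source):
            return False                      # copy sweep is finished
        self.in_progress_end = min(self.sync_end + self.chunk, len(self.source))
        self._staged = self.source[self.sync_end:self.in_progress_end]
        return True

    def complete_chunk(self):
        """Store the chunk on the target, then advance the pointers."""
        self.target[self.sync_end:self.in_progress_end] = self._staged
        for addr, data in self._deferred:     # assumption: replay held writes
            self.target[addr] = data
        self._deferred.clear()
        self.sync_end = self.in_progress_end  # in-progress region is now empty

    def cloned_write(self, addr, data):
        """Handle a cloned host write; success is always reported to the host."""
        where = self.region(addr)
        if where == "synchronized":
            self.target[addr] = data
        elif where == "in-progress":
            self._deferred.append((addr, data))
        # to-be-synchronized: discard; the copy sweep will pick up the
        # data later because the source write has already completed.
        return True
```

Discarding is safe in this model because a host clones a write only after the source write completes, so the copy sweep reads the new data when it reaches that block.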

Upon completion of the LUN copying operation, the kernel-level component of each host computer is commanded to stop the cloning of source LUN writes and to disable access to the source LUN by the host computer software, in conjunction with a modification of the mechanism so as to newly associate the data with the target LUN as the location of the data in the system. Depending on the type of mechanism and system, the modification may be done either disruptively or non-disruptively, i.e., with or without stopping normal operation of the data processing system.
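
The per-host command sequence described above can be illustrated with a small state model. Everything named here is hypothetical: `HostMigrationState`, `start_cloning`, `commit`, and `resolve` are invented for the example, the "mechanism" is reduced to a dictionary from an application-visible name to a LUN identifier, and re-enabling access to the target LUN at commit time is an assumption that the text implies but does not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class HostMigrationState:
    """Hypothetical per-host view kept by a kernel-level migration component."""
    name_to_lun: dict                          # the "mechanism": data name -> LUN id
    cloning: bool = False                      # whether source writes are cloned
    blocked: set = field(default_factory=set)  # LUNs the host software may not access


def start_cloning(host, target):
    """First command: disable target access and begin cloning source writes."""
    host.blocked.add(target)
    host.cloning = True


def commit(host, data_name, source, target):
    """Final command, issued once the LUN copying operation has completed."""
    host.cloning = False                   # stop cloning source LUN writes
    host.blocked.add(source)               # disable access to the source LUN
    host.blocked.discard(target)           # assumption: target access is enabled here
    host.name_to_lun[data_name] = target   # re-associate the data with the target


def resolve(host, data_name):
    """Return the LUN that host software reaches through the mechanism."""
    lun = host.name_to_lun[data_name]
    if lun in host.blocked:
        raise PermissionError(f"access to {lun} is disabled")
    return lun
```

In a cluster, the same two commands would be sent to every host, for example by looping over a list of `HostMigrationState` objects; how the modification is made disruptive or non-disruptive depends on the mechanism and is not modeled here.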

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, features and advantages will be apparent from the following description of particular embodiments of the invention, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of various embodiments of the invention.

FIG. 1 is a block diagram of a data processing system;

FIG. 2 is a block diagram of a hardware organization of a host;

FIGS. 3 through 5 are flow diagrams for a migration operation;

FIG. 6 is a schematic diagram showing regions of a target storage device during migration;

FIGS. 7 and 8 are flow diagrams for a migration operation; and

FIGS. 9 and 10 are block diagrams of software organizations of a host.

DETAILED DESCRIPTION

FIG. 1 shows a data processing system having a set of host computers (HOSTs) 10 which are configured as a group referred to as an active-active cluster 12. The hosts 10, which are also referred to as nodes herein, are connected by interconnect 14 to storage devices 18 which are also referred to as logical units or LUNs 20. Each host 10 includes a storage device driver (DRVR) 21 providing low-level functionality required for I/O access to storage devices 18. Examples are described below. In one embodiment, the cluster 12 may be formed by a plurality of physical servers each executing virtualization software enabling it to host one or more virtual machines. An example of such virtualization software is ESX Server® sold by VMware, Inc. In another embodiment, the cluster 12 may implement a database clustering solution known as Oracle RAC from Oracle Corporation.

The interconnect 14 includes one or more storage-oriented networks providing pathways for data transfer among the hosts 10 and devices 18. An example of the interconnect 14 is a Fibre Channel storage area network (SAN), either by itself or in conjunction with Ethernet or other network components. The devices 18 are logical units of storage allocated for uses such as storing databases, file systems, etc. used by application programs executing on the hosts 10. Generally, the devices 18 are visible to the hosts 10 as block-oriented storage devices.

The LUNs 20 include a source LUN 20-S and a target LUN 20-T participating in a migration operation by which the target LUN 20-T functionally replaces the source LUN 20-S in the system. It is assumed that prior to migration, the source LUN 20-S stores a data resource that is accessed by operating software (e.g., application software) executing on the hosts 10 using a mechanism that specifically associates the application-visible data resource with the source LUN 20-S. Specific examples of such mechanisms are described below. A migration operation moves the data resource to the target LUN 20-T and changes the mechanism so that future application accesses of the data resource are directed to the target LUN 20-T rather than to the source LUN 20-S. Reasons for such migration of storage resources include a desire for additional storage capacity or improved performance, or to upgrade to more current and well-supported hardware, for example. In some cases the source LUN 20-S is to be removed from the system, although in other cases it may be maintained and reused for other purposes.

In the active-active cluster 12, there may be applications executing simultaneously on different hosts 10 having access to the source LUN 20-S. One aspect of the migration operation is to coordinate certain operations of the hosts 10 to ensure that there is no data loss or data incoherency created, which could have any of several deleterious effects as generally known in the art. These aspects of the migration operation are described below.

Citations

US 7,945,669 B2 - Method and apparatus for provisioning storage resources
A method and apparatus for automatically provisioning at least a portion of a computer system to meet a specification provided in a provisioning request. In...

US 7,080,225 B1 - Method and apparatus for managing migration of data in a computer system
Methods and apparatus for migrating a data set. In one embodiment, a migration is paused. In another embodiment, for a migration of data between multiple...

US 7,809,912 B1 - Methods and systems for managing I/O requests to minimize disruption required for data migration
Methods and systems are provided for minimizing disruptions when host data on a source logical unit is migrated onto a target logical unit. I/O requests...

US 7,093,088 B1 - Method and apparatus for undoing a data migration in a computer system
A method and apparatus for managing a migration of a data set from at least one first storage location to at least one second storage...

US 7,882,286 B1 - Synchronizing volumes for replication
In one aspect, a method to perform synchronization in a network-based system includes notifying a source side appliance that I/O data is going to be...

US 7,770,053 B1 - Systems and methods for maintaining data integrity during a migration
Systems and methods are provided for maintaining data integrity in the event of device write failures during a non-disruptive migration. In one embodiment, a computer-implemented...

US 7,904,681 B1 - Methods and systems for migrating data with minimal disruption
Methods and systems are disclosed that enable data migration from a source logical volume to a target logical volume in signal communication with the source...

US 7,890,664 B1 - Methods and apparatus for non-disruptive upgrade by redirecting I/O operations
Methods and apparatus for non-disruptive upgrade by redirecting I/O operations. With this arrangement, a driver upgrade does not require restarting an application. In one embodiment,...

US 2005/0193181 A1 - Data migration method and a data migration apparatus
The management computer 600 copies data in the volume 111 within the storage device 100A to the volume 115 within the storage device 100B. Upon...

US 7,805,583 B1 - Method and apparatus for migrating data in a clustered computer system environment
Methods and apparatus for performing a data migration in a clustered computer system. In one aspect, the availability of the data being migrated is maintained...

US 7,076,690 B1 - Method and apparatus for managing access to volumes of storage
One embodiment is directed to a method in a computer system including a host computer and at least one storage system including first and second...
