FSx ONTAP – Enterprise storage on AWS

Even though object storage has gained a lot of popularity, file storage is still prevalent. AWS already offers Elastic File System (EFS), but its performance is insufficient for many enterprise workloads. The FSx product line provides enterprise file storage options, and on September 2, 2021, AWS launched FSx for NetApp ONTAP (FSx ONTAP).

This post shares my impressions of FSx ONTAP. As previously discussed, FSx ONTAP is a NetApp storage service managed by AWS. Essentially, AWS installs NetApp arrays in its data centres so that users can provision ONTAP volumes from the AWS console or with the AWS CLI.

FSx ONTAP

ONTAP (formerly Data ONTAP) has been a very successful operating system for managing storage arrays. It was so successful that NetApp uses ONTAP as the brand name for its storage arrays. A similar story played out with Isilon, a scale-out NAS platform built on the BSD-based OneFS operating system, whose name later became the brand of EMC's scale-out storage products. ONTAP has competed with other enterprise storage platforms such as EMC Isilon and HP 3PAR, and AWS chose ONTAP as its partner for enterprise storage. It appears that NetApp still owns the ONTAP storage technology, while AWS operates the data centres and exposes the service through the AWS CLI and console.

People sometimes confuse FSx ONTAP with NetApp's own offering, Cloud Volumes ONTAP. The two are fundamentally different. Cloud Volumes ONTAP uses Cloud Manager (available self-hosted or as SaaS) as its management UI, and it manages volumes provisioned from cloud vendors such as AWS and Azure. Clients often configure these "cloud volumes" as an extension of an existing on-premises ONTAP deployment. When their on-premises ONTAP volumes run short of space, Cloud Manager knows about a remotely available, cloud-backed volume to which cold data can be moved.

Client Tool

FSx ONTAP is essentially an ONTAP storage cluster sitting in an AWS data centre. Users can manage the cluster with either the AWS CLI or the ONTAP CLI, and those with a storage administration background are likely to prefer the latter. In my professional services work, it took a few iterations to settle on a practice for choosing the right tool for each ONTAP resource. In a nutshell, it depends on the level of the resource we are interacting with. I categorize those resources into two classes:

  1. AWS-level resources, such as the FSx ONTAP file system and storage virtual machines (SVMs). These come from ONTAP but are identified as AWS resources (with an ARN). They are exposed through the AWS SDK and can be managed with the AWS CLI.
  2. ONTAP-native resources, such as volumes, Snapshot policies, schedules, SnapMirror relationships, and Vserver-level settings (Vserver is ONTAP's CLI term for an SVM). Most of these are not exposed through the AWS SDK, so they can only be managed with the ONTAP CLI. A few, such as volumes, can also be managed with the AWS CLI, but with very limited options, so we still prefer the ONTAP CLI for them.

The AWS CLI (v2.2.37) has very limited options for creating volumes. For example, the create-volume documentation states that the OntapVolumeType field in the output can be RW, DP (data protection), or LS (load-sharing). However, the command doesn't let users create any volume type other than the default RW. A SnapMirror destination must be a DP volume, so we had to create it with the ONTAP CLI.
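For reference, a minimal Python sketch that assembles the ONTAP CLI command for a DP destination volume. The SVM name, volume name, size, and the aggregate name `aggr1` are assumptions for illustration, not values from our deployment; the resulting string is meant to be pasted into an ONTAP CLI session (for example, over SSH to the file system's management endpoint).

```python
def dp_volume_create_cmd(svm: str, volume: str, size: str, aggregate: str = "aggr1") -> str:
    """Build an ONTAP CLI command that creates a data-protection (DP) volume.

    The default aggregate "aggr1" is an assumption; confirm it on your system.
    Names are assumed to be plain identifiers, so no quoting is applied.
    """
    args = [
        "volume", "create",
        "-vserver", svm,
        "-volume", volume,
        "-aggregate", aggregate,
        "-size", size,
        # The volume type the AWS CLI (v2.2.37) would not let us choose.
        "-type", "DP",
    ]
    return " ".join(args)


if __name__ == "__main__":
    # Hypothetical names; prints:
    # volume create -vserver svm_dr -volume vol1_dst -aggregate aggr1 -size 100g -type DP
    print(dp_volume_create_cmd("svm_dr", "vol1_dst", "100g"))
```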

Our practice works out to be: use the AWS CLI to create the file system and the storage virtual machine, then use the ONTAP CLI to create everything else. Even though the AWS CLI intends to support volume creation, we prefer the ONTAP CLI for its full functionality and its alignment with the ONTAP literature.
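A sketch of the AWS-level half of that practice, assuming boto3 as the SDK. The functions below only build keyword arguments for the FSx `create_file_system` and `create_storage_virtual_machine` calls, so nothing runs against AWS. The Multi-AZ deployment type, capacity, throughput, and all IDs are illustrative assumptions.

```python
from __future__ import annotations


def file_system_request(subnet_ids: list[str], security_group_ids: list[str],
                        storage_gib: int = 1024, throughput_mbps: int = 128) -> dict:
    """Keyword arguments for boto3's FSx create_file_system (Multi-AZ ONTAP)."""
    if len(subnet_ids) != 2:
        raise ValueError("a Multi-AZ ONTAP file system needs exactly two subnets")
    return {
        "FileSystemType": "ONTAP",
        "StorageCapacity": storage_gib,              # SSD capacity in GiB
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        "OntapConfiguration": {
            "DeploymentType": "MULTI_AZ_1",
            "ThroughputCapacity": throughput_mbps,   # MB/s
            "PreferredSubnetId": subnet_ids[0],      # subnet for the preferred file server
        },
    }


def svm_request(file_system_id: str, name: str) -> dict:
    """Keyword arguments for boto3's FSx create_storage_virtual_machine."""
    return {"FileSystemId": file_system_id, "Name": name}
```

With boto3 installed and credentials configured, `boto3.client("fsx").create_file_system(**file_system_request(...))` returns the new ID under `["FileSystem"]["FileSystemId"]`, which then feeds `svm_request`. Everything after that point (volumes, policies, SnapMirror) goes through the ONTAP CLI.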

Administration Tasks

Most administrative tasks should be done with the ONTAP CLI, and we use NetApp's ONTAP documentation as the reference. For example, when a volume runs out of inodes, writes fail with an error saying there is no space left. Fixing this requires raising the volume's inode limit, which, again, is not something the AWS CLI can manage; we have to use the volume modify command from the ONTAP CLI.
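A minimal sketch of the two ONTAP CLI commands involved, wrapped in Python: `volume show ... -fields files,files-used` to confirm the volume is out of inodes, then `volume modify ... -files` to raise the limit. The names are hypothetical, and doubling the limit is an arbitrary illustrative policy, not a NetApp recommendation.

```python
def inode_check_cmd(svm: str, volume: str) -> str:
    # "files" is the inode limit; "files-used" is how many are in use.
    return f"volume show -vserver {svm} -volume {volume} -fields files,files-used"


def inode_raise_cmd(svm: str, volume: str, current_limit: int, factor: float = 2.0) -> str:
    """Build a `volume modify -files` command that raises the inode limit.

    The growth factor is an illustrative assumption. ONTAP enforces a
    size-dependent maximum, so very large values may be rejected.
    """
    new_limit = int(current_limit * factor)
    if new_limit <= current_limit:
        raise ValueError("new inode limit must exceed the current one")
    return f"volume modify -vserver {svm} -volume {volume} -files {new_limit}"


if __name__ == "__main__":
    print(inode_check_cmd("svm1", "vol1"))
    print(inode_raise_cmd("svm1", "vol1", 1_000_000))
```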

NFS version

The AWS documentation states that FSx ONTAP supports NFS v3, v4.0, and v4.1. However, FSx ONTAP is currently backed by NetApp ONTAP 9.10.0, which partially supports NFSv4.2 (the basic protocol and the Labeled NFS feature). NetApp's ONTAP Best Practices and Implementation Guide describes how clients can mount with NFSv4.2, and in our proof of concept (POC) we mounted with NFS v4.2 in all of our tests.
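For illustration, a sketch of the client-side mount as a Linux `mount` argument list. It assumes a Linux client whose kernel supports NFS 4.2 and an SVM that already exports the volume over NFSv4; the endpoint name and paths are hypothetical.

```python
from __future__ import annotations


def nfs42_mount_args(nfs_endpoint: str, junction_path: str, mount_point: str) -> list[str]:
    """Argument list for mounting an FSx ONTAP volume over NFS v4.2 on Linux."""
    if not junction_path.startswith("/"):
        raise ValueError("junction path must be absolute, e.g. /vol1")
    return [
        "mount", "-t", "nfs",
        "-o", "nfsvers=4.2",                 # request NFS v4.2 explicitly
        f"{nfs_endpoint}:{junction_path}",   # SVM NFS endpoint : volume junction path
        mount_point,
    ]


if __name__ == "__main__":
    args = nfs42_mount_args("svm-nfs.example.internal", "/vol1", "/mnt/fsx")
    # Running it needs root, e.g. subprocess.run(args, check=True); here we only print.
    print(" ".join(args))
```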

Clarity of terminology

The ONTAP storage system has been around for a while, and many of its concepts are well known in the storage community. For example, a "Snapshot copy" is a read-only, point-in-time image of a volume (ref: ONTAP 9 documentation -> ONTAP concepts -> Replication -> Snapshot copies). The same concept is called a "Snapshot" in AWS literature, and it took us some research to realize that a "Snapshot" in the AWS documentation maps to a "Snapshot copy" in the ONTAP literature.

This causes confusion, because we rely on the ONTAP documentation for operational guidance where the AWS documentation does not provide enough help. The terminology in the AWS documentation should align with ONTAP.

Another example is the difference between "backup" and "snapshot" in the AWS documentation. My understanding is that both use the same underlying Snapshot technology on the ONTAP side, but I'm not sure exactly how they differ.

ONTAP can create a Snapshot copy nearly instantaneously. However, when we take a snapshot from the AWS web console, the status can take up to 10 minutes to update. This is confusing because it gives the impression that the snapshot itself takes 10 minutes to complete.
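One way to avoid waiting on the console status is to create and list the Snapshot copy from the ONTAP CLI instead. The sketch below builds those two commands; the timestamped naming scheme and the SVM and volume names are illustrative assumptions.

```python
from __future__ import annotations

from datetime import datetime, timezone


def snapshot_create_cmd(svm: str, volume: str, when: datetime | None = None) -> str:
    """ONTAP CLI command for a manual Snapshot copy with a timestamped name.

    The "manual_YYYYmmdd_HHMMSS" naming scheme is a hypothetical convention.
    """
    when = when or datetime.now(timezone.utc)
    name = "manual_" + when.strftime("%Y%m%d_%H%M%S")
    return f"volume snapshot create -vserver {svm} -volume {volume} -snapshot {name}"


def snapshot_list_cmd(svm: str, volume: str) -> str:
    # Lists the volume's Snapshot copies together with their creation time.
    return f"volume snapshot show -vserver {svm} -volume {volume} -fields create-time"


if __name__ == "__main__":
    print(snapshot_create_cmd("svm1", "vol1"))
    print(snapshot_list_cmd("svm1", "vol1"))
```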

Cross Region Replication

AWS has a documentation page that covers SnapMirror at a very high level. It points to two further documents: one for NetApp Cloud Manager and one for the ONTAP CLI.

The former is not a viable option, as we started natively on FSx ONTAP and do not have NetApp Cloud Manager. With the latter, we managed to configure cross-region replication using the ONTAP CLI and identified some gaps in the documentation. Specifically, it would be helpful if the AWS documentation called out that:

  1. inter-cluster network connectivity is a prerequisite (e.g. via VPC peering or a transit gateway);
  2. TCP ports 10000 and 11104-11105 must be allowed in the security group for intercluster communication;
  3. connectivity between clusters can be validated with the ping command from the ONTAP CLI.
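For item 2, a sketch of the matching security group rules in the `IpPermissions` shape that boto3's EC2 `authorize_security_group_ingress` accepts. The peer CIDR is a placeholder, and the ICMP rule is my own addition (an assumption, not from the AWS page) so that the ONTAP ping in item 3 is not blocked.

```python
import ipaddress


def intercluster_ip_permissions(peer_cidr: str) -> list:
    """IpPermissions for EC2 authorize_security_group_ingress from a peer CIDR."""
    cidr = str(ipaddress.ip_network(peer_cidr))  # raises ValueError if malformed

    def rule(protocol, from_port, to_port):
        return {
            "IpProtocol": protocol,
            "FromPort": from_port,
            "ToPort": to_port,
            "IpRanges": [{"CidrIp": cidr, "Description": "FSx ONTAP intercluster peer"}],
        }

    return [
        rule("tcp", 10000, 10000),   # port 10000 from the list above
        rule("tcp", 11104, 11105),   # ports 11104-11105 from the list above
        rule("icmp", -1, -1),        # all ICMP types, for ONTAP's ping (assumption)
    ]
```

Each cluster's security group needs these rules pointing at the other side's CIDR, for example `ec2.authorize_security_group_ingress(GroupId="sg-...", IpPermissions=intercluster_ip_permissions("10.1.0.0/16"))` with an EC2 client in each region.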

With the AWS CLI alone, it is not possible to configure cross-region replication.
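To give a sense of what the ONTAP CLI side involves, here is a sketch that lists the commands in the order we would expect to run them, each tagged with the cluster it runs on. The cluster, SVM, and volume names, the IP addresses, the LIF name `inter_1`, and the choice of the `MirrorAllSnapshots` policy are assumptions for illustration. It also assumes the DP destination volume already exists and the network prerequisites above are in place.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """One side of the replication. All names used below are hypothetical."""
    cluster: str                      # ONTAP cluster (admin SVM) name, e.g. "FsxId0123..."
    svm: str                          # data SVM that owns the volume
    volume: str
    intercluster_ips: tuple[str, ...]


def replication_commands(src: Site, dst: Site, lif: str = "inter_1") -> list[tuple[str, str]]:
    """Return (cluster to run on, ONTAP CLI command) pairs, in order."""
    src_path = f"{src.svm}:{src.volume}"
    dst_path = f"{dst.svm}:{dst.volume}"
    return [
        # Validate connectivity from a destination intercluster LIF to the source.
        (dst.cluster, f"network ping -vserver {dst.cluster} -lif {lif} "
                      f"-destination {src.intercluster_ips[0]}"),
        # Cluster peering: run on both sides; each prompts for the same passphrase.
        (dst.cluster, "cluster peer create -address-family ipv4 -peer-addrs "
                      + ",".join(src.intercluster_ips)),
        (src.cluster, "cluster peer create -address-family ipv4 -peer-addrs "
                      + ",".join(dst.intercluster_ips)),
        # SVM peering for SnapMirror, accepted on the source side.
        (dst.cluster, f"vserver peer create -vserver {dst.svm} -peer-vserver {src.svm} "
                      f"-peer-cluster {src.cluster} -applications snapmirror"),
        (src.cluster, f"vserver peer accept -vserver {src.svm} -peer-vserver {dst.svm}"),
        # Relationship and baseline transfer, both run on the destination.
        (dst.cluster, f"snapmirror create -source-path {src_path} "
                      f"-destination-path {dst_path} -policy MirrorAllSnapshots"),
        (dst.cluster, f"snapmirror initialize -destination-path {dst_path}"),
    ]


if __name__ == "__main__":
    src = Site("FsxIdSource", "svm_src", "vol1", ("10.0.1.10", "10.0.2.10"))
    dst = Site("FsxIdTarget", "svm_dr", "vol1_dst", ("10.1.1.10", "10.1.2.10"))
    for cluster, command in replication_commands(src, dst):
        print(f"{cluster}::> {command}")
```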

Final words

As someone who has lived with enterprise storage for more than a decade, I'm glad to see cloud vendors bring enterprise storage into their data centres, acknowledging that consumer-grade file storage is simply insufficient for heavy storage use cases such as medical imaging. FSx ONTAP seems to be at an early level of maturity. However, since AWS exposes ONTAP CLI access to users, ONTAP professionals are able to leverage its full potential.