Known issues and limitations

The following table lists known issues and limitations that exist in PowerFlex 3.5.1.

NOTE: If an issue was reported by customers, the customers' Service Request numbers appear in the "Issue number & SR number" column, and serve to correlate between customer-reported issues and the PowerFlex issue number.
Table 1. Known issues and limitations—PowerFlex Gateway
Issue number & SR number Problem summary Workaround
SCI-55819 During NDU to v3.5.1, in rare cases, the PowerFlex Gateway might have high CPU usage and get stuck at the upload phase. Restart the gateway service by running this command in an SSH session to the gateway:

service scaleio-gateway restart

Then, in the Installer user interface, click Retry to continue the upgrade.

SCI-55795 The Run Script on Host feature has a package upload capability. When this capability is used, the upload might fail due to insufficient resources, with the error:

Failed to upload package. ResultCode: REMOVE_COMMAND_FAILED

The LIA process did not have sufficient memory to finish the upload process.

Copy the script to be run on the server to the relevant LIA folder, and execute the "Run Script on Host" feature without the upload option.
SCI-54887 When performing log collection via the PowerFlex Gateway with the "exception only" check box selected, the log collection fails. Perform log collection with the default settings via the PowerFlex Gateway.
SCI-54799 If Auto Collect logs is configured in the system during an NDU session, when the NDU ends, the PowerFlex Installer will display a button asking to start the first phase of the NDU, even though the previous one was completed successfully (instead of displaying the "Mark operation as done" button). Disable auto log collection prior to starting the NDU, and run the NDU. After NDU completion, enable auto log collection again.
SCI-55438 During NDU to v3.5.1, in rare cases, the LIA might time out during the upload phase.

Examples of errors:

"Command failed: Could not upload sdc package to 1.1.0.7,2.2.0.7 due to: A timeout occurred"

"Command failed: Failed getting the OS on node 1.1.0.236,2.2.0.236 due to: Invalid LIA session. A login is required"

Click Retry in the PowerFlex Gateway user interface, and the NDU will continue.
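
As a rough illustration of how the two example errors above might be told apart from other upload failures, the following sketch matches an error message against the two phrases quoted for SCI-55438. The function name is_retryable_lia_error is hypothetical, and the sketch assumes the error text contains one of the quoted phrases verbatim; it does not call any PowerFlex tool.

```shell
# Hypothetical helper (not part of PowerFlex): returns 0 when the error text
# passed as $1 matches one of the two LIA upload-phase errors quoted for
# SCI-55438, for which the documented workaround is to click Retry.
is_retryable_lia_error() {
    case "$1" in
        *"A timeout occurred"*)  return 0 ;;
        *"Invalid LIA session"*) return 0 ;;
        *)                       return 1 ;;
    esac
}
```

Any other error text, such as the package upload failure described for SCI-55795, makes the function return a non-zero status.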
SCI-55405 When installing a system, removing it, and then installing again using the same PowerFlex Gateway (without re-installing the PowerFlex Gateway), deployment fails during the Configuration stage. Remove the PowerFlex Gateway RPM, and re-install it as a clean installation.
SCI-50992 After configuring Email Call home notification, alerts about ESRS not being configured are still generated by the system. Email call home uses ESRS, so this is a false alert. Restart the PowerFlex Gateway process using the command:

/etc/init.d/scaleio-gateway restart

The alert should disappear.

Table 2. Known issues and limitations—MDM
Issue number & SR number Problem summary Workaround
SCI-55686 When an admin configures more SDSs than the Protection Domain maximum allows, the error returned during the configuration step is uninformative: "Communication error".

In the MDM logs, the error will be:

Command add_sds was not successful. Error code: There are too many SDSs in Protection Domain

None
SCI-55439 In systems that have a large quantity of objects (more than 250 SDSs, 250 SDCs, thousands of volumes, etc.), if the system disk is not fast enough, the MDM repository might have delays that are too long for committing changes. The system disk, which is used for both logs and a high rate of updates, must be fast enough for these activities. Refer to PowerFlex guidelines for the required system disk specification.
SCI-54198 When a Protection Domain is disabled, throttle settings are not preserved when the Protection Domain is enabled again. Reconfigure the settings using one of the system's management interfaces.
SCI-54017 If a process is killed during preparation of devices for checksum protection, progress for enabling this feature might get stuck. Disable and then enable checksum protection to resume the operation. Note: The relocation work that has been done on this and/or other devices is not lost, and will resume from the last point (implicitly).
SCI-53973 In rare cases, after an SDS failure, if a rebalance is in progress, the rebalance rate might be very slow. There is no impact on service or performance. Increase the rebalance rate to 25 MB/sec (default = 10 MB/sec).
SCI-53894 Initial copy of volumes might get stuck at 100% with no progress if the MDM process in the source system crashes several times after the copy progress reaches 100%. Remove pairings that are stuck at 100%, and re-add them.
SCI-53810 Traces produced using the get_info script will not include the latest 128 KB of traces. Increase trace verbosity before using get_info, to ensure that relevant traces are flushed.
SCI-14632 Changing the size of a device that is in use by an SDS is currently not supported. To change a device size, first remove the device from the VxFlex OS/PowerFlex system, then change the device size. None
Table 3. Known issues and limitations—Network
Issue number & SR number Problem summary Workaround
SCI-27540

SR# 08520639, 07724183

The SDS connectivity test (SDS network test) tool might return inconsistent results in networks with configuration issues (routing, MTU, etc.), and when non-VxFlex OS traffic is running on the data subnet (SDS-SDS, SDC-SDS). None.
Table 4. Known issues and limitations—SDC
Issue number & SR number Problem summary Workaround
SCI-56088 On ESXi 7.0 based servers, in very rare cases, after deployment of an SDC, an ESXi PSOD crash might occur. This only occurs when changing the SDC performance profile from High (Default) to Compact (used for very old CPUs). There is no reason to change this setting, and it should not be done without first contacting Customer Support. Avoid changing the default performance profile (High).
SCI-54074 When running an SDC upgrade from the PowerFlex Installer on Ubuntu and SLES, the SDC driver upgrade completes successfully, but in rare cases the driver loads from cache instead of from the binary location. When loading from cache, the old driver is loaded. This can be determined by checking the driver version. For example, use "modinfo scini" to check the version. Run the following command:

mv /bin/emc/scaleio/scini_sync/driver_cache/Ubuntu/3.0.* /tmp/

Then restart the driver or the node (if volumes are in use, a reboot of the operating system is needed).
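
The version check mentioned for SCI-54074 could be scripted along the following lines. This is a minimal sketch: the function names modinfo_version and driver_matches are hypothetical, the expected version prefix is supplied by the caller, and the sketch assumes that the "modinfo scini" output contains a "version:" line, as modinfo normally prints for modules that declare a version.

```shell
# Hypothetical helper: print the value of the "version:" field from
# modinfo output read on stdin.
modinfo_version() {
    awk '$1 == "version:" { print $2; exit }'
}

# Hypothetical helper: return 0 when the version read from stdin starts with
# the prefix given as $1 (for example "3.5"), and 1 otherwise.
driver_matches() {
    expected_prefix=$1
    actual=$(modinfo_version)
    case "$actual" in
        "$expected_prefix"*) return 0 ;;
        *)
            echo "scini reports version '$actual', expected ${expected_prefix}*" >&2
            return 1
            ;;
    esac
}
```

A possible invocation is "modinfo scini | driver_matches 3.5"; a non-zero exit status would suggest that the old driver was loaded from the cache.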
SCI-52611 When deploying an HCI system with four data networks (PowerFlex Manager configuration preparing for future replication support in HCI), some SDCs may raise alerts for SDC socket oscillating failure (in the PowerFlex Gateway or PowerFlex Manager it will appear as network oscillating failures). For example, the following line will appear when querying an SDC:

sdc_socket_allocation_failure:

Short window: 492 failures in 60 seconds (limit is 300).

Run the following scli commands from the master MDM (after logging in to the MDM as the admin user via scli):

scli --set_performance_parameters --all_sdc --tech --sdc_tcp_send_buffer_size 2048

scli --set_performance_parameters --all_sdc --tech --sdc_tcp_receive_buffer_size 2048

scli --reset_oscillating_failure_counters --all_counters --all_sdc
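
As a rough illustration of how the short-window counter shown for SCI-52611 might be checked before applying the workaround, the following sketch parses the counter line and compares the failure count with the limit. The function name short_window_exceeded is hypothetical, and the sketch assumes that the SDC query output contains a line in exactly the format shown in the example above.

```shell
# Hypothetical helper: reads SDC query output on stdin and returns 0 when a
# "Short window" line reports more failures than its limit, 1 otherwise.
short_window_exceeded() {
    awk '
        /Short window:/ {
            # Expected shape:
            # "Short window: <N> failures in <S> seconds (limit is <L>)."
            for (i = 1; i <= NF; i++) {
                if ($i == "failures") failures = $(i - 1)
                if ($i == "is") { limit = $(i + 1); gsub(/[^0-9]/, "", limit) }
            }
            if (failures + 0 > limit + 0) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Pipe the output of the SDC query into the function; an exit status of 0 indicates that the counter is over its limit.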

SCI-11026 When a volume's size changes, the SDC should return 'sense-data' on the next I/O, to indicate that the volume size was changed. Rescan all adapters.
Table 5. Known issues and limitations—SDR
Issue number & SR number Problem summary Workaround
SCI-55965 During a replication Initial Copy operation, when the initial copy is at 100%, if connection to the remote site fails and the MDM at the local site is not able to communicate with the MDM in the remote peer system, the initial copy might never finish creating the full copy. Possible workarounds (in order of least to most effort):

- Pause and then resume USER RCG of the pair.

- Restart the master MDM at the remote site (which will cause a switchover).

- Delete the pair and then create it again (use this only if the two previous workarounds did not work).

SCI-54067 During replication failover at a new destination site (formerly the source), journal capacity can grow above the expected size. Reclamation will occur over time. This issue happens only if two or more SDRs fail. None
SCI-53892

SCI-53371

Creation of an RCG (Replication Consistency Group) might take a long time during creation of multiple initial copies of several volumes. Follow PowerFlex best practice guidelines for replication: perform an initial copy of one volume at a time. When the volume has finished creating the initial copy and has reached a stable replication state, continue by adding another volume to the RCG and initiate the creation of its initial copy. Optimization will be added in future versions to automate this process.

Note: If the volumes are empty (without any data), initial copies can be created in parallel.
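
The one-volume-at-a-time practice described for SCI-53892 and SCI-53371 could be sketched as a simple loop. This is only an outline under stated assumptions: the function name replicate_one_at_a_time is hypothetical, and the two commands it calls (one that adds a volume to the RCG, one that reports whether its initial copy has reached a stable state) must be supplied by the caller, because this document does not specify them. POLL_SECONDS is likewise an assumed setting.

```shell
# Hypothetical helper. Arguments:
#   $1  command that adds one volume to the RCG (caller-supplied)
#   $2  command that returns 0 once that volume is in a stable state
#   $3+ volume names, processed strictly one after another
replicate_one_at_a_time() {
    add_cmd=$1
    stable_cmd=$2
    shift 2
    for vol in "$@"; do
        # Start the initial copy for this volume; stop on failure.
        "$add_cmd" "$vol" || return 1
        # Wait for this volume before moving on to the next one.
        until "$stable_cmd" "$vol"; do
            sleep "${POLL_SECONDS:-30}"
        done
    done
}
```

The loop has no timeout, so a caller would probably want the stability check to give up after a reasonable period.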

SCI-52509 When the replicated volumes are all from one pool, and the journal is also allocated from the same pool, if that pool becomes Data Unavailable (DU), then as a result of journal unavailability the replication is broken and does not recover automatically after the pool is recovered. In order to recover replication, replication must be recreated.

Notes:

1. If the application volumes are spread over other pools, the journal unavailability means that the replication cannot handle I/O to those volumes and must be broken.

2. If the journal is spread over multiple pools, and there is free journal capacity in other pools, replication will not fail.

Recovery from this issue requires creating the RCGs again, and performing a full initial copy of the RCGs in the Protection Domain.

Note: Following best practices, which require that the journal is spread over multiple pools, eliminates the risk of this issue.

Table 6. Known issues and limitations—SDS
Issue number & SR number Problem summary Workaround
SCI-55441 In rare cases, when a device fails and an SDS subsequently crashes, the following error appears in the exp.0 file of the SDS:

ScaleIO-Common-Job/src/nv_ds/change_log/change_log_set.c, line 1936, function changeLogSet_GetRecords the disk failed to avoid an unrecoverable error.

Clear the disk error for the failed disk, using the clear disk error command in the Web UI, REST, or scli.
SCI-44515 In rare cases when deleting a large number of volumes with snapshots while an SDS reboot occurs, the deletions can be finished in the absence of the rebooting SDS. In this case, after the reboot ends, the devices in the SDS are automatically attached as "new" devices (bFormat=TRUE). Despite being marked as new, these devices still have data residing in NVRAM (from before the reboot). This data can be erased only after the devices finish their attachment as "new" devices. The SDS fails to attach the devices since it does not have enough space in NVRAM for both the old NVRAM data and the new data. Remove the disks from the SDS and add them back again.
SCI-44410 Volume snapshot deletion seems stuck or might take a long time to be completed. Snapshot deletion is dependent on the system status, and will not complete until system rebuild is over.
SCI-35732 When a disk has failed in an ESXi HCI node, the Storage VM might freeze. This will result in SDS failure, and commencement of a rebuild operation. 1. Shut down the SVM.

2. Put the ESXi host into maintenance mode (shut down or migrate any VMs located on the host).

3. Reboot the host.

4. Identify the faulty device and remove it from the SVM, using "edit virtual machine".

5. Start the SVM. The SDS should start, the device should be removed and rebuild should be initiated.

SCI-19551

SR# 84174508

The VxFlex OS system only supports the UNMAP SCSI command to free up capacity in Medium Granularity (MG) Storage Pools in cases where the volume has no snapshots. None. Planned to be added in future versions.
Table 7. Known issues and limitations—vSphere PowerFlex plug-in
Issue number & SR number Problem summary Workaround
SCI-38905 When installing VxFlex OS using the vSphere VxFlex OS plug-in, and rolling back from a failed installation, upon re-launching the installation wizard, some of the previously chosen configuration parameters might be missing. Cancel the operation, and start deployment again.
Table 8. Known issues and limitations—vVols
Issue number & SR number Problem summary Workaround
SCI-54057 Unmapping a VMware virtual volume, and then immediately afterwards trying to overwrite its content, might generate the error: "volume n is in use". None. Try again shortly after, because some volume operations are executed in an asynchronous fashion.