How to: Troubleshoot EMC Celerra/VNX Integration

Summary

The purpose of this document is to serve as an information bank for EMC-related problems. It will cover the most common problems and the recommended steps and tools needed in order to solve them.

The document is divided into three parts:

  1. Quotas Not Locking
  2. Quotas Not Updating
  3. Error 1450

Quotas Not Locking

If the EMC quotas aren't locking, first check whether the NSS server receives heartbeats from the CEPA-server. A failure to receive heartbeats will result in quotas not locking. There are three main ways to check for heartbeat errors:

  1. In the Notice field in Quota Server.
  2. In the System\History-tab in Quota Server.
  3. In Application Event log in Windows.

This is the most common message if no heartbeats are received:

"Failed to receive heartbeats from EMC CEPA. Locking on EMC quotas will not be fully operational. Please check if it is installed and configured."

The three most common reasons behind the failure to receive heartbeats:

  1. Failed communication from the CEPA-server, i.e. stopped/crashed EMC CAVA-service.
  2. Missing or incorrectly configured endpoint at HKEY_LOCAL_MACHINE\SOFTWARE\EMC\Celerra Event Enabler\CEPP\CQM\Configuration.
  3. Endpoint still claimed after an NSS Quota Server service stop.

It can take several seconds for the QS Service instance to de-register as the endpoint for CEPA; this is carried out during service shutdown. If the service is restarted before de-registration has completed, its attempt to connect to CEPA is refused (the endpoint is still 'claimed'), so no heartbeats reach QS. To reset the connection, restart the QS Service again, ensuring that the original shutdown has actually completed.

Note: Although the service manager may show a service as stopped, it is more reliable to monitor the process in Task Manager, because applications can send 'completed' messages before actions have actually completed (QS is sometimes guilty of this).
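The advice to wait until the process has really exited can be scripted. The following is a minimal sketch, assuming Python is available on the NSS server. The image name passed in is a placeholder, since this article does not give the executable name of the Quota Server service; `tasklist` is the standard Windows process lister.

```python
import subprocess
import time


def process_running(image_name):
    """Return True if Windows `tasklist` still lists the given image name."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=False,
    ).stdout
    return image_name.lower() in out.lower()


def wait_for_exit(image_name, timeout_s=120, poll_s=2,
                  is_running=process_running, sleep=time.sleep):
    """Poll until the process is gone; return False if the timeout expires."""
    waited = 0
    while is_running(image_name):
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
    return True
```

A restart script could call `wait_for_exit("<image name>")` after stopping the service and start it again only when the call returns True. The timeout and polling interval are arbitrary defaults, not recommended values.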

Endpoint check:

Make sure that the endpoint is correctly configured on the CEPA-server(s). If NSS and CEPA run on the same machine, the endpoint can be set to just Northern. If the CEPA-server is external, the endpoint must point to the IP address that the information should be sent to, in this case the NSS server(s).

A single NSS server: Northern@<IP>
Multiple NSS servers: Northern@<IP1>;Northern@<IP2> etc.

Disregard the angle brackets when setting the endpoint. It should look like this: Northern@xx.xx.xx.xx
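To avoid typing mistakes in the endpoint value, the string can be generated. This is an illustrative sketch only: the function name is made up for this example, and it only builds the text described above without writing anything to the registry.

```python
import ipaddress


def build_endpoint(ips, name="Northern"):
    """Build the endpoint value, e.g. 'Northern@10.0.0.5;Northern@10.0.0.6'."""
    if not ips:
        # Co-resident NSS and CEPA: the bare name is sufficient.
        return name
    parts = []
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        parts.append(f"{name}@{ip}")
    return ";".join(parts)
```

For example, `build_endpoint(["10.0.0.5", "10.0.0.6"])` yields `Northern@10.0.0.5;Northern@10.0.0.6`, and a value that still contains angle brackets is rejected with a ValueError.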

If heartbeats are being received, the problem is either:

  1. Account & permissions-related.
  2. A communication problem between CEPP and CEPA.

1 a) EMC CAVA Service running with the wrong account

  • No errors are displayed in this case, which makes it difficult to troubleshoot. The only obvious symptom is that the files cannot be blocked. Make sure that the EMC CAVA service runs with an account that has administrative rights on the CIFS servers managed by Quota Server.

1 b) EMC CAVA Service running with the SYSTEM account

  • If the CQM application is co-resident (i.e. the NSS services and the CEPA server run on the same host), the EMC CAVA service can run with the Local System account. However, this configuration is strongly discouraged. The Local System account is easily affected by security policies enforced on the server, which can, for example, prevent connections from the network.

1 c) NSS Services running with the wrong account

  • The NSS Quota Server service account should belong to at least the "Backup Operators" and "Power Users" groups on the VNX / Celerra CIFS server. If not, quotas may fail to lock without any errors being logged in the NSS trace files or in the Data Mover log files.

2. Communication problem between CEPP and CEPA

Log in to the EMC Control Station and run a CEPA pool check. This provides a status report for the CEPA pool. Any problems with the communication between CEPP and CEPA are displayed in the pool information. This is the command for a pool check:

$ server_cepp datamover_name -pool -info

The command will produce a result similar to this:

server name :
pool_name = Northern
server_required = No
access_checks_ignored = 0
req_timeout = 5000ms
retry_timeout = 1000ms
pre_events = OpenFileWrite, CreateFile, RenameFile, DeleteFile, CloseModified, CreateDir, RenameDir, DeleteDir, SetAclFile
post_events =
post_err_events =
CEPP Servers:
IP = xx.xx.xx.xx, state = ONLINE, rpc = MS-RPC over SMB, cava version = 6.0.4.0, nt status = SUCCESS, server name = server.domain.com

If there are any problems on this end, they will appear in the bottom row (under CEPP Servers). Check the 'state' and 'nt status' fields.
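When several CEPA servers are in the pool, checking each row by eye is error-prone. The sketch below assumes the pool check output has been saved as text and that each server row has the comma-separated "key = value" layout shown above; the function names are illustrative.

```python
def parse_cepp_server_line(line):
    """Split 'key = value, key = value' pairs into a dict."""
    fields = {}
    for pair in line.split(","):
        key, sep, value = pair.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def find_problems(pool_info_text):
    """Return the parsed CEPP server rows that are not ONLINE / SUCCESS."""
    problems = []
    for line in pool_info_text.splitlines():
        # Only the rows under 'CEPP Servers:' start with 'IP ='.
        if not line.strip().startswith("IP ="):
            continue
        row = parse_cepp_server_line(line)
        if row.get("state") != "ONLINE" or row.get("nt status") != "SUCCESS":
            problems.append(row)
    return problems
```

An empty list from `find_problems` means every listed server reports state ONLINE and nt status SUCCESS.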

Common state errors:

  • ERROR_CEPP_NOT_FOUND - Insufficient account permissions on the EMC CAVA service account. Add this account to the Local Administrators group on the target CIFS-server. Verify that the CEPA endpoint is correctly configured.
  • OFFLINE - NSS Quota Server not running or not registered as a CQM application. Verify that the CEPA endpoint is correctly configured and that the NSS Quota Server and EMC CAVA server services are running.

Common status errors:

  • OBJECT_NAME_NOT_FOUND - CEPP is unable to communicate with the EMC CAVA-service on the CEPA-server. This is sometimes caused by an outdated EMC CEE framework. EMC CEE 6.0.0 or later is recommended.
  • CONNECTION_DISCONNECTED - Connection rejected, possibly because of closed ports, a firewall or insufficient account permissions. This error can also occur if the cepp.conf-file is pointing to the wrong server (e.g. to a server that does not have the EMC CEE Framework installed).
  • INVALID_PARAMETER - Account problems of a more complex nature. The MS RPC account is incorrectly mapped and configured in the domain.
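The state and status values listed above can be kept in a small lookup table, so that a script can print the likely cause next to each failing server. This is a sketch only: the hint texts are condensed from the lists above and the function name is made up for illustration.

```python
STATE_HINTS = {
    "ERROR_CEPP_NOT_FOUND": "EMC CAVA service account lacks permissions; add it "
                            "to Local Administrators on the target CIFS server "
                            "and verify the CEPA endpoint.",
    "OFFLINE": "NSS Quota Server not running or not registered as a CQM "
               "application; verify the CEPA endpoint and both services.",
}

STATUS_HINTS = {
    "OBJECT_NAME_NOT_FOUND": "CEPP cannot reach the EMC CAVA service; check for "
                             "an outdated EMC CEE framework.",
    "CONNECTION_DISCONNECTED": "Connection rejected; check ports, firewall, "
                               "account permissions and cepp.conf.",
    "INVALID_PARAMETER": "MS RPC account incorrectly mapped or configured in "
                         "the domain.",
}


def explain(state, nt_status):
    """Return the known hints for one CEPP server row (possibly empty)."""
    hints = []
    if state in STATE_HINTS:
        hints.append(STATE_HINTS[state])
    if nt_status in STATUS_HINTS:
        hints.append(STATUS_HINTS[nt_status])
    return hints
```

Values that are not in either table produce no hint, which is the cue to escalate rather than guess.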

If the problem persists on this end (CEPP & CEPA), contact EMC support for further assistance.

Quotas Not Updating

Disabled CIFS Notifications

The most common reason behind quotas not updating synchronously on EMC is the absence of CIFS notifications. NSS 8.x, 9.0 and 9.5 rely on CIFS notifications in order to update quotas. No CIFS notifications means no usage level update.

A quick way to verify that the server receives CIFS notifications is to open the trace file named ncl_trace_qsserver_statistics.txt and search for the term "CIFS notifications". If the number is zero, no CIFS notifications are being received. If it is larger than zero, consider how large it is: does it change over time or remain unchanged, and does it really match the size and activity of the environment?

One way to see if the number of CIFS notifications is correct is to compare it with the number of CheckEvents in the previously mentioned statistics log. These two numbers should be fairly close to each other. If the difference is large, it is usually a sign that CIFS notifications are turned off for a majority of the CIFS servers.
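The comparison of the two counters can be automated. The sketch below is built on assumptions: this article does not show the layout of the statistics file, so the code assumes each counter appears on one line as a label followed by a number (for example "CIFS notifications: 1234"), and the 0.5 ratio used to flag a large difference is an arbitrary illustrative threshold.

```python
import re


def read_counter(text, label):
    """Return the first integer that follows `label` on the same line, or None."""
    match = re.search(re.escape(label) + r"[^\d\n]*(\d+)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def notification_health(text, min_ratio=0.5):
    """Classify the CIFS notification count relative to CheckEvents."""
    cifs = read_counter(text, "CIFS notifications")
    checks = read_counter(text, "CheckEvents")
    if cifs is None or checks is None:
        return "counters not found"
    if cifs == 0:
        return "no CIFS notifications received"
    if checks and cifs / checks < min_ratio:
        return "far fewer CIFS notifications than CheckEvents"
    return "ok"
```

Running it against two copies of the file taken some time apart also shows whether the notification count is changing at all.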

CIFS notifications need to be enabled for ALL CIFS servers used. The setting responsible for this is called 'notifyonwrite' and it's disabled by default.

This command enables CIFS notifications on the CIFS server:

$ server_mount server_2 -option notifyonwrite ufs1 /ufs1

(where ufs1 is the file system name and /ufs1 is its mount point)
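Because the option must be set for every file system in use, it can help to generate the command lines from a list and review them before running anything. This sketch only builds text in the same form as the command above; the function name and the example values in the usage note are illustrative.

```python
def notifyonwrite_commands(data_mover, mounts):
    """Return one server_mount command line per (file_system, mount_point) pair."""
    return [
        f"server_mount {data_mover} -option notifyonwrite {fs} {mount_point}"
        for fs, mount_point in mounts
    ]
```

For example, `notifyonwrite_commands("server_2", [("ufs1", "/ufs1"), ("ufs2", "/ufs2")])` returns two command lines that can be pasted into the Control Station session once they have been checked.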

Notify on Write

Consult with your EMC technical account manager if you are unsure of the implications of enabling CIFS notifications in your environment.

Empty CIFS Notifications

Another common reason behind quotas not updating is empty CIFS notifications. An empty CIFS notification signals that one or several changes have occurred within the file system, but that the CIFS server was unable to deliver a complete message describing them because of an overflowed command buffer. It can be likened to an error message saying "changes occurred in a share, but no details can be provided". NSS responds by re-scanning the quota path, or the entire share if multiple quotas are configured on it, in order to calculate current usage levels.

An abnormal rate of empty notifications could potentially lead to a state of constant rescanning. In this scenario, the file change notifications get stuck in the scan queue and a significant processing delay can be observed. In a worst case scenario, this could continuously and negatively affect major Quota Server features such as quota locking.

Read more about empty CIFS notifications in KB1785 (About: Handling of Empty CIFS Notifications in NSS).

Error 1450

For versions 9.5 or earlier, this is a problem that shows up as Error 1450 in the Windows Application Event Log. Error 1450 means "Insufficient system resources exist to complete the requested service". The error message refers to resource exhaustion on the EMC CIFS server: all available CIFS/SMB threads on the CIFS server are consumed.

Due to the insufficient resources on the CIFS server, Quota Server will not be able to perform operations on the target storage and process quota usage level updates. This could potentially cause serious harm to Quota Server functionality (e.g. quotas not updating, miscalculated quotas and failed locking).

Illustration:

Error 1450

Description: Failed to queue for notification on drive root: \\device\fs1$  Error:1450.

The Error 1450 entries in the Windows Application Event log can be matched to a specific message on the EMC Control Station:

2013-09-26 09:26:36: VC: 3:[vdm_002v] Too many access from CAVA server xx.xx.xx.xx:
2013-09-26 09:26:36: VC: 3:[vdm_002v] without the EMC VirusChecking privilege:

The IP address mentioned in this message is the IP address of the NSS server (and the CEPA/CAVA-server if everything runs on the same machine). Through cooperation with EMC engineers, it has been discovered that the combination of these two error messages is a safe indicator that all available CIFS/SMB threads are consumed at the time the error is reported. The error messages are printed out as soon as NSS tries to spawn a thread to perform a required action, but is denied by the EMC CIFS server.
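To correlate the two error sources, the "Too many access" messages can be counted per VDM and CAVA server IP and then compared with the timestamps of the Error 1450 entries. The sketch below assumes the log has been exported to text and that every line follows the format of the two sample lines above; the pattern and function name are illustrative.

```python
import re
from collections import Counter

# Matches e.g. '2013-09-26 09:26:36: VC: 3:[vdm_002v] Too many access from CAVA server 10.0.0.5:'
PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): VC: \d+:\[(?P<vdm>[^\]]+)\] "
    r"Too many access from CAVA server (?P<ip>[^\s:]+)"
)


def count_thread_exhaustion(log_text):
    """Count 'Too many access' messages per (VDM, CAVA server IP)."""
    hits = Counter()
    for line in log_text.splitlines():
        match = PATTERN.match(line.strip())
        if match:
            hits[(match.group("vdm"), match.group("ip"))] += 1
    return hits
```

A high count for the IP address of the NSS server is consistent with the thread exhaustion described here, but it is not proof on its own.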

EMC's default maximum number of threads, in both EMC Celerra and EMC VNX OE for File environments, is 256 for systems with more than 1GB of memory. In a highly active environment this can become a bottleneck. It is possible to increase the number of threads by making alterations to a specific EMC parameter. Northern's experience shows that the resource exhaustion can be greatly mitigated (or in some cases even resolved) by increasing the number of maximum threads available.

IMPORTANT:

EMC customers should always consult EMC technical personnel for expert advice on the effect that changing this setting may have on the EMC Data Mover and the specific environment in question. This is an EMC setting within EMC technology; Northern provides this information to help customers investigate, together with EMC personnel, the most appropriate action to resolve resource exhaustion. Northern makes no claim as to the applicability of these settings in a specific customer's environment, and shall not be held responsible for any ill effect from the use of these settings.

How to increase the number of threads:

$ server_setup server_X -P cifs -o start=XXX

(where XXX sets the number of available CIFS threads; the default is 256)

The following is a more detailed explanation from EMC's document Configuring and Managing CIFS on VNX (P/N 300-013-429 Rev 02, page 65):

[Figure: Setting the number of CIFS threads]

Please note once again that EMC personnel must be consulted prior to changing this parameter!

Other considerations:

Error 1450 is directly linked to the amount of activity that NSS must monitor, which is a combination of system activity and the scope of the configured quota policies. If the number of available threads cannot be successfully increased, it may be possible to address these two factors instead: reduce the rate of activity on the device, or reduce the scope of the quota policies.

NSS subscribes to receive notification of file system changes. When a change notification is received NSS scans the individual folder where the change occurred in order to establish the new quota usage level. These operations (notification and scan) require system resources. As such it is always wise to review quota policies and ensure no unnecessary quotas are configured. Additionally, it may be possible to reduce the number of quotas configured, to prioritize specific file shares - avoiding high-level 'general monitoring' quotas (this monitoring can be achieved with NSS' reporting capabilities). Note that hard and soft quotas require the same level of access to CIFS threads in order to perform monitoring operations.

Northern has seen excessive load generated by the constant writing of temporary internet files to remote storage devices in Virtual Desktop environments. Non-business-related streaming media has been seen to generate huge amounts of traffic to remote Internet Explorer temporary file caches, tying up resources and destroying system performance. Reducing this kind of traffic is one possible way to avoid resource exhaustion.

For advanced troubleshooting, please contact the Technical Support team at Northern (support@northern.net).

ADDITIONAL RESOURCES

  • KB2884 How to: Configure EMC & NSS
  • KB1785 About: Handling of Empty CIFS Notifications in NSS
  • KB Article: 3035

    Updated: 4/21/2016

    • Category
      • Usage
    • Affected versions
      • NSS 9.0
      • NSS 9.5
      • NSS 9.6