Basic Troubleshooting Procedure for Time Synchronization Issues

In AppLogic 2.x and 3.0, the controller is the sole NTP server in the grid, and time sync flows as shown below. NTP on the controller and on the physical nodes handles the synchronization. Each physical node then passes the time to its appliance VMs through the Xen hypervisor.

controller => physical node => Xen hypervisor => appliance VM

From 3.1, the BFC takes over this role from the controller and becomes the NTP server for all grids it manages, so the flow changes as shown below. NTP on the BFC and on the physical nodes handles the synchronization. Each physical node then passes the time to the controller and the appliance VMs through the Xen hypervisor.

BFC => physical node => Xen hypervisor => controller and appliance VM

If an external NTP server is configured in the BFC GUI, the time sync flow looks like this:

external NTP server => BFC => physical node => Xen hypervisor => controller and appliance VM

Another major change from 3.1 is that all physical node clocks (both the system clock and the hardware clock) use UTC+0 instead of local time. Time synchronization from the BFC to the physical nodes is also based on UTC.
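
Because the nodes run on UTC from 3.1 while the BFC keeps local time, comparing the two clocks means converting the BFC reading to UTC first. The following is a minimal sketch of that arithmetic. The function name to_utc_hhmm and its whole-hour offset argument are illustrative assumptions, not part of AppLogic.

```shell
# Hypothetical helper: convert a local wall-clock time to UTC.
#   $1 = local time as HH:MM
#   $2 = zone offset from UTC in whole hours (10 for UTC+10, -5 for UTC-5)
to_utc_hhmm() {
    awk -v t="$1" -v off="$2" 'BEGIN {
        split(t, p, ":")
        # Work in minutes since midnight and wrap around the day boundary.
        total = (p[1] * 60 + p[2] - off * 60) % 1440
        if (total < 0) total += 1440
        printf "%02d:%02d\n", int(total / 60), total % 60
    }'
}
```

For example, "to_utc_hhmm 20:00 10" prints 10:00, so a BFC in a UTC+10 zone showing 20:00 agrees with a node showing 10:00 UTC.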

If you would like to know more about NTP, please refer to the following links:

http://en.wikipedia.org/wiki/Network_Time_Protocol

http://www.meinberg.de/english/info/ntp.htm

Checklist

Checklist for AppLogic 2.x and 3.0

  1. Is the controller time correctly synchronized?
  2. Is the physical node system time correctly synchronized?
  3. Is the appliance VM time zone correctly configured?

Checklist for AppLogic 3.1 and newer releases

  1. If an external NTP server is configured in the BFC GUI, is the BFC time correct?
  2. Is the physical node system time correctly synchronized?
  3. Is the physical node hardware clock correctly configured?
  4. Is the time zone of the affected appliance VM correctly configured?
  5. Is the affected appliance Windows or Linux, and does it run in HVM or PV mode?

Note: There is a known time sync issue in 3.1 caused by a Xen time drift bug: the physical node (dom0) has trouble passing time drift corrections to the hypervisor, so the controller and appliance VMs end up with incorrect time. The solution is to set an independent wall clock in the appliance VM and, in addition, to install and configure NTP there to sync time from either the BFC or an external NTP server.

How to verify that time sync with an external NTP server works properly

This section applies to the controller in AppLogic 3.0 and older releases, and to the BFC in 3.1 and newer releases if an external NTP server is configured.

Note: When configuring the NTP server in the BFC GUI (3.1 and newer releases), you may enter any valid and available external NTP server, but do NOT enter the BFC's own name or IP address.

The first step is to verify the NTP configuration.

  1. In /etc/ntp.conf, each entry with the keyword "server" gives the name or IP address of an external NTP server.
  2. Run "ntpq -p" to show the NTP peers of the local server. In the following sample, the external NTP servers are usilgr40.ca.com and Mustang.ca.com.
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *usilgr40.ca.com 141.202.0.2      4 u  995 1024  377    0.411    0.019   0.031
     Mustang.ca.com  141.202.0.25     5 u  708 1024  377   46.798   -0.100   0.051
    

    If there are multiple entries in the output of "ntpq -p", the entry that starts with * (asterisk) is the current (preferred) NTP source.

    Note: Please refer to the following document for more details on how to use ntpq to address connection issues with the NTP source:

    https://support.ca.com/irj/portal/kbtech?docid=573076&searchID=TEC573076

  3. Run "ntpq -c readvar" to show the external NTP server name/IP and status. Here is a sample of the output:
    assID=0 status=06f4 leap_none, sync_ntp, 15 events, event_peer/strat_chg,
    version="ntpd 4.2.2p1@1.1570-o Fri Nov 18 13:21:16 UTC 2011 (1)",
    processor="i686", system="Linux/2.6.18-238.el5PAE", leap=00, stratum=5,
    precision=-20, rootdelay=17.898, rootdispersion=75.212, peer=26686,
    refid=141.202.0.25,
    reftime=d3965450.304f487c  Wed, Jun 27 2012 23:56:00.188, poll=10,
    clock=d3965990.187e30f8  Thu, Jun 28 2012  0:18:24.095, state=4,
    offset=0.019, frequency=115.304, jitter=0.108, noise=0.634,
    stability=0.003, tai=0
    

    If the configuration is correct, the next step is to verify that time sync from the external server works properly. The recommended procedure consists of the following steps:

    1. service ntpd stop
    2. ntpdate -d <ntp server name/ip>
    3. service ntpd start
    4. date
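
The checks on "ntpq -p" and "ntpq -c readvar" output described above can be scripted. The following is a minimal sketch, assuming the standard "ntpq -p" column order shown in the sample (offset is the ninth column). The function names are illustrative and are not part of ntp or AppLogic.

```shell
# Print the remote name of the peer marked with a leading "*"
# (the currently selected source). Reads `ntpq -p` output on stdin.
# Exits non-zero if no peer is selected.
ntp_selected_peer() {
    awk '/^[ \t]*\*/ { sub(/^[ \t]*\*/, ""); print $1; found = 1; exit }
         END { exit !found }'
}

# Print the offset (milliseconds) of the selected peer.
ntp_selected_offset() {
    awk '/^[ \t]*\*/ { sub(/^[ \t]*\*/, ""); print $9; found = 1; exit }
         END { exit !found }'
}

# Print one variable from `ntpq -c readvar` output read on stdin.
#   $1 = variable name, e.g. stratum, refid, offset
# Values that themselves contain commas (such as reftime) are not handled.
ntp_readvar_field() {
    tr ',' '\n' | awk -v key="$1" '
        { sub(/^[ \t]+/, "") }
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}
```

Typical use would be "ntpq -p | ntp_selected_peer" and "ntpq -c readvar | ntp_readvar_field stratum". If ntp_selected_peer prints nothing, no source is currently selected.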

How to verify that time sync to the physical node works properly

If the physical node time differs from that of the sole NTP server in the grid (the controller in 3.0 and older releases, the BFC in 3.1 and newer releases), the following similar steps can be used for verification and troubleshooting.

  1. Check /etc/ntp.conf. In 3.0 and older releases, the NTP server should point to the controller; in 3.1 and newer releases, it should point to the BFC. For instance, the entry below in ntp.conf means that the NTP server is the controller (private IP).
    server <controller private ip>
    
  2. Run "ntpq -p" and "ntpq -c readvar" to verify the NTP configuration. Below is a sample of "ntpq -p" output from a 2.9 grid, where 192.168.6.254 is the controller IP. If there are multiple entries, the entry that starts with * (asterisk) is the current (preferred) NTP source; make sure it is the controller private IP in 3.0 and older releases, or the BFC private IP in 3.1 and newer releases.
         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    *192.168.6.254   LOCAL(0)        11 u   91 1024  377    0.204  467.992   0.764
     LOCAL(0)        .LOCL.          10 l   29   64  377    0.000    0.000   0.001
    
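  3. Sync the node system clock as follows. Note that ntpdate with -d runs in debug mode: it queries the server and reports the offset but does not set the clock, so omit -d if the clock actually needs to be stepped.
    1. service ntpd stop
    2. ntpdate -d <ntp server name/ip>
    3. service ntpd start
    4. date
  4. Sync the node hardware clock by running "hwclock --systohc". Running "hwclock" without parameters displays the current hardware clock time.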

    Note: From 3.1, both the system time and the hardware clock time of a physical node should be UTC+0, and there should be no significant gap between them. The BFC still uses local time. For instance, if the current time on the BFC is 20:00 (UTC+10 time zone) and the node time is 10:00 (UTC+0), the two are consistent.

    [root@ srv1 ~]# date
    Thu Jun 28 06:19:37 UTC 2012                          -> OS system time
    APPLOGIC RESTRICTED AREA
    [root@ srv1 ~]# hwclock
    Thu 28 Jun 2012 06:19:38 AM UTC  -0.549140 seconds    -> hardware clock
    

    The time zone of the physical node system is stored in /etc/localtime; it should either contain UTC zone data as below or be a link to /usr/share/zoneinfo/UTC.

    [root@ srv1 ~]# cat /etc/localtime
    TZif2UTCTZif2UTC
    UTC0
    

    The time zone of the hardware clock is stored in /etc/sysconfig/clock, as shown below.

    [root@ srv1 ~]# cat /etc/sysconfig/clock
    ZONE="UTC"
    UTC=true
    ARC=false
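
The configuration checks in this section (the "server" entries in ntp.conf and the UTC settings in /etc/sysconfig/clock) can be sketched as two small filters. The function names are illustrative assumptions. Both read file content on stdin so they can be tried on a copy of the file first.

```shell
# List the hosts named on active "server" lines of an ntp.conf
# read on stdin. Commented-out lines are ignored.
ntp_conf_servers() {
    awk '$1 == "server" { print $2 }'
}

# Succeed only if /etc/sysconfig/clock content read on stdin
# contains both ZONE="UTC" and UTC=true, as in the sample above.
clock_config_is_utc() {
    awk '/^ZONE="?UTC"?[ \t]*$/ { zone = 1 }
         /^UTC=true[ \t]*$/     { utc = 1 }
         END { exit !(zone && utc) }'
}
```

On a node, "ntp_conf_servers < /etc/ntp.conf" should list the controller private IP (3.0 and older) or the BFC private IP (3.1 and newer), and "clock_config_is_utc < /etc/sysconfig/clock && echo OK" should print OK on a 3.1 or newer node.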
    

How to check whether the appliance VM time is correct

Basically, if the physical node time is correct, the appliance VM time should be correct as well, as long as its time zone is set to the correct local time zone.

If the appliance VM time is incorrect, the following information may help.

  1. PV appliance VM has incorrect time in 3.1 (and ONLY 3.1)

    An appliance VM in AppLogic 3.1 may not obtain the correct time because of the Xen time drift bug, even though the BFC and the physical node have the correct time. This bug only affects PV appliances, not HVM appliances (Windows appliances/VDS always run in HVM mode).

    If the system is affected by this bug, the recommended workaround is to set an independent wall clock in the PV appliance VM using "echo 1 > /proc/sys/xen/independent_wallclock". For more details, please refer to the following document:

    http://docs.vmd.citrix.com/XenServer/4.0.1/guest/ch04s06.html

    In addition, it is strongly recommended in this scenario to install the ntp package in the PV appliance VM and configure either the BFC or an external NTP server as the time source.

    Note: Running "hwclock --systohc" to correct the physical node hardware clock and then rebooting the physical node passes the correct time to the appliance only temporarily, for a matter of hours; it is not a permanent solution.

  2. Windows appliance has incorrect time in 3.1 and newer releases

    From 3.1, the physical node system time and hardware clock are set to UTC+0 during installation.

    Microsoft Windows expects the real-time clock to be set to local time, not UTC. As a result, the time and date are not calculated correctly inside the Windows appliance for the various time zones.

    To fix this, a registry edit is needed to tell the system that the real-time clock is set to UTC.

    Navigate to HKLM\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\ and create or set the RealTimeIsUniversal value to a DWORD value of 1.

    After a reboot, the date and time are adjusted correctly for the configured time zone.

    Note: A Windows box that is not a domain member can sync time with an Internet time server. To configure this, click the clock, click "Change date and time settings..." at the bottom, go to the "Internet Time" tab, and enter the preferred time server name or IP address.

  3. Windows appliance in a domain has incorrect time

    If a Windows box has joined a domain, it syncs time from the domain controller by default. Make sure that the domain controller time is correct.

    You may refer to the following documents for details of how time is synchronized in a domain.

    http://blogs.msdn.com/b/w32time/archive/2007/07/07/welcome.aspx

    http://blogs.msdn.com/b/w32time/archive/2007/09/04/keeping-the-domain-on-time.aspx

    "w32tm /query [/peers | /status | /configuration]" displays time sync information on a Windows box. In the following sample, the time source is AUSYDC02.ca.com.

    C:\>w32tm /query /peers
    #Peers: 1

    Peer: AUSYDC02.ca.com
    State: Active
    Time Remaining: 282.9147723s
    Mode: 3 (Client)
    Stratum: 6 (secondary reference - syncd by (S)NTP)
    PeerPoll Interval: 13 (8192s)
    HostPoll Interval: 13 (8192s)
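
For the PV appliance workaround in item 1 above, the "echo 1" step can be wrapped so that it first checks whether the control file exists and whether the setting is already active. This is a minimal sketch. The function name is an illustrative assumption, and the optional path argument exists only so the logic can be tried against a scratch file.

```shell
# Hypothetical wrapper around the workaround from item 1.
#   $1 = control file (defaults to the path named in this article)
enable_independent_wallclock() {
    ctl="${1:-/proc/sys/xen/independent_wallclock}"
    if [ ! -e "$ctl" ]; then
        # Typically means the kernel is not a Xen PV guest kernel
        # that exposes this setting.
        echo "$ctl not found" >&2
        return 2
    fi
    if [ "$(cat "$ctl")" = "1" ]; then
        echo "already independent"
        return 0
    fi
    # Same command as in the article, applied to the chosen file.
    echo 1 > "$ctl" && echo "enabled"
}
```

A value written through /proc does not survive a reboot. The usual way to persist a sysctl is an entry in /etc/sysctl.conf (here that would be xen.independent_wallclock = 1), which should be verified against the appliance's kernel before relying on it.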

Comment

Known time drift bugs

SCR 7145 (AppLogic 3.1) - fixed in hf7728

SCR 7214 (AppLogic 3.5) - fixed in 3.5.12