We installed a new server today (Linux, latest version of eDir) and tried to join it to an existing tree. The replica on the new server has been in the "New" state for almost 6 hours, and checking the synchronization status shows an error on at least one of the existing servers. It looks like we're stuck in an error state and need to take some action to get a replica onto the new server, but we're not sure what is causing the error or how to correct it.
The existing tree has 5 servers, all running NW 6.5. (This new Linux server is the first step in our migration from NW to Linux.) Timesync is running properly across all the servers, including the new one. Network links between all the servers are up and error free. The servers can all ping each other, and no communication errors are being reported on any of the NW consoles.
Using dsrepair, the sync status on all NW servers is similar to this:
Partition: .[Root].
Replica: .us-predr01.Operations.WS ********** ******** -603
Replica: .PRINDS03.Operations.WS 3-07-2012 00:08:05
Replica: .PRINDS02.Operations.WS 3-06-2012 19:20:35
Server: CN=us-predr01.OU=Operations.O=... 3-06-2012 19:20:41 -625 Remote
Object: CN=santos\.maria.OU=Users.OU=WebService.O=WS
Replica: .PRINDS01.Operations.ND 3-07-2012 00:08:10
Replica: .SECNDS03.Operations.ND 3-07-2012 00:08:01
I understand the -625 error to be a transport error, but I don't know why it always reports the same object, and I can't find any communication-related problem that would cause it.
The 603 error is a missing attribute error, I think, but I can't find any other information as to where that error is coming from either.
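In case it helps anyone compare reports across servers, here is a minimal sketch in Python that pulls the failing entries out of a saved sync status report. It assumes the report lines look like the excerpt above; the function name `find_sync_errors` and the small code table are illustrative only, and the meanings in the table are just the ones described in this post, not an official list.

```python
import re

# Meanings as understood in this post; a convenience table, not an official list.
KNOWN_CODES = {
    -603: "no such attribute",
    -625: "transport failure",
    -698: "replica in skulk",
}

# Matches "Replica:" or "Server:" lines that end in a negative error code,
# optionally followed by one trailing word such as "Remote".
LINE_RE = re.compile(r"^\s*(Replica|Server):\s+(\S+).*?\s(-\d+)(?:\s+\w+)?\s*$")


def find_sync_errors(report_text):
    """Return (kind, name, code, meaning) for each report line carrying an error code."""
    results = []
    for line in report_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            kind, name = match.group(1), match.group(2)
            code = int(match.group(3))
            results.append((kind, name, code, KNOWN_CODES.get(code, "unknown")))
    return results


if __name__ == "__main__":
    sample = """Partition: .[Root].
Replica: .us-predr01.Operations.WS ********** ******** -603
Replica: .PRINDS03.Operations.WS 3-07-2012 00:08:05
Server: CN=us-predr01.OU=Operations.O=... 3-06-2012 19:20:41 -625 Remote
"""
    for entry in find_sync_errors(sample):
        print(entry)
```

Lines with only a timestamp (a successful sync) are skipped, so what is left is the list of replicas or servers that reported a code.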
I've been watching all the replication traffic with DSTrace. On the Master replica (PRINDS02), the only line that mentions the new server is this:
Rescheduling sync (replica on .us-predr01.Operations.WS in state: New replica)
The other servers all attempt to start an outbound sync, but report this:
Error _StartUpdateReplica to .us-predr01.Operations.WS, failed, replica in skulk. (-698)
Does anyone have any suggestions as to where we might look next to locate the error and get this new server replicating with the rest of the tree? We're hoping to get this resolved before business hours tomorrow.
Thanks much,
Steve