This repository was archived by the owner on Feb 18, 2025. It is now read-only.

Hook for take-master / GracefulIntermediateMasterTakeover #799

@daniel-2647

Hello Shlomi, we have the following test topology and db hosts:

1 MASTER --> shadowmaster (has all schemas)
3 SLAVES --> sm hosts (have only certain schemas being replicated, but they all have the same filter)

Initial topology:

shadowmaster:3306          [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-ohq-applogdb-1:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
  + sm-atl-applogdb-3:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]

We are using AutoPseudoGTID and everything is working as expected.

Here's a scenario I'm trying to make work, but so far have not been able to:

We would like to be able to drag/drop (promote) sm-atl-applogdb-3 so it becomes master of both sm-ohq-applogdb-1 and sm-ohq-applogdb-2, and have sm-atl-applogdb-3 replicate from shadowmaster, as shown below:

shadowmaster:3306          [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-atl-applogdb-3:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
  + sm-ohq-applogdb-1:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]

Unfortunately, this does not happen. We end up with the following instead (note that sm-ohq-applogdb-2 remains a slave of the old master, sm-ohq-applogdb-1):

shadowmaster:3306            [unknown,invalid,Unknown,rw,nobinlog,downtimed]
+ sm-atl-applogdb-3:3306     [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]
  + sm-ohq-applogdb-1:3306   [0s,ok,10.3.12-MariaDB-log,rw,MIXED,>>]
    + sm-ohq-applogdb-2:3306 [0s,ok,10.3.12-MariaDB-log,ro,MIXED,>>]

I attempted to use a hook, as I thought this would fall under PostIntermediateMasterFailoverProcesses. I created a hook that would move all slaves of the old intermediate master (in this case sm-ohq-applogdb-1) under the new master (sm-atl-applogdb-3), but it never got called.
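For reference, here is a rough Python sketch of the two steps the hook was meant to chain together. It only builds the API paths and does not contact orchestrator. The take-master path matches the call that appears in our orchestrator log; the relocate-replicas path, the function name takeover_api_paths, and the default port are my assumptions and should be checked against the orchestrator version in use.

```python
from urllib.parse import quote


def takeover_api_paths(new_master, old_master, port=3306):
    """Return the orchestrator API paths, in order, for the desired takeover.

    Hypothetical helper: the relocate-replicas path is an assumption and
    should be verified against the installed orchestrator version.
    """
    new_host = quote(new_master, safe="")
    old_host = quote(old_master, safe="")
    return [
        # Step 1: the new master takes the place of its current master.
        f"/api/take-master/{new_host}/{port}",
        # Step 2: move the replicas left under the old master below the new one.
        f"/api/relocate-replicas/{old_host}/{port}/{new_host}/{port}",
    ]


if __name__ == "__main__":
    for path in takeover_api_paths("sm-atl-applogdb-3", "sm-ohq-applogdb-1"):
        print(path)
```

Issuing the second call by hand after take-master should produce the topology we want; the point of this request is to have orchestrator trigger that step through a hook.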

While troubleshooting why the PostIntermediateMasterFailoverProcesses hook was not being called, I noticed that it never gets triggered. This may be because the operation is handled by the take-master call rather than as a graceful intermediate master promotion.

Here are the logs:

2019-02-08 17:05:18 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
[martini] Started GET /api/take-master/sm-atl-applogdb-3/3306 for 69.41.14.254:15162
2019-02-08 17:05:22 DEBUG TakeMaster: will attempt making sm-atl-applogdb-3:3306 take its master sm-ohq-applogdb-1:3306, now resolved as sm-ohq-applogdb-1:3306
2019-02-08 17:05:22 INFO Stopped replication on sm-ohq-applogdb-1:3306, Self:mysql-bin-sm-ohq-applogdb-1.000057:169115058, Exec:shadowmaster.027720:455315683
2019-02-08 17:05:23 DEBUG analysis: IsMaster: true, LastCheckValid: false, LastCheckPartialSuccess: true, CountReplicas: 1, CountValidReplicatingReplicas: 0, CountLaggingReplicas: 0, CountDelayedReplicas: 0,
2019-02-08 17:05:23 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
2019-02-08 17:05:23 DEBUG orchestrator/raft: applying command 1863: request-health-report
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-ohq-proxysql-1/sm-ohq-proxysql-1 for 10.0.84.117:50200
[martini] Completed 200 OK in 582.534µs
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-ohq-proxysql-2/sm-ohq-proxysql-2 for 10.0.84.118:10344
[martini] Completed 200 OK in 580.334µs
[martini] Started GET /api/raft-follower-health-report/4c36b85e/sm-atl-proxysql-3/sm-atl-proxysql-3 for 10.5.4.171:47266
[martini] Completed 200 OK in 566.785µs
2019-02-08 17:05:24 INFO Stopped replication on sm-atl-applogdb-3:3306, Self:mysql-bin-sm-atl-applogdb-3.000057:169115074, Exec:mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:24 INFO Will start replication on sm-atl-applogdb-3:3306 until coordinates: mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:26 INFO Stopped replication on sm-atl-applogdb-3:3306, Self:mysql-bin-sm-atl-applogdb-3.000057:169115074, Exec:mysql-bin-sm-ohq-applogdb-1.000057:169115058
2019-02-08 17:05:26 DEBUG ChangeMasterTo: will attempt changing master on sm-atl-applogdb-3:3306 to shadowmaster:3306, shadowmaster.027720:455315683
2019-02-08 17:05:26 INFO ChangeMasterTo: Changed master on sm-atl-applogdb-3:3306 to: shadowmaster:3306, shadowmaster.027720:455315683. GTID: false
2019-02-08 17:05:26 DEBUG ChangeMasterTo: will attempt changing master on sm-ohq-applogdb-1:3306 to sm-atl-applogdb-3:3306, mysql-bin-sm-atl-applogdb-3.000057:169115074
2019-02-08 17:05:26 INFO ChangeMasterTo: Changed master on sm-ohq-applogdb-1:3306 to: sm-atl-applogdb-3:3306, mysql-bin-sm-atl-applogdb-3.000057:169115074. GTID: false
2019-02-08 17:05:27 WARNING executeCheckAndRecoverFunction: ignoring analysisEntry that has no action plan: AllIntermediateMasterSlavesNotReplicating; key: sm-atl-applogdb-3:3306
2019-02-08 17:05:27 INFO Started replication on sm-atl-applogdb-3:3306
2019-02-08 17:05:28 DEBUG raft leader is 10.0.84.117:10008 (this host); state: Leader
2019-02-08 17:05:28 INFO Started replication on sm-ohq-applogdb-1:3306
2019-02-08 17:05:29 INFO auditType:take-master instance:sm-atl-applogdb-3:3306 cluster:shadowmaster:3306 message:took master: sm-ohq-applogdb-1:3306

Would it be possible to create a GracefulIntermediateMasterTakeover hook or a hook for the take-master call above?
Thanks for your time. Please let me know if you have any questions, and I can explain further if needed.
