
Conversation


@ghost ghost commented Jan 15, 2018

This pull request relates to issue #330.
The ideas behind the 2 new parameters:

  1. Implemented the "AdditionalSyncStandbyNames" parameter;
    • configures an additional value for the "synchronous_standby_names" parameter
  2. Implemented the "AdditionalReplicationSlotName" parameter;
    • provides an additional replication slot with the specified configurable name

The additional synchronous name will be used in a 2-node synchronous stolon cluster. If the master or the standby fails, postgres will start using the additional synchronous name for synchronous replication, and therefore the stolon cluster (now consisting of the single surviving node) will continue to operate.
As the value of the "AdditionalSyncStandbyNames" parameter we want to use PgBarman's application name, and since PgBarman requires a dedicated replication slot we have also implemented the "AdditionalReplicationSlotName" parameter.
From the consistency point of view the scheme seems to be OK, because at the moment of a failure PgBarman will be engaged as the synchronous destination and the data will be preserved by the additional synchronous standby (e.g. PgBarman).
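
To make the intent concrete, here is a minimal sketch (not the PR's actual code) of the value we expect the master's "synchronous_standby_names" to end up with; the keeper UID and the "barman" application name below are made-up examples:

package main

import (
	"fmt"
	"strings"
)

// syncStandbyNames appends the configured additional names to the stolon-managed
// standby names and joins them into a synchronous_standby_names value.
func syncStandbyNames(stolonStandbys, additional []string) string {
	names := append(append([]string{}, stolonStandbys...), additional...)
	return strings.Join(names, ",")
}

func main() {
	// With one stolon standby and PgBarman as the additional name this prints
	// "stolon_c874a50d,barman"; how the leading "N (...)" quorum count should be
	// computed is discussed later in this thread.
	fmt.Println(syncStandbyNames([]string{"stolon_c874a50d"}, []string{"barman"}))
}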

This pull request continues the discussion in #330.

@Vadim0908

Vadim0908 commented Jan 15, 2018

This patch allows us to use additional replication slots for backup (RPO=0). For example, we can implement two options for working with Barman.

Classic Stolon Architecture and Barman Backup

[Diagram: classic stolon cluster (3 nodes) with Barman connected via streaming replication]

Barman offers 2 options for backup and archiving. Traditionally, Barman has supported standard WAL file shipping through PostgreSQL’s archive_command (usually via rsync/SSH). With this method, WAL files are archived only when PostgreSQL switches to a new WAL file. To keep it simple, this normally happens every 16MB worth of data changes.
Barman 1.6.0 introduces streaming of WAL files for PostgreSQL servers 9.2 or higher, as an additional method for transactional log archiving, through pg_receivexlog. WAL streaming is able to reduce the risk of data loss, bringing RPO down to near zero values.
Barman 2.0 introduces support for replication slots with PostgreSQL servers 9.4 or above, therefore allowing WAL streaming-only configurations. Moreover, you can now add Barman as a synchronous WAL receiver in your PostgreSQL 9.5 (or higher) cluster, and achieve zero data loss (RPO=0).
Thus, the traditional archiving of WAL files can lead to data loss in the event of an accident. Therefore, we believe that we need to use both approaches for archiving WAL files.

Synchronous WAL streaming

This feature is available only from PostgreSQL 9.5 and above. Barman can also reduce the Recovery Point Objective to zero, by collecting the transaction WAL files like a synchronous standby server would.

Stolon Architecture with 2 Nodes and Barman as the 3rd Node

[Diagram: stolon cluster (2 nodes) with Barman as the 3rd node connected via streaming replication]

@sgotti sgotti left a comment (Member)

@saranhada @Vadim0908 thanks for your PR and detailed explanation. Comments inline. There's also the need for some integration tests.

But I don't consider this a fix for #330:
This PR will just add additional sync standbys but, if there are fewer good internal standbys than minSynchronousStandbys, then a fakesyncstandby will be added anyway. And I won't change this behavior, because if you define minSynchronousStandbys stolon must ensure that there are at least minSynchronousStandbys synchronous standbys and these standbys must be controlled by stolon (not external).

The fix for #330 will be the ability to not make stolon add fakesyncstandbys when there are fewer than minSynchronousStandbys good keepers. I haven't tested this and it should be discussed further, but probably just setting minSynchronousStandbys to 0 will make this happen without any code change (or, if not, a different PR from this one should be submitted to discuss it).

log.Errorw("failed to create replication slot", "slotName", backupSlotName, zap.Error(err))
return err
}

Member

won't this drop and recreate the replication slot at every loop?

// TODO(sgotti) Find a way to detect the pg major version also
// when the instance is empty. Parsing the postgres --version
// is not always reliable (for example we can get `10devel` for
// development version).
// Now this will lead to problems starting the instance (or
// warning reloading the parameters) with postgresql < 9.6
if len(synchronousStandbys) > 1 {
parameters["synchronous_standby_names"] = fmt.Sprintf("%d (%s)", len(synchronousStandbys), strings.Join(synchronousStandbys, ","))
if syncStandbyNumber > 1 {
Member

Probably here there's some logic that needs to be detailed in a comment. If I got it right you want to define the additionalsyncstandbynames only if there's at least one defined synchronous standby?

I won't do all of this here. This is something that should be handled by the sentinel. So, inside the sentinel I'll just add the additional synchronous standby in db.Spec.SynchronousStandby.

@@ -24,7 +24,9 @@ Some options in a running cluster specification can be changed to update the des
| synchronousReplication | use synchronous replication between the master and its standbys | no | bool | false |
| minSynchronousStandbys | minimum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| maxSynchronousStandbys | maximum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| additionalSyncStandbyNames | additional names for possible standbys not included in the stolon cluster, but used as replication destinations | no | string | null |
Member

Either make this an []string, or you'll have to explicitly define that the values should be comma separated and parse/validate them.

| additionalWalSenders | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart) | no | uint16 | 5 |
| additionalReplicationSlotName | name of the additional replication slot which is always created by the master postgres instance | no | string | null |
Member

Couldn't this be a []string as above, so users can define multiple additional replication slots?

}
if s.AdditionalReplicationSlotName == nil {
s.AdditionalReplicationSlotName = StringP(DefaultReplicationSlotName)
}
Member

After making them a []string the values should be validated to check that they're valid sync standby names and replication slot names.

Author

AdditionalReplicationSlotName is the name of a single replication slot, not an array.

Member

Yes, I was asking to make it an array so multiple additional replication slots can be defined, as done for additionalSyncStandbyNames.

Author

Simon, for our task we need one specific replication slot. But if you insist we can extend the basic idea. Do you?

@sgotti sgotti (Member) Jan 19, 2018

@saranhada yes, since we are providing a new option to define an additional replication slot, let's make it useful for everyone by giving the ability to define N additional replication slots if needed.

@@ -360,6 +360,8 @@ func (s *Sentinel) setDBSpecFromClusterSpec(cd *cluster.ClusterData) {
db.Spec.FollowConfig.StandbySettings = clusterSpec.StandbySettings
}
db.Spec.AdditionalWalSenders = *clusterSpec.AdditionalWalSenders
db.Spec.AdditionalSyncStandbyNames = *clusterSpec.AdditionalSyncStandbyNames
Member

I'll remove this and just add the additional synchronous standbys in db.Spec.SynchronousStandby.

@@ -530,6 +586,9 @@ type DBSpec struct {
// AdditionalWalSenders defines the number of additional wal_senders in
// addition to the ones internally defined by stolon
AdditionalWalSenders uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames string `json:"additionalSyncStandbyNames"`
Member

Make this an []string.

// AdditionalWalSenders defines the number of additional wal_senders in
// addition to the ones internally defined by stolon
AdditionalWalSenders *uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames *string `json:"additionalSyncStandbyNames"`
Member

Make this an []string.

// AdditionalWalSenders defines the number of additional wal_senders in
// addition to the ones internally defined by stolon
AdditionalWalSenders *uint16 `json:"additionalWalSenders"`

AdditionalSyncStandbyNames *string `json:"additionalSyncStandbyNames"`
AdditionalReplicationSlotNames *string `json:"additionalReplicationSlotNames"`
Member

Make this an []string.

synchronousStandbys = append(synchronousStandbys, syncStandByName)
}
}

@sgotti sgotti (Member) Jan 24, 2018

I honestly can't understand the logic here. As I tried to explain in my previous comment: https://github.com/sorintlab/stolon/pull/410/files#r161692823 just drop this and let the sentinel add the AdditionalSyncStandbyNames to the db.Spec.SynchronousStandbys.

@@ -360,6 +360,8 @@ func (s *Sentinel) setDBSpecFromClusterSpec(cd *cluster.ClusterData) {
db.Spec.FollowConfig.StandbySettings = clusterSpec.StandbySettings
}
db.Spec.AdditionalWalSenders = *clusterSpec.AdditionalWalSenders
db.Spec.AdditionalSyncStandbyNames = clusterSpec.AdditionalSyncStandbyNames
Member

Don't define db.Spec.AdditionalSyncStandbyNames but just add the additional synchronous standbys to db.Spec.SynchronousStandby.

Author

@sgotti We want these additional standbys not to be taken into account here, at keeper.go:303:

		parameters["synchronous_standby_names"] = fmt.Sprintf("%d (%s)", syncStandbyNumber, strings.Join(synchronousStandbys, ","))
	} else {
		parameters["synchronous_standby_names"] = strings.Join(synchronousStandbys, ",")
	}

Member

@saranhada I don't understand why. As the name states they are AdditionalSyncStandbyNames so they should be added to the list of "synchronous_standby_names" no?

Member

@saranhada If you want them only as a "fallback" standby when one or more keeper standbys aren't working, then that's another story and it should be handled differently, since this change will break the current consistency logic that uses a fake standby (this change will skip it).

As I explained previously, these are 2 distinct changes. One is to set additional sync standbys (not replacement standbys) that will be counted in the number of required sync standbys. Another is to disable sync repl when you don't have enough standbys.

Author

@sgotti if I add the additional standby names into db.Spec.SynchronousStandby then the quoted piece of code will generate something like (for instance) "2 (stolon_fdg4gs, add_name)", and this is what I want to avoid. We want to implement additional synchronous standbys for the needs of backups, as shown on the schema (above). It is supposed that PgBarman will serve the required needs. Thus we add PgBarman to a 2-node stolon cluster with sync replication enabled. The moment the master fails, the standby keeper will be promoted to the new master with sync replication still enabled. The single stolon node will still operate and PgBarman will be used as the sync standby destination.

Later, when a new 2nd stolon node is restored/created, the cluster will switch to using it, while PgBarman will keep serving the needs of synchronous backup.

@sgotti sgotti (Member) Jan 24, 2018

Answered here: #410 (comment)

@sgotti
Member

sgotti commented Jan 24, 2018

@saranhada Looks like you have two requirements:

a. You don't want the primary to block when there are fewer than MinSynchronousStandbys working standbys AND you don't want to just fall back to async repl; instead you want to send wal records to an external standby that will become synchronous ONLY when the above condition happens.

Note that this isn't #330 (deactivate synchronous replication for cluster consisting of 2 nodes in case the slave fails).

and

b. You want an external synchronous standby for backups with RPO = 0, but only when there are fewer than MinSynchronousStandbys working standbys.

But I'm quite sure they can be summed up in a single requirement:

  • Add an option to define N fallback synchronous standbys that will be added only when there are fewer than MinSynchronousStandbys working standbys.

The main issue with your implementation is that you can't control all of this using the FIRST method of the synchronous_standby_names parameter, because it acts outside stolon's control (switching sync standbys without the sentinel knowing about it). Let me explain:

This PR will define for the primary instance something like this: 1 (stolon_fdg4gs, add_name).

This means that if the standby keeper stolon_fdg4gs fails, the master instead of blocking will continue committing transactions since it can send data to pgBarman (or any other external pg instance defined by add_name).

If, in a small time window, a standby dies, then the primary dies, and the failed standby comes back before the sentinel notices this, it may happen that the standby is elected as master even though it's not in sync with the dead primary, causing the loss of all the changes contained in the missed wal records.

That's the reason why we are forced to have the FIRST value equal to the total number of sync standbys. I'll open a PR to document this better.

That's the reason why I was asking to just add the AdditionalSyncStandbys to db.Spec.SynchronousStandbys. But obviously this will block your primary if just one of the external sync standbys is down, and you don't want this.
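
A minimal sketch of the two formats under discussion (the keeper UID and add_name are the examples from above, not real identifiers). With FIRST = 1 postgres considers a commit safe once any one of the listed standbys acknowledges it, which is exactly what makes the scenario above possible:

package main

import (
	"fmt"
	"strings"
)

// format builds a synchronous_standby_names value using the FIRST syntax.
func format(first int, names []string) string {
	return fmt.Sprintf("%d (%s)", first, strings.Join(names, ","))
}

func main() {
	names := []string{"stolon_fdg4gs", "add_name"}

	// What this PR generates: only one acknowledgement is required, so the
	// primary keeps committing even when the stolon standby is gone.
	fmt.Println(format(1, names)) // 1 (stolon_fdg4gs,add_name)

	// What stolon's consistency model requires: every listed standby must
	// acknowledge, so FIRST equals the total number of sync standbys.
	fmt.Println(format(len(names), names)) // 2 (stolon_fdg4gs,add_name)
}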

Proposed implementation

So the starting point is that it must be the sentinel that controls all of this. And let's change the option name to something clearer, like ExternalFallbackSyncStandbys.

A possible solution is to handle all of this in the sentinel (like we already do when adding a fakestandby when there aren't enough keepers, to force the primary to block): if the sentinel detects that there are fewer good standbys than MinSynchronousStandbys, then it adds one or more ExternalFallbackSyncStandbys to reach MinSynchronousStandbys.

So, in a two-node cluster with MinSynchronousStandbys = 1, if the standby dies the sentinel will notice this, remove it from db.Spec.SynchronousStandbys and add the first entry in ExternalFallbackSyncStandbys. In this way the primary will unblock. The above problem won't happen because the external standby will be added by the sentinel only if it detects that the internal standby is dead. If the dead standby comes back it'll be re-added to the list of sync standbys and the external standby will be removed.

Of course the switch isn't instantaneous, as it would be if it were managed by the postgres instance, but it will permit the cluster to unblock after some seconds/minutes.
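
A hedged sketch of that sentinel-side selection (function and variable names here are illustrative, not the actual stolon identifiers):

package main

import "fmt"

// chooseSyncStandbys picks which standbys to mark as synchronous. Healthy
// internal keepers come first; when fewer than minSync of them are available,
// external fallback names fill the gap so the primary can unblock.
func chooseSyncStandbys(goodStandbys, externalFallbacks []string, minSync int) (internal, external []string) {
	internal = append(internal, goodStandbys...)
	for i := 0; len(internal)+len(external) < minSync && i < len(externalFallbacks); i++ {
		external = append(external, externalFallbacks[i])
	}
	return internal, external
}

func main() {
	// Healthy keeper standby present: the external fallback stays out of the list.
	fmt.Println(chooseSyncStandbys([]string{"keeper1"}, []string{"barman"}, 1)) // [keeper1] []
	// Keeper standby dead: the sentinel swaps in the external fallback instead.
	fmt.Println(chooseSyncStandbys(nil, []string{"barman"}, 1)) // [] [barman]
}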

@ghost
Author

ghost commented Jan 26, 2018

@sgotti I agree, the name ExternalFallbackSyncStandbys sounds good. It really reflects the idea of the backup synchronous standbys.

Also, I have reviewed the stolon source code again; of course I'm not a deep expert in it.

You proposed to migrate the logic into the sentinel. I think I see a piece of code that can be responsible for creating the required configuration. It seems to start here, at sentinel.go:952 =>

if !masterOK {
log.Infow("trying to find a new master to replace failed master")
.....
newMasterDB.Spec.SynchronousStandbys = []string{fakeStandbyName}

I have the following proposal:

  • If the ExternalFallbackSyncStandbys parameter is not empty then just take its values and put them into the "newMasterDB.Spec.SynchronousStandbys" list.

What do you think?

@sgotti
Member

sgotti commented Jan 26, 2018

@saranhada Without having tested it, a starting point (to see if it works or whether there are some corner cases) could be:

  • Before the existing comment

    // If there're not enough real synchronous standbys add a fake synchronous standby because we have to be strict and make the master block transactions until MaxSynchronousStandbys real standbys are available

    add a check if len(synchronousStandbys) < int(*clusterSpec.MinSynchronousStandbys) { that will add N ExternalFallbackSyncStandbys to reach MinSynchronousStandbys.

  • Validate in clusterspec.Validate() that the provided ExternalFallbackSyncStandbys are not stolon names (use IsStolonName in

    func IsStolonName(name string) bool {
    ) so they don't clash with internal stolon names (see the sketch after this list).

  • I'll add some tests to check that all of this works, both in sentinel/sentinel_test.go and as integration tests in tests/integration.
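
A hedged sketch of that validation (assumes "fmt" and the stolon "common" package are imported; only common.IsStolonName is an existing stolon helper, the rest is illustrative):

// validateExternalFallbackSyncStandbys rejects external fallback standby names
// that would collide with stolon's internal naming scheme.
func validateExternalFallbackSyncStandbys(names []string) error {
	for _, name := range names {
		if common.IsStolonName(name) {
			return fmt.Errorf("external fallback sync standby name %q clashes with internal stolon names", name)
		}
	}
	return nil
}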

@ghost
Author

ghost commented Jan 27, 2018

@sgotti
Your suggestion:
add a check if len(synchronousStandbys) < int(*clusterSpec.MinSynchronousStandbys) { that will add N ExternalFallbackSyncStandbys to reach MinSynchronousStandbys
will not work.

Please see the code in keeper.go:275:

// Setup synchronous replication
if db.Spec.SynchronousReplication && len(db.Spec.SynchronousStandbys) > 0 {
synchronousStandbys := []string{}
for _, synchronousStandby := range db.Spec.SynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, common.StolonName(synchronousStandby))
}

Every sync name which we add in the sentinel will be prefixed with "stolon_" by the keeper.

I think we should implement the logic you mentioned (adding the missing number of sync standby names) in the keeper, taking into consideration MinSynchronousStandbys & MaxSynchronousStandbys.

@sgotti
Member

sgotti commented Jan 29, 2018

Every sync name which we add in the sentinel will be prefixed with "stolon_" by the keeper.

So let's remove this from the keeper and make the sentinel generate a StolonName. The keeper will use the provided synchronous standby names exactly as given in the db spec, without any change.

Also remove all the prefix checks and prefix stripping in the keeper:

if !common.IsStolonName(n) {

so that it reports them as-is (so the external standbys are also reported).

Perhaps there's something else to change in the sentinel. Let's see after these changes.

I think we should implement the logic you mentioned (adding the missing number of sync standby names) in the keeper, taking into consideration MinSynchronousStandbys & MaxSynchronousStandbys.

The keeper just tries to converge to the clusterdata db spec and for this reason must not implement any cluster logic. The clusterdata db spec is calculated by the sentinel so all of this logic pertains to the sentinel.
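
A hedged sketch of what that split could look like (assumes "strings" and the stolon "common" package are imported; only common.StolonName is an existing stolon helper): the sentinel prefixes internal keeper UIDs itself and appends the external names untouched, and the keeper just joins whatever list it finds in the db spec:

// Sentinel side: build the full synchronous standby list in the db spec.
func sentinelSyncStandbys(internalUIDs, external []string) []string {
	names := []string{}
	for _, uid := range internalUIDs {
		names = append(names, common.StolonName(uid))
	}
	return append(names, external...)
}

// Keeper side: no prefixing or filtering, just join what the sentinel provided.
func keeperSyncStandbyNames(specStandbys []string) string {
	return strings.Join(specStandbys, ",")
}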

@sgotti
Member

sgotti commented Jan 31, 2018

@saranhada I rebased your branch on master and changed some parts to make it work. You can find it here (just one commit): https://github.com/sgotti/stolon/tree/saranhada-ADD_PARAMS_SYNC

  • In the end, instead of keeping both internal and external standbys in db.Spec.SynchronousStandbys, I preferred to add a db.Spec.ExternalSynchronousStandbys since these are two different things. See below.
  • Added some comments to clarify that db.Spec.SynchronousStandbys is a list of internal db UIDs (not generic standby names) while db.Spec.ExternalSynchronousStandbys is a list of external standby names.
  • For the above reason the fakestandbyname will now be added to db.Spec.ExternalSynchronousStandbys.

Please take a look.

You can just take my commit and put it in your branch and force push to update this PR (no need to open a new PR).

@marct83

marct83 commented Jan 31, 2018

This is completely unrelated to this issue... but does Barman work with Stolon, and is the configuration any different from a regular Barman Postgres setup?

@sgotti
Member

sgotti commented Jan 31, 2018

This is completely unrelated to this issue... but does Barman work with Stolon, and is the configuration any different from a regular Barman Postgres setup?

@marct83 can you please ask this question on the gitter channel so more people can help you and we don't pollute this PR? Thanks.

@marct83

marct83 commented Jan 31, 2018

No problem. Delete my comments. Thanks

@sgotti
Member

sgotti commented Feb 5, 2018

@saranhada I saw you imported my commit but also added another commit (2f25a26) that adds unneeded things (ExternalFallbackSyncStandbys is not used and not needed).

You said on gitter that you completed manual tests, but before this PR can be merged we need some automatic tests. We need some unit tests in cmd/sentinel/sentinel.go and an integration test in tests/integration/ha_test.go. For this last one, a possible implementation idea is to use the standby cluster feature to set up a standby cluster as a synchronous standby (it'll be the same as your barman instance but we can reuse the code used in other tests) and do all the things you probably did manually.

If you have some problems adding them I can do it for you in the next days when I have some time.

@ghost
Author

ghost commented Feb 7, 2018

@sgotti sorry for the late response.
Can you please confirm that the code as it stands is OK and that I should now create the automatic tests?
If not, then I'd prefer to finish the code/logic first and create the tests after that. I absolutely agree that I have to create the tests as well.

}

if len(s.AdditionalReplicationSlotNames) > 0 {
rp := regexp.MustCompile("[_0-9a-z]+")
Member

symbolsNameFilter := regexp.MustCompile("[_0-9]")
if len(s.ExternalFallbackSyncStandbys) > 0 {
correctSymbolsNameFilter := regexp.MustCompile("[_0-9A-Za-z]")
for _, standbyName := range s.ExternalFallbackSyncStandbys {
Member

Can you put this in its own function and add some unit tests to it?

@sgotti
Member

sgotti commented Feb 8, 2018

@saranhada The logic related to the FallbackExternalSyncStandbys looks good, but I'm quite sure there'll be bugs that we can only catch if we write tests covering the changes. As I explained before, I don't understand why you added commit 2f25a26.

Additionally:

  • The part related to creating additional replication slots is a different feature from the FallbackExternalSyncStandbys and should be split into its own PR. I haven't reviewed it because it's difficult to see whether it does what it's meant to do without some tests. Can you open a new PR for it so I can review it separately from this one?

  • Added some comments about the validation of the new fields.

@ghost
Author

ghost commented Feb 8, 2018

@sgotti do you suggest splitting the PR into the following 2?

  1. Additional replication slots
  2. Fallback sync standbys

Regarding option 2, I understand that integration tests are required. But how about option 1? Do you require them there too? I think unit tests will be enough.

@sgotti
Member

sgotti commented Feb 8, 2018

@sgotti do you suggest splitting the PR into the following 2?

  1. Additional replication slots
  2. Fallback sync standbys

yes.

Regarding option 2, I understand that integration tests are required. But how about option 1? Do you require them there too? I think unit tests will be enough.

Right now updateReplSlots is covered by the current integration tests. In its current shape we cannot cover it with just a unit test, since testing it requires a running keeper and postgres instance. To make it unit-testable you should split it into two parts: one that does the get and the drop/create on the postgres instance, and a function that, given the current replication slots and the desired slots, returns a list of slots to drop and a list of slots to create (so a unit test can be written for that function).

But I'd say to just start by splitting it into its own PR, then let's see which kind of tests are needed. I could write something myself to show you how this test could be done.
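
A hedged sketch of the pure function described above (names are illustrative, not the actual stolon code); given the replication slots currently present on the instance and the desired slots from the cluster spec, it returns what to drop and what to create, so it can be unit-tested without a running postgres instance:

// diffReplSlots computes the replication slots to drop and to create.
func diffReplSlots(current, desired []string) (toDrop, toCreate []string) {
	currentSet := map[string]bool{}
	for _, s := range current {
		currentSet[s] = true
	}
	desiredSet := map[string]bool{}
	for _, s := range desired {
		desiredSet[s] = true
		if !currentSet[s] {
			toCreate = append(toCreate, s)
		}
	}
	for _, s := range current {
		if !desiredSet[s] {
			toDrop = append(toDrop, s)
		}
	}
	return toDrop, toCreate
}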

@ghost ghost closed this Feb 8, 2018
@ghost ghost deleted the ADD_PARAMS_SYNC branch February 8, 2018 20:52
@ghost ghost mentioned this pull request Feb 8, 2018
@ghost
Author

ghost commented Feb 10, 2018

Reopened

@ghost ghost reopened this Feb 10, 2018
@ghost ghost closed this Feb 10, 2018
@ghost ghost reopened this Feb 10, 2018
@ghost ghost closed this Feb 28, 2018
@ghost ghost reopened this Feb 28, 2018
@@ -312,6 +312,7 @@ func (p *PostgresKeeper) createPGParameters(db *cluster.DB) common.Parameters {
for _, synchronousStandby := range db.Spec.SynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, common.StolonName(synchronousStandby))
}
numberSyncStandbys := len(synchronousStandbys)
Member

this isn't needed.

Author

If we don't do it, then the following code will result in forming an incorrect "synchronous_standby_names"; it will be like "2 (standby1, standby2)":
for _, synchronousStandby := range db.Spec.ExternalSynchronousStandbys {
synchronousStandbys = append(synchronousStandbys, synchronousStandby)
}
By saving the size of synchronousStandbys we avoid forming the incorrect value.

Member

2 (standby1, standby2) is correct. As the comment below in the code says, all the provided synchronous standbys must be synchronous. This is what I explained previously here: #410 (comment)

Author

@sgotti I just wanted to let you know that we are currently considering your point of view. Will get back to you in a couple of days...

@@ -331,7 +332,7 @@ func (p *PostgresKeeper) createPGParameters(db *cluster.DB) common.Parameters {
// which of the 3 standbys is really synchronous and in sync with the
// master. And choosing the non synchronous one will cause the loss of
// the transactions contained in the wal records not transmitted.
if len(synchronousStandbys) > 1 {
if numberSyncStandbys > 1 {
Member

this isn't needed.

Author

Please see the previous comment

}
func (p dbSlice) Swap(i, j int) {
p[i], p[j] = p[j], p[i]
}
Member

Don't reformat this.

@@ -1320,7 +1326,18 @@ func (s *Sentinel) updateCluster(cd *cluster.ClusterData, pis cluster.ProxiesInf
log.Debugf("synchronousStandbys not changed", "masterDB", masterDB.UID, "prevSynchronousStandbys", prevSynchronousStandbys, "synchronousStandbys", synchronousStandbys)
}

// If there're not enough real synchronous standbys add a fake synchronous standby because we have to be strict and make the master block transactions until MinSynchronousStandbys real standbys are available
Member

This comment shouldn't be removed.

@@ -26,6 +26,7 @@ Some options in a running cluster specification can be changed to update the des
| maxSynchronousStandbys | maximum number of required synchronous standbys when synchronous replication is enabled (only set this to a value > 1 when using PostgreSQL >= 9.6) | no | uint16 | 1 |
| additionalWalSenders | number of additional wal_senders in addition to the ones internally defined by stolon, useful to provide enough wal senders for external standbys (changing this value requires an instance restart) | no | uint16 | 5 |
| additionalMasterReplicationSlots | a list of additional replication slots to be created on the master postgres instance. Replication slots not defined here will be dropped from the master instance (i.e. manually created replication slots will be removed). | no | []string | null |
| externalFallbackSyncStandbys | a list of additional names for possible standbys not included in the stolon cluster, but used as replication destinations. | no | []string | null |
Member

The description doesn't look very clear. I'd prefer something more detailed, and also document this in https://github.com/sorintlab/stolon/blob/master/doc/syncrepl.md

@@ -483,6 +493,27 @@ func (os *ClusterSpec) Validate() error {
return nil
}

func validateExternalSyncStandby(externalSyncStandbyName string) error {
symbolsNameFilter := regexp.MustCompile("[_0-9]")
Member

please add a unit test to this function.

if filteredSymbolsOnlyName := symbolsNameFilter.ReplaceAllString(externalSyncStandbyName, ""); len(filteredSymbolsOnlyName) == 0 {
return fmt.Errorf("external fallback synchronous standby name \"%s\" has an incorrect format, every name should start with a symbol", externalSyncStandbyName)
}

Member

the above code looks too complicated. The regexp should already catch trailing spaces and \n.
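
A hedged sketch of a simpler, anchored validation plus a table-driven unit test (assumes "fmt", "regexp" and "testing" are imported; the exact allowed character set is an assumption, not something agreed in this thread):

// externalStandbyNameRe anchors the whole string, so leading/trailing spaces
// and newlines are rejected without any extra checks.
var externalStandbyNameRe = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

func validateExternalSyncStandby(name string) error {
	if !externalStandbyNameRe.MatchString(name) {
		return fmt.Errorf("invalid external fallback synchronous standby name: %q", name)
	}
	return nil
}

func TestValidateExternalSyncStandby(t *testing.T) {
	tests := []struct {
		name    string
		wantErr bool
	}{
		{"barman", false},
		{"barman_1", false},
		{"1barman", true},  // must not start with a digit
		{"barman\n", true}, // trailing newline rejected by the anchors
		{"", true},
	}
	for _, tt := range tests {
		if err := validateExternalSyncStandby(tt.name); (err != nil) != tt.wantErr {
			t.Errorf("validateExternalSyncStandby(%q) error = %v, wantErr %v", tt.name, err, tt.wantErr)
		}
	}
}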

@ghost
Author

ghost commented Mar 29, 2018

@sgotti I fixed your last comment. I agree with your logic. But there is one question left. Let's consider a stolon cluster:

  1. 2 keeper nodes sync (master - slave)
  2. 1 barman node connected as "externalFallbackSyncStandbys"
  3. minSynchronousStandbys = 1

In this cluster I kill the standby keeper, but I see that the master keeper doesn't rebuild the synchronous_standby_names attribute. It leaves it untouched.
I need "synchronous_standby_names" to be set to "barman".

May I ask you to point me to the piece of code which is responsible for this scenario?

@sgotti
Member

sgotti commented Mar 30, 2018

May I ask you to point me to the piece of code which is responsible for this scenario?

Perhaps I missed something, but wasn't the purpose of this patch to do exactly this? And this should be done here: https://github.com/sorintlab/stolon/pull/410/files#diff-9961d4546ef09e0152c558e820699523R1324

When the sentinel declares the standby as not good, it'll reach this code and add an external synchronous standby.

@ghost
Author

ghost commented Jul 5, 2018

Hi @sgotti,

can we continue the conversation on this PR?

@sgotti
Member

sgotti commented Jul 5, 2018

@saranhada yes
