Skip to content

Conversation

bharat-p
Copy link
Member

@bharat-p bharat-p commented Sep 2, 2017

Resolves #369

ES number of replicas was getting set to 1 even when specifying --es.num-replicas=0 when starting collector and query.

Before fix:

output of localhost:9200/_cluster/health?pretty

{
  "cluster_name" : "jaeger-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 4,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 4,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 50.0
}

ES cluster health check shows status 'yellow' and you will notice active_primary_shards=4 and unassigned_shards=4, which means some indexes are configured with value replicas=1, this can be confirmed by checking output of :

http://localhost:9200/jaeger-span-2017-09-02/_settings?pretty

{
  "jaeger-span-2017-09-02" : {
    "settings" : {
      "index" : {
        "mapping" : {
          "nested_fields" : {
            "limit" : "50"
          }
        },
        "number_of_shards" : "1",
        "provided_name" : "jaeger-span-2017-09-02",
        "mapper" : {
          "dynamic" : "false"
        },
        "creation_date" : "1504368267533",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "AkWNKfQSRlCa9pIoxilj5g",
        "version" : {
          "created" : "5050299"
        }
      }
    }
  }
}

Verified after fix:
output of http://localhost:9200/_cluster/health?pretty

{
  "cluster_name" : "jaeger-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 10,
  "active_shards" : 10,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

After fix output of http://localhost:9200/jaeger-span-2017-09-02/_settings?pretty

{
  "jaeger-span-2017-09-02" : {
    "settings" : {
      "index" : {
        "mapping" : {
          "nested_fields" : {
            "limit" : "50"
          }
        },
        "number_of_shards" : "5",
        "provided_name" : "jaeger-span-2017-09-02",
        "mapper" : {
          "dynamic" : "false"
        },
        "creation_date" : "1504369115915",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "Ou2TIOavSCyIDDqlj_4Gxg",
        "version" : {
          "created" : "5050299"
        }
      }
    }
  }
}

@yurishkuro
Copy link
Member

Personally I would rather have a parameter called "replication factor" that defines the total number of copies of the data, similar to Cassandra. Then the setting es.numReplicas = replicationFactor-1 could be done internally, which would allow RF=0 to be treated as empty value and overwritten by a default, since RF=0 is never valid (while NumReplicas=0 is valid).

I know that this would go against the nomenclature of ES configuration, thus a bit controversial.

@bharat-p
Copy link
Member Author

bharat-p commented Sep 2, 2017

@yurishkuro that should be ok, but as you already pointed out it goes against ES' nomenclature and might be confusing to ES users, also will cause issue with folks who are already relying on this parameter.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 100.0% when pulling 9984acd on bharat-p:es-replica-fix-#369 into 413942a on uber:master.

@yurishkuro
Copy link
Member

maybe we can keep the command line switch name & meaning the same but default it to something like -1 if not specified, and handle the rest internally (like converting internal option to "rf" instead of "numReplicas"). My main concern with this change is that it breaks the convention in Jaeger codebase (and in Go language in general) that empty value (0) can/will be replaced by a default. After this change the value of 0 starts having a meaning rather than being an empty value.

@bharat-p
Copy link
Member Author

bharat-p commented Sep 4, 2017

Another option is we change default value for es.num-replicas to 0 instead of current 1.
cc @mh-park as he is the original author of ES storage plugin for his opinion.

pavolloffay and others added 19 commits September 5, 2017 14:31
…cing#377)

* Makefile use  GNU sed and force thrift to use the same user

* sed hack to run -i in BSD and GNU version

* try -i.bak workaround

* Add comment to makefile
* Update to Apache 2.0 License

Per https://github.com/uber/jaeger/issues/252

* update in actual files
We're expanding our production deployment of jaeger at UA!
This is required of CNCF projects.
yurishkuro and others added 25 commits February 7, 2018 15:53
Signed-off-by: Yuri Shkuro <ys@uber.com>
* Collect ES bulk metrics

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix tests

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
The `ProcessorConfiguration` can be exported, but the custom type
`model` and `protocol` can not, so it's hard to new a
`ProcessorConfiguration` outside the pacaage.

Signed-off-by: kun <oiooj@qq.com>
* Binding all interfaces on hotrod

Signed-off-by: Guilherme Baufaker Rêgo <gbaufake@redhat.com>

* Hotrod Containerized on Docker

Signed-off-by: Guilherme Baufaker Rêgo <gbaufake@redhat.com>

* Moving HotRod Containerized to Hotrod folder

and some minor adjustments on Dockerfile

Signed-off-by: Guilherme Baufaker Rêgo <gbaufake@redhat.com>

* Copy only Hotrod binary to Dockerfile

- exposing more ports on Dockerfile
and copy only binary to Dockerfile

Signed-off-by: Guilherme Baufaker Rêgo <gbaufake@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
The calls to Add should execute before the statement creating the
goroutine or other event to be waited for.

Signed-off-by: kun <oiooj@qq.com>
…nerator seeds (jaegertracing#718)

Signed-off-by: Yuri Shkuro <ys@uber.com>
Signed-off-by: Eundoo Song <eundoo.song@gmail.com>
…ng#726)

* Use different namespaces for Cassandra primary and archive storage
* Fix a bug where archive factory wasn't used in query/main
* Use new expvar metrics module from jaeger-lib 1.4.0

Signed-off-by: Yuri Shkuro <ys@uber.com>
…aegertracing#735)

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>
Signed-off-by: Louis-Etienne Dorval <louis-etienne.dorval@ticketmaster.com>
* Publish hotrod to dockerub

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix review comments

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Add env to dockerfile to handle collector hostport

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Use alpine

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Use scratch and entrypoint hotrod all

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Remove cd -

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Fix cd

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Add ls

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Remove bindata.go

Signed-off-by: Pavol Loffay <ploffay@redhat.com>

* Add wait until 200

Signed-off-by: Pavol Loffay <ploffay@redhat.com>
Signed-off-by: Pavol Loffay <ploffay@redhat.com>
@yurishkuro
Copy link
Member

There's a new PR #754, but the discussion above was left unresolved.

@pavolloffay Since you were doing work with ES and its configuration, do you have an opinion?

cc @manannayak

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES replicas defaults to 1 even when specifying --es.num-replicas=0 when starting collector and query