
add sync function for incremental update #81

@fishjam

Description


Requirements

  • The current esm uses scroll + bulk to migrate data between ES indices.
  • It cannot delete dest records that have already been deleted from the source index, unless the force option is used to delete the dest index first.
  • It bulk-indexes every source record into the dest index even when the dest record has not changed (this can be confirmed by the index_total count from GET //_stats?pretty).

Solutions

  • Add a --sync option that scrolls through and compares the source and dest index records, and only indexes/updates/deletes the records that have changed.
  • example:
 esm --sync --count=10000 --sleep=1 \
   --source=http://localhost:9200 --src_indexes=bank \
   --dest=http://localhost:9200 --dest_index=bank_sync

Implementation

  • Both correctness and memory consumption need to be considered.
  • Scroll both indices ordered by _id => compare records => index/update/delete the changed records => fetch the next scroll page for src/dest.
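The Implementation steps above amount to a merge join over two streams that are both sorted by _id. The following is a minimal Go sketch, not esm's actual code: the Doc and Action types, the iter helper, and diffSorted are hypothetical names, a slice-backed iterator stands in for the scroll pages, and Go's byte-wise string comparison is assumed to match the order in which Elasticsearch returns _id values.

```go
package main

import "fmt"

// Doc is a hypothetical, simplified hit: its _id plus its _source
// serialized to a canonical string so two docs can be compared directly.
type Doc struct {
	ID     string
	Source string
}

// Action is one change to apply to the dest index.
type Action struct {
	Op string // "index", "update" or "delete"
	ID string
}

// iter returns a pull-style iterator. In a real sync each call would hand
// out the next hit of the current scroll page and fetch the next page
// (the "nextscroll" step) once the current one is exhausted.
func iter(docs []Doc) func() (Doc, bool) {
	i := 0
	return func() (Doc, bool) {
		if i >= len(docs) {
			return Doc{}, false
		}
		d := docs[i]
		i++
		return d, true
	}
}

// diffSorted walks two _id-sorted streams in lockstep and emits only the
// actions needed to make dest match source. Only one doc per side is held
// at a time; actions are collected in a slice here for clarity, whereas a
// streaming version would push each one into the bulk buffer instead.
func diffSorted(nextSrc, nextDst func() (Doc, bool)) []Action {
	var actions []Action
	s, sok := nextSrc()
	d, dok := nextDst()
	for sok || dok {
		switch {
		case sok && (!dok || s.ID < d.ID):
			// Only in source: index it into dest.
			actions = append(actions, Action{"index", s.ID})
			s, sok = nextSrc()
		case dok && (!sok || d.ID < s.ID):
			// Only in dest: it was deleted from source.
			actions = append(actions, Action{"delete", d.ID})
			d, dok = nextDst()
		default:
			// Same _id on both sides: update only if the content differs.
			if s.Source != d.Source {
				actions = append(actions, Action{"update", s.ID})
			}
			s, sok = nextSrc()
			d, dok = nextDst()
		}
	}
	return actions
}

func main() {
	src := []Doc{{"1", `{"a":1}`}, {"2", `{"a":2}`}, {"4", `{"a":4}`}}
	dst := []Doc{{"1", `{"a":1}`}, {"2", `{"a":9}`}, {"3", `{"a":3}`}}
	for _, a := range diffSorted(iter(src), iter(dst)) {
		fmt.Println(a.Op, a.ID)
	}
}
```

With this input, doc 1 is unchanged and produces no request, doc 2 is updated, doc 3 is deleted from dest, and doc 4 is indexed, which is exactly the "only touch what changed" behavior the Requirements ask for.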

Other changes

  • Add config options SortField, TruncateOutFile and SkipFields, so that an ES index can be dumped to a local file and the data compared easily:
   esm --sort=_id --source=http://localhost:9200 --src_indexes=bank --truncate_output --skip=_index --output_file=src.json
   esm --sort=_id --source=http://localhost:9200 --src_indexes=bank_sync --truncate_output --skip=_index --output_file=dst.json
   diff -W 200 -ry --suppress-common-lines src.json dst.json
  • Add ClusterVersion() and DeleteScroll() to esapi, and add ParseEsApi.
  • Move bulk.go to migrator.go and add some functions.
  • Refactor all HTTP methods (GET/Post/DoRequest) into a single Request method, and support a proxy. BTW: this change also seems to fix an error. For example, in the original version, using --regenerate_id --repeat_times=10 to generate some test data did not work (the target index was not generated).
  • Delete some commented-out and unused code.
  • Format the code with: gofmt -l -w .
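The dump-and-diff workflow above relies on two files being byte-identical whenever the documents are identical. One way to get that in Go, sketched here under assumptions (dumpLine is a hypothetical helper, and the hit layout is simplified), is to drop the skipped fields and let encoding/json do the serialization, since it writes map keys in sorted order. Skipping _index matters because it is the one field that always differs between bank and bank_sync.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// dumpLine removes the skipped top-level fields from a hit (mutating the
// map) and serializes the rest as one JSON line. encoding/json emits map
// keys in sorted order, nested maps included, so equal hits always
// produce byte-identical lines that diff cleanly.
func dumpLine(hit map[string]interface{}, skip []string) (string, error) {
	for _, f := range skip {
		delete(hit, f)
	}
	b, err := json.Marshal(hit)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	src := map[string]interface{}{"_index": "bank", "_id": "1", "_source": map[string]interface{}{"balance": 100}}
	dst := map[string]interface{}{"_index": "bank_sync", "_id": "1", "_source": map[string]interface{}{"balance": 100}}
	a, _ := dumpLine(src, []string{"_index"})
	b, _ := dumpLine(dst, []string{"_index"})
	fmt.Println(a)
	fmt.Println(a == b)
}
```

Sorting the dump by _id (the SortField option) is what lets a line-oriented diff line up the two files; without it, identical indices could still produce files in different orders.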

TODO

    1. Currently a single goroutine scrolls both the src and dest indices and inserts/updates/deletes records; performance needs to be considered after further checks.
    2. Add range-condition support (example: modifydate gt xxx), so that sync can be done more efficiently.
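A range condition like the one in the second TODO item maps onto an Elasticsearch range query in the search body. The Go sketch below is hypothetical: rangeQuery is an illustrative name, and the field and cutoff value stand in for the "modifydate gt xxx" placeholder. It keeps the sort on _id so the merge comparison still sees documents in the same order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rangeQuery builds a search body that only matches documents whose field
// is strictly greater than since, sorted by _id so the src/dest merge
// comparison keeps working on the filtered streams.
func rangeQuery(field, since string) (string, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"range": map[string]interface{}{
				field: map[string]interface{}{"gt": since},
			},
		},
		"sort": []string{"_id"},
	}
	b, err := json.Marshal(body)
	return string(b), err
}

func main() {
	q, err := rangeQuery("modifydate", "2020-01-01") // hypothetical cutoff
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
}
```

One caveat: a range filter on a modification date cannot reveal deletions, because a document deleted from the source no longer appears in it at all. Detecting deletes would still need a full _id-ordered comparison, or some other signal, alongside the filtered sync.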
