forked from hoffoo/elasticsearch-dump
Closed
Description
Requirements
- currently esm uses scroll + bulk to migrate data between es indexes
- it can NOT delete dest records whose source records have already been deleted, unless force is used to delete the dest index first
- it bulk-writes all source records to the dest index even when the dest records haven't changed (this can be confirmed from the index_total count returned by GET //_stats?pretty)
Solutions
- add --sync to scroll and compare the source and dest index records, and index/update/delete only the changed records
- example:
esm --sync --count=10000 --sleep=1 --source=http://localhost:9200 --src_indexes=bank --dest=http://localhost:9200 --dest_index=bank_sync
Implementation
- Correctness and memory consumption need to be considered
- Scroll both indexes ordered by _id => compare records => index/update/delete the changed records => fetch the next scroll batch for src/dst
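The compare step above can be sketched as a merge over two _id-sorted streams. This is only an illustration, not the code in this change: Doc, Action, Op and DiffSorted are hypothetical names, the records are held in in-memory slices rather than fetched scroll batch by scroll batch, and it assumes both sides are sorted with the same ordering that Go uses for string comparison.

```go
package main

// Doc is a hypothetical, simplified record: the document _id plus its
// serialized _source, which is used only for an equality check.
type Doc struct {
	ID     string
	Source string
}

// Op names the write that must be applied to the dest index.
type Op string

const (
	OpIndex  Op = "index"  // record exists only in source
	OpUpdate Op = "update" // record exists in both, content differs
	OpDelete Op = "delete" // record exists only in dest
)

// Action is one pending change for the dest index.
type Action struct {
	Op Op
	ID string
}

// DiffSorted walks src and dst, both sorted ascending by ID, and returns
// the actions needed to make dst match src. Unchanged records produce no
// action, so they are never re-indexed.
func DiffSorted(src, dst []Doc) []Action {
	var actions []Action
	i, j := 0, 0
	for i < len(src) && j < len(dst) {
		switch {
		case src[i].ID < dst[j].ID:
			// Source is behind: this record is missing in dest.
			actions = append(actions, Action{Op: OpIndex, ID: src[i].ID})
			i++
		case src[i].ID > dst[j].ID:
			// Dest is behind: this record was deleted in source.
			actions = append(actions, Action{Op: OpDelete, ID: dst[j].ID})
			j++
		default:
			// Same _id on both sides: update only if the content differs.
			if src[i].Source != dst[j].Source {
				actions = append(actions, Action{Op: OpUpdate, ID: src[i].ID})
			}
			i++
			j++
		}
	}
	// Whatever remains in source is new; whatever remains in dest is stale.
	for ; i < len(src); i++ {
		actions = append(actions, Action{Op: OpIndex, ID: src[i].ID})
	}
	for ; j < len(dst); j++ {
		actions = append(actions, Action{Op: OpDelete, ID: dst[j].ID})
	}
	return actions
}
```

Because each side is consumed strictly in order, a streaming version only needs to keep the current scroll batch of each index in memory, which is how the memory concern above could be addressed.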
Other changes
- add config options SortField, TruncateOutFile and SkipFields, so an es index can be dumped to a local file and the data compared easily
esm --sort=_id --source=http://localhost:9200 --src_indexes=bank --truncate_output --skip=_index --output_file=src.json
esm --sort=_id --source=http://localhost:9200 --src_indexes=bank_sync --truncate_output --skip=_index --output_file=dst.json
diff -W 200 -ry --suppress-common-lines src.json dst.json
- add some functions in esapi, ClusterVersion() and DeleteScroll(), add ParseEsApi
- move bulk.go to migrator.go, and add some functions
- refactor all http methods (GET/Post/DoRequest) into a single Request method, and support proxy. BTW: this change also seems to fix some errors. Example: in the original version, using --regenerate_id --repeat_times=10 to generate some test data did not work (the target index could not be generated).
- delete some commented-out and useless code
- ran the following command to format the code:
gofmt -l -w .
TODO
- now a single goroutine scrolls the src + dest indexes and inserts/updates/deletes records; performance needs to be considered after more checks
- add range condition support (example: modifydate gt xxx), so sync can be done more efficiently
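As an illustration of the range condition TODO, the request body for a standard Elasticsearch range query can be built as below. This is a minimal sketch, not a planned design: RangeQuery is a hypothetical helper, and the field name, operator (gt, gte, lt, lte) and value are all supplied by the caller.

```go
package main

import "encoding/json"

// RangeQuery is a hypothetical helper that builds a search body of the form
// {"query":{"range":{<field>:{<op>:<value>}}}} so that a scroll only
// returns records matching the condition.
func RangeQuery(field, op string, value interface{}) ([]byte, error) {
	body := map[string]interface{}{
		"query": map[string]interface{}{
			"range": map[string]interface{}{
				field: map[string]interface{}{op: value},
			},
		},
	}
	return json.Marshal(body)
}
```

The same body would have to be sent to both the source and the dest scroll, otherwise records outside the range would look like deletions during the compare step.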