Handling a large number of indexing requests
I have an application that reads a large amount of data from a MongoDB database and indexes it into Elasticsearch. It's a fairly large database, and I throttle the indexing requests so as not to overload the Elasticsearch cluster. However, if I throttle too much, a full re-index takes forever (more than 3 days). So I throttle the indexing but also adjust the cluster settings slightly to allocate more threads and a larger queue for indexing requests.
The Sense developer console inside the Marvel plugin is great for making this type of modification.
In my experience, if I don't do this, even with a throttled application, I'll get the following exceptions:
Caused by: org.elasticsearch.ElasticsearchException: Unable to execute index operation on cluster
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@263666cd
    at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62) ~[elasticsearch-1.0.0.jar:na]
You could also use the following curl commands:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient": { "threadpool.index.size": 16 } }'
curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient": { "threadpool.index.queue_size": 1000 } }'
curl -XGET 'localhost:9200/_cluster/settings'
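If you change these values often, the two settings above can probably be sent in one request. The following is only a rough sketch, not something taken from the setup described here: the function name build_index_threadpool_settings and the ES_URL and APPLY variables are made up for illustration, and the request is only sent when APPLY=1 so the script can be sourced without touching the cluster.

```shell
#!/bin/sh
# Assumption: the cluster is reachable at localhost:9200 unless ES_URL says otherwise.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: prints a transient cluster-settings payload
# for the index thread pool.
#   $1 = number of threads, $2 = queue size
build_index_threadpool_settings() {
  printf '{ "transient": { "threadpool.index.size": %d, "threadpool.index.queue_size": %d } }\n' "$1" "$2"
}

# Only talk to the cluster when explicitly asked to (APPLY=1).
if [ "${APPLY:-0}" = "1" ]; then
  # Send both settings in a single cluster settings update.
  curl -XPUT "$ES_URL/_cluster/settings" -d "$(build_index_threadpool_settings 16 1000)"
  # Read the settings back to confirm they were applied.
  curl -XGET "$ES_URL/_cluster/settings"
fi
```

Because the settings are transient, they do not survive a full cluster restart, so a script like this may need to be re-run before the next large re-index.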