Handling a large number of indexing requests

I have an application that reads a large amount of information from a MongoDB database and indexes it into Elasticsearch. The database is quite large, so I throttle the indexing requests to avoid overloading the Elasticsearch cluster. However, if I throttle too much, a full re-index takes forever (more than 3 days). So I throttle the indexing, but I also modify the cluster settings slightly to allocate more threads and a larger queue size for indexing requests.
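The throttling itself can be as simple as a fixed pause between requests. The following is only a rough sketch of that idea, not the code from my application: `throttled_send` and `index_one_file` are hypothetical helper names, and the index `myindex`, the type `mytype`, and the one-JSON-document-per-file layout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch: run a command once per item, pausing between calls.
# usage: throttled_send PAUSE_SECONDS COMMAND ITEM...
throttled_send() {
  pause=$1
  send_cmd=$2
  shift 2
  for item in "$@"; do
    # Stop at the first failed command instead of silently skipping an item.
    "$send_cmd" "$item" || return 1
    sleep "$pause"
  done
}

# Assumed sender: index one JSON document stored in the file "$1".
# "myindex" and "mytype" are placeholder names. -f makes curl return a
# non-zero exit status when the server answers with an HTTP error.
index_one_file() {
  curl -s -f -XPOST 'localhost:9200/myindex/mytype' --data-binary "@$1" > /dev/null
}
```

For example, `throttled_send 1 index_one_file doc1.json doc2.json` would index the two files with a one-second pause after each request. A longer pause is gentler on the cluster but stretches the total re-indexing time, which is the trade-off described above.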

The Sense developer console inside the Marvel plugin is great for making these kinds of cluster settings changes.

In my experience, if I don't change these settings, I get the following exceptions even with a throttled application:

Caused by: org.elasticsearch.ElasticsearchException: Unable to execute index operation on cluster
Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1@263666cd
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:62) ~[elasticsearch-1.0.0.jar:na]

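Instead of Sense, you can apply the settings with the following curl commands. The first two set the number of threads and the queue size of the index thread pool, and the last one reads the cluster settings back so you can confirm the change: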

curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient": { "threadpool.index.size":16 } }'

curl -XPUT 'localhost:9200/_cluster/settings' -d '{ "transient": { "threadpool.index.queue_size":1000 } }'

curl -XGET 'localhost:9200/_cluster/settings'
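Both settings can also go into a single request, because the `transient` object accepts more than one key. Here is a small sketch of that, assuming a node on localhost:9200 as in the commands above. The function names `settings_payload` and `apply_index_threadpool` are illustrative helpers, not part of Elasticsearch.

```shell
#!/bin/sh
# Build the JSON body for a transient cluster settings update.
# usage: settings_payload THREADS QUEUE_SIZE
settings_payload() {
  printf '{ "transient": { "threadpool.index.size": %d, "threadpool.index.queue_size": %d } }\n' "$1" "$2"
}

# Send both settings in one request, then read the cluster settings back.
# usage: apply_index_threadpool THREADS QUEUE_SIZE
apply_index_threadpool() {
  curl -XPUT 'localhost:9200/_cluster/settings' -d "$(settings_payload "$1" "$2")"
  curl -XGET 'localhost:9200/_cluster/settings'
}
```

Calling `apply_index_threadpool 16 1000` sends the same values as the two separate commands. Keep in mind that transient settings do not survive a full cluster restart, so they need to be applied again afterwards.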

Franklin Angulo, May 17, 2014. Category: elasticsearch. Tags: elasticsearch, indexing