2021-03-31

SolrCloud Time Routed Alias Architecture

This is a broader question around building the architecture for SolrCloud Time Routed Alias application. I'm using SolrCloud to ingest time-series data on a regular basis and have SolrCloud running in a Kubernetes Cluster. A Solr node gets attached every time we add a new Pod to our cluster.

Since I'm trying to use Time Routed Aliases, it creates a new collection with preemptive calculation and currently places them across the Solr pods based on how much free-disk space is available in a pod, so new pods will get selected for shard placements whenever a new pod is introduced.

However, I would like to design a solution where we can avoid hot-spotting Solr nodes by distributing the shards across older pods and yet still maintaining SolrCloud architecture that grows in size as data is ingested every day.

I'm unsure what the best configuration would be at a collection/cluster level based on the available policies in https://solr.apache.org/guide/8_6/solrcloud-autoscaling-policy-preferences.html

I'm currently creating collections at weekly-intervals and my use-cases involve searching across data at least 2 weeks old. Because ingested data will be placed in newer pods, my client side facing applications will be bombarding the newer pods every time.

What level of configuration on a collection/alias/cluster level should I use in order to avoid hot-spotting?



from Recent Questions - Stack Overflow https://ift.tt/31vC0uZ
https://ift.tt/eA8V8J

No comments:

Post a Comment