Apache Solr 6.3.0 released

quickben · on Nov 12, 2016

After spending a week evaluating whether to pick Solr or Elastic, here are some of the more interesting conclusions:

- Solr comes with optional security config; Elastic charges for Shield. At the time of the evaluation, Shield pricing wasn't online (and there were hints online it starts at 1600/year and goes upwards from there).

- Elastic can have a 'split brain' from time to time, and some data can be lost.

- Elastic is more user friendly, but that comes at a price of installing Kibana + other software. Solr is all-in-one.

- Both will let you shoot yourself in the foot with dynamic fields and schema, but if you go the hardcoded schema way, Solr is easier to set up.

- Solr has DataImportHandler, which makes configuring a full import from an existing database a two-minute job once you know what you are doing. The import seemed to be doing 50k records / second, and it can also do partial syncs if things have changed.

- Performance-wise, they are both overkill if you stay away from managed schemas and provide decently beefed-up VMs. But Solr + DataImportHandler gives you one more option (aside from ZooKeeper) for cheap scaling. If you have mostly read-heavy instances, you can put separate VMs behind a load balancer to ensure near-100% uptime and robustness, with the minimal fuss of clicking a button once a month to add the deltas to the separate VMs.
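To make the full import and the monthly "add the deltas" step concrete, here is a minimal sketch of the two DataImportHandler requests. It only builds the URLs and sends nothing. The host, the core name `products`, and the `/dataimport` handler path are assumptions (the path is whatever the handler is registered as in solrconfig.xml); `command`, `clean` and `commit` are standard DIH request parameters.

```python
from urllib.parse import urlencode


def dih_url(base_url, core, command, clean=False, commit=True):
    """Build a DataImportHandler request URL (nothing is sent).

    base_url and core are placeholders; "/dataimport" assumes the
    handler is registered under that path in solrconfig.xml.
    """
    params = {
        "command": command,           # "full-import" or "delta-import"
        "clean": str(clean).lower(),  # wipe the index before importing?
        "commit": str(commit).lower(),
        "wt": "json",
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
SOLR = "http://localhost:8983/solr"
full_import = dih_url(SOLR, "products", "full-import", clean=True)
delta_import = dih_url(SOLR, "products", "delta-import")
```

A delta import only picks up changes if the DIH configuration defines how to find them (for example a delta query keyed on a last-modified column), so the second URL is only useful once that is in place.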

aaroninsf · on Nov 12, 2016

Having inherited an ES cluster shortly after it replaced Solr, and built out many clusters since then, I can comment that the 'split brain' problem which attracted a lot of attention 2-3 years ago thanks to the infamous Jepsen 'Call Me Maybe' torture test is effectively a non-issue in a well-configured environment, today.

Theoretically possible, perhaps; in practice, just not an issue. Our initial cluster suffered from it thanks to improperly having been run across data centers with default settings of various kinds.

Once I got that under control, zero recurrence despite our clusters living in what I like to call a 'hostile network environment.'

I am not a fan of the 'Zen' cluster management system they adopted; but it does work and Elastic has put a ton of work into cleaning out the corner cases.
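The comment does not say which defaults were at fault. One commonly cited mitigation in the Zen discovery era was setting `discovery.zen.minimum_master_nodes` to a quorum of the master-eligible nodes, so that a minority partition cannot elect its own master. A minimal sketch of that arithmetic (the helper names are hypothetical, and this is not necessarily what was changed here):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum size: a strict majority of the master-eligible nodes."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def zen_setting_line(master_eligible_nodes: int) -> str:
    """Render the matching elasticsearch.yml line (Zen discovery only)."""
    quorum = minimum_master_nodes(master_eligible_nodes)
    return f"discovery.zen.minimum_master_nodes: {quorum}"


# Example: a cluster with three master-eligible nodes.
line_for_three = zen_setting_line(3)
```

With two master-eligible nodes the quorum is also two, which means losing either node blocks master election; that is why an odd number of master-eligible nodes is the usual advice.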

For my purposes,

One of the distinctions I would say is most interesting today is that Elastic the company has been aggressively chasing the analytics market, at the expense of developing the text/freeform search capabilities of ES.

At Elasticon last year there were some interesting talks that touched on this in various ways; and I was part of a group of people who traded chagrin that Elastic does not have a strong user community in this area.

Instead they have put a lot of work on their 'stack' and data shippers and cloud deployment.

Solr on the other hand has continued to move forward and for an environment in which the primary application is going to be free-form text search (vs. say structured metadata, or analytics) it is probably stronger.

Fwiw I find deployment, configuration, inspection, etc., all refreshingly clean. Imperfect, sure, but the ES team did have the advantage of starting with a blank page in a lot of respects. It shows.

(Currently running 6+ clusters, 100+ nodes, doing metadata and full-text and a small amount of analytics)

ksec · on Nov 12, 2016

I read this as basically a Win to Solr?

quickben · on Nov 12, 2016

They are very very close. The differences are minute at this point. Unless you are doing some heavy relational algebra, it will be hard to distinguish real world results for simple queries.

In the end I chose Solr because it comes with an 'ecosystem' of semi-related projects that you can pull on (Spark, Ambari, etc.).

On a sidenote, I'm personally more comfortable with forkable sourcebases, than singular companies with paid plugins. I've been too long in this industry and have seen all kinds of dirty tricks pulled off.

emmelaich · on Nov 12, 2016

For us the authentication options might tip it. Also the ecosystem point is important; we're using Spark/Avro/..

sheeshkebab · on Nov 12, 2016

I'm using both Solr and Elastic right now (on different projects), and I second those items above.

One other plus for Solr is it has a flexible replication and consistency model, out of the box - much better fit for large clusters and multi data center deployments (or multi region high availability AWS deploys).

Elastic is lacking in the replication area (although something appears to be coming soon, again via their paid plugins).

IndianAstronaut · on Nov 12, 2016

How do you handle the data for Solr with AWS rehydration? We are constantly taking backups of the data into S3 to then use to put data back into Solr.

janhoy · on Nov 18, 2016

I think what made ES particularly attractive in the beginning was

- Easier install (Linux packages) and setup of a small cluster

- "Just throw some data at it"

- JSON and pure REST

While ES is still more compact and faster/easier to install, Solr has of course grown up with robust SolrCloud scaling, start scripts, schemaless mode and all the config APIs.

And I think the change of mindset from files to APIs is crucial. Your search engine should be like a DB, just install it and then start using it from a client. Your schema stays close to your data in your app where it belongs.

A few next important steps for Solr which are being worked on are

- New set of APIs /v2/ that gives a more unified experience

- More analyzers/tokenfilters/components configurable via APIs

- Easier getting-started with basic auth and SSL, add LDAP and more

- Mature the streaming/SQL stuff

All in all, the competition drives both products to new heights every year!

Disclaimer. I'm a Lucene/Solr committer and consult on both products

StreamBright · on Nov 12, 2016

Just by experience, be careful with DataImportHandler, if you have many joined tables it can surprise you sometimes with weird behaviour. We ended up building views for just having a simple table to index with Solr.

emmelaich · on Nov 12, 2016

Thanks, excellent review.

jonbaer · on Nov 12, 2016

Good (and fair) article on comparison ... https://dzone.com/articles/10-reasons-to-choose-apache-solr-...

coredog64 · on Nov 12, 2016

Is there anything like Kibana for Solr?

jstrassburg · on Nov 12, 2016

Yes, a port called banana: https://github.com/LucidWorks/banana

IndianAstronaut · on Nov 12, 2016

Another interesting use case for Solr is to use it as a prebuilt backend with a REST API for a data based web service/dashboard. We have had quite a bit of luck doing this and getting our service up and running was quite a bit faster than trying to build out our own APIs.
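As a rough illustration of the "prebuilt backend" idea, a dashboard can talk to Solr's standard `/select` handler directly over HTTP. The sketch below only builds the request URL. The host, the collection name `sales`, and the field `region` are made up for the example; `q`, `fq`, `rows`, `start` and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def select_url(base_url, collection, query, filters=(), rows=10, start=0):
    """Build a /select query URL (nothing is sent).

    Each entry in `filters` becomes its own fq parameter, which Solr
    applies (and caches) independently of the main query.
    """
    params = [("q", query), ("rows", rows), ("start", start), ("wt", "json")]
    params += [("fq", f) for f in filters]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and field names.
dashboard_query = select_url(
    "http://localhost:8983/solr", "sales", "*:*",
    filters=["region:EU"], rows=5,
)
```

In practice you would still want something in front of Solr (a proxy or a thin service) rather than exposing it straight to browsers, since the same HTTP interface also accepts updates and admin requests.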

brian-armstrong · on Nov 11, 2016

I'd be curious to see the distribution of Solr vs Elasticsearch in production. Anyone know how you would gather that statistic?

fusiongyro · on Nov 11, 2016

As someone who uses Solr in production and doesn't blog about it, I would estimate that happens a lot. :) Maybe you could look for Blacklight installations? http://projectblacklight.org

arafalov · on Nov 12, 2016

And Cloudera, and DataStax Enterprise, and IBM something, and Adobe Experience Manager, none of which wave the Solr flag about unless you explicitly look, and which therefore don't show up in the easy stats. And Drupal, and Typo3, and.....

gglanzani · on Nov 12, 2016

And Riak!

mahmoudhossam · on Nov 12, 2016

I think StackShare should provide something close to what you're after: ES: http://stackshare.io/elasticsearch Solr: http://stackshare.io/solr

brian-armstrong · on Nov 12, 2016

I hadn't seen this before, and after looking at it I suspect there's some selection bias going on. There probably aren't as many new Solr instances going out, but I suspect there are still many legacy installations.

user5994461 · on Nov 12, 2016

Google trends:

https://www.google.fr/trends/explore?q=elasticsearch,apache%...

jle17 · on Nov 12, 2016

Same query but changing "apache solr" to "solr" shows much less difference : https://www.google.fr/trends/explore?q=elasticsearch,solr,lu...

user5994461 · on Nov 12, 2016

I'm afraid that "solr" is a common 4-letter word that may refer to all kinds of things not related to Apache Solr.

elygre · on Nov 12, 2016

What kind of things would that be?

(I just did a google search for "solr", and all results on the first four pages referred to the engine, so I am curious.)

user5994461 · on Nov 12, 2016

Stuff related to "solar" [energy] and any variation of that. 4-letter words are always popular ^^

You can't judge by the first page of results anymore. Google is tweaking the results individually, as per your "programmer profile". Try doing a search on someone else's computer.

Thaxll · on Nov 12, 2016

Elasticsearch is more popular than Solr nowadays.

arafalov · on Nov 12, 2016

Is that in absolute install numbers (see my comment in the other thread), in % growth (from a zero base for ES only several years ago), or in the excitement for early features of Elasticsearch that have been disabled or rolled back over the last couple of years?

I am somewhat biased because I am a - recent - Solr committer. But I also did try to do a very balanced comparison presentation two years ago: http://www.slideshare.net/arafalov/solr-vs-elasticsearch-cas... (kind of popular, 49k views...).

Solr - still - is more flexible than Elasticsearch by several orders of magnitude. But Elasticsearch has a better vertical integration story, especially for analytics. And size (it IS small), because it does not ship OOTB with documentation (online), full examples (any?), Admin UI (plugin), rich-text extraction (plugin), data import mechanism (with rivers gone), etc.

If you don't need those, the choice is much closer. And many people do not need those. The problem is that most people don't know what they need and don't need when they do the evaluation, so they look at the most superficial parameters.

Solr 6 has good startup scripts, (overly) comprehensive examples, REST APIs, a well-maintained 700-page manual, etc. Some of these features are a bit harder to discover due to legacy information also being very visible. But they are there.

IndianAstronaut · on Nov 12, 2016

As a committer, are there plans to improve the Solr SQL interface and add features to make it more in line with standard SQL? Also, any plans on adding search syntax to the SQL?

arafalov · on Nov 12, 2016

Not by me, but the SQL interface is the bleeding edge of Solr. If the last time you looked was a month ago, look again. The feature set probably doubled since. And, for this, don't rely on documentation, it is not complete yet. Rather, check the Lucene Solr Revolution 2016 slides/videos and search Jira for SQL keyword. And if you have solid use cases, feel free to create a feature-request Jira or even help to test and/or develop it.
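For anyone who has not tried it, a minimal sketch of what a request to the Solr 6 SQL interface looks like. It builds the URL and form body without sending anything. The host, the collection `logs`, and the field names are hypothetical; the `/sql` handler with a `stmt` parameter (and the optional `aggregationMode`) is how the Solr 6 documentation describes it, and as far as I know it needs SolrCloud mode. Given how fast this area is moving, treat the details as subject to change.

```python
from urllib.parse import urlencode


def sql_request(base_url, collection, stmt, aggregation_mode="facet"):
    """Return (url, form_body) for a POST to the /sql handler.

    Nothing is sent; pass the pair to the HTTP client of your choice.
    """
    url = f"{base_url}/{collection}/sql"
    body = urlencode({"stmt": stmt, "aggregationMode": aggregation_mode})
    return url, body


# Hypothetical collection and fields.
sql_url, sql_body = sql_request(
    "http://localhost:8983/solr", "logs",
    "SELECT level, count(*) FROM logs GROUP BY level",
)
```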

donretag · on Nov 12, 2016

It depends on which context.

Elasticsearch has abandoned their search engine origins and embraced logging and analytics. Those that work in NLP and text processing are still using Solr. The edismax parser is far more flexible and you can easily create your own query parser.

Elasticsearch excels at ease of use and supporting multiple programming languages via various clients. Solr is slowly moving away from XML, but cryptic query parser syntax remains.
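To give a flavour of the Extended DisMax (edismax) parser's flexibility, and of the parameter syntax that goes with it, here is a minimal sketch that assembles the query parameters for a boosted multi-field search. The field names and boosts are invented for the example; `defType`, `qf` and `mm` are standard edismax parameters.

```python
def edismax_params(user_query, field_boosts, min_should_match="75%"):
    """Assemble request parameters for an edismax query.

    field_boosts maps field name -> boost and is rendered into the
    "field^boost" syntax that the qf parameter expects.
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {
        "defType": "edismax",    # select the Extended DisMax parser
        "q": user_query,         # raw user input, no field syntax required
        "qf": qf,                # fields to search, with boosts
        "mm": min_should_match,  # share of terms that must match
    }


# Hypothetical fields: title matches count three times as much as body.
params = edismax_params("apache solr release", {"title": 3, "body": 1})
```

The resulting dict can be URL-encoded and sent to `/select` like any other query; the point is that relevance tuning lives in request parameters rather than in the query string itself.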

conradfr · on Nov 12, 2016

I went to Lucene/Solr Revolution 2016 in Boston last month and it was great; there are a lot of good things happening. I can even tolerate the "Java/xmlisation" of Solr nowadays ;)

Some colleagues went to an Elastic conference and felt it was more of a sales pitch than anything.