Archive

Posts Tagged ‘rdbms’

Casual mysql fail – careful with that group_concat

4 October 2010 1 comment

I was doing a minor maintenance job on a non-normalized table which required large grouping and gathering of id’s, which in turn were to be processed by other routine. Each group would returned a comma-separated list using mysql’s group_concat. Yet, some groups failed to do so, resulting with strings that ended with a comma (this was a bug that was very hard to trace, actually).

As it turns out, the group_concat resulting string has a limit! Now, let’s say for a moment that this was acceptable RDBMS design (it is not!) – the server does not throw exception on exceeding this limit, nor does it give any warning – which was the reason for some tough debugging. It simply truncated the result in a shorter list!

That is plain wrong!

In no single imaginable scenario an aggregate function is allowed to return false result. I really don’t care if you swap-up my server or wait an hour – the most important thing is that my query gets the correct data. No optimization is ever more valuable than reliability and getting the right data.

Advertisements
Categories: programming Tags: , ,

Try to use the relational database to its full potential

24 September 2010 Leave a comment

or:

How the search engines did not do for me what an RDBMS did.

It will be a quick post however.

For my latest project I have highly relational schema for a database on a PostgreSQL server. The search API steadily grew more complicated and I finally had the chance to install and utilize the Apache Solr search engine. To simplify matters for this post, let’s say that for each document I had a short title, a category, two full-text fields, and two different sets of tags. I had a pretty good idea how to set up the solr, its data-import-handler, how the delta import will work, how the schema will be designed, even a faceting strategy for my web interface. I did a lot of work for some of this, and then I got stuck.

You can not query the solr fields with: *text*.

This was a fail. It burned my time. I understand why the solr people didn’t implement such thing. As a user – I would not want to search for part of word inside words, however, this was not only something the client insisted on, the nature of the application demanded such lookups.  I had to have *text* queries.

A colleague of mine heavily insists on sphinxsearch. I am aware it has good performances, and you can also work-out different sorts of matching, *text* included. But very soon I gave up – it had a very bad support for delta imports.

Then I decided to use a brute force. I created an algorithm that put the multivalues field in one field and a method for querying it, and then put all the documents to be indexed in one big postgresql table, one row per document. Made a cron job for delta-updates on documents, and switch the search API from the complex joins to simple query of the big table. Of course, I made a whole bunch of different simple and composite indices on the big table for this occasion after the initial population.

One can not expect the same speed of querying as with solr, but the speed is satisfactory. And, most importantly, I retained the possibility to query what I like, and update in optimized manner. Finally, I suppose the point of this task was that you should not rush things and use more technologies in your product, just try to use tools you already have, in this case your RDBMS. Sometimes even the most complicated problems have very simple solutions.