Project Log
I'm in the midst of the information gathering stage:
For Wednesday 9/11/02:
I've read about some of the basics of query optimization in Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke.
I've also read various internet articles about the basic concepts behind genetic algorithms.
For Wednesday 9/18/02:
I've read quite a few journal articles, though there's still a couple--especially invovling genetic algorithms and their implementation--that I'd like to look through before next Wednesday.
I've written up the current list of sources that are a part of my bibliography (which can be found on the main project page). I am also trying to prepare for the presentation of my survey of query optimization and genetic algorithms, as well as working on the written survey due one week later.
For Wednesday 9/25/02:
I've read several more articles on genetic algorithms, and a couple more on query optimization. I've updated the bibliography section of the main project page accordingly.
I have prepared for and given the presentation of my genetic algorithm and query optimization topical survey. All in all, it went pretty well, although I should have made use of pre-drawn figures and pictures.
Now I need work on finishing up the written version of the topical survey.
For Wednesday 10/02/02:
I have finally completed the written survey. It took a lot of work, but I'm happy with the way it turned out. I've posted links to the survey on the main project page.
Now I need to try to find some papers specifically concerning the implementation of query optimizers using genetic algorithms. Previously, I had been avoiding such papers so that I could gain an understanding of the basics of these topics on their own.
I also need to start making some decisions about how exactly I should begin implementing the project. Specifically, I need to consider in concrete terms how I can represent the plans in the genetic algorithm so that reproduction, crossover, and mutation are easily implementable.
For Wednesday 10/09/02:
I found some papers specifically concerning the use of genetic algorithms for database query optimization. Now I just need to read them and see if they give me any ideas. Specifically, it would be useful to see how other people have encoded the query evaluation plans for use by the genetic algorithm--a sting of bits is just cannot possible contain enough information.
For Wednesday 10/16/02
I've read all the papers I found last week, and they have proven quite useful. The article by Bennett, Ferris, and Ioannidis has been especially helpful both in terms of offering possible ways of encoding plans for the genetic algorithm, and in terms of possible ways of implementing the genetic operators.
I'm trying to spend some time this week searching for genetic algorithm libraries. I'm sure that such libraries exist, but I don't know if any will be helpful to me. I'm betting that they will be generalized to the point of not being useful; that is to say, it would actually be easier for me to build the genetic operators from the ground up. Still, I think it will be useful to at least look around.
The specific focus of my project is starting to come together (which is a good thing, seeing as the project proposal is due in a couple of weeks), and I think this weekend-long break will give me some time to really compose and solidify what my objectives are and how I will achieve them.
For Wednesday 10/23/02
I've looked for and found several genetic algorithm libraries for C and C++ and it seems that the best one is GAlib. This library looks like it is really powerful and probably relatively easy to use, but nevertheless I've decided to abandon the whole library route and just tey building my GA from the gound up. My reasoning is that I know I'll need to spend lots of time fiddling with the internals of the GA in order to fine tune it, and I think it would be best if I was the one that actually who implemented (and so had an intimate knowledge of the effective knobs and dials) the algorithm. I'm sure I could fine-tune my algorithm even if I used GAlib, but my was seems more direct and makes me more comfortable.
I've been trying to get to work on setting up the projecrt environment that I will work in, but lots of little things keep getting in my way--little things that seem to be out of my control. Things such as where to park my coderoot, where to set up the PostgreSQL instance, enabling me with the correct permessions in the correct places. Its kind of frustrating, but I know it will all get sorted out eventually.
For Wednesday 10/30/02
Now that jumped through the necessary hoops, my major goal so far this week has been to set up and configure an appropriate environment in which to test and implement my project. This has involved setting up a PostgreSQL instance and configuring it to fit my needs (i.e. outputting the parse tree and evaluation plan for each query, enabling TCP/IP connections to the postmaster, etc).
I've also been working on writing a little PERL script that will enable me to quickly create or delete numerous tables in my postgres database. I've got the skeleton for this script working, and for the next phase I need to consider how I am going to configure the schema in terms of the key relationships and the indexes. I also need to think about what kind of data I should insert, and how I should generate such data. Once this part is complete, I should be able to easily generate flexible and useful test data. When I get to the final stages of my project, however, I may want to spend some time developing (or finding someone else's) realstic schemas and data sets.
Finally, I need to stop procrastinating and just get to work on my proposal (both the write-up and the presentation). I've been thinking about the necessary components for weeks now, so I've got lots of stuff swimming 'round my head, but I need to take some time to make these ideas concrete and flesh them out. Hopefully the presentations this week will give me a good idea of how I should focus my ideas.
For Wednesday 11/6/02
I've been spending a lot of time this week working on my written project proposal and the presentation as well. It has been very constructive to put this information down concretely--especially in terms of how I will implement my GA encoding and how the GA operators will interact with the encoded representation.
This week I've also really put some time into creating a script that will help to manage the test data that I need to maintain. Creating useable schemas with over 20 relations, and then being able to fill them with pertinent data is not a trivial problem. Hopefully this script will give me some flexibility in this regard--especially when I start testing, when I need to be able to do some fine-tuning with the data.
Decoding the PostgreSQL plan syntax has also been one of the focuses this week. This also is not a trivial task, since the syntax is very comlpex and arcane. I've basically been feeding ad hoc queries into PostgreSQL and examining the plan output, hoping to be able to find (and later parse out) the information I need. It's a painstaking process.
For Wednesday 11/13/02
My presentation went well, I think, and I'm happy with the way my paper turned out. Now I just need to get down to business and start really getting work done.
I've added some new links to my proposal-related documents; they're on the main project page.
I talked with Charlie recently, and he suggested that I may want to change how I represent the query plan in terms of the GA. Specifically, my current way may be too complex to be able to properly assign cost to the plans. I'd really like to use the encoding I developed for my project proposal, but I need to think about how cost/fitness will work in terms of this method.