Monday, May 14, 2007

Status: Improving Eclipse Search


Official name: Eclipse search plugin: providing a better, faster, more relevant Eclipse search

May 15th


Training


I'm currently implementing an Eclipse plugin as a project at school, so I have had (and continue to gain) experience with the Eclipse plugin development process alongside our project team. The plugin is about AJAX development: at the moment it mostly deals with manipulating and listening to the DOM document and nodes in the embedded SWT.Mozilla browser, with the goal of a WYSIWYG HTML document designer. I've also read a number of chapters from the book “Eclipse: Building Commercial-Quality Plug-ins” and I'm still reading.


After our initial discussions, Francois gave me the following list of usability ideas to help me familiarize myself with the source code behind Eclipse's search features.


A few ideas for search usability
1) improve Ctrl-J search:
- Ctrl-J : have Ctrl-V paste the current ring in it
- Ctrl-J : if there's a selection, have a second Ctrl-J take this selection as search string
- Ctrl-J followed by Ctrl-R would switch to regex search
- Ctrl-J to global search: transform a Ctrl-J search into a global search. Actually, driving all searches from the keyboard incremental search would be good: there should be a set of keyboard shortcuts on Ctrl-J to set up a complete search, and then execute it globally.
2) Display search results: the view that displays the search results is currently very poor. A tree view is probably not ideal. It should show the matches, for example, and it should have more shortcuts.
3) improve the search dialog. More TBD
4) real search/replace:
- search and replace in multiple files (all files/subset of files)
- selective (based on language elements...)


I had time to check out and browse the source code in Eclipse CVS and to find out which component does what and how it does it. However, I didn't have time to grasp the whole picture.


Problems/ Possible Enhancements to Address


Here is a list of issues from my brief research in Eclipse Bugzilla, covering possible enhancements and problems to address:

1. find/replace search should have a tick-box for "ignore comments" (enhancement request)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=161398

2. Show as package tree (enhancement request)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=160481

3. Store Previous Searches for Startup (enhancement request)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=169252

4. Search dialog "Java Search" Scope should include Hierarchy (WONTFIX bug)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=110252

5. New text search shown line not a great help (LATER)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=127672
(This bug is waiting for applying styles on a tree item.)

6. Result in Table (LATER)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=129185

These two are related:
7. Search shows duplicate results in nested projects (LATER)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=144959

8. Resource exclusion filters
https://bugs.eclipse.org/bugs/show_bug.cgi?id=84988
(read together with: https://bugs.eclipse.org/bugs/show_bug.cgi?id=144959 - about duplicate results for nested folder structures)

9. Search Enhancements
https://bugs.eclipse.org/bugs/show_bug.cgi?id=108223

10. Search in files: see matched lines
https://bugs.eclipse.org/bugs/show_bug.cgi?id=72575


Francois went through the list and commented on it. It seems to me that I can address many of these issues, either directly or indirectly.


Thoughts about the Main Part (Lucene, OpenGrok, etc.)


I tried to stay focused on research at the beginning, so I read about both Lucene and OpenGrok. I now understand how OpenGrok manages to be a great tool for source code browsing and cross-referencing. Since OpenGrok itself makes use of Lucene, Lucene is a building block of this project whether or not I use OpenGrok. I've read a few articles and browsed a few presentations about Lucene and I'm still trying to learn more about it.


I've been repeatedly asking myself questions about index sizes and frequent updates to the index. As a basic solution, I thought of keeping a partitioned index: a project's index would be distributed over multiple Lucene indexes, and each of these index files would be accessed or changed only when necessary. In short, we may call this partitioning the index file.


This certainly has a tradeoff: we pay a cost for not using one index for the whole project. But when a good number of partitions is selected, I think the cost will be small. (I thought of computing the ideal number in the future, like the computation of the ideal bucket size in the file processing literature.)


I thought of this “partitioning” method because an Eclipse user may be quite unhappy if we tell him that we can “linearize” the time to search his files, but that in order to do so he has to keep roughly 50 MB of index in RAM. (I've tried indexing different things and I think this value is a good estimate for large projects. A natural question is why the index has to be kept in RAM rather than written to disk. One answer: the dynamic nature of source code and its frequent updates.)


The partitioning approach gives us one advantage: when you are working on a set of files, only the part of the project index that covers them has to be re-indexed. Since the number of active editors is considerably smaller than the number of files in the project, it is an idea that can work. And since we can keep a relatively small index in RAM, the indexes containing your active file set can be updated quickly.
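To make the partitioning idea a bit more concrete, here is a minimal sketch in plain Java. It is only an illustration under my own assumptions, not a design decision and not Lucene code: the class name PartitionedIndex, the hash-based routing rule in partitionOf and the simple word tokenizer are all hypothetical. Each partition is a tiny in-memory inverted index; updating a file touches exactly one partition, while a search has to ask every partition and merge the hits, which is the cost mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: a project index split into a fixed number of small partitions. */
public class PartitionedIndex {

    /** One partition: term -> files containing it, plus file -> its terms for cheap removal. */
    private static final class Partition {
        final Map<String, Set<String>> postings = new HashMap<>();
        final Map<String, Set<String>> termsByFile = new HashMap<>();
    }

    private final List<Partition> partitions = new ArrayList<>();

    public PartitionedIndex(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("partitionCount must be at least 1");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new Partition());
        }
    }

    /** Assumed routing rule: the hash of the file path picks the partition. */
    public int partitionOf(String path) {
        return Math.floorMod(path.hashCode(), partitions.size());
    }

    /** Re-indexes one file; only the partition that owns the file is modified. */
    public void update(String path, String content) {
        Partition p = partitions.get(partitionOf(path));

        // Remove the postings left over from the previous version of the file.
        Set<String> oldTerms = p.termsByFile.remove(path);
        if (oldTerms != null) {
            for (String term : oldTerms) {
                Set<String> files = p.postings.get(term);
                files.remove(path);
                if (files.isEmpty()) {
                    p.postings.remove(term);
                }
            }
        }

        // Naive tokenizer (an assumption): lower-case, split on non-word characters.
        Set<String> terms = new HashSet<>();
        for (String token : content.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        for (String term : terms) {
            p.postings.computeIfAbsent(term, k -> new HashSet<>()).add(path);
        }
        p.termsByFile.put(path, terms);
    }

    /** A search consults every partition and merges the hits (sorted by path). */
    public Set<String> search(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        Set<String> hits = new TreeSet<>();
        for (Partition p : partitions) {
            Set<String> files = p.postings.get(key);
            if (files != null) {
                hits.addAll(files);
            }
        }
        return hits;
    }
}
```

With something like this, saving an editor would call update(path, newContent) for that one file, and the other partitions would stay untouched. Choosing the number of partitions is exactly the open question: more partitions make updates cheaper but every search has to visit more of them.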


However, I still see other tradeoffs in this approach, and it is just a start. By the way, I'm trying not to focus on algorithmic enhancements for now, and I don't plan to implement this idea in my prototypes during the early days of coding. I plan to start coding by addressing usability enhancements first.


I think early search plugin prototypes will make use of Lucene's features for moving data from RAM to disk and will simply write the index to a file on disk once a certain threshold is reached.
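The threshold idea can be sketched as follows. This is plain Java and deliberately does not use the Lucene API, so it only illustrates the buffering pattern, not how Lucene does it internally. The class name SpillingIndexBuffer, the one-entry-per-line file format and the character-count threshold are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: buffer index entries in RAM and append them to disk past a threshold. */
public class SpillingIndexBuffer {
    private final Path file;
    private final int maxBufferedChars;
    private final List<String> buffer = new ArrayList<>();
    private int bufferedChars = 0;

    public SpillingIndexBuffer(Path file, int maxBufferedChars) {
        this.file = file;
        this.maxBufferedChars = maxBufferedChars;
    }

    /** Keeps the entry in RAM; spills everything to disk once the threshold is reached. */
    public void add(String entry) throws IOException {
        buffer.add(entry);
        bufferedChars += entry.length();
        if (bufferedChars >= maxBufferedChars) {
            flush();
        }
    }

    /** Appends all buffered entries to the file, one per line, and empties the buffer. */
    public void flush() throws IOException {
        if (buffer.isEmpty()) {
            return;
        }
        Files.write(file, buffer, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        buffer.clear();
        bufferedChars = 0;
    }

    /** Number of entries currently held only in RAM. */
    public int bufferedEntries() {
        return buffer.size();
    }
}
```

A prototype would call add for every new index entry and flush once more on shutdown, so that recent changes stay cheap to update in RAM while older data ends up on disk.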


Todo List for Rest of the Interim Period


Well, I didn't have time to update the Wiki page of my project, but I'll do it in the next 1-2 days.

The second thing I feel I have to do before the interim period ends is to implement the small usability ideas from Francois's list.


That's all for now.

If you have related ideas please feel free to send an e-mail. I'll be really glad to hear ideas about usability issues or Lucene / OpenGrok etc.

3 comments:

Fred said...

I started using Google Coop customized search engines that I embed in Eclipse Help as subject searches..

see my list of:

http://shareme.tiddlyspot.com/#DeveloperMashup

I just type test in the search box and, after I modify the resulting URL, put it into the Eclipse Help search's advanced web search settings.

But it's temporary, as it does not allow the other Google operators to search a site/domain, etc.

aka shareme:

http://www.jroller.com/page/shareme/Weblog

firesteel said...

Hey I'm really glad you're looking into this subject. I'm writing code using both Jedit and Eclipse. I could ditch jedit in favor of eclipse, except that it has really good searching. It does a lot of what you mentioned here. The xsearch plugin for Jedit is even better. I heavily use the hyper-search feature. The only improvement I could think of would be to have the search term high-lighted in the hyper-search window.
It would be so sweet to have jedit's xsearch functionality in eclipse. Jedit plugins are written in java, eclipse plugins are written in java, maybe it's not a huge job to someone who knows what they're doing.

AJ said...

btw:
http://marketplace.eclipse.org/content/InstaSearch