
Duplicate Items in Gregarius
The problem with duplicate items showing up in Gregarius, which was troubling me earlier last month, has started again. As I mentioned on the Gregarius developers' mailing list, the problem is ridiculously hard to debug, since it occurs in a seemingly random fashion and is only happening to me.

Part of me thinks that this is one of the worst possible times for this to happen. Since I should be packing right now (or more accurately, re-packing, since everything is already in boxes - it is just not sorted into things I want to take back to school and those that will moulder here until next June), I do not really have time to mess with the Gregarius code to see if I can find the actual cause of the problem. Even if I did have about 6 hours to spare, I think it is exceedingly likely that I would find that there is no problem in Gregarius, but that my MySQL server is at fault.

Another part of me thinks this is one of the best things that could happen. Since this problem changes my use of Gregarius from a thoroughly enjoyable experience to one that is filled with annoyances (my inner perfectionist insists that I must manually remove all duplicate items from my Gregarius database[1]), this means that I will pack instead of using Gregarius. This is a resolution that my inner Luddite-hermit rejoices at, because packing means I will spend more time communing with myself instead of following your stories of the trips you are taking to concerts, Broadway musicals, Taiwan, and gas stations with ridiculously good tire pumps.

Update: I forgot to mention that the first time the problem occurred, it stopped when I emptied my MagpieRSS cache. The second time, it disappeared after I thought I had stopped MagpieRSS from caching pages (which is bad because it wastes bandwidth, so you should not do it), but I later realized that I had simply moved the caching location. I am not sure whether this means the problem is with MagpieRSS, though, since it does not seem to be fixed when I empty the MagpieRSS cache by removing all of the items in it.

[1] It occurs to me that I could write a plugin that provides a link at the bottom of each item to delete it. It also occurs to me that if "being productive" means packing, and not working on Gregarius, this is a Bad Thing.


4 Comments

Comment by Sameer
2005-09-01 10:31:16

Ah I feel your pain. It should be possible to write an SQL query to get rid of duplicates. A Google search brings up some examples.

BTW, I have this problem in Safari RSS as well. It is quite random. Could it possibly be because some feeds are screwing up things?
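The SQL query suggested above could look roughly like the following sketch. It is only an illustration: the table name `item` and the columns `id`, `cid` (feed), `url`, and `title` are assumptions, not Gregarius's confirmed schema, and the demo runs against an in-memory SQLite database rather than MySQL. MySQL has historically rejected a DELETE whose subquery reads from the table being deleted from, so the statement may need rewriting there.

```python
import sqlite3


def delete_duplicate_items(conn):
    """Delete rows sharing a (cid, url) pair, keeping the lowest id of each.

    The `item` table and its columns are hypothetical stand-ins.
    """
    cur = conn.execute(
        """
        DELETE FROM item
        WHERE id NOT IN (
            SELECT MIN(id) FROM item GROUP BY cid, url
        )
        """
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


def demo():
    # Build a throwaway database with one duplicated item in feed 1.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, cid INTEGER, url TEXT, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO item (cid, url, title) VALUES (?, ?, ?)",
        [
            (1, "http://example.com/a", "A"),
            (1, "http://example.com/a", "A"),  # duplicate of the row above
            (1, "http://example.com/b", "B"),
            (2, "http://example.com/a", "A"),  # same URL, different feed
        ],
    )
    removed = delete_duplicate_items(conn)
    remaining = [row[0] for row in conn.execute("SELECT id FROM item ORDER BY id")]
    return removed, remaining
```

Grouping by both feed and URL keeps an item that legitimately appears in two different feeds. Back up the database before running anything like this against real data.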

Comment by Martey
2005-09-02 09:53:41

I thought the problem might be with the feeds at first, but whenever I look at them, they seem fine (no duplicate items). Of course, it is possible - I should probably insert a bunch of logging code into my copy of Gregarius in order to properly debug this.

 
 
Comment by Zhasper
2005-09-04 22:28:54

Actually, I've been seeing this too.

I've been assuming it's just my fault, though. It happens reliably with certain feeds - e.g., any bugs on Trac cause two entries in Gregarius.

I've been assuming I've subscribed to the same feed twice somehow. I should check into that, I guess.

I believe that if the title changes length, Gregarius will treat it as a new post - maybe people are removing whitespace from the end of a title and thus changing the length?

Comment by Martey
2005-09-05 00:14:27

I know that the Trac feeds generate duplicate items on a regular basis - some will have URLs in the form of http://svn.gregarius.net/projects/gregarius/trac/ while others use http://svn.gregarius.net/trac/. Since Gregarius checks for duplicate items by checking to see if any older items in the feed have the same URL, it does not detect these duplicates.
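To make the Trac case concrete, here is a minimal sketch of a URL-equality duplicate check and why the two URL forms slip past it. This is not Gregarius's actual code: the function names are hypothetical, and the `ticket/1` path is made up for the example. Only the two URL prefixes come from the feeds described above.

```python
LONG_PREFIX = "http://svn.gregarius.net/projects/gregarius/trac/"
SHORT_PREFIX = "http://svn.gregarius.net/trac/"


def is_duplicate(new_url, seen_urls):
    """Treat an item as a duplicate only if its exact URL was seen before."""
    return new_url in seen_urls


def normalize_trac_url(url):
    """Hypothetical fix: map the long Trac prefix onto the short one."""
    if url.startswith(LONG_PREFIX):
        return SHORT_PREFIX + url[len(LONG_PREFIX):]
    return url


# An item already stored under the long form of the URL...
seen = {LONG_PREFIX + "ticket/1"}
# ...and the same item arriving later under the short form.
incoming = SHORT_PREFIX + "ticket/1"

# Plain URL comparison misses the duplicate.
missed = not is_duplicate(incoming, seen)

# Comparing normalized URLs catches it.
caught = is_duplicate(
    normalize_trac_url(incoming),
    {normalize_trac_url(u) for u in seen},
)
```

A prefix rewrite like this only handles the one feed; it does nothing for the seemingly random duplicates discussed next.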

The duplicates that I have been seeing do not seem to be the result of publishers correcting errors or making changes in their feeds. The duplicate items appear next to each other in the MySQL table (when ordered by ID), suggesting that they were inserted one after the other. As a result, I suspect that either certain feeds are occasionally being parsed incorrectly, resulting in single items being parsed twice, or certain edge conditions are causing items to be inserted into the database twice. I worry that if the latter is the case, the problem may not be Gregarius at all, but my MySQL server.
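One way to check the "inserted one after the other" observation is to scan rows in ID order and flag neighbours with the same URL. The sketch below assumes each row is an `(id, url)` pair pulled from the items table; that shape and the function name are illustrative, not taken from Gregarius.

```python
def adjacent_duplicates(rows):
    """Return (earlier_id, later_id) pairs for neighbouring rows that share a URL.

    `rows` is an iterable of (id, url) tuples in any order.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    pairs = []
    # Compare each row with the one immediately after it in ID order.
    for (prev_id, prev_url), (cur_id, cur_url) in zip(ordered, ordered[1:]):
        if prev_url == cur_url:
            pairs.append((prev_id, cur_id))
    return pairs


sample = [(3, "b"), (1, "a"), (2, "a"), (4, "c"), (5, "b")]
found = adjacent_duplicates(sample)
```

In this sample, rows 1 and 2 are flagged, while rows 3 and 5 share a URL but are not neighbours, so they are not. A high share of adjacent pairs among all duplicates would be consistent with a double insert during a single update rather than a feed changing over time.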

Sameer and I both use Dreamhost, and we both started seeing the problems after our servers were upgraded to Sarge. Since the only configuration change we have found so far was a change in the mbstring PHP module, it is possible that this is some weird string handling problem.

 
 