Friday, December 30, 2005

Going ahead with replication and GIFT/SIFT routing

My goal is to finish my PhD by March 31st. No excuses. Whatever it takes, finish it FAST...
--------------------------------------------------------------------------------------------------------------------------------------

I think I have got replication mostly worked out. The main things to do are:

1. We need to mangle the rehashed document ID so that the new node it maps to falls in the same group !! This is important. So we basically mangle all the bits except the group number (see the sketch after this list).

2. Test by inserting more documents than just the Slashdot stuff. Insert, let's say, up to 350 documents and then see what happens !!

3. Need to change the reply system so that it also looks at the replica tables when returning replies....
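A minimal sketch of the bit-mangling from item 1, assuming 32-bit IDs whose low bits encode the group number; GROUP_BITS, the mixing hash, and replicaId are my hypothetical names, not the simulator's:

#include <cstdint>
#include <functional>

// Assumption: the low GROUP_BITS bits of a document ID encode its group.
const int GROUP_BITS = 6;
const uint32_t GROUP_MASK = (1u << GROUP_BITS) - 1;

// k-th replica ID: mangle every bit except the group number, so the
// rehashed ID is still owned by a node in the same group.
uint32_t replicaId(uint32_t docId, int k)
{
    uint64_t mixed = (static_cast<uint64_t>(docId) << 8) | static_cast<uint8_t>(k);
    uint32_t mangled = static_cast<uint32_t>(std::hash<uint64_t>{}(mixed));
    return (mangled & ~GROUP_MASK) | (docId & GROUP_MASK);
}

Any mixing function works here as long as it is deterministic in (docId, k); the final OR is what pins the replica to the original group.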

-----------GIFT routing-----------

After that, add an automatic GIFT update capability based on caching...

1. When doing normal node routing, insert an answer bit along with the home node IP. This way, every time node routing completes, the home node returns its address to the initial node so that it can add it to its GIFT table !!

2. After that, whenever we need to do a keyword search or store, we can directly use GIFT routing to go to the right group and then jump to the right node in the group (sketch below)...
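A rough sketch of the cached table, assuming routing replies carry the answer bit plus the home node's IP; GiftTable, learn, and shortcut are hypothetical names, not the actual simulator API:

#include <cstdint>
#include <string>
#include <unordered_map>

struct GiftTable {
    std::unordered_map<uint32_t, std::string> groupToNode;  // group -> node IP

    // Called whenever a routing reply comes back with the home node's address.
    void learn(uint32_t group, const std::string& homeNodeIp) {
        groupToNode[group] = homeNodeIp;
    }

    // Two-phase lookup: jump straight to a cached node in the target group
    // if we have one; otherwise the caller falls back to normal routing.
    bool shortcut(uint32_t group, std::string& nodeIpOut) const {
        auto it = groupToNode.find(group);
        if (it == groupToNode.end()) return false;
        nodeIpOut = it->second;
        return true;
    }
};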


------------------------

Is the replication period the same for all nodes ??? It doesn't make sense to replicate if not much new information has arrived or few visits have happened... Keep a visit counter and trigger replication based on the number of visits to the node.... Formula => visits =

OPTIMIZATION: Adaptive replication: each node has a different replication schedule based on the number of updates to the node....
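One possible shape for this, assuming simple per-node counters and thresholds (all names and the trigger rule are my assumptions):

// Replicate when enough visits or updates have accumulated, instead of on
// a fixed global period; busy nodes replicate more often, idle ones less.
struct ReplicationTrigger {
    int visitsSinceReplication  = 0;
    int updatesSinceReplication = 0;
    int visitThreshold;    // tuned per node
    int updateThreshold;   // tuned per node

    void onVisit()  { ++visitsSinceReplication; }
    void onUpdate() { ++updatesSinceReplication; }

    bool shouldReplicate() const {
        return visitsSinceReplication  >= visitThreshold ||
               updatesSinceReplication >= updateThreshold;
    }
    void reset() { visitsSinceReplication = updatesSinceReplication = 0; }
};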

----------
Testing for replication

Wednesday, December 28, 2005

Log: Complete replication in Magnolia

Main ToDo for today:
Till 10 PM.

1. Replication:
Each node periodically (the period depends on the failure rate, I guess, or the rate of updates to documents/traffic) copies a chunk of its documents, rehashes their IDs (gets a 2nd/3rd etc. hash ID), and just sends all the document information to the node in the group corresponding to that hash ID (sketch below).
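A sketch of one such round; kthHash and sendToResponsibleNode are stand-ins for the simulator's actual routines, and the chunk-size rule is the sibling-table-based one mentioned below:

#include <algorithm>
#include <cstdint>
#include <vector>

struct Document { uint32_t id; /* keywords, payload, ... */ };

// Assumed simulator hooks (hypothetical signatures):
uint32_t kthHash(uint32_t docId, int k);                     // k-th replica ID
void sendToResponsibleNode(uint32_t id, const Document& d);  // intra-group routing

// One round: take the next chunk of local documents, compute the k-th hash
// ID for each, and ship the document info to the node owning that ID.
void replicationRound(const std::vector<Document>& local, size_t siblingTableSize,
                      size_t& cursor, int k)
{
    if (local.empty()) return;
    size_t chunk = std::max<size_t>(1, local.size() / (siblingTableSize + 1));
    for (size_t i = 0; i < chunk; ++i) {
        const Document& doc = local[cursor % local.size()];
        sendToResponsibleNode(kthHash(doc.id, k), doc);
        ++cursor;
    }
}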

The issue is how to compute the hash IDs: do we do it individually for each document? That would be quite expensive.

It is important to tie the replication to the documentId, not to the node, because the node that stores the document can change frequently.

Each node should have its own independent replication period, which changes according to the number of updates/traffic and the failure rate. Eventually...

* Implemented the replication function, where the chunk size dynamically depends on the sibling table size....
* Now need to test it: start its events and see what happens, how the replication goes...


List of parameters (more):
1. Query sibling size: how many siblings to send the query to for each forwarded query....



2. GIFT/SIFT



----------------------LONG TERM------------------------
1. Finger table generation is currently centralized. Not acceptable !! Need a join function with finger table entry updates (sketch below).
For the paper it may not matter, but it is important for the final implementation....
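For reference, a Chord-style join sketch (not Magnolia's actual code; findSuccessor is an assumed routing primitive): the joining node fills each finger entry by routing through any existing node, so no centralized generation step is needed:

#include <cstdint>
#include <vector>

const int ID_BITS = 32;

struct Node;
Node* findSuccessor(Node* via, uint32_t id);  // route through an existing node

struct Node {
    uint32_t id;
    std::vector<Node*> finger;  // finger[i] = successor(id + 2^i)

    void join(Node* bootstrap) {
        finger.assign(ID_BITS, nullptr);
        for (int i = 0; i < ID_BITS; ++i) {
            uint32_t start = id + (1u << i);  // wraps mod 2^32
            finger[i] = findSuccessor(bootstrap, start);
        }
        // Periodic stabilization then fixes other nodes' fingers so they
        // eventually point at the new node.
    }
};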

Tuesday, December 27, 2005

Working on replication aspects

ToDo for today
1. Fix the bugs in document placement.
2. After the bugs are fixed, start replicating documents from the local node to the next replication node, based on popularity....


-------------------------------------
The current problem I am facing is that I am getting separate document keys for the same document; something is going really weird here.
What are the possible problems:
1. I am not reinitializing the seed whenever I get the random value for the document. That must be the issue there... Fix that...
Implemented a Push/Pop Mother stack (sketch after this list).
Still debugging.

2. Fixed the problem of different assignments to the same document for the same group: the sibling table formation for each node was not complete....
3. Fixed another bug: when sending to other groups for all keywords of the current document, need to exclude my own group !!
4. New bug: global popularity update issue: why is it updating different document IDs for the same document ??
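The Push/Pop Mother stack from item 1, in miniature; the MotherState layout and MotherRng interface are assumed stand-ins for the real Mother-of-all generator:

#include <stack>

struct MotherState { unsigned long m[5]; };  // assumed state layout

struct MotherRng {
    MotherState state;
    MotherState save() const           { return state; }
    void restore(const MotherState& s) { state = s; }
    unsigned long next();               // one Mother-of-all step (not shown)
};

// Save/restore the generator around document-key computation, so drawing a
// value for one document cannot perturb the global random stream.
struct MotherStack {
    MotherRng& rng;
    std::stack<MotherState> saved;
    explicit MotherStack(MotherRng& r) : rng(r) {}
    void push() { saved.push(rng.save()); }
    void pop()  { rng.restore(saved.top()); saved.pop(); }
};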

Things for later:
-------------------
Search for deals on Dell,
or mail Peter to ask if he has an extra machine, mainly for internet access.




Friday, December 23, 2005

What are the main things left now?

How do we do the visits?

One idea is to just visit one of the keyword groups. Then that keyword group decides whether to forward the query further to the global node or to other keyword groups; that is its choice (sketch below).... Till 1:30 today.
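A sketch of that policy, with assumed hooks for the local check and the two forwarding paths (none of these names are the simulator's):

#include <string>
#include <vector>

struct Query { std::vector<std::string> keywords; };

// Assumed hooks:
bool answerLocally(const Query& q);        // check local + replica tables
void forwardToGlobalNode(const Query& q);  // escalate to the global node
void forwardToGroup(const std::string& keyword, const Query& q);

// The querier visits one keyword group; that group alone decides whether
// to answer from what it has or to forward the query further.
void handleVisit(const Query& q) {
    if (answerLocally(q)) return;
    forwardToGlobalNode(q);
    for (const std::string& kw : q.keywords)
        forwardToGroup(kw, q);
}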

I am facing two bugs here:
1. First of all, choosing a random keyword for first-time storage somehow always chooses the same keyword! ("Stuff")
2. For the same keyword and document title, the store request goes to a different node in the relevant group for that keyword.

-----------------

The issue I am facing now: why doesn't the random number generator work? Is there a memory leak somewhere ??

Problem resolved: in the keyword hash function I was reinitializing the Mother random generator each time, which caused it to be re-seeded on every call !! Subtle bug (sketch below).
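The bug and the fix in miniature, with a placeholder MotherRng standing in for the real generator (the step function here is not the actual Mother-of-all recurrence):

#include <cstdint>
#include <functional>
#include <string>

struct MotherRng {
    uint32_t state;
    explicit MotherRng(uint32_t seed) : state(seed) {}
    void reseed(uint32_t seed) { state = seed; }
    uint32_t next() { state = state * 2891336453u + 1u; return state; }  // placeholder step
};

MotherRng g_rng(12345);  // shared simulator-wide random stream

// BUG: reseeding the shared generator inside the hash resets the global
// random stream on every call, so later "random" draws (e.g. picking a
// random keyword) keep coming out the same.
uint32_t keywordHashBuggy(const std::string& kw) {
    g_rng.reseed(static_cast<uint32_t>(std::hash<std::string>{}(kw)));
    return g_rng.next();
}

// FIX: give the hash its own throwaway generator and leave the shared
// stream untouched.
uint32_t keywordHashFixed(const std::string& kw) {
    MotherRng local(static_cast<uint32_t>(std::hash<std::string>{}(kw)));
    return local.next();
}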

--------------
The main steps for today are:
1. Check if the routing to the home node and other keyword group nodes works when a keyword is inserted for the first time for a particular document.
2. After that, add the replication aspect for each node's chunk of data: depending on the popularity of the document, each keyword group replicates the document within its group... periodic replication (sketch below)...
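A possible shape for the popularity-driven pass; the threshold rule, the replica-count formula, and sendReplicaIntoGroup are all my assumptions:

#include <cstdint>
#include <vector>

struct DocEntry { uint32_t id; double popularity; };

void sendReplicaIntoGroup(uint32_t docId, int replicaIndex);  // assumed hook

// Periodic pass: replicate a document inside its keyword group only when
// its popularity clears the threshold; hotter documents get more replicas.
void popularityReplicationPass(const std::vector<DocEntry>& docs, double threshold)
{
    for (const DocEntry& d : docs) {
        int replicas = static_cast<int>(d.popularity / threshold);
        for (int k = 1; k <= replicas; ++k)
            sendReplicaIntoGroup(d.id, k);
    }
}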

--------------

The parameters we have in the system are:
1. Size of the group
2. Rates for discovering siblings, updating successors, etc.
3. Replication update rates.
4. How local data is broken up for replication into the local keyword group.
5. The rate at which popularity information is pushed to the global document node and back.
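The same list gathered into one hypothetical struct for quick reference (field names are mine, not the simulator's):

// All tunables in one place, numbered to match the list above.
struct MagnoliaParams {
    int    groupSize;             // 1. size of a group
    double siblingDiscoverRate;   // 2. sibling discovery events per second
    double successorUpdateRate;   // 2. successor update events per second
    double replicationRate;       // 3. replication rounds per second
    int    replicationChunkDocs;  // 4. docs per round into the keyword group
    double popularityPushRate;    // 5. popularity push to/from global node
    int    querySiblingSize;      //    query sibling size (from the earlier list)
};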





Thursday, June 02, 2005

Repository for Magnolia

I have checked in all the stuff.
 
1. The Magnolia repository can be checked out from:
cvs -d :ext:you@skysaw:/home/P2P/CVS_P2P co magnoliapaper
 
It contains:
a. the current paper (explains how Magnolia works, though someone needs to read it to figure out what is clear and what is not),
b. the NSDI proposal,
c. the current simulator code in the code directory,
d. the sketch code in the sketchcode directory.
 
 
2. The search site DoubleTrust (www.doubletrust.net) is also checked into the same repository.
It contains all the HTML, clipart, and CGI files.
 
To check it out:
cvs -d :ext:you@skysaw.cs.northwestern.edu:/home/P2P/CVS_P2P co DoubleTrust
 
See you at the meeting tomorrow!
 
regards
Ashish
 
----- Original Message -----
From: Peter Dinda
To: Ashish Gupta
Sent: Thursday, June 02, 2005 4:11 PM
Subject: Magnolia, etc

Ashish, can you do the following for me:
 
1. put the magnolia nsdi proposal (the one pager) into the p2p repository and tell me where it is.  I'm writing a small proposal to fund both this and dpi and I want to use some of the text.  
 
2. tell me where the in progress magnolia tr is.  We want to give a copy of this to the students tomorrow morning.
 
3. check in your magnolia simulator and the sketches code into the p2p repository.
 
4. check in your search site code---this is very cool - I think it will save a lot of time.
 
Peter
 
