1.Preface
In China, the "Pirate Party" is about as popular as the Communist Party: movies, books, and TV series are easy to get for free and, for various reasons, with little risk.
There are many ways to download them (HTTP, FTP, BT, ed2k) and many tools that speak those protocols. The hard part is finding the links.
BT and eMule are the most popular, because they work over P2P (peer-to-peer: a distributed computing or networking architecture that partitions tasks or workloads among peers).
DHT (distributed hash table) is a more advanced and very interesting architecture, with two key features:
a, distributed: every peer takes a small share of the work, and together they form a network far bigger than any single participant
b, decentralized: there is no central server in charge (no boss, democracy)
2.Magnet Search Engine
A search engine has three parts:
a, data
This is the key module: you have to own the magnet links first.
One way is to collect them from general search engines such as Google or Baidu.
A better way is to gather them with a DHT crawler, described in section 3.
b, index & weight
Build an index so that later searches are fast.
Give every link a weight, which is used for ranking.
c, search & rank
Serve keyword searches, then rank and post-process the results: remove duplicates, rank certain preferred formats higher, and so on.
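To make the post-processing step concrete, here is a minimal Python sketch of "making unique" and ranking by weight. The record layout (`info_hash`, `name`, `weight`) and the function name `postprocess` are assumptions for illustration, not taken from the crawler's code.

```python
def postprocess(results):
    """Remove duplicate links and sort the rest by weight, highest first.

    `results` is a list of dicts with the hypothetical keys
    'info_hash', 'name' and 'weight'.
    """
    best = {}
    for record in results:
        # Hex info hashes are case-insensitive, so normalise before comparing.
        key = record["info_hash"].lower()
        # Keep only the highest-weight record for each info hash.
        if key not in best or record["weight"] > best[key]["weight"]:
            best[key] = record
    return sorted(best.values(), key=lambda r: r["weight"], reverse=True)
```

Deduplicating on the info hash rather than on the name means that two differently named entries for the same torrent collapse into one result.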
3, Data Module - DHT Crawler
It works in three steps: getting the ID, the details, and the file.
a, get the source id
Following the DHT protocol, you can pose as a P2P client and collect info hashes from the “get_peers” messages that peers send you.
Full code: https://github.com/arekyao/dht-crawler
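As a rough illustration of this step (in Python, and not the code from the repository above), the sketch below decodes one incoming DHT message and pulls out the info hash if it is a “get_peers” query. DHT messages are bencoded dictionaries; the helper names `bdecode` and `info_hash_from_query` are made up for this example, and the UDP socket handling is left out.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at `pos`; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                       # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                       # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                       # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            result[key], pos = bdecode(data, pos)
        return result, pos + 1
    colon = data.index(b":", pos)          # byte string: <length>:<bytes>
    length = int(data[pos:colon])
    start = colon + 1
    return data[start:start + length], start + length


def info_hash_from_query(datagram: bytes):
    """Return the hex info hash carried by a get_peers query, else None."""
    try:
        msg, _ = bdecode(datagram)
    except (ValueError, IndexError, TypeError, RecursionError):
        return None                        # not valid bencode, ignore it
    if not isinstance(msg, dict) or msg.get(b"y") != b"q":
        return None                        # not a query
    if msg.get(b"q") != b"get_peers":
        return None                        # some other query type
    args = msg.get(b"a")
    info_hash = args.get(b"info_hash") if isinstance(args, dict) else None
    if isinstance(info_hash, bytes) and len(info_hash) == 20:
        return info_hash.hex()
    return None
```

A crawler would call `info_hash_from_query` on every datagram it receives and store each non-`None` result as a candidate magnet link.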
b, get the detail info, such as whether the resource is available
Send an “announce_peer” message to peers to announce that you are downloading the resource.
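For reference, this is roughly what an “announce_peer” query looks like on the wire. The Python sketch below is an assumption-laden example, not the author's code: `bencode` and `announce_peer_query` are illustrative names, and the `token` argument is assumed to be the opaque value that the same peer returned in an earlier “get_peers” response, as the DHT protocol requires.

```python
def bencode(value) -> bytes:
    """Encode ints, bytes, str, lists and dicts in bencode format."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        pairs = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        pairs.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def announce_peer_query(node_id: bytes, info_hash: bytes, port: int,
                        token: bytes, transaction_id: bytes = b"aa") -> bytes:
    """Build the UDP payload for an announce_peer query."""
    return bencode({
        b"t": transaction_id,              # echoed back in the response
        b"y": b"q",                        # message type: query
        b"q": b"announce_peer",
        b"a": {
            b"id": node_id,                # our own 20-byte node ID
            b"info_hash": info_hash,       # 20-byte ID of the resource
            b"port": port,                 # port we claim to download on
            b"token": token,               # from a previous get_peers reply
        },
    })
```

The returned bytes would be sent as a single UDP datagram to the peer's address.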
c, get the torrent file and the description
With the info hash, you can easily fetch the torrent file from a cache site like this:
“http://torrage.com/torrent/” ++ MagHash ++ “.torrent”
“https://zoink.it/torrent/” ++ MagHash ++ “.torrent”
http://bt.box.n0808.com
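The same URL construction in Python might look like the sketch below. The two base URLs are the ones listed above; the function name `torrent_cache_urls` and the choice to upper-case the hash are assumptions for this example, and whether these hosts still answer is not something the sketch checks.

```python
# Cache sites taken from the text above.
CACHE_BASES = [
    "http://torrage.com/torrent/",
    "https://zoink.it/torrent/",
]


def torrent_cache_urls(info_hash_hex: str) -> list:
    """Return one candidate .torrent URL per cache site for a hex info hash."""
    # Assumption: the cache sites expect the hash in upper-case hex.
    mag_hash = info_hash_hex.strip().upper()
    if len(mag_hash) != 40 or any(c not in "0123456789ABCDEF" for c in mag_hash):
        raise ValueError("expected a 40-character hex info hash")
    return [base + mag_hash + ".torrent" for base in CACHE_BASES]
```

A crawler could try each URL in turn and keep the first download that decodes as a valid torrent file.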
Yep, now you have the torrent files. Decode them and you can get more information.
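As a sketch of what "more information" means, the Python function below summarises a torrent that has already been bdecoded into a dictionary with byte-string keys. The function name `summarize_torrent` is made up for this example; the keys it reads (`info`, `name`, `length`, `files`, `path`) are the standard torrent metadata fields.

```python
def summarize_torrent(meta: dict) -> dict:
    """Extract the name, file list and total size from decoded torrent metadata."""
    info = meta[b"info"]
    name = info[b"name"].decode("utf-8", errors="replace")
    if b"files" in info:
        # Multi-file torrent: each entry has a path (list of parts) and a length.
        files = [
            ("/".join(part.decode("utf-8", errors="replace") for part in entry[b"path"]),
             entry[b"length"])
            for entry in info[b"files"]
        ]
    else:
        # Single-file torrent: the name is the file name.
        files = [(name, info[b"length"])]
    return {
        "name": name,
        "files": files,
        "total_size": sum(size for _, size in files),
    }
```

The name and file paths are the text a search engine would index, and the total size is a natural field to show next to each result.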
4, Index module & Rank module
These are typical search engine modules.
It is easy to find plenty of information about them on the Internet.
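For readers who have not seen one, a very small inverted index with weight-based ranking can be sketched in Python as follows. Everything here is illustrative: the names `tokenize`, `build_index` and `search` are invented, and the weight is assumed to be any number the crawler assigns (for example, how often an info hash was seen).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(torrents: dict) -> dict:
    """Map each token to the set of info hashes whose name contains it.

    `torrents` maps info_hash -> (name, weight).
    """
    index = defaultdict(set)
    for info_hash, (name, _weight) in torrents.items():
        for token in tokenize(name):
            index[token].add(info_hash)
    return index


def search(query: str, index: dict, torrents: dict) -> list:
    """Return info hashes matching every query token, heaviest first."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # A result must contain all query tokens (AND semantics).
    hits = set.intersection(*(index.get(token, set()) for token in tokens))
    # Sort by descending weight, then by hash for a stable order.
    return sorted(hits, key=lambda h: (-torrents[h][1], h))
```

A real engine would add things like stemming, Chinese word segmentation and on-disk storage, but the index-then-rank shape stays the same.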
5, About DHT protocol
Four query operations:
- ping: check whether a node is alive
- find_node: find the nodes close to a given ID
- get_peers: look up the peers for a resource by its ID (the info hash)
- announce_peer: announce that you are downloading a resource
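The four queries differ mainly in the arguments they carry. The Python sketch below lists the required argument names for each query and checks an already-decoded message against them; `REQUIRED_ARGS` and `check_query` are names invented for this example, and the message is assumed to be a dictionary with byte-string keys as produced by a bencode decoder.

```python
# Required argument names for each query type.
REQUIRED_ARGS = {
    b"ping": {b"id"},
    b"find_node": {b"id", b"target"},
    b"get_peers": {b"id", b"info_hash"},
    b"announce_peer": {b"id", b"info_hash", b"port", b"token"},
}


def check_query(msg: dict):
    """Return the query name if `msg` is a well-formed query, else None."""
    if msg.get(b"y") != b"q":
        return None                        # responses and errors are not queries
    name = msg.get(b"q")
    args = msg.get(b"a")
    if not isinstance(name, bytes) or not isinstance(args, dict):
        return None
    required = REQUIRED_ARGS.get(name)
    if required is None or not required.issubset(args):
        return None                        # unknown query or missing arguments
    return name.decode("ascii")
```

A crawler can use a check like this to drop malformed traffic before deciding how to answer each query.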