~ Conference in Helsinki (T2'05 Conference) ~
Date: September 15-16, 2005; Location: Hilton Helsinki Kalastajatorppa, Kalastajatorpantie 1, Helsinki, Finland




Fravia's talk at the T2'05 Conference
by Fravia+, August 2005, DRAFT version 1.1





ABRIDGED PRESENTATION
The web: Cornucopia of garbage





The title of my own contribution is "The web: bottomless cornucopia & immense garbage dump", and in fact, as we will see, the web is both: a shallow cornucopia of emptiness and a deep mine of jewels, hidden underneath tons of commercial garbage.

This will be a talk about contradictions. I will try to present some effective web-searching techniques that will (should) allow anyone interested to take advantage of some of these very contradictions.

The organizers of this event have chosen to give me two slots: there will therefore be a presentation with a more "general" tone, followed the next day by a more "concrete" searching-oriented workshop.

Most talks at this kind of conference are (correctly) concentrated on some specific aspect. This one will -instead- touch on many different areas of searching lore: I wish to present a BROAD palette of searching techniques.

Let's examine some of the most startling contradictions of the web. It isn't just a matter of curiosity: such findings may give us some clues about future developments, and -as we will see- may even help us to improve our searching skills a little: knowing WHERE TO FIND an answer is tantamount to knowing the answer itself. And today's Internet is a truly huge library without indexes: everything that can be digitized is there, from music to images, from documents to books, from software to confidential memos. It is there indeed, but where?

The stakes are very high: if we learn to search effectively (and to evaluate our findings correctly) the whole of human knowledge will become available, at our command and disposal, no matter where, or how, somebody may have "hidden" our targets.

As you will see, most searching techniques are -a posteriori- very simple. Note for instance how interesting, for searchers, a simple "softwarez" querystring can be:

+StreamDown ("wares" OR "warez" OR "appz" OR "gamez" OR "abandoned" OR "pirate" OR "war3z") ("download" OR "ftp" OR "index of" OR "cracked" OR "release" OR "full") ("nfo" OR "rar" OR "zip" OR "ace")
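
Such combinatorial query strings are easy to generate programmatically. Here is a minimal sketch in Python -my own addition, not part of the query above- that rebuilds the "softwarez" example from plain keyword lists:

# Sketch only: rebuild a boolean "softwarez"-style query from keyword lists.
def or_group(terms):
    """Wrap a list of terms into a parenthesised OR group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

def build_query(required, *groups):
    """Prepend a required term, then append one OR group per keyword family."""
    return " ".join(["+" + required] + [or_group(g) for g in groups])

scene   = ["wares", "warez", "appz", "gamez", "abandoned", "pirate", "war3z"]
access  = ["download", "ftp", "index of", "cracked", "release", "full"]
formats = ["nfo", "rar", "zip", "ace"]
print(build_query("StreamDown", scene, access, formats))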


The web was made for SHARING, not for selling and not for hoarding, so -as we will see- its very "building bricks" deny the commercial vultures the possibility of enslaving parts of it. This is but one of many www-contradictions.

Private and commercial databases are, for instance, mostly open to seekers: here is an interesting list of oracle_default_passwords.

But we do not always have to resort to 'tricks': the 'real' web of knowledge is still alive and kicking, albeit uncomfortably buried underneath the sterile sands of the commercial desert. This is very important for seekers: it means that we have a 'double' edge: we can exploit more or less freely all commercial repositories AND we are able to quickly find the relevant public scientific ones.

A quick look at what the web looks like from a searcher's point of view may prove useful before continuing.
Some points:

Nobody knows how big the web is. Moreover there is an "invisible" (or "deep") web of "hidden databases" and a "visible" (or "surface") web.
The "hidden databases" web is made of dynamic, not persistent, pages.
The content of these searchable databases can only be found through a direct query. Such pages often possess a unique URL that allows them to be retrieved again later, yet without a direct query the database does not publish a specific page (a minimal sketch of such a direct query follows these points).
The "hidden databases" invisible/deep web is supposed to be (potentially) at least 500 TIMES bigger than the visible/surface web, and most researchers believe the visible "surface" bulk to be around 32 billion (milliards) pages, of which less than one half is covered by the main search engines. It is still growing, albeit at a slower pace than some years ago.

All the main search engines TOGETHER cover just LESS THAN ONE HALF (and probably less than a quarter) of the "visible" web, and only scattered pages of the "hidden databases" (depending on the links encountered on the static pages).
This limit is VERY IMPORTANT for searchers: it means that you should use OTHER searching techniques instead of relying on the main search engines alone (or, even worse, on google alone: the main search engines do not overlap that much after all).
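
To show the mechanics that a database's search form hides, here is a minimal sketch of such a direct query in Python; the endpoint and the parameter name below are purely hypothetical placeholders, to be replaced with those of a real "hidden" database:

# Sketch only: a "hidden database" page materialises only when queried directly.
import urllib.parse
import urllib.request

def query_hidden_db(base_url, **params):
    """Build the URL a search form would build, then fetch the resulting page."""
    url = base_url + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as response:
        return url, response.read().decode("utf-8", errors="replace")

# Hypothetical form field "q" on a hypothetical endpoint:
# url, page = query_hidden_db("http://example.org/search", q="reverse engineering")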

However, when searching the web effectively, getting rid of the "commercial noise" is not optional: it is a PRIORITY.

To give an example, you may concentrate your search on non-commercial sites: you may for instance find more relevant signal using specific SCHOLAR search engines (and limiting the query to the most recent months):
ddos "june | july | august | september" +2005   (this is a pretty useful "ddos" query)

However this is all simple "googling": seeking, once more, is NOT done (or only in part) with the main search engines.

In order to understand searching strategies, a lore which you'll find relevant for your security hobbies and for your real life as well, you have to grasp not only what the web looks like, but also how the web-tides move.
First of all, the web is at the same time extremely static AND a quicksand. An oxymoron? No, just another of the many contradictions we will see today.

See: less than one half of the pages available today will still be available next year.
Hence, after a year, about 50% of the content on the Web will be new. The quicksand.
Yet, out of all pages that are still available after one year (one half of the web), half of them (one quarter of the web) have not changed at all during the year. The static aspect.

Those are the "STICKY" pages.
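
A back-of-the-envelope sketch of these figures (the multi-year extrapolation is my own naive compounding, not a measured number):

# Roughly half the pages survive each year; half of the survivors never change.
SURVIVAL_RATE = 0.5   # fraction of pages still reachable after one year
STICKY_SHARE  = 0.5   # fraction of survivors left completely unchanged

for years in range(1, 4):
    alive  = SURVIVAL_RATE ** years
    sticky = alive * STICKY_SHARE
    print(f"after {years} year(s): ~{alive:.0%} of today's pages still reachable, "
          f"~{sticky:.0%} unchanged")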

Given the low rate of web pages' "survival", historical archiving, as performed by the Internet Archive, is of critical importance for enabling long-term access to historical Web content. In fact a significant fraction of pages accessible today will be QUITE difficult to access next year.
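
A minimal sketch of how to reach those vanished pages: the Internet Archive's web.archive.org/web/<timestamp>/<url> scheme redirects to the snapshot closest to the (possibly partial) timestamp you give it:

# Sketch only: turn any URL into its "Wayback" form on the Internet Archive.
def wayback_url(url, timestamp="2005"):
    return f"http://web.archive.org/web/{timestamp}/{url}"

print(wayback_url("http://www.searchlores.org/", "20050101"))
# -> http://web.archive.org/web/20050101/http://www.searchlores.org/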

Another contradiction is the fact that VERY OLD exploits ALWAYS REMAIN VALID and can still be used ("The web is a sticky quicksand").

Some exploit-gathering sites could come in handy when searching: metasploit, ICAT and Common Vulnerabilities and Exposures deserve their links :-)

Simple searching rules

Some simple rules when searching:
1. Always use more than one search engine! "Google alone and you'll never be done!"
2. Always use lowercase queries! "Lowercase just in case"
3. Go regional!
4. Always use MORE searchterms, not only one: "one-two-three-four, and if possible even more!" (5-word searching);


Anonymity (and lack of anonymity) represents one of the most startling contradictions of today's web. ISPs are now bound to keep track of ALL connection logs and emails of all their users, burning them onto DVDs and delivering them at once, for any whimsical reason, to the powers that be.

As you all know, a typical ISP log records EVERYTHING and then some.

Is it the end of anonymity?

Nope. In fact the other side of this very interesting contradiction is to be seen with any wardriving laptop.

WEP encryption is a joke, and anyone using Kismet for GNU-Linux (source code here) or Retina Wi-Fi scanner for Windoze can bypass it pretty quickly.

But there's not even the need to bypass weak WEP-encryptions: you'll find a plethora of completely open access points everywhere. Provided you are a tad careful with your personal data -especially when uploading- and provided you remember that THERE ARE MANY OTHER IDENTIFIERS inside your box -and not only your wifi MAC_address- you may browse the web with some amount of relative anonymity.

The main reason you should use more than one main search engine is that search engines overlap FAR less than you would think.
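
How little? A minimal sketch of measuring the overlap yourself; the two result lists below are hypothetical placeholders for the top-10 URLs you collect, by hand or otherwise, from two different engines for the same query:

# Sketch only: count common URLs and compute the Jaccard overlap of two result sets.
def overlap(results_a, results_b):
    a, b = set(results_a), set(results_b)
    common = a & b
    jaccard = len(common) / len(a | b) if (a or b) else 0.0
    return len(common), jaccard

engine_1 = [f"http://site{i}.example/" for i in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)]
engine_2 = [f"http://site{i}.example/" for i in (3, 7, 11, 12, 13, 14, 15, 16, 17, 18)]
print(overlap(engine_1, engine_2))   # -> (2, 0.111...): only two URLs in common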

Hence the importance of using OTHER METHODS to search the web, and not only the main search engines. Here are some hints (more about these techniques tomorrow):
1) go regional, then go regional again
2) go FTP (a minimal ftplib sketch follows this list)
3) go IRC
4) go USENET/MESSAGEBOARDS/BLOGS (yet remember that blogs are nothing more than messageboards where only the owner can start a thread, this being the main reason -with few exceptions- for their quick obsolescence, short duration and scant utility)
5) use homepages/rings/webarchives, cached repositories
6) use luring & social engineering
7) use stalking & trolling
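
As announced at point 2, a minimal "go FTP" sketch: anonymous login plus a directory listing, using Python's standard ftplib (ftp.gnu.org is just a well-known public host, picked here as an example):

# Sketch only: browse a public FTP server anonymously and list a directory.
from ftplib import FTP

def list_ftp_dir(host, path="/"):
    with FTP(host, timeout=30) as ftp:
        ftp.login()                  # anonymous login
        ftp.cwd(path)
        return ftp.nlst()            # names only; use ftp.retrlines('LIST') for details

# print(list_ftp_dir("ftp.gnu.org", "/gnu"))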

Book searching is quite important for seekers, and in this context "Rapidshare" searches are worth a digression per se.

We will examine various book searching approaches; in any case, at the moment -with books- even banal arrows will deliver whatever you want.

The depth and quantity of information available on the web, once you peel off the stale and useless commercial crusts, is truly staggering. I will present some examples intended to give "a taste" of the depths and currents of the web of knowledge...

The possibility of finding on the web whatever the human race can digitize basically means that the lighthouse keeper, the young kid in central Africa and the yuppie in New York all have access to the same resources: location is now irrelevant for mankind.

Yet remember that there are not only files on the web, but also solutions. These may prove to be more and more important in the near future.

Of course, as a consequence (and as a last contradiction), on such an open web learning to discern CRAP (and learning to reverse advertisers' tricks) will be MORE and MORE important in the future.

Your capacity not to be fooled, to understand the rhetorical tricks, will be of PARAMOUNT importance. So evaluation is the other side of the searching coin.














