The World Wide Web (commonly shortened to the Web)
is a system of interlinked hypertext documents accessed via the
Internet. With a Web browser, one can view Web pages that may contain
text, images, videos, and other multimedia and navigate between them
using hyperlinks. The World Wide Web was created in 1989 by English
scientist Tim Berners-Lee, working at the European Organization for
Nuclear Research (CERN) in Geneva, Switzerland, and released in 1992.
Berners-Lee played an active role in guiding the development of Web
standards (such as the markup languages in which Web pages are
composed), and in more recent years advocated his vision of a Semantic
Web.
Many countries regulate web accessibility as a requirement for web sites.
How it works
Viewing
a Web page on the World Wide Web normally begins either by typing the
URL of the page into a Web browser, or by following a hyperlink to that
page or resource. The Web browser then initiates a series of
communication messages, behind the scenes, in order to fetch and display
it.
First, the server-name portion of the URL is resolved into
an IP address using the global, distributed Internet database known as
the domain name system, or DNS. This IP address is necessary to contact
and send data packets to the Web server.
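The name-resolution step can be sketched in Python with the standard library. This is only an illustration of the idea, not how any particular browser implements it; the helper names `host_from_url` and `resolve_host` are hypothetical, and `resolve_host` relies on the operating system's resolver, which normally consults DNS.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return the server-name portion of a URL (hypothetical helper)."""
    host = urlsplit(url).hostname
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def resolve_host(host: str, port: int = 80) -> list[str]:
    """Ask the system resolver for the IP addresses of a host name."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for info in infos:
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address in textual form.
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling `resolve_host(host_from_url("http://www.example.com/index.html"))` would return whatever addresses the resolver reports for that name at the time of the call.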
The browser then
requests the resource by sending an HTTP request to the Web server at
that particular address. In the case of a typical Web page, the HTML
text of the page is requested first and parsed immediately by the Web
browser, which will then make additional requests for images and any
other files that form a part of the page. Statistics measuring a
website's popularity are usually based on the number of 'page views' or
associated server 'hits', or file requests, which take place.
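The request-and-parse cycle described above can be approximated in a few lines of Python. The sketch below formats a minimal HTTP/1.1 GET request and then scans an HTML document for the images, scripts, and stylesheets a browser would go on to request. The names `build_get_request`, `SubresourceFinder`, and `find_subresources` are invented for this illustration, and real browsers send many more headers and recognize many more kinds of embedded resource.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def build_get_request(url: str) -> bytes:
    """Format a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",  # the Host header is mandatory in HTTP/1.1
        "Connection: close",
        "",  # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


class SubresourceFinder(HTMLParser):
    """Collect URLs of images, scripts, and stylesheets used by a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        ref = None
        if tag in ("img", "script"):
            ref = attr.get("src")
        elif tag == "link":
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                ref = attr.get("href")
        if ref:
            # Relative references are resolved against the page's own URL.
            self.urls.append(urljoin(self.base_url, ref))


def find_subresources(html: str, base_url: str) -> list[str]:
    finder = SubresourceFinder(base_url)
    finder.feed(html)
    return finder.urls
```

Each URL returned by `find_subresources` corresponds to one additional file request, which is why a single page view typically produces several server hits.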
Having
received the required files from the Web server, the browser then
renders the page onto the screen as specified by its HTML, CSS, and
other Web languages. Any images and other resources are incorporated to
produce the on-screen Web page that the user sees.
Most Web pages
will themselves contain hyperlinks to other related pages and perhaps
to downloads, source documents, definitions and other Web resources.
Such a collection of useful, related resources, interconnected via
hypertext links, is what was dubbed a "web" of information. Making it
available on the Internet created what Tim Berners-Lee first called the WorldWideWeb (a term written in CamelCase, subsequently discarded) in 1990.
History
The
underlying ideas of the Web can be traced as far back as 1980, when, at
CERN in Switzerland, Sir Tim Berners-Lee built ENQUIRE (a reference to Enquire Within Upon Everything,
a book he recalled from his youth). While it was rather different from
the system in use today, it contained many of the same core ideas (and
even some of the ideas of Berners-Lee's next project after the World
Wide Web, the Semantic Web).
In March 1989, Berners-Lee wrote a
proposal which referenced ENQUIRE and described a more elaborate
information management system. With help from Robert Cailliau, he
published a more formal proposal for the World Wide Web on November 12,
1990. The proposal was modeled after the Dynatext SGML reader by EBT
(Electronic Book Technology, a spin-off from the Institute for Research
in Information and Scholarship at Brown University), which CERN had
licensed. The Dynatext system, however technically advanced (it was a key
player in the extension of SGML ISO 8879:1986 to Hypermedia within
HyTime), was considered too expensive, and its licensing policy was
considered inappropriate for general HEP (High Energy Physics) community
use: a fee was due for each document and each time a document was changed.
A NeXTcube was used by
Berners-Lee as the world's first Web server and also to write the first
Web browser, WorldWideWeb, in 1990. By Christmas 1990, Berners-Lee had
built all the tools necessary for a working Web: the first Web browser
(which was a Web editor as well), the first Web server, and the first
Web pages which described the project itself.
On August 6, 1991,
he posted a short summary of the World Wide Web project on the
alt.hypertext newsgroup. This date also marked the debut of the Web as a
publicly available service on the Internet.
The first server outside of Europe was created at SLAC in December 1991.
The
crucial underlying concept of hypertext originated with older projects
from the 1960s, such as the Hypertext Editing System (HES) at Brown
University (developed by, among others, Ted Nelson and Andries van Dam),
Ted Nelson's Project Xanadu, and Douglas Engelbart's oN-Line System (NLS).
Both Nelson and Engelbart were in turn inspired by Vannevar Bush's
microfilm-based "memex," which was described in the 1945 essay "As We
May Think."
Berners-Lee's breakthrough was to marry hypertext to the Internet. In his book Weaving The Web, he explains that he had repeatedly suggested that a marriage between the two technologies was possible to members of both
technical communities, but when no one took up his invitation, he
finally tackled the project himself. In the process, he developed a
system of globally unique identifiers for resources on the Web and
elsewhere: the Uniform Resource Identifier.
The World Wide Web
had a number of differences from other hypertext systems that were then
available. The Web required only unidirectional links rather than
bidirectional ones. This made it possible for someone to link to another
resource without action by the owner of that resource. It also
significantly reduced the difficulty of implementing Web servers and
browsers (in comparison to earlier systems), but in turn presented the
chronic problem of link rot. Unlike predecessors such as HyperCard, the
World Wide Web was non-proprietary, making it possible to develop
servers and clients independently and to add extensions without
licensing restrictions.
On April 30, 1993, CERN announced that
the World Wide Web would be free to anyone, with no fees due. Coming two
months after the announcement that the Gopher protocol was no longer
free to use, this produced a rapid shift away from Gopher and towards
the Web. An early popular Web browser was ViolaWWW, which was patterned
after HyperCard.
Scholars generally agree, however, that the turning
point for the World Wide Web began with the introduction of the Mosaic
Web browser in 1993, a graphical browser developed by a team at the
National Center for Supercomputing Applications at the University of
Illinois at Urbana-Champaign (NCSA-UIUC), led by Marc Andreessen.
Funding for Mosaic came from the High-Performance Computing and Communications Initiative, a funding program initiated by the High Performance Computing and Communication Act of 1991,
one of several computing developments initiated by Senator Al Gore.
Prior to the release of Mosaic, graphics were not commonly mixed with
text in Web pages, and the Web was less popular than older protocols in
use over the Internet, such as Gopher and Wide Area Information Servers
(WAIS). Mosaic's graphical user interface allowed the Web to become, by
far, the most popular Internet protocol.
The World Wide Web
Consortium (W3C) was founded by Tim Berners-Lee after he left the
European Organization for Nuclear Research (CERN) in October 1994. It
was founded at the Massachusetts Institute of Technology Laboratory for
Computer Science (MIT/LCS) with support from the Defense Advanced
Research Projects Agency (DARPA)—which had pioneered the Internet—and
the European Commission.
Standards
Many
formal standards and other technical specifications define the
operation of different aspects of the World Wide Web, the Internet, and
computer information exchange. Many of the documents are the work of the
World Wide Web Consortium (W3C), headed by Berners-Lee, but some are
produced by the Internet Engineering Task Force (IETF) and other
organizations.
Usually, when Web standards are discussed, the following publications are seen as foundational:
- Recommendations for markup languages, especially HTML and XHTML, from the W3C. These define the structure and interpretation of hypertext documents.
- Recommendations for stylesheets, especially CSS, from the W3C.
- Standards for ECMAScript (usually in the form of JavaScript), from Ecma International.
- Recommendations for the Document Object Model, from W3C.
Additional publications provide definitions of other essential technologies for the World Wide Web, including, but not limited to, the following:
- Uniform Resource Identifier (URI), which is a universal system for referencing resources on the Internet, such as hypertext documents and images. URIs, often called URLs, are defined by the IETF's RFC 3986 / STD 66: Uniform Resource Identifier (URI): Generic Syntax, as well as its predecessors and numerous URI scheme-defining RFCs;
- HyperText Transfer Protocol (HTTP), especially as defined by RFC 2616: HTTP/1.1 and RFC 2617: HTTP Authentication, which specify how the browser and server authenticate each other.
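As an illustration of the generic URI syntax, the following Python snippet splits a URI into its components using the standard library's `urllib.parse.urlsplit`. The URI itself is a made-up example, and the component labels in the dictionary are chosen here for readability rather than taken from any specification text.

```python
from urllib.parse import urlsplit

# A hypothetical URI used only for demonstration.
uri = "http://www.example.com:8080/docs/page.html?lang=en#history"
parts = urlsplit(uri)

components = {
    "scheme": parts.scheme,      # how to access the resource
    "authority": parts.netloc,   # host name plus optional port
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,          # location of the resource on the server
    "query": parts.query,        # optional parameters after "?"
    "fragment": parts.fragment,  # optional position inside the resource
}

for name, value in components.items():
    print(f"{name:10} {value}")
```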
Privacy
"We are left
with the alarming question as to whether privacy should be put before
global security," wrote Abhilash Sonwane of Cyberoam. Among services
paid for by advertising, Yahoo! could collect the most data about
commercial Web users, about 2,500 bits of information per month about
each typical user of its site and its affiliated advertising network
sites. Yahoo! was followed by MySpace with about half that potential and
then by AOL-TimeWarner, Google, Facebook, Microsoft, and eBay. About 27
percent of websites operated outside .com addresses.
Security
The
Web has become criminals' preferred pathway for spreading malware.
Cybercrime carried out on the Web can include identity theft, fraud,
espionage and intelligence gathering. Web-based vulnerabilities now
outnumber traditional computer security concerns, and as measured by
Google, about one in ten Web pages may contain malicious code. Most
Web-based attacks take place on legitimate websites, and most, as
measured by Sophos, are hosted in the United States, China and Russia.
The
most common of all malware threats is SQL injection attacks against
websites. Through HTML and URLs the Web was vulnerable to attacks like
cross-site scripting (XSS) that came with the introduction of JavaScript
and were exacerbated to some degree by Web 2.0 and Ajax web design that
favors the use of scripts. By one estimate, 70 percent of all
websites are open to XSS attacks on their users.
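Both attack classes mentioned above exploit the same weakness: untrusted input is treated as code. The Python sketch below, which uses an in-memory SQLite database, contrasts a query assembled by string concatenation with a parameterized one, and shows HTML escaping as one common defense against script injection. The table, the function names, and the attack string are all invented for this example, and production systems need considerably more than these two measures.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)


def lookup_unsafe(name: str) -> list[tuple]:
    # VULNERABLE: the user's input becomes part of the SQL text itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(name: str) -> list[tuple]:
    # The value is passed separately, so it can never be parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


def render_comment(comment: str) -> str:
    # Escaping turns markup characters into harmless character entities.
    return "<p>" + html.escape(comment) + "</p>"


attack = "nobody' OR '1'='1"
print(len(lookup_unsafe(attack)))  # every row leaks
print(len(lookup_safe(attack)))    # no user has that literal name
print(render_comment("<script>alert(1)</script>"))
```

With the concatenated query, the attack string rewrites the `WHERE` clause so that it matches every row; with the parameterized query, the same string is merely a name that no user has.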
Proposed
solutions vary to extremes. Large security vendors like McAfee already
design governance and compliance suites to meet post-9/11 regulations,
and some, like Finjan, have recommended active real-time inspection of
code and all content regardless of its source. Some have argued that for
enterprise to see security as a business opportunity rather than a cost
center, "ubiquitous, always-on digital rights management" enforced in
the infrastructure by a handful of organizations must replace the
hundreds of companies that today secure data and networks. Jonathan
Zittrain has said users sharing responsibility for computing safety is
far preferable to locking down the Internet.
Java
A
significant advance in Web technology was Sun Microsystems' Java
platform. It enables Web pages to embed small programs (called applets)
directly into the view. These applets run on the end-user's computer,
providing a richer user interface than simple Web pages. Java
client-side applets never gained the popularity that Sun had hoped for,
for a variety of reasons, including lack of integration with other content
(applets were confined to small boxes within the rendered page) and the
fact that many computers at the time were supplied to end users without a
suitably installed Java Virtual Machine, and so required a download by
the user before applets would appear. Adobe Flash now performs many of
the functions that were originally envisioned for Java applets,
including the playing of video content, animation, and some rich GUI
features. Java itself has become more widely used as a platform and
language for server-side and other programming.
JavaScript
JavaScript,
on the other hand, is a scripting language that was initially developed
for use within Web pages. The standardized version is ECMAScript. While
its name is similar to Java, JavaScript was developed by Netscape and
has very little to do with Java, although the syntax of both languages
is derived from the C programming language. In conjunction with a Web
page's Document Object Model (DOM), JavaScript has become a much more
powerful technology than its creators originally envisioned. The
manipulation of a page's DOM after the page is delivered to the client
has been called Dynamic HTML (DHTML), to emphasize a shift away from static HTML displays.
In
simple cases, all the optional information and actions available on a
JavaScript-enhanced Web page will have been downloaded when the page was
first delivered. Ajax ("Asynchronous JavaScript and XML") is a group of
interrelated web development techniques used for creating interactive
web applications that provide a method whereby parts within a Web
page may be updated, using new information obtained over the network at
a later time in response to user actions. This allows the page to be
more responsive, interactive and interesting, without the user having to
wait for whole-page reloads. Ajax is seen as an important aspect of
what is being called Web 2.0. Examples of Ajax techniques currently in
use can be seen in Gmail, Google Maps, and other dynamic Web
applications.
Publishing Web pages
Web
page production is available to individuals outside the mass media. In
order to publish a Web page, one does not have to go through a publisher
or other media institution, and potential readers could be found in all
corners of the globe.
Many different kinds of information are
available on the Web, making it easier for those who wish to learn about
other societies, cultures, and peoples to do so.
The increased
opportunity to publish materials is observable in the countless personal
and social networking pages, as well as sites by families, small shops,
etc., facilitated by the emergence of free Web hosting services.
Statistics
According
to a 2001 study, there were more than 550 billion documents
on the Web, mostly in the invisible Web, or deep Web. A 2002 survey of
2,024 million Web pages determined that by far the most Web content was
in English: 56.4 percent; next were pages in German (7.7 percent),
French (5.6 percent), and Japanese (4.9 percent). A more recent study,
which used Web searches in 75 different languages to sample the Web,
determined that there were over 11.5 billion Web pages in the publicly
indexable Web as of the end of January 2005. As of June 2008, the
indexable web contains at least 63 billion pages. On July 25, 2008,
Google software engineers Jesse Alpert and Nissan Hajaj announced that
Google Search had discovered one trillion unique URLs.
Over 100.1
million websites operated as of March 2008. Of these 74 percent were
commercial or other sites operating in the .com generic top-level
domain.
Speed issues
Frustration
over congestion issues in the Internet infrastructure and the high
latency that results in slow browsing has led to an alternative,
pejorative name for the World Wide Web: the World Wide Wait.
Speeding up the Internet is the subject of ongoing discussion about the
use of peering and QoS technologies. Other approaches to reducing the
World Wide Wait can be found in W3C publications.
Standard guidelines for ideal Web response times are:
- 0.1 second (one tenth of a second). Ideal response time. The user doesn't sense any interruption.
- 1 second. Highest acceptable response time. Download times above 1 second interrupt the user experience.
- 10 seconds. Unacceptable response time. The user experience is interrupted and the user is likely to leave the site or system.
These numbers are useful for planning server capacity.
Caching
If
a user revisits a Web page after only a short interval, the page data
may not need to be re-obtained from the source Web server. Almost all
Web browsers cache recently-obtained data, usually on the local hard
drive. HTTP requests sent by a browser will usually only ask for data
that has changed since the last download. If the locally-cached data is
still current, it will be reused.
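The revalidation exchange can be sketched as follows. In HTTP, a browser that holds a cached copy may send the validators it received earlier (`ETag` and `Last-Modified`) back to the server in `If-None-Match` and `If-Modified-Since` headers; a `304 Not Modified` reply then tells it to reuse the local copy. The `CachedEntry` class and both function names are hypothetical, and the sketch ignores expiry times and the many other rules a real cache follows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    """A locally stored response body plus the validators sent with it."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(entry: CachedEntry) -> dict[str, str]:
    """Build request headers asking only for data that has changed."""
    headers: dict[str, str] = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers


def body_after_response(entry: CachedEntry, status: int, body: bytes) -> bytes:
    """Pick the body to display once the server has answered."""
    if status == 304:
        # Not Modified: the server sent no body; the cached copy is current.
        return entry.body
    return body
```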
Caching helps reduce the amount
of Web traffic on the Internet. The decision about expiration is made
independently for each downloaded file, whether image, stylesheet,
JavaScript, HTML, or whatever other content the site may provide. Thus
even on sites with highly dynamic content, many of the basic resources
only need to be refreshed occasionally. Web site designers find it
worthwhile to collate resources such as CSS data and JavaScript into a
few site-wide files so that they can be cached efficiently. This helps
reduce page download times and lowers demands on the Web server.
There
are other components of the Internet that can cache Web content.
Corporate and academic firewalls often cache Web resources requested by
one user for the benefit of all. (See also Caching proxy server.) Some
search engines, such as Google or Yahoo!, also store cached content from
websites.
Apart from the facilities built into Web servers that
can determine when files have been updated and so need to be re-sent,
designers of dynamically-generated Web pages can control the HTTP
headers sent back to requesting users, so that transient or sensitive
pages are not cached. Internet banking and news sites frequently use
this facility.
Data requested with an HTTP 'GET' is likely to be
cached if other conditions are met; data obtained in response to a
'POST' is assumed to depend on the data that was POSTed and so is not
cached.
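A deliberately simplified version of this decision, as a shared cache such as a caching proxy might make it, is sketched below. The function name `is_cacheable` is hypothetical. The sketch follows the rule of thumb in the text (only `GET` responses are stored) and honors just two `Cache-Control` directives, `no-store` and `private`; the actual HTTP caching rules are considerably more detailed.

```python
def is_cacheable(method: str, response_headers: dict[str, str]) -> bool:
    """Decide whether a shared cache may store a response (simplified)."""
    if method.upper() != "GET":
        # Responses to POST depend on the submitted data, so skip them.
        return False
    directives: set[str] = set()
    for name, value in response_headers.items():
        if name.lower() == "cache-control":
            for part in value.split(","):
                # Keep the directive name and drop any "=value" suffix.
                token = part.strip().lower().split("=", 1)[0]
                if token:
                    directives.add(token)
    # "no-store" forbids storing; "private" forbids shared caches.
    return not (directives & {"no-store", "private"})
```

A banking page sent with `Cache-Control: no-store` would therefore be refused, while a stylesheet sent with `Cache-Control: public, max-age=3600` would be kept.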
Link rot and Web archival
Over
time, many Web resources pointed to by hyperlinks disappear, relocate,
or are replaced with different content. This phenomenon is referred to
in some circles as "link rot" and the hyperlinks affected by it are
often called "dead links."
The ephemeral nature of the Web has
prompted many efforts to archive Web sites. The Internet Archive is one
of the most well-known efforts; it has been active since 1996.
WWW prefix in Web addresses
The
letters "www" are commonly found at the beginning of Web addresses
because of the long-standing practice of naming Internet hosts (servers)
according to the services they provide. So for example, the host name
for a Web server is often "www"; for an FTP server, "ftp"; and for a
USENET news server, "news" or "nntp" (after the news protocol NNTP).
These host names appear as DNS subdomain names, as in "www.example.com."
This
use of such prefixes is not required by any technical standard; indeed,
the first Web server was at "nxoc01.cern.ch", and even today many Web
sites exist without a "www" prefix. The "www" prefix has no bearing on
how the main Web site is displayed; it is simply one choice for a Web
site's host name.
However, some website
addresses require the www. prefix, and if typed without one, won't work;
there are also some which must be typed without the prefix.
Some
Web browsers will automatically try adding "www." to the beginning, and
possibly ".com" to the end, of typed URLs if no host is found without
them. All major Web browsers will also prefix "http://www."
and append ".com" to the address bar contents if the Control and Enter
keys are pressed simultaneously. For example, entering "example" in the
address bar and then pressing either just Enter or Control+Enter will
usually resolve to "http://www.example.com", depending on the exact browser version and its settings.
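The address-completion behavior can be modeled roughly as below. The function `candidate_urls` is purely hypothetical: it lists the addresses a browser of this kind might try, in order, and does not reproduce the logic of any specific browser, which varies by version and settings.

```python
def candidate_urls(typed: str, ctrl_enter: bool = False) -> list[str]:
    """List the URLs a browser might try for text typed in the address bar."""
    text = typed.strip()
    if "://" in text:
        # Already a full URL: use it unchanged.
        return [text]
    if ctrl_enter:
        # Control+Enter wraps the text in "http://www." and ".com".
        return [f"http://www.{text}.com"]
    candidates = [f"http://{text}"]
    if not text.startswith("www."):
        candidates.append(f"http://www.{text}")
    if "." not in text:
        # A bare word may also be tried as a .com address.
        candidates.append(f"http://www.{text}.com")
    return candidates
```

Under these assumptions, `candidate_urls("example")` ends with `http://www.example.com`, and `candidate_urls("example", ctrl_enter=True)` yields only that address.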
Pronunciation of "www"
In English, "www" is pronounced "double-you double-you double-you". It is sometimes shortened to "triple-double-you" or "dub, dub, dub".
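The English writer Douglas Adams once quipped:
"The World Wide Web is the only thing I know of whose shortened form takes three times longer to say than what it's short for."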
It is also interesting that in Mandarin Chinese, "World Wide Web" is commonly translated via a phono-semantic matching to wàn wéi wǎng (万维网), which satisfies "www" and literally means "myriad dimensional net", a translation that very appropriately reflects the design concept and proliferation of the World Wide Web.