Friday, November 26, 2010

Blog Topic 3 - Long Tail Effect, Streisand Effect and Web 2.0

Introduction to Web 2.0
- What are the Long Tail Effect and Streisand Effect?

- How are they related to Web 2.0?
- Give real-life examples of how they take place in the Web 2.0 age.

What are the Long Tail Effect and Streisand Effect?

The long tail refers to the statistical property that a larger share of population rests within the tail of a probability distribution than observed under a 'normal' or Gaussian distribution. The term has gained popularity in recent times as a retailing concept describing the niche strategy of selling a large number of unique items in relatively small quantities – usually in addition to selling fewer popular items in large quantities.




In a plot of sales volume against items ranked by popularity:
- to the left are the few that dominate (selling large volumes of a reduced number of popular items)
- to the right is the long tail (selling small volumes of hard-to-find items to many customers)

Given a large enough availability of choice, a large population of customers, and negligible stocking and distribution costs, the selection and buying pattern of the population results in a power law distribution curve, or Pareto distribution. This suggests that a market with a high freedom of choice will create a certain degree of inequality by favouring the upper 20% of the items ("hits" or "head") against the other 80% ("non-hits" or "long tail"). This is known as the Pareto principle or 80–20 rule.
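The head/tail split described above can be explored numerically. The following is a minimal sketch, assuming hypothetical sales that follow a simple power law (sales proportional to 1/rank); the function names and the exponent are illustrative assumptions, not data from any real retailer. The exact share taken by the top 20% depends on the exponent and the catalogue size, so the 80-20 figure should be read as a rule of thumb.

```python
def zipf_sales(n_items: int, exponent: float = 1.0) -> list[float]:
    """Return hypothetical sales for items ranked 1..n_items under a power law."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_items + 1)]


def head_share(sales: list[float], head_fraction: float = 0.2) -> float:
    """Return the share of total sales captured by the top `head_fraction` of items."""
    ranked = sorted(sales, reverse=True)
    head_count = max(1, int(len(ranked) * head_fraction))
    return sum(ranked[:head_count]) / sum(ranked)


if __name__ == "__main__":
    # Compare catalogues of different sizes under the same assumed power law.
    for n_items in (100, 10_000, 100_000):
        share = head_share(zipf_sales(n_items))
        print(f"{n_items:>7} items: top 20% take {share:.0%} of sales")
```

With evenly spread sales the top 20% of items would take exactly 20% of sales; under the assumed power law they take far more, which is the inequality the Pareto principle describes.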

The "long-tail effect" is all about profitable niche businesses (hard-to-find items) competing with, and in some cases replacing, the traditional mass market (popular items).


The Streisand effect is a primarily online phenomenon in which an attempt to hide or remove a piece of information has the unintended consequence of causing the information to be publicized widely and to a greater extent than would have occurred if no contrary action had been attempted. It is named after American entertainer Barbra Streisand, following a 2003 incident in which her attempts to suppress photographs of her residence inadvertently generated further publicity.

How are they related to Web 2.0?

The term Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centred design, and collaboration on the World Wide Web. A Web 2.0 site gives its users the free choice to interact or collaborate with each other in a social media dialogue as creators (prosumers) of user-generated content in a virtual community, in contrast to websites where users (consumers) are limited to the passive viewing of content that was created for them. Examples of Web 2.0 include social-networking sites, blogs, wikis, video-sharing sites, hosted services, web applications, mashups and folksonomies.

Although the term suggests a new version of the World Wide Web, it does not refer to an update to any technical specifications, but rather to cumulative changes in the ways software developers and end-users use the Web.

Web 2.0 technologies are changing the way messages spread across the Web. A number of online tools and platforms now define how people share their perspectives, opinions, thoughts and experiences. Here are some popular Web 2.0 tools:

Podcasting
Blogs
RSS
Social bookmarking
Social networking

Web 2.0 emphasizes user-centred design, collaboration, and the free choice to interact with other users; the user's role changes from passively viewing content to actively creating it. This opens up a new business opportunity (the Long Tail Effect) and shapes a social network culture (the Streisand effect).



Let's review a real case that benefits from the Long Tail Effect:

Netflix, Inc. is an American corporation that offers both on-demand video streaming over the internet and flat-rate DVD and Blu-ray Disc rental-by-mail in the United States and Canada (streaming only). Because Netflix stocks movies in centralized warehouses, its storage costs are far lower than those of a traditional rental store, and its distribution costs are the same for a popular or an unpopular movie. Netflix is therefore able to build a viable business stocking a far wider range of movies than a traditional movie rental store. Those economics of storage and distribution enable the advantageous use of the Long Tail. With Web 2.0, consumers can find their favourite movies and read other people's comments more easily, even for "unpopular" movies, so they have far more choice than ever. Netflix finds that, in aggregate, "unpopular" movies are rented more than popular movies.

Moreover, the long tail means that consumers have more choice of movies, and some consumers will shift from popular titles to less popular ones that better fit their real interests. In other words, the long tail is growing: consumers increasingly bypass the hit-driven culture promoted in the mainstream media and spend their time (and money) pursuing their personal tastes.


Web 2.0 also shapes social network culture (the Streisand effect). Let's review a real case:

In April 2007, an attempt to block an AACS key from being disseminated on Digg caused an uproar when cease-and-desist letters demanded that the code be removed from several high-profile websites. However, given the Web 2.0 features of collaboration, user-generated content and social networking, this led to the key's proliferation across other sites and chat rooms in various formats, with one commentator describing it as having become "the most famous number on the internet". Within a month, the key had been reprinted on over 280,000 pages, printed on T-shirts and tattoos, and had appeared on YouTube in a song played over 45,000 times. This shows how powerfully Web 2.0 influences social network culture; sometimes that is a good thing, and sometimes it is not.






Thursday, November 25, 2010

Blog Topic 1 - Session Hijacking


Hypertext Transfer Protocol
- What is “session hijacking”? What are its security threats?

- How can web developers avoid it?

Session hijacking describes all methods by which an attacker can access another user's session. In computer science, session hijacking refers to the exploitation of a valid computer session, sometimes also called a session key, to gain unauthorized access to information or services in a computer system. In particular, it is used to refer to the theft of a magic cookie used to authenticate a user to a remote server. The HTTP cookies used to maintain a session on many web sites can be easily stolen by an attacker using an intermediary computer or with access to the saved cookies on the victim's computer.

TCP session hijacking happens when a hacker takes over a TCP session between two machines. Since most authentications only occur at the start of a TCP session, this allows the hacker to gain access to a machine.

A popular method is using source-routed IP packets. This allows a hacker at point A on the network to participate in a conversation between B and C by encouraging the IP packets to pass through its machine.

If source-routing is turned off, the hacker can use "blind" hijacking, whereby it guesses the responses of the two machines. Thus, the hacker can send a command, but can never see the response. However, a common command would be to set a password allowing access from somewhere else on the net.

A hacker can also be "inline" between B and C using a sniffing program to watch the conversation. This is known as a "man-in-the-middle attack".

A common component of such an attack is to execute a denial-of-service (DoS) attack against one end-point to stop it from responding. This attack can be either against the machine to force it to crash, or against the network connection to force heavy packet loss.


There are three main methods used to perpetrate a session hijack.

Session fixation,
where the attacker sets a user's session id to one known to him, for example by sending the user an email with a link that contains a particular session id. The attacker now only has to wait until the user logs in.

Example: an email with a link to a fraudulent website with the domain name "www.hkboc.net", which looks similar to the official website of Bank of China (Hong Kong) Limited (BOCHK), "www.bochk.com".

Session sidejacking,
where the attacker uses packet sniffing to read network traffic between two parties to steal the session cookie. Many web sites use SSL encryption for login pages to prevent attackers from seeing the password, but do not use encryption for the rest of the site once authenticated. This allows attackers that can read the network traffic to intercept all the data that is submitted to the server or web pages viewed by the client. Since this data includes the session cookie, it allows him to impersonate the victim, even if the password itself is not compromised. Unsecured Wi-Fi hotspots are particularly vulnerable, as anyone sharing the network will generally be able to read most of the web traffic between other nodes and the access point.
Alternatively, an attacker with physical access can simply attempt to steal the session key by, for example, obtaining the file or memory contents of the appropriate part of either the user's computer or the server.

Cross-site scripting,
where the attacker tricks the user's computer into running code which is treated as trustworthy because it appears to belong to the server, allowing the attacker to obtain a copy of the cookie or perform other operations.

Example of Cross-Site Scripting
To initiate the attack, the attacker must convince the user to click on a carefully crafted hyperlink, for example, by embedding a link in an email sent to the user or by adding a malicious link to a newsgroup posting. The link points to a vulnerable page in your application that echoes the unvalidated input back to the browser in the HTML output stream. For example, consider the following two links.

Here is a legitimate link:
www.awebapplication.com/logon.aspx?username=bob

Here is a malicious link:
www.awebapplication.com/logon.aspx?username=<script>alert('hacker code')</script>
If the Web application takes the query string, fails to properly validate it, and then returns it to the browser, the script code executes in the browser. The preceding example displays a harmless pop-up message. With the appropriate script, the attacker can easily extract the user's authentication cookie, post it to his site, and subsequently make a request to the target Web site as the authenticated user.
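To make the reflection step concrete, here is a minimal sketch in Python rather than ASP.NET. The two render functions are hypothetical stand-ins for the logon page, not part of any real framework; `html.escape` is the standard-library call that converts markup characters into harmless entities.

```python
import html


def render_logon_unsafe(username: str) -> str:
    """Vulnerable: echoes the query-string value straight into the HTML output."""
    return f"<p>Welcome, {username}</p>"


def render_logon_escaped(username: str) -> str:
    """Safer: HTML-escapes the value so the browser treats it as plain text."""
    return f"<p>Welcome, {html.escape(username)}</p>"


if __name__ == "__main__":
    payload = "<script>alert('hacker code')</script>"
    print(render_logon_unsafe(payload))   # the script tag reaches the browser intact
    print(render_logon_escaped(payload))  # the angle brackets become &lt; and &gt;
```

Escaping on output complements, and does not replace, validating the input when it arrives.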
Web developers can take several measures to avoid session hijacking.

Implement logout functionality so that a user can end a session, forcing re-authentication when another session is started.

Regenerate the session id after a successful login. This prevents session fixation because the attacker does not know the session id of the user after he has logged in.

Encrypt the data passed between the parties, in particular the session key, for example by using SSL throughout the communication. This technique is widely relied upon by web-based banks and other e-commerce services, because it completely prevents sniffing-style attacks. However, it could still be possible to perform some other kind of session hijack.

Limit the expiration period on the session cookie if you do not use SSL. Although this does not prevent session hijacking, it reduces the time window available to the attacker.

Change the value of the cookie with each and every request. This dramatically reduces the window in which an attacker can operate and makes it easy to identify whether an attack has taken place, but it can cause other technical problems; for example, it can prevent the browser's back button from working properly.

Perform thorough input validation. To prevent cross-site scripting, the application must ensure that input from query strings, form fields, and cookies is valid for the application. Consider all user input as possibly malicious, and filter or sanitize it for the context of the downstream code. Validate all input against known valid values and reject everything else. Use regular expressions to validate input data received via HTML form fields, cookies, and query strings.
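Several of the measures above (logout, regenerating the session id at login, limiting the session lifetime, and validating input against a pattern) can be sketched together. This is a minimal illustration, not a production session manager: the in-memory dictionary, the 15-minute lifetime, the username pattern and all function names are assumptions made for the example.

```python
import re
import secrets
import time

SESSION_TTL_SECONDS = 15 * 60  # hypothetical expiry window

# session id -> {"user": ..., "expires": ...}; kept in memory for illustration only
_sessions: dict[str, dict] = {}


def new_session(user=None, now=None) -> str:
    """Create a session with an unpredictable id and a limited lifetime."""
    now = time.time() if now is None else now
    session_id = secrets.token_urlsafe(32)
    _sessions[session_id] = {"user": user, "expires": now + SESSION_TTL_SECONDS}
    return session_id


def login(old_session_id: str, user: str, now=None) -> str:
    """Discard the pre-login id and issue a fresh one (defeats session fixation)."""
    _sessions.pop(old_session_id, None)
    return new_session(user, now)


def current_user(session_id: str, now=None):
    """Return the logged-in user, or None if the session is unknown or expired."""
    now = time.time() if now is None else now
    session = _sessions.get(session_id)
    if session is None or session["expires"] <= now:
        _sessions.pop(session_id, None)
        return None
    return session["user"]


def logout(session_id: str) -> None:
    """End the session so the id can no longer be replayed."""
    _sessions.pop(session_id, None)


# Allow-list validation: accept only known-good characters, reject everything else.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


def is_valid_username(value: str) -> bool:
    return USERNAME_PATTERN.fullmatch(value) is not None
```

An id planted by an attacker before login stops working as soon as `login` replaces it, and a stolen id is only useful until the session expires or the user logs out.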

Blog Topic 2 - XML Schema ID/IDREF/IDREFS unique

XML Schema
- What are ID, IDREF, and IDREFS simple types in XSD?

- How is xs:unique used to constrain values?
- Give examples to explain the usage of these XSD constructs.

[Definition:] ID represents the ID attribute type from [XML 1.0 (Second Edition)]. The value space of ID is the set of all strings that match the NCName production in [Namespaces in XML]. The lexical space of ID is the set of all strings that match the NCName production in [Namespaces in XML]. The base type of ID is NCName. ID should be used only on attributes.

ID has the following constraining facets:

- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace

[Definition:] IDREF represents the IDREF attribute type from [XML 1.0 (Second Edition)]. The value space of IDREF is the set of all strings that match the NCName production in [Namespaces in XML]. The lexical space of IDREF is the set of strings that match the NCName production in [Namespaces in XML]. The base type of IDREF is NCName. IDREF should be used only on attributes.

IDREF has the following constraining facets:

- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace

[Definition:] IDREFS represents the IDREFS attribute type from [XML 1.0 (Second Edition)]. IDREFS is derived from IDREF. The value space of IDREFS is the set of finite, non-zero-length sequences of IDREFs. The lexical space of IDREFS is the set of space-separated lists of tokens, of which each token is in the lexical space of IDREF. The itemType of IDREFS is IDREF.

IDREFS has the following constraining facets:

- length
- minLength
- maxLength
- enumeration
- whiteSpace
- pattern

xs:unique is an identity-constraint definition. Its validation rule states:
If the {identity-constraint category} is unique, then no two members of the qualified node set have key-sequences whose members are pairwise equal, as defined by Equal in [XML Schemas: Datatypes].



See the following example of ID/IDREF:
XSD Schema - Author is referenced by Author ID in Book
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BookList">
    <xs:complexType>
            <xs:sequence>
                <xs:element name="AuthorList">
                    <xs:complexType>
                        <xs:sequence>
                        <xs:element name="Author" maxOccurs="unbounded">
                            <xs:complexType>
                                <xs:simpleContent>
                                    <xs:extension base="xs:string">
                                        <xs:attribute name="ID" type="xs:ID" use="required"/>
                                    </xs:extension>
                                </xs:simpleContent>
                            </xs:complexType>
                        </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="Book" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:simpleContent>
                            <xs:extension base="xs:string">
                                <xs:attribute name="ISBN" type="xs:string" use="required"/>
                                <xs:attribute name="AuthorID" type="xs:IDREF" use="required"/>
                            </xs:extension>
                        </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
    </xs:complexType>
</xs:element>
</xs:schema>


XML Data
<?xml version="1.0" encoding="UTF-8"?>
<BookList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="BookList_Schema_1.xsd">
    <AuthorList>
        <Author ID="A01">Steve Krug</Author>
        <Author ID="A02">Brian Proffitt</Author>
        <Author ID="A03">Amy Shuen</Author>
        <Author ID="A03">Kelly Shuen</Author><!-- The value 'A03' of attribute 'ID' on element 'Author' is not valid with respect to its type, 'ID'. -->
    </AuthorList>
    <Book ISBN="978-1435458994" AuthorID="A02">Take Your iPad to Work</Book>
    <Book ISBN="978-0321657299" AuthorID="A01">Rocket Surgery Made Easy: The Do-It-Yourself</Book>
    <Book ISBN="978-0596529963" AuthorID="A03">Web 2.0: A Strategy Guide</Book>
    <Book ISBN="978-0321344755" AuthorID="A01">Don't Make Me Think: A Common Sense Approach to Web Usability</Book>
    <Book ISBN="978-0321344756" AuthorID="A04">XML Cookbook</Book><!-- There is no ID/IDREF binding for IDREF 'A04'. -->
</BookList>

The XML data shows that the author ‘Steve Krug’, represented by ID ‘A01’, is the author of two books. Using ID and IDREF removes the need to repeat the author's name in each book, which makes the data easier to maintain and update. The pair can be used to model a one-to-many relationship: ID acts as the primary key and IDREF acts as the foreign key.

In the example, validation enforces two constraints. A value of type ID cannot be repeated, so the second Author with ID ‘A03’ is an error. A value of type IDREF must match the value of some ID in the document, so the book with AuthorID ‘A04’ is an error, as there is no Author with ID ‘A04’.
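The two rules can be imitated outside a schema validator. The sketch below uses Python's standard xml.etree.ElementTree on a shortened copy of the data; it is not an XSD validator, and the function name and error wording are illustrative assumptions. It checks only Author/@ID and Book/@AuthorID, whereas real ID uniqueness applies to every ID-typed attribute in the document.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<BookList>
    <AuthorList>
        <Author ID="A01">Steve Krug</Author>
        <Author ID="A03">Amy Shuen</Author>
        <Author ID="A03">Kelly Shuen</Author>
    </AuthorList>
    <Book ISBN="978-0321657299" AuthorID="A01">Rocket Surgery Made Easy: The Do-It-Yourself</Book>
    <Book ISBN="978-0321344756" AuthorID="A04">XML Cookbook</Book>
</BookList>"""


def check_id_idref(xml_text: str) -> list[str]:
    """Report repeated Author IDs and Book references that match no Author ID."""
    root = ET.fromstring(xml_text)
    errors = []
    seen_ids = set()
    for author in root.iter("Author"):
        author_id = author.get("ID")
        if author_id in seen_ids:
            errors.append(f"duplicate ID '{author_id}'")
        seen_ids.add(author_id)
    for book in root.iter("Book"):
        # split() also copes with a space-separated list of references
        for ref in book.get("AuthorID", "").split():
            if ref not in seen_ids:
                errors.append(f"no ID for IDREF '{ref}'")
    return errors


if __name__ == "__main__":
    for error in check_id_idref(SAMPLE):
        print(error)
```

For this data the function reports the repeated ‘A03’ and the unresolved ‘A04’, the same two problems flagged in the comments of the XML data.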

Moreover, ID/IDREF data is convenient to process in an XSLT style sheet: xsl:key can serve as a lookup that links each IDREF value to the element carrying the matching ID.


See the following example of xsl:key:
XSLT Style Sheet - Book attribute AuthorID links to Author ID to show the name
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl"
    exclude-result-prefixes="xd"
    version="1.0">
    <xsl:output method="text" indent="yes"/>
    <xsl:key name="Author_Key" match="Author" use="@ID"/>
    <xsl:template match="/BookList/AuthorList"/>
    <xsl:template match="/BookList/Book">
        Book: <xsl:value-of select="."/>
        ISBN: <xsl:value-of select="@ISBN"/>
        Author: <xsl:value-of select="key('Author_Key', @AuthorID)"/>
       
    </xsl:template>
</xsl:stylesheet>

Sample Output (with the invalid ‘XML Cookbook’ entry removed from the data):
        Book: Take Your iPad to Work
        ISBN: 978-1435458994
        Author: Brian Proffitt
   
        Book: Rocket Surgery Made Easy: The Do-It-Yourself
        ISBN: 978-0321657299
        Author: Steve Krug
   
        Book: Web 2.0: A Strategy Guide
        ISBN: 978-0596529963
        Author: Amy Shuen
   
        Book: Don't Make Me Think: A Common Sense Approach to Web Usability
        ISBN: 978-0321344755
        Author: Steve Krug
   


See the following example of ID/IDREFS:
XSD Schema - Authors are referenced by Author ID in Book
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BookList">
    <xs:complexType>
            <xs:sequence>
                <xs:element name="AuthorList">
                    <xs:complexType>
                        <xs:sequence>
                        <xs:element name="Author" maxOccurs="unbounded">
                            <xs:complexType>
                                <xs:simpleContent>
                                    <xs:extension base="xs:string">
                                        <xs:attribute name="ID" type="xs:ID" use="required"/>
                                    </xs:extension>
                                </xs:simpleContent>
                            </xs:complexType>
                        </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="Book" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:simpleContent>
                            <xs:extension base="xs:string">
                                <xs:attribute name="ISBN" type="xs:string" use="required"/>
                                <xs:attribute name="AuthorID" type="xs:IDREFS" use="required"/>
                            </xs:extension>
                        </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
    </xs:complexType>
</xs:element>
</xs:schema>


XML Data
<?xml version="1.0" encoding="UTF-8"?>
<BookList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="BookList_Schema_2.xsd">
    <AuthorList>
        <Author ID="A01">Steve Krug</Author>
        <Author ID="A02">Brian Proffitt</Author>
        <Author ID="A03">Amy Shuen</Author>
    </AuthorList>
    <Book ISBN="978-1435458994" AuthorID="A02 A01">Take Your iPad to Work</Book>
    <Book ISBN="978-0321657299" AuthorID="A01 A03 A02">Rocket Surgery Made Easy: The Do-It-Yourself</Book>
    <Book ISBN="978-0596529963" AuthorID="A03">Web 2.0: A Strategy Guide</Book>
    <Book ISBN="978-0321344755" AuthorID="A01">Don't Make Me Think: A Common Sense Approach to Web Usability</Book>
</BookList>

This example shows that a Book attribute AuthorID of type IDREFS can hold a space-separated list of values, each of which must match an Author ID of type ID.
It can be used to represent a many-to-many relationship.


See the following example of xs:unique:
XSD Schema - Book ISBN should be unique
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BookList">
    <xs:complexType>
            <xs:sequence>
                <xs:element name="Book" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:simpleContent>
                            <xs:extension base="xs:string">
                                <xs:attribute name="ISBN" type="xs:string" use="required"/>
                            </xs:extension>
                        </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
    </xs:complexType>
    <xs:unique name="ISBNKey">
      <xs:selector xpath="Book"/>
      <xs:field xpath="@ISBN"/>
    </xs:unique>
</xs:element>
</xs:schema>



XML Data
<?xml version="1.0" encoding="UTF-8"?>
<BookList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="BookList_Schema_3.xsd">
    <Book ISBN="978-1435458994">Take Your iPad to Work</Book>
    <Book ISBN="978-0321657299">Rocket Surgery Made Easy: The Do-It-Yourself</Book>
    <Book ISBN="978-0596529963">Web 2.0: A Strategy Guide</Book>
    <Book ISBN="978-0321344755">Don't Make Me Think: A Common Sense Approach to Web Usability</Book>
    <Book ISBN="978-0321344755">XML Cookbook</Book><!-- Duplicate unique value [978-0321344755] declared for identity constraint "ISBNKey" of element "BookList". -->
</BookList>

The XSD Schema constrains the ISBN attribute to be unique: xs:selector specifies the path of the elements to check, and xs:field specifies the field whose value must be unique among them.
The Book with ISBN ‘978-0321344755’ appears twice and therefore violates the unique identity constraint.
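The selector/field idea can be imitated in a few lines. This is a minimal sketch using Python's standard xml.etree.ElementTree, not a schema validator; the function name and its parameters are illustrative assumptions. As with xs:unique, elements that lack the field are simply ignored.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SAMPLE = """<BookList>
    <Book ISBN="978-0596529963">Web 2.0: A Strategy Guide</Book>
    <Book ISBN="978-0321344755">Don't Make Me Think: A Common Sense Approach to Web Usability</Book>
    <Book ISBN="978-0321344755">XML Cookbook</Book>
</BookList>"""


def duplicate_field_values(xml_text: str, selector: str, field: str) -> list[str]:
    """Return attribute values that occur more than once among the selected elements."""
    root = ET.fromstring(xml_text)
    # selector plays the role of xs:selector, field the role of xs:field (an attribute name)
    values = [
        element.get(field)
        for element in root.findall(selector)
        if element.get(field) is not None
    ]
    return [value for value, count in Counter(values).items() if count > 1]


if __name__ == "__main__":
    print(duplicate_field_values(SAMPLE, "Book", "ISBN"))
```

Because the selector is evaluated relative to the element that declares the constraint (BookList here), the same ISBN could still appear under a different BookList without violating it.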


In addition, combining ID/IDREF with xs:unique can model a one-to-one relationship.

See the following example of ID/IDREF and xs:unique:
XSD Schema - Author is referenced by Author ID in Book
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="BookList">
    <xs:complexType>
            <xs:sequence>
                <xs:element name="AuthorList">
                    <xs:complexType>
                        <xs:sequence>
                        <xs:element name="Author" maxOccurs="unbounded">
                            <xs:complexType>
                                <xs:simpleContent>
                                    <xs:extension base="xs:string">
                                        <xs:attribute name="ID" type="xs:ID" use="required"/>
                                    </xs:extension>
                                </xs:simpleContent>
                            </xs:complexType>
                        </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
                <xs:element name="Book" maxOccurs="unbounded">
                    <xs:complexType>
                        <xs:simpleContent>
                            <xs:extension base="xs:string">
                                <xs:attribute name="ISBN" type="xs:string" use="required"/>
                                <xs:attribute name="AuthorID" type="xs:IDREF" use="required"/>
                            </xs:extension>
                        </xs:simpleContent>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
    </xs:complexType>
    <xs:unique name="AuthorKey">
      <xs:selector xpath="Book"/>
      <xs:field xpath="@AuthorID"/>
    </xs:unique>
</xs:element>
</xs:schema>



XML Data
<?xml version="1.0" encoding="UTF-8"?>
<BookList xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="BookList_Schema_4.xsd">
    <AuthorList>
        <Author ID="A01">Steve Krug</Author>
        <Author ID="A02">Brian Proffitt</Author>
        <Author ID="A03">Amy Shuen</Author>
    </AuthorList>
    <Book ISBN="978-1435458994" AuthorID="A02">Take Your iPad to Work</Book>
    <Book ISBN="978-0321657299" AuthorID="A01">Rocket Surgery Made Easy: The Do-It-Yourself</Book>
    <Book ISBN="978-0596529963" AuthorID="A03">Web 2.0: A Strategy Guide</Book>
    <Book ISBN="978-0321344755" AuthorID="A01">Don't Make Me Think: A Common Sense Approach to Web Usability</Book><!-- Duplicate unique value [A01] declared for identity constraint "AuthorKey" of element "BookList". -->
</BookList>

This example shows that each Author ID in the Author List can appear in a Book's AuthorID attribute at most once, which is why the second book referencing ‘A01’ is rejected. By applying ID/IDREF together with xs:unique, a one-to-one relationship can be modelled.