Help:Selecting pages

From SUWS-wiki
Revision as of 19:45, 11 September 2011 by DavidNewman (talk | contribs) (Added help page)

The most important part of the Semantic search features in Semantic MediaWiki is a simple format for describing which pages should be displayed as the search result. Queries select wiki pages based on the information that has been specified for them using Categories, Properties, and possibly other MediaWiki features such as a page's namespace. The following paragraphs introduce the main query features in SMW.

Categories and property values

In the introductory example, we gave the single condition [[Located in::Germany]] to describe which pages we were interested in. The markup text is exactly what you would otherwise write to assert that some page has this property and value. Putting it in a semantic query makes SMW return all such pages. This is a general scheme: The syntax for asking for pages that satisfy some condition is exactly the syntax for explicitly asserting that this condition holds.

The following queries show what this means:

  1. [[Category:Actor]] gives all pages directly or indirectly (through a sub-, subsub-, etc. category) in the category.
  2. [[born in::Boston]] gives all pages annotated as being about someone born in Boston.
  3. [[height::180cm]] gives all pages annotated as being about someone having a height of 180cm.

By using other categories or properties than above, we can already ask for pages which have certain annotations. Next let us combine those requirements:

[[Category:Actor]] [[born in::Boston]] [[height::180cm]] 

asks for everybody who is an actor and was born in Boston and is 180cm tall. In other words: when many conditions are written into one query, the result is narrowed down to those pages that meet all the requirements. Thus we have a logical AND. By the way: queries can also include line breaks in order to make them more readable. So we could as well write:

  [[Category:Actor]] 
  [[born in::Boston]] 
  [[height::180cm]]

to get the same result as above. Note that queries only return the articles that are positively known to satisfy the required properties: if there is no property for the height of some actor, that actor will not be selected.
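
The AND behaviour described above, including the exclusion of pages that lack a property, can be modelled outside the wiki. The following Python sketch is only an illustration and is not how SMW is implemented; the page names, the dictionary layout, and the select function are assumptions made for this example.

```python
# Toy data: page name -> annotations (hypothetical pages, not taken from any wiki).
pages = {
    "Alice": {"category": {"Actor"}, "born in": {"Boston"}, "height": {"180cm"}},
    "Bob": {"category": {"Actor"}, "born in": {"Boston"}},  # no height annotated
    "Carol": {"category": {"Singer"}, "born in": {"Boston"}, "height": {"180cm"}},
}


def select(pages, conditions):
    """Return the sorted names of pages that satisfy every (property, value) condition."""
    return sorted(
        name
        for name, annotations in pages.items()
        if all(value in annotations.get(prop, set()) for prop, value in conditions)
    )


result = select(
    pages, [("category", "Actor"), ("born in", "Boston"), ("height", "180cm")]
)
```

In this model, Bob is an actor born in Boston but is not selected, because no height is annotated for him.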

When specifying property values, SMW will usually ignore any initial and trailing whitespace, so the two conditions [[height::180cm]] and [[height:: 180cm ]] mean the same. Datatypes such as number may have additional features such as ignoring commas that might be used to separate the thousands. SMW will also treat synonymous page names the same, just like MediaWiki would usually consider Semantic wiki, Semantic_wiki, and semantic wiki to refer to the same page.

Property values: wildcards and comparators

In the examples above, we gave very concrete property conditions, using «Boston» and «180cm» as values for properties. In many cases, one does not look for only one particular value, but for a whole range of values, such as all actors that are taller than 180cm. In some cases one may even just look for all pages that have any value for a given property at all. For example, the deceased people could be those which have a value for the property «date of death». Such general conditions are possible with the help of comparators and wildcards.

  • Wildcards are written as "+" and allow any value for a given condition. For example, [[born in::+]] returns all pages that have any value for the property «born in».

Comparators are special symbols like < or >. They are placed after :: in property conditions. SMW currently supports the following comparators:

  • > and <: greater than/less than or equal
  • !: unequal
  • ~: «like» comparison for strings (disabled by default)

Comparators work only for property values, but not for conditions on categories. A wiki installation can limit which comparators are available, which is done by the administrator by modifying the value of $smwgQComparators as explained in the file SMW_Settings.php.

Greater than or equal, less than or equal

With numeric values, you often want to select pages with property values within a certain range. For example

[[Category:Actor]] [[height::>6 ft]] [[height::<7 ft]]

asks for all actors that are between 6 feet and 7 feet tall. Note that this takes advantage of the automatic unit conversion: even if the height of the actor was set with [[height::195cm]] it would be recognized as a correct answer (provided that the datatype for height understands both units, see Help:custom units). Note that the comparator means greater/less than or equal – the equality symbol = is not needed.
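
To make the inclusive bounds and the unit conversion concrete, here is a small Python sketch. It only models the comparison and does not reproduce SMW's own conversion code; the function name height_in_range is an assumption for this example. The only outside fact used is that one foot is exactly 30.48 cm.

```python
from fractions import Fraction

# 1 ft = 30.48 cm exactly; Fraction avoids floating-point rounding at the bounds.
CM_PER_FOOT = Fraction("30.48")


def height_in_range(height_cm, low_ft, high_ft):
    """True if a height given in cm lies between low_ft and high_ft, bounds included."""
    height = Fraction(height_cm)
    return low_ft * CM_PER_FOOT <= height <= high_ft * CM_PER_FOOT


tall_enough = height_in_range("195", 6, 7)     # between 182.88 cm and 213.36 cm
on_boundary = height_in_range("182.88", 6, 7)  # exactly 6 ft, still included
too_short = height_in_range("180", 6, 7)       # below 6 ft
```

As the text notes, a value stored with many decimal places after a real unit conversion may not land exactly on a bound, so boundary cases can behave less predictably in practice than in this exact-arithmetic model.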

Such range conditions on property values are mostly relevant if values can be ordered in a natural way. For example, it makes sense to ask [[start date::>May 6 2006]] but it is not really helpful to say [[homepage URL::>http://www.somewhere.org]].

If a datatype has no natural linear ordering, Semantic MediaWiki will just apply the alphabetical order to the normalised datavalues as they are used in the RDF export. You can thus use greater than and less than to select alphabetic ranges of a string property. For example, you could ask [[surname::>Do]] [[surname::<G]] to select surnames between «Do» and up to «G». For wiki pages, the comparator refers to the name of the given page (without the namespace prefix).
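
The alphabetic range from the surname example can be pictured with ordinary string comparison. The sketch below uses Python's code-point ordering, which is an assumption and may differ from the ordering that a particular wiki's database applies; the surnames are made up for illustration.

```python
def surname_in_range(surname, low="Do", high="G"):
    """Inclusive range check using Python's code-point string ordering."""
    return low <= surname <= high


surnames = ["Dean", "Doe", "Fox", "G", "Green"]
selected = [s for s in surnames if surname_in_range(s)]
```

In this model «Green» is not selected: it sorts after «G», so the upper bound really means up to «G» and not every name starting with that letter.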

Here and in all other uses of comparators, it might happen that a searched for value really starts with a symbol like <. In this case, SMW can be prevented from interpreting the symbol as a comparator if a space is inserted after ::. For example, [[property:: <br>]] really searches for pages with the value «<br>» for the given property.

Not equal

You can select pages that have a property value which is unequal to a given value. For example, [[Area code::!415]] will select pages that have an area code which is not «415». Note that this query description does not look for pages which do not have an area code 415. Rather, it looks for all pages that (also) have a code unequal to 415. In particular, pages that have no area code at all cannot be the result of the above query.

As with the (default) equality comparator, the use of custom units may require rounding in numeric conversions that can lead to unexpected results. For example, [[height::!6.00 ft]] may still select someone whose height displays as «6.00 feet» simply because the exact numeric value is not really 6. In such situations, it might be more useful to query for pages that have a property value outside a certain range, expressed by taking a disjunction (see below) of conditions with < and >.
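
The difference between «has a value unequal to 415» and «does not have the value 415» is easy to miss. The following Python sketch models the behaviour described above on made-up data; the page names and the helper has_value_other_than are assumptions for illustration only.

```python
# Hypothetical pages with zero, one, or several area codes each.
area_codes = {
    "Page A": {"415"},
    "Page B": {"415", "510"},
    "Page C": {"212"},
    "Page D": set(),  # no area code annotated
}


def has_value_other_than(values, excluded):
    """True if at least one annotated value differs from the excluded one."""
    return any(value != excluded for value in values)


selected = sorted(
    page for page, codes in area_codes.items() if has_value_other_than(codes, "415")
)
```

Page B is selected although it carries the code 415, because it also has another code. Page D is not selected, because it has no code at all.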

String comparisons: Like

The comparator ~ only works for properties of Type:String. In a like condition one uses '*' wildcards to match any sequence of characters and '?' to match any single character. For example, one could ask [[Address::~*Park Place*]] to select addresses containing the string «Park Place», or [[Honorific::~M?.]] to select both «Mr.» and «Ms.».
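
The two wildcard characters behave much like shell-style patterns. As a rough model, Python's standard fnmatch module can be used to try out patterns; this is an analogy and not SMW's implementation. Note that fnmatch additionally gives square brackets a special meaning, which the text above does not mention for SMW, and that case handling on a real wiki may differ.

```python
from fnmatch import fnmatchcase


def like(value, pattern):
    """Shell-style match: '*' matches any sequence, '?' matches one character."""
    return fnmatchcase(value, pattern)


address_hit = like("12 Park Place, Springfield", "*Park Place*")
address_miss = like("12 Park Avenue, Springfield", "*Park Place*")
honorifics = [h for h in ["Mr.", "Ms.", "Mrs.", "Dr."] if like(h, "M?.")]
```

The pattern «M?.» matches exactly three characters, so «Mrs.» is not matched in this model.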

Unions of query results: disjunctions

Disjunctions are OR-conditions that admit several alternative conditions on query results. SMW has two ways of writing disjunctions in queries:

  • The operator OR is used for taking the union of two queries.
  • The operator || is used for disjunctions in values, page, and category names.

In any case, the disjunction requires that at least one (but maybe more than one) of the possible alternatives is satisfied (logical OR). For example, the query

[[born in::Boston]] OR [[born in::New York]]

describes all pages of people born in Boston or New York. This can also be written with || as [[born in::Boston||New York]]. In the latter case, «Boston||New York» describes a value that may be either of the two alternatives. Writing queries with || is usually more concise, but not all disjunctions can be written in this way. The following is an example that cannot be expressed with ||:

[[born in::Boston]] OR [[Category:Actor]]

The || syntax can be used not only in property values, but also with categories, like in the query [[Category:Musical actor||Theatre actor]].
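
Both notations describe a union of result sets. The Python sketch below models this with sets on made-up data; the names born_in, actors, and born_in_any are assumptions for this example and have no counterpart in SMW.

```python
# Hypothetical annotations: person -> place of birth, plus a set of actors.
born_in = {"Ann": "Boston", "Ben": "New York", "Cy": "Chicago"}
actors = {"Cy", "Dee"}


def born_in_any(*cities):
    """Pages whose birthplace is one of the given cities (models the || form)."""
    return {person for person, city in born_in.items() if city in cities}


with_or = born_in_any("Boston") | born_in_any("New York")  # two queries joined by OR
with_bars = born_in_any("Boston", "New York")              # one condition using ||
mixed = born_in_any("Boston") | actors                     # needs OR: two different conditions
```

The first two results are identical, while the last union combines a property condition with a category and therefore has no || equivalent.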

Describing single pages

So far, all conditions depended on some annotation given within a page. But there are also conditions to directly select some pages, or pages from a given namespace.

Directly giving some page title (possibly including a namespace prefix), or a list of such page titles separated by ||, selects the pages with those names. An example is the query

[[Brazil||France||User:John Doe]]

which has three results (at least if the pages exist). Note that the result does not display any namespace prefixes; see the hover box or status bar of the browser, or follow the links to determine the namespace. By restricting such a set further with a property condition, one could ask, e.g., «Who of Bill Murray, Dan Aykroyd, Harold Ramis and Ernie Hudson is taller than 6ft?». But direct selection of articles is most useful if further properties of those articles are asked for, e.g. to simply print the height of Bill Murray.

To select a category in this way, a : must be put before the category name. This avoids confusing [[Category:Actor]] (return all actors) and [[:Category:Actor]] (return the category «Actor»).

Restricting results to a namespace

A less strict way of selecting given pages is via namespaces. The default is to return pages in every namespace. To return pages in a particular namespace, specify the namespace with a «wildcard», e.g. write [[Help:+]] to return every page in the «Help» namespace. Since the main namespace usually has no prefix, write [[:+]] to select only pages in the main namespace.

Disjunctions work again with the || syntax as above. For example, to return pages in either the main or «User» namespace, write [[:+||User:+]]. To return pages in the «Category» namespace, a : is again needed in front of the namespace label to prevent confusion.

Subqueries and property chains

Enumerating multiple pages for a property is cumbersome and hard to maintain. For instance, to select all actors that are born in an Italian city one could write:

[[Category:Actor]] [[born in::Rome||Milan||Turin||Florence||...]]

To generate a list of all these Italian cities one could run another query

[[Category:City]] [[located in::Italy]]

and copy and paste the results into the first query. What one would like to do is to use the city query as a subquery within the actor query to obtain the desired result directly. Instead of a fixed list of page names for the property's value, a new query enclosed in <q> and </q> is inserted within the property condition. In this example, one can thus write:

[[Category:Actor]] [[born in::<q>[[Category:City]] [[located in::Italy]]</q>]]

Arbitrary levels of nesting are possible, though nesting might be restricted for a particular site to ensure performance. For another example, to select all cities of the European Union you could write:

  [[Category:Cities]]
  [[located in::<q>[[member of::European Union]]</q>]]

In the above example, we essentially have constructed a chain of properties «located in» and «member of» to find things that are located in something which is a member of the EU. Queries can be written in a shorter form for this common case:

[[Category:Cities]] [[located in.member of::European Union]]

This query has the same meaning as above, but with much less special symbols required. In general, chains of properties are created by listing all properties separated by dots. In the rare case that a property should contain a dot in its name, one may start the query with a space to prevent SMW from interpreting this dot in a special way.
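
A subquery, or the equivalent property chain, can be read as a two-step lookup: first find the pages matching the inner condition, then keep the pages whose property value is among them. The Python sketch below models this on a tiny made-up data set; it is an illustration under these assumptions and says nothing about how SMW evaluates such queries internally.

```python
# Hypothetical annotations for a handful of pages.
cities = {"Paris", "Lyon", "Oslo"}
located_in = {"Paris": "France", "Lyon": "France", "Oslo": "Norway"}
member_of = {"France": "European Union"}  # Norway has no such annotation

# Inner query: [[member of::European Union]]
eu_members = {page for page, org in member_of.items() if org == "European Union"}

# Outer query: [[Category:Cities]] [[located in::<q>...</q>]]
eu_cities = sorted(city for city in cities if located_in.get(city) in eu_members)
```

The chain «located in.member of» corresponds to applying the two lookups one after the other.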

Using templates and variables

Arbitrary templates and variables can be used in a query. An example is a selection criterion that displays all future events based on the current date:

 [[Category:Event]]
 [[end date::>{{CURRENTYEAR}}-{{CURRENTMONTH}}-{{CURRENTDAY}}]]

Another particularly useful variable for inline queries is {{FULLPAGENAME}} for the current page with namespace, which allows you to reuse a generic query on many pages. For an example of this, see Property:Population. Read about inline queries for more information.
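
When query text is generated outside the wiki, for example by a script, the MediaWiki variables are not available and the date has to be filled in by the script itself. The following Python sketch shows one way to do this. The function name is hypothetical, and the zero-padded YYYY-MM-DD form is an assumption that need not match the exact text the wiki's variables expand to.

```python
from datetime import date


def future_events_query(today):
    """Build the query text with the given date filled in as YYYY-MM-DD."""
    return "[[Category:Event]]\n[[end date::>{:%Y-%m-%d}]]".format(today)


query = future_events_query(date(2011, 9, 11))
```

In real use one would pass date.today() instead of a fixed date.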

Sorting results

It is often helpful to present query results in a suitable order, for example to present a list of European countries ordered by population. Special:Ask has a simple interface to add a sorting condition to a query. The name of the property to sort by is entered into a text input, and ascending or descending order can be selected. SMW will usually attempt to sort results by the natural order that the values of the selected property may have: numbers are sorted numerically, strings are sorted alphabetically, dates are sorted chronologically. The order therefore is the same as in the case of the < and > comparators in queries. If no specific sorting condition is provided, results will be ordered by their page name.

It is possible to provide more than one sorting condition. If multiple results turn out to be equal regarding the first sorting condition, the next condition is used to order them and so on. A query for actors, e.g., could be ordered by year of birth and use the last name of the actor as a second ordering condition. All actors that were born in the same year would thus be ordered alphabetically by their last name instead of appearing in random order.
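
Ordering by several conditions works like sorting by a compound key. The Python sketch below illustrates this with made-up actors; the field names and values are assumptions for this example.

```python
# Hypothetical actors with a year of birth and a last name.
actors = [
    {"name": "Ann Young", "last name": "Young", "born": 1950},
    {"name": "Bob Adams", "last name": "Adams", "born": 1950},
    {"name": "Cy Baker", "last name": "Baker", "born": 1948},
]

# First condition: year of birth; second condition: last name.
ordered = sorted(actors, key=lambda actor: (actor["born"], actor["last name"]))
names = [actor["name"] for actor in ordered]
```

The two actors born in 1950 are tied on the first condition, so the last name decides their order.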

Sorting a query can also influence the result of a query, because it is only possible to sort by property values that a page actually has. Therefore, if a query is ordered by a property (say «Population») then SMW will usually restrict the query results to those pages that have at least one value for this property (i.e. only pages with specified population appear). Therefore, if the query does not require yet that the property is present in each query result, then SMW will silently add this condition. But SMW will always try to find the ordering property within the given query first, and it is even possible to order query results by subproperties. Some examples should illustrate this:

  • [[Category:City]] [[Population::+]] ordered by «Population» will present the cities with population in ascending order. The query result is the same as without the sorting.
  • [[Category:City]] ordered by «Population» will again present the cities with population in ascending order. The query result may be modified due to the sorting condition: if there are cities without a population given, then these will no longer appear in the result.
  • [[Category:City]] [[has location country.population::+]] ordered by «Population» will present the cities ordered by the population of the countries they are located in. The query result is not changed, but «population» now refers to a property used in a subquery.
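
The effect of a sorting condition on the result set, as in the second example above, can be modelled as follows. This Python sketch uses made-up cities and a hypothetical ask function; it only illustrates the described behaviour and is not SMW code.

```python
# Hypothetical cities; None means that no population is annotated.
cities = {"Avalon": 12000, "Brigadoon": None, "Camelot": 8000}


def ask(cities, sort_by_population=False):
    """Model of [[Category:City]], optionally ordered by «Population»."""
    if not sort_by_population:
        return sorted(cities)  # default order: by page name
    # Sorting silently adds the condition that a population value exists.
    with_value = [name for name, population in cities.items() if population is not None]
    return sorted(with_value, key=lambda name: cities[name])


unsorted_result = ask(cities)
by_population = ask(cities, sort_by_population=True)
```

The city without a population appears in the first result but is missing from the second.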

If a property that is used for sorting has more than one value for some page, then this page will still appear only once in the result list. The position that the page takes in this case is not defined by SMW and may correspond to either of the property values. In the above examples, this would occur if one city had multiple population numbers specified, or if one city is located in multiple countries each of which has a population. It is suggested to avoid such situations.


Query results displayed in a result table can also be ordered dynamically by clicking on the small sort icons found in the table heading of each column. This function requires JavaScript to be enabled in the browser and will sort only the displayed results. So if, e.g., a query has retrieved the twenty world-largest cities by population, it is possible to sort these twenty cities alphabetically or in reverse order of population, but the query will certainly not show the twenty world-smallest cities when reversing the order of the population column. The dynamic sorting of tables attempts to use the same order as used in SMW queries, and in particular orders numbers and dates in a natural way. However, the alphabetical order of strings and page names may slightly vary from the wiki's alphabetic order, simply because there are many international alphabets that can be ordered in different ways depending on the language preference.


Linking to Semantic Search Results

Links to semantic query results on Special:Ask can be created by means of the inline query feature in SMW as explained in its documentation. It is not recommended to create links directly, since they are very lengthy and use a specific encoding. Developers who create extensions that link to Special:Ask should also use SMW's internal functions for building links. Understanding the details of SMW's encoding of queries in links is therefore not required for using SMW.

Things that are not possible

Subqueries for properties

It is not possible to use a subquery to obtain a list of properties that is then used in a query. One can, however, use a query that returns a list of properties, and copy and paste the result into another query.

Queries with special properties

SMW currently does not support queries for the values of any of SMW's built-in Special properties such as «Has type», «Allows value» or «Equivalent URI».