From PlantUse English


The webservice aims to extract textual content from a MediaWiki in order to integrate it, in whole or in part, into a third-party website. The extracted data could also be processed on these websites to produce diagrams, graphics, maps, etc.

On the other hand, the webservice must allow data harvesting from external sources (such as GRIN, Mansfeld, Prota4U, ...) to complete and improve the content already available on the MW.

Comment Gregor Hagedorn: I think these are very different tasks and are best discussed and implemented separately.

Content Export

The export of textual content has two objectives:

  • Display text on a remote website, either in whole or in part.
  • Propose a structure for the exported data that allows processing similar to relational database operations.

How?

1) Explore all the possibilities included in Mediawiki for exporting pages.

  • two webservices exist: index.php (the normal browser entry point) and api.php (a real API).
  • we have an internal documentation page: http://biowikifarm.net/meta/Wiki_API - please add your understanding or any links you find to that page.
  • a batch mode is possible through the command line (dump scripts), but it is probably not suitable for this task.
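As a minimal sketch of the api.php route, the snippet below builds a query URL requesting a page's wikitext and pulls the text out of the JSON response. The endpoint URL is a placeholder assumption, not the real PlantUse configuration; the response shape follows the legacy api.php JSON format, where content sits under a "*" key.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the wiki's real api.php URL.
API_URL = "https://uses.plantnet-project.org/api.php"

def build_query_url(title, api_url=API_URL):
    """Build an api.php URL requesting the latest revision's wikitext."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return api_url + "?" + urlencode(params)

def extract_wikitext(response):
    """Pull the wikitext out of a decoded api.php JSON response."""
    pages = response["query"]["pages"]
    # api.php keys pages by numeric page id; take the first (only) one.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]
```

Fetching the URL (e.g. with urllib or jQuery's ajax) and decoding the JSON would then give `extract_wikitext` its input.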

2) Work on standardizing the page structure on our MW

3) Standardize the structure for data of the same type (taxobox, phylogeny, ...)

4) Develop webservices based on a REST architecture to get an XML file for a page, using a common structure
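What that common XML structure could look like is still open; the sketch below serializes a page as one possible shape. The element and attribute names (`<page>`, `<section>`, `title`, `heading`) are illustrative assumptions, not an agreed schema.

```python
import xml.etree.ElementTree as ET

def page_to_xml(title, sections):
    """Serialize a page as a simple common XML structure.

    `sections` is a list of (heading, text) pairs. The element names
    used here are an illustrative assumption, not an agreed schema.
    """
    root = ET.Element("page", {"title": title})
    for heading, text in sections:
        sec = ET.SubElement(root, "section", {"heading": heading})
        sec.text = text
    return ET.tostring(root, encoding="unicode")
```

A REST endpoint would map a URL such as /page/Balsawood to this serialization of the corresponding wiki page.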

5) Use these webservices to render information from the wiki in third-party websites.

6) Export content through these webservices.

  • allow exporting a whole page.
  • allow exporting a precise data type, e.g. a particular template.
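Exporting a particular template means locating its call inside the raw wikitext. A simple `find` is not enough because template parameters can themselves contain templates, so the sketch below counts `{{`/`}}` pairs. It assumes the template name directly follows the opening braces and does not guard against a longer name sharing the same prefix.

```python
def extract_template(wikitext, name):
    """Return the first {{name ...}} template call in wikitext, or None.

    Counts {{ }} pairs so that nested templates inside parameter
    values are kept. Assumes `name` directly follows the braces.
    """
    start = wikitext.find("{{" + name)
    if start == -1:
        return None
    depth = 0
    i = start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return wikitext[start:i]
        else:
            i += 1
    return None  # unbalanced braces
```

The extracted call could then be converted into the common XML structure or returned as-is by the webservice.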

7) Get (XML?) content from the webservices and display it as graphs/diagrams.

Import Content

Importing content from external websites answers two needs:

  • Give existing groups useful information to enhance collaboration.
  • Complete the MW's information with data extracted from international databanks.


1) Define the common structure for all the reference websites (maybe HTML)

2) Discover the specific data structures of GRIN, MANSFELD and GBIF

3) Create a Web Service for the actual import of data into the MediaWiki

4) Define which data are of interest and import them into the MW.
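Once the interesting fields of an external record are selected, they have to be rendered into wikitext before the import service writes them to the MW (e.g. via an api.php edit action). A minimal sketch, assuming a harvested record arrives as a key/value mapping; the template and parameter names are illustrative, not an existing PlantUse template.

```python
def record_to_template(name, record):
    """Render a harvested record (dict) as a MediaWiki template call.

    The template name and parameter names are illustrative
    assumptions; real imports would map onto agreed templates.
    """
    lines = ["{{" + name]
    for key, value in record.items():
        lines.append("| %s = %s" % (key, value))
    lines.append("}}")
    return "\n".join(lines)
```

The resulting wikitext would typically be written to a dedicated auto-updated page rather than into the human-edited main page (see Gregor's comment below on includes).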

Technological choice

Not defined at this time: maybe PHP in conjunction with jQuery and a REST architecture, with data encoded in JSON.

Eventually, the webservices must be able to recognize requests from both partially controlled (MW) and uncontrolled (third-party) sources, identify the relevant information, and retrieve, format, and return it to the sender of the request.

Comments Gregor: It is possible to put part of information into separate pages (or subpages), which are then included into the main pages.
  • In the hypertext system, includes replace relational database joins. They are powerful, though not quite as powerful.
  • Advantage of this pattern: main pages remain human-editable, auto-updated pages can carry a warning "do not edit me".
  • auto-updated pages can be placed in the main namespace if they should be searchable/findable, or in the Template namespace otherwise.
  • Example of pattern: page "Balsawood" contains the plant-uses information, page "Template:IncludeGBIF/Balsawood" contains information from GBIF. On the page "Balsawood" a footer contains a template, e.g., something like {{Autoinclude}}, or it could be part of a main edited template like a taxobox. This template in turn contains code that automatically includes the corresponding subpage with the automatically updated content (using the current page name through a mediawiki variable).
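The pattern above could be sketched in wikitext roughly as follows. The name {{Autoinclude}} comes from the comment; the subpage naming and the exact include code are assumptions about one possible implementation.

```wikitext
<!-- In Template:Autoinclude (hypothetical implementation):
     include the auto-updated subpage matching the current page name -->
{{IncludeGBIF/{{PAGENAME}}}}

<!-- At the footer of the page "Balsawood" (or inside the taxobox): -->
{{Autoinclude}}

<!-- Template:IncludeGBIF/Balsawood, overwritten by the import service:
     do not edit me - this page is updated automatically -->
''Automatically imported GBIF content for Balsawood ...''
```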