Tutorial2

From mn/ifi/camsem
Revision as of 5 July 2010, 11:32, by Larsereb@uio.no


Tutorial 2: Compare counties while having data for municipalities


When you have data for municipalities, you often want graphs for the counties they belong to instead. In the Anzo tools by Cambridge Semantics, this is an easy task. A very similar strategy works when you want to show years but have data for months, quarters, etc. This tutorial shows the step-by-step procedure for both cases: municipalities and counties, and quarters and years.


Step 1: Making the connections between municipality and county
Step 1.1: Making the ontology
First, the ontology must say something about the relation between counties and municipalities. When making such an ontology, it is a common mistake to think of municipality as a subclass of county. This is of course wrong; a municipality is not a kind of county, but lies within a county. Therefore, we must add a property, for instance “has County”, with Municipality as domain and County as range. This is a functional property: each municipality has exactly one county.

Tutorial 2 bilde 1.jpg


Let us make a new ontology named “Geography”, with two classes, “Municipality” and “County”. In this example, the class Municipality should have three properties: its name, its identifying number, and its county. The County class needs just a name property in this example. (The example picture has more properties, but don’t worry about that for this tutorial.)
Be sure to add the ontology to the correct Linked Data Set! For this tutorial, create a Linked Data Set with an appropriate name and be sure to add “Geography” to this Linked Data Set.
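
The class structure described in this step can be sketched outside Anzo as plain Python classes. This is only an illustration of the modelling choice, not how Anzo stores the ontology; the field names `name`, `number` and `county` are assumptions chosen to mirror the properties above, and the municipality number is an illustrative value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class County:
    name: str


@dataclass(frozen=True)
class Municipality:
    # Municipality is NOT a subclass of County; it points to one instead.
    name: str
    number: str     # identifying number, kept as text (illustrative value below)
    county: County  # the "has County" property: exactly one county per municipality


nordland = County(name="Nordland")
bodo = Municipality(name="Bodø", number="1804", county=nordland)

print(bodo.county.name)
```

Because `county` is a single field rather than a list, the sketch behaves like a functional property: a municipality cannot be given two counties.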


Step 1.2: Making the matching-table
Now we need to make a table in Excel that matches each municipality to its county by name. An appendix to this tutorial (Tutorial 2.xls) should contain such a table for the municipalities and counties in the region of Nord-Norge (Northern Norway). A source for the relations between counties and municipalities could, for instance, be Wikipedia. Just make sure that all the information is correct.
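
Since “has County” is functional, one way to check the matching table before uploading is to confirm that no municipality name is paired with two different counties. The sketch below is a hypothetical check done outside Anzo and Excel; the rows are a small hand-typed sample, not the contents of Tutorial 2.xls, and `find_conflicts` is a name invented for this example.

```python
from collections import defaultdict

# (municipality, county) pairs as they might appear in the matching table.
rows = [
    ("Bodø", "Nordland"),
    ("Narvik", "Nordland"),
    ("Tromsø", "Troms"),
    ("Harstad", "Troms"),
    ("Alta", "Finnmark"),
    ("Vadsø", "Finnmark"),
]


def find_conflicts(rows):
    """Return municipalities that are matched to more than one county."""
    counties = defaultdict(set)
    for municipality, county in rows:
        counties[municipality.strip()].add(county.strip())
    return {m: sorted(c) for m, c in counties.items() if len(c) > 1}


print(find_conflicts(rows))
```

An empty result means every municipality in the sample has exactly one county.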


Step 1.3: Upload the information with the ontology
Finally we need to upload the data. Select the matching-table in the workbook.


Tutorial 2 bilde 2.jpg
Choose the menu option “Link the active workbook”. Select the Data Set you created in the previous step. Under Type, select Municipality (the class) below Geography (the ontology). Make sure the orientation is set to row, and that the Headers option is correct. Select the property Name, and then tick the check box “Search on Name”. The Name property should now look like the picture below. (Please ignore properties in the picture which don’t exist in your own ontology.)
Tutorial 2 bilde 3.jpg
Next, double-click on County, select Name and tick “Search on Name”.

Tutorial 2 bilde 4.jpg
Press the Upload-button, and the data should be uploaded. You may be prompted with the following:

Tutorial 2 bilde 5.jpg

Tick “Add key values from the workbook to this Municipality” and double-click on “Auto create a new Municipality”. The same must be done for County.


Step 2: Making the connections between quarters and years
The procedure here is very similar to the one for municipalities and counties. Make an ontology for time and name it, for instance, "Time periods". Make a class for Year and a class for Quarter, and let Quarter have a Year property pointing to its proper year. Also give Quarter two literal properties: one for the number of the quarter, and one for the textual representation of year and quarter. Finally, add a literal property for the year to the Year class.

Tutorial 2 bilde 6.jpg

The statistics we will use in this tutorial represent quarters as a string with the year and the quarter thus: “2007Q1”. We will then need a table matching years with the respective quarters. We have included such a table in the “Tutorial 2.xls”-file:
Tutorial 2 bilde 7.jpg
With the values under the Quarter header, we are able to sum up values for one quarter across the years in a graph. This lets us view graphs based on quarters and compare the quarters with each other.
Upload the table in the same fashion you uploaded the data for municipalities and counties.
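
The matching between a label such as “2007Q1” and its year and quarter number can be sketched as follows. This is a minimal illustration, assuming every label has exactly the form of four digits, the letter Q, and a digit from 1 to 4; the function names `split_quarter` and `matching_rows` are invented for this example and are not part of Anzo.

```python
import re

# Four-digit year, the letter Q, then a quarter number from 1 to 4.
QUARTER_PATTERN = re.compile(r"(\d{4})Q([1-4])")


def split_quarter(label):
    """Split a label like '2007Q1' into (year, quarter)."""
    match = QUARTER_PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"not a quarter label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def matching_rows(first_year, last_year):
    """Build (label, year, quarter) rows for a matching table."""
    return [
        (f"{year}Q{quarter}", year, quarter)
        for year in range(first_year, last_year + 1)
        for quarter in range(1, 5)
    ]


for row in matching_rows(2007, 2007):
    print(row)
```

Generating the rows rather than typing them by hand avoids mismatches between the label column and the year and quarter columns.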


Step 3: Adding statistics
Step 3.1: The ontology for the statistics
Make an ontology representing all the data you want to cover. In our case we are using statistics from Statistisk Sentralbyrå (SSB), a.k.a. Statistics Norway. We named the ontology SSB Data. We want to upload statistics about kindergarten coverage and authorized sick leave (to see whether there is some kind of connection between the two).

Tutorial 2 bilde 8.jpg
The process of making classes and properties should be familiar by now. Make a class for each of the subjects you have statistics on. Add a Municipality property to both the Sick Leave and Kindergarten classes, a Quarter property to Sick Leave, and a Year property to Kindergarten. All of these should have their ranges set to the proper classes from the Geography and Time periods ontologies. Add one literal property with a suitable name to each class, and set its range to Double.
You may also want to make an ontology for Demographics at this point, since the statistics about Sick Leave are specified by gender. You could also just make this property a literal for this exercise, as we have done above.


Step 3.2: Uploading your data
Go to the worksheet with the data about Sick Leave. Select the proper Data Set and then, under Type, select the proper class. Tag the data in your worksheet; the result should look like the example below.

Tutorial 2 bilde 9.jpg

Click Upload, and wait while the data is being uploaded.
Do the same operations for the data about Kindergarten Coverage:

Tutorial 2 bilde 10.jpg

Step 4: Inspecting the statistics in Anzo on the Web
We now want to create a view in Anzo on the Web, so we can inspect our lovely data in the form of graphs. Log in to your Anzo on the Web-homepage and select "Create a new view". Type in a title and press OK.
Tutorial 2 bilde 11.jpg
Add the Data Set which contains the data about Sick Leave and Kindergarten coverage: click Add Linked Data Sets, search for the Data Set, click it and press OK.
Select the types of data you want to see. This would be Kindergarten Coverage and Sick Leave.
Create a new lens (from template), choose the chart template, and give it a title.
Tutorial 2 bilde 12.jpg
Choose Column and press OK. In the next window, tick the check box next to Legend.
Tutorial 2 bilde 17.jpg
Then go to the Data tab on the left. Type in “Kindergarten coverage” as the series name. In Label, choose Municipality, then County, then Name under “Kindergarten coverage”.
In Value, select “Kindergarten coverage”.
In Group By, make the same selection as in Label. In the drop-down box “Value”, choose Avg.
Tutorial 2 bilde 13.jpg
Press “Add new series configuration” and do the same for Sick leave.
Tutorial 2 bilde 14.jpg
Press Save and wait. After some seconds, this screen should show up:

Tutorial 2 bilde 15.jpg
We can now view information about kindergarten coverage and sick leave in three counties, based on the municipalities!
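
The combination of grouping by Municipality, County, Name and choosing Avg amounts to averaging the municipality values within each county. The sketch below reproduces that idea outside Anzo; it is not Anzo's implementation, the coverage figures are made up for illustration, and `average_by_county` is a name invented for this example.

```python
from collections import defaultdict
from statistics import mean

# Municipality -> county lookup, playing the role of the "has County" property.
municipality_to_county = {
    "Bodø": "Nordland",
    "Narvik": "Nordland",
    "Tromsø": "Troms",
}

# (municipality, value) pairs; the numbers are invented, not SSB data.
coverage = [("Bodø", 90.0), ("Narvik", 80.0), ("Tromsø", 85.0)]


def average_by_county(values, lookup):
    """Group municipality values by county and average each group."""
    grouped = defaultdict(list)
    for municipality, value in values:
        grouped[lookup[municipality]].append(value)
    return {county: mean(vals) for county, vals in grouped.items()}


print(average_by_county(coverage, municipality_to_county))
```

Note that this is an unweighted average over municipalities, so a small municipality counts as much as a large one.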

There is one significant problem with this procedure. Suppose you want to view information for just one year, and create a filter for years. After making the filter and selecting 2007, you can view graphs only for sick leave or only for kindergarten coverage, not both. So you cannot actually compare the counties within a single year. Anzo calculates the average over the years:

Tutorial 2 bilde 16.jpg
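
To make the difference between an average over all years and a figure for one year concrete, here is a small sketch done outside Anzo. It assumes that sick leave is recorded per quarter and kindergarten coverage per year, as in the ontology above, and resolves each quarter to its year before filtering. All numbers are invented, and `for_year` is a name made up for this example; it is not a workaround within Anzo itself.

```python
from statistics import mean

# Quarter label -> year, as in the quarter/year matching table.
quarter_to_year = {"2007Q1": 2007, "2007Q2": 2007, "2008Q1": 2008}

# Invented values for a single county.
sick_leave = [("2007Q1", 6.0), ("2007Q2", 8.0), ("2008Q1", 10.0)]
kindergarten = [(2007, 80.0), (2008, 90.0)]


def for_year(year):
    """Average both statistics for one year only."""
    sick = [v for quarter, v in sick_leave if quarter_to_year[quarter] == year]
    kinder = [v for y, v in kindergarten if y == year]
    return {"sick_leave": mean(sick), "kindergarten": mean(kinder)}


# Average over every year, which is what the unfiltered chart corresponds to.
overall = {
    "sick_leave": mean(v for _, v in sick_leave),
    "kindergarten": mean(v for _, v in kindergarten),
}

print(for_year(2007))
print(overall)
```

In this sample the 2007 figures differ from the overall averages, which is why a comparison restricted to one year can tell a different story from the unfiltered chart.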

And this concludes the second tutorial.