How to parse and sort an XML Database

Hi all,

First, I must say that I’m new to AppleScript. In the past I had good experience programming in BASIC in the ’80s, later with dBase, Clipper and Xbase++, and a little C++ on Windows, before I moved to OS X in mid-2006.

A month ago I was reading the blog of Allan Craig and the book “Learn AppleScript” by H. Sanderson and H. Rosenthal, which made me want to try again…

As I said, this is my first application, and it runs fine. It’s certainly not perfect and could be optimized.
Maybe you can help? :cool:

You can download it here: https://dl.dropbox.com/u/2883279/iClipping.zip

Maybe you have the same problem as I do. I take a lot of clippings and notes, and download web pages as PDFs for later use. But when I need the information again, say a year later, the problem begins: where is it? Where did I read about this? It would be easier for me if I had a single place to find all this information.

This app is the first step of a simple database system for keeping clippings, notes and other information in XML format. I like XML because you can edit the files with any text editor or view them in a browser.

Every day, iClipping saves a new file on my Desktop. The next step is to create an application that can read the file, possibly edit it, and save the data sorted by “tags” to another XML file (my database). I would also like to add parallel index files.
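As an illustration of what a “parallel index” on tags could look like, here is a minimal sketch that builds an index mapping each tag to the positions of the clippings that carry it. The clippings are hypothetical in-memory records (the `clipTitle` and `tagList` labels and the handler names `buildTagIndex` and `lookupTag` are made up for this example); AppleScript has no dictionary type, so two parallel lists are used.

```applescript
-- Hypothetical clippings; only the tag lists matter for the index
set theClippings to {{clipTitle:"XML in AppleScript", tagList:{"xml", "applescript"}}, {clipTitle:"SQLite notes", tagList:{"sqlite"}}, {clipTitle:"Satimage XMLLib", tagList:{"xml"}}}

set tagIndex to buildTagIndex(theClippings)
-- positions of the clippings tagged "xml"
set xmlHits to lookupTag(tagIndex, "xml")

-- Returns {indexTags, indexPositions}: item j of indexPositions lists
-- the positions of the clippings that carry item j of indexTags
on buildTagIndex(theClippings)
	set indexTags to {}
	set indexPositions to {}
	repeat with i from 1 to count of theClippings
		repeat with aTag in (tagList of item i of theClippings)
			set tagText to contents of aTag
			set found to false
			repeat with j from 1 to count of indexTags
				if item j of indexTags is tagText then
					-- tag already indexed: add this clipping's position
					set end of item j of indexPositions to i
					set found to true
					exit repeat
				end if
			end repeat
			if not found then
				-- new tag: start a new position list for it
				set end of indexTags to tagText
				set end of indexPositions to {i}
			end if
		end repeat
	end repeat
	return {indexTags, indexPositions}
end buildTagIndex

-- Returns the position list for one tag, or {} if the tag is not indexed
on lookupTag(theIndex, theTag)
	set {indexTags, indexPositions} to theIndex
	repeat with j from 1 to count of indexTags
		if item j of indexTags is theTag then return item j of indexPositions
	end repeat
	return {}
end lookupTag
```

Such an index could be saved alongside the main XML file and rebuilt whenever clippings are added; note that `is` compares text case-insensitively by default, so “XML” and “xml” end up under the same tag.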

Now I’m coming to the point where you can help me.

  • Is it possible to make such an app in AppleScript, or should I go the AppleScriptObjC way?
  • Is it possible to write it without a library like Satimage.fr or LateNightSoft?
    For the sport of it, I would prefer to program it myself, but right now I have no idea how to do that.

Your wonderful site is full of examples and documentation, but I didn’t find any examples going in the direction I want to go.

I would really appreciate your advice; maybe you can give me some links to examples showing how to manipulate XML in AppleScript.

Thanks in advance for your comments. :slight_smile:

Model: MBP late 2007
Browser: Safari 536.26.17
Operating System: Mac OS X (10.7)

Personally, I always use SQLite for any AppleScript project that needs a database. The functionality is built into Mac OS X and is easily accessed from AppleScript via the do shell script command. Any time you work with XML data in AppleScript, or even in AppleScriptObjC, you have to do all of your own parsing, which significantly increases the coding burden.
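A minimal sketch of the do shell script approach, assuming the `sqlite3` command-line tool at /usr/bin/sqlite3 (mentioned later in this thread). The database file name and the `clippings` table with its `tag` and `title` columns are hypothetical, chosen only for illustration.

```applescript
-- Hypothetical database file in the user's temporary items folder
set dbPath to POSIX path of (path to temporary items) & "clippings_demo.db"
-- Every call has the form: sqlite3 <database file> <SQL statement>
set sqlPrefix to "/usr/bin/sqlite3 " & quoted form of dbPath & " "

-- Create the hypothetical table; IF NOT EXISTS makes re-runs safe
do shell script sqlPrefix & quoted form of "CREATE TABLE IF NOT EXISTS clippings (id INTEGER PRIMARY KEY, tag TEXT, title TEXT);"

-- Insert one record; quoted form of stops the shell from touching the SQL quotes
do shell script sqlPrefix & quoted form of "INSERT INTO clippings (tag, title) VALUES ('xml', 'System Events XML notes');"

-- Read the records back sorted by tag; sqlite3 prints one row per line,
-- with columns separated by | in its default output mode
set rawRows to do shell script sqlPrefix & quoted form of "SELECT tag, title FROM clippings ORDER BY tag, title;"
set theRows to paragraphs of rawRows
```

Sorting is then the database’s job (`ORDER BY`), not the script’s. Values typed by a user would still need any single quotes doubled before being placed inside an SQL string literal.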

Thanks Craig!

Is SQLite able to read, write and look up XML records?

I found two SQL installations on my Mac:

  • sqlite3 under → /usr/bin/
  • mysql under → /usr/local/mysql-5.1.53-osx10.6-x86

I remember that I installed MySQL two or three years ago to manage my blog/homepage hosted by Strato (a German provider). Judging by the path name, it seems to be the 32-bit version for Snow Leopard.

Following your advice, I will first concentrate my efforts on SQLite3.

Thanks again.

Hello.

I was thinking of this:

First, if you implement an abstraction layer with handlers for, say, adding, deleting, changing and retrieving records, the burden seems about the same to me.
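To illustrate the abstraction-layer idea, here is a minimal sketch of record handlers over an in-memory list of AppleScript records. The handler names and the `tagName`/`clipTitle` fields are hypothetical; the point is that callers only use the handlers, so the storage behind them (an XML file, SQLite, …) could later be swapped without touching the calling code.

```applescript
-- In-memory "table": a list of records with hypothetical fields tagName and clipTitle
property theRecords : {}

on addRecord(theTag, theTitle)
	set end of theRecords to {tagName:theTag, clipTitle:theTitle}
end addRecord

on findByTag(theTag)
	set found to {}
	repeat with aRecord in theRecords
		if tagName of aRecord is theTag then set end of found to contents of aRecord
	end repeat
	return found
end findByTag

on changeTitle(oldTitle, newTitle)
	repeat with i from 1 to count of theRecords
		if clipTitle of item i of theRecords is oldTitle then set clipTitle of item i of theRecords to newTitle
	end repeat
end changeTitle

on deleteByTag(theTag)
	set kept to {}
	repeat with aRecord in theRecords
		if tagName of aRecord is not theTag then set end of kept to contents of aRecord
	end repeat
	set theRecords to kept
end deleteByTag

-- Start empty: a property keeps its value between runs of a saved script
set theRecords to {}
addRecord("xml", "System Events notes")
addRecord("sqlite", "do shell script notes")
changeTitle("System Events notes", "System Events XML notes")
deleteByTag("sqlite")
set xmlClippings to findByTag("xml")
```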

Secondly:

You don’t sort an XML database per se. You sort the values obtained from the retrieval keys, and then get the information related to each value for that key. XML, at least on Mac OS X, is hashed, so retrieval times are almost the same every time.
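A minimal sketch of “sort the keys, then fetch the data”, assuming the records have already been read into AppleScript records (here typed in by hand to mirror the DataElement entries of the sample XML file posted later in this thread). AppleScript has no built-in list sort, so a simple insertion sort handler is included; the `typeName`/`dataValue` labels and the `insertionSort` handler are hypothetical.

```applescript
-- Records mirroring the sample file, deliberately out of order
set theRecords to {{typeName:"Audio3", dataValue:"8086:297e"}, {typeName:"Audio1", dataValue:"8086:293e"}, {typeName:"Audio2", dataValue:"8086:295e"}}

-- 1. Collect the retrieval keys
set theKeys to {}
repeat with aRecord in theRecords
	set end of theKeys to typeName of aRecord
end repeat

-- 2. Sort only the keys
set sortedKeys to insertionSort(theKeys)

-- 3. Use each sorted key to fetch the data that belongs to it
set sortedValues to {}
repeat with aKey in sortedKeys
	repeat with aRecord in theRecords
		if typeName of aRecord is (contents of aKey) then
			set end of sortedValues to dataValue of aRecord
			exit repeat
		end if
	end repeat
end repeat

-- Returns a new list with the items of theList in ascending order
on insertionSort(theList)
	set sortedList to {}
	repeat with anItem in theList
		set newValue to contents of anItem
		set inserted to false
		repeat with i from 1 to count of sortedList
			if newValue < item i of sortedList then
				if i = 1 then
					set sortedList to {newValue} & sortedList
				else
					set sortedList to (items 1 thru (i - 1) of sortedList) & {newValue} & (items i thru -1 of sortedList)
				end if
				set inserted to true
				exit repeat
			end if
		end repeat
		if not inserted then set end of sortedList to newValue
	end repeat
	return sortedList
end insertionSort
```

The records themselves never move; only the key list is sorted, and the lookup in step 3 is where a hashed or indexed store would pay off.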

Thirdly:

Are you sure you want to make this effort for your task? I have had similar ideas, and I admit that I sometimes have to fuss about to find my stuff. I blame Spotlight for some of it, or my own laziness in not giving Spotlight enough places to look. I must also admit that I have a set of small databases that relate either to projects or to knowledge areas, but those are separate from my other information systems: I mostly collect things there and look them over, and I know when the information is stored there.

But I use Spotlight, and most of the time I get good results. When I don’t search with Spotlight, I have a folder system resembling the Internet Archive, so if I, for instance, need to dump something about the C language onto my disk, it ends up in computer/Languages/C. What belongs to a project ends up in projectname/resources, projectname/background or something similar.

I make heavy use of the “kind” keyword in Spotlight, so when I know something was in a web archive, I use kind:webarchive (in my language). I even browse my browser history by using kind:safari :slight_smile:

My system for finding things is not ideal, but it works more often than not. I think you have a good project, though; it is just that I am not disciplined enough to store the metadata needed to keep such a database up to date.

And for the fourth:

I am thinking of porting an XML database manager to Mac OS X as soon as I am finished with my current project; if you are interested, let me know. :slight_smile:

You should also have a look at the open source database engines here at macports.org. Maybe you’ll find something you can use. :slight_smile:

Hello McUsrII,

thanks for your comments! It seems you have habits similar to mine when it comes to saving and sorting data.

I must agree with you that it could be an interesting project.

There are a lot of database systems; in my previous job I worked with dBase for 20 years. Relational databases have their limitations, for example fixed field lengths, which makes it difficult to mix different kinds of information.
XML is much more flexible and can be edited/viewed on any computer system with built-in applications.

When I look back on my life, 20-30 years ago we needed a lot of books to learn something… Today you open your browser, google a bit, 1-2-3 and you have 100 answers. We can communicate with people all over the world and exchange data by email… Only 10 years ago, this was reserved for a small part of the population.

What made this possible? First the technology, but much more the web and the standardization of data exchange.

That’s why I chose XML for my small project.

I have free time this week to concentrate my research on this topic. Today I studied the library from satimage.fr, which looks very promising.

One last question: do you know if it is possible to manipulate an XML file with SQL without converting the data into a database format, as with Database Events?

Hello.

You have Yojimbo, Instapaper, Evernote and VoodooPad (Lite is free), which all try to solve at least part of the same problem, and then there is a whole slew of tagging applications and whatnot. I hope I haven’t made anyone grumpy by not mentioning their favorite app.

This is all about finding something that works for you in your workflow, in a way that doesn’t obstruct you.

There are SQL-like front ends for XML; try googling for them.

The space issue regarding field lengths in a database is a very thin excuse for not wanting to use a relational database.
Disk space is only slightly more expensive than air nowadays.

It is also hard, I think, to create a network database with XML, but you get interoperability: XML makes it easy for many apps to share data. However, I don’t see how that comes into play with a personal database. Frankly, I like to have my notes encrypted, and not in a structure that other people can query at will.

But I am sure it is a good idea, and a good project, don’t let me put you off. :slight_smile:

I also see the advantages of using XML for such a project, since you can define different types of records and store them in a heterogeneous collection.
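A minimal sketch of how such a heterogeneous collection could be read with System Events, assuming a hypothetical file with two record types, `<Note>` and `<Link>`, under one root. The script writes that file to the temporary items folder first so it runs on its own; the element and attribute names are made up for illustration.

```applescript
-- Hypothetical mixed-record file: <Note> and <Link> elements under one root
set xmlText to "<?xml version=\"1.0\" encoding=\"utf-8\"?>
<Clippings>
  <Note>System Events can read XML files</Note>
  <Link href=\"http://www.sqlite.org\">SQLite home page</Link>
</Clippings>"

-- Write it to a temporary file (HFS path string, as used with System Events in this thread)
set xmlFilePath to ((path to temporary items) as text) & "mixedRecords.xml"
set fileRef to open for access file xmlFilePath with write permission
set eof fileRef to 0
write xmlText to fileRef as «class utf8»
close access fileRef

set summaries to {}
tell application "System Events"
	set rootElement to XML element 1 of XML file xmlFilePath
	repeat with i from 1 to count of XML elements of rootElement
		set thisRecord to XML element i of rootElement
		-- Dispatch on the element name: each record type has its own layout
		if name of thisRecord is "Note" then
			set end of summaries to "Note: " & (value of thisRecord)
		else if name of thisRecord is "Link" then
			set end of summaries to "Link: " & (value of XML attribute "href" of thisRecord)
		end if
	end repeat
end tell
```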

Hello.

Have a look at this; you’d use it the same way you’d use SQLite, via do shell script.

FLAIM

It is already ported to Mac OS X, but I think you’ll need the developer tools to compile it.

Hello.

Maybe a backend in Prolog would be ideal for creating a network database; it is just an alternative way of “hooking up” the different record types you will have to deal with, if I have understood what you want to make.

It would have been interesting to participate in, or follow, your progress on the data model and spec on Google Docs or something similar.

Hi!

First, I want to thank you, McUsrII, for all your tips! :slight_smile:

I took my time to learn the System Events XML classes (XML data and XML element).
Interesting and easy to use, especially with small files.

Here is a small script showing the results of parsing an XML file.
Maybe it can help other ‘newbies’ in this area.
The script can open any XML file.



--  parse the content of an XML File using System Events routines
-- the comments show the results when using the sample file posted by
-- this should run with any XML file, only the results will change according to the file content

-----------------------------------------------------  Variables ------------------------------------------------------------
set xmlChildsNames to {} 
set xmlChildsText to {} 
set xmlElementsNames to {}  
set xmlChildsProp to {}
set xmlChildAttributeName to {}
set theChooseList to {}
------------------------------------------------------  Main()  --------------------------------------------------------------
set xmlFilePath to (choose file) as string --> i.e: "Lion:Users:rene:Desktop:xmlTest.xml"

tell application "System Events"
	
	set fileTypeClass to class of xmlFilePath --> text / that's the class of the variable 'xmlFilePath'
	set xmlFileID to id of xmlFilePath
	--> list of 35 items: for text, 'id' returns the character codes of "Lion:Users:rene:Desktop:xmlTest.xml"
	-- so it is not a file ID as I expected
	
	-- the next calls read the file and load its content into memory
	tell XML element 1 of XML file xmlFilePath
		set xmlRootName to name --> "Main"
		set xmlChildsCount to count of XML elements --> 3 / yes the file contains 3 Records
	end tell
	
	set xmlDatas to contents of XML file xmlFilePath
	--> contents of XML file "Lion:Users:rene:Desktop:xmlTest.xml"
	-- xmlDatas contains all Items 
	
	set xmlDataClass to class of xmlDatas --> XML data (inherited from item)
	
	set xmlDataId to id of xmlDatas --> 354512912; the ID changes on each run
	
	set xmlDataName to name of xmlDatas --> seems not to be defined (missing value)
	
	-- hold the content of the file into memory in form of one record
	set xmlDataProperties to properties of xmlDatas
	-- return a record {name:... , class:... , id:... , text:...} of 4 items
	-- {name:missing value,     <-- same as 'set xmlDataName to name of xmlDatas' 
	--	 class:XML data, 
	--	 id:354512914, 
	--	 text:"<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>
	--		  	<Main>\n\n  
	--			  	<DataElement DataValue=\"8086:293e\">\n    
	--				  	<type>Audio1</type>\n    
	--				  	<name>Some Name1</name>\n    
	--				  	<displayname>Some Display Name1</displayname>\n    
	--				  	<download>http://somesite1.com</download>\n  
	--			  	</DataElement>\n\n  
	--			  	<DataElement DataValue=\"8086:295e\">\n    
	--				  	<type>Audio2</type>\n    
	--				  	<name>Some Name2</name>\n    
	--				  	<displayname>Some Display Name2</displayname>\n    
	--				  	<download>http://somesite2.com</download>\n  
	--			  	</DataElement>\n\n  
	--			  	<DataElement DataValue=\"8086:297e\">\n    
	--				  	<type>Audio3</type>\n    
	--				  	<name>Some Name3</name>\n    
	--				  	<displayname>Some Display Name3</displayname>\n    
	--				  	<download>http://somesite3.com</download>\n  
	--			  	</DataElement>\n\n  
	--		  	</Main>"}
	--
	-- 
	-- check the value of the record's id:
	-- it is different from the value of the variable xmlDataId.
	-- My guess is that
	-- xmlDataId is the top-level file ID
	-- and the id in xmlDataProperties is the id of the first record, which is the root 'Main',
	-- but I don't know.
	
	
	set xmlDataText to get text of xmlDatas
	-- returns the content of the 'text' item of the properties record above
	-- for info: this is rich (styled) text
	
	-- here is another way giving the same result as 'tell XML element 1 of XML file xmlFilePath' above
	tell XML element 1 of xmlDatas
		set xmlRootName2 to name --> "Main"
		set xmlChildsCount2 to count of XML elements --> 3 / yes the file contains 3 Records
	end tell
	
	-- so far, the first element 'XML element 1' points to the root named 'Main'
	--
	-- let's go ahead to the second level, 'XML element 1 of XML element 1',
	-- which should be the first child
	
	
	repeat with thisChild from 1 to xmlChildsCount
		
		tell XML element thisChild of XML element xmlRootName of xmlDatas
			
			set xmlChildsNames to xmlChildsNames & (name & "__" & thisChild as text) --> "DataElement__1"
			
			-- here I use the properties of the attributes (records) and append them to one list
			-- xmlChildsProp is a list of records - see the code below for how to get its values
			if exists XML attributes then --> true
				set xmlChildsProp to xmlChildsProp & (properties of XML attributes)
				--> {value:"8086:293e", class:XML attribute, name:"DataValue"} is appended for the first child
				-- read the attribute of THIS child directly: 'item 1 of xmlChildsProp'
				-- would always return the attribute of the first child
				set xmlChildAttributeName to xmlChildAttributeName & (name of XML attribute 1)
				--> {"DataValue"}
				set xmlChildsText to xmlChildsText & (value of XML attribute 1)
				--> {"8086:293e"}
				
			else
				set xmlChildsText to xmlChildsText & (value of XML element 1 as string)
			end if
			
			-- in this sample, the elements contained in each child have the same names,
			-- so I collect the element names from the first child only
			if thisChild = 1 then
				set xmlElementsCount to (count of XML elements) --> 4
				repeat with thisElement from 1 to xmlElementsCount
					set xmlElementsNames to xmlElementsNames & name of (XML element thisElement)
				end repeat
				-- return a list of 4 items {"type", "name", "displayname", "download"}
			end if
			
		end tell
		
	end repeat
	(*
	As a result we have 4 lists, each containing 3 items (the number of children):
	
		- xmlChildsNames: the name of each child, with a number added (by me)
		  to tell them apart:
		 --> {"DataElement__1", "DataElement__2", "DataElement__3"}
		
		- xmlChildsProp: a list of properties (records)
		--> {{value:"8086:293e", class:XML attribute, name:"DataValue"}, 
			{value:"8086:295e", class:XML attribute, name:"DataValue"},
			{value:"8086:297e", class:XML attribute, name:"DataValue"}}
		
		- xmlChildAttributeName: a list with the attribute name of each child
		--> {"DataValue", "DataValue", "DataValue"}

		- xmlChildsText: a list with the text value contained in each child or,
		  as in this case, the value of its attribute
		  --> {"8086:293e", "8086:295e", "8086:297e"}
	*)
end tell

(*
		Finally, let us try to search inside the file.
		Let's say you want to find the data of '<type>Audio2</type>'.
		One approach is to create a list of 'type' values, present it in a choose list
		to the user, and show the result of the search in a dialog box.
	
	   Because this sample script can open different XML files with different element
	   structures, we need to go through two choose lists/prompts before we get the result.
*)

-- first let the user select the field for the search
set thePrompt to "Select only one field for the search..."
-- choose from list {"type", "name", "displayname", "download"} with prompt "Select only one field for the search..."
set theChoice to choose from list xmlElementsNames with prompt thePrompt
-- choose from list returns false when the user clicks Cancel, so test it before taking item 1
if theChoice is false then
	display dialog ("You did not choose!")
	return
end if
set n to item 1 of theChoice --> "type"

(*
	Before I inserted the next two lines, the script returned
	errors: "Can't get value ..."

	I thought the end of the file had been reached, so it couldn't find the value.
	Maybe this goes beyond my understanding...

	But it works now, and I'm able to present a choice and get the content.
*)

tell application "System Events"
	
	set xmlDatas to contents of XML file xmlFilePath
	
	---- build a list with the content/text of the choice
	repeat with thisChild from 1 to xmlChildsCount
		tell XML element thisChild of XML element xmlRootName of xmlDatas
			set theChooseList to theChooseList & (value of XML elements whose name is n as string)
		end tell
	end repeat
	
end tell
--> {"Audio1", "Audio2", "Audio3"}


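The comment before the search step says the result should be shown in a dialog box, but the script stops after building the choice list. A minimal sketch of that last step, assuming the field name and the chosen value are already known (hardcoded here; in the script they would come from the two choose from list prompts) and using a shortened copy of the sample file, written to a temporary file so the snippet runs on its own:

```applescript
-- Shortened copy of the sample file, written to the temporary items folder
set xmlText to "<?xml version=\"1.0\" encoding=\"utf-8\"?>
<Main>
  <DataElement DataValue=\"8086:293e\"><type>Audio1</type><name>Some Name1</name></DataElement>
  <DataElement DataValue=\"8086:295e\"><type>Audio2</type><name>Some Name2</name></DataElement>
  <DataElement DataValue=\"8086:297e\"><type>Audio3</type><name>Some Name3</name></DataElement>
</Main>"
set xmlFilePath to ((path to temporary items) as text) & "xmlTest.xml"
set fileRef to open for access file xmlFilePath with write permission
set eof fileRef to 0
write xmlText to fileRef as «class utf8»
close access fileRef

-- Hypothetical choices (normally the results of the choose from list prompts)
set fieldName to "type"
set chosenValue to "Audio2"

set foundText to ""
tell application "System Events"
	set rootElement to XML element 1 of XML file xmlFilePath
	repeat with i from 1 to count of XML elements of rootElement
		set thisRecord to XML element i of rootElement
		if value of XML element fieldName of thisRecord is chosenValue then
			-- collect the attribute and every field of the matching record
			set foundText to "DataValue: " & (value of XML attribute "DataValue" of thisRecord)
			repeat with j from 1 to count of XML elements of thisRecord
				set foundText to foundText & return & (name of XML element j of thisRecord) & ": " & (value of XML element j of thisRecord)
			end repeat
			exit repeat
		end if
	end repeat
end tell
-- foundText can now be shown with: display dialog foundText
```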
As a next step I will test the Satimage.fr library, which looks very promising, with a lot of database-like functions.
I will keep you informed :rolleyes:

Sorry, I forgot to include the test file:



<?xml version="1.0" encoding="utf-8"?>

<Main>

  <DataElement DataValue="8086:293e">
    <type>Audio1</type>
    <name>Some Name1</name>
    <displayname>Some Display Name1</displayname>
    <download>http://somesite1.com</download>
  </DataElement>

  <DataElement DataValue="8086:295e">
    <type>Audio2</type>
    <name>Some Name2</name>
    <displayname>Some Display Name2</displayname>
    <download>http://somesite2.com</download>
  </DataElement>

  <DataElement DataValue="8086:297e">
    <type>Audio3</type>
    <name>Some Name3</name>
    <displayname>Some Display Name3</displayname>
    <download>http://somesite3.com</download>
  </DataElement>

</Main>


Hi,

here is a small test using the Satimage.fr library:


--
--  parse the content of an XML file using the Satimage.fr library
-- you can download XMLLib.osax 3.6.1 for free at 
-- http://www.satimage-software.com/en/index.html
--
-- the comments show the results when using the sample file
--
-- this script should run with any XML file, only the results will change according to the file content
--
-----------------------------------------------------  Variables ------------------------------------------------------------
set xmlElementsNames to {}
set xmlChildsProp to {}
set xmlChildAttributeName to {}
set xmlChildsText to {}
set theChooseList to {}
------------------------------------------------------  Main()  --------------------------------------------------------------
set xmlFilePath to (choose file) as string --> i.e: "Lion:Users:rene:Desktop:xmlTest.xml"

(*	First we need to open the file
 
	The POSIX alternatives work:
 	set xmlFileID to XMLOpen (POSIX path of (xmlFilePath as alias))	--> works
 	set xmlFileID to XMLOpen (POSIX path of xmlFilePath)	--> works
 	but this doesn't work:
 	set xmlFileID to XMLOpen xmlFilePath	--> error

	Important! Don't forget to XMLClose your file
*)
-- return a pointer of type XMLRef
set the_xmlRef to XMLOpen xmlFilePath as alias --> run
--> «data XMLR04000000721200000000000000000000»

-- returns a pointer of type XMLRef to the <root>
set the_Root to XMLRoot the_xmlRef
--> «data XMLR060000007212000010FC9B0E01000000»

-- return the name of the <root>
set the_xmlNameOfRoot to XMLTagName the_Root
--> "Main"

-- returns a record
XMLNodeInfo the_Root
(*
	{kind:"ELEMENT_NODE", 
	name:"Main",	 
	paragraph:3}		--> the Line Nbr.: 3 from BOF	            
*)

-- return the Number of Childs 
set xmlChildrenCount to XMLCount the_Root
--> 3

-- return a pointer of type XMLRef to the 1st child 
set the_xmlFirstChild to XMLChild the_Root index 1 without nodes
--> «data XMLR0E0000007212000070B7181001000000»

-- return the Name of the Child
set the_ChildName to XMLTagName the_xmlFirstChild
--> "DataElement"

(*
	Let us try to get the Name of all Children, their Attributes if exits
*)
set the_ChildrenNames to {}
set the_ChildrenValues to {}
set the_ChildrenAttributes to {}
set the_ElementsNames to {}
set the_ElementsValues to {}
repeat with thisChild from 1 to xmlChildrenCount
	
	-- return a pointer of type XMLRef to the Child
	set the_Child to XMLChild the_Root index thisChild
	--> «data XMLR1800000072120000705D0F1001000000»
	
	-- set the_ChildrenNames to the_ChildrenNames & (XMLTagName the_Child)
	--> "DataElement"
	
	-- returns a record
	set the_Infos to XMLNodeInfo the_Child
	(*
	--> {kind:"ELEMENT_NODE", 
		 name:"DataElement", 
		 attribute:{"DataValue", "8086:297e"}, 
		 paragraph:19}
	*)
	
	set the_ChildrenNames to the_ChildrenNames & (name of the_Infos)
	--> {"DataElement", "DataElement","DataElement"}
	try
		if (attribute of the_Infos) is not missing value then
			-- attribute is a list {attribute name, attribute value}
			set the_ChildrenAttributes to the_ChildrenAttributes & (item 1 of attribute of the_Infos)
			--> {"DataValue", "DataValue", "DataValue"}
			set the_ChildrenValues to the_ChildrenValues & (item 2 of attribute of the_Infos)
			--> {"8086:293e", "8086:295e", "8086:297e"}
		end if
	end try
	
	(*	let us get the values of the Children of this Child 
		using XMLTagName and XMLGetText
	*)
	repeat with thisElement from 1 to XMLCountElement the_Child
		set the_ElementsNames to the_ElementsNames & (XMLTagName of (XMLElement the_Child index thisElement))
		set the_ElementsValues to the_ElementsValues & (XMLGetText of (XMLElement the_Child index thisElement))
	end repeat
	
end repeat


(*
	the function XMLDisplayXML
	returns a string with the content of the child and its 4 children
*)
set theText to XMLDisplayXML (XMLChild the_Root index 1)
(*
	"<DataElement DataValue=\"8086:293e\">\n  
		<type>Audio1</type>\n  
		<name>Some Name1</name>\n  
		<displayname>Some Display Name1</displayname>\n  
		<download>http://somesite1.com</download>\n
	</DataElement>"
*)

display dialog theText

XMLClose the_xmlRef


The library can do much more…
It is easy to learn, thanks to their well-explained tutorials.

Hello.

I wonder if you have seen this: Collections, a new approach to personal information gathering on Mac OS X.

I forgot to mention that I use BibDesk, happily I think, to systematize information. It is really meant for keeping references, but I use it to keep track of information (web pages and documents); I can also associate the various documents with 1…n keywords, and there are smart folders.

HI,

Sorry for the late reply :frowning:

I don’t know ‘Collections’.

You spoke about BibDesk. No, I didn’t know it; as I’ve seen on Wikipedia, it works with BibTeX, which I know from LaTeX.

I will check this evening.

At the moment I’m making my first attempt at AppleScriptObjC :confused:
I bought the book from Shane and am working through the examples.
I’m fighting with errors, but that’s OK; it’s the best way to learn and make progress… :wink:

I have also done a lot of research on XML databases and XML parsers.
My choice goes to GDataXML from Google. It looks simpler than libxml2; let’s see…

As a first step, I want to make a small XML parser to test my skills in ASOC techniques.

I will keep you informed.