
Indexing full text in a relational database does not give optimum performance or results: in fact, the performance degradation could be described as exponential in relation to the size of the database. With both the advantages and the disadvantages in mind, it was proposed that all REKn binary data be stored in a file system rather than in the database. File systems are designed to store files, whereas the PostgreSQL database is designed to store relational data; mixing the two defeats the advantages of each. Moreover, in testing the proof of concept, users found speed to be a significant issue, with many unwilling to wait five minutes between operations. In its proof-of-concept iteration, the computing interaction simply could not keep pace with the cognitive functions it was intended to augment and assist. We recognized that this issue could be resolved in the future by recourse to high-performance computing techniques. In the meantime, however, we decided to reduce the REKn data to a subset, which would allow us to imagine and work on functionality at a smaller scale.

Having decided to store all binary data in a file system, we had to develop a standardized method of storing and linking the data, one that both linked the relational data to the file system data and kept the data mobile (so that it could be migrated to a new server or distributed over multiple servers). Flexibility was also flagged as an important design consideration, since the storage solution might eventually be shared with many different organizations, each with its own particular needs. This method would also require a search technology capable of performing fast searches over millions of documents. In addition to the problem posed by the sheer volume of documents, the variety of file types stored would require an indexing engine capable of extracting text from encoded files. After a survey of the existing software tools, it was decided that Lucene was the best fit for our project requirements: it is an open source full-text indexing engine capable of handling millions of files of various types without any major degradation in performance, and it is extensible with plug-ins to handle additional file types should the need arise.
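
One way to picture the link between relational records and file system data is the minimal Python sketch below. It is an illustration under stated assumptions, not the REKn implementation: the `documents` table, the function names, and the hash-based directory layout are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so that the example is self-contained. The key idea is that the database stores only a path relative to a configurable storage root, so the files can be moved to a new server by copying the directory tree and changing the root.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalog() -> sqlite3.Connection:
    """Create a stand-in catalog (SQLite here; the source project used PostgreSQL)."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: the database holds metadata and a relative path only.
    db.execute(
        "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT, rel_path TEXT)"
    )
    return db


def store_file(root: Path, db: sqlite3.Connection, title: str, data: bytes) -> str:
    """Write the bytes under `root` and record only the relative path in the database."""
    digest = hashlib.sha256(data).hexdigest()
    # Shard by leading hex characters so no single directory grows too large.
    rel_path = Path(digest[:2]) / digest[2:4] / digest
    target = root / rel_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    db.execute(
        "INSERT INTO documents (title, rel_path) VALUES (?, ?)",
        (title, rel_path.as_posix()),
    )
    db.commit()
    return rel_path.as_posix()


def load_file(root: Path, db: sqlite3.Connection, title: str) -> bytes:
    """Resolve the stored relative path against whichever root is currently in use."""
    row = db.execute(
        "SELECT rel_path FROM documents WHERE title = ?", (title,)
    ).fetchone()
    if row is None:
        raise KeyError(title)
    return (root / row[0]).read_bytes()
```

Because `load_file` takes the root as a parameter, the same database rows remain valid after the file tree is copied elsewhere. Distributing files over several servers would need an extra mapping from path prefix to server, which this sketch leaves out.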

4.2. challenge: document harvesting

The question of how to go about harvesting data for REKn, or indeed any content-specific knowledgebase, turned out to be a question of negotiating with the suppliers of document collections for permission to copy the documents. Since each of these suppliers (such as the academic and commercial publishers and the publication amalgamator service providers) has structured access to its documents differently, the harvesting scripts had to be tailored individually for each supplier. For example, some suppliers provide an API to their database, others serve documents over HTTP, and still others distribute their documents as files on tapes or CDs. An automated process for harvesting documents from suppliers could be designed by combining all of these different scripts with a mechanism that automatically detects each supplier's access requirements and selects the correct script to use.
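
The selection mechanism described above can be sketched in Python as a small registry of harvesting scripts keyed by access method. Everything here is a hypothetical illustration: the supplier fields (`endpoint`, `base_url`, `mount_point`), the function names, and the detection rule are assumptions rather than details from the REKn project, and the harvesters only return a description of what they would do instead of contacting a supplier.

```python
from typing import Callable, Dict

# A harvester takes a supplier description and returns a summary of its action.
Harvester = Callable[[dict], str]

# Registry mapping an access method to the script that handles it.
HARVESTERS: Dict[str, Harvester] = {}


def harvester(access: str) -> Callable[[Harvester], Harvester]:
    """Decorator that registers a harvesting script under an access method."""
    def register(func: Harvester) -> Harvester:
        HARVESTERS[access] = func
        return func
    return register


@harvester("api")
def harvest_via_api(supplier: dict) -> str:
    # Placeholder: a real script would call the supplier's database API.
    return f"query API at {supplier['endpoint']}"


@harvester("http")
def harvest_via_http(supplier: dict) -> str:
    # Placeholder: a real script would fetch documents over HTTP.
    return f"fetch documents under {supplier['base_url']}"


@harvester("media")
def harvest_from_media(supplier: dict) -> str:
    # Placeholder: a real script would copy files from a mounted tape or CD.
    return f"copy files from {supplier['mount_point']}"


def detect_access(supplier: dict) -> str:
    """Assumed detection rule: infer the access method from the fields present."""
    if "endpoint" in supplier:
        return "api"
    if "base_url" in supplier:
        return "http"
    if "mount_point" in supplier:
        return "media"
    raise ValueError(f"no known access method for supplier {supplier.get('name')!r}")


def harvest(supplier: dict) -> str:
    """Detect the supplier's access method and run the matching script."""
    return HARVESTERS[detect_access(supplier)](supplier)
```

With this shape, supporting a new supplier means writing one more decorated function and, if needed, extending `detect_access`; the calling code stays unchanged.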

Source:  OpenStax, Online humanities scholarship: the shape of things to come. OpenStax CNX. May 08, 2010 Download for free at http://cnx.org/content/col11199/1.1