h1. GSoC 2013 "Cloud Ready" Project Specification
There are four subprojects:
# HTTP I/O support for Exiv2 (GSoC 2013 Student)
# exiv2(.exe) to run as a service (daemon) on a web socket.
# client-side use of the exiv2 service (using the web socket)
# JSON support
This is quite a large project. Robin Mills intends to implement the daemon/web-socket support during Spring 2013. The GSoC student is expected to implement the http I/O support.
h2. 1 HTTP I/O support (GSoC 2013 Student)
Today we provide support for files available on the file system. These files can be memory mapped if this feature is supported by the host OS.
With the increasing interest in "cloud" computing, it's become ever more common for files to reside in remote locations which are not mapped to the file system. Very common cases today are ftp and http. For example: http://bla/bla/bla/file.jpg. Today there are a myriad of "Cloud" storage products, such as AWS, DropBox, Google Drive, SkyDrive, Box, iCloud, Just Cloud and more.
The proposal is to support http, ftp and ssh. This can be done by deriving a new class from the BasicIO abstract class. The exiv2 command would accept URLs as filenames. For example:
<pre>
exiv2 -pt http://clanmills.com/files/Robin.jpg
exiv2 -pt ftp://username:password@clanmills.com/Robin.jpg
exiv2 -pt ssh://username:password@clanmills.com/Robin.jpg
</pre>
In most image files, the meta-data is located in the first 100k of the file, so the implementation should read blocks from the server on demand and avoid copying the complete file.
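To make the on-demand reading concrete, here is a minimal sketch of the idea. The class and member names are illustrative only, and a tiny stand-in interface is declared so the sketch is self-contained; in Exiv2 the new class would derive from the BasicIO abstract class in basicio.hpp, whose exact interface must be followed.
<pre>
// Illustrative sketch only: in Exiv2 this class would derive from the
// BasicIO abstract class (basicio.hpp); a small stand-in interface is
// declared here so that the sketch compiles on its own.
#include <algorithm>
#include <map>
#include <string>
#include <vector>

typedef unsigned char byte;

struct RemoteIoInterface {                     // stand-in for BasicIO
    virtual ~RemoteIoInterface() {}
    virtual long read(byte* buf, long rcount) = 0;
    virtual int  seek(long offset)            = 0;
};

class HttpIo : public RemoteIoInterface {
public:
    explicit HttpIo(const std::string& url) : url_(url), pos_(0) {}

    // Serve reads from a block cache.  Only blocks which have not been seen
    // before are fetched from the server, so reading the meta-data in the
    // first ~100k never causes the whole image to be downloaded.
    long read(byte* buf, long rcount) {
        long copied = 0;
        while (copied < rcount) {
            long block  = (pos_ + copied) / kBlockSize;
            long offset = (pos_ + copied) % kBlockSize;
            if (cache_.find(block) == cache_.end()) fetchBlock(block);
            const std::vector<byte>& b = cache_[block];
            long n = std::min(rcount - copied, static_cast<long>(b.size()) - offset);
            if (n <= 0) break;                 // end of file reached
            std::copy(b.begin() + offset, b.begin() + offset + n, buf + copied);
            copied += n;
        }
        pos_ += copied;
        return copied;
    }

    int seek(long offset) { pos_ = offset; return 0; }   // no network traffic

private:
    // Stub: a real implementation would issue an HTTP/1.1 Range request for
    // bytes [block*kBlockSize, (block+1)*kBlockSize) -- see the libcurl
    // sketch below -- and store the response body in the cache.
    void fetchBlock(long block) { cache_[block].assign(kBlockSize, 0); }

    enum { kBlockSize = 16 * 1024 };
    std::string url_;
    long pos_;
    std::map<long, std::vector<byte> > cache_;
};
</pre>
Memory use is then proportional to the number of blocks actually touched, which for typical meta-data access is only a few blocks at the start of the file.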
The simplest possible implementation of this proposal is for exiv2 to detect the protocol and use a helper application such as curl or ssh. This implementation would probably require copying the complete file from the remote storage to a temporary file in the local file system. While such an implementation can be constructed quickly, it does not satisfy the project aim to make efficient use of bandwidth.
It is very desirable to use a robust implementation of the web protocols, and a library such as libcurl should be considered. The selection of the protocol support library must respect build implications. We should be careful to avoid adding a large library (such as boost) to the build dependencies. Additionally, the implementation is required to be written in C++ and run on Mac/Windows/Linux without dependency on platform frameworks such as .Net, Java, or Cocoa. It may be that build switches can be provided to enable Exiv2 to use platform frameworks. This could be especially useful on mobile platforms such as Android and iOS.
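If libcurl were chosen, fetching a byte range on demand is straightforward. A minimal sketch follows; the URL is the example used above, the 100k figure comes from the discussion of typical meta-data placement, and a server which ignores Range requests would simply return the whole file, which the real implementation would need to handle.
<pre>
// Minimal libcurl sketch: fetch only the first 100k of a remote image with an
// HTTP Range request -- normally enough to read the meta-data.
// Build with: g++ range.cpp -lcurl
#include <curl/curl.h>
#include <cstdio>
#include <string>

// libcurl write callback: append each chunk of the response body to a string.
static size_t collect(char* data, size_t size, size_t nmemb, void* userp)
{
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);

    std::string body;
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    curl_easy_setopt(curl, CURLOPT_URL, "http://clanmills.com/files/Robin.jpg");
    curl_easy_setopt(curl, CURLOPT_RANGE, "0-102399");          // first 100k only
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);

    CURLcode rc = curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    curl_global_cleanup();

    if (rc != CURLE_OK) {
        std::fprintf(stderr, "curl: %s\n", curl_easy_strerror(rc));
        return 1;
    }
    std::printf("fetched %lu bytes\n", static_cast<unsigned long>(body.size()));
    return 0;
}
</pre>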
The implementation should provide bi-directional support (both read and write) with read-access being the first priority.
h2. 2 and 3 Exiv2 daemon server and client
Enable exiv2 to run as a service (daemon) on a web socket. I imagine two types of clients:
# exiv2 itself of course
# JavaScript/WebSocket client
To do this, the command line could look something like this:
<pre>
Server: # exiv2 --daemon --port 54321
Client: $ exiv2 -pt exv://server:54321:/Robin.jpg
Even better: $ exiv2 -pt exv://server:54321:/http://clanmills.com/files/Robin.jpg
</pre>
I don't want to get into detail concerning the JavaScript API for this, but it might look something like this:
<pre>
<script src="js/Exiv2.js"></script>
<script>
var exiv2    = new Exiv2({ server: 'clanmills.com', port: 54321 });
var metadata = JSON.parse(exiv2.command('--JSON -pt /Robin.jpg'));
// or, even better, read the meta-data directly from a remote file:
var metadata = JSON.parse(exiv2.command('--JSON -pt http://clanmills.com/files/Robin.jpg'));
</script>
</pre>
To get the most from this functionality, we should provide JSON (and/or XML) support, which I discuss below.
h2. 4 JSON Support
5 years ago, I became interested in exiv2 to implement a GeoTagging application. I decided to use Python as an excuse to learn the language. I used the pyexiv2 wrapper, written by Olivier, and the project was a success. Building exiv2 and pyexiv2 on Windows and MacOSX was a challenge (to say the least).
Since then, I've worked steadily on the exiv2 msvc and msvc64 build environments and I believe both are working very well.
Sadly, building pyexiv2 remains a challenge because it requires boost and the scons build utility. (scons is/was another GSoC project.) The consequence is that my Python script seldom uses the latest exiv2 and is not available on all my machines (Windows/Cygwin/Mac/Kubuntu). The script is stable (it has hardly changed in 5 years); however, building the pyexiv2 wrapper is a maintenance challenge. pyexiv2 has to be built for each specific version of Python (2.6, 2.7, etc.), architecture (32/64 bit) and platform (Windows/Cygwin/Mac OS X/Linux).
This is not a criticism of Olivier's pyexiv2 wrapper. Olivier has done a very good job. Python wrappers which link C++ are a severe maintenance challenge. I haven't worked with Perl's C++ support (XS and/or SWIG) for years; however, I anticipate similar pain and trouble.
JSON to the rescue. My proposal is to provide a JSON interface to read and write meta-data in the exiv2 command-line utility.
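To illustrate what the proposed --JSON option might produce, here is a hand-rolled sketch which reads the meta-data with the existing Exiv2 API and prints one JSON object keyed by the Exiv2 key names. The --JSON option and the output layout are proposals, not existing features, and the JSON formatting below is deliberately simplistic.
<pre>
// Sketch of the proposed --JSON output: read the meta-data with the existing
// Exiv2 API and print it as a single JSON object keyed by Exiv2 key names.
#include <exiv2/exiv2.hpp>
#include <iostream>
#include <string>

// Escape backslashes and double quotes so the value is legal JSON.
static std::string jsonEscape(const std::string& s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '"' || s[i] == '\\') out += '\\';
        out += s[i];
    }
    return out;
}

int main(int argc, char* argv[])
{
    if (argc != 2) { std::cerr << "usage: " << argv[0] << " file" << std::endl; return 1; }
    try {
        Exiv2::Image::AutoPtr image = Exiv2::ImageFactory::open(argv[1]);
        image->readMetadata();
        const Exiv2::ExifData& exifData = image->exifData();

        std::cout << "{\n";
        for (Exiv2::ExifData::const_iterator i = exifData.begin(); i != exifData.end(); ++i) {
            if (i != exifData.begin()) std::cout << ",\n";
            std::cout << "  \"" << jsonEscape(i->key())
                      << "\": \"" << jsonEscape(i->value().toString()) << "\"";
        }
        std::cout << "\n}" << std::endl;
    } catch (const Exiv2::AnyError& e) {
        std::cerr << "Exiv2 error: " << e << std::endl;
        return 1;
    }
    return 0;
}
</pre>
A Perl or Python wrapper then only needs to run such a command once per file and parse the output with its standard JSON support.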
As a sample application to prove our JSON support, we will provide wrappers for Perl and Python. The wrappers can be written entirely in the scripting language and use the language's own JSON support. There is no need to get involved with C++ integration challenges such as boost/scons/pyexiv2, XS and SWIG. When reading from files, the wrapper will call exiv2.exe ONCE to capture all the JSON to a file. When writing to files, the wrapper will call exiv2.exe ONCE. This strategy will enable the wrappers to run on all platforms on which exiv2.exe is available.
h2. Expected results:
# To deploy a web service to provide Exiv2 services.
# To provide a JavaScript library to enable developers to use the Exiv2 service.
# An engineering assessment of the effort involved in providing access to cloud servers such as AWS.
h2. GSoC Mentor:
Robin Mills http://clanmills.com/files/CV.pdf
I've been a volunteer on the Exiv2 project for 5 years. I worked for Adobe for 10 years, where I implemented reading PDF and JDF files over http (without copying the complete file). I'm now a freelance contractor and I've been working on a mobile app which uses WebSockets. I've worked on both server and client code.