[FRIAM] scraping a web site
Robert J. Cordingley
robert at cirrillian.com
Wed Jan 4 02:00:27 EST 2017
Hi Nick
Your old Earthlink site seems to comprise just about ten 'pages' of
content, with many of those pages (Published Works) listing many
bibliographic citations, each with a link to an image and a further link
to a PDF document. Grabbing all the content manually is perhaps tedious
but doable. Saving all the pages as HTML is also doable, but I don't see
a lot of point in that. Populating your Research Gate website should be
possible too with in-browser copy and paste (though I'm not familiar
with RG), as it should be with any other website builder such as Wix,
Squarespace or WordPress, or with a hosting company's website builder.
I don't know of an automated system, but the Internet Archive must have
something and already has multiple captures of past versions of your
site - see
https://web.archive.org/web/20151206005021/http://home.earthlink.net/~nickthompson/naturaldesigns/.
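As a rough illustration of finding those captures programmatically, here is a minimal Python sketch that asks the Wayback Machine's public availability endpoint for the capture closest to a given date. The function names (availability_url, closest_capture, lookup) are my own and purely illustrative, and the response layout assumed in the comments is the one the endpoint is documented to return; treat it as a starting point rather than a tested tool.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint.
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urllib.parse.urlencode(params)


def closest_capture(response):
    """Return (timestamp, capture_url) from a decoded JSON response, or None.

    Assumes the layout {"archived_snapshots": {"closest": {...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(page_url, timestamp=None):
    """Query the live service (needs network access)."""
    query = availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=30) as resp:
        return closest_capture(json.load(resp))

# Example call (not run here because it needs the network):
# lookup("http://home.earthlink.net/~nickthompson/naturaldesigns/", "20151206")
```

The lookup is split from the parsing so that the parsing can be checked against a saved response without touching the network.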
I think what you're really looking for is a web/content migration tool
rather than a web scraping tool; scrapers tend to be focused on
capturing specific data, say contact information. Vamosa seems to offer
a service that should do exactly what you want, see
http://www.vamosa.com/vamosa-content-migrator-c124 but I suspect that's
aimed at large corporate clients. I have no experience with them.
Googling 'website migration tools' produces lots of results - some
questionable.
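If it does come down to gathering the content by script rather than with a migration tool, a site of about ten pages needs very little. The sketch below uses only the Python standard library and pulls the link and image targets out of one saved page, then filters for the PDFs. LinkCollector and pdf_links are hypothetical names of mine, and I'm assuming the pages use ordinary <a href> and <img src> markup, which I haven't checked against the actual site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link (<a href>) and image (<img src>) targets."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            target = attrs.get("href")
        elif tag == "img":
            target = attrs.get("src")
        else:
            target = None
        if target:
            # Resolve relative links against the page's own address.
            self.links.append(urljoin(self.base_url, target))


def pdf_links(html, base_url):
    """Return the absolute URLs on the page that end in .pdf (any case)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [url for url in collector.links if url.lower().endswith(".pdf")]
```

Feeding each saved page through pdf_links would give a list of documents to download and re-upload by hand or with a further script.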
Hope this helps.
Thanks, Robert
On 1/3/17 9:49 PM, Nick Thompson wrote:
>
> Dear Phellow Phriammers,
>
> I am in the uncomfortable position of being bound by threads of steel
> to Earthlink. Many, MANY, years ago I started a website on
> Earthlink, http://home.earthlink.net/~nickthompson/naturaldesigns/,
> and put a lot of my writing, and some commentary, up on it. The
> website creation and editing medium (trellix) was pretty good for its
> time, and there are many ways that I find the site quite satisfying.
> But gradually Earthlink has withdrawn its support, and now I am not
> sure I could get in to edit or change it. Meantime, Research Gate has
> gotten started, and provides a somewhat better place to meet the world
> and archive my stuff. And also, having the site on Earthlink binds me
> to them and their 22 dollar a month fee. So. …
>
> I am wondering if there is a way to (or a service that would) scrape the
> website and, possibly, dump it into a new and more reliable, more
> website creation medium? Please, ambulatory knowledge only. I don’t
> want people doing deep searches to answer this question.
>
> Thanks, as always.
>
> Nick
>
> Nicholas S. Thompson
>
> Emeritus Professor of Psychology and Biology
>
> Clark University
>
> http://home.earthlink.net/~nickthompson/naturaldesigns/
>
>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/ by Dr. Strangelove
--
Cirrillian
Web Design & Development
Santa Fe, NM
http://cirrillian.com
281-989-6272 (cell)
Member Design Corps of Santa Fe