[FRIAM] scraping a web site

Robert J. Cordingley robert at cirrillian.com
Wed Jan 4 02:00:27 EST 2017


Hi Nick

Your old Earthlink site seems to comprise only about ten 'pages' of 
content. Many of those pages (Published Works) list many bibliographic 
citations, each with a link to an image and a further link to a PDF 
document. Grabbing all the content manually is perhaps tedious but 
doable. Saving all the pages as HTML is also doable, but I don't see a 
lot of point in that. Populating your Research Gate site should be 
possible too with in-browser copy and paste (though I'm not familiar 
with RG), and the same goes for any other website builder (Wix, 
Squarespace, WordPress) or a hosting company's own website builder. I 
don't know of an automated system, but the Internet Archive must have 
something, and it already has multiple captures of past versions of 
your site; see 
https://web.archive.org/web/20151206005021/http://home.earthlink.net/~nickthompson/naturaldesigns/
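
If the manual route gets too tedious, the image and PDF links on a 
saved page could be listed with a few lines of Python. This is only a 
minimal sketch using the standard library; the file names in the sample 
and the list of file suffixes are made up for illustration, and I have 
not run it against the real site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption for illustration: the file types worth collecting.
WANTED_SUFFIXES = (".pdf", ".jpg", ".jpeg", ".gif", ".png")


class LinkCollector(HTMLParser):
    """Collect absolute URLs of linked PDFs and images found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # <a href=...> carries document links; <img src=...> carries inline images.
        wanted_attr = {"a": "href", "img": "src"}.get(tag)
        if wanted_attr is None:
            return
        for name, value in attrs:
            if name == wanted_attr and value:
                # Resolve relative links against the page's own address.
                url = urljoin(self.base_url, value)
                if url.lower().endswith(WANTED_SUFFIXES) and url not in self.links:
                    self.links.append(url)


def collect_links(html_text, base_url):
    """Return the wanted links in page order, without duplicates."""
    parser = LinkCollector(base_url)
    parser.feed(html_text)
    return parser.links


if __name__ == "__main__":
    # Hypothetical page fragment; the real pages will look different.
    sample = '<a href="papers/example.pdf">Example</a><img src="images/example.gif">'
    base = "http://home.earthlink.net/~nickthompson/naturaldesigns/"
    for link in collect_links(sample, base):
        print(link)
```

Feeding it the text of each saved page, with the page's address as the 
base, gives a list of files to download by hand or with any download 
tool.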


I think what you're really looking for is a web/content migration tool 
rather than a web scraping tool; scrapers tend to be focused on 
capturing specific data, say contact information. Vamosa seems to offer 
a service that should do exactly what you want (see 
http://www.vamosa.com/vamosa-content-migrator-c124), but I suspect it's 
aimed at large corporate clients. I have no experience with them. 
Googling 'website migration tools' produces lots of results, some of 
them questionable.

Hope this helps.

Thanks, Robert


On 1/3/17 9:49 PM, Nick Thompson wrote:
>
> Dear Phellow Phriammers,
>
> I am in the uncomfortable position of being bound by threads of steel 
> to Earthlink.  Many, MANY, years ago I started a website on 
> Earthlink, http://home.earthlink.net/~nickthompson/naturaldesigns/, 
> and put a lot of my writing, and some commentary, up on it.  The 
> website creation and editing medium (trellix) was pretty good for its 
> time, and there are many ways that I find the site quite satisfying.  
> But gradually Earthlink has withdrawn its support, and now I am not 
> sure I could get in to edit or change it.  Meantime, Research Gate has 
> gotten started, and provides a somewhat better place to meet the world 
> and archive my stuff.  And also, having the site on earthlink binds me 
> to them and their 22 dollar a month fee.  So. …
>
> I am wondering if there is a way to (or a service that would) scrape 
> the website and, possibly, dump it into a new and more reliable 
> website creation medium? Please, ambulatory knowledge only.  I don’t 
> want people doing deep searches to answer this question.
>
> Thanks, as always.
>
> Nick
>
> Nicholas S. Thompson
>
> Emeritus Professor of Psychology and Biology
>
> Clark University
>
> http://home.earthlink.net/~nickthompson/naturaldesigns/ 
>
>
>
> ============================================================
> FRIAM Applied Complexity Group listserv
> Meets Fridays 9a-11:30 at cafe at St. John's College
> to unsubscribe http://redfish.com/mailman/listinfo/friam_redfish.com
> FRIAM-COMIC http://friam-comic.blogspot.com/ by Dr. Strangelove

-- 
Cirrillian
Web Design & Development
Santa Fe, NM
http://cirrillian.com
281-989-6272 (cell)
Member Design Corps of Santa Fe
