03 August 2019

Digital Archive Of News Media

Harsha Man Maharjan

Given the ever-increasing volume of digital Nepali news materials, it is important that they are archived and retained. Two institutions are important for the news archive: the Internet Archive and Press Council Nepal. The Wayback Machine is a service provided by the Internet Archive that has archived website content from around the world since 1996. Press Council Nepal focuses mainly on news media, and it collects and archives Nepali news materials. There are major gaps in both collections, and the technologies have limitations in what they can access and archive. The Internet is an important artefact for scholars, but the archival efforts have been piecemeal.

The Wayback Machine is an initiative that automatically visits webpages and captures them. It was started in November 2000 by Brewster Kahle and Bruce Gilliat. In an interview with journalist Kara Swisher in 2017, Kahle mentioned that a webpage remains for 100 days before it is changed or deleted, which is why he wanted to archive such pages. In the case of news media, webpages change much faster.

The Wayback Machine obtains pages in two main ways. First, the Alexa search engine (Alexa.com) searches the Internet and accumulates webpages and related information. These pages are collected through automated programs (crawlers) which take screenshots. The collections are sent to the Internet Archive after a six-month period. This was the deal between the Internet Archive and Amazon when the former sold Alexa to the latter in 1999. Second, individuals can save pages that are still available on the net, which adds that digital content to the Wayback Machine collection.

Through the Wayback Machine, users can easily view multiple versions and updates of a website. A user types the web address in the address box. The program returns the date of original site creation, the number and dates of site updates, and links to the archived versions. For example, to search for www.south-asia.com, one of the early Nepali news websites created by Mercantile Communications in 1996, users have to type the web address www.south-asia.com in the address box provided on the Wayback Machine’s homepage. They are then presented with a timeline of all available updates to the website. The earliest date is December 8, 1996. If they select this particular date, the snapshot of the website taken on that day appears on the screen. Its publications section contains the content of two daily news media (The Kathmandu Post and The Rising Nepal), one weekly (The Independent), and one monthly (Himal South Asia). Other websites such as nepalnews.com, kantipuronline.com, gorkhapatra.org.np, parewa.com, dainikee.com, and onlinekhabar.com are also available in this archive.
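For readers who want to do the same lookup from a script rather than the address box, the following is a minimal Python sketch, not an official client. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available) and the usual snapshot address pattern (https://web.archive.org/web/<timestamp>/<address>). The function names are made up for illustration, and the sample response is invented to show the expected JSON shape; it is not a real capture record. The sketch only builds addresses and reads a response, so actually fetching the data is left to the reader.

```python
import json
from urllib.parse import urlencode

# Assumed public endpoints of the Internet Archive (not taken from the article).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"
SNAPSHOT_PREFIX = "https://web.archive.org/web"


def availability_query(address, timestamp=None):
    """Build the query that asks for the capture closest to a date.

    timestamp is a string such as "19961208" (YYYYMMDD); it is optional.
    """
    params = {"url": address}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def snapshot_url(address, timestamp):
    """Build the address of an archived copy for a given date."""
    return f"{SNAPSHOT_PREFIX}/{timestamp}/{address}"


def closest_snapshot(response_text):
    """Return (snapshot address, timestamp) from a JSON reply, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


if __name__ == "__main__":
    # The site and date mentioned in the paragraph above.
    print(availability_query("www.south-asia.com", "19961208"))
    print(snapshot_url("www.south-asia.com", "19961208"))

    # Invented sample reply, used only to show how the answer is read.
    sample = json.dumps({
        "archived_snapshots": {
            "closest": {
                "available": True,
                "url": "http://web.archive.org/web/19961208000000/http://www.south-asia.com/",
                "timestamp": "19961208000000",
                "status": "200",
            }
        }
    })
    print(closest_snapshot(sample))
```

An empty "archived_snapshots" object in the reply means no capture was found, in which case closest_snapshot returns None; this is one simple way to spot the gaps in the collection from a script.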

Though the Wayback Machine is very useful, it is not a library. There are gaps in the collection. Not all the material that appears on the original webpage is archived. As Niels Brügger has discussed, sometimes elements of websites are not archived properly. An archived page might also differ from the live web because websites change frequently. The archive captures only a snapshot of a certain point in time.

Press Council Nepal now holds 32 lakh scanned pages of newspapers, 22 lakh pieces of news from news websites, 33 lakh clips of radio programmes and 24-hour recordings of Nepali television channels from the last three years. It has archived weeklies from 1974 to 2006 and dailies from 1974 to 2013.

Digitising newspapers was not in the original plan of Press Council Nepal. Its objective was to preserve the hard-bound newspapers by storing them on microfilm. On May 19, 1983, it decided to undertake a feasibility study and correspond with the National Archives. Binod Dhungel, in his book on Press Council Nepal, mentions that the council also decided to contact editors and publishers whose newspapers it did not collect. Though the plan did not materialise, we could see this initiative as a continuation of the microfilming of newspapers, especially the Nepali newspaper Gorkhapatra from 1901 to 1970, by the National Archives.

In 1995, Press Council Nepal held another consultation on microfilming the newspapers; however, it concluded that a new technology, CD-ROM, was better and cheaper than microfilm. It bought a larger format scanner and started to microfilm some dailies by December 2000. When the work was halted in 2006, 2,268,374 pages of newspapers had been scanned and 1,873,932 of them were available on CD. By that time, 107 dailies, 519 weeklies and 2 fortnightlies were available in microfilm. In 2070 BS, a new scanner was bought and the scanning resumed. In that year, 2,14,383 pages of newspapers were scanned and 72,281 pages were recorded on CDs.

It is mainly researchers, students and journalists who approach Press Council Nepal for digitised newspaper materials. Its annual report mentions that fifteen individuals and one institution bought data worth Rs. 1,30,745.49 in the fiscal year 2074/75. Ujjwal Prasai requested copies of Gorkhapatra from 2023 BS to 2025 BS, Ruprekha from 2031 BS to 2032 BS, The Kathmandu Post of May 1996, and Kantipur from 2057 BS to 2061 BS to work on the biography of the writer Khagendra Sangraula. He could not get the issues of Gorkhapatra and Ruprekha from the council as they had not been scanned. A team making the movie Whole Timer requested data from the weeklies Janadesh and Jana Astha from 2060 BS to 2061 BS. Similarly, a PhD student from the University of Sydney requested copies of The Rising Nepal, The Kathmandu Post and Jana Astha from 2052 Fagun 1 to 2064 Baishakh 20.

Researchers will encounter a few problems in the archive. The images (jpg files) are not searchable. The scans are of varying quality. Though the council claims that it has scanned all the issues of the newspapers, many issues are missing. It could have collaborated with other libraries that had the missing issues.

Despite the challenges mentioned above, these two options are complementary for researchers. To understand the structure and content of the websites of Nepali news media, the Wayback Machine is helpful. For digitised newspapers, Press Council Nepal is an important source. Since 2017, the council has been using software to automate news collection from 500 news portals. It collects data from each of the websites within 90 minutes. The council has also recently started an initiative, the Newspaper Management System. On July 16, 2019, Press Council Nepal put a notice on its website asking newspapers to log in to its server and upload their issues in PDF format. This reduces the burden of scanning the newspapers. Such initiatives are for digital materials that are produced from now onwards. However, what about the newspapers before 1974? What about the dailies and weeklies which have not been scanned? Press Council Nepal must archive them as well. The Internet Archive has a worldwide interest and will only capture a small fraction of Nepali digital content. What we can do is save the pages we want for the future.

Published in The Rising Nepal on August 3, 2019. Also available at http://therisingnepal.org.np/news/33515.