Google’s search engine for scientists upgraded for better data scouring
Google’s search engine for datasets, the cunningly named Dataset Search, is now out of beta, with new tools to better filter searches and access to almost 25 million datasets.
Dataset Search launched in September 2018, with Google hoping to slowly unify the fragmented world of online, open-access data. Although many institutions such as universities, governments, and labs publish data online, that data is often difficult to find using traditional search. But by adding open-standard metadata tags (such as schema.org markup) to their webpages, these groups can have their data indexed by Dataset Search, which now covers a huge range of information: everything from skiing injuries to volcano eruptions to penguin populations.
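For readers wondering what those metadata tags look like in practice: Dataset Search reads structured descriptions based on the schema.org `Dataset` type, commonly embedded in a page as a JSON-LD `<script>` block. The sketch below builds such a block with Python's standard `json` module. The helper functions, the penguin dataset, and every URL in it are hypothetical, and it is only an illustration; Google's own structured-data documentation defines which properties are required or recommended. It includes properties for licensing, whether the data is free to access, geographic coverage, and a downloadable file format.

```python
import json


def dataset_jsonld(name, description, url, license_url, free, region,
                   download_url, fmt):
    """Build a schema.org Dataset description as a plain dict.

    Hypothetical helper for illustration only; the property names are
    real schema.org terms, but the values are made up.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        # Whether the dataset can be accessed without payment.
        "isAccessibleForFree": free,
        # Geographic area the data covers.
        "spatialCoverage": {"@type": "Place", "name": region},
        # Where the actual data file lives and what format it is in.
        "distribution": [{
            "@type": "DataDownload",
            "encodingFormat": fmt,
            "contentUrl": download_url,
        }],
    }


def to_script_tag(metadata):
    """Wrap the metadata in a JSON-LD script tag for a page's HTML."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(metadata, indent=2)
            + "\n</script>")


if __name__ == "__main__":
    meta = dataset_jsonld(
        name="Penguin colony counts (hypothetical)",
        description="Annual counts of penguin breeding pairs at monitored colonies.",
        url="https://example.org/datasets/penguins",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        free=True,
        region="Antarctica",
        download_url="https://example.org/datasets/penguins.csv",
        fmt="text/csv",
    )
    print(to_script_tag(meta))
```

A publisher would paste the printed `<script>` block into the HTML of the dataset's landing page so that crawlers can pick up the description alongside the human-readable content.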
Google would not share any specific usage figures for the search engine, but it said “hundreds of thousands of users” have tried Dataset Search since its launch, and that the reaction from the scientific community has been positive overall.
Natasha Noy, a research scientist at Google AI who helped create the tool, tells The Verge that “most [data] repositories have been very responsive” and that the engine’s launch meant older scientific institutions are now taking “publishing metadata more seriously.”
“For example, [the prestigious scientific journal] Nature is changing its policies to require data sharing with proper metadata,” Noy says, highlighting a change that will make the data underpinning top-flight scientific research more accessible in future.
New features added to Dataset Search include the ability to filter data by type (tables, images, text, etc.), by whether it’s free to use, and by the geographic areas it covers. The engine is also now available on mobile and has expanded dataset descriptions.
Google says the corpus covered by the search engine — almost 25 million datasets — is only a “fraction of datasets on the web,” but a “significant” one all the same. The largest topics indexed are geosciences, biology, and agriculture, and the most common queries include “education,” “weather,” “cancer,” “crime,” “soccer,” and “dogs.” The US is also the leader in open government datasets, publishing more than 2 million online.
Noy would not comment on future plans for Dataset Search, but she says the team is considering a number of features it hopes will be useful, including “understanding how datasets are cited and reused” and “helping users explore datasets in Dataset Search when they don’t necessarily know what they are looking for.”
“And, of course, continuing to expand the corpus,” says Noy. There’s always more data out there.