Piyush's workspace - Thread Archive

Nityesh Agarwal December 09, 2019 at 06:39 AM

Updates: I have started working on this analysis. Here's my notebook - https://www.kaggle.com/nityeshaga/ted-talks-analysis-wip

What I have done so far -
* Some basic exploration, with help from @Erik Kristofer Anderson’s Hello World
* Converted the duration of the talks into seconds
* Figured out the sex of each speaker by analysing his/her profile
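
For anyone following along, the duration conversion could look roughly like this. It is only a sketch: it assumes durations are stored as strings such as "12:34" or "1:02:03", which may not match the actual column in the dataset, and `duration_to_seconds` is a made-up helper name rather than something taken from the notebook.

```python
def duration_to_seconds(duration: str) -> int:
    """Convert "SS", "MM:SS" or "HH:MM:SS" (assumed formats) into total seconds."""
    parts = [int(p) for p in duration.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected duration format: {duration!r}")
    seconds = 0
    for part in parts:
        # Each step to the right is a 60x smaller unit.
        seconds = seconds * 60 + part
    return seconds


print(duration_to_seconds("12:34"))
print(duration_to_seconds("1:02:03"))
```

If the real column turns out to hold numbers of minutes or some other format, the parsing line is the only part that would need to change.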

What do you think?

Also, it's a public notebook, so you are welcome to fork it, download it, and use it for your own analysis.

Nityesh Agarwal December 05, 2019 at 07:16 AM

Things to try:

• Topic modelling
Refer - https://juliasilge.com/blog/text-mining-stack-overflow/
• Comparing word usage - between TED/TEDplus, men/women, etc.
• Changes in word use
Refer - https://www.tidytextmining.com/twitter.html, http://varianceexplained.org/r/hn-trends/
• Words at the beginning and end
• Sentiment analysis
Refer - http://varianceexplained.org/r/tidytext-plots/, https://juliasilge.com/blog/if-i-loved-nlp-less/
• Gender and verbs analysis
Refer - http://varianceexplained.org/r/tidytext-gender-plots/, https://juliasilge.com/blog/gender-pronouns/, http://culturalanalytics.org/2016/12/understanding-gender-and-character-agency-in-the-19th-century-novel/
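
To make the "comparing word usage" idea a bit more concrete, here is a rough Python sketch using a smoothed log odds ratio, which is one common way to compare word use between two groups of texts. The function names and the toy transcripts are invented for illustration; nothing here comes from the dataset or from the linked posts.

```python
import math
import re
from collections import Counter


def word_counts(texts):
    """Count lowercase word tokens across a list of transcript strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def log_odds_ratio(texts_a, texts_b):
    """Return {word: log odds ratio}; positive means more typical of group A."""
    counts_a, counts_b = word_counts(texts_a), word_counts(texts_b)
    vocab = set(counts_a) | set(counts_b)
    # Add-one smoothing so words missing from one group do not divide by zero.
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)
    return {
        w: math.log(((counts_a[w] + 1) / total_a) / ((counts_b[w] + 1) / total_b))
        for w in vocab
    }


# Toy transcripts, made up purely for illustration.
group_a = ["we build robots and we test robots"]
group_b = ["we tell stories and we share stories"]
scores = log_odds_ratio(group_a, group_b)
top_a = max(scores, key=scores.get)  # word most typical of group A
top_b = min(scores, key=scores.get)  # word most typical of group B
print(top_a, top_b)
```

With real transcripts, the two lists would be whatever split we care about (TED vs TEDplus, men vs women, early vs late talks), and it would probably make sense to drop very rare words before ranking.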

Nityesh Agarwal November 30, 2019 at 09:35 AM

I have gone through the detailed description of the project provided by the authors and jotted down the exact paragraphs where the authors suggested things to find using this dataset. I am opening this post up for edits, so please add your own questions/ideas here.

1 reply

Nityesh Agarwal November 29, 2019 at 05:00 AM

Great initiative, indeed @Erik Kristofer Anderson!

Erik Kristofer Anderson November 28, 2019 at 03:03 PM

Thanks!

Erik Kristofer Anderson November 27, 2019 at 10:58 PM

Good news: I built a hello-world Python program for the TED talk data.
I did the following: signed up for repl.it and made a repl (that's what they call the place where you keep files and data and can run Python files in their environment). It was very user-friendly, although at times I stumbled over some unfamiliar features.
Anyway, I wrote a program that downloads the full dataset and then prints some information about it, including the full text of the first talk.
I hope this helps people get started playing with this dataset!

Here's the link. I believe you can open it, run it, and tinker with it and you don't even need to sign up to do that. Although if you'd like to save your changes you'd probably need to sign up. https://repl.it/@ErikKrisotferA/ted-talk-hello-world
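
For readers who cannot open the repl, a program along these lines might look like the sketch below. This is not the code from the repl. It assumes the data is a CSV file with a transcript column, and the column names (`headline`, `text`) and both helper functions are hypothetical. The exact file URL is left for you to fill in from the dataset repository.

```python
import csv
import io
import urllib.request


def fetch_text(url: str) -> str:
    """Download a text file and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def describe_talks(csv_text: str, text_column: str = "text") -> dict:
    """Parse CSV text and return a few basic facts about it."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "talks": len(rows),
        "columns": list(rows[0].keys()) if rows else [],
        # Full text of the first talk; empty if the assumed column is absent.
        "first_talk": rows[0].get(text_column, "") if rows else "",
    }


# Tiny inline sample (made up) so the sketch runs without a network connection.
sample = "headline,text\nTalk one,Hello world\nTalk two,Second transcript\n"
info = describe_talks(sample)
print(info["talks"], info["columns"])
print(info["first_talk"])
```

Against the real data, the call would be something like `describe_talks(fetch_text(url))`, with `url` pointing at the raw CSV file and `text_column` set to whatever the transcript column is actually called.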

Erik Kristofer Anderson November 27, 2019 at 09:30 PM

@NITISH SARIN I'm also new here, and fairly new to data analysis stuff. I guess a place to start would be brainstorming questions to ask the data. (If I may anthropomorphize data.) Nityesh mentions in the dev.to post that the suggestion is to "Analyze these transcripts to reveal some intricacies about our culture".
So let's start there. I'll be back in a bit once I've thought of a few.

2 replies

Luiz Oliveira November 24, 2019 at 09:15 PM

@Nityesh Agarwal, I agree. This week I'll be traveling for work, but I'll try to analyze and propose ideas as soon as I can.

Nityesh Agarwal November 24, 2019 at 07:03 PM

So I mentioned a few interesting project ideas in my article - https://dev.to/nityeshaga/fantastic-programming-project-ideas-and-where-to-find-them-the-beginner-friendly-version-9d5. The TED talks dataset is one of them. Here, in this channel, we are going forward with this idea and trying to do some analysis using that dataset.

Nityesh Agarwal November 24, 2019 at 06:58 PM

Awesome! Great to have you @NITISH SARIN

NITISH SARIN November 24, 2019 at 06:54 PM

Hi. I am fairly new to this Data Analysis stuff.
Any leads on what we are trying to do?
I am from a Java development background. But up for anything new.
Probably a gist of what we are trying to achieve would be helpful. 🙂

1 reply

Nityesh Agarwal November 24, 2019 at 06:45 PM

BTW, here's the Data Is Plural entry on this dataset:

TED talks. Katherine M. Kinnaird and John Laudun — professors whose research includes cultural analytics and computational folklore studies — have created a dataset of 2,656 TED talks, with metadata and transcripts, and have published a detailed description of the project.

• Katherine M. Kinnaird
• John Laudun
• dataset of 2,656 TED talks (https://github.com/kinnaird-laudun/data/tree/master/Release_v0)
• detailed description of the project

Nityesh Agarwal November 24, 2019 at 06:42 PM

IMHO we should first try to explore the dataset and everything related to it with the goal of finding questions that we might answer using the data. What do you think?

Nityesh Agarwal November 24, 2019 at 06:41 PM

Hi @Luiz Oliveira @NITISH SARIN 🙂

#ted-talks-analysis
